Can you explain to readers the importance of language data for training SMT and hybrid systems?
The so-called “blue ocean” opportunity for MT, which is to explosively grow the translation market by making it possible to translate material that has never been translated before, will require relevant training data to learn from. That data is needed in quantity for every language pair, in each direction, and for every domain. Where bilingual data is abundant, it has enabled the development of high-quality MT systems for languages and subject areas that are already frequently translated. However, in translation for information gathering, users often want to translate less commonly taught languages and less standard text types (blogs, eBay offerings). Finding even a million words of translated data for these is currently almost impossible. Commercial users often want to introduce MT for material that they have not been able to translate before, and for which they also have no training data. There’s the impression that there has been a disconnect betwe