Can you explain to readers the importance of language data for training SMT and hybrid systems?

April 26, 2017

1 Answer

The so-called “blue ocean” opportunity for MT, which is to explosively grow the size of the translation market by making it possible to translate material that has not been translated before, will require relevant training data for systems to learn from. That data is needed in quantity for every language pair, each translation direction, and each domain. Where bilingual data is abundant, it has enabled the development of high-quality MT systems for languages and subject areas that are already frequently translated. However, users translating for information gathering often want to handle less commonly taught languages and less standard text types (blogs, eBay offerings). Finding even a million words of translated data for these is currently almost impossible. Commercial users often want to introduce MT to translate material that they have not been able to translate before, and for which they also have no training data. There’s the impression that there has been a disconnect betwe