Wednesday, April 25, 2007

Statistical machine translation recipe

Text-to-text statistical machine translation systems have become easy to rustle up. You need two main ingredients: a parallel corpus and a translation toolkit. For the corpus, the Proceedings of the European Parliament have been collected into Europarl, which provides the same text in 11 languages. For the toolkit, Moses, a state-of-the-art statistical machine translation system, is being developed under the LGPL, and it handles both training the system and translating new sentences. At least for now, Moses is not optimized, so don't be stingy with memory.
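As a rough sketch of the corpus side of the recipe, the Python snippet below filters sentence pairs before training. It assumes the parallel text is already available as two line-aligned sequences (line i of one is the translation of line i of the other) and that tokens are separated by spaces. The function name `clean_parallel` and the thresholds `max_len` and `max_ratio` are made up for illustration and are not part of Moses or Europarl.

```python
from typing import Iterable, Iterator, Tuple


def clean_parallel(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    max_len: int = 40,        # hypothetical cap on tokens per sentence
    max_ratio: float = 9.0,   # hypothetical cap on the length ratio
) -> Iterator[Tuple[str, str]]:
    """Yield lowercased sentence pairs that look usable for training.

    Assumes the two inputs are line-aligned and whitespace-tokenized.
    """
    for src, tgt in zip(src_lines, tgt_lines):
        src_tokens = src.split()
        tgt_tokens = tgt.split()

        # Drop pairs where either side is empty.
        if not src_tokens or not tgt_tokens:
            continue

        # Drop pairs where either side is very long.
        if len(src_tokens) > max_len or len(tgt_tokens) > max_len:
            continue

        # Drop pairs whose lengths differ too much to be plausible translations.
        ratio = len(src_tokens) / len(tgt_tokens)
        if ratio > max_ratio or 1.0 / ratio > max_ratio:
            continue

        # Normalize whitespace and case.
        yield " ".join(src_tokens).lower(), " ".join(tgt_tokens).lower()


if __name__ == "__main__":
    finnish = ["Hyvää huomenta .", "", "Kiitos ."]
    swedish = ["God morgon .", "Tack .", "Tack ."]
    for fi, sv in clean_parallel(finnish, swedish):
        print(fi, "|||", sv)
```

Filtering like this keeps empty or badly mismatched pairs out of the training data. Sensible thresholds depend on the language pair and on how much memory is available.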

At the final seminar of FENIX, an interactive computing programme by Tekes, I presented an online demo of a Finnish-to-Swedish statistical machine translation system based on Europarl and Moses. Our research interests include improving translation quality in an unsupervised manner when translating from or into Finnish and other compounding, agglutinative languages.
