Multilingual Subtitling – The Machine Translation Revolution

In countries where foreign-language films and television series are routinely subtitled rather than dubbed, there is considerable demand for efficiently produced subtitle translations. One option is Machine Translation: computer systems that automatically translate from one language into another. The past two decades have brought a revolution in how such systems are built. Central to this revolution is Statistical Machine Translation (SMT), an approach advocated by Martin Volk, Professor of Computational Linguistics at the University of Zurich and at Stockholm University. SMT allows Machine Translation systems to be constructed fully automatically from large amounts of human-translated text; Google Translate is the most prominent example. At Languages & The Media, Martin Volk will show why SMT is well suited to the translation of subtitles and how it helps to increase translator productivity and reduce subtitle translation costs.
By Martin Volk
The statistical approach automatically “learns” translation correspondences from examples of human translation. For instance, given a large collection of English texts plus their German translations, the system first establishes correspondences between English and German sentences. It then computes word correspondences (word alignments) within each sentence pair. These word alignments form the basis for extracting corresponding word sequences (phrases) from the parallel sentences. All of this happens in the preparation (training) phase of the Machine Translation system. During the actual translation phase, the input sentence is likewise segmented, and the corresponding target-language segments are reassembled and, where necessary, reordered into candidate translations. Finally, a statistical language model of the target language helps rank these alternatives in order to determine the best translation.
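To make the phrase-building step more concrete, here is a minimal Python sketch of the consistency-based phrase-pair extraction described in standard SMT textbooks such as Philipp Koehn's Statistical Machine Translation. It assumes the word alignment has already been computed. The English-German sentence pair, its alignment, and the function name `extract_phrase_pairs` are hypothetical illustrations, and the sketch omits the usual extension of target phrases over unaligned target words.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with a word alignment.

    src, tgt: lists of tokens; alignment: set of (src_index, tgt_index) links.
    Simplified sketch: target phrases are not extended over unaligned target words.
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(len(src), s_start + max_len)):
            # Target positions linked to any source word in the span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # span contains only unaligned source words
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word inside the span may link outside the source span.
            if any(t_start <= t <= t_end and not s_start <= s <= s_end
                   for (s, t) in alignment):
                continue
            pairs.add((" ".join(src[s_start:s_end + 1]),
                       " ".join(tgt[t_start:t_end + 1])))
    return pairs


# Hypothetical example: "does" has no German counterpart, and word order differs.
src = "he does not go".split()
tgt = "er geht nicht".split()
alignment = {(0, 0), (2, 2), (3, 1)}  # he-er, not-nicht, go-geht

for pair in sorted(extract_phrase_pairs(src, tgt, alignment)):
    print(pair)
```

With this alignment, "he does not" yields no pair: its linked target words span "er geht nicht", but "geht" is linked to "go", which lies outside the source span, so the pair would be inconsistent. Longer spans such as "not go" / "geht nicht" are kept, which is how reordering gets captured inside phrases.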
Machine Translation works with varying degrees of success on different text types. Statistical Machine Translation appears to be well suited to the translation of subtitles: they are relatively short textual units, they are easy to align across languages thanks to their time codes, and they are surprisingly repetitive. In a collection of 1 million English TV subtitles, we found that 10% of the subtitles occurred more than once.
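A repetition figure like this depends on how "occurred more than once" is counted. The sketch below assumes subtitles are compared after lowercasing and whitespace normalization (our assumption, not necessarily the procedure behind the 1-million-subtitle figure) and reports two readings: the share of distinct subtitle texts that repeat, and the share of all subtitle lines that belong to a repeated text. The function names and toy data are invented for illustration.

```python
from collections import Counter


def normalize(line: str) -> str:
    """Lowercase and collapse whitespace so trivial variants count as the same subtitle."""
    return " ".join(line.lower().split())


def repetition_stats(subtitles: list[str]) -> tuple[float, float]:
    """Return (share of distinct texts seen more than once,
    share of all lines whose text is seen more than once)."""
    counts = Counter(normalize(s) for s in subtitles)
    repeated = {text: n for text, n in counts.items() if n > 1}
    share_distinct = len(repeated) / len(counts)
    share_lines = sum(repeated.values()) / len(subtitles)
    return share_distinct, share_lines


# Invented toy collection; a real run would read subtitle files instead.
toy = ["Are you okay?", "are you okay?", "It wasn't your fault.",
       "What are you saying?", "Are you  okay?", "Come in."]
print(repetition_stats(toy))  # (0.25, 0.5)
```

On the toy data, one of four distinct texts repeats, but that one text accounts for half of all lines, so the two readings can differ considerably.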
This raises the question of whether Google Translate could be used to translate subtitles. Google Translate is attractive because it is a free web-based service covering a large number of languages. For this reason, and because of its speed and ease of use, it has arguably become the world’s best-known translation system in recent years. But Google Translate remains a general-purpose Machine Translation system that cannot match the quality of special-purpose systems built for subtitles.
Special-purpose Machine Translation systems trained on high-quality human-translated subtitles produce better translations than general-purpose Machine Translation systems. In our experience, 1 million translated subtitles (about 10 million words) is an ideal starting point for building a Statistical Machine Translation system. If less data is available, the shortfall can be partly compensated for with other resources, e.g. other parallel corpora or bilingual word lists.
We have built special-purpose translation systems for the Scandinavian languages (Swedish, Norwegian and Danish) as well as for English-to-Swedish translation. We have found that these systems can increase subtitle translators’ productivity by as much as 25%. We are aware that post-editing Machine Translation output changes translators’ working conditions considerably. Checking and correcting machine output is often seen as restricting the translator’s creativity and freedom. On the other hand, the machine frees the translator from repetitive tasks such as translating the same subtitle over and over again. For instance, in 1 million subtitles we found “Are you okay?” 102 times, “What are you saying?” 39 times and “It wasn’t your fault” 10 times.
There is a trend among commercial providers towards combining translation memory systems with Machine Translation systems. When no translation of the complete unit is found in the memory and the best fuzzy-match score falls below a certain threshold, Machine Translation takes over. A number of initiatives (such as tausdata.org) and companies offer web-based translation memory services, which facilitate the exchange of translation memories and access to large collections of them. However, issues of quality control must be resolved, and trust will ultimately need to be established over time.
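The translation-memory-first workflow with a Machine Translation fallback can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the fuzzy score is `difflib.SequenceMatcher.ratio()` (commercial tools use their own metrics), the 0.75 threshold is an arbitrary example value, `machine_translate` is a hypothetical stand-in for whatever MT system is plugged in, and the memory contents are invented.

```python
from difflib import SequenceMatcher
from typing import Callable


def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1]; an assumed stand-in for a commercial fuzzy-match metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def translate_subtitle(source: str,
                       memory: dict[str, str],
                       machine_translate: Callable[[str], str],
                       threshold: float = 0.75) -> tuple[str, str]:
    """Return (translation, origin), where origin is 'exact', 'fuzzy' or 'mt'."""
    if source in memory:
        return memory[source], "exact"
    # Find the most similar source segment stored in the translation memory.
    best_src, best_score = None, 0.0
    for stored in memory:
        score = fuzzy_score(source, stored)
        if score > best_score:
            best_src, best_score = stored, score
    if best_src is not None and best_score >= threshold:
        return memory[best_src], "fuzzy"  # offered to the translator for editing
    return machine_translate(source), "mt"  # below threshold: MT takes over


# Invented English-Swedish memory and a hypothetical stub MT system.
memory = {"Are you okay?": "Är du okej?",
          "It wasn't your fault.": "Det var inte ditt fel."}


def stub_mt(text: str) -> str:
    return f"<MT: {text}>"


for line in ["Are you okay?", "Are you okay now?", "Where is the car?"]:
    print(translate_subtitle(line, memory, stub_mt))
```

With these inputs the three lines come back as an exact match, a fuzzy match (score of about 0.87 against "Are you okay?"), and a Machine Translation fallback. Returning the origin alongside the text lets a subtitle editor flag which segments need closer post-editing.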
In times of pressure on subtitling prices, integrating Machine Translation is an opportunity to increase translator productivity or, from a management perspective, to reduce subtitle translation costs. Systems that reuse and reassemble large amounts of human-translated subtitles can be built quickly and integrated profitably into the subtitle translation workflow.
September 6th, 2010
Martin Volk will present his approach on Friday, October 8th from 11:45 to 13:15.



