Sifting out the meaning
Navigating the word labyrinth
Most translation programs draw on parallel collections of texts in the relevant languages. Computational linguists have laboriously constructed endless lists of rules to “explain” to the computer how it should go about converting the source text into the target language. “To eliminate the errors that slipped in, some rules had to be modified or supplemented by new ones,” Fraser explains. Needless to say, this was extremely time-consuming. Fraser’s approach, on the other hand, is based on a statistical model. Every possible equivalent is assigned a value that quantifies the likelihood that it is correct. In the first instance, every conceivable variant is permissible, but not all are equally likely, because some occur only rarely, or not at all, in the corpus of stored texts.
Every succeeding word in a sentence opens up new possibilities, while at the same time ruling out more and more of the nonsensical variants. By the time the next full stop is reached, the program has calculated which of the possible equivalents is most likely to convey the original meaning. The process can be compared to searching for the way out of a labyrinth. “The algorithm progressively searches for the best available solution,” says Fraser. This is also what distinguishes it from purely rule-based systems, which enforce either/or decisions and imply the existence of a single correct solution. “We, on the other hand, do not strive for perfection,” Fraser says. “Translation always involves choices based on many variables, and perfection is unattainable – unless one possesses quasi-supernatural powers.”
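The labyrinth search described above can be illustrated with a small sketch. It is not Fraser's system: the candidate words, all probabilities, and the function name `beam_search` are invented for illustration, and the search strategy shown (a simple beam search that keeps only the few most likely partial translations after each word) is an assumption about how such a progressive search might look in miniature.

```python
import math

# Hypothetical toy data: possible equivalents for each source word,
# with made-up likelihoods. Real systems estimate these from a corpus.
CANDIDATES = {
    "das": {"the": 0.7, "that": 0.3},
    "Schloss": {"castle": 0.5, "lock": 0.5},
    "steht": {"stands": 0.8, "is": 0.2},
}

# Hypothetical likelihoods that one target word follows another.
BIGRAM = {
    ("<s>", "the"): 0.6,
    ("<s>", "that"): 0.4,
    ("the", "castle"): 0.3,
    ("the", "lock"): 0.1,
    ("that", "castle"): 0.1,
    ("that", "lock"): 0.1,
    ("castle", "stands"): 0.5,
    ("castle", "is"): 0.3,
    ("lock", "stands"): 0.01,
    ("lock", "is"): 0.3,
}
FLOOR = 1e-4  # unseen word pairs stay permissible, but very unlikely


def beam_search(source_words, beam_width=2):
    """Return the most likely target word sequence and its log-probability."""
    beams = [(0.0, ["<s>"])]  # (log-probability so far, target words so far)
    for word in source_words:
        expanded = []
        for score, words in beams:
            for target, p_trans in CANDIDATES[word].items():
                p_next = BIGRAM.get((words[-1], target), FLOOR)
                new_score = score + math.log(p_trans) + math.log(p_next)
                expanded.append((new_score, words + [target]))
        # Each new word opens up variants; keep only the most likely ones.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]
    best_score, best_words = beams[0]
    return best_words[1:], best_score


if __name__ == "__main__":
    words, score = beam_search(["das", "Schloss", "steht"])
    print(" ".join(words), round(math.exp(score), 4))
```

With these invented numbers, “Schloss” is at first equally likely to mean “castle” or “lock”; the surrounding words then make “castle” the more probable choice, without any rule ever forbidding “lock” outright.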
Training on official translations
All machine translations are pieced together with the aid of a database made up of existing versions translated from the corresponding source and target languages, which the program uses to guide its choice of words. The quality of the resulting machine translation is therefore strongly dependent on the linguistic richness of the comparative material available for the relevant area of knowledge. “For computational linguists, the standard translations of texts issued by the European Union, with its diverse official languages, are a godsend,” says Fraser. “As a rule, SMT [statistical machine translation] can only translate into another language text blocks that it already ‘knows’, i.e. that are represented in its database.” Official translations issued by states that have more than one official language – such as Switzerland and Canada – also provide useful training material, as do those prepared by international organizations like the UN.
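How such a database might guide the choice of words can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: the phrase pairs are invented, they are assumed to be already matched up (real systems first have to align words and phrases across the parallel texts), and the function name `build_phrase_table` is hypothetical.

```python
from collections import Counter, defaultdict


def build_phrase_table(aligned_pairs):
    """Estimate how likely each target phrase is for a source phrase,
    using relative frequencies in (assumed pre-aligned) parallel text."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: c / total for t, c in targets.items()}
    return table


if __name__ == "__main__":
    # Invented example pairs standing in for a parallel corpus.
    pairs = [
        ("Schloss", "castle"),
        ("Schloss", "castle"),
        ("Schloss", "castle"),
        ("Schloss", "lock"),
        ("Arzt", "physician"),
        ("Arzt", "physician"),
    ]
    table = build_phrase_table(pairs)
    print(table["Schloss"])
    # A phrase that never occurs in the corpus has no entry at all.
    print("Krankenhaus" in table)
```

The second lookup shows the limitation Fraser mentions: a text block that is not represented in the database simply has no known equivalent.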
Fraser is now trying to extend the usefulness of these applications in the context of two ongoing projects. His ERC project focuses more on fundamental issues, insofar as its objective is to develop innovative and widely applicable tools. In contrast, “Health in my Language (HimL)” tackles a very practical problem. HimL is a collaborative project that involves the CIS, the Scottish health service NHS 24 and the Cochrane Institute, an international network of researchers and physicians devoted to promoting the use of evidence-based decision-making in healthcare systems. “Our new approach makes it possible to produce translations even in cases where few parallel texts are available,” Fraser explains. And medicine is one of the areas in which the corpus of texts available for diverse language pairs is comparatively modest. But the primary goal of this project is simply to help non-native speakers who are ill to find the right physician to consult.
Although the quality of computerized translations can undoubtedly be improved, Fraser does not believe that this will lead to the demise of the profession of translator. “But programs will take over routine translation jobs,” he says. Such tools can also be used by professional translators and will increase their productivity. Translations of multivalent literary texts will nevertheless remain the province of the well-educated human translator.
By Hubert Filser / Translation: Paul Hardy
Dr. Alexander Fraser heads a research group in automated translation at LMU’s Center for Information and Language Processing (CIS). In 2015, he was awarded a highly endowed Starting Grant by the European Research Council (ERC).