Machine Translation is a rapidly growing field of Natural Language Processing (NLP) that automatically translates text from one language to another without human involvement.
In recent years, the use of machine translation systems has increased dramatically. Today, machine translation is used not only by translation agencies but also by everyday users of online engines such as Google Translate or DeepL. With machine translation in everyday use, a natural question arises: how good are these translations?
Unfortunately, translation quality still varies significantly, not only across different machine translation systems but also across translations produced by the same system. Modern systems usually generate fluent output, but some translations miss crucial details or completely misrepresent the original sentence. We therefore need to evaluate every translation of every system to make sure it does not distort the meaning of the source sentence.
In translation agencies, professional translators post-edit the output of machine translation systems. In other scenarios, however, such as online machine translation services, it is not feasible to assess translation quality with human editors. That is why quality estimation, the automatic assessment of translation quality, is a crucial part of the machine translation pipeline.
An essential property of quality estimation metrics is that they do not require any reference translations to assess the quality of machine translation output, which makes them valuable for evaluating translation quality at run time.
In this thesis, we considered the distribution of attention, one of the internal parameters of modern neural machine translation systems, as an indicator of translation quality. Before the advent of the attention mechanism, neural machine translation systems coped poorly with long sentences, "forgetting" the beginning of the source sentence. This happened because all information about the source sentence was compressed into a single vector, so the beginning of the sentence contributed less information to that representation than its end. The attention mechanism allowed a machine translation system to overcome this problem by assigning higher attention weights to the relevant parts of the source sentence when building the representation used at each decoding step.
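The mechanism described above can be illustrated with a minimal sketch. The code below implements simple dot-product attention with NumPy; this is one common scoring variant chosen for illustration, not necessarily the exact formulation used by any particular system discussed in this thesis. The function names and toy dimensions are our own.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dot_product_attention(query, keys, values):
    """Return a context vector as an attention-weighted sum of `values`.

    query:  (d,)    decoder state at the current target position
    keys:   (n, d)  encoder states for the n source tokens
    values: (n, d)  typically the same encoder states
    """
    scores = keys @ query       # one relevance score per source token
    weights = softmax(scores)   # attention distribution over source tokens
    context = weights @ values  # weighted sum focused on relevant tokens
    return context, weights

# Toy example: 4 source tokens, hidden size 3.
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 3))
query = rng.normal(size=3)
context, weights = dot_product_attention(query, keys, keys)
```

Because the weights form a probability distribution over source tokens, each decoding step can emphasise a different part of the source sentence instead of relying on a single fixed-size summary vector.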
One of the main advantages of using the attention distribution (attention weights) as a quality estimation metric is that it is a by-product of modern neural machine translation systems. That means we do not need additional resources to create quality estimation models.
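As a concrete, illustrative example of turning such a by-product into a quality signal, one simple heuristic from the literature is the entropy of the attention distributions: sharp, near one-to-one alignments yield low entropy, while diffuse attention can accompany less reliable translations. The sketch below is a minimal version of this idea, not the specific metric evaluated in this thesis; the function name and toy matrices are our own.

```python
import numpy as np

def mean_attention_entropy(attn):
    """Mean entropy of per-target-token attention distributions.

    attn: (t, s) matrix; row i is the attention distribution that target
    token i places over the s source tokens (each row sums to 1).
    Lower values indicate sharper source-target alignments.
    """
    eps = 1e-12  # avoid log(0) for zero weights
    ent = -(attn * np.log(attn + eps)).sum(axis=1)  # entropy per target token
    return ent.mean()

# A sharply aligned (near one-to-one) attention matrix ...
sharp = np.array([[0.97, 0.01, 0.01, 0.01],
                  [0.01, 0.97, 0.01, 0.01],
                  [0.01, 0.01, 0.97, 0.01]])
# ... versus a diffuse one, where every target token attends everywhere.
diffuse = np.full((3, 4), 0.25)

assert mean_attention_entropy(sharp) < mean_attention_entropy(diffuse)
```

Since the attention matrix is produced during decoding anyway, a score like this comes essentially for free, with no extra models or labelled data.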
Over a series of experiments, we explored how well these attention weights indicate translation quality. First, we studied the behaviour of attention weights in supervised settings. Since obtaining labels for supervised experiments is time-consuming and relatively costly, we also examined unsupervised scenarios, as well as training supervised models on synthetic labelled data.