
Statistical Machine Translation of Languages in Artificial Intelligence


Introduction

Statistical machine translation (SMT) is a type of machine translation (MT) that uses statistical models to translate text from one language to another. It is a subfield of natural language processing (NLP) that involves analyzing large amounts of bilingual text to build models that can accurately translate between languages.

SMT works by analyzing large bilingual corpora, such as parallel texts or sentence-aligned translation pairs, to identify patterns and relationships between words and phrases in different languages. These patterns are then used to build probabilistic models that can be used to generate translations for new sentences or documents.

One of the key advantages of SMT is its ability to handle a wide range of language pairs and translation tasks, from simple word-based models to richer phrase-based translation. SMT has been used in a wide range of applications, including language localization for software and websites, content translation for multilingual websites, and cross-border communication for businesses and organizations.

Given how difficult translation can be, it should come as no surprise that the most effective machine translation systems are created by training a probabilistic model on statistics gathered from a vast corpus of text. This approach needs no complicated ontology of interlingua concepts, no handcrafted source- and target-language grammars, and no hand-labeled treebank. It just requires data in the form of example translations from which a translation model can be learned. We determine the string of words f^{*} that maximizes

f^{*}=\underset{f}{\operatorname{argmax}}\, P(f \mid e)=\underset{f}{\operatorname{argmax}}\, P(e \mid f)\, P(f)
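To make the decision rule concrete, here is a minimal Python sketch of noisy-channel decoding in log space. `candidates` is an iterable of French sentences, and the two scoring functions (`translation_logprob` for P(e \mid f), `language_logprob` for P(f)) are hypothetical stand-ins for trained models, not any particular library's API:

```python
def best_translation(e, candidates, translation_logprob, language_logprob):
    """Pick the French sentence f maximizing P(e | f) * P(f).

    In log space, argmax_f P(e|f) P(f) becomes
    argmax_f [log P(e|f) + log P(f)], which avoids underflow.
    """
    return max(
        candidates,
        key=lambda f: translation_logprob(e, f) + language_logprob(f),
    )
```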

Why do we need Statistical Machine Translation of Languages in Artificial Intelligence?

There are several reasons why statistical machine translation (SMT) is needed in artificial intelligence (AI):

  1. Efficiency: SMT is faster and more efficient than traditional human translation methods, making it a cost-effective solution for businesses and organizations that need to translate large amounts of text.
  2. Scale: SMT can handle a large volume of translation tasks, making it an ideal solution for global businesses and organizations that need to communicate with customers and stakeholders in multiple languages.
  3. Quality: With advances in machine learning and deep learning algorithms, SMT has become more accurate and reliable, producing translations that are increasingly comparable to those produced by human translators.
  4. Accessibility: SMT can make digital content more accessible to users who speak different languages, improving the user experience and expanding the reach of digital products and services.
  5. Language learning: SMT can be a valuable tool for language learners, helping them to understand the meaning of unfamiliar words and phrases and improving their language skills.

To translate a sentence from English (e) to French (f)

The target language model for French is P(f), which says how probable a given sentence is in French. The translation model P(e \mid f) says how probable an English sentence e is as a translation of a given French sentence f. Similarly, P(f \mid e) is a translation model from English to French.
Should we work on P(f \mid e) directly, or should we use Bayes' rule and work on P(e \mid f)P(f)? In diagnostic applications like medicine, it is easier to model the domain in the causal direction: P(\text{symptoms} \mid \text{disease}) rather than P(\text{disease} \mid \text{symptoms}). In translation, however, both directions are equally easy to model. The researchers behind the early work in statistical machine translation used Bayes' rule, in part because they had a decent language model, P(f), and wanted to utilize it, and in part because they came from a background in speech recognition, which is a diagnostic problem. We follow their path here, although we should point out that recent work in statistical machine translation frequently optimizes P(f \mid e) directly, using a more complex model that incorporates many of the language model's properties.

The language model, P(f), could be based on any level of linguistic analysis, but the simplest and most common technique, as we have seen before, is to build an n-gram model from a French corpus. This captures only a partial, local sense of French sentences, but that is often enough for rough translation.
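As an illustration, here is a minimal bigram language model in Python with add-one smoothing. The tiny corpus and all function names are made up for this sketch:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences,
    padded with sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))

# Toy French corpus, invented for illustration.
corpus = [["le", "wumpus", "dort"], ["le", "chat", "dort"]]
uni, bi = train_bigram_lm(corpus)
print(bigram_prob(uni, bi, "le", "wumpus"))  # 0.25 on this toy corpus
```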

A bilingual corpus

A collection of parallel texts, each an English/French pair, is used to train the translation model. If we had an infinitely large corpus, translating a sentence would just be a lookup task: we would already have seen the English sentence in the corpus, so we could simply return the corresponding French sentence. However, our resources are limited, and most of the sentences we will be asked to translate will be novel. They will, however, be made up of phrases we have seen before (even if some phrases are as short as one word). For example, "in this exercise we shall," "size of the state space," "as a function of," and "notes at the end of the chapter" are all common phrases in this book. We should be able to break the novel sentence "In this exercise we will compute the size of the state space as a function of the number of actions." into phrases, find the French phrase corresponding to each English phrase in the corpus, and then reassemble the French phrases into an order that makes sense in French. In other words, given an English sentence e, finding a French translation f is a three-step process:

  1. Divide the English sentence into phrases e_{1}, \ldots, e_{n}.
  2. Choose a French phrase f_{i} for each phrase e_{i}. We use the notation P\left(f_{i} \mid e_{i}\right) for the phrasal probability that f_{i} is a translation of e_{i}.
  3. Pick a permutation of the phrases f_{1}, \ldots, f_{n}. This permutation will be specified in a way that seems complicated but is designed to have a simple probability distribution: for each f_{i} we pick a distortion d_{i}, which is the number of words that phrase f_{i} has moved with respect to f_{i-1}; positive for moving to the right, negative for moving to the left, and zero if f_{i} immediately follows f_{i-1}.

As an example of this process, the sentence "There is a stinky wumpus sleeping in 2 2" is broken into five phrases, e_{1}, \ldots, e_{5}. Each is translated into a phrase f_{i}, and the translated phrases are then permuted into the order f_{1}, f_{3}, f_{4}, f_{2}, f_{5}. The distortion is defined as

d_{i}=\operatorname{START}\left(f_{i}\right)-\operatorname{END}\left(f_{i-1}\right)-1

where \operatorname{START}\left(f_{i}\right) is the ordinal number of the first word of phrase f_{i} in the French sentence and \operatorname{END}\left(f_{i-1}\right) is the ordinal number of the last word of phrase f_{i-1}. In the example, f_{5}, "à 2 2," comes right after f_{4}, "qui dort," hence d_{5}=0. Phrase f_{2} has shifted one word to the right of f_{1}, so d_{2}=1. As a special case we have d_{1}=0, because f_{1} begins at position 1 and \operatorname{END}\left(f_{0}\right) is defined to be 0 (even though f_{0} does not exist).
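To make the bookkeeping concrete, here is a small Python sketch that computes the distortions from phrase positions. The (start, end) representation of phrase positions is an assumption made for this example:

```python
def distortions(phrases):
    """Compute d_i = START(f_i) - END(f_{i-1}) - 1 for each phrase.

    `phrases` is a hypothetical list of (start, end) word positions of
    f_1..f_n in the French sentence (1-based, inclusive), listed in the
    order of the English phrases they translate. END(f_0) is defined
    to be 0, so the running `prev_end` starts at 0.
    """
    ds, prev_end = [], 0
    for start, end in phrases:
        ds.append(start - prev_end - 1)
        prev_end = end
    return ds

# Toy positions for f_1..f_5 in a French sentence.
print(distortions([(1, 2), (4, 4), (5, 5), (6, 7), (8, 9)]))  # [0, 1, 0, 0, 0]
```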

Now that we have defined the distortion, d_{i}, we can define its probability distribution, \mathbf{P}\left(d_{i}\right). Because \left|d_{i}\right| \leq n for sentences of length n, the whole distribution \mathbf{P}\left(d_{i}\right) has only 2n + 1 elements, far fewer numbers to learn than the number of permutations, n!. That is why the permutation was defined in such a roundabout way. Of course, this is a fairly rudimentary distortion model. It does not say that, when translating from English to French, adjectives usually end up after the noun; that fact is captured in the French language model, P(f). The distortion probability is completely independent of the words in the phrases, depending only on the integer value d_{i}. The distribution summarizes the overall volatility of the permutation; for example, how often a distortion of d = 2 occurs compared with d = 0.

Now it is time to put it all together: we define P(f, d \mid e) as the probability that the sequence of phrases f with distortions d is a translation of the sequence of phrases e. We assume that each phrase translation and each distortion is independent of the others, so the expression can be factored as

P(f, d \mid e)=\prod_{i} P\left(f_{i} \mid e_{i}\right) P\left(d_{i}\right)
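In code, the factored model is just a sum of log-probabilities. This is a minimal sketch; `phrase_logprob` and `distortion_logprob` are assumed stand-ins for the learned phrase and distortion tables described later in the text:

```python
def log_p_f_d_given_e(pairs, phrase_logprob, distortion_logprob):
    """log P(f, d | e) = sum over i of [log P(f_i | e_i) + log P(d_i)].

    `pairs` is a hypothetical list of (e_i, f_i, d_i) triples for one
    candidate translation; the two scorers are assumed stand-ins for
    learned phrase and distortion probability tables.
    """
    return sum(phrase_logprob(f, e) + distortion_logprob(d) for e, f, d in pairs)
```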

This allows us to compute the probability P(f, d \mid e) for any given candidate translation f and distortion d. But we cannot simply enumerate sentences to find the best f and d: with roughly 100 French phrases for every English phrase in the corpus, there are 100^{5} different 5-phrase translations and 5! reorderings for each of them. We will have to search for a good solution instead. A local beam search with a heuristic that estimates probability has proved effective at finding a nearly-most-probable translation.
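Below is a deliberately simplified beam-search decoder sketch. It extends hypotheses left to right and ignores reordering (a real decoder would also search over distortions); `options` and `score` are hypothetical stand-ins for a phrase-table lookup and a probability heuristic:

```python
import heapq

def beam_search_decode(e_phrases, options, score, beam_width=10):
    """Keep only the `beam_width` most promising partial translations.

    `options(e)` yields candidate French phrases for English phrase e,
    and `score(hypothesis)` estimates the log-probability of a partial
    translation; both are assumptions for this sketch.
    """
    beam = [[]]  # each hypothesis is a list of chosen French phrases
    for e in e_phrases:
        expanded = [hyp + [f] for hyp in beam for f in options(e)]
        beam = heapq.nlargest(beam_width, expanded, key=score)
    return max(beam, key=score)
```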

The only thing left is to learn the phrasal and distortion probabilities. We sketch the procedure here; for further information, read the notes at the end of the chapter.

  1. Find parallel texts: First, gather a bilingual parallel corpus. A Hansard, for example, is a record of legislative discourse. Canada, Hong Kong, and other jurisdictions produce bilingual Hansards, the European Union publishes official documents in 11 languages, and the United Nations produces multilingual documents. Bilingual text is also available online; some Web sites publish parallel content with parallel URLs, for example /en/ for the English page and /fr/ for the French page. Statistical translation systems are trained on hundreds of millions of words of parallel text and billions of words of monolingual text.
  2. Break down into sentences: Because the unit of translation is a sentence, we must split the corpus into sentences. Periods are strong markers of the end of a sentence, yet in "Dr. J. R. Smith of Rodeo Dr. paid $29.99 on September 9, 2009," only the final period ends a sentence. One way to decide whether a period ends a sentence is to train a model that takes the surrounding words and their parts of speech as features; this approach achieves about 98 percent accuracy.
  3. Align sentences: For each sentence in the English version, determine which sentence(s) it corresponds to in the French version. Usually the next English sentence corresponds to the next French sentence in a 1:1 match, but there are exceptions: a sentence in one language may be split, giving a 2:1 match, or the order of two sentences may be swapped, giving a 2:2 match. By looking at sentence lengths alone (short sentences should align with short sentences), it is possible to align them (1:1, 1:2, 2:2, etc.) with accuracy in the 90 to 99 percent range using a variation of the Viterbi algorithm; a length-based sketch appears after this list. Even better alignment can be achieved by using landmarks common to both languages, such as numbers, dates, proper names, or words with an unambiguous translation in a bilingual dictionary. For example, if the third English and fourth French sentences both contain the string "1989" and the neighboring sentences do not, that is good evidence the two should be aligned together.
  4. Align phrases within a sentence: A procedure similar to sentence alignment can align phrases within a sentence, although it requires iterative improvement. When we start, we have no way of knowing that "qui dort" aligns with "sleeping", but we can arrive at that alignment by accumulating evidence: across all the example sentences, "qui dort" and "sleeping" co-occur frequently, and no phrase other than "qui dort" co-occurs as frequently with "sleeping" in aligned sentence pairs. A complete phrase alignment over the corpus gives us the phrasal probabilities (after appropriate smoothing).
  5. Define distortion probabilities: Once we have a phrase alignment, we can define distortion probabilities. Simply count how often each distortion distance d=0, \pm 1, \pm 2, \ldots occurs in the corpus, then smooth the counts (see the counting sketch after this list).
  6. Use EM to improve estimates: To improve the estimates of the P(f \mid e) and P(d) values, apply expectation–maximization. In the E step, compute the best alignments using the current parameter values; in the M step, update the estimates; iterate until convergence.
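As promised, here is a toy illustration of step 3: dynamic-programming sentence alignment by length alone, allowing 1:1, 2:1, and 1:2 matches. The squared-length-difference cost is a crude stand-in for the probabilistic cost used in real length-based aligners, and backpointer recovery is omitted for brevity:

```python
def align_by_length(en_lens, fr_lens):
    """Minimum-cost alignment of sentence-length sequences.

    `en_lens` and `fr_lens` are sentence lengths in words. The cost of
    matching spans is the squared difference of their total lengths,
    an assumption made for this sketch.
    """
    INF = float("inf")
    m, n = len(en_lens), len(fr_lens)
    cost = [[INF] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in ((1, 1), (2, 1), (1, 2)):  # 1:1, 2:1, 1:2 matches
                if i + di <= m and j + dj <= n:
                    c = (sum(en_lens[i:i + di]) - sum(fr_lens[j:j + dj])) ** 2
                    cost[i + di][j + dj] = min(cost[i + di][j + dj], cost[i][j] + c)
    return cost[m][n]  # total cost; backpointers would recover the alignment

print(align_by_length([10, 21, 8], [11, 12, 9, 8]))  # 21 aligns with 12 + 9
```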
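And a minimal illustration of step 5: estimating P(d) by counting distortion distances and applying add-one smoothing. The input format (per-sentence distortion lists, as produced by the earlier `distortions` sketch) is an assumption:

```python
from collections import Counter

def estimate_distortion_probs(distortion_lists, max_n):
    """Estimate P(d) for d in [-max_n, max_n] from observed distortions.

    Add-one smoothing keeps unseen distances from receiving zero
    probability; 2 * max_n + 1 values are estimated in total.
    """
    counts = Counter(d for ds in distortion_lists for d in ds)
    support = range(-max_n, max_n + 1)
    total = sum(counts[d] + 1 for d in support)
    return {d: (counts[d] + 1) / total for d in support}

probs = estimate_distortion_probs([[0, 1, 0, 0, 0], [0, 0, -2]], max_n=5)
print(probs[0], probs[2])  # d = 0 is far more probable than d = 2
```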

Issues in Statistical Machine Translation of Languages in Artificial Intelligence

Statistical machine translation (SMT) has some challenges and limitations that can affect the quality and accuracy of translations. Here are some of the key issues:

  1. Data quality and availability: SMT relies on large bilingual corpora to train models, but the quality and availability of these data sets can be an issue, particularly for rare or low-resource languages.
  2. Domain-specific knowledge: SMT models may not perform well in specialized domains or technical areas that require domain-specific knowledge, such as legal or medical translations.
  3. Linguistic complexity: SMT models may struggle to handle complex linguistic structures, such as idiomatic expressions or ambiguous syntax, which can lead to errors in translations.
  4. Accuracy vs fluency: SMT models may prioritize accuracy over fluency, resulting in translations that are grammatically correct but awkward or unnatural.
  5. Bias and cultural differences: SMT models can reflect biases and cultural differences in the training data, resulting in translations that are inaccurate or offensive.
  6. Lack of context: SMT models may struggle to account for context and produce translations that are contextually inappropriate or misleading.
  7. Post-editing requirements: Even with the best SMT models, post-editing by human translators is often required to ensure the quality and accuracy of translations.

