Theoretical Probability: A Real-Life Example (from Artificial Intelligence)

• People in different jobs, like insurance, software development, public health, and Artificial Intelligence, use theoretical probability for calculations.

• This article is for middle and high school students. It explains how theoretical probability learned in school applies in real life.

• It provides a real example of how Artificial Intelligence specialists use theoretical probability in their work.

• Published: June 16, 2023

• Last update: April 3, 2024

• Grades: 7th and older

Theoretical Probability in Machine Translation

Introduction

Machine translation is a fascinating part of Artificial Intelligence technology, where computers learn to understand and translate words and sentences from one language to another. You’ve likely used machine translation websites like Google Translate or DeepL.

One of the main challenges in machine translation is selecting the right word, as words often have multiple meanings, particularly in fiction. So, how can we reliably translate classics by authors such as Edgar Allan Poe, Mark Twain, William Shakespeare, Jack London, and others into other languages?

Theoretical probability is a valuable tool for solving this problem. Let’s explore a specific example.

Calculating Theoretical Probability: Example

Suppose we want to translate the sentence ‘The dog developed a large bark’ into Spanish. For us humans, it’s obvious that the word ‘bark’ in this sentence refers to the sound a dog makes (not the outer layer of a tree). But how can a computer tell the difference? Should it translate ‘bark’ into Spanish as ‘ladrido’ (dog bark) or ‘corteza’ (tree bark)?

One way to build a computer algorithm that distinguishes between ‘ladrido’ (dog bark) and ‘corteza’ (tree bark) is to calculate the theoretical probability of word pairs appearing together. For example, the computer can analyze many sentences that contain the word ‘bark’ and count how often it appears together with ‘dog’ and how often with ‘tree.’ If ‘bark’ appears with ‘dog’ more often than with ‘tree,’ the computer assigns a higher probability to the translation ‘ladrido’ (dog bark).
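
The counting idea can be sketched in a few lines of Python. This is only a minimal illustration: the sample sentences and the function name `count_clues` are invented for this example and do not come from a real translation program.

```python
# Hypothetical mini-corpus: these sentences are made up for illustration.
sentences = [
    "the dog gave a loud bark",
    "the bark of the old tree was rough",
    "a dog will bark at strangers",
    "we heard the dog bark all night",
]


def count_clues(sentences, target, clues):
    """Count how often each clue word shares a sentence with the target word."""
    counts = {clue: 0 for clue in clues}
    for sentence in sentences:
        words = sentence.lower().split()
        if target in words:
            for clue in clues:
                if clue in words:
                    counts[clue] += 1
    return counts


counts = count_clues(sentences, "bark", ["dog", "tree"])
print(counts)  # {'dog': 3, 'tree': 1}
```

In this tiny sample, ‘bark’ shares a sentence with ‘dog’ three times and with ‘tree’ once, so a program using this rule would lean toward ‘ladrido.’ A real system would need far more sentences than four.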

To illustrate this further, let’s take the story ‘A Yellow Dog’ by Bret Harte and compare the English original with its Spanish translation.

In the original (English) story, the word ‘dog’ appears 22 times, and ‘bark’ is used 3 times. In the Spanish translation, ‘bark’ is translated 2 times as ‘dog bark’ and 1 time as ‘tree bark.’

The word ‘bark’:

Total use: 3 times;
Translated as ‘dog bark’ (‘ladrido’): 2 times;
Translated as ‘tree bark’ (‘corteza’): 1 time.

Therefore, the theoretical probability that ‘bark’ is translated as ‘dog bark’ is \( \frac{2}{3} \), and the probability that it is translated as ‘tree bark’ is \( \frac{1}{3} \):

\[ P(\text{dog bark}) = \frac{2}{3} \]

\[ P(\text{tree bark}) = \frac{1}{3} \]
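
Both values follow the rule taught in school: the number of favorable outcomes divided by the total number of outcomes. Written out for this example (the wording inside the fraction is just a label for the counts listed above):

\[ P(\text{dog bark}) = \frac{\text{times translated as ladrido}}{\text{total uses of bark}} = \frac{2}{3} \]

\[ P(\text{dog bark}) + P(\text{tree bark}) = \frac{2}{3} + \frac{1}{3} = 1 \]

The two probabilities add up to 1 because ‘ladrido’ and ‘corteza’ are the only two translations of ‘bark’ that occur in the story.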

Based on this probability, the computer can decide that when the words ‘dog’ and ‘bark’ are used together, ‘bark’ most likely means ‘dog bark’ ( \( \frac{2}{3} \) vs. \( \frac{1}{3} \) ).
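
As a rough sketch of how a program might make this decision, the Python snippet below turns the counts from the story into fractions and picks the translation with the larger one. The variable names are our own choices for illustration, and real translation software is far more complex than this.

```python
from fractions import Fraction

# Counts from the article: how 'bark' was translated in 'A Yellow Dog'.
translation_counts = {"ladrido": 2, "corteza": 1}

# Probability = times a translation was used / total uses of 'bark'.
total = sum(translation_counts.values())
probabilities = {
    word: Fraction(count, total) for word, count in translation_counts.items()
}

# Pick the translation with the highest probability.
best = max(probabilities, key=probabilities.get)
print(best, probabilities[best])  # ladrido 2/3
```

Using `Fraction` keeps the results as exact fractions (2/3 and 1/3), which matches the way the probabilities are written in the formulas.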

Notice the words ‘most likely’; they are very important. Even though translation software calculates probabilities and constantly learns, it only assigns a theoretical probability that a proposed translation is correct. The most probable translation is not always the right one, so the software may still produce a wrong translation. This is why we sometimes see awkward machine translations.
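
To see why ‘most likely’ does not mean ‘always,’ here is a small Python sketch. It imagines a program that always picks ‘ladrido’ and checks it against the three uses of ‘bark’ counted in the story. The order of the items in the list is an assumption made for this illustration; the article only gives the totals.

```python
# The three uses of 'bark' in the story, as counted in the article:
# two were translated as 'ladrido' and one as 'corteza'.
# (The order in this list is assumed; only the totals are known.)
correct_translations = ["ladrido", "ladrido", "corteza"]

# A hypothetical program that always picks the most likely word.
guess = "ladrido"

# Count how many times the guess differs from the correct translation.
mistakes = sum(1 for correct in correct_translations if correct != guess)
print(f"{mistakes} mistake out of {len(correct_translations)}")  # 1 mistake out of 3
```

Even in this short story, always choosing the more probable word would give one wrong translation out of three.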

Conclusion

This example illustrates how computer algorithms can use calculated theoretical probability when selecting a word from multiple options. What you’ve just seen is a simplified demonstration of one idea behind Artificial Intelligence. Theoretical probability proves valuable in many Artificial Intelligence applications, where it guides computer programs in decision-making. Contrary to popular belief, AI doesn’t ‘think’ like humans; instead, it executes tasks based on pre-programmed rules, and calculating theoretical probability is one of these fundamental rules.

References

For creating this article, we gathered information from various scientific publications, including:

Koehn, Philipp, and Kevin Knight. ‘Estimating word translation probabilities from unrelated monolingual corpora using the EM algorithm.’ AAAI/IAAI, 2000.

Additionally, see the book mentioned in the article: ‘A Yellow Dog’ by Bret Harte.

Video Version

The video version of this blog post provides more examples of calculating theoretical probability based on real books, including those by Jack London. It also explains why Artificial Intelligence algorithms require a vast amount of data for training and showcases why machine translation websites sometimes make amusing mistakes. As always, the video version presents the story in an animated and easy-to-follow manner.
