The third step in natural language processing takes development into the realm of semantics. Even if a word has the same part-of-speech tag and syntactic function, it can still have several different meanings. This is best illustrated with a simple example:
Chop the carrots on the board
She’s the chairman of the board
Anyone with a good command of English would recognize straight away that the first example refers to a chopping board and the second to a board of directors or a similar body. For a computer, this is much harder. It is difficult for a computer to acquire the knowledge needed to decide when the noun ‘board’ refers to a board for chopping vegetables and other food, and when it refers to a group of people charged with making important decisions (not to mention the noun’s many other meanings).
As a result, computers mostly try to work out a word’s meaning from the words that appear before and after it. For example, a computer can learn that if ‘board’ is preceded or followed by ‘carrots’, it probably refers to a chopping board, and if it is preceded or followed by ‘chairman’, it most likely refers to a board of directors. This learning depends on text corpora in which each meaning of the word is represented, and correctly labeled, across many different examples.
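The context-window idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular system: the tiny labeled corpus, the sense labels `chopping_board` and `board_of_directors`, the four-word window, and the stopword list are all hypothetical choices made for this example. The sketch counts which context words co-occurred with each sense during training, then picks the sense whose training contexts overlap most with a new sentence.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is labeled with the sense of "board".
# A real system would learn from a large sense-annotated corpus instead.
TRAINING_DATA = [
    ("Chop the carrots on the board", "chopping_board"),
    ("Slice the onions on a wooden board", "chopping_board"),
    ("Put the vegetables on the cutting board", "chopping_board"),
    ("The chairman of the board resigned", "board_of_directors"),
    ("The board voted to approve the merger", "board_of_directors"),
    ("She was elected to the board of directors", "board_of_directors"),
]

TARGET = "board"
WINDOW = 4  # hypothetical: words on each side of the target used as context
# Hypothetical stopword list: very common words carry little sense information.
STOPWORDS = {"the", "a", "an", "on", "of", "to", "and", "she", "he", "was", "is", "in"}


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def context_words(tokens: list[str], target: str, window: int) -> list[str]:
    """Collect non-stopword tokens within `window` positions of each `target`."""
    context = []
    for i, tok in enumerate(tokens):
        if tok == target:
            start = max(0, i - window)
            nearby = tokens[start:i] + tokens[i + 1 : i + 1 + window]
            context.extend(w for w in nearby if w not in STOPWORDS)
    return context


def train(data) -> dict[str, Counter]:
    """Count how often each context word appears alongside each sense."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence, sense in data:
        counts[sense].update(context_words(tokenize(sentence), TARGET, WINDOW))
    return counts


def disambiguate(sentence: str, counts: dict[str, Counter]) -> str:
    """Return the sense with the most context-word overlap, or 'unknown' if none."""
    context = context_words(tokenize(sentence), TARGET, WINDOW)
    scores = {sense: sum(c[w] for w in context) for sense, c in counts.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


model = train(TRAINING_DATA)
print(disambiguate("Chop the carrots on the board", model))    # chopping_board
print(disambiguate("She's the chairman of the board", model))  # board_of_directors
print(disambiguate("The board approved the budget", model))    # unknown
```

The last call shows the method's main weakness: "approved" and "budget" never occurred near ‘board’ in the toy corpus, so the sketch has no evidence either way. This is why such approaches need large corpora covering each meaning many times. Practical systems typically replace raw counts with probabilistic or learned models, but the underlying signal, the surrounding words, is the same.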
All in all, natural language processing remains a complex field: computers have to process a huge amount of data on individual cases to get to grips with a language, and words with more than one meaning quickly increase the chance of a computer misinterpreting a sentence. There is still a lot of room for improvement, most notably in pragmatics, which concerns the context surrounding a sentence. Most sentences rely on a context that requires a general understanding of the human world and human emotions, and this is difficult to teach a computer. Some of the greatest challenges lie in understanding irony, sarcasm, and humorous metaphors, although attempts have already been made to classify these.