Analysis required in Natural Language Generation (NLG) and Understanding (NLU)
Language is how we share our understanding and knowledge with one another, and it plays an essential role in communicating ideas and vision. If we can find a computational model of language, we gain a very robust means of communication. To build such a system, we combine general knowledge with linguistic and grammatical facts. While studying a language we come across various flaws, which we try to curb, although these same flaws are often what make the language dynamic and influential. Language can be spoken or written, so there are two ways to process it. Processing spoken language is the more demanding of the two, because it adds challenges such as noise in the speech and differences in accent and delivery. Processing written text is comparatively simple. To derive information from written text we apply lexical, syntactic and semantic analysis, and this process of deriving and understanding written language is described as written language processing. Natural language processing comprises both understanding and generation, and it also covers translation between languages, which is why these analyses are essential to study.
Communication can take any form, written or spoken. For complete end-to-end, two-way communication, both parties must share similar knowledge of the language they communicate in. The common language is then processed into knowledge. Processing spoken language is more difficult than processing written language, because countless additional factors have to be taken into account. Hence, beyond what written text requires, spoken language needs additional information to handle the uncertainty and vagueness that arise in speech. Written language processing is called Natural language processing (NLP). It is the easier task, since it relies on lexical, syntactic and semantic knowledge of the language. While processing a language we come across several difficulties, but the very features that cause them are often what make the language robust and powerful.
The difficulties that we come across are:
- Difficulty: A sentence provides only a partial description of the information.
For example: Some guys are eating.
Some guys are eating a sandwich.
Jack and Harley are eating sandwiches.
Plus point: Language lets us be as precise or as vague as we like, so we can convey only the information that we need to.
- Difficulty: The same sentence can mean different things in different circumstances, which gives rise to uncertainty.
For example: I am playing a game. (A board game)
I am playing a game. (An outdoor game)
Plus point: Unlimited information can be conveyed using a finite set of symbols.
- Difficulty: One can never completely master a language, because new words are always being coined.
For example: The two share a unique bromance.
Plus point: The language keeps evolving, and new terms can be added as we see fit.
These are the difficulties that language presents, and each shows how a weakness can be turned into a strength.
Likewise, natural language processing helps us learn and understand language better, and it also helps in translating from one language to another. Understanding language means mapping the input into a more useful form, converting raw facts into information that can be used to strengthen knowledge. Understanding therefore requires a representation of the situation being described. But because situations vary so widely, it is very hard to find a single representation that fits every one of them. Hence, to develop a computer program that can process natural language, we first have to define the underlying task and the target representation.
It may seem that mapping sentences to their meaning is an effortless task, but this is not quite right. Several challenges arise along the way. The primary challenge is uncertainty and ambiguity in the data. When we communicate in English, a sentence may not mean what its words literally say. For example, consider the sentence “After lifting heavy weights, Ram gets bent out of shape.” This does not mean that Ram’s posture became crooked; it is an idiom meaning that Ram got upset. Another kind of uncertainty appears with words that have several meanings, as in “Ram went to the bank.” Here “bank” could be the place where money is stored or a river bank. One more kind of uncertainty arises from affixes. For example: “Ram had many friends. It was his friend’s birthday party.” In the first sentence the -s marks a plural noun, while in the second the ’s marks a possessive. All these challenges call for a robust and efficient processing system, and each of them has to be addressed so that the computer can correctly process and work with natural language.
The process of natural language understanding comprises five analytical phases. These phases are:
- Morphological analysis
- Syntactic analysis
- Semantic analysis
- Discourse integration
- Pragmatic analysis
Each of these phases has its own boundaries, but the boundaries are not always clear-cut. The phases sometimes run in strict sequence and sometimes all at once, and a phase that is running in sequence may call on another phase for assistance. To see how this works, we first need to understand what the phases are.
- Morphological Analysis:
In morphological analysis, each individual word is analyzed. Non-word tokens such as punctuation are separated from the words, and the remaining words are assigned categories. For instance, take the sentence “Ram’s iPhone cannot convert the video from .mkv to .mp4.” Morphological analysis goes through it word by word.
Here, “Ram” is a proper noun, the “’s” in “Ram’s” is identified as a possessive suffix, and “.mkv” and “.mp4” are identified as file extensions.
As shown above, the sentence is analyzed word by word and each word is assigned a syntactic category. The file extensions in the sentence are identified as well; in this example they behave as adjectives. The possessive suffix is also identified. This is a very important step, because the interpretation of prefixes and suffixes depends on the syntactic category of the word. For example, “swims” and “swim’s” are different: in “swims” the -s marks a plural noun or a third-person singular verb, while in “swim’s” the ’s marks a possessive. If a prefix or suffix is interpreted incorrectly, the meaning of the sentence changes completely. The interpretation assigns a category to the word and thereby removes the uncertainty from it.
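The word-by-word pass described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `LEXICON` table, the category names, and the `analyze` function are hypothetical and hand-made for this one sentence, whereas a real morphological analyzer would rely on a full dictionary and proper affix rules.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch only).
LEXICON = {
    "ram": "proper noun",
    "iphone": "proper noun",
    "cannot": "modal verb",
    "convert": "verb",
    "the": "determiner",
    "video": "noun",
    "from": "preposition",
    "to": "preposition",
}

# Alternatives are tried in order: file extension, possessive suffix,
# plain word, number, and finally any single punctuation character.
TOKEN_PATTERN = re.compile(r"\.[A-Za-z0-9]+|'s|[A-Za-z]+|\d+|[^\sA-Za-z0-9]")


def analyze(sentence):
    """Return (token, category) pairs; punctuation tokens are set aside."""
    sentence = sentence.replace("\u2019", "'")  # normalise curly apostrophes
    tagged = []
    for token in TOKEN_PATTERN.findall(sentence):
        if token.startswith(".") and len(token) > 1:
            tagged.append((token, "file extension"))
        elif token == "'s":
            tagged.append((token, "possessive suffix"))
        elif token.isalpha():
            tagged.append((token, LEXICON.get(token.lower(), "unknown")))
        elif token.isdigit():
            tagged.append((token, "number"))
        # Anything else is punctuation and is not assigned a category.
    return tagged


for token, category in analyze("Ram's iPhone cannot convert the video from .mkv to .mp4."):
    print(f"{token:10} {category}")
```

Note that the sentence-final full stop is separated from “.mp4” and set aside, while the extension itself is kept as a single token.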
- Syntactic Analysis:
Different languages have different rules, and violating these rules produces a syntax error. In this phase the sentence is transformed into a structure that represents the relationships between its words. A sentence may occasionally violate the rules. Syntax is the set of rules that a language has to follow. For example, “To the movies, we are going.” will give a syntax error. Syntactic analysis uses the results of morphological analysis to build a structural description of the sentence: the words, already divided into categories by the morphological phase, are arranged into a defined structure. This process is called parsing. For example, the sentence “The cat chases the mouse in the garden” would be represented as a parse tree, with the sentence at the root and its words at the leaves.
Here the sentence is broken down according to its categories and described as a hierarchical structure whose nodes are units of the sentence. These parse trees are built while syntactic analysis runs; if an error arises, processing stops and a syntax error is reported. Parsing can be top-down or bottom-up.
- Top-down: Starts with the start symbol and applies the grammar rules until every terminal in the sentence has been parsed.
- Bottom-up: Starts with the sentence to be parsed and applies the rules backwards until the start symbol is reached.
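To make the top-down strategy concrete, here is a small recursive-descent parser for the cat-and-mouse sentence. It is a sketch under assumptions, not a general English parser: the grammar (S → NP VP, NP → Det N, VP → V NP [PP], PP → P NP), the `LEXICON` table, and the `ParseError` class are all invented for this example.

```python
# Hypothetical word categories, as morphological analysis might supply them.
LEXICON = {
    "the": "Det", "cat": "N", "mouse": "N", "garden": "N",
    "chases": "V", "in": "P",
}


class ParseError(Exception):
    """Raised when the words do not fit the toy grammar (a syntax error)."""


def expect(words, pos, category):
    """Consume one word of the given category or report a syntax error."""
    if pos < len(words) and LEXICON.get(words[pos]) == category:
        return (category, words[pos]), pos + 1
    found = words[pos] if pos < len(words) else "end of sentence"
    raise ParseError(f"expected {category}, found {found!r}")


def parse_np(words, pos):  # NP -> Det N
    det, pos = expect(words, pos, "Det")
    noun, pos = expect(words, pos, "N")
    return ("NP", det, noun), pos


def parse_pp(words, pos):  # PP -> P NP
    prep, pos = expect(words, pos, "P")
    np, pos = parse_np(words, pos)
    return ("PP", prep, np), pos


def parse_vp(words, pos):  # VP -> V NP [PP]
    verb, pos = expect(words, pos, "V")
    np, pos = parse_np(words, pos)
    if pos < len(words) and LEXICON.get(words[pos]) == "P":
        pp, pos = parse_pp(words, pos)
        return ("VP", verb, np, pp), pos
    return ("VP", verb, np), pos


def parse(sentence):
    """Top-down parse: start from S and expand rules until all words are used."""
    words = sentence.lower().rstrip(".").split()
    np, pos = parse_np(words, 0)   # S -> NP VP
    vp, pos = parse_vp(words, pos)
    if pos != len(words):
        raise ParseError(f"unexpected word: {words[pos]!r}")
    return ("S", np, vp)


print(parse("The cat chases the mouse in the garden"))
```

The nested tuples play the role of the parse tree. A word order the grammar does not allow, such as “cat the chases mouse”, raises `ParseError`, which corresponds to processing stopping with a syntax error.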
- Semantic Analysis:
Semantic analysis deals with meaning. It assigns meaning to the structures built by the syntactic analyzer, mapping every syntactic structure and the objects in it onto the task domain. If the mapping is possible the structure is passed on; if not, it is rejected. For example, “hot ice-cream” will give a semantic error. During semantic analysis two main operations are carried out:
- First, each individual word is mapped to the appropriate objects in the database, that is, the dictionary meaning of every word is looked up. A word might have more than one meaning.
- Second, the meanings of the individual words are combined to find a consistent relationship between the word structures. Choosing the correct meaning of each word is called lexical disambiguation, and it is done by relating each word to its context.
The process described above can be used to determine the partial meaning of a sentence. However, semantics and syntax are two quite different concepts: a syntactically correct sentence may be semantically incorrect.
For example, “A rock smelled the colour nine.” is syntactically correct, as it obeys the rules of English grammar, but it is semantically incorrect. Semantic analysis verifies that a sentence abides by the rules of meaning and conveys sensible information.
The above example illustrates semantic parsing.
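One simple way to reject a sentence like the rock example is to check what each verb demands of its subject and object. This technique (often called selectional restrictions) is not named in the text above; it is used here purely as an illustration. The `FEATURES` and `VERB_RESTRICTIONS` tables and the `semantic_errors` function are hypothetical.

```python
# Hypothetical semantic features for a few nouns (assumption for this sketch).
FEATURES = {
    "rock": {"physical"},
    "dog": {"physical", "animate"},
    "flower": {"physical"},
    "nine": {"abstract"},
}

# Hypothetical requirements each verb places on its subject and object.
VERB_RESTRICTIONS = {
    "smelled": {"subject": "animate", "object": "physical"},
}


def semantic_errors(subject, verb, obj):
    """Return a list of semantic violations; an empty list means acceptable."""
    needs = VERB_RESTRICTIONS[verb]
    errors = []
    if needs["subject"] not in FEATURES[subject]:
        errors.append(f"{subject!r} cannot be the subject of {verb!r}")
    if needs["object"] not in FEATURES[obj]:
        errors.append(f"{obj!r} cannot be the object of {verb!r}")
    return errors


print(semantic_errors("dog", "smelled", "flower"))  # no violations
print(semantic_errors("rock", "smelled", "nine"))   # two violations
```

Both calls receive syntactically identical subject-verb-object triples; only the second is rejected, which is the sense in which syntax and semantics are checked separately.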
- Discourse Integration:
While processing a language, one major ambiguity that can arise is referential ambiguity: the ambiguity that arises when it cannot be determined what a word refers to. For example,
Ram won the race.
Mohan ate half of a pizza.
He liked it.
In the above example, “He” can be Ram or Mohan, which creates an ambiguity. The word “He” depends on both of the preceding sentences. Handling this dependency is known as discourse integration: the meaning of an individual sentence relies upon the sentences that come before it, just as the third sentence above relies upon the two before it. Hence the goal of this phase is to remove referential ambiguity.
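The Ram and Mohan example can be sketched as a search for candidate antecedents in the preceding sentences. The code below is a deliberately naive illustration: the `KNOWN_ENTITIES` table and both functions are hypothetical, and the "most recent mention" rule is just one simple heuristic, not how a real discourse component resolves references.

```python
# Hypothetical table of names the sketch recognises, with the pronoun each takes.
KNOWN_ENTITIES = {"Ram": "he", "Mohan": "he", "Sita": "she"}


def candidate_antecedents(previous_sentences, pronoun):
    """Collect entities from earlier sentences that agree with the pronoun."""
    candidates = []
    for sentence in previous_sentences:
        for word in sentence.rstrip(".").split():
            if KNOWN_ENTITIES.get(word) == pronoun.lower() and word not in candidates:
                candidates.append(word)
    return candidates


def resolve_most_recent(previous_sentences, pronoun):
    """Naive heuristic: choose the most recently mentioned matching entity."""
    candidates = candidate_antecedents(previous_sentences, pronoun)
    return candidates[-1] if candidates else None


context = ["Ram won the race.", "Mohan ate half of a pizza."]
print(candidate_antecedents(context, "He"))  # ['Ram', 'Mohan']
print(resolve_most_recent(context, "He"))    # Mohan
```

Two candidates coming back for a single pronoun is exactly the referential ambiguity described above; the heuristic picks one of them, but it can easily pick wrongly.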
- Pragmatic Analysis:
Pragmatic analysis means interpreting a sentence in a practical, real-world manner rather than a purely theoretical one. A sentence can have different meanings in different situations. For example, take the sentence “The average is 18.”
The average is 18. (the average may be that of a sequence)
The average is 18. (the average may be that of a vehicle)
The average is 18. (the average may be a mathematical term)
We can see that the same input can be understood in different ways. To interpret the sentence we need to understand the situation, and this is what pragmatic analysis is for. Pragmatic analysis makes the meaning of the language clearer and easier to interpret.
The five phases discussed above are generally required to follow an order. Each phase takes the output of the previous phase as its input and passes its own output along to the next phase. Along the way, the input can be rejected midway if it does not satisfy the rules of the phase it has reached.
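The hand-off between phases, including rejection midway, can be sketched as a simple pipeline. Everything here is a placeholder: the two stand-in phases, their rules, and the `Rejected` exception are assumptions made for illustration and are far simpler than the real phases.

```python
class Rejected(Exception):
    """Raised by a phase when its input breaks that phase's rules."""


def morphological(sentence):
    # Stand-in phase: strip final punctuation and split into words.
    words = sentence.rstrip(".?!").split()
    if not words:
        raise Rejected("no words found")
    return words


def syntactic(words):
    # Stand-in rule (assumption): a sentence needs at least two words.
    if len(words) < 2:
        raise Rejected("too short to form a sentence")
    return {"subject": words[0], "rest": words[1:]}


def run_pipeline(sentence, phases):
    """Feed each phase's output to the next; stop as soon as one rejects it."""
    data = sentence
    for phase in phases:
        try:
            data = phase(data)
        except Rejected as reason:
            return {"accepted": False, "stopped_at": phase.__name__, "reason": str(reason)}
    return {"accepted": True, "result": data}


print(run_pipeline("Ram runs.", [morphological, syntactic]))
print(run_pipeline("Ram.", [morphological, syntactic]))
```

The second call is rejected by the syntactic stand-in, so any later phases in the list would never see it.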
Also, more than one phase can be at work at the same time. This may happen because an ambiguity left by one phase can only be resolved by another. For instance, consider the sentence
Is the electric vehicle Tesla car?
The above sentence ends with four words, which have to be grouped into two noun phrases to give a sentence of the form:
“Is the A B?” where A and B represent the noun phrases we require. During syntactic analysis the following choices are available:
- A = “electric”, B = “vehicle Tesla car”
- A = “electric vehicle”, B = “Tesla car”
- A = “electric vehicle Tesla”, B = “car”
During syntactic analysis all of these choices look acceptable, but to pick the correct phrases we need to analyze the semantics. When we apply semantic analysis, the only phrases that make sense are “electric vehicle” and “Tesla car”. Hence, we can say that these phases are separate but can interact in various ways.
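The interplay between the two phases in this example can be sketched as follows: a syntactic step proposes every way of cutting the four words into two phrases, and a semantic step keeps only the cuts that make sense. The `KNOWN_PHRASES` set is a hypothetical stand-in for real semantic knowledge, and both function names are invented for this sketch.

```python
# Hypothetical set of phrases the semantic phase accepts as meaningful.
KNOWN_PHRASES = {"electric vehicle", "tesla car"}


def syntactic_choices(words):
    """Every way to cut the words into two non-empty, contiguous phrases."""
    return [(" ".join(words[:i]), " ".join(words[i:])) for i in range(1, len(words))]


def semantic_filter(choices):
    """Keep only the cuts in which both phrases are known to make sense."""
    return [(a, b) for a, b in choices if a in KNOWN_PHRASES and b in KNOWN_PHRASES]


words = ["electric", "vehicle", "tesla", "car"]
choices = syntactic_choices(words)
print(choices)                   # three candidate groupings
print(semantic_filter(choices))  # [('electric vehicle', 'tesla car')]
```

Syntax alone leaves three candidates; only after the semantic filter is applied does a single grouping remain.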
Language is a structure that follows rules. Natural language processing handles the written form of language according to those rules. The main goal is to remove ambiguity and uncertainty from the language to make communication much easier.