Artificial Intelligence, defined as intelligence exhibited by machines, has many applications in today’s society. One of its most widely used applications is natural language generation.
Natural Language Generation (NLG) means producing text from computer data. It acts as a translator, converting computerized data into a natural language representation: the text is generated on the basis of collected data and input provided by the user. More formally, it is the natural language processing task of generating natural language from a machine representation system. NLG is, in a sense, the opposite of natural language understanding. In natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, whereas in NLG the system needs to make decisions about how to put a concept into words.
Generating text can be as simple as keeping a list of ready-made text that is copied and pasted. The results can be satisfactory in simple applications such as horoscope machines or generators of personalized business letters. A sophisticated NLG system, however, needs to include stages of planning and merging of information in order to generate text that looks natural and does not become repetitive.
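As a minimal sketch of the ready-made-text approach, the Python snippet below fills a canned business letter with per-recipient details. The letter wording, the field names, and the `render_letter` helper are all hypothetical and chosen only for illustration.

```python
from string import Template

# Ready-made text with placeholders; the wording is invented for illustration.
LETTER = Template(
    "Dear $name,\n"
    "Thank you for your order of $item. "
    "It will be shipped on $date."
)

def render_letter(name: str, item: str, date: str) -> str:
    """Copy the canned letter and paste the recipient's details into it."""
    return LETTER.substitute(name=name, item=item, date=date)

print(render_letter("Ms. Smith", "a desk lamp", "Monday"))
```

Every letter produced this way has the same wording, which is acceptable for form letters but quickly becomes repetitive in longer or more varied texts.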
An example of a simple NLG system is the Pollen Forecast for Scotland system, which could essentially be a template. The system takes as input six numbers, which give the predicted pollen levels in different parts of Scotland. From these numbers, it generates a short textual summary of pollen levels as its output.
For example, using the historical data for 1-July-2005, the software produces: “Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country. However, pollen levels will be moderate with values of 4, in Northern areas.” In contrast, the actual forecast written by a human meteorologist from this data was: “Pollen counts are expected to remain high at level 6 over most of Scotland, and even level 7 in the south-east. The only relief is in the Northern Isles and far northeast of mainland Scotland with medium levels of pollen count.”
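A template of this kind might look like the following Python sketch. It is not the real system: the region names, the level thresholds, and the output wording are assumptions made for illustration, since the text above does not say how the six numbers map to regions or words.

```python
# Hypothetical region names; the real system's six regions are not given here.
REGIONS = ["North", "North West", "Central", "North East", "South West", "South East"]

def describe(level: int) -> str:
    """Map a pollen level to a word (thresholds are assumed)."""
    if level <= 3:
        return "low"
    if level <= 5:
        return "moderate"
    return "high"

def pollen_summary(levels: list[int]) -> str:
    """Fill a fixed template from six regional pollen levels."""
    if len(levels) != len(REGIONS):
        raise ValueError("expected one level per region")
    lo, hi = min(levels), max(levels)
    peak_region = REGIONS[levels.index(hi)]
    if lo == hi:
        return f"Pollen levels will be {describe(hi)} at level {hi} across the country."
    return (
        f"Pollen levels will range from {describe(lo)} (level {lo}) "
        f"to {describe(hi)} (level {hi}), with the highest values in the {peak_region}."
    )

print(pollen_summary([4, 6, 6, 6, 6, 7]))
```

The sketch always uses the same sentence patterns, so it cannot vary its wording or decide what is worth mentioning the way the human forecaster does.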
The typical stages of natural language generation are:
- Content determination: Deciding what information to mention in the text. For instance, in the pollen example above, deciding whether to explicitly mention that the pollen level is 7 in the south-east.
- Document structuring: Deciding the structure or organization of the conveyed information. For example, deciding to describe the areas with high pollen levels first, instead of the areas with low pollen levels.
- Aggregation: Merging similar sentences to improve understanding and readability. For instance, merging the two sentences “Grass pollen levels for Friday have increased from the moderate to high levels of yesterday” and “Grass pollen levels will be around 6 to 7 across most parts of the country” into the single sentence “Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country.”
- Lexical choice: Choosing words that convey the meaning clearly. For example, deciding whether “medium” or “moderate” should be used when describing a pollen level of 4.
- Referring expression generation: Creating referring expressions that identify particular objects and regions. For example, deciding to use “in the Northern Isles and far northeast of mainland Scotland” to refer to a certain region in Scotland. This task also includes making decisions about pronouns and other types of anaphora.
- Realisation: Creating the actual text, which should be correct according to the rules of grammar. For example, using “will be” for the future tense of “to be”.
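The stages listed above can be sketched as a small Python pipeline with one function per stage. This is a toy illustration under stated assumptions, not a description of any real system: the input readings, the region phrases, the thresholds in `choose_word`, and all function names are hypothetical. Note that lexical choice is called from inside the aggregation step here, since real systems often interleave stages.

```python
# Toy input: hypothetical region phrase -> pollen level readings.
DATA = {"the south-east": 7, "the central belt": 6, "the north": 4}

def determine_content(data):
    """Content determination: keep every reading as a (region, level) message."""
    return list(data.items())

def structure_document(messages):
    """Document structuring: mention the highest pollen levels first."""
    return sorted(messages, key=lambda m: m[1], reverse=True)

def choose_word(level):
    """Lexical choice: pick a word for a level (thresholds are assumed)."""
    return "high" if level >= 6 else "moderate" if level >= 4 else "low"

def aggregate(messages):
    """Aggregation: group regions that share the same descriptive word."""
    groups = {}
    for region, level in messages:
        groups.setdefault(choose_word(level), []).append(region)
    return list(groups.items())

def refer(regions):
    """Referring expression generation: name a group of regions in one phrase."""
    if len(regions) == 1:
        return regions[0]
    return ", ".join(regions[:-1]) + " and " + regions[-1]

def realise(word, regions):
    """Realisation: produce a grammatical future-tense sentence."""
    return f"Pollen levels will be {word} in {refer(regions)}."

def generate(data):
    """Run the stages in order and join the resulting sentences."""
    messages = structure_document(determine_content(data))
    return " ".join(realise(word, regions) for word, regions in aggregate(messages))

print(generate(DATA))
```

Keeping each stage in its own function makes it possible to change one decision, such as the word used for level 4, without touching the rest of the pipeline.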
There are three basic techniques for evaluating NLG systems:
- Task-based evaluation: Giving the generated text to a person and assessing how well it helps them perform a task. For example, a system which generates summaries of medical data can be evaluated by giving these summaries to doctors and assessing whether the summaries help doctors make better decisions.
- Human ratings: Giving the generated text to a person and asking them to rate its quality and usefulness.
- Metrics: Comparing generated texts with texts written by professionals.
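To make the metrics approach concrete, the Python sketch below scores a generated sentence against a human-written reference using unigram precision, that is, the fraction of generated words that also appear in the reference. This particular metric is an assumption chosen for simplicity (it is a much reduced relative of n-gram overlap metrics such as BLEU), and the two sentences are shortened illustrations rather than real system output.

```python
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def unigram_precision(generated: str, reference: str) -> float:
    """Fraction of generated tokens that also occur in the reference (clipped counts)."""
    gen, ref = Counter(tokens(generated)), Counter(tokens(reference))
    total = sum(gen.values())
    if total == 0:
        return 0.0
    # Clip each token's count so repeated words are not over-credited.
    matched = sum(min(count, ref[tok]) for tok, count in gen.items())
    return matched / total

generated = "Pollen levels will be high in the south-east."
reference = "Pollen counts are expected to remain high in the south-east."
print(unigram_precision(generated, reference))
```

A score like this is cheap to compute, but it only measures word overlap, so it is usually read alongside task-based evaluation or human ratings rather than instead of them.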
An example of an interactive use of natural language generation is the WYSIWYM framework, which stands for “What you see is what you meant.” It allows users to see and manipulate the continuously rendered view (NLG output) of an underlying formal language document (NLG input), thereby editing the formal language without learning it.
Another example is content generation systems, which assist human writers and make the writing process more efficient and effective. A content generation tool based on web mining using search engine APIs has been built. The tool imitates the cut-and-paste writing scenario in which a writer assembles content from various search results.
So far, the most successful NLG applications have been Data-to-Text systems, which generate textual summaries of databases and data sets; these systems usually perform data analysis as well as text generation. In particular, several systems have been built that produce textual weather forecasts from weather data.
Reference: Wikipedia