Readability Index in Python (NLP)
Readability is the ease with which a reader can understand a written text. In natural language, the readability of text depends on its content (the complexity of its vocabulary and syntax). It focuses on the words we choose, and how we put them into sentences and paragraphs for the readers to comprehend.
Our main objective in writing is to pass along information that both the writer and the reader think is worthwhile. If we fail to convey that information, our efforts are wasted. To engage readers, it is critical to present information that they will gladly keep reading and can understand clearly. The content should therefore be as easy to read and understand as possible. Several difficulty scales are available, each with its own formula for determining difficulty.
This article illustrates several traditional formulae for evaluating readability scores. In Natural Language Processing, it is sometimes necessary to analyze words and sentences to determine the difficulty of a text. Readability scores are generally grade levels on particular scales, which rate how difficult a given text is. They help the writer improve the text so that a larger audience can understand it, which makes the content more engaging.
Various available Readability Score Determination Methods/Formulae:
- The Dale–Chall formula
- The Gunning fog formula
- Fry readability graph
- McLaughlin’s SMOG formula
- The FORECAST formula
- Readability and newspaper readership
- Flesch Scores
The implementation of the readability formulae is shown below.
The Dale–Chall Formula:
To apply the formula:
1. Select several 100-word samples throughout the text.
2. Compute the average sentence length in words (divide the number of words by the number of sentences).
3. Compute the percentage of words NOT on the Dale–Chall word list of 3,000 easy words.
4. Compute this equation:
Raw score = 0.1579*(PDW) + 0.0496*(ASL) + 3.6365
Here,
PDW = Percentage of difficult words not on the Dale–Chall word list.
ASL = Average sentence length.
The constant 3.6365 is added only when PDW exceeds 5%; otherwise the raw score is 0.1579*(PDW) + 0.0496*(ASL).
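The Dale–Chall steps can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `dale_chall_score`, the regex-based sentence and word splitting, and the `easy_words` parameter (a set that the caller fills with the 3,000-word list) are all assumptions made for the example. The 3.6365 adjustment is applied only when PDW exceeds 5%, as in the commonly published form of the formula.

```python
import re


def dale_chall_score(text, easy_words):
    """Dale-Chall raw score for `text`.

    `easy_words` is a set of lowercase words standing in for the
    Dale-Chall list of 3,000 easy words (supplied by the caller).
    """
    # Naive splitting on ., ! and ? (an assumption; real text may need
    # a proper sentence tokenizer).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    # ASL: average sentence length in words.
    asl = len(words) / len(sentences)

    # PDW: percentage of words that are not on the easy-word list.
    difficult = sum(1 for w in words if w not in easy_words)
    pdw = 100.0 * difficult / len(words)

    score = 0.1579 * pdw + 0.0496 * asl
    # The constant is added only when more than 5% of words are difficult.
    if pdw > 5.0:
        score += 3.6365
    return score


if __name__ == "__main__":
    sample_easy = {"the", "cat", "sat", "on", "mat"}  # toy list for the demo
    print(dale_chall_score("The cat sat on the mat. The feline relaxed.", sample_easy))
```

In practice the score would be averaged over several 100-word samples, as step 1 describes.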
The Gunning Fog Formula:
Grade level = 0.4 * ((average sentence length) + (percentage of Hard Words))
Here, Hard Words = words with more than two syllables.
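As a rough sketch of the Gunning fog calculation, the snippet below estimates syllables with a simple vowel-group heuristic. Both `count_syllables` and `gunning_fog` are hypothetical helper names, and the heuristic is only an approximation: a pronunciation dictionary would give more reliable counts.

```python
import re


def count_syllables(word):
    """Estimate syllables by counting groups of consecutive vowels.

    This is a heuristic, not a dictionary lookup, so some words
    will be miscounted.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing 'e' is usually silent ("make"), except in "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text):
    """Gunning fog grade level: 0.4 * (ASL + percentage of hard words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    average_sentence_length = len(words) / len(sentences)
    # Hard words: more than two syllables.
    hard_words = sum(1 for w in words if count_syllables(w) > 2)
    percent_hard = 100.0 * hard_words / len(words)
    return 0.4 * (average_sentence_length + percent_hard)


if __name__ == "__main__":
    print(gunning_fog("The cat sat. Readability matters."))
```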
McLaughlin’s SMOG Formula:
SMOG grading = 3 + √(polysyllable count)
Here, polysyllable count = number of words of more than two syllables in a sample of 30 sentences.
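The SMOG formula reduces to a one-line computation once the syllable counts are known. The sketch below assumes the caller has already counted syllables for every word in the 30-sentence sample; the name `smog_grade` and the list-of-counts input format are illustrative choices, not part of the formula.

```python
import math


def smog_grade(syllable_counts):
    """SMOG grade from per-word syllable counts.

    `syllable_counts` holds one integer per word, covering all words
    in a 30-sentence sample (assumed to be prepared by the caller).
    """
    # Polysyllables: words of more than two syllables.
    polysyllable_count = sum(1 for n in syllable_counts if n > 2)
    return 3 + math.sqrt(polysyllable_count)


if __name__ == "__main__":
    # 49 polysyllabic words among 149 words: 3 + sqrt(49)
    print(smog_grade([3] * 49 + [1] * 100))
```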
The Flesch Reading Ease Formula:
Reading Ease score = 206.835 - (1.015 × ASL) - (84.6 × ASW)
Here,
ASL = average sentence length (number of words divided by number of sentences).
ASW = average word length in syllables (number of syllables divided by number of words).
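A minimal sketch of the Reading Ease computation, assuming the word, sentence, and syllable totals have already been counted elsewhere. The function name `flesch_reading_ease` and its parameters are hypothetical.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease score from pre-computed totals."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence totals must be positive")

    asl = total_words / total_sentences   # average sentence length
    asw = total_syllables / total_words   # average syllables per word
    return 206.835 - (1.015 * asl) - (84.6 * asw)


if __name__ == "__main__":
    # 100 words, 5 sentences, 150 syllables: ASL = 20, ASW = 1.5
    print(flesch_reading_ease(100, 5, 150))
```

Unlike the grade-level formulas, a higher Reading Ease score indicates text that is easier to read: longer sentences and longer words both lower the result.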
Advantages of Readability Formulae:
1. Readability formulas measure the grade level readers must have reached to read a given text. This gives the writer much-needed information for reaching the target audience.
2. They let you know beforehand whether the target audience can understand your content.
4. A readable text attracts a larger audience.
Disadvantages of Readability Formulae:
1. Because there are many readability formulas, results for the same text can vary widely from one formula to another.
2. They apply mathematics to literature, which isn’t always a good idea.
3. They cannot measure the complexity of a specific word or phrase to pinpoint where you need to correct it.