Readability is the ease with which a reader can understand a written text. In natural language, the readability of a text depends on its content, that is, the complexity of its vocabulary and syntax. It is determined by the words we choose and by how we arrange them into sentences and paragraphs for the reader to comprehend.
Our main objective in writing is to pass along information that both the writer and the reader consider worthwhile. If we fail to convey that information, our efforts are wasted. To engage readers, it is critical to present information that they will gladly keep reading and can understand clearly. The content should therefore be as easy to read and understand as possible. Various difficulty scales are available, each with its own formula for determining difficulty.
This article illustrates several traditional readability formulae used to evaluate readability scores. In Natural Language Processing, it is sometimes necessary to analyse words and sentences to determine the difficulty of a text. Readability scores are generally grade levels on a particular scale that rate how difficult a text is. They help the writer improve the text so that a larger audience can understand it, which makes the content more engaging.
Various available readability score determination methods/formulae:
1) The Dale–Chall formula
2) The Gunning fog formula
3) Fry readability graph
4) McLaughlin’s SMOG formula
5) The FORCAST formula
6) Readability and newspaper readership
7) Flesch Scores
The most commonly used of these formulae are described below.
The Dale–Chall Formula
To apply the formula:
Select several 100-word samples throughout the text.
Compute the average sentence length in words (divide the number of words by the number of sentences).
Compute the percentage of words NOT on the Dale–Chall word list of 3,000 easy words.
Compute this equation:
Raw score = 0.1579*(PDW) + 0.0496*(ASL)
If PDW is greater than 5%, add 3.6365 to the raw score to obtain the adjusted score.
Here,
PDW = percentage of difficult words, i.e. words not on the Dale–Chall word list.
ASL = average sentence length in words.
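The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `dale_chall_score` and the `easy_words` parameter are hypothetical, the Dale–Chall word list itself is not bundled and must be supplied by the caller, sentence splitting is naive, and the 3.6365 constant is added only when more than 5% of the words are difficult, which is how the formula is usually stated.

```python
import re

def dale_chall_score(text, easy_words):
    """Estimate the Dale-Chall score of `text`.

    `easy_words` is assumed to be a set of lowercase words standing in
    for the Dale-Chall list of 3,000 easy words (supplied by the caller).
    """
    # Naive sentence split on '.', '!' and '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    asl = len(words) / len(sentences)            # average sentence length
    difficult = sum(1 for w in words if w not in easy_words)
    pdw = 100.0 * difficult / len(words)         # percentage of difficult words

    score = 0.1579 * pdw + 0.0496 * asl
    if pdw > 5:
        # Adjustment applied only when difficult words exceed 5%.
        score += 3.6365
    return score
```

For example, `dale_chall_score("The cat sat. The dog ran.", {"the", "cat", "sat", "dog", "ran"})` has no difficult words and an average sentence length of 3, so only the sentence-length term contributes.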
The Gunning fog Formula
Grade level = 0.4 * ((average sentence length) + (percentage of hard words))
Here, hard words = words with more than two syllables.
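A rough Python sketch of the Gunning fog calculation is given below. The helper `count_syllables` is a hypothetical vowel-group heuristic introduced only for illustration; it will miscount some English words, so the result should be read as an estimate.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic: a trailing 'e' is usually silent, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def gunning_fog(text):
    """Gunning fog grade level: 0.4 * (ASL + percentage of hard words)."""
    # Naive sentence split on '.', '!' and '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    asl = len(words) / len(sentences)
    # Hard words are words with more than two syllables.
    hard = sum(1 for w in words if count_syllables(w) > 2)
    pct_hard = 100.0 * hard / len(words)
    return 0.4 * (asl + pct_hard)
```

Because the syllable count is heuristic, a dictionary-based syllable counter would likely give more reliable grades on real text.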
McLaughlin’s SMOG Formula
SMOG grading = 3 + √(polysyllable count)
Here, polysyllable count = number of words of more than two syllables in a sample of 30 sentences.
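The SMOG grading can be sketched along the same lines. In this illustrative version the function name `smog_grade` and the `sample_size` parameter are hypothetical, syllables are estimated with a simple vowel-group heuristic, and, as an assumption not stated in the formula above, the polysyllable count is scaled up proportionally when the text has fewer than 30 sentences.

```python
import math
import re

def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic: a trailing 'e' is usually silent, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def smog_grade(text, sample_size=30):
    """SMOG grading: 3 + sqrt(polysyllable count in a 30-sentence sample)."""
    # Naive sentence split on '.', '!' and '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text must contain at least one sentence")

    sample = sentences[:sample_size]
    polysyllables = sum(
        1
        for sentence in sample
        for word in re.findall(r"[A-Za-z']+", sentence)
        if count_syllables(word) > 2
    )
    # Assumption: normalise to a 30-sentence sample for shorter texts.
    polysyllables *= sample_size / len(sample)
    return 3 + math.sqrt(polysyllables)
```

This sketch takes the first 30 sentences for simplicity; sampling sentences from different parts of the text would be closer in spirit to how such samples are normally chosen.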
The Flesch Reading Ease Formula
Reading Ease score = 206.835 - (1.015 × ASL) - (84.6 × ASW)
Here,
ASL = average sentence length (number of words divided by number of sentences).
ASW = average word length in syllables (number of syllables divided by number of words).
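The Reading Ease score could be computed with a short function like the one below. As before, this is only a sketch: `flesch_reading_ease` and `count_syllables` are hypothetical names, sentence splitting is naive, and syllables are estimated by counting vowel groups.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic: a trailing 'e' is usually silent, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015 * ASL - 84.6 * ASW."""
    # Naive sentence split on '.', '!' and '?'; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    asl = len(words) / len(sentences)                          # words per sentence
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw
```

Unlike the grade-level formulae, a higher Reading Ease score indicates text that is easier to read, since both longer sentences and longer words are subtracted from the constant.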
Advantages of Readability Formulae:
1. Readability formulae measure the grade level a reader must have reached to read a given text. This gives the writer much-needed information for reaching the target audience.
2. The writer knows beforehand whether the target audience can understand the content.
4. A readable text attracts a larger audience.
Disadvantages of Readability Formulae:
1. Because there are many readability formulae, results for the same text can vary widely.
2. They apply mathematics to literature, which isn’t always a good idea.
3. They cannot measure the complexity of an individual word or phrase, so they cannot pinpoint where the text needs correcting.
Improved By : Satheesh Kumar Mohan