Summarization is a useful tool for varied textual applications that aims to highlight important information within a large corpus. With the outburst of information on the web, Python provides some handy tools to help summarize a text. This article provides an overview of the two major categories of approaches followed – extractive and abstractive. In this article, we shall look at a working example of extractive summarization.
Below is the algorithm implemented in the
gensim library, called “TextRank”, which is based on PageRank algorithm for ranking search results.
- Pre-process the given text. This includes stop words removal, punctuation removal and stemming.
- Make a graph with sentences are the vertices.
- The graph has edges denoting the similarity between the two sentences at the vertices.
- Run PageRank algorithm on this weighted graph.
- Pick the highest scoring vertices and append them to the summary.
- Based on the ratio or the word count, the number of vertices to be picked is decided.
Code : Summarizes a Wikipedia article based on (a) ratio and (b) word count.
Percent summary Amitabh Bachchan (pronounced [?m??ta?b? ?b?t???n]; born Inquilaab Srivastava; 11 October 1942) is an Indian film actor, film producer, television host, occasional playback singer and former politician. He first gained popularity in the early 1970s for films such as Zanjeer, Deewaar and Sholay, and was dubbed India's "angry young man" for his on-screen roles in Bollywood. . . . Apart from National Film Awards, Filmfare Awards and other competitive awards which Bachchan won for his performances throughout the years, he has been awarded several honours for his achievements in the Indian film industry.
Word count summary Beyond the Indian subcontinent, he also has a large overseas following in markets including Africa (such as South Africa), the Middle East (especially Egypt), United Kingdom, Russia and parts of the United States. Bachchan has won numerous accolades in his career, including four National Film Awards as Best Actor and many awards at international film festivals and award ceremonies. . . . After a three year stint in politics from 1984 to 1987, Bachchan returned to films in 1988, playing the title role in Shahenshah, which was a box office success.
Attention reader! Don’t stop learning now. Get hold of all the important CS Theory concepts for SDE interviews with the CS Theory Course at a student-friendly price and become industry ready.
- Convert Text and Text File to PDF using Python
- Python: Convert Speech to text and text to Speech
- Formatted text in Linux Terminal using Python
- Tokenize text using NLTK in python
- Convert Text to Speech in Python using win32com.client
- Python | Tokenize text using TextBlob
- Python | Count occurrences of each word in given text file (Using dictionary)
- Python | Text to Speech by using pyttsx3
- Extract numbers from a text file and add them using Python
- Text detection using Python
- Search String in Text using Python-Tkinter
- Extract text from PDF File using Python
- Python - Substituting patterns in text using regex
- Python - Tokenize text using Enchant
- Python - Filtering text using Enchant
- Python - Chunking text using Enchant
- Python - SpongeBob Mocking Text Generator GUI using Tkinter
- Python - UwU text convertor GUI using Tkinter
- Python - English (Latin) to Hindi (Devanagiri) text convertor GUI using Tkinter
- Text to speech GUI convertor using Tkinter in Python