Project Idea | Text Summarizer

Project Title: Text Summarizer
Introduction:
Machines can now assist with many everyday tasks, such as household chores, controlling home devices, and making appointments. Much of this progress comes from Machine Learning, which trains a model on data so that it can act on similar data it has not seen before. With Natural Language Processing (NLP), machines can also process human language, and text analytics is an active area of research.
As the project title suggests, Text Summarizer is a web-based application that summarizes text. The user uploads data, and the application returns a summary with as many lines as the user requests. The product is built mainly on Deep Learning concepts. Its main purpose is to provide reliable summaries of web pages or uploaded files, depending on the user’s choice. Unnecessary sentences are discarded so that only the most important sentences remain.

The product includes the following components:
Text Parser: It divides the text into paragraphs, sentences, and words.
• HTML Parser: An HTML parser library is used to extract text from the URLs of web pages. HTML parsing takes in HTML code and extracts relevant information such as the page title, paragraphs, headings, links, and bold text.
• Document Parser: This library is used to extract text from documents. Through the document parser interface, a document parser can access the content type assigned to a document and store that content type in the document itself. It can also update the content type definition stored in a document so that it matches the version of the definition used by a list or document library.
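The HTML parsing step can be sketched with Python's standard `html.parser` module. This is a minimal illustration, not the project's actual parser: the class name `PageTextExtractor` and the helper `extract_page_text` are hypothetical, and downloading the page from its URL is left out.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Hypothetical helper: collects title, headings, paragraphs and links."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.paragraphs = []
        self.links = []
        self._current = None  # tag whose text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Record the link target; link text still flows into the buffer.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in ("title", "p") or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # inline tags such as <b> do not end the current block
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        elif text and tag == "p":
            self.paragraphs.append(text)
        elif text:
            self.headings.append(text)
        self._current = None


def extract_page_text(html):
    """Parse an HTML string and return the populated extractor."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return parser
```

The extracted `paragraphs` list would then be handed to the text parser for splitting into sentences and words.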

Feature Vector Creator: This component computes the feature representation of each sentence.
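As a rough sketch of what a feature vector for a sentence might look like, the function below builds one row of numbers per sentence. The three features chosen here (paragraph rank, position inside the paragraph, and relative length) are assumptions made for illustration; the article does not list the exact features the project uses, and the name `sentence_features` is hypothetical.

```python
def sentence_features(paragraphs):
    """Return one feature row per sentence.

    `paragraphs` is a list of paragraphs, each a list of sentence strings.
    Each row is [paragraph_rank, position_in_paragraph, relative_length],
    all scaled to the range 0..1 (illustrative features only).
    """
    sentences = [s for paragraph in paragraphs for s in paragraph]
    if not sentences:
        return []
    longest = max(len(s.split()) for s in sentences) or 1

    rows = []
    for p_index, paragraph in enumerate(paragraphs):
        for s_index, sentence in enumerate(paragraph):
            rows.append([
                p_index / max(len(paragraphs) - 1, 1),  # rank of the paragraph
                s_index / max(len(paragraph) - 1, 1),   # position in paragraph
                len(sentence.split()) / longest,        # length vs. longest
            ])
    return rows
```

Stacking these rows gives a matrix with one row per sentence, which is the kind of input an autoencoder or classifier can work with.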

AutoEncoder: The core Deep Learning component. The autoencoder produces a compressed representation of a given sentence.
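To make the idea of a compressed representation concrete, here is a very small autoencoder written in plain Python. It is only a sketch under stated assumptions: one sigmoid hidden layer, a linear output layer, squared-error loss, and plain stochastic gradient descent. The function names, layer sizes, and hyperparameters are hypothetical, and a real project would more likely use a Deep Learning library.

```python
import math
import random


def train_autoencoder(rows, hidden_size=2, epochs=500, learning_rate=0.1, seed=0):
    """Train a tiny autoencoder; returns (encode, decode) functions."""
    rng = random.Random(seed)
    n_in = len(rows[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden_size)]
    b_enc = [0.0] * hidden_size
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(n_in)]
    b_dec = [0.0] * n_in

    def encode(x):
        # Compressed code: one sigmoid activation per hidden unit.
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(w_row, x)) + b)))
                for w_row, b in zip(w_enc, b_enc)]

    def decode(code):
        # Linear reconstruction of the original feature vector.
        return [sum(w * c for w, c in zip(w_row, code)) + b
                for w_row, b in zip(w_dec, b_dec)]

    for _ in range(epochs):
        for x in rows:
            code = encode(x)
            out = decode(code)
            err = [o - v for o, v in zip(out, x)]  # gradient of 0.5 * squared error
            # Back-propagate to the hidden layer before touching decoder weights.
            grad_code = [sum(err[i] * w_dec[i][j] for i in range(n_in))
                         for j in range(hidden_size)]
            for i in range(n_in):
                for j in range(hidden_size):
                    w_dec[i][j] -= learning_rate * err[i] * code[j]
                b_dec[i] -= learning_rate * err[i]
            for j in range(hidden_size):
                delta = grad_code[j] * code[j] * (1.0 - code[j])  # sigmoid derivative
                for k in range(n_in):
                    w_enc[j][k] -= learning_rate * delta * x[k]
                b_enc[j] -= learning_rate * delta
    return encode, decode


def reconstruction_error(rows, encode, decode):
    """Mean squared reconstruction error per row."""
    total = 0.0
    for x in rows:
        out = decode(encode(x))
        total += sum((o - v) ** 2 for o, v in zip(out, x))
    return total / len(rows)
```

After training, `encode(row)` returns a shorter vector (here two numbers) that stands in for the full feature row of a sentence.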

NLTK: NLTK (Natural Language Toolkit) is a platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum. In Text Summarizer, this library is used to remove English stop words and to reduce the remaining words to their root forms.

LSM Summariser: This library is used to create a summary of the extracted text.

Classifier: The classifier determines if a sentence is a summary sentence or not.
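One simple way to turn classifier output into a summary of the requested length is sketched below. It assumes each sentence already has a numeric score, for example the classifier's confidence that it is a summary sentence; how those scores are produced is outside this sketch, and the function name `select_summary` is hypothetical.

```python
def select_summary(sentences, scores, line_count):
    """Keep the `line_count` highest-scoring sentences, in original order."""
    if len(sentences) != len(scores):
        raise ValueError("each sentence needs exactly one score")
    # Rank sentence indices from the highest score to the lowest.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Restore document order so the summary reads naturally.
    chosen = sorted(ranked[:max(line_count, 0)])
    return [sentences[i] for i in chosen]
```

For example, `select_summary(["A.", "B.", "C.", "D."], [0.2, 0.9, 0.1, 0.7], 2)` keeps the two best-scored sentences and returns them in the order they appeared in the text.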

Text Class: The Text class is the most complex class of the system. It holds paragraphs, sentences, and words, and it needs parser methods to divide the text into these parts. It also has attributes for the number of sentences and the number of paragraphs, which are needed to calculate sentence features.
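A minimal sketch of such a class is shown below. It uses simple regular expressions to split on blank lines and sentence-ending punctuation, which is an assumption made for brevity; abbreviations such as "e.g." would be split incorrectly, and the attribute and method names are hypothetical rather than taken from the project.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Text:
    """Holds raw text and its division into paragraphs and sentences."""

    raw: str
    paragraphs: list = field(default_factory=list)  # list of lists of sentences

    def __post_init__(self):
        self.paragraphs = self._parse(self.raw)

    @staticmethod
    def _parse(raw):
        paragraphs = []
        # Paragraphs are separated by one or more blank lines.
        for block in re.split(r"\n\s*\n", raw.strip()):
            # Naive sentence split: whitespace that follows '.', '!' or '?'.
            sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", block)
                         if s.strip()]
            if sentences:
                paragraphs.append(sentences)
        return paragraphs

    @property
    def paragraph_count(self):
        return len(self.paragraphs)

    @property
    def sentence_count(self):
        return sum(len(paragraph) for paragraph in self.paragraphs)

    @staticmethod
    def words(sentence):
        """Split a sentence into word tokens, dropping punctuation."""
        return re.findall(r"\w+", sentence)
```

The two count properties correspond to the attributes that the sentence feature calculations rely on.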

Paragraph Class: The Paragraph class is the intermediary class of the system. A paragraph object makes some of the calculations needed for sentence features, such as the number of sentences in the paragraph and the rank of the paragraph in the text. It also has its own parser to divide the paragraph into sentences.

Sentence Class: The Sentence class is the most important class of the system. A sentence object has methods to calculate its own feature values from information it takes from the text, paragraph, and word classes. It has a list of floats called “features” that holds the feature values of the sentence. The system combines the “features” lists of all sentence objects in the text into a features matrix, which the Autoencoder and Classifier components mentioned above use. The Sentence class also has its own parser to divide the sentence into words.

Word Class: The Word class is the most basic class of the system. Using NLP APIs, we can get a word’s root, stem, and suffix, as well as its type, such as verb or noun. Using the Word2Vec API, the cosine distance between two words can also be calculated. These attributes are used to calculate a sentence’s feature values.
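The cosine distance itself is easy to compute once each word has a vector. The sketch below works on plain lists of numbers; in the project the vectors would presumably come from a trained Word2Vec model, which is not shown here, and the function name is hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

A distance near 0 means the two vectors point in almost the same direction, a distance of 1 means they are orthogonal, and a distance of 2 means they point in opposite directions.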

Features:
Home page: The home page displays all the content available in the application.

Services: This page lists the services provided by the application: document summarization, web page summarization, and secured interactions. The summary is mailed to the email address with which the user signed up.

Portfolio: It shows examples of text summarization for different types of data.

Demo: It lets you get a summary without creating an account. It asks for your text and the line count, that is, the number of lines you want in the summary.

Login and Sign Up: It helps you create an account on the Text Summarizer web application so that your results can be emailed to you.

Tools Used:
• The backend is written in the Django framework for Python 3, using the PyCharm IDE.
• The frontend is built with CSS and Bootstrap.

Applications:

  1. People need to learn a lot from texts, but they want to spend less time doing so.
  2. The application aims to solve this problem by supplying summaries of the texts from which users want to gain information.
  3. A goal of the project is for the summaries to capture as much as possible of what matters to the text’s intent.
  4. The user will be able to select the summary length.
  5. Supplying the user with a smooth and clear interface.
  6. Configuring a fast-responding server system.

Team Members:

  1. Rohan Piplani
  2. Meenal Gaba

Note: This project idea is contributed for ProGeek Cup 2.0- A project competition by GeeksforGeeks.


