num2words is a Python module that converts numbers (like 34) to words (like thirty-four). The library also supports multiple languages. In this article, we will see how to convert numbers to words using the num2words module.
One can easily install num2words using pip.
pip install num2words
Consider the following two excerpts from different files taken from 20 Newsgroups, a popular NLP dataset. Pre-processing 20 Newsgroups effectively remains a matter of interest.
In article, Martin Preston writes: Why not use the PD C library for reading/writing TIFF files? It took me a good 20 minutes to start using them in your own app.
ISCIS VIII is the eighth of a series of meetings which have brought together computer scientists and engineers from about twenty countries. This year’s conference will be held in the beautiful Mediterranean resort city of Antalya, in a region rich in natural as well as historical sites.
In the above two excerpts, one can observe that the number 20 appears in both numeric ('20') and alphabetical ('twenty') forms. The usual pre-processing steps, such as tokenization and lemmatization, would not map '20' and 'twenty' to the same token, even though they mean the same thing in context. Luckily, the num2words library (a third-party package, installed with pip as shown above) solves this problem in a single line.
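Below is sample output of the tool for the number 36: the cardinal form, the ordinal form, the ordinal numeral, the currency form, and the Spanish cardinal form.
thirty-six
thirty-sixth
36th
zero euro, thirty-six cents
treinta y seis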
Therefore, in the pre-processing step, one could convert all numeric values to words so that both forms are treated alike in the later stages.
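As a minimal sketch of that pre-processing step, the helper below replaces standalone integers in a string with their spelled-out form. The names `numbers_to_words` and `to_words` are hypothetical and not part of num2words; the regular expression is an assumption that only handles whole integers, so decimals, dates and tokens such as "8th" would need separate treatment.

```python
import re


def numbers_to_words(text, to_words=None):
    """Replace each standalone integer in `text` with its word form.

    `to_words` is a hypothetical hook: any callable that maps an int to a
    string. When omitted, num2words is used.
    """
    if to_words is None:
        # Imported lazily so another converter can be supplied instead.
        from num2words import num2words as to_words

    # \b\d+\b matches runs of digits bounded by non-word characters, so
    # "20" is matched but the "8" inside "8th" is left alone.
    return re.sub(r"\b\d+\b", lambda m: to_words(int(m.group())), text)
```

With num2words installed, calling `numbers_to_words("It took me a good 20 minutes")` would be expected to return "It took me a good twenty minutes", which brings the first excerpt in line with the 'twenty' used in the second.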