Named Entity Recognition (NER) is a standard NLP task that involves identifying named entities (people, places, organizations, etc.) in a piece of text and classifying them into a predefined set of categories. Some practical applications of NER include:
- Scanning news articles for the people, organizations and locations reported.
- Providing concise features for search optimization: instead of searching the entire content, one may simply search for the major entities involved.
- Quickly retrieving geographical locations talked about in Twitter posts.
NER with spaCy
spaCy is widely regarded as one of the fastest NLP libraries for Python, and it provides a single optimized implementation for each NLP task it supports. It is easy to learn and use, so simple tasks take only a few lines of code.
pip install spacy
python -m spacy download en_core_web_sm
Code for NER using spaCy.
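The snippet below is a minimal sketch of such a call, assuming spaCy v3 and the en_core_web_sm model installed above. The input sentence is an assumption chosen to be consistent with the character offsets in the output that follows, and list_entities is a hypothetical helper name, not part of spaCy.

```python
import spacy


def list_entities(nlp, text):
    """Return (text, start_char, end_char, label) for each entity found."""
    doc = nlp(text)
    return [(ent.text, ent.start_char, ent.end_char, ent.label_)
            for ent in doc.ents]


if __name__ == "__main__":
    try:
        # Load the small English pipeline downloaded in the install step.
        nlp = spacy.load("en_core_web_sm")
    except OSError:
        print("Model not installed; run: python -m spacy download en_core_web_sm")
    else:
        # Assumed example sentence (not given in the surrounding text).
        sentence = "Apple is looking at buying U.K. startup for $1 billion"
        for row in list_entities(nlp, sentence):
            print(*row)
```

The exact entities returned depend on the model version, so results may differ slightly from the output listed here.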
Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
In the output, the first column is the entity text, the next two columns are the start and end character offsets within the document (the end offset is exclusive), and the final column is the entity category.
Further, it is interesting to note that spaCy’s NER model uses capitalization as one of its cues for identifying named entities. The same example, with “Apple” written in lowercase, produces a different result.
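One way to reproduce this comparison is sketched below, again assuming spaCy v3, the en_core_web_sm model, and the same assumed example sentence. The helper names entities and casing_diff are hypothetical.

```python
import spacy


def entities(nlp, text):
    """Return (text, start_char, end_char, label) for each entity found."""
    return [(ent.text, ent.start_char, ent.end_char, ent.label_)
            for ent in nlp(text).ents]


def casing_diff(nlp, text, word):
    """Return the entities found before and after lower-casing `word`."""
    lowered = text.replace(word, word.lower())
    return entities(nlp, text), entities(nlp, lowered)


if __name__ == "__main__":
    try:
        nlp = spacy.load("en_core_web_sm")
    except OSError:
        print("Model not installed; run: python -m spacy download en_core_web_sm")
    else:
        # Assumed example sentence; only the casing of "Apple" changes.
        sentence = "Apple is looking at buying U.K. startup for $1 billion"
        before, after = casing_diff(nlp, sentence, "Apple")
        for row in after:
            print(*row)
```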
U.K. 27 31 GPE
$1 billion 44 54 MONEY
The word “apple” is no longer recognized as a named entity. It is therefore important to run NER before preprocessing steps such as lowercasing, normalization or stemming.
You can also use your own examples to further train spaCy’s built-in NER model. There are several ways to do this. The following code shows a simple way to feed in new instances and update the model.
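A minimal sketch of such an update loop for spaCy v3 is given here. It is not necessarily the original method: the two training sentences in doc_list are hypothetical, make_examples and update_ner are hypothetical helper names, and the iteration count and dropout rate are arbitrary choices. The demo runs on a blank English pipeline so that it needs no model download.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical training data: (text, {"entities": [(start, end, label)]}).
doc_list = [
    ("Uber blew through $1 million a week", {"entities": [(0, 4, "ORG")]}),
    ("Google rebrands its business apps", {"entities": [(0, 6, "ORG")]}),
]


def make_examples(nlp, doc_list):
    """Convert (text, annotations) pairs into spaCy Example objects."""
    return [Example.from_dict(nlp.make_doc(text), annotations)
            for text, annotations in doc_list]


def update_ner(nlp, examples, optimizer, iterations=20, seed=0):
    """Run several update passes, touching only the "ner" component."""
    rng = random.Random(seed)
    examples = list(examples)  # avoid shuffling the caller's list in place
    losses = {}
    # Temporarily disable every other component so it is not modified.
    with nlp.select_pipes(enable=["ner"]):
        for _ in range(iterations):
            rng.shuffle(examples)
            losses = {}
            nlp.update(examples, drop=0.2, sgd=optimizer, losses=losses)
    return losses  # losses from the final pass


if __name__ == "__main__":
    # Blank pipeline with a fresh NER component (no download needed).
    nlp = spacy.blank("en")
    nlp.add_pipe("ner")
    examples = make_examples(nlp, doc_list)
    optimizer = nlp.initialize(lambda: examples)
    print(update_ner(nlp, examples, optimizer))
```

To update an already trained pipeline instead, one would load it with spacy.load and obtain the optimizer from nlp.resume_training() rather than nlp.initialize().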
By adding a sufficient number of examples to doc_list, one can produce a customized NER model using spaCy.
spaCy’s pretrained English models recognize the following entity types:
PERSON, NORP (nationalities, religious and political groups), FAC (buildings, airports etc.), ORG (organizations), GPE (countries, cities etc.), LOC (mountain ranges, water bodies etc.), PRODUCT (products), EVENT (event names), WORK_OF_ART (books, song titles), LAW (legal document titles), LANGUAGE (named languages), DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL and CARDINAL.
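Short descriptions of these labels can also be looked up programmatically. The sketch below uses spaCy's built-in glossary through spacy.explain; describe_labels is a hypothetical helper name, and the four labels passed to it are just a sample from the list above.

```python
import spacy


def describe_labels(labels):
    """Map each label to its description in spaCy's built-in glossary."""
    return {label: spacy.explain(label) for label in labels}


if __name__ == "__main__":
    sample = ["PERSON", "NORP", "GPE", "WORK_OF_ART"]
    for label, meaning in describe_labels(sample).items():
        print(label, "-", meaning)
```

For a loaded pipeline, the labels its NER component can predict are available as nlp.get_pipe("ner").labels.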