Named entity recognition (NER) is one of the most popular data preprocessing tasks. It involves identifying key information in text and classifying it into a set of predefined categories. An entity is essentially the thing that is consistently talked about or referred to in the text.
NER is a form of natural language processing (NLP).
At its core, NER is a two-step process:
- Detecting the entities from the text
- Classifying them into different categories
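The two steps can be illustrated with BIO tagging. This is a common token-level encoding assumed here for illustration; the text above does not prescribe one. B- marks the first token of an entity, I- a continuation, and O a token outside any entity. Grouping the B-/I- runs into spans is the detection step, and the suffix after B-/I- is the category from the classification step. The tokens and tags below are hand-written, not produced by any model, and `bio_to_entities` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs (hypothetical helper)."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the open entity (if any) and record it.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            entities.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()  # detection: a new entity starts here
            current_tokens, current_label = [token], tag[2:]  # classification: the label
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)  # continue the open entity
        else:
            flush()  # "O" (or an I- tag that does not continue the open entity) closes it
    flush()
    return entities


# Hand-written example tokens and tags (not model output).
tokens = ["The", "first", "president", "of", "the", "US", "was", "George", "Washington", "."]
tags = ["O", "O", "O", "O", "O", "B-LOC", "O", "B-PER", "I-PER", "O"]
print(bio_to_entities(tokens, tags))  # [('US', 'LOC'), ('George Washington', 'PER')]
```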
Some of the most important categories in NER are:
- Place/location
Other common tasks include classifying the following:
- Numerical measurements (money, percent, weight, etc.)
- Email addresses
Ambiguity in NER
- For a person, the category definition is intuitively quite clear, but for computers there is some ambiguity in classification. Let's look at some ambiguous examples:
- England (Organisation) won the 2019 World Cup vs. The 2019 World Cup happened in England (Location).
- Washington (Location) is the capital of the US vs. The first president of the US was Washington (Person).
Methods of NER
- One approach is to train a multi-class classification model using machine learning algorithms, but this requires a lot of labelled data. Beyond labelling, the model also needs a deep understanding of context to deal with ambiguous sentences, which makes this a challenging task for a simple machine learning algorithm.
- Another approach is the conditional random field (CRF), which is implemented by both the NLP Speech Tagger and NLTK. A CRF is a probabilistic model for sequential data such as words, and it can capture the context of a sentence. In this model, the input
- Deep learning based NER: deep learning NER is generally more accurate than the previous methods because it can group related words together. This is largely due to word embeddings, which capture semantic and syntactic relationships between words. It can also learn topic-specific as well as high-level word features automatically. This makes deep learning NER applicable to multiple tasks. Because deep learning does most of the repetitive work itself, researchers can use their time more efficiently.
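The CRF approach can be made concrete with a minimal sketch. This is not NLTK's or any library's implementation. A linear-chain CRF gives each label sequence y for a sentence x a score. That score is the sum of emission feature scores (how well each label fits each word, possibly looking at neighbouring words) plus transition scores between consecutive labels. It then defines p(y | x) = exp(score(x, y)) / Z(x), where Z(x) sums exp(score) over all possible label sequences. The most probable labelling is found with the Viterbi algorithm. All labels, features and weights below are hypothetical, hand-picked values. A real CRF learns its weights from labelled data, and training is not shown here.

```python
import itertools
import math
from typing import Dict, List, Tuple

# Hypothetical label set: B- starts an entity, I- continues it, O is outside.
LABELS = ["O", "B-PER", "I-PER", "B-LOC"]
START = "<s>"

# Hypothetical transition weights; unlisted pairs score 0.0.
# Large negative values discourage invalid sequences such as O -> I-PER.
TRANSITION: Dict[Tuple[str, str], float] = {
    (START, "I-PER"): -5.0,
    ("O", "I-PER"): -5.0,
    ("B-LOC", "I-PER"): -5.0,
    ("B-PER", "I-PER"): 1.0,
}


def emission(words: List[str], t: int, label: str) -> float:
    """Hand-written feature functions with hypothetical weights.

    A CRF feature may look at any part of the sentence; here it uses
    the current word and the previous word.
    """
    word = words[t]
    prev = words[t - 1].lower() if t > 0 else START
    score = 0.0
    if word[0].isupper() and label != "O":
        score += 1.0  # capitalised words tend to be entities
    if word[0].islower() and label == "O":
        score += 1.0  # lowercase words tend to be outside entities
    if prev in {"in", "to", "visited"} and label == "B-LOC":
        score += 1.0  # words after these cues are often places
    return score


def sequence_score(words: List[str], labels: List[str]) -> float:
    """Unnormalised score: emission scores plus transition scores."""
    total, prev = 0.0, START
    for t, label in enumerate(labels):
        total += TRANSITION.get((prev, label), 0.0) + emission(words, t, label)
        prev = label
    return total


def viterbi(words: List[str]) -> Tuple[List[str], float]:
    """Highest-scoring label sequence via dynamic programming."""
    # best[y] = best score of any partial path ending in label y at position t
    best = {y: TRANSITION.get((START, y), 0.0) + emission(words, 0, y) for y in LABELS}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(words)):
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev_y = max(LABELS, key=lambda p: best[p] + TRANSITION.get((p, y), 0.0))
            new_best[y] = best[prev_y] + TRANSITION.get((prev_y, y), 0.0) + emission(words, t, y)
            pointers[y] = prev_y
        best = new_best
        backpointers.append(pointers)
    last = max(LABELS, key=lambda y: best[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path)), best[last]


def probability(words: List[str], labels: List[str]) -> float:
    """p(y | x) = exp(score) / Z, with Z computed by brute force (tiny inputs only)."""
    z = sum(math.exp(sequence_score(words, list(ys)))
            for ys in itertools.product(LABELS, repeat=len(words)))
    return math.exp(sequence_score(words, labels)) / z


sentence = ["George", "Washington", "visited", "Paris"]
labels, score = viterbi(sentence)
print(labels, score)  # ['B-PER', 'I-PER', 'O', 'B-LOC'] 6.0
print(round(probability(sentence, labels), 3))
```

The transition weights are what make this a sequence model. The penalty on O -> I-PER means "Washington" is only labelled as the continuation of a person name when a B-PER precedes it. Computing Z by brute force is only feasible for toy inputs; real implementations compute it with the forward algorithm.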
- In this implementation, we perform named entity recognition using two different frameworks: spaCy and NLTK. This code can be run on Colab; however, for visualization purposes, I recommend a local environment. We can install both frameworks using pip install:
- First, we perform named entity recognition using spaCy.
! pip install spacy
! pip install nltk
!python -m spacy download en_core_web_sm
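import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")  # small English pipeline downloaded above
sentence = ("Python is an interpreted, high-level and general-purpose programming language. "
            "Python's design philosophy emphasizes code readability with "
            "its notable use of significant indentation. "
            "Its language constructs and object-oriented approach aim to "
            "help programmers write clear and "
            "logical code for small and large-scale projects")
doc2 = nlp(sentence)
print(list(doc2.sents))  # sentence segmentation
# named entities as (text, start_char, end_char, label) tuples
print([(e.text, e.start_char, e.end_char, e.label_) for e in doc2.ents])
# highlight the entities; jupyter=True renders inline in a notebook
displacy.render(doc2, style="ent", jupyter=True)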
[Python is an interpreted, high-level and general-purpose programming language.,
Python's design philosophy emphasizes code readability with its notable use of significant indentation.,
Its language constructs and object-oriented approach aim to help programmers write clear and logical code for small and large-scale projects]
# named entity
[('Python', 0, 6, 'ORG')]
# here ORG stands for organization
Spacy entity tags on doc2
- Below is a list of spaCy entity tags and their meanings:
Spacy Named Entity recognition Tags
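The meanings can also be printed directly from spaCy. Its `spacy.explain()` function returns a short glossary description for a label and does not need a model download. The label list below is typed out by hand and assumes the OntoNotes-style entity set used by spaCy's English pipelines such as en_core_web_sm. With a model loaded, `nlp.get_pipe("ner").labels` gives the actual set instead.

```python
import spacy

# Entity labels assumed for spaCy's English pipelines (e.g. en_core_web_sm).
# With a loaded model, nlp.get_pipe("ner").labels returns the real set.
ENTITY_LABELS = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT", "WORK_OF_ART",
    "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
]

# spacy.explain() looks each label up in spaCy's built-in glossary.
explained = {label: spacy.explain(label) for label in ENTITY_LABELS}
for label, meaning in explained.items():
    print(f"{label:<12} {meaning}")
```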
- Next, we perform the named entity recognition task using NLTK.
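The NLTK steps can be sketched as follows. This is a minimal version of the usual pipeline, not the original article's code. It tokenizes with `nltk.word_tokenize`, tags parts of speech with `nltk.pos_tag`, then chunks named entities with `nltk.ne_chunk`. That last call returns an `nltk.Tree` in which each entity is a subtree labelled with its type (such as PERSON or GPE). The helper names `tree_to_entities` and `nltk_ner` are hypothetical. To keep the demo runnable offline, it uses a hand-built tree of the same shape rather than real chunker output.

```python
from typing import List, Tuple

import nltk
from nltk import Tree


def tree_to_entities(tree: Tree) -> List[Tuple[str, str]]:
    """Collect (entity_text, label) pairs from an ne_chunk-style tree (hypothetical helper).

    Entities are subtrees whose label is the entity type; other tokens
    are plain (word, POS-tag) tuples at the top level.
    """
    entities = []
    for node in tree:
        if isinstance(node, Tree):
            text = " ".join(word for word, _pos in node.leaves())
            entities.append((text, node.label()))
    return entities


def nltk_ner(text: str) -> List[Tuple[str, str]]:
    """Tokenize, POS-tag and chunk `text`; needs NLTK data packages downloaded first."""
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    return tree_to_entities(nltk.ne_chunk(tagged))


# Hand-built tree with the same shape ne_chunk returns (not real chunker output).
example = Tree("S", [
    Tree("PERSON", [("Washington", "NNP")]),
    ("was", "VBD"), ("the", "DT"), ("first", "JJ"), ("president", "NN"),
    ("of", "IN"), ("the", "DT"),
    Tree("GPE", [("US", "NNP")]),
])
print(tree_to_entities(example))  # [('Washington', 'PERSON'), ('US', 'GPE')]
```

Before calling `nltk_ner` on raw text, download the tokenizer, tagger and chunker data, for example `nltk.download("punkt")`, `nltk.download("averaged_perceptron_tagger")`, `nltk.download("maxent_ne_chunker")` and `nltk.download("words")`. Recent NLTK releases may expect differently named packages (such as `punkt_tab`), so check against your installed version. The labels NLTK's pretrained chunker assigns may differ from the hand-built tree above.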
Example of NER on a sentence