
Named Entity Recognition

Named Entity Recognition (NER) is a technique in natural language processing (NLP) that focuses on identifying and classifying named entities in text. The purpose of NER is to automatically extract structured information from unstructured text, enabling machines to understand and categorize entities in a meaningful manner for applications such as text summarization, question answering, and knowledge graph construction. This article explores the fundamentals, methods, and implementation of NER.

What is Named Entity Recognition (NER)?

Named entity recognition (NER) is also referred to as entity identification, entity chunking, or entity extraction. It is the component of information extraction that aims to identify and categorize named entities within unstructured text: it locates key information in the text and classifies it into a set of predefined categories. An entity is a thing that is consistently talked about or referred to in the text, such as a person name, organization, location, time expression, quantity, or percentage.



NER systems find applications across various domains, including question answering, information retrieval, and machine translation. NER also plays an important role in enhancing the precision of other NLP tasks like part-of-speech tagging and parsing. At its core, NER is a two-step process:

  • Detecting the entities in the text.
  • Classifying them into predefined categories.

Ambiguity in NER

How Does Named Entity Recognition (NER) Work?

The working of Named Entity Recognition is discussed below:



Named Entity Recognition (NER) Methods

Lexicon Based Method

The lexicon-based method uses a dictionary containing a list of words or terms and checks whether any of them are present in a given text. However, this approach isn’t commonly used because the dictionary requires constant updating and careful maintenance to stay accurate and effective.
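The dictionary lookup described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not how any particular library implements it: the `LEXICON` contents and the `lexicon_ner` helper are hypothetical names made up for this example.

```python
import re

# Hypothetical mini-lexicon: lowercase surface form -> entity label.
LEXICON = {
    "supreme court": "ORG",
    "lok sabha": "ORG",
    "mahua moitra": "PERSON",
    "darshan hiranandani": "PERSON",
}


def lexicon_ner(text, lexicon=LEXICON):
    """Return (matched_text, start, end, label) for every lexicon term found."""
    matches = []
    for term, label in lexicon.items():
        # \b stops a term from matching inside a longer word;
        # IGNORECASE lets the lowercase lexicon match any casing.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append((m.group(), m.start(), m.end(), label))
    # Report entities in the order they appear in the text.
    return sorted(matches, key=lambda item: item[1])


sample = "Mahua Moitra has moved the Supreme Court against her expulsion from the Lok Sabha."
entities = lexicon_ner(sample)
for entity in entities:
    print(entity)
```

Any name that is missing from the lexicon is silently skipped, which is the maintenance burden mentioned above.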

Rule Based Method

The rule-based NER method uses a set of predefined rules to guide the extraction of information. These rules are based on patterns and context. Pattern-based rules focus on the structure and form of words, looking at their morphological patterns. Context-based rules, on the other hand, consider the surrounding words or the context in which a word appears within the text document. Combining pattern-based and context-based rules enhances the precision of information extraction in NER.
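As a rough sketch of the two kinds of rules, the snippet below uses one pattern-based rule (the shape of a percentage) and one context-based rule (a capitalized name is tagged only because a title precedes it). The rule names, the title list, and the sample sentence are all assumptions chosen for illustration; real rule sets are far larger.

```python
import re

# Pattern-based rule: looks only at the form of the token (digits followed by "%").
PERCENT_RULE = re.compile(r"\b\d+(?:\.\d+)?%")

# Context-based rule: a run of capitalized words counts as a PERSON
# only when a title such as "Dr." appears immediately before it.
TITLE_RULE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)")


def rule_based_ner(text):
    """Return (text, start, end, label) tuples found by the two rules."""
    found = []
    for m in PERCENT_RULE.finditer(text):
        found.append((m.group(), m.start(), m.end(), "PERCENT"))
    for m in TITLE_RULE.finditer(text):
        # Group 1 is the name itself, without the title that triggered the rule.
        found.append((m.group(1), m.start(1), m.end(1), "PERSON"))
    return sorted(found, key=lambda item: item[1])


sample = "Dr. Alice Smith said that revenue grew 12.5% last year."
entities = rule_based_ner(sample)
for entity in entities:
    print(entity)
```

The same name without a preceding title would not be tagged, which shows both the precision and the limited coverage of hand-written rules.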

Machine Learning-Based Method

Multi-Class Classification with Machine Learning Algorithms

Conditional Random Field (CRF)

Deep Learning Based Method

How to Implement NER in Python?

To implement an NER system, we will use the spaCy library. The code can be run on Colab; for the visualization step, however, a local environment is recommended. The required libraries can be installed using:

!pip install spacy
!pip install nltk
!python -m spacy download en_core_web_sm

Import Important Libraries

import pandas as pd
import spacy

# Load the small English pipeline downloaded above
nlp = spacy.load("en_core_web_sm")
# Show up to 200 rows when printing DataFrames
pd.set_option("display.max_rows", 200)

                    

NER using Spacy

In the following code, we use spaCy, a natural language processing library, to process text and extract named entities. The code iterates through the named entities identified in the processed document and prints each entity’s text, start character, end character, and label.

content = "Trinamool Congress leader Mahua Moitra has moved the Supreme Court against her expulsion from the Lok Sabha over the cash-for-query allegations against her. Moitra was ousted from the Parliament last week after the Ethics Committee of the Lok Sabha found her guilty of jeopardising national security by sharing her parliamentary portal's login credentials with businessman Darshan Hiranandani."
 
doc = nlp(content)
 
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

                    

Output:

Congress 10 18 ORG
Mahua Moitra 26 38 PERSON
the Supreme Court 49 66 ORG
the Lok Sabha 94 107 PERSON
Moitra 157 163 ORG
Parliament 184 194 ORG
last week 195 204 DATE
the Ethics Committee 211 231 ORG
Darshan Hiranandani 373 392 PERSON

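The output displays each entity’s text, its start and end character offsets in the input, and its predicted label. Note that the predictions are not perfect: here the model labels "the Lok Sabha" as PERSON and the second mention of "Moitra" as ORG.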
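The start and end values are plain character offsets into the input string, with the end offset exclusive, so slicing the text with them recovers the entity. The short check below needs no spaCy at all: it copies the first part of `content` and the first four rows of the output above (the `rows` and `recovered` names are introduced here only for illustration).

```python
# First part of the input text used above (enough to cover the first four entities).
content = (
    "Trinamool Congress leader Mahua Moitra has moved the Supreme Court "
    "against her expulsion from the Lok Sabha"
)

# (text, start_char, end_char, label) rows copied from the printed output.
rows = [
    ("Congress", 10, 18, "ORG"),
    ("Mahua Moitra", 26, 38, "PERSON"),
    ("the Supreme Court", 49, 66, "ORG"),
    ("the Lok Sabha", 94, 107, "PERSON"),
]

# Slicing with [start:end] should give back exactly the entity text.
recovered = [content[start:end] for _, start, end, _ in rows]
for (text, start, end, label), piece in zip(rows, recovered):
    print(f"{start:>3}-{end:<3} {label:<6} {piece!r} matches: {piece == text}")
```

These offsets are what make it possible to highlight or replace entities in the original text without searching for them again.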

Visualize

The displacy.render function from spaCy is used to visualize the named entities in a text. It generates a visual representation with colored highlights indicating the recognized entities and their respective categories.

from spacy import displacy
displacy.render(doc, style="ent")

                    

Output:

Using the following code, we will create a dataframe from the named entities extracted by spaCy, including the text, type (label), and lemma of each entity.

entities = [(ent.text, ent.label_, ent.lemma_) for ent in doc.ents]
df = pd.DataFrame(entities, columns=['text', 'type', 'lemma'])
print(df)

                    

Output:

                   text    type                 lemma
0              Congress     ORG              Congress
1          Mahua Moitra  PERSON          Mahua Moitra
2     the Supreme Court     ORG     the Supreme Court
3         the Lok Sabha  PERSON         the Lok Sabha
4                Moitra     ORG                Moitra
5            Parliament     ORG            Parliament
6             last week    DATE             last week
7  the Ethics Committee     ORG  the Ethics Committee
8   Darshan Hiranandani  PERSON   Darshan Hiranandani

The data frame provides a structured representation of the named entities, their types and lemmatized forms.
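Once the entities are in tabular form, a common next step is to summarize them by type. The sketch below does this with the standard library only, using the text and type columns copied from the output above; the `entities`, `label_counts`, and `by_label` names are introduced just for this example.

```python
from collections import Counter, defaultdict

# (text, type) pairs copied from the DataFrame output above.
entities = [
    ("Congress", "ORG"),
    ("Mahua Moitra", "PERSON"),
    ("the Supreme Court", "ORG"),
    ("the Lok Sabha", "PERSON"),
    ("Moitra", "ORG"),
    ("Parliament", "ORG"),
    ("last week", "DATE"),
    ("the Ethics Committee", "ORG"),
    ("Darshan Hiranandani", "PERSON"),
]

# How many entities of each type were found.
label_counts = Counter(label for _, label in entities)

# Which entity texts belong to each type.
by_label = defaultdict(list)
for text, label in entities:
    by_label[label].append(text)

for label, count in label_counts.most_common():
    print(label, count, by_label[label])
```

With the DataFrame itself, `df["type"].value_counts()` should give the same counts.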

Frequently Asked Questions (FAQs)

1. What is the purpose of an NER system?

The purpose of NER is to automatically extract structured information from unstructured text, enabling machines to understand and categorize entities in a meaningful manner for applications such as text summarization, question answering, and knowledge graph construction.

2. What are the methods of NER in NLP?

Methods of NER in NLP include:

  • Lexicon-based
  • Rule-based
  • Machine learning-based
  • Deep learning-based

3. What are the uses of NER in NLP?

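NER is used in applications such as question answering, information retrieval, and machine translation. It also plays an important role in enhancing the precision of other NLP tasks like part-of-speech tagging and parsing.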

4. Can BERT do named entity recognition?

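Yes. BERT can be used for NER, typically by fine-tuning it as a token classification model that predicts an entity label for each token.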

