Named entity recognition is a specific kind of chunk extraction that uses entity tags along with chunk tags.
Common entity tags include PERSON, LOCATION, and ORGANIZATION. POS-tagged sentences are parsed into chunk trees with normal chunking, but the tree labels can be entity tags in place of chunk phrase tags. NLTK already includes a pre-trained named entity chunker, which is available through the ne_chunk() method in the nltk.chunk module. This method chunks a single sentence into a Tree.
Code #1 : Using ne_chunk() on a tagged sentence of the treebank_chunk corpus
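A minimal sketch of this step, assuming NLTK is installed and that the treebank corpus and the pre-trained named entity chunker have already been fetched with nltk.download(). The function name chunk_treebank_sentence is hypothetical; only ne_chunk() and treebank_chunk.tagged_sents() come from NLTK.

```python
from nltk.chunk import ne_chunk
from nltk.corpus import treebank_chunk


def chunk_treebank_sentence(index=0):
    """Named-entity chunk one POS-tagged sentence from the treebank_chunk corpus.

    Assumes the treebank corpus and NLTK's pre-trained NE chunker data are
    installed; if they are missing, NLTK raises a LookupError on first use.
    """
    tagged = treebank_chunk.tagged_sents()[index]  # list of (word, POS) tuples
    return ne_chunk(tagged)  # nltk.Tree whose subtrees may be PERSON, ORGANIZATION, ...
```

In an interactive session, evaluating chunk_treebank_sentence(0) should display a Tree like the output that follows; print() would instead show the bracketed text form of the tree.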
Tree('S', [Tree('PERSON', [('Pierre', 'NNP')]), Tree('ORGANIZATION', [('Vinken', 'NNP')]), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')])
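Two entity tags are found: PERSON and ORGANIZATION. Each of these subtrees contains the list of words recognized as a PERSON or an ORGANIZATION. The pre-trained chunker is not perfect: Vinken, the surname of Pierre Vinken, is labeled ORGANIZATION instead of being grouped with Pierre into a single PERSON.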
Code #2 : Method to extract named entities using the leaves of all the subtrees
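A minimal sketch of such a method, assuming the input is an nltk.Tree shaped like the ne_chunk() output above. The helper name sub_leaves is chosen for illustration; it relies only on Tree.subtrees(), which accepts an optional filter function, and Tree.leaves().

```python
from nltk.tree import Tree


def sub_leaves(tree: Tree, label: str) -> list:
    """Return the leaves of every subtree of `tree` whose label is `label`.

    Each element of the result is the list of (word, POS) leaves of one
    matching entity subtree, so a sentence with two ORGANIZATION chunks
    yields two inner lists, and a label that never occurs yields [].
    """
    # subtrees() walks the whole tree (root included); the filter keeps only
    # subtrees carrying the requested entity label
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]
```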
Code #3 : Using the method to get all the PERSON and ORGANIZATION leaves from a tree
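A sketch that applies the illustrative sub_leaves helper (redefined here so the snippet stands alone) to the first treebank sentence. To avoid needing the chunker model, the tree is rebuilt by hand from the ne_chunk() output shown earlier; that hand-built tree is an assumption of this sketch.

```python
from nltk.tree import Tree


def sub_leaves(tree: Tree, label: str) -> list:
    """Leaves of every subtree of `tree` labelled `label`."""
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]


# Tree copied from the ne_chunk() output for the first treebank_chunk sentence
tree = Tree('S', [
    Tree('PERSON', [('Pierre', 'NNP')]),
    Tree('ORGANIZATION', [('Vinken', 'NNP')]),
    (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','),
    ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'),
    ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'),
    ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.'),
])

for label in ('PERSON', 'ORGANIZATION'):
    print(f"Named entities of {label} :", sub_leaves(tree, label))
# Named entities of PERSON : [[('Pierre', 'NNP')]]
# Named entities of ORGANIZATION : [[('Vinken', 'NNP')]]
```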
Named entities of PERSON : [[('Pierre', 'NNP')]]
Named entities of ORGANIZATION : [[('Vinken', 'NNP')]]
To process multiple sentences at a time, use ne_chunk_sents(). In the code below, the first 10 sentences from treebank_chunk.tagged_sents() are processed to get the ORGANIZATION leaves of each sentence's tree.
Code #4 : Getting the ORGANIZATION leaves from multiple sentences
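A minimal sketch of this step, assuming the treebank corpus and the pre-trained named entity chunker data are installed. The helper names sub_leaves, organizations_per_sentence, and treebank_organizations are illustrative; ne_chunk_sents() and treebank_chunk.tagged_sents() are the NLTK calls. Each sentence contributes one entry, which is an empty list when no ORGANIZATION is found.

```python
from typing import Iterable

from nltk.chunk import ne_chunk_sents
from nltk.corpus import treebank_chunk
from nltk.tree import Tree


def sub_leaves(tree: Tree, label: str) -> list:
    """Leaves of every subtree of `tree` labelled `label`."""
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]


def organizations_per_sentence(trees: Iterable[Tree]) -> list:
    """One entry per tree: the leaf lists of its ORGANIZATION subtrees ([] if none)."""
    return [sub_leaves(t, 'ORGANIZATION') for t in trees]


def treebank_organizations(n: int = 10) -> list:
    """NE-chunk the first `n` treebank_chunk sentences and collect ORGANIZATION leaves.

    Assumes the treebank corpus and NLTK's NE chunker data are installed.
    """
    # ne_chunk_sents() returns an iterable of Trees, one per tagged sentence
    trees = ne_chunk_sents(treebank_chunk.tagged_sents()[:n])
    return organizations_per_sentence(trees)
```

With the data installed, treebank_organizations(10) should produce a list shaped like the output below.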
[[[('Vinken', 'NNP')]], [[('Elsevier', 'NNP')]], [[('Consolidated', 'NNP'), ('Gold', 'NNP'), ('Fields', 'NNP')]], [], [], [[('Inc.', 'NNP')], [('Micronite', 'NN')]], [[('New', 'NNP'), ('England', 'NNP'), ('Journal', 'NNP')]], [[('Lorillard', 'NNP')]], [], []]