
Rule Based Approach in NLP

Last Updated : 17 Apr, 2023

Natural Language Processing (NLP) serves as a bridge between human language and computers. It is a subfield of Artificial Intelligence that helps machines process, understand, and generate natural language. Common NLP tasks include text and speech processing, language translation, and sentiment analysis; use cases include spam detection, chatbots, and text summarization.

There are three types of NLP approaches:

  1. Rule-based Approach – Based on linguistic rules and patterns
  2. Machine Learning Approach – Based on statistical analysis
  3. Neural Network Approach – Based on neural network architectures such as feed-forward, recurrent, and convolutional networks

Rule-based approach in NLP

The rule-based approach is one of the oldest NLP methods: predefined linguistic rules are used to analyze and process textual data. A particular set of rules or patterns is applied to capture specific structures, extract information, or perform tasks such as text classification. Common rule-based techniques include regular expressions and pattern matching.
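
As a minimal illustration of such rules, a toy spam filter can be built from regular expressions alone (the keyword patterns below are illustrative, not a real spam lexicon):

```python
import re

# Illustrative keyword rules; a real filter would use a much larger set.
SPAM_RULES = [r"\bfree\b", r"\bwinner\b", r"\bclick here\b"]

def is_spam(message: str) -> bool:
    # A message is flagged if any rule pattern matches, case-insensitively.
    return any(re.search(p, message, re.IGNORECASE) for p in SPAM_RULES)

print(is_spam("Click here to claim your FREE prize!"))  # True
print(is_spam("Meeting moved to 3 pm."))                # False
```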

Steps in Rule-based approach in NLP:

  1. Rule Creation: Based on the desired task, domain-specific linguistic rules are created, such as grammar rules, syntax patterns, semantic rules, or regular expressions.
  2. Rule Application: The predefined rules are applied to the input text to capture matching patterns.
  3. Rule Processing: The text is processed according to the matched rules to extract information, make decisions, or perform other tasks.
  4. Rule Refinement: The rules are iteratively refined to improve accuracy and performance; based on feedback, they are modified and updated as needed.
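
The four steps above can be sketched with plain regular expressions; the sentiment cue words and labels here are hypothetical:

```python
import re

# Step 1 - Rule Creation: hypothetical sentiment cue patterns.
rules = {"positive": [r"\bgood\b", r"\bgreat\b"],
         "negative": [r"\bbad\b", r"\bawful\b"]}

def classify(text: str) -> str:
    # Step 2 - Rule Application: count which patterns match the input.
    hits = {label: sum(bool(re.search(p, text, re.IGNORECASE)) for p in pats)
            for label, pats in rules.items()}
    # Step 3 - Rule Processing: turn the matches into a decision.
    return max(hits, key=hits.get) if any(hits.values()) else "neutral"

# Step 4 - Rule Refinement: inspect errors and extend the rule set,
# e.g. rules["negative"].append(r"\bterrible\b")
print(classify("The movie was great"))  # positive
print(classify("The food was awful"))   # negative
```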


Libraries that support a rule-based approach include spaCy (well suited for production), fast.ai, and NLTK (less commonly used in production nowadays).
In this article, we'll work with the spaCy library to demonstrate the rule-based approach. spaCy is an open-source software library designed for advanced Natural Language Processing (NLP) tasks. It is built in Python and provides a wide range of functionalities for processing and analyzing large volumes of text data.

spaCy's rule-matching engine, the Matcher, operates over tokens, entities, and phrases in a manner similar to regular expressions.

Spacy Installation:

# spaCy installation
!pip install -U spacy
!pip install -U spacy-lookups-data
!python -m spacy download en_core_web_sm  # English model

Example 1: Matching Token with Rule-based Approach

Step 1: The necessary modules are imported.

Python3

#import modules
import spacy
#import the Matcher class
from spacy.matcher import Matcher

Step 2: The English-language spaCy model is loaded. It is stored in a variable named 'nlp' so that it does not shadow the spacy module itself.

Python3

#The English model 'en_core_web_sm' is loaded
nlp = spacy.load("en_core_web_sm")

Step 3: The input text is processed into a Doc object and separated into tokens.

Python3

#The input text as a Document object
txt = "Natural Language Processing serves as an interrelationship between human language and computers. Natural Language Processing is a subfield of Artificial Intelligence that helps machines process, understand and generate natural language intuitively."
doc = nlp(txt)
tokens = [token for token in doc]

print('Tokens:', tokens)
print('Number of tokens:', len(tokens))

Output:

Tokens: [Natural, Language, Processing, serves, as, an, interrelationship, between, human,
language, and, computers, ., Natural, Language, Processing, is, a, subfield, of, Artificial,
Intelligence, that, helps, machines, process, ,, understand, and, generate, natural,
language, intuitively, .]
Number of tokens: 34

Step 4: The rule-based matching engine 'Matcher' is instantiated with the pipeline's shared vocabulary.

Python3

#Matcher class object instantiation
matcher = Matcher(nlp.vocab)

Step 5: The rules or patterns to be searched for are defined. Here the words 'language' and 'human' form two single-token patterns; the 'LOWER' attribute makes the match case-insensitive.

Python3

#patterns to be searched for
pattern = [[{'LOWER': 'language'}], [{'LOWER': 'human'}]]

Step 6: The patterns are added to the matcher object using the 'add' method, with the first parameter as the match ID and the second as the list of patterns.

Python3

#adding the patterns/rules to the matcher object
matcher.add("TokenMatch", pattern)

Step 7: The matcher object is called on the 'doc' object to find the patterns. The result is stored in the 'matches' variable.

Python3

#Matcher object called
#returns (match_id, start, end) tuples for the matched spans
matches = matcher(doc)

Step 8: The matched results are extracted and printed.

Python3

#Extracting matched results
for m_id, start, end in matches:
    string_id = nlp.vocab.strings[m_id]
    span = doc[start:end]
    print('match_id:{}, string_id:{}, Start:{}, End:{}, Text:{}'.format(
        m_id, string_id, start, end, span.text))


Output:

match_id:9580390278045680890, string_id:TokenMatch, Start:1, End:2, Text:Language
match_id:9580390278045680890, string_id:TokenMatch, Start:8, End:9, Text:human
match_id:9580390278045680890, string_id:TokenMatch, Start:9, End:10, Text:language
match_id:9580390278045680890, string_id:TokenMatch, Start:14, End:15, Text:Language
match_id:9580390278045680890, string_id:TokenMatch, Start:31, End:32, Text:language
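
Single-token rules like the ones above extend naturally to multi-token patterns, where each dictionary in the list describes one token. The following sketch matches the two-word phrase "natural language" case-insensitively; a blank English pipeline is enough here, since token-level attributes such as LOWER need no trained statistical model:

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline provides the tokenizer; no model download is needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One dictionary per token: "natural" immediately followed by "language".
pattern = [{"LOWER": "natural"}, {"LOWER": "language"}]
matcher.add("BigramMatch", [pattern])

doc = nlp("Natural Language Processing helps machines use natural language.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # Natural Language / natural language
```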

Example 2: Matching Phrases with the Rule-based Approach

Step 1: The PhraseMatcher class is imported from spaCy.

Python3

# import necessary modules
import spacy
from spacy.matcher import PhraseMatcher

Step 2: The English-language spaCy model is loaded into a variable named 'nlp'.

Python3

#The English model 'en_core_web_sm' is loaded
nlp = spacy.load('en_core_web_sm')

Step 3: The input text is processed into a 'doc' object.

Python3

#The input text as a Document object
txt = "Natural Language Processing serves as an interrelationship between human language and computers. Natural Language Processing is a subfield of Artificial Intelligence that helps machines process, understand and generate natural language intuitively."
doc = nlp(txt)
print(doc)

Output:

Natural Language Processing serves as an interrelationship between human language and computers.
 Natural Language Processing is a subfield of Artificial Intelligence that helps machines process,
  understand and generate natural language intuitively.

Step 4: The PhraseMatcher object is instantiated. Setting attr='LOWER' makes the phrase matching case-insensitive.

Python3

# PhraseMatcher object creation
matcher = PhraseMatcher(nlp.vocab, attr='LOWER')

Step 5: The phrases are listed in term_list, and each phrase is converted into a Doc object using the 'make_doc' method to speed up the process.

Python3

# list of phrases
term_list = ["Language Processing", "human language"]
# phrases into document objects
patterns = [nlp.make_doc(t) for t in term_list]

Step 6: The created patterns are added to the matcher object.

Python3

# patterns added to the matcher object
matcher.add("Phrase Match", patterns)

Step 7: The matcher object is called on the input 'doc' with the parameter 'as_spans=True', which returns Span objects directly. The extracted results are printed.

Python3

# Matcher object called. It returns Span objects directly
matches = matcher(doc, as_spans=True)
#Extracting matched results
for span in matches:
    print(span.text, ":-", span.label_)


Output:

Language Processing :- Phrase Match
human language :- Phrase Match
Language Processing :- Phrase Match

Example 3: Named Entity Recognition with spaCy

Step 1: Import spacy and load the English-language spaCy model.

Python3

# import spacy
import spacy
#Load the English-language spaCy model
nlp = spacy.load("en_core_web_sm")

Step 2: Named Entity Recognition with spaCy.

Python3

#The input text as a Document object
txt = """
My name is Pawan Kumar Gunjan. I live in India.
India, officially the Republic of India, is a country in South Asia.
It is the seventh-largest country by area and the second-most populous country.
Bounded by the Indian Ocean on the south, the Arabian Sea on the southwest,
and the Bay of Bengal on the southeast, it shares land borders with Pakistan to the west;
China, Nepal, and Bhutan to the north; and Bangladesh and Myanmar to the east.
"""
doc = nlp(txt)
for entity in doc.ents:
    print('Text:{}, Label:{}'.format(entity.text, entity.label_))


Output:

Text:Pawan Kumar Gunjan, Label:PERSON
Text:India, Label:GPE
Text:India, Label:GPE
Text:the Republic of India, Label:GPE
Text:South Asia, Label:LOC
Text:seventh, Label:ORDINAL
Text:second, Label:ORDINAL
Text:the Indian Ocean, Label:LOC
Text:the Arabian Sea, Label:LOC
Text:the Bay of Bengal, Label:LOC
Text:Pakistan, Label:GPE
Text:China, Label:GPE
Text:Nepal, Label:GPE
Text:Bhutan, Label:GPE
Text:Bangladesh, Label:GPE
Text:Myanmar, Label:GPE
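
The entities above come from the statistical model, but named entities can also be recognized purely by rules using spaCy's EntityRuler component. A minimal sketch on a blank pipeline (no model download needed; the labels and patterns here are illustrative):

```python
import spacy

# Blank pipeline: entities come only from the rules we add.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Token-based and phrase-based patterns with their entity labels.
ruler.add_patterns([
    {"label": "PERSON", "pattern": [{"LOWER": "pawan"}, {"LOWER": "kumar"}, {"LOWER": "gunjan"}]},
    {"label": "GPE", "pattern": "India"},
])

doc = nlp("Pawan Kumar Gunjan lives in India.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```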

Advantages of the Rule-based approach:

  • Easily interpretable, since the rules are explicitly defined
  • Rule-based techniques can help semi-automatically annotate data in domains where no annotated data exists (for example, NER (Named Entity Recognition) tasks in a particular domain)
  • Functions even with scant or poor training data
  • Computation is fast, and it offers high precision
  • Many problems, such as tokenization, sentence breaking, or morphology, can often be solved deterministically through rules (at least in some languages)

Disadvantages of the Rule-based approach:

  • Labor-intensive as more rules are needed to generalize
  • Generating rules for complex tasks is time-consuming
  • Needs regular maintenance
  • May not handle variations and exceptions in language usage well
  • Recall is often low, because hand-written rules miss phrasings they were not designed for

Why combine the Rule-based Approach with Machine Learning and Neural Network Approaches?

  1. Rule-based NLP handles edge cases well when combined with other approaches.
  2. It helps to speed up data annotation. For instance, a rule-based technique can handle URL formats, date formats, etc., while a machine learning approach determines the position of text in a PDF file (including numerical data).
  3. In languages other than English, annotated data is scarce even for common tasks, which rule-based NLP can still carry out.
  4. Using a rule-based approach also improves the computational performance of the pipeline.
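
For instance, the URL and date formats mentioned in point 2 can be pre-annotated with simple regular expressions (each pattern below covers only one illustrative format):

```python
import re

# Illustrative patterns: one date format (dd/mm/yyyy) and http(s) URLs.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
URL_RE = re.compile(r"https?://\S+")

text = "Released on 17/04/2023, see https://spacy.io for documentation."
print(DATE_RE.findall(text))  # ['17/04/2023']
print(URL_RE.findall(text))   # ['https://spacy.io']
```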
     

