
Relationship Extraction in NLP

Relationship extraction in natural language processing (NLP) is a technique for identifying the connections between entities mentioned in text. In a world brimming with unstructured textual data, relationship extraction identifies and classifies the associations between entities, which makes it an effective way to organize information, construct knowledge graphs, aid information retrieval, and support decision-making.

The main goal of relationship extraction is to extract valuable insights from text that enrich our understanding of the relationships that bind people, organizations, concepts, etc.



What is Relationship Extraction in NLP?

Relationship Extraction (RE) is an important process in Natural Language Processing that automatically identifies and categorizes the connections between entities within natural language text. These entities can be individuals, organizations, locations, dates, or any other nouns or concepts mentioned in the text. The relationships describe how these entities are related to each other, such as “founder of”, “located in”, “works at”, “married to”, etc. For instance, “John works at the company” expresses a “works at” relationship from John to the company. The extracted relationship enriches the semantic understanding of the text and can be organized into structured data for various downstream applications.
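To make “structured data” concrete, here is a minimal sketch that stores each extracted relationship as a (head, relation, tail) triple and groups the triples into an adjacency list, one simple way to seed a knowledge graph. The `Relation` class and the `build_graph` helper are illustrative names chosen for this example, not part of any library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One extracted relationship: (head entity, relation label, tail entity)."""
    head: str
    label: str
    tail: str


def build_graph(relations):
    """Group triples by head entity, giving a simple adjacency-list knowledge graph."""
    graph = defaultdict(list)
    for rel in relations:
        graph[rel.head].append((rel.label, rel.tail))
    return dict(graph)


relations = [
    Relation("John", "works at", "the company"),
    Relation("Sandeep Jain", "founder of", "GeeksforGeeks"),
]
graph = build_graph(relations)
print(graph["John"])  # [('works at', 'the company')]
```

A real knowledge graph store would also index tail entities, but an adjacency list is enough to show how extracted relationships become queryable data.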



Approaches to Extracting Relationships in NLP

1. Rule-Based or Pattern-Based Approach

2. Supervised Relationship Extraction

3. Unsupervised Relationship Extraction

Types of Relationship Extraction in NLP

Relationship extraction can be categorized into the following types:

Binary Relationship Extraction

  1. This extraction process identifies and categorizes relationships between pairs of entities. It is a fundamental form of relationship extraction and is often used when the relationships are simple enough to be expressed as pairs.
  2. For example, determining whether a person “founded” an organization (Sandeep Jain founded GeeksforGeeks).
  3. The process of binary relationship extraction involves the following steps:
    • Named entity recognition identifies and classifies the entities in the text. Entities can be people, organizations, locations, or any other nouns relevant in the context.
    • Dependency parsing analyzes the grammatical structure of the text to show how its words relate to each other.
    • The model determines the relationship between two identified entities by examining the context and co-occurrence of the entities in the text.
    • If a relationship is identified, it is often classified into a specific type or category. For example, if the entities are a person and an organization, the relationship might be categorized as “works for.”
    • The final step extracts the binary relationships from the text and stores them in a structured format.
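The steps above rely on NER and dependency parsing. As a much simpler stand-in, the sketch below uses the rule-based (pattern-based) approach instead: hand-written regular expressions, with entities approximated as runs of capitalized words. The patterns and the `extract_binary` helper are illustrative assumptions, so the sketch only catches sentences phrased exactly like its patterns.

```python
import re

# Assumption: an entity is approximated as a run of capitalized words,
# standing in for real named entity recognition.
ENTITY = r"([A-Z]\w*(?: [A-Z]\w*)*)"

# Hypothetical hand-written patterns, each mapped to a relation label.
PATTERNS = [
    (re.compile(ENTITY + r" founded " + ENTITY), "founded"),
    (re.compile(ENTITY + r" works (?:at|for) " + ENTITY), "works for"),
]


def extract_binary(sentence):
    """Return (entity1, relation, entity2) triples matched in one sentence."""
    triples = []
    for pattern, label in PATTERNS:
        for match in pattern.finditer(sentence):
            triples.append((match.group(1), label, match.group(2)))
    return triples


print(extract_binary("Sandeep Jain founded GeeksforGeeks."))
# [('Sandeep Jain', 'founded', 'GeeksforGeeks')]
```

Patterns like these are precise but brittle: “GeeksforGeeks was founded by Sandeep Jain” would need its own passive-voice pattern, which is one reason the learned approaches exist.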

Ternary Relationship Extraction

Nested Relationship Extraction

  1. This extraction process is used when relationships within a text are hierarchical or embedded within one another.
  2. For example, in a sentence like “Sandeep Jain, the CEO of GeeksforGeeks, founded the company,” there are nested relationships involving Sandeep Jain, GeeksforGeeks and the “founded” relationship.
  3. The process is quite similar to ternary and binary relationship extraction.
    • The process begins with named entity recognition.
    • Dependency parsing is then used to analyze the grammatical structure of the text.
    • The primary focus of nested relationship extraction is to identify and extract hierarchical or nested relationships between entities. This involves determining how entities are hierarchically structured and how they relate to each other. For example, you might want to extract relationships like “is part of,” “contains,” “is a member of,” and so on.
    • Once these nested relationships are identified, they are often classified into specific types or categories. This categorization helps in understanding the nature of the hierarchical relationships among the entities. For instance, if you’re extracting relationships between parts of a machine, you might classify them as “subcomponent,” “component,” or “assembly.”
    • The final step is to extract the nested relationships from the text and represent them in a structured format.
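As an illustration of the structured format, the sketch below takes “is part of” triples (hand-written here, as if produced by an earlier extraction step) and turns them into a hierarchy that can be queried for nested parts. The machine parts and the depth-based “assembly / component / subcomponent” labelling are hypothetical choices that mirror the example above, not a standard scheme.

```python
from collections import defaultdict

# Hypothetical "is part of" triples, e.g. produced by an earlier extraction step.
part_of = [
    ("piston", "is part of", "engine"),
    ("engine", "is part of", "car"),
    ("wheel", "is part of", "car"),
]


def build_hierarchy(triples):
    """Map each parent to its direct children for "is part of" triples."""
    children = defaultdict(list)
    for child, label, parent in triples:
        if label == "is part of":
            children[parent].append(child)
    return dict(children)


def all_parts(hierarchy, whole):
    """Collect every direct or nested part of `whole`, depth-first."""
    parts = []
    for child in hierarchy.get(whole, []):
        parts.append(child)
        parts.extend(all_parts(hierarchy, child))
    return parts


def classify_levels(hierarchy, root):
    """Hypothetical labelling: root = assembly, direct parts = component, deeper = subcomponent."""
    labels = {root: "assembly"}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        for child in hierarchy.get(node, []):
            labels[child] = "component" if depth == 0 else "subcomponent"
            stack.append((child, depth + 1))
    return labels


hierarchy = build_hierarchy(part_of)
print(all_parts(hierarchy, "car"))  # ['engine', 'piston', 'wheel']
```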

Temporal Relationship Extraction

  1. This extraction focuses on identifying relationships with a temporal dimension, such as when an event occurred or when a relationship was valid.
  2. Temporal relationships are essential for understanding the chronological order, duration, and temporal dependencies between events.
  3. For example, “GeeksforGeeks was founded by Sandeep Jain in 2008”.
  4. The process of temporal relationship extraction involves the following steps:
    • The first step is to recognize the events, actions, or entities mentioned in the text.
    • In parallel, the system needs to identify and classify temporal expressions within the text. Temporal expressions can be specific dates, times, durations, or more complex phrases like “two weeks ago” or “in the future.”
    • Dependency parsing is often used to analyze the grammatical structure of the text, enabling the understanding of how different elements in the text are connected, including how temporal expressions relate to events or entities.
    • The core task of temporal relationship extraction involves identifying the temporal relationships between events or entities and the associated temporal expressions.
    • Once the temporal relationships are identified, they are typically categorized into specific types or classes. These classes can include “before,” “after,” “during,” “simultaneous,” “started,” “ended,” and more, depending on the nuances of the temporal relationships.
    • The final step is to extract and represent these temporal relationships in a structured format, such as a timeline or a knowledge graph.
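A toy version of these steps fits in a few lines if we assume every event is phrased as “<subject> was <verb> … in <year>”. The regular expression below is a hypothetical rule standing in for event detection and temporal-expression tagging, and the second sentence is invented for illustration. Events are placed on a timeline by year and each pair is classified as “before”, “after”, or “simultaneous”.

```python
import re
from itertools import combinations

# Hypothetical rule: "<subject> was <verb> ... in <four-digit year>".
EVENT_PATTERN = re.compile(r"^(.+?) was (\w+)\b.*?\bin (\d{4})\b")


def extract_events(sentences):
    """Return (event description, year) pairs for sentences matching the rule."""
    events = []
    for sentence in sentences:
        match = EVENT_PATTERN.search(sentence)
        if match:
            subject, verb, year = match.groups()
            events.append((f"{subject} was {verb}", int(year)))
    return events


def classify(year_a, year_b):
    """Coarse temporal relation between two events dated only to a year."""
    if year_a < year_b:
        return "before"
    if year_a > year_b:
        return "after"
    return "simultaneous"


sentences = [
    "Acme Corp was acquired in 2012.",  # hypothetical sentence
    "GeeksforGeeks was founded by Sandeep Jain in 2008.",
]
timeline = sorted(extract_events(sentences), key=lambda event: event[1])
for (a, year_a), (b, year_b) in combinations(timeline, 2):
    print(f"{a} --{classify(year_a, year_b)}--> {b}")
# GeeksforGeeks was founded --before--> Acme Corp was acquired
```

Year-level granularity cannot express “during” or overlapping intervals; a fuller system would normalize temporal expressions to date ranges first.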

Causal Relationship Extraction

  1. This extraction identifies relationships that express cause and effect, which is crucial in applications such as identifying the causes of diseases or understanding the reasons behind certain events.
  2. The core of causal relationship extraction is determining whether a causal relationship exists between the identified events or entities.
    • This requires examining the context and linguistic cues in the text to identify whether one event or entity is causing another, or if there is a correlation between them.
    • Once a causal relationship is identified, it can be further classified into specific types or categories.
    • For instance, causal relationships can be categorized as direct causation, indirect causation, correlation, reverse causation, or temporal causation.
    • The final step is to extract and represent the causal relationships in a structured format, such as a causal graph or knowledge base.
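The “linguistic cues” step can be sketched with a small list of cue phrases such as “causes”, “leads to”, and “because of”, each recording whether the cause comes before or after the cue. The cue list and the example sentences are assumptions for illustration; cue phrases alone cannot tell genuine causation from correlation, which is why the classification step above is still needed. The results are collected into a simple causal graph (cause mapped to its effects).

```python
import re
from collections import defaultdict

# Hypothetical cue patterns: (regex, True if the cause comes first in the sentence).
CAUSAL_CUES = [
    (re.compile(r"^(.+?) (?:causes|caused|leads to|led to) (.+?)\.?$"), True),
    (re.compile(r"^(.+?) (?:because of|due to) (.+?)\.?$"), False),
]


def extract_causal(sentence):
    """Return a (cause, effect) pair if a cue phrase is found, else None."""
    for pattern, cause_first in CAUSAL_CUES:
        match = pattern.match(sentence)
        if match:
            left, right = match.groups()
            return (left, right) if cause_first else (right, left)
    return None


causal_graph = defaultdict(list)  # cause -> list of effects
for s in ["Smoking causes lung cancer.", "The flight was delayed because of heavy fog."]:
    pair = extract_causal(s)
    if pair:
        cause, effect = pair
        causal_graph[cause].append(effect)

print(dict(causal_graph))
# {'Smoking': ['lung cancer'], 'heavy fog': ['The flight was delayed']}
```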

Cross-Sentence Relationship Extraction

  1. When relationships span multiple sentences, cross-sentence relationship extraction is used to identify and extract relationships that exist across sentence boundaries, which often requires coreference resolution and contextual analysis.
  2. The process involves:
    • Identifying and extracting relationships between entities or events that are mentioned in different sentences. This can involve recognizing how actions or entities mentioned in one sentence relate to those in another, for example through causality, dependency, or association.
    • Once the relationships are identified, they can be categorized into specific types or classes. The classification helps in understanding the nature of the relationships that span multiple sentences, such as causal relationships, temporal dependencies, or co-occurrence.
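The sketch below shows why coreference resolution matters here. It uses a deliberately naive rule, assumed for illustration only: a sentence-initial “He”, “She”, or “It” is replaced with the entity mentioned last in the preceding text. After that, a single-sentence pattern (also hypothetical) can recover a relationship whose subject was only named in the previous sentence. Real systems use a trained coreference model instead of this rule.

```python
import re

PRONOUNS = {"He", "She", "It"}  # hypothetical, tiny pronoun list
# Hypothetical single-sentence pattern: "<entity> is/was a (former) <role>".
ROLE = re.compile(r"^(.+?) (?:is|was) an? (?:former )?(\w+)\.?$")


def resolve_pronouns(sentences, entities):
    """Naive coreference: replace a sentence-initial pronoun with the most
    recently mentioned known entity (ignores gender and number agreement)."""
    resolved, last_entity = [], None
    for sentence in sentences:
        first, _, rest = sentence.partition(" ")
        if first in PRONOUNS and last_entity is not None:
            sentence = f"{last_entity} {rest}"
        # Remember the rightmost known entity mentioned in this sentence.
        mentions = [(sentence.rfind(e), e) for e in entities if e in sentence]
        if mentions:
            last_entity = max(mentions)[1]
        resolved.append(sentence)
    return resolved


text = ["GeeksforGeeks was founded by Sandeep Jain.", "He was a former teacher."]
resolved = resolve_pronouns(text, ["GeeksforGeeks", "Sandeep Jain"])
relations = [m.groups() for s in resolved if (m := ROLE.match(s))]
print(relations)  # [('Sandeep Jain', 'teacher')]
```

Without the resolution step, the second sentence would only yield a relationship for “He”, which cannot be linked back to any entity.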

Open Information Extraction (OpenIE)

In this article, we will see how to perform relationship extraction on a sample text.

Step-by-step implementation

Installing required modules

We need spaCy for tokenization and dependency parsing, and Hugging Face Transformers for named entity recognition (NER) with a BERT model. Installing `spacy-transformers` pulls in spaCy, Transformers, and PyTorch as dependencies. We then download spaCy's small English pipeline, `en_core_web_sm`, which handles tasks such as tokenization and dependency parsing. The `!` prefix runs these commands from a Jupyter or Colab notebook; drop it when using a regular terminal.

!pip install spacy-transformers
!python -m spacy download en_core_web_sm

Importing module

Now we import spaCy and the `pipeline` function from Transformers.




import spacy
from transformers import pipeline

Loading spaCy model

The `en_core_web_sm` model must be downloaded (as in the installation step) before it can be loaded. Next, we load this English model for text processing.




# Load spaCy's small English pipeline (not transformer-based; en_core_web_trf is the transformer model)
nlp = spacy.load("en_core_web_sm")

Text processing

We define a sample text (any text will do) and process it with the spaCy model loaded above. The resulting `Doc` object gives access to sentences, named entities, and dependency labels.




# Define a sample text for relationship extraction
text = "GeeksforGeeks was founded by Sandeep Jain. Sandeep Jain was a former teacher. GeeksforGeeks is a computer science portal and offers various cources and articles."
# Process the text with spaCy
doc = nlp(text)
 
# Create a list to store extracted relationships
relationships = []

Fine-tuned model for named entity recognition

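We use a BERT model fine-tuned for Named Entity Recognition on the CoNLL-2003 English data, loaded through the Transformers `pipeline` function, and run it over the text to extract named entities. Note that the extraction loop in the next step uses the entities found by spaCy (`sent.ents`), so `named_entities` is available for inspection or comparison but does not feed into the extracted relationships.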




# Use Hugging Face's transformers NER pipeline for named entity recognition
nlp_ner = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")
  
# Extract named entities and their labels
named_entities = nlp_ner(text)
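By default, the `"ner"` pipeline returns one dictionary per (sub)word token, with keys such as `"entity"`, `"word"`, `"start"`, and `"end"`, so multi-token names arrive in pieces. Recent Transformers versions can group them for you via `pipeline("ner", model=..., aggregation_strategy="simple")`. If you keep the raw output, the sketch below merges adjacent tokens of the same type into spans using the character offsets. The token list is hand-written in the pipeline's output format (scores omitted) and its subword split is a guess, not real model output; `merge_entities` is a hypothetical helper.

```python
def merge_entities(tokens, text):
    """Merge token-level NER predictions into (entity text, entity type) spans.

    Tokens of the same type are merged when they touch or are separated by
    one character (a space); a "B-" tag always starts a new entity.
    """
    spans = []
    for tok in tokens:
        prefix, _, etype = tok["entity"].partition("-")
        if (spans and spans[-1]["type"] == etype and prefix != "B"
                and tok["start"] - spans[-1]["end"] <= 1):
            spans[-1]["end"] = tok["end"]  # extend the current entity
        else:
            spans.append({"type": etype, "start": tok["start"], "end": tok["end"]})
    return [(text[s["start"]:s["end"]], s["type"]) for s in spans]


sample_text = "Sandeep Jain founded GeeksforGeeks."
# Hand-written tokens in the pipeline's format (not actual model output).
sample_tokens = [
    {"entity": "I-PER", "word": "Sandeep", "start": 0, "end": 7},
    {"entity": "I-PER", "word": "Jain", "start": 8, "end": 12},
    {"entity": "I-ORG", "word": "Geeks", "start": 21, "end": 26},
    {"entity": "I-ORG", "word": "##for", "start": 26, "end": 29},
    {"entity": "I-ORG", "word": "##G", "start": 29, "end": 30},
    {"entity": "I-ORG", "word": "##eeks", "start": 30, "end": 34},
]
print(merge_entities(sample_tokens, sample_text))
# [('Sandeep Jain', 'PER'), ('GeeksforGeeks', 'ORG')]
```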

Checking Entity Labels and Extracting Relationships

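Now we iterate over the sentences and, for each entity labeled PERSON or ORG, pair it with every token in the same sentence whose dependency label is nominal subject (`nsubj`), attribute (`attr`), or direct object (`dobj`). This is a simple heuristic: it does not check whether the token is actually linked to, or distinct from, the entity.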




# Iterate through the sentences in the document
for sent in doc.sents:
    # Iterate through the named entities (people, organizations etc.) in the sentence
    for ent in sent.ents:
        # Check if the entity has a known label and is a person or organization
        if ent.label_ in ["PERSON", "ORG"]:
            # Pair the entity with each subject, attribute, or direct object in the sentence
            for token in sent:
                if token.dep_ in ["attr", "nsubj", "dobj"]:
                    relationships.append((ent.text, token.text))

Printing extracted relationships

Finally, we will print the relationships with the corresponding named entities.




# Use a new loop variable so the earlier `named_entities` list is not overwritten
for entity, relation in relationships:
    print(f"{entity} --> {relation}")

Output:

Sandeep Jain --> Jain
Sandeep Jain --> teacher
GeeksforGeeks --> GeeksforGeeks
GeeksforGeeks --> portal
GeeksforGeeks --> cources
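Pairs such as `Sandeep Jain --> Jain` and `GeeksforGeeks --> GeeksforGeeks` appear because a word inside the entity is itself the sentence's subject (`nsubj`). The sketch below is one possible post-processing filter that drops such self-pairs, working directly on the (entity, word) tuples printed above. Inside the spaCy loop, a sturdier check is to skip tokens that fall within the entity span, i.e. `ent.start <= token.i < ent.end`. `drop_self_pairs` is a hypothetical helper, not part of spaCy.

```python
def drop_self_pairs(relationships):
    """Drop (entity, word) pairs whose word is one of the entity's own words,
    e.g. ('Sandeep Jain', 'Jain') or ('GeeksforGeeks', 'GeeksforGeeks')."""
    return [(entity, word) for entity, word in relationships
            if word not in entity.split()]


pairs = [
    ("Sandeep Jain", "Jain"),
    ("Sandeep Jain", "teacher"),
    ("GeeksforGeeks", "GeeksforGeeks"),
    ("GeeksforGeeks", "portal"),
]
print(drop_self_pairs(pairs))
# [('Sandeep Jain', 'teacher'), ('GeeksforGeeks', 'portal')]
```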

Conclusion

We can conclude that relationship extraction is an important task in NLP and can be performed with various models. Our approach produces the desired output and touches on both Open Information Extraction and binary extraction techniques. However, it also generates unusual pairs such as `GeeksforGeeks --> GeeksforGeeks`, which shows that further refinement, of the extraction rules or of the model, is needed. Here, we used an existing fine-tuned model, but we could also fine-tune a model ourselves or use another available fine-tuned model as required.

