Simple Plagiarism Detector in Python

Last Updated : 30 Dec, 2022

In this article, we are going to make a simple plagiarism detector in Python.

What is Plagiarism?

Plagiarism is, simply put, a form of cheating. When one person copies another person's work or ideas and presents them under their own name, it is called plagiarism. For example, if someone writing an article on GeeksforGeeks copies content from another site or resource, the article contains plagiarized content.

Difflib Module

Python ships with many built-in (standard-library) modules that make common tasks easier, and difflib is one of them. It provides classes and functions for comparing sequences, such as strings or lists of lines. In this article, we are going to use its SequenceMatcher class.

SequenceMatcher()

SequenceMatcher is a class in the difflib module that compares two sequences, such as two strings. It does not open files. To compare files, we first read their contents into strings and then compare those strings. We use it to measure how much of one text also appears in the other, which gives a rough estimate of the amount of plagiarism.

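Syntax: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

Parameters:

isjunk: Optional. This is either None, meaning no element is treated as junk, or a one-argument function. The function takes an element of the sequences and returns True if that element is "junk" and should be ignored when searching for matches.

a, b: The two sequences to compare, for example two strings. If you pass filenames, the names themselves are compared, not the contents of the files.

autojunk: Optional, defaults to True. It enables a heuristic that treats very frequent elements as junk. This heuristic only applies when the second sequence, b, has 200 or more items.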
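The short sketch below shows why passing filenames does not compare files. It also shows how isjunk is used. The lambda that treats spaces as junk is only an illustration. Any one-argument function that returns True for the elements to ignore would work the same way.

```python
from difflib import SequenceMatcher

# a and b are compared element by element, so two filenames are compared
# as plain strings: 'doc' and '.txt' match (M = 7, T = 16), giving a
# ratio of 2 * 7 / 16 = 0.875 no matter what the files contain
name_ratio = SequenceMatcher(None, 'doc1.txt', 'doc2.txt').ratio()
print(name_ratio)  # 0.875

# isjunk: a one-argument function; here every space is treated as junk,
# so spaces are never used to start a match (they can still be
# absorbed at the edges of a match)
matcher = SequenceMatcher(lambda ch: ch == ' ',
                          'private Thread currentThread;',
                          'private volatile Thread currentThread;')
print(round(matcher.ratio(), 3))  # 0.866
```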

Example 1: Detecting Plagiarism in a String

In this example, we are going to compare two strings with SequenceMatcher to detect plagiarism. First, we store the two strings in separate variables and pass them to SequenceMatcher. Its ratio() method returns the similarity as a float between 0 and 1, computed as 2.0*M/T. Here M is the number of matched characters and T is the total number of characters in both strings. Finally, we multiply this value by 100, convert it to an integer, and display the result as a percentage.

Python3

# Importing SequenceMatcher
# from the difflib module
from difflib import SequenceMatcher

# Declaring string variables
string1 = 'I am geek'
string2 = 'I am geeks'

# Creating a SequenceMatcher that compares the two strings
match = SequenceMatcher(None,
                        string1, string2)

# ratio() returns the similarity as a float between 0 and 1;
# multiplying it by 100 gives a percentage
result = match.ratio() * 100

# Display the final result (int() drops the decimal part)
print(int(result), "%")

Output:

94 %
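We can check the 94 % by hand. ratio() computes 2.0*M/T. Here all of 'I am geek' (9 characters) is matched, and the two strings contain 19 characters in total. The ratio is therefore 18/19, about 0.947. The sketch below recomputes this value from get_matching_blocks(). It also shows that int() truncates the percentage, whereas round() would report 95.

```python
from difflib import SequenceMatcher

string1 = 'I am geek'
string2 = 'I am geeks'
match = SequenceMatcher(None, string1, string2)

# Each block is a Match(a, b, size); the last one is a size-0 sentinel,
# so summing the sizes gives M, the number of matched characters
matched = sum(block.size for block in match.get_matching_blocks())
total = len(string1) + len(string2)  # T

print(matched, total)                          # 9 19
print(2.0 * matched / total == match.ratio())  # True
print(int(match.ratio() * 100))                # 94 (truncated)
print(round(match.ratio() * 100))              # 95 (rounded)
```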

Example 2: Detecting Plagiarized Content of Text Files

In this example, we are going to detect plagiarized content by comparing two text files. We use Python's file handling to read both files and then compare their contents, just as we did in the first example.

Step 1: Create Two Text Files

First, we create two text files whose contents we will check for plagiarized content.

doc1.txt:

Hey, This is GeeksForGeeks Page.

doc2.txt:

Hola, You are on GeeksForGeeks Page.
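If you prefer to create the two files from Python, here is a minimal sketch. It assumes UTF-8 encoding and writes the contents with no trailing newline. Every character counts toward the ratio, so adding a trailing newline would change the final percentage slightly.

```python
from pathlib import Path

# Write the sample contents shown above into the current directory
# (assumption: UTF-8 encoding, no trailing newline)
Path('doc1.txt').write_text('Hey, This is GeeksForGeeks Page.', encoding='utf-8')
Path('doc2.txt').write_text('Hola, You are on GeeksForGeeks Page.', encoding='utf-8')
```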

Step 2: Creating Plagiarism Detection in Python for Text Files

In this step, we open both text files and store their contents in the variables file1 and file2. We then compare them with SequenceMatcher, just as in the first example.

Python3

# Importing SequenceMatcher from the difflib module
from difflib import SequenceMatcher

# Opening both files in one with-statement; the backslash
# continues the statement onto the next line
with open('doc1.txt') as first_file, \
     open('doc2.txt') as second_file:

    # Reading both text files into strings
    file1 = first_file.read()
    file2 = second_file.read()

    # Comparing both texts; ratio() returns a float between 0 and 1
    ab = SequenceMatcher(None, file1,
                         file2).ratio()

    # Converting the ratio into a whole-number percentage
    result = int(ab * 100)
    print(f"{result}% Plagiarized Content")

Output:

70% Plagiarized Content
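For reuse, we can wrap the file comparison in a function. We can also use get_matching_blocks() to list the passages the two texts share, which shows what was copied and not only how much. This is a minimal sketch. The function names similarity_percent and shared_passages, the UTF-8 encoding, and the min_len threshold are assumptions for illustration and are not part of difflib.

```python
from difflib import SequenceMatcher
from pathlib import Path


def similarity_percent(path1, path2, encoding='utf-8'):
    """Hypothetical helper: similarity of two text files as a whole-number percentage."""
    text1 = Path(path1).read_text(encoding=encoding)
    text2 = Path(path2).read_text(encoding=encoding)
    # Same computation as Example 2: ratio() scaled to 0-100 and truncated
    return int(SequenceMatcher(None, text1, text2).ratio() * 100)


def shared_passages(text1, text2, min_len=5):
    """Hypothetical helper: matching passages that are at least min_len characters long."""
    matcher = SequenceMatcher(None, text1, text2)
    # block.a is the start index in text1 and block.size the match length;
    # the final size-0 sentinel block is dropped for any min_len of 1 or more
    return [text1[block.a:block.a + block.size]
            for block in matcher.get_matching_blocks()
            if block.size >= min_len]
```

Take doc1.txt and doc2.txt from Step 1. similarity_percent('doc1.txt', 'doc2.txt') performs the same calculation as the script above. shared_passages called on the two files' contents should return their common ending, ' GeeksForGeeks Page.', plus a trailing newline if both files end with one.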
