Simple Plagiarism Detector in Python

Last Updated : 30 Dec, 2022

In this article, we are going to make a simple plagiarism detector in Python.

What is Plagiarism?

Plagiarism is a form of cheating: presenting another person's work or ideas as your own. For example, if someone writes an article for GeeksforGeeks and copies the content from another site or resource, that content is said to be plagiarized.

Difflib Module

Python ships with many built-in modules that make common tasks easier, and difflib is one of them. It provides classes and functions for comparing sequences such as strings and lists of lines. In this article, we use the SequenceMatcher class from this module.

SequenceMatcher()

SequenceMatcher is a class in the difflib module that compares any two sequences, as long as their elements are hashable. We use it to estimate how much two strings, or the contents of two files, have in common.

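Syntax: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

Parameters:

isjunk: Optional. Either None (no elements are treated as junk) or a one-argument function that returns True for elements to ignore while matching.

a, b: The two sequences to compare, such as strings or lists. Their elements must be hashable. A filename is compared as plain text, so read a file's contents before passing them in.

autojunk: Optional. Enables the automatic junk heuristic (default True).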
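The sketch below illustrates two points about the sequence arguments: they can be lists of words as well as strings, and a filename passed directly is compared character by character rather than opened. The sample sentences and variable names are made up for illustration.

```python
from difflib import SequenceMatcher

# Any sequences of hashable items work, for example lists of words.
words_a = "the quick brown fox".split()
words_b = "the quick red fox".split()
word_ratio = SequenceMatcher(None, words_a, words_b).ratio()
print(word_ratio)  # 3 matching words out of 8 in total -> 0.75

# Filenames are NOT opened: only the characters of the names are compared.
name_ratio = SequenceMatcher(None, 'doc1.txt', 'doc2.txt').ratio()
print(name_ratio)  # 7 matching characters out of 16 in total -> 0.875
```

Comparing word lists instead of raw strings makes the score depend on whole words rather than individual characters, which may or may not suit a given use case.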

Example 1: Detecting Plagiarism in a string.

In this example, we compare two strings using the SequenceMatcher class. We store the two strings in separate variables, pass them to SequenceMatcher(), and call ratio() to get a similarity score between 0 and 1. We then multiply the score by 100 and display it as an integer percentage.

# Importing SequenceMatcher
# from difflib module
from difflib import SequenceMatcher
  
# Declaring string variables
string1 = 'I am geek'
string2 = 'I am geeks'
  
# Using the SequenceMatcher()
match = SequenceMatcher(None,
                        string1, string2)
  
# convert above output into ratio
# and multiplying it with 100
result = match.ratio() * 100
  
# Display the final result
print(int(result), "%")


Output:

94 %
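To see where the 94 comes from, the Python documentation defines ratio() as 2.0*M/T, where T is the total number of elements in both sequences and M is the number of matched elements. The following sketch recomputes that value by hand from get_matching_blocks(); the names matched, total, and manual_ratio are illustrative.

```python
from difflib import SequenceMatcher

string1 = 'I am geek'
string2 = 'I am geeks'
match = SequenceMatcher(None, string1, string2)

# get_matching_blocks() returns (a, b, size) triples; the last one is a
# zero-size sentinel, so summing the sizes gives the number of matches M.
matched = sum(block.size for block in match.get_matching_blocks())

# T is the combined length of both strings.
total = len(string1) + len(string2)

manual_ratio = 2 * matched / total
print(matched, total)           # 9 19
print(int(manual_ratio * 100))  # 94
```

Because int() truncates rather than rounds, 94.7... is displayed as 94.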

Example 2: Detecting Plagiarized Content of Text Files

In this example, we detect plagiarized content by comparing two text files. We use Python's file handling to read both files and then compare their contents, just as we compared the strings in the first example.

Step 1: Create Two Text Files

First, we create two text files whose contents we want to compare.

doc1.txt:

Hey, This is GeeksForGeeks Page.

doc2.txt:

Hola, You are on GeeksForGeeks Page.

Step 2: Creating Plagiarism Detection in Python for Text Files

In this step, we open both text files, store their contents in the variables file1 and file2, and compare them using SequenceMatcher() as in the first example.

# importing SequenceMatcher of difflib module
from difflib import SequenceMatcher
  
# The backslash continues the with statement onto the next line
with open('doc1.txt') as first_file, \
     open('doc2.txt') as second_file:
      
    # Reading Both Text Files
    file1 = first_file.read()
    file2 = second_file.read()
      
    # Comparing Both Text Files
    ab = SequenceMatcher(None, file1,
                         file2).ratio()
      
    # Converting the ratio to an integer percentage
    result = int(ab*100)
    print(f"{result}% Plagiarized Content")


Output:

70% Plagiarized Content
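The file comparison can also be wrapped in reusable functions. This is only a minimal sketch, not part of the original example: the helper names normalize, similarity_percent, and compare_files are hypothetical, the files are assumed to be UTF-8 encoded, and lowercasing plus whitespace collapsing is one possible normalization among many.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and collapse runs of whitespace (an illustrative choice)."""
    return " ".join(text.lower().split())


def similarity_percent(text_a, text_b):
    """Return the SequenceMatcher ratio of the normalized texts as a percentage."""
    ratio = SequenceMatcher(None, normalize(text_a), normalize(text_b)).ratio()
    return round(ratio * 100, 2)


def compare_files(path_a, path_b):
    """Read two text files (assumed to be UTF-8) and compare their contents."""
    with open(path_a, encoding="utf-8") as file_a:
        text_a = file_a.read()
    with open(path_b, encoding="utf-8") as file_b:
        text_b = file_b.read()
    return similarity_percent(text_a, text_b)


# Example call, assuming doc1.txt and doc2.txt exist in the working directory:
# print(compare_files("doc1.txt", "doc2.txt"))
```

Normalizing first means that differences in letter case or spacing do not lower the score, so the result can differ from the raw comparison shown above.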

