Delete pages from a PDF file in Python

In this article, we will learn how to delete pages from a PDF file using the Python programming language.

Introduction

Modifying documents is a common task. Python makes it straightforward, because libraries and modules exist for processing almost any file format. This article shows how to delete pages from a PDF file in Python.

Prerequisite:

This article uses the PyMuPDF library for PDF processing. To install it, run the following command in the command prompt.

pip install pymupdf

Note: The library is imported under the name fitz, using the following statement.

import fitz

Deleting Pages with PyMuPDF

The PyMuPDF library offers various methods that simplify deleting pages from a PDF file. It allows specifying a single page, a range of page numbers, or a list with the page numbers.

Using each method, the following examples demonstrate how to delete pages from PDF files.

Input PDF file used: test.pdf

Method 1: Deleting a single page from a PDF

The delete_page() method deletes a single page. It takes the page number as its argument and removes that page from the PDF. Page numbering is zero-based, so passing 0 deletes the first page. The following example deletes the first page (index 0).

Note: The PDF file and the program should be in the same folder, because the code passes only the file name rather than a full path.

Python3




import fitz
  
# Path of the PDF file
input_file = r"test.pdf"
  
# Path for the output PDF file
output_file = r"modified_test.pdf"
  
# Opening the PDF file and creating a handle for it
file_handle = fitz.open(input_file)
  
# The page no. denoted by the variable will be deleted
page = 0
  
# Passing the variable as an argument
file_handle.delete_page(page)
  
# Saving the file
file_handle.save(output_file)


Output: After running the above code, a new file named ‘modified_test.pdf’ is generated, in which the first page has been deleted.

modified_test.pdf file created
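
Because readers usually think of pages as starting at 1 while delete_page() counts from 0, a small conversion step can prevent off-by-one mistakes. The following is a minimal sketch, not part of the original example: the helper names to_page_index and delete_page_by_number are hypothetical, and the range check is an added assumption about what counts as a valid page number.

```python
def to_page_index(page_number: int, page_count: int) -> int:
    """Convert a 1-based page number into a 0-based page index."""
    if not 1 <= page_number <= page_count:
        raise ValueError(
            f"page {page_number} is outside the range 1..{page_count}"
        )
    return page_number - 1


def delete_page_by_number(input_file: str, output_file: str, page_number: int) -> None:
    """Delete one page, given by its 1-based number, and save a copy."""
    # PyMuPDF is imported here so that to_page_index() works without it
    import fitz

    file_handle = fitz.open(input_file)
    try:
        index = to_page_index(page_number, file_handle.page_count)
        file_handle.delete_page(index)
        file_handle.save(output_file)
    finally:
        # Release the file even if deleting or saving fails
        file_handle.close()
```

With this sketch, calling delete_page_by_number("test.pdf", "modified_test.pdf", 1) would remove the first page, the same result as the example above.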

Method 2: Deleting a range of page numbers from a PDF

The delete_pages() method deletes a range of pages. It takes two arguments: the starting index and the ending index. Both are zero-based, and every page from the starting index through the ending index, inclusive, is deleted. The following example opens the PDF file and deletes the pages at indexes 2 through 7.

Python3




import fitz
  
# Path of the PDF file
input_file = r"test.pdf"
  
# Path for the output PDF file
output_file = r"modified_test.pdf"
  
# Opening the PDF file and creating a handle for it
file_handle = fitz.open(input_file)
  
# The index (page no.) from where the pages are to be deleted
start = 2
  
# The index to which the pages are to be deleted
end = 7
  
# Passing the start & end index as arguments
file_handle.delete_pages(start, end)
  
# Saving the file
file_handle.save(output_file)


Output: After running the above code, we get a modified PDF file in which pages 3, 4, 5, 6, 7, and 8 are deleted.
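
The mapping from the indexes 2 and 7 to the deleted pages 3 through 8 can be checked with plain Python before touching a file. This is an illustrative sketch only: deleted_page_numbers is a hypothetical helper, and it assumes, as the output above indicates, that both indexes are zero-based and inclusive.

```python
from typing import List


def deleted_page_numbers(start: int, end: int) -> List[int]:
    """Return the 1-based page numbers removed when deleting indexes start..end."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # Both ends are inclusive, so the range runs up to end + 1
    return [index + 1 for index in range(start, end + 1)]


print(deleted_page_numbers(2, 7))  # [3, 4, 5, 6, 7, 8]
```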

 

Method 3: Deleting a list of pages from a PDF

The select() method works the other way around: it takes a list of the page numbers to keep and deletes every other page. Page numbers are zero-based here as well. For example, if a PDF contains 10 pages and we pass the list [1, 3, 5] to select(), only those three pages (the second, fourth, and sixth) remain, and the rest are deleted. The following example deletes all pages other than those at indexes 0, 1, and 3.

Python3




import fitz
  
# Path of the PDF file
input_file = r"test.pdf"
  
# Path for the output PDF file
output_file = r"modified_test.pdf"
  
# Opening the PDF file and creating a handle for it
file_handle = fitz.open(input_file)
  
# This list contains the pages that we are willing to keep
# Rest are deleted
pages_list = [0,1,3]
  
# Passing the list to the select function
file_handle.select(pages_list)
  
# Saving the file
file_handle.save(output_file)


Output: The above code produces a modified PDF file in which only pages 1, 2, and 4 of the original remain; the rest are deleted.
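
Since select() expects the pages to keep, a list of pages to delete has to be inverted first. The sketch below shows one way to do that in plain Python. The helper name pages_to_keep is hypothetical, and both the range check and the refusal to delete every page are added assumptions rather than behavior taken from the library.

```python
from typing import Iterable, List


def pages_to_keep(page_count: int, pages_to_delete: Iterable[int]) -> List[int]:
    """Return the 0-based indexes that remain after removing pages_to_delete."""
    unwanted = set(pages_to_delete)
    out_of_range = sorted(p for p in unwanted if not 0 <= p < page_count)
    if out_of_range:
        raise ValueError(f"page indexes out of range: {out_of_range}")
    kept = [index for index in range(page_count) if index not in unwanted]
    if not kept:
        # Assumption: a document with no pages left is treated as an error
        raise ValueError("cannot delete every page")
    return kept


print(pages_to_keep(10, [1, 3, 5]))  # [0, 2, 4, 6, 7, 8, 9]
```

The returned list can then be passed to select() in place of pages_list in the example above, with file_handle.page_count supplying the page count.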

 



Last Updated : 05 Feb, 2023