
How to Extract Data from PDF file in Android?

Last Updated : 15 Jan, 2021

PDF (Portable Document Format) is a file format used to represent documents that contain text, images, tables, and more. The use of PDF has grown rapidly across different fields, and many apps have switched over to using PDF files to represent data. As a result, some apps need to extract the data from a PDF file and display it inside the app. In this article, we will create an application that extracts the text from a PDF file and displays it in our app.

What are we going to build?

In this article, we will build a simple application that extracts the text from a PDF file on a button click and displays the extracted text in a TextView.

Step by Step Implementation 

Step 1: Create a New Project

To create a new project in Android Studio, please refer to How to Create/Start a New Project in Android Studio. Note that you should select Java as the programming language.

Step 2: Add dependency to the build.gradle(Module:app)

Navigate to the Gradle Scripts > build.gradle(Module:app) and add the below dependency in the dependencies section.

implementation 'com.itextpdf:itextg:5.5.10'

After adding the dependency, click the Sync Now option to sync your project. Next, let's add a PDF file to the app.

Step 3: Adding PDF file in your app

Since we are extracting data from a PDF file, we first need to add one to our app. PDF files go in the raw resource folder, so we have to create that folder first. Please refer to Resource Raw Folder in Android Studio to create a raw folder in Android. After creating the raw directory, copy and paste your PDF file into that "raw" folder. The code in this article assumes the file is named amiya_rout.pdf. With the PDF file added, we now move on to the XML part.
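A common stumbling block at this step is the file name: Android only accepts file-based resource names made of lowercase letters a-z, digits 0-9, and underscores, so a file such as "Amiya Rout.pdf" fails the build until it is renamed. The sketch below is a simplified check of that rule in plain Java. The class and method names are hypothetical and not part of any Android API, and the check looks only at the part of the name before the first dot and additionally assumes the name should start with a letter.

```java
import java.util.regex.Pattern;

public class RawResourceNameCheck {

    // Simplified rule (assumption): a lowercase letter first, then
    // lowercase letters, digits, or underscores.
    private static final Pattern VALID_BASE_NAME = Pattern.compile("[a-z][a-z0-9_]*");

    // Returns the part of the file name before the first dot.
    static String baseName(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Hypothetical helper: true if the base name passes the simplified rule.
    static boolean isValidRawResourceName(String fileName) {
        return VALID_BASE_NAME.matcher(baseName(fileName)).matches();
    }

    public static void main(String[] args) {
        // The file name used in this article passes the check.
        System.out.println(isValidRawResourceName("amiya_rout.pdf"));
        // Upper-case letters and spaces are rejected.
        System.out.println(isValidRawResourceName("Amiya Rout.pdf"));
    }
}
```

If your file fails a check like this, renaming it (for example to amiya_rout.pdf) before copying it into the raw folder avoids the build error.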

Step 4: Working with the activity_main.xml file

Go to the activity_main.xml file and add the following code. The layout contains a TextView inside a ScrollView for displaying the extracted text and a Button for starting the extraction.

XML




<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">
  
    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="match_parent">
          
        <!--text view for displaying our extracted text-->
        <TextView
            android:id="@+id/idPDFTV"
            android:layout_width="match_parent"
            android:layout_height="match_parent" />
  
    </ScrollView>
  
    <!--button for starting extraction process-->
    <Button
        android:id="@+id/idBtnExtract"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:layout_centerHorizontal="true"
        android:layout_marginBottom="20dp"
        android:text="Extract Text from PDF"
        android:textAllCaps="false" />
      
</RelativeLayout>


After adding the XML code, we will now move on to the Java part.

Step 5: Working with the MainActivity.java file

Go to the MainActivity.java file and add the following code. Comments are added inside the code to explain each step in more detail.

Java




import android.os.Bundle;
import android.view.View;
import android.widget.Button;
import android.widget.TextView;
  
import androidx.appcompat.app.AppCompatActivity;
  
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
  
public class MainActivity extends AppCompatActivity {
      
    // creating variables for
    // button and text view.
    private Button extractPDFBtn;
    private TextView extractedTV;
  
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
          
        // initializing variables for button and text view.
        extractedTV = findViewById(R.id.idPDFTV);
        extractPDFBtn = findViewById(R.id.idBtnExtract);
          
        // adding on click listener for button
        extractPDFBtn.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                // calling method to extract
                // data from PDF file.
                extractPDF();
            }
        });
    }
      
    private void extractPDF() {
        // declaring the reader outside the try block
        // so that it can be closed in the finally block.
        PdfReader reader = null;
        try {
            // creating a string builder for
            // storing our extracted text.
            StringBuilder extractedText = new StringBuilder();

            // creating a variable for pdf reader
            // and passing our PDF file in it.
            reader = new PdfReader("res/raw/amiya_rout.pdf");

            // below line is for getting number
            // of pages of PDF file.
            int n = reader.getNumberOfPages();

            // page numbers in iText start at 1, so we loop from 1 to n
            // and append the text of each page to our string builder.
            for (int i = 1; i <= n; i++) {
                extractedText.append(PdfTextExtractor.getTextFromPage(reader, i).trim()).append("\n");
            }

            // after extracting all the data we are
            // setting that string value to our text view.
            extractedTV.setText(extractedText.toString());
        } catch (Exception e) {
            // for handling error while extracting the text file.
            extractedTV.setText("Error found is : \n" + e);
        } finally {
            // closing the reader even if the extraction failed.
            if (reader != null) {
                reader.close();
            }
        }
    }
}
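The page loop in extractPDF() is the core of the extraction: page numbers start at 1, and the trimmed text of each page is appended on its own line. The plain-Java sketch below isolates that loop so it can be tried without an Android device or a PDF file. PageSource is a hypothetical interface that stands in for the PdfReader and PdfTextExtractor pair, and fromList is a hypothetical in-memory fake; neither is part of iText.

```java
import java.util.Arrays;
import java.util.List;

public class PageLoopSketch {

    // Hypothetical stand-in for PdfReader + PdfTextExtractor.
    // Pages are numbered from 1, as in the article's code.
    interface PageSource {
        int getNumberOfPages();
        String getTextFromPage(int pageNumber);
    }

    // Joins the trimmed text of every page, one page per line.
    static String extractAll(PageSource source) {
        StringBuilder text = new StringBuilder();
        int n = source.getNumberOfPages();
        for (int page = 1; page <= n; page++) {
            text.append(source.getTextFromPage(page).trim()).append("\n");
        }
        return text.toString();
    }

    // Hypothetical in-memory fake backed by a list of page texts.
    static PageSource fromList(final List<String> pages) {
        return new PageSource() {
            @Override
            public int getNumberOfPages() {
                return pages.size();
            }

            @Override
            public String getTextFromPage(int pageNumber) {
                // Reject page 0 and pages past the end, since pages are 1-based.
                if (pageNumber < 1 || pageNumber > pages.size()) {
                    throw new IllegalArgumentException("Invalid page number: " + pageNumber);
                }
                return pages.get(pageNumber - 1);
            }
        };
    }

    public static void main(String[] args) {
        PageSource fake = fromList(Arrays.asList("  first page ", "second page\n"));
        System.out.print(extractAll(fake));
    }
}
```

Using a StringBuilder here avoids creating a new String on every iteration, which matters more as the number of pages grows.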


After adding this code, run your app to see the output.

Output:

After you run the app, click the Extract Text from PDF button and you will see the text extracted from the PDF file displayed in the TextView.


