This article demonstrates how to extract paragraphs from a Word document using the getParagraphs() method of the XWPFDocument class provided by the Apache POI package. Apache POI is a project developed and maintained by the Apache Software Foundation that provides Java libraries for reading and writing Microsoft Office files.
To extract paragraphs from a Word file, the following Apache POI library (along with the libraries it depends on) must be available on the classpath.
poi-ooxml.jar
Approach
- Formulate the path of the Word document.
- Create a FileInputStream and an XWPFDocument object for the Word document.
- Retrieve the list of paragraphs using the getParagraphs() method.
- Iterate through the list of paragraphs and print the text of each one.
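The first step of the approach can be sketched on its own using only the standard library. The class name DocxPathSketch and the helper resolveInWorkingDir are hypothetical names introduced for illustration; the file name WordFile.docx is the one used later in this article. This is a minimal sketch, assuming the document sits in the directory the JVM was started from.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DocxPathSketch {

    // Hypothetical helper: resolves fileName against the JVM's
    // working directory, which is exposed as the "user.dir" property.
    static Path resolveInWorkingDir(String fileName) {
        return Paths.get(System.getProperty("user.dir"), fileName);
    }

    public static void main(String[] args) {
        Path docx = resolveInWorkingDir("WordFile.docx");
        System.out.println("Looking for: " + docx);

        // Checking up front gives a clearer message than the
        // FileNotFoundException thrown when opening a missing file.
        if (!Files.isRegularFile(docx)) {
            System.out.println("File not found; check the working directory.");
        }
    }
}
```

Paths.get joins the segments with the platform's separator, so it is one possible alternative to concatenating File.separator by hand.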
Implementation
- Step 1: Getting the path of the current working directory where the Word document is located.
- Step 2: Creating a FileInputStream for the above-specified path.
- Step 3: Creating a document object for the Word document.
- Step 4: Using the getParagraphs() method to retrieve the list of paragraphs from the Word file.
- Step 5: Iterating through the list of paragraphs.
- Step 6: Printing the text of each paragraph.
- Step 7: Closing the open resources.
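One common way to handle Step 7 in Java is a try-with-resources statement, which closes every declared resource automatically, in reverse order of declaration, even when an exception is thrown. The sketch below illustrates only that ordering. The Resource class is a hypothetical stand-in for a real stream or document object, and the class name CloseOrderSketch is likewise an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderSketch {

    // Records events so the order of operations can be inspected.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-in for a closeable resource such as an
    // input stream or a document object.
    static class Resource implements AutoCloseable {
        private final String name;

        Resource(String name) {
            this.name = name;
            LOG.add("open " + name);
        }

        @Override
        public void close() {
            LOG.add("close " + name);
        }
    }

    static List<String> run() {
        LOG.clear();
        // Resources are closed in reverse order of declaration:
        // the document first, then the stream it was read from.
        try (Resource stream = new Resource("stream");
             Resource document = new Resource("document")) {
            LOG.add("read paragraphs");
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Any class that implements AutoCloseable can be declared this way, which removes the need for explicit close() calls at the end of the method.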
Sample Input
The content of the Word document is as follows:
Example
Java
// Java program to extract paragraphs from a Word document

// Importing IO package for basic file handling
import java.io.*;
import java.util.List;

// Importing Apache POI package
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Main class to extract paragraphs from a Word document
public class GFG {

    // Main driver method
    public static void main(String[] args) throws Exception
    {
        // Step 1: Getting path of the current working
        // directory where the Word document is located
        String path = System.getProperty("user.dir");
        path = path + File.separator + "WordFile.docx";

        // Step 2: Creating a FileInputStream for the above
        // specified path.
        // Step 3: Creating a document object for the Word
        // document.
        // Both are declared in a try-with-resources statement
        // so that they are closed even if an exception occurs.
        try (FileInputStream fin = new FileInputStream(path);
             XWPFDocument document = new XWPFDocument(fin)) {

            // Step 4: Using the getParagraphs() method to
            // retrieve the list of paragraphs from the Word
            // file.
            List<XWPFParagraph> paragraphs
                = document.getParagraphs();

            // Step 5: Iterating through the list of paragraphs
            for (XWPFParagraph para : paragraphs) {
                // Step 6: Printing the text of each paragraph,
                // followed by a blank line
                System.out.println(para.getText() + "\n");
            }
        }
        // Step 7: Closing the connections. The document and
        // the stream are closed automatically at this point.
    }
}
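For context on what a paragraph is at the file level: a .docx file is a ZIP archive, its body text lives in the entry word/document.xml, each paragraph is a w:p element, and its text is stored in w:t elements. The sketch below reads those elements with the standard library only. It is not how Apache POI is implemented and is not a replacement for getParagraphs(). The class DocxLayoutSketch, the method paragraphTexts, and the in-memory archive built by sampleArchive are all assumptions made for illustration. Unlike a real Word file, the sample archive contains a single entry, and the sketch collects every w:p element it finds, including any nested inside tables.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxLayoutSketch {

    // WordprocessingML namespace bound to the w: prefix.
    static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every w:p element in word/document.xml.
    static List<String> paragraphTexts(InputStream docx) throws Exception {
        List<String> texts = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Copy the entry into memory so that the XML parser
                // cannot close the underlying ZIP stream.
                Document xml = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(zip.readAllBytes()));
                NodeList paragraphs = xml.getElementsByTagNameNS(W, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    Element paragraph = (Element) paragraphs.item(i);
                    NodeList pieces = paragraph.getElementsByTagNameNS(W, "t");
                    StringBuilder text = new StringBuilder();
                    for (int j = 0; j < pieces.getLength(); j++) {
                        text.append(pieces.item(j).getTextContent());
                    }
                    texts.add(text.toString());
                }
            }
        }
        return texts;
    }

    // Builds a tiny in-memory archive holding only word/document.xml.
    // It is NOT a valid Word file; it only mimics the body part.
    static byte[] sampleArchive() throws Exception {
        String xml = "<w:document xmlns:w=\"" + W + "\"><w:body>"
            + "<w:p><w:r><w:t>First </w:t></w:r><w:r><w:t>paragraph</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(sampleArchive());
        for (String text : paragraphTexts(in)) {
            System.out.println(text);
        }
    }
}
```

Real documents also carry styles, numbering, tables, headers, and relationships between parts, which is why a library such as Apache POI is the practical choice for anything beyond inspecting the raw layout.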