Java Program to Extract Paragraphs From a Word Document
This article demonstrates how to extract paragraphs from a Word document using the getParagraphs() method of the XWPFDocument class provided by the Apache POI package. Apache POI is a project developed and maintained by the Apache Software Foundation that provides Java libraries for reading and writing Microsoft Office files.
To extract paragraphs from a Word file, the project must include the Apache POI library, which provides the XWPFDocument class for .docx files.
The approach is as follows:
- Build the path to the Word document.
- Create a FileInputStream for the file and an XWPFDocument object from that stream.
- Retrieve the list of paragraphs using the getParagraphs() method.
- Iterate through the list of paragraphs and print the text of each one.
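As a minimal sketch of the core of this approach, the helper below opens a document from a stream, calls getParagraphs(), and collects the text of each paragraph. The class name ParagraphReader and the method readParagraphs are hypothetical names chosen for illustration; only XWPFDocument, XWPFParagraph, getParagraphs() and getText() come from Apache POI. It assumes the poi-ooxml library is on the classpath and that the input is a .docx file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical helper class for illustration; not part of Apache POI.
class ParagraphReader {

    // Reads a .docx document from the stream and returns the text of each paragraph.
    static List<String> readParagraphs(InputStream in) throws IOException {
        List<String> texts = new ArrayList<>();
        // try-with-resources closes the document even if reading fails.
        try (XWPFDocument document = new XWPFDocument(in)) {
            for (XWPFParagraph paragraph : document.getParagraphs()) {
                texts.add(paragraph.getText());
            }
        }
        return texts;
    }
}
```

Returning the text as a list instead of printing it directly keeps the extraction separate from the output, so the same helper can feed a console, a log, or further processing.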
The implementation proceeds in these steps:
- Step 1: Get the path of the current working directory, where the Word document is located.
- Step 2: Create a File object for the document at that path.
- Step 3: Create an XWPFDocument object for the Word document.
- Step 4: Use the getParagraphs() method to retrieve the list of paragraphs from the document.
- Step 5: Iterate through the list of paragraphs.
- Step 6: Print the text of each paragraph.
- Step 7: Close the document and the input stream.
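The steps above can be sketched as a small program. This is an illustrative version under stated assumptions, not necessarily the original listing: the file name sample.docx is a placeholder, the class name ExtractParagraphs and the method printParagraphs are hypothetical, and the poi-ooxml library is assumed to be on the classpath. Step 7 is handled by try-with-resources, which closes the document and the stream automatically.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical class name for illustration.
class ExtractParagraphs {

    // Prints every paragraph of the given .docx file and returns the paragraph count.
    static int printParagraphs(File file, PrintStream out) throws IOException {
        // Step 3: open the file and create the document object.
        // Step 7: both resources are closed automatically when the block ends.
        try (FileInputStream fis = new FileInputStream(file);
             XWPFDocument document = new XWPFDocument(fis)) {

            // Step 4: retrieve the list of paragraphs.
            List<XWPFParagraph> paragraphs = document.getParagraphs();

            // Steps 5 and 6: iterate through the list and print each paragraph's text.
            for (XWPFParagraph paragraph : paragraphs) {
                out.println(paragraph.getText());
            }
            return paragraphs.size();
        }
    }

    public static void main(String[] args) throws IOException {
        // Step 1: path of the current working directory.
        String directory = System.getProperty("user.dir");

        // Step 2: File object for the document; "sample.docx" is a placeholder name.
        File file = new File(directory, "sample.docx");

        printParagraphs(file, System.out);
    }
}
```

Note that XWPFDocument reads the .docx (Office Open XML) format only; older binary .doc files need a different Apache POI class.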
The content of the Word document is as follows: