Java Program to Extract Content from an ODF File
ODF stands for Open Document Format. It is an international family of standards that succeeds commonly used legacy, vendor-specific document formats such as .doc, .wpd and .xls. Because an ODF file is a compressed package of XML files, ODF documents are often smaller than equivalent files in other formats. The OpenDocumentParser class from the Apache Tika library is used to extract the content from the ODF file.
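An ODF file is a ZIP package whose parts include a mimetype entry (for example application/vnd.oasis.opendocument.text), content.xml for the body and meta.xml for document properties. As a rough, Tika-free illustration, the sketch below uses only the JDK's java.util.zip API to list a package's entries and read its mimetype entry. The class name OdfPackageInspector and the default path sample.odt are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class OdfPackageInspector {

    // Returns the names of all entries in the ODF package, in archive order.
    public static List<String> listEntries(InputStream odf) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Returns the text of one package entry (e.g. "mimetype"), or null if it is absent.
    public static String readEntry(InputStream odf, String name) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass your own .odt/.ods/.odp file as the first argument.
        String path = args.length > 0 ? args[0] : "sample.odt";
        try (InputStream in = new FileInputStream(path)) {
            System.out.println("Entries: " + listEntries(in));
        }
        try (InputStream in = new FileInputStream(path)) {
            System.out.println("mimetype: " + readEntry(in, "mimetype"));
        }
    }
}
```

Each method opens its own ZipInputStream because a ZIP stream can only be read forward once.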
- BodyContentHandler(): It creates a content handler that writes XHTML body character events to an internal string buffer.
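- Metadata(): It constructs a new, empty metadata object.
- ParseContext(): It creates a parse context object that is used to pass context information to Tika parsers.
- parse(): Called on an instantiated parser object, it parses the document from the input stream, sending the body text to the content handler and storing the document properties in the metadata object.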
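Putting these calls together, a minimal sketch of the program might look like the following. It assumes the Apache Tika libraries are on the classpath. The class name ExtractOdfContent, the helper method extract() and the default file name sample.odt are hypothetical. The " : = " print format only mimics the sample output, and the order of the metadata keys may differ between Tika versions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.odf.OpenDocumentParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ExtractOdfContent {

    // Parses an ODF stream, fills the given Metadata object and returns the body text.
    public static String extract(InputStream in, Metadata metadata)
            throws IOException, SAXException, TikaException {
        // -1 disables BodyContentHandler's default 100,000-character write limit.
        BodyContentHandler handler = new BodyContentHandler(-1);
        ParseContext context = new ParseContext();
        OpenDocumentParser parser = new OpenDocumentParser();
        parser.parse(in, handler, metadata, context);
        return handler.toString();
    }

    public static void main(String[] args) throws IOException, SAXException, TikaException {
        // Hypothetical default path; pass the path of your own ODF file instead.
        String path = args.length > 0 ? args[0] : "sample.odt";
        Metadata metadata = new Metadata();
        String content;
        try (InputStream in = new FileInputStream(path)) {
            content = extract(in, metadata);
        }

        System.out.println("Content in the document :" + content);
        System.out.println("Metadata of the document:");
        // names() returns every metadata key the parser populated.
        for (String name : metadata.names()) {
            System.out.println(name + " : = " + metadata.get(name));
        }
    }
}
```

Separating extract() from main() keeps the parsing logic reusable for streams that do not come from a file.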
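The Java code requires the Apache Tika libraries on the classpath: tika-core, which provides BodyContentHandler, Metadata and ParseContext, and the Tika parsers package that contains OpenDocumentParser (the tika-parsers artifact in Tika 1.x).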
Output:
Content in the document :Geekforgeeks has a great content on DSA.
Metadata of the document:
date : = 2020-11-21T05:38:00Z
meta:paragraph-count : = 1
meta:word-count : = 6
meta:initial-author : = Mohan Sai
initial-creator : = Mohan Sai
dc:creator : = Mohan Sai
generator : = MicrosoftOffice/15.0 MicrosoftWord
Word-Count : = 6
dcterms:created : = 2020-11-21T05:36:00Z
dcterms:modified : = 2020-11-21T05:38:00Z
Last-Modified : = 2020-11-21T05:38:00Z
nbPara : = 1
Last-Save-Date : = 2020-11-21T05:38:00Z
meta:character-count : = 40
Paragraph-Count : = 1
meta:save-date : = 2020-11-21T05:38:00Z
modified : = 2020-11-21T05:38:00Z
Edit-Time : = PT0S
nbCharacter : = 40
nbPage : = 1
nbWord : = 6
Content-Type : = application/vnd.oasis.opendocument.text
creator : = Mohan Sai
meta:author : = Mohan Sai
meta:creation-date : = 2020-11-21T05:36:00Z
Creation-Date : = 2020-11-21T05:36:00Z
xmpTPg:NPages : = 1
Character Count : = 40
editing-cycles : = 3
Page-Count : = 1
Author : = Mohan Sai
meta:page-count : = 1
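The page, word and character counts in this output correspond to attributes of the meta:document-statistic element in the package's meta.xml part. The following hedged sketch swaps in plain JDK ZIP and DOM parsing for Tika and reads those attributes directly. OdfStatistics, readStatistics and sample.odt are hypothetical names. The code assumes the standard ODF meta namespace URI.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class OdfStatistics {

    // Namespace URI of the ODF "meta" vocabulary (meta:word-count, meta:page-count, ...).
    static final String META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0";

    // Returns the attributes of <meta:document-statistic> in meta.xml, keyed by qualified name.
    // The map is empty if the package has no meta.xml or the element is missing.
    public static Map<String, String> readStatistics(InputStream odf)
            throws IOException, ParserConfigurationException, SAXException {
        Map<String, String> stats = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("meta.xml")) {
                    continue;
                }
                // Buffer the entry first: the DOM parser may close the stream it reads from.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                NodeList nodes = doc.getElementsByTagNameNS(META_NS, "document-statistic");
                if (nodes.getLength() > 0) {
                    NamedNodeMap attrs = ((Element) nodes.item(0)).getAttributes();
                    for (int i = 0; i < attrs.getLength(); i++) {
                        Node attr = attrs.item(i);
                        stats.put(attr.getNodeName(), attr.getNodeValue());
                    }
                }
                break; // a package has a single meta.xml
            }
        }
        return stats;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default path; pass your own ODF file as the first argument.
        String path = args.length > 0 ? args[0] : "sample.odt";
        try (InputStream in = new FileInputStream(path)) {
            readStatistics(in).forEach((name, value) -> System.out.println(name + " : = " + value));
        }
    }
}
```

The printed order of the attributes depends on the DOM implementation, so it may not match the order Tika uses.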