
How can i convert a PDF file to XML file

Amit Yadav
Greenhorn

Joined: Aug 09, 2007
Posts: 8
I want to convert a PDF file into an XML file. The PDF may contain any kind of content, such as tables and text. Can anyone give me source code or any other information on how to do this?
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8713
    

By design, PDF is not an easy format to manipulate. It is intended to be a finished product rather than an editable format (like RTF, DOC, HTML and so on). Our AccessingFileFormats FAQ lists the options available for working with it.


"blabbing like a narcissistic fool with a superiority complex" ~ N.A.
Peter Chase
Ranch Hand

Joined: Oct 30, 2001
Posts: 1970
A PDF is a description of how to render a document on a page: instructions like "draw a vertical line here" or "write 'foo bar baz' here in Courier". It does not contain any information about the format or organisation of the stuff it is rendering. You won't be able to tell that you're looking at a table, or a list of bullet points, or a paragraph, or anything like that.

The PDF format does contain information on a page-by-page basis. Therefore, page breaks are the one piece of format/organisation information that you can find.

If you want anything more than a raw stream of completely unformatted, disorganised text, one stream per page, you are out of luck. It's virtually impossible.
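
For what it's worth, here is a minimal sketch of the "one lump of text per page" approach described above, using only the JDK. It assumes you have already pulled the raw text of each page out of the PDF with some library (Apache PDFBox's PDFTextStripper is one option, but that step is not shown here). The class name PagesToXml, the method toXml, and the element names "document" and "page" are made up for illustration; nothing in the PDF dictates them.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: wraps already-extracted page text in a simple XML document.
final class PagesToXml {

    // pageTexts holds one string per PDF page, in page order (assumed input).
    static String toXml(List<String> pageTexts) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("document");
            doc.appendChild(root);

            for (int i = 0; i < pageTexts.size(); i++) {
                Element page = doc.createElement("page");
                // Page breaks are the only structure we can rely on, so record the page number.
                page.setAttribute("number", String.valueOf(i + 1));
                // The DOM serializer escapes characters such as & and < in the text.
                page.setTextContent(pageTexts.get(i));
                root.appendChild(page);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }
}
```

Calling PagesToXml.toXml with a list of two page strings gives you a document element containing two page elements. Anything beyond that (tables, headings, paragraphs) would have to be guessed from the text and its position on the page, which is where the real difficulty lies.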


Betty Rubble? Well, I would go with Betty... but I'd be thinking of Wilma.
 
 