This week's book giveaway is in the Clojure forum.
We're giving away four copies of Clojure in Action and have Amit Rathore and Francis Avila on-line!
See this thread for details.
Win a copy of Clojure in Action this week in the Clojure forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

How can i convert a PDF file to XML file

 
Amit Yadav
Greenhorn
Posts: 8
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I want to convert a pdf file in a xml file. This pdf file may contain any format like table, text etc. Can anyone give me sorce or any other information regarding this.
 
Joe Ess
Bartender
Posts: 9214
9
Linux Mac OS X Windows
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
PDF is not an easy-to-manipulate format by design. It is intended to be a finished product rather than an editable format (like RTF, DOC, HTML and so on). Our AccessingFileFormats FAQ has what options are available to interact with it.
 
Peter Chase
Ranch Hand
Posts: 1970
  • 0
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
A PDF is a description of how to render a document on a page. Things like "draw a vertical line here", "write 'foo bar baz' here in Courier". It does not contain any information about the format or organisation of the stuff it is rendering. You won't be able to tell that you're looking at a table, or a list of bullet points, or a paragraph, or anything like that.

The PDF format does contain information on a page-by-page basis. Therefore, page breaks are the one piece of format/organisation information that you can find.

If you want anything more than a raw stream of completely unformatted, disorganised text, one per page, you are out of luck. It's virtually impossible.
 
I agree. Here's the link: http://aspose.com/file-tools
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic