Win a copy of Rust Web Development this week in the Other Languages forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Tim Cooke
  • Campbell Ritchie
  • Ron McLeod
  • Liutauras Vilda
  • Jeanne Boyarsky
Sheriffs:
  • Junilu Lacar
  • Rob Spoor
  • Paul Clapham
Saloon Keepers:
  • Tim Holloway
  • Tim Moores
  • Jesse Silverman
  • Stephan van Hulst
  • Carey Brown
Bartenders:
  • Al Hobbs
  • Piet Souris
  • Frits Walraven

Document Conversion

 
Ranch Hand
Posts: 152
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,

I have a requirement where i have to convert a document(doc,pdf) into an xml file. Are there any api's available that can help me to achieve this?

Thanks
 
Ranch Hand
Posts: 106
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
not that much clear about your requirement, is there is xml file in Doc or PDF format and you want to create .xml file from that ??

Apache POI - HWPF - Java API to Handle Microsoft Word Files can help you with reading word files and then you can use XML generation with JAVA to create XML files.
 
Sahil Sharma
Ranch Hand
Posts: 152
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
the document will contain normal contents or images but in a particular format. e.g. [Heading, sub-heading, paragraphs etc]
 
Rancher
Posts: 43027
76
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
PDFs are tough; the best you will be able to do is to extract any text they contain, but structural information will be lost (unless you're prepared to invest a lot of time).
 
Don't get me started about those stupid light bulbs.
reply
    Bookmark Topic Watch Topic
  • New Topic