
Parsing Word document

 
Greenhorn
Posts: 20
Hi, I need to parse a Microsoft Word document and have run into a few problems. One of them is that I need to extract the boldfaced words. Is there a StringTokenizer or Reader that recognizes boldfaced text?
 
Sheriff
Posts: 11343
I've never used the following site, but I understand it contains useful file format information for programmers:

http://www.wotsit.org/

(Taking apart a Word doc is no small feat.)
[ January 03, 2005: Message edited by: marc weber ]
 
Author and all-around good cowpoke
Posts: 13078
If you can get the Word document saved in RTF or HTML format instead of .doc, you will probably find it easier to parse, since the .doc format is notoriously hard to work with.
The free OpenOffice product has been able to read all of my .doc files, and it can save in XML or other formats that are well documented and may be easier to parse.
The Apache POI toolkit offers some support for reading .doc files, but I think their best kit is for Excel files.
Bill
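
To illustrate the "save as HTML first" suggestion above, here is a minimal sketch that pulls bold text out of an HTML export using only the HTML parser that ships with the JDK. The class name `BoldWords` and the method `boldRuns` are made up for this example, and it assumes the export marks bold text with `<b>` or `<strong>` tags. Word's HTML output may instead use inline styles such as `font-weight: bold`, which this sketch does not detect.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration; not part of any library.
class BoldWords {

    // Returns each run of text found inside <b> or <strong> tags.
    // Assumption: bold is expressed with these tags, not with CSS.
    static List<String> boldRuns(String html) {
        List<String> runs = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Tracks how many bold tags are currently open, so nesting works.
            private int depth = 0;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (isBoldTag(tag)) {
                    depth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (isBoldTag(tag) && depth > 0) {
                    depth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    String text = new String(data).trim();
                    if (!text.isEmpty()) {
                        runs.add(text);
                    }
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return runs;
    }

    private static boolean isBoldTag(HTML.Tag tag) {
        return tag == HTML.Tag.B || tag == HTML.Tag.STRONG;
    }

    public static void main(String[] args) {
        String sample = "<html><body><p>Plain <b>bold words</b> and "
                + "<strong>more</strong> text.</p></body></html>";
        for (String run : boldRuns(sample)) {
            System.out.println(run);
        }
    }
}
```

Each bold run comes back as one string, so splitting it into individual words (for example with `String.split("\\s+")`) is a separate step. If the document is saved as RTF instead, the JDK's `javax.swing.text.rtf.RTFEditorKit` may be worth a look, though its RTF support is fairly basic.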
 