Win a copy of Learn Spring Security (video course) this week in the Spring forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Java Html parser

 
Zeena Shah
Greenhorn
Posts: 5
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi frendz...
I want to make a html parse that will take a .dco fille as input and parse it..
plz help me if someone knows abt it...
 
Jessica Sant
Sheriff
Posts: 4313
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
you want to take a .doc file and parse it? as in Microsoft Word?

Check out the Jakarta POI project. -- It has an API to manipulate <icrosoft documents with Java.
 
Zeena Shah
Greenhorn
Posts: 5
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
thanx 4 ur reply...by .doc i mean any MS word document...infact i want to make a programme that will read in a word file and pull up all the keywords that a user can use for searching that document...like wat is done in google search engine...i made a search engine but that will be too tiring process to manually feed aal the related keywords in the database so that document is availabe when searched.

hope u will understand wat i want...
byz..
 
Jessica Sant
Sheriff
Posts: 4313
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Did you look at the Jakarta POI project?? it should allow you to parse through the Word documents.
 
jetti madhu
Greenhorn
Posts: 7
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi to all,
I tries to read an .doc file to open on browser but i unable to get the Tables and Images from .doc file..
IS anyone know how to convert an MS-office word (.doc and .docx) files to convert to Html using POI jar?
Please reply ............
 
Ulf Dittmer
Rancher
Pie
Posts: 42967
73
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
POI has no facilities for creating HTML. You could look into the JODConverter library - it uses OpenOffice under the hood to convert between many of the formats OO supports.
 
Campbell Ritchie
Sheriff
Posts: 48363
56
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
And welcome to JavaRanch , jetti madhu
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15203
36
Android IntelliJ IDE Java Scala Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Welcome to JavaRanch, Jetti.

Please note that you've added your question to a very old topic from 2004 - it would have been better if you just started your own new topic, especially since your question isn't the same as the original one.
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic