Java Html parser

Zeena Shah

Joined: Jul 31, 2004
Posts: 5
Hi friends,
I want to make an HTML parser that will take a .doc file as input and parse it.
Please help me if someone knows about it.
Jessica Sant

Joined: Oct 17, 2001
Posts: 4313

You want to take a .doc file and parse it? As in Microsoft Word?

Check out the Jakarta POI project. It has an API to manipulate Microsoft documents with Java.
Zeena Shah

Joined: Jul 31, 2004
Posts: 5
Thanks for your reply. By .doc I mean any MS Word document. In fact, I want to make a program that will read in a Word file and pull out all the keywords that a user can use for searching, the way it is done in the Google search engine. I made a search engine, but it would be too tiring a process to manually feed all the related keywords into the database so that the document is available when searched.

Hope you will understand what I want.
Jessica Sant

Joined: Oct 17, 2001
Posts: 4313

Did you look at the Jakarta POI project? It should allow you to parse through the Word documents.
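
Once the text has been pulled out of the Word file, the keyword step described above could be sketched roughly like this in plain Java. This is only a minimal illustration, not anything POI provides: the class name KeywordExtractor, the topKeywords method, and the tiny stop-word list are all made up for the example, and it assumes the document text is already available as a String. It simply counts word frequencies and returns the most frequent words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: frequency-based keyword picking over already-extracted text.
public class KeywordExtractor {

    // Assumed, deliberately tiny stop-word list; a real one would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "is", "in", "of", "to", "that", "it", "for", "on", "with"));

    /** Returns up to 'limit' words, most frequent first; ties are ordered alphabetically. */
    public static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // Lower-case the text and split on anything that is not an ASCII letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            // Skip empty or very short tokens and stop words.
            if (token.length() < 3 || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (result.size() == limit) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "Java parser reads the Java document and the parser extracts Java keywords";
        System.out.println(topKeywords(sample, 3)); // prints [java, parser, document]
    }
}
```

The returned words could then be stored in the search database next to the document, instead of typing the keywords in by hand. Plain frequency counting is a crude choice; it is used here only because it is short.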
jetti madhu

Joined: Feb 22, 2010
Posts: 7
Hi all,
I tried to read a .doc file and open it in a browser, but I am unable to get the tables and images from the .doc file.
Does anyone know how to convert MS Office Word (.doc and .docx) files to HTML using the POI jar?
Please reply.
Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
POI has no facilities for creating HTML. You could look into the JODConverter library - it uses OpenOffice under the hood to convert between many of the formats OO supports.
Campbell Ritchie

Joined: Oct 13, 2005
Posts: 46405
And welcome to JavaRanch, jetti madhu
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 15093

Welcome to JavaRanch, Jetti.

Please note that you've added your question to a very old topic from 2004 - it would have been better if you just started your own new topic, especially since your question isn't the same as the original one.
