Java Html parser

Zeena Shah
Greenhorn

Joined: Jul 31, 2004
Posts: 5
Hi friends,
I want to make an HTML parser that will take a .doc file as input and parse it.
Please help me if someone knows about it.
Jessica Sant
Sheriff

Joined: Oct 17, 2001
Posts: 4313

You want to take a .doc file and parse it? As in Microsoft Word?

Check out the Jakarta POI project. It has an API to manipulate Microsoft documents with Java.
Zeena Shah
Greenhorn

Joined: Jul 31, 2004
Posts: 5
Thanks for your reply. By .doc I mean any MS Word document. In fact, I want to make a program that will read in a Word file and pull out all the keywords that a user can use for searching that document, like what is done in the Google search engine. I made a search engine, but it would be too tiring a process to manually feed all the related keywords into the database so that the document is available when searched.

I hope you will understand what I want.
Bye.
Jessica Sant
Sheriff

Joined: Oct 17, 2001
Posts: 4313

Did you look at the Jakarta POI project? It should allow you to parse through the Word documents.
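
Once the text has been pulled out of the Word file (with POI or any other tool), the keyword step asked about above could be sketched roughly like this. It is only a minimal illustration, not a full search-engine indexer: the class name KeywordExtractor, the method topKeywords, the tiny stop-word list, and the three-character minimum are all assumptions made for the example, and the input is assumed to be plain text that has already been extracted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: ranks the words of already-extracted text by frequency.
class KeywordExtractor {

    // Tiny illustrative stop-word list (an assumption); a real index would use a fuller one.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
            "is", "it", "of", "on", "or", "that", "the", "this", "to", "with"));

    // Returns up to 'limit' keywords, most frequent first; ties are broken alphabetically.
    static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();

        // Lower-case the text and split on anything that is not a letter or digit.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // Skip very short tokens (including empty ones) and stop words.
            if (word.length() < 3 || STOP_WORDS.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }

        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Integer.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

The list returned by KeywordExtractor.topKeywords(text, 10) could then be stored in the search database alongside the document instead of typing the keywords in by hand.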
jetti madhu
Greenhorn

Joined: Feb 22, 2010
Posts: 7
Hi to all,
I tried to read a .doc file in order to open it in a browser, but I am unable to get the tables and images from the .doc file.
Does anyone know how to convert MS Office Word files (.doc and .docx) to HTML using the POI jar?
Please reply.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39570
POI has no facilities for creating HTML. You could look into the JODConverter library - it uses OpenOffice under the hood to convert between many of the formats OO supports.


Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36563
And welcome to JavaRanch, jetti madhu.
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 13883

Welcome to JavaRanch, Jetti.

Please note that you've added your question to a very old topic from 2004 - it would have been better if you just started your own new topic, especially since your question isn't the same as the original one.


 