
Java HTML parser

 
Greenhorn
Posts: 5
Hi friends,
I want to make an HTML parser that will take a .dco file as input and parse it.
Please help me if you know how to do this.
 
Bartender
Posts: 10336
You need to look at java.util.regex which contains the SDK's regular expression classes. At least I think you do - but then I've no idea what a .dco file is so you might be asking something completely different.
 
Zeena Shah
Greenhorn
Posts: 5
Thanks for your reply. By .doc I mean any MS Word document. In fact, I want to write a program that reads a Word file and pulls out all the keywords a user could use to search for that document, like what the Google search engine does. I have built a search engine, but manually entering all the related keywords into the database, so that a document is available when searched, would be too tiring.

I hope you understand what I want.
Bye.

Sorry for the typing mistake; it's .doc.
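
For anyone landing on this thread later: once the text has been pulled out of the Word file (by POI or any other means), the keyword step itself can be quite small. Below is a minimal sketch using only java.util.regex and a frequency count. The class name `KeywordExtractor`, the method `topKeywords`, the tiny stop-word list and the three-character minimum are all assumptions made up for illustration; a real search engine would likely want a proper stop-word list and stemming.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: ranks the words of an already-extracted text by frequency.
class KeywordExtractor {

    // Assumed, deliberately tiny stop-word list; extend it for real use.
    private static final Set<String> STOP_WORDS =
            Set.of("the", "and", "for", "that", "with", "from", "this", "are", "was");

    // A "word" here is any run of letters or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{Nd}]+");

    // Returns up to 'limit' keywords, most frequent first (ties broken alphabetically).
    static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            String word = m.group();
            // Skip very short words and stop words.
            if (word.length() < 3 || STOP_WORDS.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> keywords = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            keywords.add(entries.get(i).getKey());
        }
        return keywords;
    }
}
```

The returned list could then be stored in the search database next to the document, instead of typing the keywords in by hand.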
 
Ranch Hand
Posts: 91
Did you try POI?
Actually, I am working on the same kind of project these days,
so I am also looking for an efficient parser.
 
Bartender
Posts: 9626
POI HWPF is in a very alpha state and is not under active development, but it will let you read the contents of a Word doc (get the latest version out of SVN). If you are doing anything more complex (editing, converting), I recommend OpenOffice.
 
Ranch Hand
Posts: 88
For a list of HTML parsers, you can try the following link:

http://www.java-tips.org/java-libraries/html-parser/

And if you want to see some examples of using the regex package, you can visit the following URL:

http://www.java-tips.org/java-se-tips/java.util.regex/

One related example available there is:

How to find and display hyperlinks contained within a web page
http://www.java-tips.org/java-se-tips/java.util.regex/how-to-find-and-display-hyperlinks-contained-within-a-web-page.html
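
To give a rough idea of what such a regex-based example looks like, here is a minimal sketch that pulls the href values out of anchor tags in an HTML string. It is not the code from the linked page; the class name `LinkFinder` and the method `findLinks` are made up for illustration, and the pattern only handles quoted href attributes. Regular expressions are fine for quick jobs like this, but for messy real-world pages one of the HTML parsers listed above is probably the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds hyperlink targets in an HTML string with java.util.regex.
class LinkFinder {

    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    // Group 1 is the quote character, group 2 is the link target.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href values in the order they appear in the page.
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">A</a> and "
                + "<A class='x' HREF='b.html'>B</A></p>";
        for (String link : findLinks(html)) {
            System.out.println(link);
        }
    }
}
```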
 