| Author |
Extracting content
|
Raghav Mathur
Ranch Hand
Joined: Jan 12, 2001
Posts: 639
|
|
Hi, I don't know whether this topic suits this section, so please forgive me if it doesn't. I want to know whether there is a way to extract the contents of a web page that is loaded from a server. Thanks in advance. Regards, Raghav Mathur
|
Raghav.
|
 |
Ruud Steeghs
Ranch Hand
Joined: Jul 09, 2001
Posts: 56
|
|
Hi, Don't know exactly what you are looking for, but perhaps you can find an answer at this page: http://www.junit.org/news/extension/index.htm Take a good look at HttpUnit, hopefully that's just what you're looking for. Have fun, -Ruud.
|
 |
Raghav Mathur
Ranch Hand
Joined: Jan 12, 2001
Posts: 639
|
|
I am looking for a crawler application that would extract the content of a web page and also search it for a specific keyword. I hope I have explained what I am looking for. Regards, Raghav Mathur
|
 |
Peter den Haan
author
Ranch Hand
Joined: Apr 20, 2000
Posts: 3252
|
|
Frankly, it sounds like something you could do by grabbing the web page using java.net.URL and then doing a primitive parse/search of the content using JDK 1.4 regular expression support (java.util.regex.*) or perhaps Jakarta ORO. If that doesn't do it, I recently spotted a library that will parse an HTML document and expose it using DOM. Drop me a note and I'll see if I can find it again. - Peter
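For illustration, the first option (java.net.URL plus java.util.regex) could look roughly like the sketch below. It is only a minimal example, not a finished crawler: the class name PageSearch and its helper methods are made up for this post, it assumes the page is UTF-8 text, it uses try-with-resources and so needs a newer JDK than 1.4, and the tag stripping is a crude regex rather than a real HTML parse.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for this thread; not part of any library.
class PageSearch {

    // Download the raw HTML of a page as one string (assumes UTF-8 content).
    public static String fetch(String address) throws IOException {
        StringBuilder html = new StringBuilder();
        try (InputStream in = new URL(address).openStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Primitive "parse": replace anything that looks like a tag with a space.
    // This does not handle scripts, comments, or entities properly.
    public static String stripTags(String html) {
        return html.replaceAll("(?s)<[^>]*>", " ");
    }

    // Count case-insensitive occurrences of a literal keyword in the text.
    public static int countMatches(String text, String keyword) {
        Pattern pattern = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java PageSearch <url> <keyword>");
            return;
        }
        String text = stripTags(fetch(args[0]));
        System.out.println(countMatches(text, args[1])
                + " occurrence(s) of \"" + args[1] + "\"");
    }
}
```

Run it with a URL and a keyword as the two arguments. Pattern.quote keeps the keyword literal, so characters such as "." or "+" in the search term are not treated as regex syntax.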
|
 |
Raghav Mathur
Ranch Hand
Joined: Jan 12, 2001
Posts: 639
|
|
Please give me the URL of the library you spotted. I'll try option 1 first. Also, could you suggest a good tutorial for getting started with Java networking? I tried to learn it from "Complete Reference" but just can't grasp the concepts. Regards, Raghav Mathur
|
 |
Peter den Haan
author
Ranch Hand
Joined: Apr 20, 2000
Posts: 3252
|
|
Gotcha! The library I was referring to was NekoHTML. Unfortunately I don't have the time to go into any detail right now, maybe later. - Peter
|
 |