JavaRanch » Java Forums » Java » Sockets and Internet Protocols

Extracting content

Raghav Mathur
Ranch Hand

Joined: Jan 12, 2001
Posts: 641
Hi,
I don't know whether this topic belongs in this section, so please forgive me if it doesn't.
I want to know whether there is a way to extract the contents of a web page that is loaded from a server.
Thanks in advance.
Regards,
Raghav Mathur


Raghav.
Ruud Steeghs
Ranch Hand

Joined: Jul 09, 2001
Posts: 56
Hi,
Don't know exactly what you are looking for, but perhaps you can find an answer at this page:
http://www.junit.org/news/extension/index.htm
Take a good look at HttpUnit; hopefully that's just what you're looking for.
Have fun,
-Ruud.
Raghav Mathur
Ranch Hand

Joined: Jan 12, 2001
Posts: 641
I'm looking for a crawler application that would extract the content of a web page and also search it for a specific keyword.
I hope I've explained what I'm looking for.
Regards,
Raghav Mathur
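The crawler described above can be sketched as a breadth-first loop: keep a queue of pages still to visit and a set of pages already seen, check each page for the keyword, then queue every link it contains. This is only a minimal sketch under several assumptions: links are found with a rough regex (quoted `href` values only), the keyword test runs on the raw HTML including tags, and the download step is passed in as a `Function<URI, String>` so the loop can be tried without a network. `MiniCrawler` and `crawl` are hypothetical names, not part of any library.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniCrawler {

    // Rough href extractor: quoted attribute values, fragment part dropped.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Breadth-first crawl from start, visiting at most maxPages pages.
    // fetcher returns a page's HTML, or null if it could not be fetched.
    // Returns the visited pages whose HTML contains keyword (case-insensitive).
    static List<URI> crawl(URI start, String keyword, int maxPages,
                           Function<URI, String> fetcher) {
        Deque<URI> queue = new ArrayDeque<>();
        Set<URI> seen = new HashSet<>();
        List<URI> hits = new ArrayList<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        queue.add(start);
        seen.add(start);
        int visited = 0;
        while (!queue.isEmpty() && visited < maxPages) {
            URI page = queue.poll();
            visited++;
            String html = fetcher.apply(page);
            if (html == null) {
                continue; // unreachable page: skip it
            }
            if (html.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(page);
            }
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                try {
                    // resolve relative links against the current page
                    URI next = page.resolve(m.group(1).trim());
                    if (seen.add(next)) {
                        queue.add(next);
                    }
                } catch (IllegalArgumentException badLink) {
                    // malformed href (e.g. unescaped spaces): ignore it
                }
            }
        }
        return hits;
    }
}
```

A real crawler would also restrict itself to one host and respect robots.txt; the `maxPages` cap here only keeps the loop from running forever.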
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
Frankly, it sounds like something you could do by grabbing the web page with java.net.URL and then doing a primitive parse/search of the content using the JDK 1.4 regular expression support (java.util.regex.*) or perhaps Jakarta ORO.
If that doesn't do it, I recently spotted a library that will parse an HTML document and expose it using DOM. Drop me a note and I'll see if I can find it again.
- Peter
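A minimal sketch of the first suggestion (fetch with `java.net.URL`, then search with `java.util.regex`), assuming the page is UTF-8 and that a crude regex tag-strip is good enough for a keyword count. `PageSearch` and its helper methods are hypothetical names. It uses try-with-resources and `StandardCharsets`, so it needs Java 7 or later rather than the JDK 1.4 mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageSearch {

    // Download a page's raw HTML. Assumes UTF-8; a real crawler would
    // take the charset from the Content-Type header instead.
    static String fetch(URL url) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Primitive "parse": replace anything that looks like a tag with a space.
    // Fine for a keyword search, but not a real HTML parser.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Count case-insensitive occurrences of keyword, treated as literal text.
    static int countMatches(String text, String keyword) {
        Matcher m = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE)
                .matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Usage: java PageSearch <url> <keyword>
    public static void main(String[] args) throws IOException {
        URL url = URI.create(args[0]).toURL();
        String text = stripTags(fetch(url));
        System.out.println("\"" + args[1] + "\" occurs "
                + countMatches(text, args[1]) + " times");
    }
}
```

Because of the `Pattern.quote` call the keyword is matched literally; drop it if you want to pass a real regular expression instead.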
Raghav Mathur
Ranch Hand

Joined: Jan 12, 2001
Posts: 641
Please give me the URL of the library you spotted. I'll try option no. 1 first. Also, could you point me to a good tutorial for getting started with Java networking? I tried working through it in "The Complete Reference" but just can't get the concepts.
Regards,
Raghav Mathur
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
Gotcha! The library I was referring to was NekoHTML. Unfortunately I don't have time to go into any detail right now; maybe later.
- Peter
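NekoHTML is a separate download, so to keep this illustration self-contained the sketch below swaps in the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`). Unlike NekoHTML it does not build a DOM; it is an event-driven callback parser based on the HTML 3.2 DTD, but it copes with sloppy markup well enough to pull out links and visible text. `HtmlExtractor` and its fields are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlExtractor {

    // href values of every <a> tag, in document order
    final List<String> links = new ArrayList<>();
    // text chunks, each followed by a single space
    final StringBuilder text = new StringBuilder();

    // Feed HTML through the JDK's built-in parser, collecting links and text.
    void parse(Reader html) throws IOException {
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };
        // third argument true: ignore any charset declared in a <meta> tag
        new ParserDelegator().parse(html, callback, true);
    }

    // Convenience wrapper for HTML already held in a String.
    static HtmlExtractor fromString(String html) {
        HtmlExtractor extractor = new HtmlExtractor();
        try {
            extractor.parse(new StringReader(html));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return extractor;
    }
}
```

For a live page you would pass `new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)` to `parse` instead of a `StringReader`.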