how to parse html webpage

naga raaju
Greenhorn

Joined: Mar 14, 2008
Posts: 29
Hi guys, can anybody give me an idea of how to parse an HTML web page from a live URL using Java?

I have some code, but the output is still raw HTML tags. How can I strip the tags and get at the content? Any ideas, friends?

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.net.URL;

public class URLReader {
    public static void main(String[] args) throws Exception {
        URL yahoo = new URL("http://finance.yahoo.com");

        // try-with-resources closes both streams; closing the writer also
        // flushes it, so the buffered output actually reaches sample.txt
        try (BufferedReader in = new BufferedReader(new InputStreamReader(yahoo.openStream()));
             BufferedWriter wr = new BufferedWriter(new FileWriter("sample.txt"))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                wr.write(inputLine);
                wr.newLine(); // readLine() strips the line terminator
            }
        }
    }
}
bye
Naga
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42648
    
There are many things you might want to accomplish with a downloaded web page. You need to tell us what you're trying to do with it.

If you want to extract the text, I'd start by converting the HTML into well-formed XML; libraries like NekoXNI, JTidy and TagSoup can do this for you.
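To illustrate the second half of that approach: once a page has been cleaned into well-formed XML, the XML APIs that ship with Java can pull text out of it. The sketch below assumes the cleanup step (JTidy, TagSoup or similar) has already happened and starts from a hand-written, well-formed snippet. The class name `XhtmlText`, the `select` helper and the sample markup are made up for illustration. The sample deliberately has no DOCTYPE and no XML namespace; real cleaned-up output may carry both, and the XPath expressions would then need adjusting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlText {

    /** Returns the trimmed text content of every node matched by the XPath expression. */
    public static List<String> select(String xhtml, String expression) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, already well-formed markup standing in for cleaned-up HTML
        String xhtml = "<html><body><h1>Quotes</h1>"
                + "<p id=\"price\">42.10</p><p>Updated daily</p></body></html>";

        System.out.println(select(xhtml, "//p"));               // [42.10, Updated daily]
        System.out.println(select(xhtml, "//*[@id='price']"));  // [42.10]
    }
}
```

If the page has a stable id attribute on the element you care about, an expression like the second one is usually less fragile than counting paragraphs.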


naga raaju
Greenhorn

Joined: Mar 14, 2008
Posts: 29
Hi,
thanks for your reply.
I need to extract some text from the web pages, so what should I do?

Do I have to depend on a third-party API, or is this possible with plain Java coding?


bye
Naga
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8971
    

There is an HTML parser provided in the Java API. As Ulf says, it depends on your exact requirements whether it will fit the bill or not.
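For reference, one parser that ships with the JDK is the Swing one: `javax.swing.text.html.parser.ParserDelegator`, driven through an `HTMLEditorKit.ParserCallback`. Whether that is the parser meant here is an assumption. The sketch below collects only the text nodes and drops the tags; the class name `HtmlTextExtractor` and the sample markup are made up for illustration. This parser targets old HTML (roughly 3.2), so it may cope poorly with modern pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTextExtractor {

    /** Parses the HTML and returns its text nodes joined by single spaces. */
    public static String extractText(Reader html) throws IOException {
        final StringBuilder text = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text between tags
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };

        // true = ignore any charset declared inside the document
        new ParserDelegator().parse(html, callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical markup; for a live page, wrap url.openStream() in an InputStreamReader
        String html = "<html><body><h1>Quotes</h1><p>Price: <b>42.10</b></p></body></html>";
        System.out.println(extractText(new StringReader(html)));
    }
}
```

Overriding `handleStartTag` and `handleEndTag` in the same callback lets you track which element you are inside, which is the usual way to keep only the text of particular tags.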


Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42648
    
That depends on the specifics. Are you talking about one particular page on one particular site? Several pages? Several sites? Is the layout of the page(s) predictable? Are there ID tags on which you can rely?

You will need to do some coding, but the libraries I mentioned will help you get started.
Randi Randwa
Greenhorn

Joined: Feb 21, 2009
Posts: 7

You can also use biterscripting (free download from the .com site) for parsing HTML. It works great.

They have a sample script posted at http://www.biterscripting.com/SS_URLs.html . This script extracts referenced URLs from a page. Another sample script http://www.biterscripting.com/SS_SearchURL.html will search a page for specific search words. The sample script http://www.biterscripting.com/SS_SearchWeb.html is de facto your own search engine.

You can get started with these scripts.

If you come up with new HTML parsing scripts of your own, can you please post them for the rest of us? Thanks.

Randi
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39885
    
Welcome to JavaRanch, Randi, but please don't resurrect 10-month-old threads. Have a look at this FAQ.
 