How can I read an HTML file in Java

prashant fusate
Greenhorn

Joined: Aug 24, 2005
Posts: 16
Hi,
I want to make an application that reads an HTML file from a website and takes some action depending on the information in that file. For example, if the word ARR appears somewhere in the HTML, I want to send a mail to the person associated with that word.

Please help me.

How can I develop this in Java?

Thanks


Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8898

Once you have an input stream for the page, it is the same as reading a file.


Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 13869

Reading from a webpage is quite simple in Java; you can use class java.net.URL, call method openStream() on the URL object and read the HTML page from it. Then you need to look for "ARR" or whatever else you want to look for in the returned HTML page. You can do that with the methods in class String.
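The approach described above can be sketched roughly as follows. This is only a minimal sketch, not a finished program: the class name PageKeywordCheck, the helper methods, and the example.com address are placeholders made up for illustration, and it assumes the page is encoded as UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class PageKeywordCheck {

    // Reads everything behind the URL into one string.
    // Assumes UTF-8; a real page may declare a different charset.
    static String readPage(URL url) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Plain substring search using String.contains, as suggested in the post.
    static boolean pageContains(URL url, String keyword) throws IOException {
        return readPage(url).contains(keyword);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder address; pass the real one as the first argument.
        String address = args.length > 0 ? args[0] : "http://example.com/";
        URL url = URI.create(address).toURL();
        if (pageContains(url, "ARR")) {
            System.out.println("Found ARR; the mail would be sent from here.");
        }
    }
}
```

Note that a plain substring search also matches the keyword inside tag names, attributes and longer words, so it is only a first approximation.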


Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249

I am going to move this to the Sockets and Internet Protocols forum. This forum is specifically for Servlets questions, and since you aren't creating a Servlet, I am moving it.

Mark


Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Look into URL as mentioned above, or maybe HttpURLConnection or even the Apache HttpClient package, to pull HTML from web sites.

If you need to get complex stuff out of the HTML, more than just the contents of a tag or two with simple string manipulation, look into HTML parsers. I use the Quiotix parser.
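To give a rough idea of what a parser buys you over string matching, here is a small sketch. It does not use the Quiotix parser; it swaps in the HTML parser that ships with the JDK (ParserDelegator from javax.swing.text.html.parser) so it runs without extra jars. The class name VisibleTextExtractor and the method visibleText are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper; uses the JDK's built-in Swing HTML parser,
// not the Quiotix parser mentioned in the post.
class VisibleTextExtractor {

    // Collects only the character data between tags, so tag names and
    // attribute values are left out of the result.
    static String visibleText(Reader html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };
        // The third argument tells the parser to ignore any charset
        // declared in the document, since we already hand it a Reader.
        new ParserDelegator().parse(html, callback, true);
        return text.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><p class=\"ARR\">Hello <b>world</b></p></body></html>";
        System.out.println(visibleText(new StringReader(page)));
    }
}
```

Searching the extracted text instead of the raw HTML avoids false hits on a keyword that only appears in markup, such as the class attribute in the sample page.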


Matt Chambers
Greenhorn

Joined: Oct 06, 2005
Posts: 2
Hello all, longtime reader with a quick question:

I'm hoping to extract data from a website (namely NCBI) and analyze the data after several queries. Has anyone compared the performance of Apache's HttpClient to, say, HttpURLConnection?

I'll try to do the same, but I wonder if there are any server/platform vagaries.

Matt
[ October 06, 2005: Message edited by: Matt Chambers ]
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Your program will almost certainly be faster than the network so you'll spend more time waiting for data over the wire than waiting for your instructions to run. I'd guess HttpClient is "fast enough".
Matt Chambers
Greenhorn

Joined: Oct 06, 2005
Posts: 2
Thanks Stan! (sorry for the late response).
Shamil Shah
Greenhorn

Joined: May 04, 2009
Posts: 1
Hey, I want to read HTML from a page where I need to log in first.

Is there any way I can log in programmatically and then read the HTML?

Example: I want to read HTML from mail.yahoo.com. Right now I get the HTML code for the login page, but I want to read the HTML shown after login.

Please help.
Regards,
Shamil
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39547
I'd use a library like jWebUnit for programmatic access to web sites. Makes it much easier to deal with the HTML, and it supports Basic and Form Authentication.
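For the simpler of the two cases, Basic Authentication, a plain HttpURLConnection can also do the job. The sketch below is not jWebUnit code; the class BasicAuthFetch and its methods are made up for illustration, it assumes UTF-8 pages, and readAllBytes needs Java 9 or later. It will not get you past a form login such as the Yahoo one, which needs the login form submitted and the session cookies kept.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for HTTP Basic Authentication only.
class BasicAuthFetch {

    // Builds the Authorization header value: "Basic " + base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Requests the page with the credentials attached and returns the body.
    // Expects an http or https URL.
    static String fetch(URL url, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Basic Authentication sends the credentials merely encoded, not encrypted, so this should only be used over https.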

