how can i read HTML file in java

prashant fusate

Joined: Aug 24, 2005
Posts: 16
Hi,
I want to build an application that reads an HTML file from a website and takes some action depending on the information in that file. For example, if the word ARR appears somewhere in the HTML, I want to send a mail to the person associated with that word.

Please help me.

How can I develop this in Java?


Pradeep bhatt
Ranch Hand

Joined: Feb 27, 2002
Posts: 8927

It is the same as reading a file.

Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 15080

Reading from a web page is quite simple in Java: you can use class java.net.URL, call the method openStream() on the URL object, and read the HTML page from the returned stream. Then look for "ARR", or whatever else you need, in the returned HTML. You can do that with the methods in class String, such as contains() or indexOf().
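
A minimal sketch of the approach described above, under a few assumptions: the page is read as UTF-8, the address http://example.com/ is only a placeholder, and the class and method names (PageScanner, readAll, containsKeyword) are made up for illustration. Sending the mail is left as a print statement.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageScanner {

    // Reads the whole stream into a String.
    // Assumption: the content is UTF-8; a real page may declare another charset.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Plain substring search using a method of class String.
    public static boolean containsKeyword(String html, String keyword) {
        return html.contains(keyword);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass the real one as the first argument.
        URL url = new URL(args.length > 0 ? args[0] : "http://example.com/");
        String html = readAll(url.openStream());
        if (containsKeyword(html, "ARR")) {
            // Hypothetical hook: this is where the mail would be sent.
            System.out.println("Found ARR, notify the responsible person");
        }
    }
}
```

Note that contains() is case-sensitive and also matches "ARR" inside longer words or inside markup, so a real application may need a stricter check.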

Mark Spritzler

Joined: Feb 05, 2001
Posts: 17276

I am going to move this to the Sockets and Internet Protocols forum. This forum is for Servlets questions only, and since you aren't creating a Servlet, I am moving it.


Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Look into URL as mentioned above, or maybe HttpURLConnection, or even the Apache HttpClient package, to pull HTML from web sites.

If you need to get complex stuff out of the HTML, more than just the contents of a tag or two with simple string manipulation, look into HTML parsers. I use the Quiotix parser.
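
For the simple end of that range, here is a rough sketch using HttpURLConnection plus plain string manipulation to pull out the contents of one tag. The names SimpleFetch, fetch and extractTagText are hypothetical, the 10 second timeouts and UTF-8 decoding are assumptions, and the tag lookup only handles a lowercase tag without attributes. Anything beyond that is a job for a real HTML parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SimpleFetch {

    // Fetches a page with HttpURLConnection and returns the body as text.
    // Assumptions: UTF-8 content, 10 second timeouts.
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                // readAllBytes() needs Java 9 or later.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    // Returns the trimmed text between <tag> and </tag>, or null if not found.
    // Only matches the exact, attribute-free form of the opening tag.
    public static String extractTagText(String html, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = html.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = html.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end).trim();
    }
}
```

Usage would look like extractTagText(fetch(address), "title"), where address is whatever page you are after.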

Matt Chambers

Joined: Oct 06, 2005
Posts: 2
Hello all, longtime reader with a quick question:

I'm hoping to extract data from a website (namely NCBI) and analyze the data after several queries. Has anyone compared the performance of Apache's HttpClient to, say, HttpURLConnection?

I'll try to do the same, but I wonder if there are any server/platform vagaries.

[ October 06, 2005: Message edited by: Matt Chambers ]
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Your program will almost certainly be faster than the network, so you'll spend more time waiting for data over the wire than waiting for your own code to run. I'd guess HttpClient is "fast enough".
Matt Chambers

Joined: Oct 06, 2005
Posts: 2
Thanks Stan! (sorry for the late response).
Shamil Shah

Joined: May 04, 2009
Posts: 1
Hey, I want to read the HTML from a page where I need to log in first.

Is there any way I can log in programmatically and then read the HTML?

Example: I want to read HTML from .. right now I get the HTML code for the login page, but I want to read the HTML after login.

Please help.
Ulf Dittmer

Joined: Mar 22, 2005
Posts: 42965
I'd use a library like jWebUnit for programmatic access to web sites. Makes it much easier to deal with the HTML, and it supports Basic and Form Authentication.
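
For the Basic Authentication case only, a sketch without any library might look like the following. This is not jWebUnit code; it uses plain HttpURLConnection, and the names BasicAuthSketch, basicAuthHeader and openWithBasicAuth are made up for illustration. Form authentication (posting the login form and carrying the session cookie) is not covered here, and that is where a library helps most.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {

    // Builds the value of the Authorization header for HTTP Basic authentication:
    // "Basic " followed by base64("user:password").
    public static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Opens a connection that sends the credentials with the request.
    // The caller reads the page from conn.getInputStream() as usual.
    public static HttpURLConnection openWithBasicAuth(String address, String user, String password)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        return conn;
    }
}
```

Basic credentials are only encoded, not encrypted, so this should be used over HTTPS.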