
How can I read an HTML file in Java?

 
prashant fusate
Greenhorn
Posts: 16
Hi,

I want to write an application that reads an HTML file from a website and takes some action depending on the information in that file. For example, if the word ARR appears somewhere in the HTML, I want to send an email to the person associated with that word.

How can I develop this in Java? Please help.

Thanks
 
Pradeep bhatt
Ranch Hand
Posts: 8927
It is the same as reading a file.
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15214
Reading from a webpage is quite simple in Java; you can use class java.net.URL, call method openStream() on the URL object and read the HTML page from it. Then you need to look for "ARR" or whatever else you want to look for in the returned HTML page. You can do that with the methods in class String.
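A minimal sketch of that approach, using only the JDK. The class name PageKeywordCheck and its two helper methods are made up for illustration, and the page is assumed to be UTF-8 encoded, which a real site may not be. Sending the mail is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; not part of any library.
class PageKeywordCheck {

    /** Reads everything behind the URL into one String (assumes UTF-8). */
    static String readPage(URL url) throws IOException {
        StringBuilder html = new StringBuilder();
        // openStream() opens the connection and returns the response body as a stream
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    /** Plain substring search; case-sensitive, and it also matches inside tags. */
    static boolean containsKeyword(String html, String keyword) {
        return html.contains(keyword);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java PageKeywordCheck <url> <keyword>");
            return;
        }
        String html = readPage(URI.create(args[0]).toURL());
        if (containsKeyword(html, args[1])) {
            System.out.println("Found \"" + args[1] + "\"; this is where the mail would be sent.");
        } else {
            System.out.println("\"" + args[1] + "\" not found.");
        }
    }
}
```

Because String.contains() looks at the raw markup, it will also match "ARR" inside attribute values or longer words; if that matters, a parser is the safer route.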
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
I am going to move this to the Sockets and Internet Protocol forum. This forum is specifically for Servlets questions, and since you aren't creating a Servlet, the question fits better there.

Mark
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Look into URL as mentioned above, or maybe HttpURLConnection, or even the Apache HttpClient package, to pull HTML from web sites.

If you need to get complex stuff out of the HTML - more than just the contents of a tag or two with simple string manipulation - look into HTML parsers. I use the Quiotix parser described and linked HERE.
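For the HttpURLConnection route, a rough sketch might look like the following. The class name HttpFetch, the 10-second timeouts, and the UTF-8 fallback when the server names no charset are all assumptions made for illustration. It needs Java 9 or later for readAllBytes().

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical class name; not part of any library.
class HttpFetch {

    /** Picks the charset out of a Content-Type header; falls back to UTF-8 (an assumption). */
    static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: use the fallback below.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Fetches the page body, failing on anything other than HTTP 200. */
    static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(10_000); // milliseconds; arbitrary choice
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + address);
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), charsetFrom(conn.getContentType()));
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HttpFetch <http-url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Compared with URL.openStream(), this gives access to the status code and response headers, which helps when a site answers with an error page instead of the HTML you expected.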
 
Matt Chambers
Greenhorn
Posts: 2
Hello all, longtime reader with a quick question:

I'm hoping to extract data from a website (namely NCBI) and analyze the data after several queries. Has anyone compared the performance of Apache's HttpClient to, say, HttpURLConnection?

I'll try to do the same, but I wonder if there are any server/platform vagaries.

Matt
[ October 06, 2005: Message edited by: Matt Chambers ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Your program will almost certainly be faster than the network, so you'll spend more time waiting for data over the wire than waiting for your instructions to run. I'd guess HttpClient is "fast enough".
 
Matt Chambers
Greenhorn
Posts: 2
Thanks, Stan! (Sorry for the late response.)
 
Shamil Shah
Greenhorn
Posts: 1
Hey, I want to read the HTML from a page where I need to log in first.

Is there any way I can log in programmatically and then read the HTML?

Example: I want to read the HTML from mail.yahoo.com. Right now I get the HTML code for the login page, but I want to read the HTML after login.

Please help.
Regards,
Shamil
 
Ulf Dittmer
Rancher
Posts: 42967
I'd use a library like jWebUnit for programmatic access to web sites. Makes it much easier to deal with the HTML, and it supports Basic and Form Authentication.
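To show roughly what form authentication involves underneath, here is a sketch that does not use jWebUnit at all; it uses only the JDK's HttpURLConnection and CookieManager. The class name FormLoginSketch and the form field names "username" and "password" are hypothetical, and the login and target URLs come from the command line. Many real sites, large webmail providers among them, add hidden tokens, JavaScript, or redirects that this simple approach will not handle.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical class name; not part of any library.
class FormLoginSketch {

    /** Encodes fields as application/x-www-form-urlencoded, e.g. a=1&b=2. */
    static String formEncode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** POSTs the login form; session cookies end up in the default CookieManager. */
    static void postForm(String address, Map<String, String> fields) throws IOException {
        byte[] body = formEncode(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        if (status >= 400) {
            throw new IOException("Login request failed with HTTP status " + status);
        }
        try (InputStream in = conn.getInputStream()) {
            in.readAllBytes(); // drain the response; only the cookies matter here
        }
    }

    /** GETs a page; cookies stored during login are sent along automatically. */
    static String get(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // assumes UTF-8
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 4) {
            System.out.println("usage: java FormLoginSketch <login-url> <page-url> <user> <password>");
            return;
        }
        // Install a JVM-wide cookie store so the session survives between requests.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", args[2]); // hypothetical field name; check the site's login form
        fields.put("password", args[3]); // hypothetical field name
        postForm(args[0], fields);
        System.out.println(get(args[1]));
    }
}
```

The key step is installing the CookieManager before the first request; without it, the second request carries no session cookie and the server returns the login page again.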
 