Scanning HTML page for HREF AND IMG tags

Rohan Amin

Joined: Mar 19, 2008
Posts: 15
Does anybody know how I could scan an entire HTML page, look for href and img tags, and change some text in those tags?
Bear Bibeault
Author and ninkuma

Joined: Jan 10, 2002
Posts: 62139

In a servlet? How would the servlet get the HTML page?

In any case, once you have the HTML source in a string, you'll either need to parse the HTML or (easier) scan the text using regular expressions. As this is not a Servlet issue, it's been moved off to the general forum.
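
To make the regular-expression route concrete, here is a minimal sketch, assuming the page is already in a String and that the attribute values are quoted. The class name LinkRewriter, the rewrite method, and the example.com hosts are made up for illustration. The pattern matches any href or src attribute, not only those on a and img tags, and it will miss unquoted values, so treat it as a starting point rather than a robust solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites text inside quoted href="..." and src="..." values.
public class LinkRewriter {

    // Group 1: attribute name, group 2: the quote character, group 3: the value.
    // (?i) makes the match case-insensitive; \\2 requires the same closing quote.
    private static final Pattern ATTR =
            Pattern.compile("(?i)\\b(href|src)\\s*=\\s*([\"'])(.*?)\\2");

    // Replaces every occurrence of 'from' with 'to' inside matched attribute values only.
    static String rewrite(String html, String from, String to) {
        Matcher m = ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String newValue = m.group(3).replace(from, to);
            String replacement = m.group(1) + "=" + m.group(2) + newValue + m.group(2);
            // quoteReplacement stops '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://old.example.com/a\">A</a>"
                + "<img src='http://old.example.com/i.png'>";
        System.out.println(rewrite(html, "old.example.com", "new.example.com"));
    }
}
```

Text outside the matched attributes is copied through unchanged, so link text that happens to contain the same string is not touched.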
[ April 24, 2008: Message edited by: Bear Bibeault ]

Gregg Bolinger
GenRocket Founder
Ranch Hand

Joined: Jul 11, 2001
Posts: 15302

Actual Possible Question #1 - I am developing a web site and I don't know how to use the Find command of my IDE to change some href and image tags.

Actual Possible Question #2 - I am trying to "borrow" work that someone else has already done and make it appear as my own by changing some text in the links and images.

Actual Possible Question #3 - I am trying to scrape porn sites for images. How do I do that?

I crack myself up.

Rob Spoor

Joined: Oct 27, 2005
Posts: 19911

Check out javax.swing.text.html.parser.ParserDelegator. This will find a parser for you, so let it do the dirty work.

Here's some example code:
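
The snippet from the post is not shown here, so what follows is only a minimal sketch of how ParserDelegator can be driven, not the code that was posted. It assumes the HTML is already in a String. The class name TagScanner and the scan method are made up for illustration. The callback collects href values from a tags and src values from img tags.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: lists the href and src values found in an HTML string.
public class TagScanner {

    static List<String> scan(String html) throws IOException {
        final List<String> found = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // <a> has a closing tag, so it is reported as a start tag.
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        found.add("href=" + href);
                    }
                }
            }

            // <img> has no closing tag, so it is reported as a simple tag.
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        found.add("src=" + src);
                    }
                }
            }
        };

        try (Reader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(reader, callback, true);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"page.html\">link</a>"
                + "<img src=\"pic.png\"></body></html>";
        for (String item : scan(html)) {
            System.out.println(item);
        }
    }
}
```

The pos argument passed to each callback is the tag's offset in the input, which is one way to locate the text you want to change once a tag has been found.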
