Scanning HTML page for HREF AND IMG tags

Rohan Amin

Joined: Mar 19, 2008
Posts: 15
Does anybody know how I could scan an entire HTML page, look for href and img tags, and change some text in those tags?
Bear Bibeault
Author and ninkuma

Joined: Jan 10, 2002
Posts: 63053

In a servlet? How would the servlet get the HTML page?

In any case, once you have the HTML source in a string, you'll either need to parse the HTML or (easier) scan the text using regular expressions. As this is not a Servlet issue, it's been moved off to the general forum.
[ April 24, 2008: Message edited by: Bear Bibeault ]
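
For illustration, here is a minimal sketch of the regular-expression route, assuming the page is already in a String. The class name LinkRewriter, the method rewrite and the example host names are hypothetical, not from the thread. The pattern matches quoted href and src attributes on any tag (not only a and img) and skips unquoted values, so treat it as a starting point rather than a robust HTML rewriter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRewriter {
    // Matches href="..." or src='...' (either quote style), case-insensitively.
    // Group 1 = attribute name, group 2 = quote character, group 3 = value.
    private static final Pattern ATTR = Pattern.compile(
            "\\b(href|src)\\s*=\\s*([\"'])(.*?)\\2",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Replaces every occurrence of 'from' with 'to', but only inside
    // href/src attribute values; all other text is copied unchanged.
    public static String rewrite(String html, String from, String to) {
        Matcher m = ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(3).replace(from, to);
            String replacement = m.group(1) + "=" + m.group(2) + value + m.group(2);
            // quoteReplacement guards against '$' and '\' in the new value.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://old.example.com/a.html\">A</a>"
                + "<img src='http://old.example.com/b.png'>";
        System.out.println(rewrite(html, "old.example.com", "new.example.com"));
    }
}
```

Text outside href and src values is left alone, so a host name that appears in ordinary page text is not rewritten.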

Gregg Bolinger
GenRocket Founder
Ranch Hand

Joined: Jul 11, 2001
Posts: 15302

Actual Possible Question #1 - I am developing a web site and I don't know how to use the Find command of my IDE to change some href and image tags.

Actual Possible Question #2 - I am trying to "borrow" work that someone else has already done and make it appear as my own by changing some text in the links and images.

Actual Possible Question #3 - I am trying to scrape porn sites for images. How do I do that?

I crack myself up.

Rob Spoor

Joined: Oct 27, 2005
Posts: 20049

Check out javax.swing.text.html.parser.ParserDelegator. This will find a parser for you, so let it do the dirty work.

Here's some example code:
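
A minimal sketch of how ParserDelegator might be used for this, not necessarily the code originally posted. The class name TagScanner and the method findLinksAndImages are hypothetical. ParserDelegator lives in the javax.swing.text.html.parser package and reports tags to an HTMLEditorKit.ParserCallback. This sketch only collects the href and src values. The pos argument of each callback is the tag's offset in the source, which could serve as a starting point for editing the text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TagScanner {
    // Returns the href of every <a> tag and the src of every <img> tag.
    public static List<String> findLinksAndImages(String html) throws IOException {
        final List<String> found = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <a> has a closing tag, so it is reported as a start tag.
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        found.add("href=" + href);
                    }
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no closing tag, so it is reported as a simple tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        found.add("src=" + src);
                    }
                }
            }
        };
        // The third argument (ignoreCharSet = true) tells the parser not to
        // fail on a charset declaration inside the document.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"page.html\">A</a>"
                + "<img src=\"pic.png\"></body></html>";
        for (String item : findLinksAndImages(html)) {
            System.out.println(item);
        }
    }
}
```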
