Scanning HTML page for HREF AND IMG tags

 
Rohan Amin
Greenhorn
Posts: 15
Does anybody know how I could scan an entire HTML page, look for href and img tags, and change some text in those tags?
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 64171
In a servlet? How would the servlet get the HTML page?

In any case, once you have the HTML source in a string, you'll either need to parse the HTML or (easier) scan the text using regular expressions. As this is not a Servlet issue, it's been moved off to the general forum.
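For the regular-expression route, here is a minimal Java sketch. It assumes the page is already in a String, that href and src values are quoted, and that "change some text" means a plain substring swap inside those attribute values. The class and method names (`RegexTagRewriter`, `rewrite`) are hypothetical. Regexes are brittle on real-world HTML (comments, scripts, unquoted values), so treat this as a quick hack rather than a robust solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTagRewriter {

    // Matches the quoted href of an <a> tag or the quoted src of an <img> tag.
    // Group 1: the "<a ... href=" or "<img ... src=" prefix
    // Group 2: the opening quote character (" or ')
    // Group 3: the attribute value itself (up to the matching closing quote)
    private static final Pattern LINK_OR_IMAGE = Pattern.compile(
            "(<(?:a|img)\\b[^>]*?\\s(?:href|src)\\s*=\\s*)([\"'])(.*?)\\2",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Replaces oldText with newText inside href/src values only; other text is left alone. */
    public static String rewrite(String html, String oldText, String newText) {
        Matcher m = LINK_OR_IMAGE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(3).replace(oldText, newText);
            String rebuilt = m.group(1) + m.group(2) + value + m.group(2);
            // quoteReplacement stops '$' or '\' in the value being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(rebuilt));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p><a class=\"nav\" href=\"http://old.example.com/a.html\">A</a>"
                + "<img src='http://old.example.com/b.png' alt='B'></p>";
        System.out.println(rewrite(html, "old.example.com", "new.example.com"));
    }
}
```

Only the attribute values are touched, so the same string appearing in ordinary page text stays as it was.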
[ April 24, 2008: Message edited by: Bear Bibeault ]
 
Gregg Bolinger
GenRocket Founder
Ranch Hand
Posts: 15302
Actual Possible Question #1 - I am developing a web site and I don't know how to use the Find command of my IDE to change some href and image tags.

Actual Possible Question #2 - I am trying to "borrow" work that someone else has already done and make it appear as my own by changing some text in the links and images.

Actual Possible Question #3 - I am trying to scrape porn sites for images. How do I do that?

I crack myself up.
 
Rob Spoor
Sheriff
Posts: 20368
Check out javax.swing.text.html.parser.ParserDelegator. It will find a parser for you, so let it do the dirty work.
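A minimal sketch of the parser route, assuming the HTML is available through a `Reader`: the callback collects the href of each `<a>` tag and the src of each `<img>` tag. The class and method names (`LinkAndImageFinder`, `find`) are hypothetical; `ParserDelegator`, `HTMLEditorKit.ParserCallback`, `HTML.Tag` and `HTML.Attribute` are the standard Swing classes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkAndImageFinder {

    /** Returns every <a href> and <img src> value in the HTML, in document order. */
    public static List<String> find(Reader html) throws IOException {
        final List<String> found = new ArrayList<String>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Elements with content, such as <a>...</a>, are reported here.
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    add(attrs.getAttribute(HTML.Attribute.HREF));
                }
            }

            // Empty elements, such as <img>, are reported here instead of handleStartTag.
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.IMG) {
                    add(attrs.getAttribute(HTML.Attribute.SRC));
                }
            }

            private void add(Object value) {
                if (value != null) {
                    found.add(value.toString());
                }
            }
        };

        // true = ignore any charset declared in the page; the Reader has already decoded the text.
        new ParserDelegator().parse(html, callback, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><a href=\"page.html\">Link</a>"
                + "<img src=\"pic.png\" alt=\"Pic\"></body></html>";
        System.out.println(find(new StringReader(html)));
    }
}
```

The parser takes care of quoting, attribute order and letter case, but it only reads the page; producing modified HTML is a separate step you would build on top of these callbacks. It is also based on the old HTML 3.2 DTD, so modern markup may not be reported exactly as you expect.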

Here's some example code:
 