
Editing html tags in a *.htm file

 
Arindham Samanta
Greenhorn
Posts: 1
I read an HTML file using java.io and kept the whole content of the file in a StringBuffer. Now I want to replace the value of href="" in the anchor tag, i.e. <a href="">.
 
Mag Hoehme
Ranch Hand
Posts: 194
Hi Arindham,

I have my doubts whether StringBuffer is a suitable object type to accomplish your task. As I understand it, you want to replace the URL in an anchor's href attribute with another one.

I would adopt the following strategy:

1. Get the original String.
2. Create an empty StringBuffer.
3. Scan the original String for href attributes.
4. Copy the portions that are not to be modified to the StringBuffer as they are, and append the new content in place of the portions to be modified.

Use two int variables, start and end, together with String methods such as indexOf and substring.
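
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the class and method names (HrefReplacer, replaceHrefs) are made up for the example, StringBuilder is used in place of StringBuffer since no synchronisation is needed here, and the attribute is assumed to be written exactly as href=" (lowercase, double quotes, no spaces). It also does not check that the attribute belongs to an anchor tag.

```java
public class HrefReplacer {

    // Hypothetical helper: replaces the value of every href="..." with newUrl.
    // Assumes lowercase href, double quotes, and no whitespace around '='.
    public static String replaceHrefs(String html, String newUrl) {
        final String marker = "href=\"";
        StringBuilder out = new StringBuilder(html.length());
        int pos = 0; // start of the portion that has not been copied yet

        while (true) {
            int start = html.indexOf(marker, pos);
            if (start < 0) {
                break; // no more href attributes
            }
            int valueStart = start + marker.length();
            int end = html.indexOf('"', valueStart); // closing quote of the value
            if (end < 0) {
                break; // unterminated attribute: leave the rest untouched
            }
            // Copy the unmodified text up to and including href="
            out.append(html, pos, valueStart);
            // Write the new value instead of the old one
            out.append(newUrl);
            // The closing quote is copied together with the next chunk
            pos = end;
        }
        // Copy whatever follows the last replaced attribute
        out.append(html, pos, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"\">home</a> <a href=\"old.htm\">old</a></p>";
        System.out.println(replaceHrefs(html, "new.htm"));
    }
}
```

Appending a range of the original String directly, rather than calling substring first, avoids creating an intermediate String for each copied chunk.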

However, this approach can be optimized further for better performance. Most String methods, such as substring(), create new String objects for their return values. To avoid these extra objects, you can work on a char array instead, copying the chars one by one to a StringBuffer (or to a second char array). For more information on using char arrays to boost performance, search Google for "String" and "performance".
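
A char-array variant of the same idea might look like the sketch below. The names HrefCharScanner, matchesAt and replaceHrefs are hypothetical, StringBuilder again stands in for StringBuffer, and the same assumptions about the attribute format apply (lowercase href, double quotes). Whether this is actually faster than the indexOf version depends on the JVM and the input, so it is worth measuring before adopting it.

```java
public class HrefCharScanner {

    // Hypothetical helper: true if marker occurs in src starting at index i.
    private static boolean matchesAt(char[] src, int i, char[] marker) {
        if (i + marker.length > src.length) {
            return false;
        }
        for (int k = 0; k < marker.length; k++) {
            if (src[i + k] != marker[k]) {
                return false;
            }
        }
        return true;
    }

    // Replaces the value of every href="..." with newUrl, scanning char by char.
    public static String replaceHrefs(String html, String newUrl) {
        char[] src = html.toCharArray();
        char[] marker = "href=\"".toCharArray();
        StringBuilder out = new StringBuilder(src.length);

        int i = 0;
        while (i < src.length) {
            if (matchesAt(src, i, marker)) {
                // Look for the closing quote of the attribute value
                int j = i + marker.length;
                while (j < src.length && src[j] != '"') {
                    j++;
                }
                if (j < src.length) {
                    // Emit href=" followed by the new value, skip the old value
                    out.append(marker).append(newUrl);
                    i = j; // the closing quote is copied on the next iteration
                    continue;
                }
                // No closing quote found: fall through and copy as-is
            }
            out.append(src[i]);
            i++;
        }
        return out.toString();
    }
}
```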

Hope this helps.
[ June 25, 2004: Message edited by: Mag Hoehme ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Look into regular expressions; the class Pattern is well documented. You can match and replace complicated patterns with a line or three of code, and only a few days of head-banging to create a good pattern. (Mild exaggeration.)
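
As a rough idea of what the regular-expression route could look like, here is a minimal sketch using java.util.regex. The class name HrefRegex, the method replaceHrefs and the pattern itself are illustrative assumptions, not a tested general-purpose solution: the pattern only handles double-quoted href values inside <a ...> tags, and will miss single-quoted or unquoted values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRegex {

    // Group 1: from "<a" up to and including the opening quote of href.
    // Group 2: the closing quote. The old value between them is discarded.
    private static final Pattern A_HREF = Pattern.compile(
            "(<a\\b[^>]*?\\bhref\\s*=\\s*\")[^\"]*(\")",
            Pattern.CASE_INSENSITIVE);

    // Replaces the href value of every matching anchor tag with newUrl.
    public static String replaceHrefs(String html, String newUrl) {
        Matcher m = A_HREF.matcher(html);
        // quoteReplacement keeps '$' and '\' in newUrl from being treated
        // as group references in the replacement string
        return m.replaceAll("$1" + Matcher.quoteReplacement(newUrl) + "$2");
    }

    public static void main(String[] args) {
        String html = "<A class=\"x\" HREF=\"old.htm\">old</A>";
        System.out.println(replaceHrefs(html, "new.htm"));
    }
}
```

Because the pattern is anchored on <a, an href on another element such as <link> is left alone, which the plain indexOf approach does not guarantee.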

You might also look into HTML parsers. I use This One quite happily and there are many others around. It uses the visitor pattern to help you scan a parsed document for particular tags and attributes like href. You can modify the DOM and rewrite it to HTML, though it may not be quite character-for-character exactly like it was before.
 