
Need suggestions for reading the value of some HTML tags from the page source

 
Esmaeil Ashrafi
Ranch Hand
Posts: 73
Hi,
I'm writing a simple application that should read the source of an HTML file and change the values of some tags.
For instance, change the value http://foo.com in the tag below

to the value http://blah.com.

Actually, this is the only modification I want to make.

One basic approach would be to open a character stream, read the source until I encounter <a href, and then make the modification, but I don't think that's a good approach.

I'm totally unfamiliar with HTML and am currently looking through the javax.swing.text.html package to see if there is a better way to jump to the specified tag and make the modification...

So any suggestions or pointers would be much appreciated.
Thanks in advance
 
Joe Ess
Bartender
Posts: 9298
Esmaeil Ashrafi wrote: Some essential way could be opening a character stream and reading source until encounter the <a href and then doing modification, but I think that's not a good approach.


That's probably the easiest option. Of course, you have to remember the standard issues when editing an existing file.
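A minimal sketch of that "easiest option", assuming the file is UTF-8 and the link is written as href="..." or href='...' with exactly the old URL. Rather than editing in place, it writes the result to a separate output file. The class and method names (HrefRewriter, rewrite, rewriteFile) are made up for illustration. This is a plain regex replacement, not an HTML parser, so it would also rewrite a matching href inside a comment, a script, or on tags other than <a>.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRewriter {

    /**
     * Replaces href="oldUrl" or href='oldUrl' with newUrl, keeping the original quote style.
     * Only exact values match: href="http://foo.com/page" is left alone when oldUrl is "http://foo.com".
     */
    public static String rewrite(String html, String oldUrl, String newUrl) {
        Pattern p = Pattern.compile(
                // "pre" = attribute name (any case), '=', and opening quote; "q" = the quote character
                "(?<pre>(?i:href)\\s*=\\s*(?<q>[\"']))" + Pattern.quote(oldUrl) + "\\k<q>");
        // ${q} re-emits the closing quote; quoteReplacement protects any '$' or '\' in newUrl.
        return p.matcher(html).replaceAll("${pre}" + Matcher.quoteReplacement(newUrl) + "${q}");
    }

    /** Reads 'in' (assumed UTF-8), rewrites it, and writes the result to a separate file 'out'. */
    public static void rewriteFile(Path in, Path out, String oldUrl, String newUrl) throws IOException {
        String html = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, rewrite(html, oldUrl, newUrl).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // prints <a href="http://blah.com">Foo</a>
        System.out.println(rewrite("<a href=\"http://foo.com\">Foo</a>", "http://foo.com", "http://blah.com"));
    }
}
```

Requiring the closing quote right after the old URL is what keeps longer URLs that merely start with http://foo.com from being touched.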

The Java Swing API has an HTML parser, but it's strictly a one-way operation (i.e. it can't change and write a document like the XML DOM parser).
 
Somnath Mallick
Ranch Hand
Posts: 483
The best approach is to create a copy of the original file, make your changes there, and then either rename the copy over the original or copy the changes back into the old file.
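A minimal NIO.2 sketch of the copy-then-rename idea, assuming UTF-8 content: the edited text goes to a temporary file in the same directory, which is then moved over the original with Files.move(..., REPLACE_EXISTING). SafeEdit and replaceInFile are hypothetical names, and the edit is a plain String.replace of every occurrence of the target text, so it is not limited to href attributes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeEdit {

    /** Writes the edited content to a temporary copy, then moves the copy over the original. */
    public static void replaceInFile(Path file, String target, String replacement) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String edited = content.replace(target, replacement); // literal replacement of all occurrences
        // Create the temp file next to the original so the final move stays on one file system.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "edit-", ".tmp");
        try {
            Files.write(tmp, edited.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op if the move succeeded; cleans up if writing failed
        }
    }

    public static void main(String[] args) throws IOException {
        Path page = Files.createTempFile("page-", ".html");
        Files.write(page, "<a href=\"http://foo.com\">Foo</a>".getBytes(StandardCharsets.UTF_8));
        replaceInFile(page, "http://foo.com", "http://blah.com");
        System.out.println(new String(Files.readAllBytes(page), StandardCharsets.UTF_8));
        Files.delete(page);
    }
}
```

If anything fails before the move, the original file is left untouched.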
 
Esmaeil Ashrafi
Ranch Hand
Posts: 73
Joe Ess wrote: That's probably the easiest option. Of course, you have to remember the standard issues when editing an existing file.

The Java Swing API has an HTML parser, but it's strictly a one-way operation (i.e. it can't change and write a document like the XML DOM parser).

I think so.
Since writing my first post, I've googled a lot and read through several API docs, mostly from javax.swing.text.html, and today I gave up hope of finding anything useful for changing a particular attribute value of a tag on the fly.


Of course, there are lots of features for parsing and displaying HTML (though most of them seem very sophisticated for an HTML novice like me, except the callback parser...), and the HTMLDocument class also allows some kinds of modification (although I'm not sure whether these modifications apply to the source or not).
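For the callback parser, a minimal read-only sketch: HTMLEditorKit.ParserCallback and ParserDelegator (both part of the JDK's javax.swing.text.html packages) report each start tag together with its attributes, which is enough to find the href values of <a> tags. It does not write anything back, consistent with the point that the Swing parser is one-way. HrefFinder and findHrefs are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HrefFinder {

    /** Collects the href value of every <a> start tag using Swing's callback parser. */
    public static List<String> findHrefs(Reader html) throws IOException {
        final List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared in a <meta> tag instead of throwing ChangedCharSetException
        new ParserDelegator().parse(html, callback, true);
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><a href=\"http://foo.com\">Foo</a> <a href=\"x.html\">X</a></body></html>";
        System.out.println(findHrefs(new StringReader(page)));
    }
}
```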

Somnath Mallick wrote: Best thing is create a copy of the original file and make changes there and then rename the file or copy the changes into the old file.

Copying isn't the problem. Actually, one of my goals in writing this application is to practice with NIO.2 from JDK 7 (so I'll first copy the whole file tree to another location, then change the HTML files).
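Since the plan is to copy the whole tree with NIO.2 first, the copy and the edit can happen in a single Files.walkFileTree pass. A minimal sketch, assuming UTF-8 files and that every file ending in .html or .htm should get a plain text replacement; TreeCopier and copyAndRewrite are hypothetical names, and files already present in the destination are overwritten.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Locale;

public class TreeCopier {

    /** Copies every directory and file under src to dst, rewriting HTML files on the way. */
    public static void copyAndRewrite(final Path src, final Path dst,
                                      final String oldText, final String newText) throws IOException {
        Files.walkFileTree(src, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                // Recreate the directory at the same relative position under dst.
                Files.createDirectories(dst.resolve(src.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Path target = dst.resolve(src.relativize(file));
                String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
                if (name.endsWith(".html") || name.endsWith(".htm")) {
                    // HTML files: write the edited text instead of a byte-for-byte copy.
                    String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    Files.write(target, html.replace(oldText, newText).getBytes(StandardCharsets.UTF_8));
                } else {
                    Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        copyAndRewrite(Paths.get(args[0]), Paths.get(args[1]), "http://foo.com", "http://blah.com");
    }
}
```

Every HTML file is still read in full: without some kind of index, locating an attribute means scanning the text that precedes it.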

The real question is: what is the best-performing way to make the changes?
Essentially, I'm looking for a way to avoid reading through the entire contents of the HTML source file and just jump to the specified tag and attribute...
 
Peter Taucher
Ranch Hand
Posts: 174
I think an XSLT transformation would be the obvious tool for modifying some of the contents of an HTML document. If you need to do it in Java, maybe you could use an existing HTML parser (with JTidy you can work with a DOM tree; at least I've read something about it ;-) ).
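A minimal sketch of the XSLT route using the JDK's built-in javax.xml.transform API. It assumes the input is already well-formed XHTML, because an XML transformer cannot parse typical loose HTML (a cleanup step, for example with a tool like JTidy, would have to come first). The stylesheet is an identity transform plus one template that rewrites href on <a> elements when the value equals the "from" parameter; XsltHrefRewriter and the parameter names from and to are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltHrefRewriter {

    // Identity transform plus an override for href attributes on <a> elements.
    // local-name() lets the match work for XHTML in the default namespace as well.
    private static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "  <xsl:param name='from'/>"
      + "  <xsl:param name='to'/>"
      + "  <xsl:template match='@*|node()'>"
      + "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "  </xsl:template>"
      + "  <xsl:template match=\"*[local-name()='a']/@href\">"
      + "    <xsl:attribute name='href'>"
      + "      <xsl:choose>"
      + "        <xsl:when test='. = $from'><xsl:value-of select='$to'/></xsl:when>"
      + "        <xsl:otherwise><xsl:value-of select='.'/></xsl:otherwise>"
      + "      </xsl:choose>"
      + "    </xsl:attribute>"
      + "  </xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns the document with every <a href="from"> changed to <a href="to">. */
    public static String rewrite(String xhtml, String from, String to) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        t.setParameter("from", from);
        t.setParameter("to", to);
        t.setOutputProperty(OutputKeys.METHOD, "xml"); // avoid the automatic html output method
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(rewrite("<p><a href=\"http://foo.com\">Foo</a></p>", "http://foo.com", "http://blah.com"));
    }
}
```

The comparison sits inside xsl:choose rather than in the match pattern because XSLT 1.0 does not allow variable or parameter references in match patterns.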
 