
Need suggestions to read the value of some HTML tags from its source

Esmaeil Ashrafi
Ranch Hand

Joined: Feb 22, 2010
Posts: 73
Hi,
I'm writing a simple application that should read the source of an HTML file and change the value of some tags.
For instance:
change the value http://foo.com in the tag below

to the value http://blah.com

Actually, this is the only modification I want to make.

One basic way would be to open a character stream and read the source until I encounter the <a href, and then do the modification, but I don't think that's a good approach.

I'm totally unfamiliar with HTML and am currently searching the javax.swing.text.html package to see if there is a better way to jump to the specified tag and do the modification...

So, any suggestion or direction would be appreciated...
Thanks in advance


Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8996
    

Esmaeil Ashrafi wrote:One basic way would be to open a character stream and read the source until I encounter the <a href, and then do the modification, but I don't think that's a good approach.


That's probably the easiest option. Of course, you have to remember the standard issues when editing an existing file.
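A minimal sketch of that stream approach, assuming the attribute is written exactly as href="..." with double quotes and does not span a line break. The class and method names (HrefRewriter, rewrite) are made up for illustration; real-world HTML with single quotes, different casing or extra whitespace would need a more tolerant match.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper, not part of any library.
class HrefRewriter {
    // Copies text from 'in' to 'out' line by line, replacing every
    // href="oldUrl" with href="newUrl". Only the double-quoted form
    // on a single line is matched (an assumption of this sketch).
    static void rewrite(Reader in, Writer out, String oldUrl, String newUrl) throws IOException {
        String target = "href=\"" + oldUrl + "\"";
        String replacement = "href=\"" + newUrl + "\"";
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(line.replace(target, replacement));
            writer.newLine(); // writes the platform line separator
        }
        writer.flush();
    }
}
```

Because it works on any Reader and Writer, the same method can be fed a FileReader and FileWriter pointing at two different files, which sidesteps the problem of overwriting a file while it is still being read.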

The Java Swing API has an HTML parser, but it's strictly a one-way operation (i.e. it can't change and write a document like the XML DOM parser).
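To illustrate the one-way nature of the Swing parser, here is a rough sketch that only reads the href values of anchor tags through a parser callback. HrefLister and listHrefs are illustrative names; the sketch reports what it finds and has no way to write a changed document back.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the Swing API.
class HrefLister {
    // Parses the HTML and collects the href attribute of every <a> start tag.
    static List<String> listHrefs(Reader html) throws IOException {
        final List<String> hrefs = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        // 'true' tells the parser to ignore any charset declared in the document.
        new ParserDelegator().parse(html, callback, true);
        return hrefs;
    }
}
```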


Somnath Mallick
Ranch Hand

Joined: Mar 04, 2009
Posts: 477
The best thing is to create a copy of the original file, make the changes there, and then either rename the copy or copy the changes back into the old file.
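One way this could look with the JDK 7 file API, as a sketch only: the changed content goes into a temporary file next to the original, and the temporary file is then moved over the original. SafeFileEdit and replaceInFile are hypothetical names, UTF-8 is an assumed encoding, and the whole file is read into memory, so this suits small files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
class SafeFileEdit {
    // Replaces oldText with newText in 'file' by writing a temporary copy
    // in the same directory and then moving it over the original.
    static void replaceInFile(Path file, String oldText, String newText) throws IOException {
        // Assumption: the file is UTF-8 encoded.
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "edit", ".tmp");
        try {
            Files.write(temp, content.replace(oldText, newText).getBytes(StandardCharsets.UTF_8));
            Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if something failed.
            Files.deleteIfExists(temp);
        }
    }
}
```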
Esmaeil Ashrafi
Ranch Hand

Joined: Feb 22, 2010
Posts: 73
Joe Ess wrote:That's probably the easiest option. Of course, you have to remember the standard issues when editing an existing file.

The Java Swing API has an HTML parser, but it's strictly a one-way operation (i.e. it can't change and write a document like the XML DOM parser).

I think so too.
Since writing the first post I have googled a lot and read several API docs, mostly from javax.swing.text.html, and today I gave up hope of finding something useful for changing a particular attribute value of a tag on the fly.


Of course there are lots of features for parsing and displaying (although most seem very sophisticated for an HTML novice like me, except the callback parser...), and the HTMLDocument class also offers some kinds of modification (although I'm not sure whether these modifications are applied to the source or not).

Somnath Mallick wrote:Best thing is create a copy of the original file and make changes there and then rename the file or copy the changes into the old file.

Copying is not the issue. Actually, one goal of writing this application, for me, is to practice NIO.2 from JDK 7... (so I will first copy the whole file tree to another place and then change the HTML files there).
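For the tree-copy step, a sketch along these lines with Files.walkFileTree might do. TreeCopier and copyTree are made-up names, and the sketch assumes plain files and directories (it makes no attempt to handle symbolic links or to preserve attributes).

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of any library.
class TreeCopier {
    // Recursively copies the directory 'source' to 'target'.
    static void copyTree(final Path source, final Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                // Recreate each directory under the target before its files are visited.
                Files.createDirectories(target.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.copy(file, target.resolve(source.relativize(file)),
                        StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

The visitFile callback is also a natural place to pick out the files ending in .html and rewrite those instead of copying them unchanged.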

The issue is finding the best-performing way to make the changes.
Essentially, I'm looking for a way to avoid reading through the entire contents of the HTML source file, and instead jump straight to the specified tag and attribute...
Peter Taucher
Ranch Hand

Joined: Nov 18, 2006
Posts: 174
I think an XSLT transformation would be the obvious tool to modify some contents of an HTML document. If you need to do it in Java, maybe you could use an existing HTML parser (with JTidy you could work with a DOM tree, at least I've read something about it ;-) ).
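A rough sketch of the XSLT idea using only the JDK's javax.xml.transform API. It assumes the input is already well-formed XML (for example XHTML, or HTML cleaned up beforehand) and that the elements are not in a namespace; ordinary tag-soup HTML would fail to parse. HrefXslt and rewrite are illustrative names, and the stylesheet is an identity transform with one extra template for a/@href.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of any library.
class HrefXslt {
    // Identity transform that copies everything, except that an href on an
    // <a> element equal to $oldUrl is rewritten to $newUrl.
    private static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:param name='oldUrl'/>"
      + "<xsl:param name='newUrl'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='a/@href'>"
      + "<xsl:attribute name='href'>"
      + "<xsl:choose>"
      + "<xsl:when test='. = $oldUrl'><xsl:value-of select='$newUrl'/></xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='.'/></xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:attribute>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Assumption: 'xhtml' is well-formed XML without a default namespace.
    static String rewrite(String xhtml, String oldUrl, String newUrl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        transformer.setParameter("oldUrl", oldUrl);
        transformer.setParameter("newUrl", newUrl);
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }
}
```

Note that this parses the whole document, so it trades speed for correctness compared with a plain text replacement.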


 