JavaRanch » Java Forums » Java » Java in General
HTML to HTML conversion using Java

Rahul Divedi
Ranch Hand

Joined: Dec 11, 2011
Posts: 40
Hi,
I have to convert one HTML file into another through a Java program (or some other efficient method, if there is one). The two HTML files differ slightly in structure. Can anyone please help me with this?

Source HTML:



This is the desired HTML:



Thanking you!
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61433

What problems are you having?


Rahul Divedi
Ranch Hand

Joined: Dec 11, 2011
Posts: 40
Bear Bibeault wrote:What problems are you having?


Hi, actually I tried to convert it using XSLT, but that didn't work. So I want to do the conversion with a Java program instead, but I don't know how. I need some ideas on how to include and exclude HTML elements through a Java program. Thanking you!
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61433

So you had problems with the XSLT approach and you just abandoned it rather than seeking help on why it "didn't work"?
Rahul Divedi
Ranch Hand

Joined: Dec 11, 2011
Posts: 40
Bear Bibeault wrote:So you had problems with the XSLT approach and you just abandoned it rather than seeking help on why it "didn't work"?


No, I didn't abandon it abruptly. I have two packages, A and B (consisting of .xml, .html, .js, .xsd, .gif files etc.), and my task is to convert package B to A. I converted the XML files successfully using XSLT and then copied the necessary files from B to A using a Java program. The third part is to adjust the structure of the HTML files (including the JavaScript files I copied from B, plus some other differences). I invested a lot of time trying to do this with XSLT, but it's not working out. I also asked in XSLT forums, with no luck. So I want to do it through a Java program, but I have no idea how to implement it. If you can, please help me with this.
Thanking you!
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008

Rahul Divedi wrote:I have to convert from one HTML to another HTML through a java program (or if there is any other efficient method). Both the HTML files differ little bit in structure. Can anyone please help me out in this matter?

"Differ a little bit" doesn't really help much. How do they differ? Exactly.

Winston


Rahul Divedi
Ranch Hand

Joined: Dec 11, 2011
Posts: 40
Winston Gutkowski wrote:
Rahul Divedi wrote:I have to convert from one HTML to another HTML through a java program (or if there is any other efficient method). Both the HTML files differ little bit in structure. Can anyone please help me out in this matter?

"Differ a little bit" doesn't really help much. How do they differ? Exactly.

Winston


Hi, thanks for the reply. I posted the source and desired HTML files with my question; please look at both for the complete differences. I have added comments where they differ. The source HTML files include two JavaScript files, common.js and ligot.js, while the desired HTML file includes four: common.js, APIWrapper.js, SCOFunctions.js and calculate.js. Apart from this, there are some onClick events in the source that need to be disabled, and I need to add a submit button at the end.

Thanking you!
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008

Rahul Divedi wrote:Please look at both the files for the complete differences...

No, you tell us. Presumably this isn't the only piece of HTML that needs to be converted, so there must be some rules involved. What are they?

To be honest, if you list them out, it'll probably be a major step in working out a solution.

Winston
Rahul Divedi
Ranch Hand

Joined: Dec 11, 2011
Posts: 40
Winston Gutkowski wrote:
Rahul Divedi wrote:Please look at both the files for the complete differences...

No, you tell us. Presumably this isn't the only piece of HTML that needs to be converted, so there must be some rules involved. What are they?

To be honest, if you list them out, it'll probably be a major step in working out a solution.

Winston


Yes, this is not the only piece of HTML; there are more files, but all of them have the same structure. They contain quiz questions, so the sample code includes only one question, but there can be many. The first difference between the source and desired HTML files is the JavaScript files. I have copied from package B to A the three JavaScript files that were missing there; those 3 scripts should now be included in the desired HTML file, and one script from the source file has to be excluded. This holds for all the HTML files, regardless of the number of questions in the quiz.
The second difference is that the desired body tag should look like <body onload="loadPage()" onunload="unloadPage()">, while in the source it is just <body>.
The third difference is that the desired file has a form tag, <form name="quizForm8" id="quizForm8" action="javascript:calcScore2();">, which is not in the source (the action attribute is important).
The fourth is <input type="radio" name="option1b12" id="true1b12" onclick="getFeedback(0,2,'1b12','truefalse')"/>: here I need to eliminate the onclick event.
The last one is that the desired file should have a submit button: <input type="submit" name="submitB" value="SUBMIT ANSWERS"/></div></div></form>
These rules are the same for all the HTML files. I read in a technical paper that this can be done with a Java program, but it didn't give any idea of how to do it.
The number of questions and the questions themselves are the same in both the source and desired HTML. The only difference is the behaviour: in the source file, clicking True or False immediately tells me whether the answer is correct or wrong, while the desired file calculates the score and reports it when I click the submit button.
Thanking you!
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18669

Well, fortunately your HTML is actually XHTML (at least the first couple of lines claim it is), so you should be able to use XSLT.

If your goal is a transformation which leaves the document basically the same except for a few modifications, then you should start with the identity transformation and add some templates which make those modifications. Take your requirement to change the <body> element, for example: you need a template which handles a <body> element by writing out a revised <body> element and calling xsl:apply-templates for its contents.
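A minimal sketch of that idea, run from Java with the JDK's built-in javax.xml.transform API: the stylesheet is the identity transformation plus one template that rewrites <body> (rule 2). The class name BodyRewrite and the prefix h are illustrative choices, not anything from the thread, and the stylesheet is kept in a string only so the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BodyRewrite {

    // Identity transformation plus one overriding template for <body>.
    // The XHTML namespace is bound to the (hypothetical) prefix "h" so the match fires.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'>"
      // Identity template: copy every node and attribute unchanged
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Override for <body>: copy it, keep its attributes, add the two handlers, recurse
      + "<xsl:template match='h:body'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<xsl:attribute name='onload'>loadPage()</xsl:attribute>"
      + "<xsl:attribute name='onunload'>unloadPage()</xsl:attribute>"
      + "<xsl:apply-templates select='node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xhtml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String in = "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Q1</p></body></html>";
        System.out.println(transform(in));
    }
}
```

Because the body template has a more specific pattern than `@*|node()`, it takes priority over the identity template; everything else is copied through untouched.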

And don't forget that all of your elements are in a namespace. Don't make the mistake of treating them as if they aren't. Declare the "http://www.w3.org/1999/xhtml" namespace with a prefix and use that prefix in your XSLT.
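The same namespace point applies if the file is processed with a plain Java API instead of XSLT. As a sketch (class name StripOnclick is hypothetical), here is rule 4 done with the JDK's DOM parser: elements are looked up by namespace URI plus local name, which only works when the factory is made namespace-aware.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StripOnclick {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Removes onclick from every XHTML <input> and serializes the result.
    static String strip(String xhtml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // false by default; without it the NS lookup below finds nothing
        Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));

        // Matches <input> only in the XHTML namespace; a lookup with a null
        // namespace URI would return an empty list for these elements.
        NodeList inputs = doc.getElementsByTagNameNS(XHTML_NS, "input");
        for (int i = 0; i < inputs.getLength(); i++) {
            ((Element) inputs.item(i)).removeAttribute("onclick");
        }

        // Identity Transformer: writes the modified DOM back out as XML text
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String in = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
            + "<input type='radio' name='option1b12' id='true1b12'"
            + " onclick=\"getFeedback(0,2,'1b12','truefalse')\"/>"
            + "</body></html>";
        System.out.println(strip(in));
    }
}
```

One caveat for real files: if they start with the XHTML DOCTYPE, the parser will normally try to fetch the DTD from w3.org, so offline runs may need that behaviour turned off or an EntityResolver supplied.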
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8008

Rahul Divedi wrote:Yes this is not the only piece of HTML, there are more files but all of them have same structure. These are the files containing some quiz questions, so in the sample code I have included only one question while there can be many. First difference between the source and desired HTML files is the 'Javascripts'. I have copied those Javascripts from package B to A, which were not present there and now those 3 Javascripts should be included in the desired HTML file and one javascript has to be excluded from the source file. This remains true for all the HTML files and not affected by the number of questions in the quiz.
Second difference is the desired HTML Body tag should be like <body onload="loadPage()" onunload="unloadPage()"> while in the source it is like <body>.
Third difference is there is a Form tag <form name="quizForm8" id="quizForm8" action="javascript:calcScore2();"> in the desired one, which is not there in the source (action event is important).
Fourth one is <input type="radio" name="option1b12" id="true1b12" onclick="getFeedback(0,2,'1b12','truefalse')"/> , here I need to eliminate this onclick event.
Last one is that there should be a submit button in the desired file. <input type="submit" name="submitB" value="SUBMIT ANSWERS"/></div></div></form>
These rules remain same for all the HTML files.

Right. Now we have something to work with. Personally, I'd look at a solution based on a simple parser like SAX. That way you can look for tags by name and simply replace the attributes or contents as you need. The alternative (which I wouldn't recommend) is simply to treat the whole thing as a big String edit: read the file into a big String, replace whatever contents you need, and then write it out again.
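One way to sketch the SAX approach, assuming the files parse as XHTML: an `XMLFilterImpl` sits between the parser and an identity `Transformer`, rewriting attributes as each start tag streams past (rules 2 and 4 here). QuizFilter is a hypothetical name; the filter/SAXSource plumbing is standard JAXP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class QuizFilter extends XMLFilterImpl {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    QuizFilter(XMLReader parent) { super(parent); }

    // Called for every start tag; edits a copy of its attributes, then passes it on.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl a = new AttributesImpl(atts);
        if (XHTML_NS.equals(uri) && "body".equals(localName)) {
            a.addAttribute("", "onload", "onload", "CDATA", "loadPage()");
            a.addAttribute("", "onunload", "onunload", "CDATA", "unloadPage()");
        } else if (XHTML_NS.equals(uri) && "input".equals(localName)) {
            int i = a.getIndex("onclick"); // lookup by qualified name
            if (i >= 0) a.removeAttribute(i);
        }
        super.startElement(uri, localName, qName, a);
    }

    static String rewrite(String xhtml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();
        // The identity Transformer pulls events through the filter and serializes them.
        SAXSource source = new SAXSource(new QuizFilter(parser),
                                         new InputSource(new StringReader(xhtml)));
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
            .transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rewrite("<html xmlns='http://www.w3.org/1999/xhtml'><body>"
            + "<input type='radio' onclick='x()'/></body></html>"));
    }
}
```

Since the filter keys on element names rather than on exact text, it keeps working if attribute order or whitespace varies between files.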

The first will take a bit more work (including possible conversion to XHTML), but it's a lot more generic, so if you have any files with the same basic content but a slightly different format (different tag order perhaps), the changes should still work. It's also good practice.
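For comparison, a sketch of the big-String alternative (class name StringEdit is hypothetical). It needs no parser, but it only handles markup that looks exactly like the patterns: a `<body>` that already carries any attribute is silently left alone.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StringEdit {

    // Plain text substitution with no parsing: it works only while the markup
    // matches these patterns character for character.
    static String rewrite(String html) {
        return html
            // Rule 2: only matches a <body> tag with no attributes at all
            .replace("<body>", "<body onload=\"loadPage()\" onunload=\"unloadPage()\">")
            // Rule 4: drop any double-quoted onclick attribute, case-insensitively
            .replaceAll("(?i)\\s+onclick=\"[^\"]*\"", "");
    }

    // Usage: java StringEdit in.html out.html
    public static void main(String[] args) throws IOException {
        Path in = Path.of(args[0]);
        Path out = Path.of(args[1]);
        String html = Files.readString(in, StandardCharsets.UTF_8);
        Files.writeString(out, rewrite(html), StandardCharsets.UTF_8);
    }
}
```

Note the regex removes onclick from every element, not just <input>, which is exactly the kind of over-reach that makes this approach fragile.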

Winston
Rahul Divedi
Ranch Hand

Joined: Dec 11, 2011
Posts: 40
Paul Clapham wrote:Well, fortunately your HTML is actually XHTML (at least the first couple of lines claim it is), so you should be able to use XSLT.

If your goal is a transformation which leaves the document basically the same except for a few modifications, then you should start with the identity transformation and add some templates which make those modifications. Take your requirement to change the <body> element, for example: you need a template which handles a <body> element by writing out a revised <body> element and calling xsl:apply-templates for its contents.

And don't forget that all of your elements are in a namespace. Don't make the mistake of treating them as if they aren't. Declare the "http://www.w3.org/1999/xhtml" namespace with a prefix and use that prefix in your XSLT.


Thanks a lot for your suggestions. I'll try in the way as you have suggested. This really helps. :)
Rahul Divedi
Ranch Hand

Joined: Dec 11, 2011
Posts: 40
Winston Gutkowski wrote:
Right. Now we have something to work with. Personally, I'd look at a solution based on a simple parser like Sax. That way you can look for tags by name and simply replace the attributes or contents as you need. The alternative (which I wouldn't recommend), is simply to treat the whole thing as a big String edit: read the file into a big String, replace whatever contents you need, and then write it out again.

The first will take a bit more work (including possible conversion to XHTML), but it's a lot more generic, so if you have any files with the same basic content but slightly different format (different tag order perhaps), the changes should still work. It's also good practise.

Winston


Thanks a lot. I'll try it using SAX and XSLT as suggested. If nothing works out, I'll go for the second option you mentioned.
Rahul Divedi
Ranch Hand

Joined: Dec 11, 2011
Posts: 40
Paul Clapham wrote:Well, fortunately your HTML is actually XHTML (at least the first couple of lines claim it is), so you should be able to use XSLT.

If your goal is a transformation which leaves the document basically the same except for a few modifications, then you should start with the identity transformation and add some templates which make those modifications. Take your requirement to change the <body> element, for example: you need a template which handles a <body> element by writing out a revised <body> element and calling xsl:apply-templates for its contents.

And don't forget that all of your elements are in a namespace. Don't make the mistake of treating them as if they aren't. Declare the "http://www.w3.org/1999/xhtml" namespace with a prefix and use that prefix in your XSLT.


Hi, I have tried it this way and I'm able to do one part, i.e. eliminating the 'onClick' event from the input tag, but the rest isn't working. Please look at my XSLT and suggest modifications. I tried to add a script tag and an input tag and to modify the body tag, but that didn't work.



I also tried another way: I created an identity.xsl file that copies the entire source file, then created a new XSLT that includes the identity file and makes the modifications, but the same thing happened.
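The stylesheet itself isn't shown, so the actual cause can't be pinned down. One frequent cause, in line with Paul's namespace warning, is that literal <script>, <form> and <input> elements in a stylesheet with no default namespace are created outside the XHTML namespace. Below is a sketch of the import-the-identity layout covering all five rules, run from Java. Assumptions: scripts are recognised by a substring of @src, the form wraps the whole body content (the desired file wasn't shown), and a URIResolver serves "identity.xsl" from a string instead of from disk. QuizStylesheet is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class QuizStylesheet {

    // Stand-in for a separate identity.xsl file: copies everything unchanged.
    static final String IDENTITY =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template></xsl:stylesheet>";

    // Default namespace = XHTML, so the literal elements below land in XHTML.
    static final String QUIZ =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml' xmlns='http://www.w3.org/1999/xhtml'"
      + " exclude-result-prefixes='h'>"
      + "<xsl:import href='identity.xsl'/>"
      // Rule 1: drop ligot.js; keep common.js and add the three new scripts after it
      + "<xsl:template match=\"h:script[contains(@src, 'ligot.js')]\"/>"
      + "<xsl:template match=\"h:script[contains(@src, 'common.js')]\">"
      + "<xsl:copy-of select='.'/>"
      + "<script type='text/javascript' src='APIWrapper.js'></script>"
      + "<script type='text/javascript' src='SCOFunctions.js'></script>"
      + "<script type='text/javascript' src='calculate.js'></script>"
      + "</xsl:template>"
      // Rules 2, 3 and 5: body handlers, wrapping form, submit button at the end
      + "<xsl:template match='h:body'><xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<xsl:attribute name='onload'>loadPage()</xsl:attribute>"
      + "<xsl:attribute name='onunload'>unloadPage()</xsl:attribute>"
      + "<form name='quizForm8' id='quizForm8' action='javascript:calcScore2();'>"
      + "<xsl:apply-templates select='node()'/>"
      + "<input type='submit' name='submitB' value='SUBMIT ANSWERS'/>"
      + "</form></xsl:copy></xsl:template>"
      // Rule 4: an empty template drops onclick on <input>
      + "<xsl:template match='h:input/@onclick'/>"
      + "</xsl:stylesheet>";

    static String transform(String xhtml) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // Resolve href='identity.xsl' to the string above; null means default lookup.
        tf.setURIResolver((href, base) ->
            "identity.xsl".equals(href) ? new StreamSource(new StringReader(IDENTITY)) : null);
        Transformer t = tf.newTransformer(new StreamSource(new StringReader(QUIZ)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<html xmlns='http://www.w3.org/1999/xhtml'><head>"
            + "<script type='text/javascript' src='common.js'></script></head>"
            + "<body><p>Q1</p></body></html>"));
    }
}
```

With xsl:import the overriding templates win on import precedence; with xsl:include they would still win here on default priority (0 or 0.5 against -0.5 for `@*|node()`). The XML output method will write the new scripts as `<script .../>`, which some browsers mishandle when the page is served as text/html; giving each one non-empty content (for example `<xsl:text> </xsl:text>`) keeps an explicit close tag.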

Thanking you!