
Why is dom4j Node.getText() escaping HTML tag content from an XML string in Java?

Komari raj
Ranch Hand

Joined: Dec 12, 2008
Posts: 43
Hi all,
I am trying to retrieve the content of the TEXT node using dom4j Node.getText(), but it is escaping the HTML bold tags, including their content, as shown below. Please help me get the content, including the HTML tags, of the TEXT node under the SECTION parent node.






Output: the HTML bold tag and the content inside it are missing.


sectioncontent..terms1
textcontent..By signing below, I agree that Terms and Conditionson the reverse side of this page will apply to the service of theproduct identified above;;as loss of data may occur as a result of theservice, it is my responsibility to make a backup copy of my databefore bringing my product to XXX for service;
sectioncontent..Conditions2
textcontent..1 year warranty
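The missing text is consistent with how dom4j documents Element.getText(): it concatenates only the element's own text, CDATA and entity children, without recursing into child elements such as <b>, so the bold markup and everything inside it are skipped rather than escaped. The sketch below reproduces the symptom under stated assumptions: dom4j is on the classpath, and the sample XML, class name and method name are hypothetical stand-ins, since the original code and XML are not shown.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class GetTextDemo {

    // Hypothetical, shortened stand-in for the SECTION/TEXT XML in the question.
    static final String XML =
        "<SECTION><TEXT>I agree;<b>XXX is not responsible;</b>as loss of data may occur</TEXT></SECTION>";

    // Parses the XML and returns TEXT's getText(), i.e. only its direct text children.
    static String directText(String xml) {
        try {
            Document doc = DocumentHelper.parseText(xml);
            Element text = doc.getRootElement().element("TEXT");
            return text.getText();
        } catch (DocumentException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // The <b> child element is skipped entirely, tags and content alike.
        // Prints: I agree;as loss of data may occur
        System.out.println(directText(XML));
    }
}
```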



Regards
Raju
g tsuji
Ranch Hand

Joined: Jan 18, 2011
Posts: 544
    

Your use of the word "escaping" is misleading. If I read this correctly, together with what you said:
output: html bold tag content got missed with tag also.

and
textcontent..By signing below, I agree that Terms and Conditionson the reverse side of this page will apply to the service of theproduct identified above;;as loss of data may occur as a result of theservice, it is my responsibility to make a backup copy of my databefore bringing my product to XXX for service;

You seem to mean that you want the part between the literal bold tags to appear in the printout. That is not "escaping" in the usual sense.

To implement correctly what you seem to have in mind, in particular with your use of the getText() method, the item1 string should in fact be what we call a CDATA section. You could do this:
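A minimal sketch of the CDATA approach, assuming dom4j on the classpath; ITEM1 is a hypothetical stand-in for the item1 string from the code that is not shown, and buildXml/textOf are illustrative helper names. Because the fragment sits inside a CDATA section, the parser reports it as character data rather than markup, so getText() returns it with the <b> tags intact.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;

public class CdataDemo {

    // Hypothetical HTML fragment standing in for the thread's item1 string.
    static final String ITEM1 = "I agree;<b>XXX is not responsible;</b>as loss of data may occur";

    // Wraps the fragment in a CDATA section so the parser treats it as text, not markup.
    // Assumes the fragment never contains "]]>", which would end the section early.
    static String buildXml(String fragment) {
        return "<SECTION><TEXT><![CDATA[" + fragment + "]]></TEXT></SECTION>";
    }

    static String textOf(String xml) {
        try {
            Document doc = DocumentHelper.parseText(xml);
            // The CDATA content is a direct child of TEXT, so getText() includes it verbatim.
            return doc.getRootElement().element("TEXT").getText();
        } catch (DocumentException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints the fragment with its <b> tags intact.
        System.out.println(textOf(buildXml(ITEM1)));
    }
}
```

This only helps when your own application builds the XML string; if the markup arrives from elsewhere without CDATA, the <b> element is real markup and getText() will still skip it.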

And now you would get the supposed HTML fragment embedded in the TEXT node as item1, and getText() will include the bold tags. This may or may not be your intention; it is just my interpretation, as the most reasonable reading of what you posted in various places combined. If it is not, please say again what you really want to obtain.
Komari raj
Ranch Hand

Joined: Dec 12, 2008
Posts: 43
Thank you, Suji, for your quick reply.

Actually, my requirement is that the string content below comes from the PeopleSoft side, and the <TEXT></TEXT> element does not contain "<![CDATA[" in its content.


Can you please correct me if I missed anything?


Regards
Raju
g tsuji
Ranch Hand

Joined: Jan 18, 2011
Posts: 544
    
If the whole string that you load and parse is not constructed by your application but is fed to it via some external mechanism, then the "bold" tag is part of the markup of the given document rather than something embedded in a payload. In that case, getText() is not the proper method unless you descend into the deeper levels (the child nodes) of the TEXT element.

You have a utility class MyStringUtil with a static method getText(). I would deduce that, internally, it in turn calls the getText() method of org.dom4j.Element (or of its org.dom4j.Node implementation). As MyStringUtil is not shown, I can only reason by deduction. What you should do is write a similar method which, instead of the getText() method of the dom4j Node API, uses the getStringValue() method.

I can show you directly using dom4j, without making things more obscure by hiding the call inside a method that does not do much more.
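A minimal sketch of the getText()/getStringValue() contrast, calling dom4j directly; it assumes dom4j on the classpath, and the sample XML and the textElement helper are hypothetical. getStringValue() returns the XPath-style string value, i.e. the concatenated text of all descendant nodes, so the bold content comes back but the <b> tags themselves do not.

```java
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class StringValueDemo {

    // Hypothetical sample; the thread's actual XML is not shown.
    static final String XML =
        "<SECTION><TEXT>I agree;<b>XXX is not responsible;</b>as loss of data may occur</TEXT></SECTION>";

    // Parses the XML and returns the TEXT element under the SECTION root.
    static Element textElement(String xml) {
        try {
            return DocumentHelper.parseText(xml).getRootElement().element("TEXT");
        } catch (DocumentException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        Element text = textElement(XML);
        // Direct text children only; the <b> element is skipped.
        // Prints: I agree;as loss of data may occur
        System.out.println(text.getText());
        // Text of all descendants; the <b> content is kept, the tags are not.
        // Prints: I agree;XXX is not responsible;as loss of data may occur
        System.out.println(text.getStringValue());
    }
}
```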
Komari raj
Ranch Hand

Joined: Dec 12, 2008
Posts: 43
Hi Suji, thank you for your support.
I have also tried the code below, as you suggested, but no luck: it gives the whole content except the tags. The bold tags are missing from the content, as shown below.



Output is:

textcontent..
By signing below, I agree that Terms and Conditionson the reverse side of this page will apply to the service of theproduct identified above;XXX is not responsible for any
loss, corruption or breach of the data on my product during service;as loss of data may occur as a result of theservice, it is my responsibility to make a backup copy of my databefore bringing my product to XXX for service;
g tsuji
Ranch Hand

Joined: Jan 18, 2011
Posts: 544
    
I have validated with the below code also what you have suggested but no luck,it is giving total content except tags


That is exactly what my code is intended to output: the content without the tags!

Now it becomes clear what you want to output. If the output is meant to include the tags, no parser or object model would use getText() as the name for that functionality. In any case, if you only want the answer to the title (the "why" of getText()), the documentation of org.dom4j.Node contains it, and in principle I wouldn't need to bother. I wanted to show you the "how", but it requires some effort on your part: fill the gaps by reading the documentation whenever you see something you're not familiar with (such as getStringValue() above or asXML() below).

This is how.
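The code from this reply is not shown. Going by the asXML() hint above and the regular-expression replacement mentioned in the follow-up, one way to keep the tags is sketched below: serialize the TEXT element with asXML() and strip its own start and end tags with a regex. This is a reconstruction under assumptions (dom4j on the classpath, hypothetical sample XML and helper names), not necessarily the code that was posted, and a simple regex sketch rather than a general-purpose XML tool.

```java
import java.util.regex.Pattern;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class InnerXmlDemo {

    // Hypothetical sample; the thread's actual XML is not shown.
    static final String XML =
        "<SECTION><TEXT>I agree;<b>XXX is not responsible;</b>as loss of data may occur</TEXT></SECTION>";

    static Element textElement(String xml) {
        try {
            return DocumentHelper.parseText(xml).getRootElement().element("TEXT");
        } catch (DocumentException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    // Serializes the element with asXML(), then removes its own start and end tags,
    // leaving the child markup (e.g. <b>...</b>) in place.
    static String innerXml(Element element) {
        if (element.nodeCount() == 0) {
            return ""; // asXML() would give a self-closing tag here, which the regexes don't cover
        }
        String name = Pattern.quote(element.getQualifiedName());
        return element.asXML()
            .replaceFirst("^<" + name + "(\\s[^>]*)?>", "")
            .replaceFirst("</" + name + ">$", "");
    }

    public static void main(String[] args) {
        // Prints: I agree;<b>XXX is not responsible;</b>as loss of data may occur
        System.out.println(innerXml(textElement(XML)));
    }
}
```

Serializing with asXML() rather than reading text values is what keeps the tags: it writes the element's markup back out, and only the outer TEXT wrapper then needs removing.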
Komari raj
Ranch Hand

Joined: Dec 12, 2008
Posts: 43
Hi Suji,
Thanks a lot for your continued support; it is now working as expected with your suggested code (the string content matched by the regular expression is replaced with an empty string).



Regards
Raju
 