passing Invalid characters within XML tags

Glen Tippetts
Greenhorn

Joined: Jun 14, 2001
Posts: 10
I have begun to incorporate XML export of bookmarks from my bookmark manager program.
The problem I have is that I need to place bookmark URLs between XML tags on export, but the URLs may contain invalid characters like the & character and others.
Can anyone tell me how to pass this information through a parser?
XML fragment:
<bookmarks>
<page name="XML">
<button number="23" title="Brainbench - Web Design Tests">
<url type="DNS">http://www.brainbench.com/xml/bb/common/testcenter/subcatresults.xml?cat1=9&cat2=31&cat3=22</url>
<ip autoupdate="">64.14.126.119</ip>
<notes>Web design tests</notes>
<icon path="C:\AcqURL\ORCOMwebicons\brainbench_com_xml_bb_common_testcenter_subcatresults_xml_cat1=9&cat2=31&cat3=22.ico" index="600" />
</button>
</page>
</bookmarks>
XSL:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<html>
<body>
<table border="0" cellpadding="2">
<xsl:for-each select="bookmarks/page">
<tr>
<th><xsl:value-of select="@name"/></th>
<xsl:for-each select="button">
<tr bgcolor="{color/@button}">
<td width="200">
<a href="{url}" title="{notes}">
<font color="{color/@text}">
<xsl:value-of select="@title"/>
</font>
</a>
</td>
</tr>
</xsl:for-each>
<td> </td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
DTD:
<!--DTD for validating bookmarks-->
<!ELEMENT bookmarks (page)+>
<!ELEMENT page (button)*>
<!ATTLIST page
name CDATA #REQUIRED
>
<!ELEMENT button (url, ip, notes, user_id, site_password, extra_info, icon, color, site_update)?>
<!ATTLIST button
number CDATA #REQUIRED
title CDATA #IMPLIED
>
<!ELEMENT url (#PCDATA)>
<!ATTLIST url
type (DNS | IP | Header | FTP | File | Dir) #REQUIRED
>
<!ELEMENT ip (#PCDATA)>
<!ATTLIST ip
autoupdate (0 | 1 | 2 | 4) "0"
>
<!ELEMENT notes (#PCDATA)>
<!ELEMENT user_id (#PCDATA)>
<!ELEMENT site_password (#PCDATA)>
<!ELEMENT extra_info (#PCDATA)>
<!ELEMENT icon EMPTY>
<!ATTLIST icon
path CDATA #IMPLIED
index CDATA #IMPLIED
>
<!ELEMENT color EMPTY>
<!ATTLIST color
button CDATA #IMPLIED
text CDATA #IMPLIED
>
<!ELEMENT site_update (#PCDATA)>
Any help would be appreciated.

------------------
AcqURL : The next evolution in bookmark management.


IBM Certified Developer - XML and Related Technologies, V1.
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
You can use entity reference "&amp;" instead of "&".
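A minimal sketch of what that could look like on the export side, assuming the bookmark program can run each value through a helper before writing it between tags. The class and method names (`XmlEscapeSketch`, `escapeXml`) are made up for illustration and are not part of any library. It covers `&` and the other four characters that have predefined entities in XML:

```java
final class XmlEscapeSketch {

    /** Replaces the five characters that have predefined XML entities. */
    static String escapeXml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // URL taken from the XML fragment in the question.
        String url = "http://www.brainbench.com/xml/bb/common/testcenter/"
                + "subcatresults.xml?cat1=9&cat2=31&cat3=22";
        System.out.println("<url type=\"DNS\">" + escapeXml(url) + "</url>");
    }
}
```

The same helper would apply to attribute values such as the icon path, since a raw `&` is not allowed there either.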


Uncontrolled vocabularies
"I try my best to make *all* my posts nice, even when I feel upset" -- Philippe Maquet
Glen Tippetts
Greenhorn

Joined: Jun 14, 2001
Posts: 10
Thanks for the reply Map.
I probably needed to be more clear about the situation and the question however. I need some way to "escape" the text within the <url> tags because it will need to remain intact and exact for the resulting link to work.
When the XML data is translated to HTML, the data in the <url> tag is not displayed, but rather becomes the underlying hyperlink.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Map's answer is exactly what you need. When you parse the XML, the parser will convert the "&amp;" back to "&" for you, so the URL will be restored to its original form. E.g. the characters(char[], int, int) method in org.xml.sax.helpers.DefaultHandler is passed an array of parsed characters. Likewise getData() in org.w3c.dom.CharacterData returns a String of parsed characters. The conversion is done for you.
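To see the round trip for yourself, here is a small sketch using the JDK's built-in DOM parser. The class and method names (`RoundTripSketch`, `parseUrl`) are invented for this example, and it reads the text with `getTextContent()` rather than `getData()` so it does not depend on how the parser splits text nodes:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class RoundTripSketch {

    /** Parses a one-element document and returns the element's text as the application sees it. */
    static String parseUrl(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // The file on disk holds "&amp;"; the string handed back holds a plain "&".
        String xml = "<url type=\"DNS\">http://www.brainbench.com/xml/bb/common/testcenter/"
                + "subcatresults.xml?cat1=9&amp;cat2=31&amp;cat3=22</url>";
        System.out.println(parseUrl(xml));
    }
}
```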


"I'm not back." - Bill Harding, Twister
Glen Tippetts
Greenhorn

Joined: Jun 14, 2001
Posts: 10
Yes, I agree Map's answer is correct. Unfortunately my question didn't explain the depth of the situation as to why I wanted the text within the tag to be passed directly through the parser.
The user may have thousands of bookmarks, and the & character may not be the only special character I have to deal with. Beyond the <url> tag, the <notes> tag may also contain any character the user desires. My program grabs URL and site description information from the browser which may even have different character encoding. The users can also modify the text in whatever way they choose.
Also, if I can pass the values directly through, I do not have to slow down the output of the program by testing each line of bookmark information before output to XML.
I tried wrapping the data within the tags in <![CDATA[var]]> tags but the parser still choked on an apostrophe and a registered trademark symbol within the CDATA tags.
I also don't think I can use disable-output-escaping with the { } that I need to use within the HREF=" " in the XSL.
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
Glen, if you need to use "disable-output-escaping", you can use <xsl:attribute> construct instead of {}. Then your <a href="..."> tag will be coded as:
<a>
<xsl:attribute name="href">
<xsl:value-of select="url" disable-output-escaping="yes"/>
</xsl:attribute>
this is a link
</a>
but the code above works fine on CDATA sections without "disable-output-escaping", as long as the output method is set to html, either explicitly,
<xsl:output method="html"/>
or implicitly, if the first non-xsl element in your stylesheet is an HTML tag.
Does it help?
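For anyone who wants to try this outside XT or XMLSpy, here is a rough sketch that runs a cut-down version of the stylesheet through the JDK's `javax.xml.transform` API. The class name `HrefTransformSketch`, the `toHtml` helper and the shortened stylesheet are assumptions made for the example; only the `<xsl:attribute>` construct and the `html` output method come from the post above. Expect the serializer to write the `&` inside the `href` attribute as `&amp;`, which is normal in HTML and is decoded by the browser when the link is followed:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class HrefTransformSketch {

    // Shortened stylesheet: one link per button, href built with xsl:attribute.
    static final String XSL =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<html><body><xsl:for-each select='bookmarks/page/button'>"
        + "<a><xsl:attribute name='href'><xsl:value-of select='url'/></xsl:attribute>"
        + "<xsl:value-of select='@title'/></a>"
        + "</xsl:for-each></body></html>"
        + "</xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML and returns the HTML as a string. */
    static String toHtml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The URL sits in a CDATA section, so the raw '&' is legal in the source file.
        String xml = "<bookmarks><page name='XML'><button number='23' title='Tests'>"
                + "<url type='DNS'><![CDATA[http://example.com/?cat1=9&cat2=31]]></url>"
                + "</button></page></bookmarks>";
        System.out.println(toHtml(xml));
    }
}
```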

[This message has been edited by Mapraputa Is (edited July 03, 2001).]
Glen Tippetts
Greenhorn

Joined: Jun 14, 2001
Posts: 10
Map, thank you very much for your patience and advice.
I had used xsl:attribute when I was practicing for my test, but I didn't even think of it for this situation.
I assumed that <![CDATA[var]]> on the XML side was equivalent to using disable-output-escaping="yes" on the XSL side. Unfortunately neither solved my problem.
It seems as though the parser (I have tested both XT and XMLSpy) is choking on characters it does not recognize, even when they are wrapped in the <![CDATA[ ]]> tag.
For example, I have a site where the URL is valid but the Description META tag (this is where I capture my <notes> from) contains invalid characters.
The URL is: http://itmanagement.earthweb.com/staff/carstrat/article/0,,12208_601321,00.html
The Description META tag contains:
Combine certification of your skills with a bit of experience and you’ll have a resume combination that’s hard to beat.
When the parser sees the character it stops, even if the character is wrapped in the <![CDATA[ ]]> tag. If I don't wrap it in the <![CDATA[ ]]> tag, then the parser sees the XML as not well-formed.
Any ideas on what I should try next?
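One possibility worth ruling out, and it is only a guess since the thread never shows the first line of the exported file: the XML fragment above has no encoding declaration, so a parser will read it as UTF-8. If the program writes the text in a single-byte code page, a curly apostrophe or a registered trademark symbol becomes a byte sequence that is not valid UTF-8, and CDATA does not help because the bytes are decoded before any markup is recognized. The sketch below illustrates that behaviour with the JDK's DOM parser; the class name `EncodingSketch`, the helper `parseNotes` and the sample text are assumptions made for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EncodingSketch {

    /** Returns the root element's text, or null if the parser rejects the bytes. */
    static String parseNotes(byte[] xmlBytes) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new ByteArrayInputStream(xmlBytes))
                    .getDocumentElement().getTextContent();
        } catch (SAXException | IOException e) {
            return null; // malformed bytes or markup
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00AE is the registered trademark symbol; in ISO-8859-1 it is the single byte 0xAE.
        String body = "<notes><![CDATA[Brainbench\u00AE tests]]></notes>";

        // Same bytes, with and without a declaration naming the encoding actually used.
        byte[] undeclared = body.getBytes(StandardCharsets.ISO_8859_1);
        byte[] declared = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("undeclared parses: " + (parseNotes(undeclared) != null));
        System.out.println("declared parses:   " + (parseNotes(declared) != null));
    }
}
```

If this turns out to be the cause, the two usual remedies are to declare the encoding the program really writes, or to convert the captured text to UTF-8 before export. Text captured from pages in other encodings would need converting either way.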
 