
Creating Document with Char Entities in JAXP

 
David Patterson
Ranch Hand
Posts: 65
I'm trying to use JAXP to create an XML document that contains some character entities (such as &#x0283;, that is ampersand, pound, x, 0, 2, 8, 3) in the content of some elements.
What winds up in the output document is
&amp;#x0283; (the ampersand entity, then pound, x, 0, 2, 8, 3).
Also, the serializer writes the attribute
encoding="UTF-8" into the XML declaration at the top of the file.
So, a series of questions:
1) Would all these problems go away if I used UTF-16?
2) How do I specify the encoding to use?
3) Do I need to add each of these character entities separately? Right now they are part of a string (htmlKeyWord) that I add with
    elemKeyWord.appendChild(doc.createTextNode(htmlKeyWord));
That would mean adding text nodes for the ordinary text and entity references (or something else) for the character entities.
Or is there some other way to build this kind of document?
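A minimal sketch of one possible answer to question 3, assuming every reference inside htmlKeyWord is numeric (&#x...; or &#...;): decode the references into the actual Unicode characters before calling createTextNode, and let the serializer decide how to write them. The helper decodeNumericRefs and the element name keyword are hypothetical illustrations, not part of JAXP; only the DocumentBuilderFactory, Transformer and OutputKeys calls are standard API.

```java
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class KeyWordDoc {
    // Numeric character references: &#x0283; (hexadecimal) or &#643; (decimal).
    private static final Pattern NUM_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Hypothetical helper: swaps each numeric reference for the character it names,
    // so the DOM holds real characters rather than markup-looking text.
    static String decodeNumericRefs(String s) {
        Matcher m = NUM_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            String ch = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Builds <keyword>...</keyword> (illustrative element name) and serializes it.
    static String buildXml(String htmlKeyWord) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element elemKeyWord = doc.createElement("keyword");
        doc.appendChild(elemKeyWord);
        elemKeyWord.appendChild(doc.createTextNode(decodeNumericRefs(htmlKeyWord)));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        // The esh is written as a plain character (default UTF-8), with no &amp;.
        System.out.println(buildXml("esh: &#x0283;"));
    }
}
```

Named entities such as &eacute; are not handled by this helper, and a reference to an out-of-range code point makes it throw an exception.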
David Patterson
patterd1@comcast.net
 
David Patterson
Ranch Hand
Posts: 65
I was afraid that the ampersands would be eaten.
Using AMP for the ampersand character, what I put into the text content of a tag is
AMP-#-x-0-2-8-9-semicolon
What winds up in the output is
AMP-semicolon-#-x-0-2-8-9-semicolon
So, how do I put this content into a document?
Thanks
David Patterson
patterd1@comcast.net
 
David Patterson
Ranch Hand
Posts: 65
RATS.
Let me try again.
I was afraid that the ampersands would be eaten.
Using AMP for the ampersand character, what I put into the text content of a tag is
AMP-#-x-0-2-8-9-semicolon
What winds up in the output is
AMP-a-m-p-semicolon-#-x-0-2-8-9-semicolon
So, how do I get the content I want?
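The doubled ampersand is the serializer doing its job: createTextNode stores its argument as literal character data, so the & of a typed-in reference is just an ampersand and has to be escaped as &amp; in the file. A small sketch (class and element names are illustrative) comparing the typed-in reference with the real character, then parsing each result back:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class EscapeDemo {
    // Serializes <t>text</t>; the text node's content is taken literally.
    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("t")).appendChild(doc.createTextNode(text));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(w));
        return w.toString();
    }

    // Parses the XML back (no declaration, so the parser assumes UTF-8).
    static String parseText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String typed = serialize("&#x0289;"); // the reference typed in as plain text
        String real = serialize("\u0289");    // the character itself
        System.out.println(typed);                               // <t>&amp;#x0289;</t>
        System.out.println(parseText(typed).equals("&#x0289;")); // true
        System.out.println(parseText(real).equals("\u0289"));    // true
    }
}
```

Read back, the escaped form yields the literal text &#x0289;, not the character; only the second form gives a later program the ʉ it expects.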
David Patterson
patterd1@comcast.net
 
Roseanne Zhang
Ranch Hand
Posts: 1953
If you want to display €, you must write its character reference (for example &#8364;).
If you want to display the reference itself, you must write &amp;#8364;.
etc. etc.
 
Roseanne Zhang
Ranch Hand
Posts: 1953
Quote my post above to see its raw source and find out my secret. Then you will know the answer to your question.
 
David Patterson
Ranch Hand
Posts: 65
Roseanne,
I am entering a text element in my document that has:
ampersand-#-x-0-2-8-3-semicolon
This should be the International Phonetic Alphabet symbol "esh", the sound of "sh". It should look almost like an integral sign.
After entering this field (and many more) I serialize the document to a file. At the end of the process, what I get in the file is
ampersand-a-m-p-semicolon-#-x-0-2-8-3-semicolon
This cannot be read back in and processed later.
Does the fact that the process defaults to UTF-8 have anything to do with it? Would it help if I specified UTF-16? (And how can I specify that?)
Thanks.
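On the encoding questions: the encoding is an output property of the Transformer, set with OutputKeys.ENCODING, and the serializer also writes it into the XML declaration. A sketch, assuming the JDK's built-in TransformerFactory (class and element names are illustrative), that writes the same one-character document as UTF-8 and as UTF-16 and parses each back:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class EncodingDemo {
    // Serializes <t>ʃ</t> to bytes in the requested encoding.
    static byte[] serialize(String encoding) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("t")).appendChild(doc.createTextNode("\u0283"));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Controls the byte encoding and the encoding="..." in the declaration.
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Parses the bytes; the parser detects the encoding from the BOM / XML declaration.
    static String readBack(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        for (String enc : new String[] {"UTF-8", "UTF-16"}) {
            byte[] bytes = serialize(enc);
            System.out.println(enc + ": " + bytes.length + " bytes, esh survives: "
                    + readBack(bytes).equals("\u0283"));
        }
    }
}
```

Both encodings can represent ʃ, so switching to UTF-16 should not change the &amp; conversion; that comes from how the text node was built, not from the output encoding.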
 
Roseanne Zhang
Ranch Hand
Posts: 1953
That means your input is being escaped. Escaping is the standard behavior, so you would need to turn it off. However, here is your entity with and without the hexadecimal x:
&#x0283; displays as ʃ
&#283; displays as ě
[ April 22, 2003: Message edited by: Roseanne Zhang ]
 
David Patterson
Ranch Hand
Posts: 65
It is a character that is not in many fonts. You need a pretty complete Unicode font to be able to see it.
For a free font that has many of the unusual Unicode symbols, see
http://www.sil.org/~gaultney/gentium/
I realize that turning an ampersand into ampersand-amp; is an escaping mechanism. I want to turn it off so that the character entity is left intact in the XML file produced by my program. I want a file that if viewed by a capable browser will show the IPA symbols, and can be read by another Java program that will read
ampersand-#-x-whatever-semicolon
as it was originally created.
How can I turn off the conversion of ampersand to ampersand-amp;?
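Rather than switching the escaping off, one option when the goal is a file containing &#...; references is to put the real character in the text node and ask the serializer for an encoding that cannot represent it. Assuming the JDK's built-in serializer (other serializers may behave differently), declaring US-ASCII makes it write each non-ASCII character as a numeric character reference. Class and element names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class AsciiRefsDemo {
    // Serializes <t>text</t> declaring US-ASCII; characters outside ASCII cannot
    // be written directly, so the serializer emits numeric character references.
    static byte[] serializeAscii(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("t")).appendChild(doc.createTextNode(text));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Parses the bytes back; references are turned into characters again.
    static String readBack(byte[] xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serializeAscii("esh \u0283");
        // The esh appears in the file as a &#...; reference.
        System.out.println(new String(bytes, StandardCharsets.US_ASCII));
        System.out.println(readBack(bytes).equals("esh \u0283")); // true
    }
}
```

Whether the references come out in decimal or hexadecimal form is up to the serializer; either way a parser, or a capable browser with a suitable font, reads them back as ʃ.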
 