Creating Document with Char Entities in JAXP

David Patterson
Ranch Hand

Joined: Jul 01, 2002
Posts: 65
I'm trying to use JAXP to create an XML document that contains some character entities (such as ʃ [ampersand, pound, x, 0,2,8,3]) in the content of some elements.
What winds up in the output document is
ʃ (ampersand-entity, pound, x, 0,2,8,3).
Also, the generated document carries an attribute of
encoding="UTF-8" on its initial statement (the XML declaration).
So, a series of questions:
1) Would all these problems go away if I used UTF-16?
2) How do I specify the encoding to use?
3) Do I need to add each of these character entities separately? Right now they are part of a string (htmlKeyWord) that I add as
elemKeyWord.appendChild(
doc.createTextNode( htmlKeyWord ) );
That would mean adding text nodes for the non-entities and entity references (or something else) for the character entities.
Or, is there some other way to do this kind of a document?
David Patterson
patterd1@comcast.net
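On question 2, here is a minimal sketch of one way to choose the output encoding in JAXP: set OutputKeys.ENCODING on the identity Transformer that writes the DOM out. The class name EncodingDemo, the helper names, and the "keyword" element name are illustrative assumptions; elemKeyWord and htmlKeyWord follow the post. Note that the text node is given the real character U+0283, not the reference string.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EncodingDemo {
    // Builds <keyword>text</keyword>; the element name is a placeholder.
    static Document buildDocument(String htmlKeyWord) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element elemKeyWord = doc.createElement("keyword");
            elemKeyWord.appendChild(doc.createTextNode(htmlKeyWord));
            doc.appendChild(elemKeyWord);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the DOM; the encoding is requested on the Transformer.
    static byte[] serialize(Document doc, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u0283" is the esh character itself, not the text "&#x0283;".
        Document doc = buildDocument("e\u0283");
        byte[] xml = serialize(doc, "UTF-8");
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```

Asking for "UTF-16" is the same call with a different string. Both UTF-8 and UTF-16 can represent every Unicode character, so switching between them should not, by itself, change the escaping of the ampersand.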
David Patterson
Ranch Hand

Joined: Jul 01, 2002
Posts: 65
I was afraid that the ampersands would be eaten.
Using AMP for the ampersand character, what I put into the text content of a tag is
AMP-#-x-0-2-8-9-semicolon
What winds up in the output is
AMP-semicolon-#-x-0-2-8-9-semicolon
So, how do I put this content into a document?
Thanks
David Patterson
patterd1@comcast.net
David Patterson
Ranch Hand

Joined: Jul 01, 2002
Posts: 65
RATS.
Let me try again.
I was afraid that the ampersands would be eaten.
Using AMP for the ampersand character, what I put into the text content of a tag is
AMP-#-x-0-2-8-9-semicolon
What winds up in the output is
AMP-a-m-p-semicolon-#-x-0-2-8-9-semicolon
So, how do I get the content I want?
David Patterson
patterd1@comcast.net
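The symptom described here can be reproduced in a few lines. A DOM text node treats its string as literal characters, so an ampersand typed into it is escaped as ampersand-a-m-p-semicolon on output; putting the actual character (here U+0289) into the node avoids that. This is a sketch with hypothetical names (CharRefEscapeDemo, docWithText, serialize), assuming the JDK's default JAXP implementation.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CharRefEscapeDemo {
    // Builds <keyword>text</keyword>; the element name is a placeholder.
    static Document docWithText(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element e = doc.createElement("keyword");
            e.appendChild(doc.createTextNode(text));
            doc.appendChild(e);
            return doc;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Serializes without the XML declaration so only the element is shown.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // The reference typed as text: the ampersand gets escaped on output.
        System.out.println(serialize(docWithText("&#x0289;")));
        // The character itself: nothing in it needs escaping.
        System.out.println(serialize(docWithText("\u0289")));
    }
}
```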
Roseanne Zhang
Ranch Hand

Joined: Nov 14, 2000
Posts: 1953
If you want to display €, you must write this €
If you want to display €, you must write this €
etc. etc.
Roseanne Zhang
Ranch Hand

Joined: Nov 14, 2000
Posts: 1953
Quote my post above to find out my secret. Then you will know the answer to your question.
David Patterson
Ranch Hand

Joined: Jul 01, 2002
Posts: 65
Roseanne,
I am entering a text element in my document that has:
ampersand-#-x-0-2-8-3-semicolon
This should be the International Phonetic Alphabet symbol for an "esh" -- the sound of "sh". It should look almost like an integral sign.
After entering this field (and many more) I serialize the document to a file. At the end of the process, what I get in the file is
ampersand-a-m-p-semicolon-#-x-0-2-8-3-semicolon
This cannot be read back in later and processed as the intended character.
Does the fact that the process defaults to UTF-8 have anything to do with it? Would it help if I specified UTF-16? (And how can I specify that?)
Thanks.
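On the encoding question, a hedged sketch of how the requested encoding affects what lands in the file. With UTF-8 (or UTF-16) the serializer can write U+0283 directly. With an encoding that cannot represent it, such as US-ASCII, the XML output method is expected to fall back to a numeric character reference, which may be written in decimal rather than hex. A parser reads either form back as the same character. The names EncodingRoundTrip, write, and read are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EncodingRoundTrip {
    // Writes <keyword>text</keyword> in the requested encoding.
    static byte[] write(String text, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element e = doc.createElement("keyword");
            e.appendChild(doc.createTextNode(text));
            doc.appendChild(e);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Parses the bytes and returns the text content of the root element.
    static String read(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] ascii = write("\u0283", "US-ASCII");
        // Expected to show a character reference instead of the raw character.
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));
        // Either way, reading the file back yields the single character U+0283.
        System.out.println(read(ascii).equals("\u0283"));
        System.out.println(read(write("\u0283", "UTF-8")).equals("\u0283"));
    }
}
```

If the goal is a file that literally contains ampersand-#-digits-semicolon, an ASCII output encoding is one way to get there without touching the escaping rules.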
Roseanne Zhang
Ranch Hand

Joined: Nov 14, 2000
Posts: 1953
That means your input is being escaped. That is standard behavior, and you need to turn it off. However, I used your entity with and without the hexadecimal "x":
ʃ
ě
[ April 22, 2003: Message edited by: Roseanne Zhang ]
David Patterson
Ranch Hand

Joined: Jul 01, 2002
Posts: 65
It is a character that is not in many fonts. You need a pretty complete Unicode font to be able to see it.
For a free font that has many of the odd Unicode symbols, see
http://www.sil.org/~gaultney/gentium/
I realize that turning an ampersand into ampersand-amp; is an escaping mechanism. I want to turn it off so that the character entity is left intact in the XML file produced by my program. I want a file that if viewed by a capable browser will show the IPA symbols, and can be read by another Java program that will read
ampersand-#-x-whatever-semicolon
as was originally created.
How can I defeat the ampersand to ampersand-amp; conversion?
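Rather than defeating the escaping, one option is to convert the references to real characters before the string ever reaches createTextNode. The sketch below assumes the input string contains only numeric references (decimal or hex); the CharRefDecoder class and its decode method are hypothetical helpers, not part of JAXP, and named entities such as ampersand-e-u-r-o-semicolon are deliberately left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharRefDecoder {
    // Matches &#xHHHH; (hex) or &#DDDD; (decimal) numeric references.
    private static final Pattern REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Replaces each numeric character reference with the character it names.
    static String decode(String s) {
        Matcher m = REF.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            int cp = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(m.group()); // leave out-of-range references as-is
            }
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("e&#x0283;");
        System.out.println(decoded.equals("e\u0283"));
    }
}
```

With such a helper, the call from the first post would become elemKeyWord.appendChild(doc.createTextNode(CharRefDecoder.decode(htmlKeyWord))), and the serializer then decides, based on the output encoding, whether to write the character raw or as a reference.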
 