How to put clob in org.w3c.dom.Document

Ian Chen
Greenhorn

Joined: Nov 29, 2005
Posts: 4
Does anyone know how to put clobs in org.w3c.dom.Document?
According to the org.w3c.dom.Document API, the createTextNode method only takes a String as its input parameter.
Thanks!
Yi
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42908
Do you mean Clob as in java.sql.Clob? Its content is really just character data, so you can read it into a String (for example via getSubString or getCharacterStream) and there should be no problems. In any event, the "C" in clob is for "character", and what consists of characters in Java can be represented by strings.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18986
The createTextNode method only allows Strings because XML is a text-based format. Everything in XML is text. So if you want to put Java objects into XML you must convert them to text somehow.

As Ulf says, it's easy to convert a Clob into text.
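
For readers who want to see the conversion spelled out, here is a minimal sketch. It assumes the Clob is small enough to hold in memory; the class name ClobToDom, the method names, and the element name "content" are made up for illustration, and SerialClob only stands in for a Clob that would normally come from a ResultSet.

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.Clob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialClob;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of any library.
public class ClobToDom {

    // Reads the whole Clob into memory through its character stream.
    public static String clobToString(Clob clob) throws SQLException, IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = clob.getCharacterStream()) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    // Builds a one-element document whose text content is the Clob's content.
    public static Document toDocument(Clob clob, String elementName)
            throws SQLException, IOException, ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement(elementName);
        root.appendChild(doc.createTextNode(clobToString(clob)));
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // SerialClob is an in-memory Clob, used here only so the sketch runs without a database.
        Clob clob = new SerialClob("some large text".toCharArray());
        Document doc = toDocument(clob, "content");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

For short values, clob.getSubString(1, (int) clob.length()) does the same job in one call; the stream version just avoids relying on the driver to build the whole String at once.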
Ian Chen
Greenhorn

Joined: Nov 29, 2005
Posts: 4
Thanks for your responses. I converted my clobs to strings and put them in the document instance and everything worked.
My only concern is that strings may not be large enough to hold all the possible clobs that my users want to upload. Are you guys aware of any upper limit on the size of a string?
Ian
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18986
The maximum theoretical length of a String is Integer.MAX_VALUE, which is about 2 billion characters. The maximum practical length of a String is limited by the actual memory available.
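
As a rough illustration of that limit: Clob.length() returns a long, while a String length is an int, so a guard like the one below can reject a Clob that could never fit. This is only a sketch; ClobSizeCheck and its methods are made-up names, and passing the check does not mean the JVM actually has enough heap for the value.

```java
import java.sql.Clob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialClob;

// Hypothetical helper class, not part of any library.
public class ClobSizeCheck {

    // True if a Clob of this length could in theory be held in one String.
    public static boolean fitsInString(long clobLength) {
        return clobLength >= 0 && clobLength <= Integer.MAX_VALUE;
    }

    // Converts the Clob to a String, refusing values beyond the theoretical limit.
    public static String toStringChecked(Clob clob) throws SQLException {
        long length = clob.length();
        if (!fitsInString(length)) {
            throw new IllegalArgumentException("Clob too large for a String: " + length);
        }
        if (length == 0) {
            return "";
        }
        // Clob positions are 1-based.
        return clob.getSubString(1, (int) length);
    }

    public static void main(String[] args) throws Exception {
        // SerialClob is an in-memory stand-in for a Clob read from a database.
        Clob clob = new SerialClob("hello".toCharArray());
        System.out.println(toStringChecked(clob));
    }
}
```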
Ian Chen
Greenhorn

Joined: Nov 29, 2005
Posts: 4
Great! Thanks again for your replies.
Ian
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
I hope you realize that any number of characters and character sequences could cause an XML disaster when inserted as a Text node. Things are only slightly better when inserted as CDATA.
Examples:
1. & < and > have special XML meaning
2. some characters such as MS Word "smart punctuation" are illegal Unicode
3. some control characters such as ctrl-z are illegal Unicode
Bill
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18986
Originally posted by William Brogden:

1. & < and > have special XML meaning
2. some characters such as MS Word "smart punctuation" are illegal Unicode
3. some control characters such as ctrl-z are illegal Unicode

1. The org.w3c.dom.Document and other standard XML classes take care of this. The users of the classes don't have to concern themselves with escaping ampersands and less-thans.

2. Not true. You'll find those characters between U+2018 and U+201F. If you have them in your data correctly then the standard XML classes will handle them correctly. What is true is that people often paste those characters into text files without regard to the proper encoding of those files, or hand-generate XML which doesn't declare its encoding properly.

3. This one is true. I don't know what happens if you pass a String containing one of those characters into a DOM text node.
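
Points 1 and 2 are easy to check with a small round trip. The sketch below puts raw text containing an ampersand, a less-than sign and smart quotes into a text node, serializes it with the JDK's default Transformer, and parses it back. EscapeDemo and its method names are made up for illustration, and the exact serialized output may differ between XML implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
public class EscapeDemo {

    // Wraps the raw text in a <content> element and serializes the document.
    public static String serialize(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("content");
        root.appendChild(doc.createTextNode(text)); // no manual escaping here
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parses the XML back and returns the root element's text content.
    public static String parseText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String original = "Tom & Jerry < \u201Csmart quotes\u201D";
        String xml = serialize(original);
        System.out.println(xml);                              // escaped form
        System.out.println(parseText(xml).equals(original));  // round trip check
    }
}
```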
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
Originally posted by Paul Clapham:

The org.w3c.dom.Document and other standard XML classes take care of this. The users of the classes don't have to concern themselves with escaping ampersands and less-thans.

I suspect you have not actually tried to handle all the bizarre situations one can get into when trying to handle user text input.
Bill
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18986
Originally posted by William Brogden:

I suspect you have not actually tried to handle all the bizarre situations one can get into when trying to handle user text input.
Bill
No, I haven't. I know there can be problems with non-ASCII characters, for example when input comes from a browser and the encoding isn't handled properly by the browser and/or the server. But I still say the XML serializer will handle conversion of ampersands to the escaped form. The programmer doesn't have to escape them before putting them into a DOM text node.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12835
My view of the & problem is probably colored by my experience dealing with a client's XML design in which a CDATA section contained more XML-formatted data. I had to write a browser-based editor, and converting between stored text and displayed text was a frustrating job.

Where I hit the "smart punctuation" problem was both cut-and-paste and files resulting from "output as text" from MS Word.

The only legal control characters in XML 1.0 are tab, carriage return, and line feed. You may hit a <ctrl>z in text generated by an older application that uses it as an end-of-file marker. Really old word processor formats used other control characters. XML parsers will throw an exception when they hit one of those characters on input, but I don't know whether output writers would turn them into something legal.
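
One defensive option is to drop anything outside the XML 1.0 Char production before the text goes into a text node. The sketch below follows that production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF); XmlCharFilter and its methods are made-up names, and silently stripping characters is an assumption about what the application wants, since replacing or rejecting them may be more appropriate.

```java
// Hypothetical helper class, not part of any library.
public class XmlCharFilter {

    // True if the code point matches the XML 1.0 "Char" production.
    public static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input without characters that XML 1.0 forbids.
    public static String stripIllegal(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
                .filter(XmlCharFilter::isLegalXml10)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u001A is ctrl-z; tab and line feed are legal and survive.
        String dirty = "abc\u001Adef\t\n";
        System.out.println(stripIllegal(dirty).equals("abcdef\t\n"));
    }
}
```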
Bill