SAX parser issue,character call back method being called twice

Jhakda Velu
Ranch Hand

Joined: Feb 26, 2008
Posts: 158
Hi all,
I wrote a simple SAX parser to parse a document with the following format:
<data>
<element>
<record>
</record>
<record>
</record>
</element>
</data>

The data between the record tags is Unicode text (Chinese characters).
However, the characters() callback method is sometimes called twice for a single record. It seems totally random; I can't predict it.
So if my Unicode data is 1234 4567 1234, it is sometimes read as
1234 4 and then as 567 1234,
so when I convert the Unicode back to a string, I get special characters.
I've checked the XML before sending; it is correct and well formed.
The converted Unicode is added to an ArrayList.
I'd be thankful if someone could shed some light on this.

In the meantime, I've added two int variables. I increment one of them when the startElement method is called and the other when the characters method is called. Before converting the Unicode to a string, I check whether both are equal; if not, I remove the last element added to the ArrayList and concatenate it with the current one. This has solved my problem, but I want to know the reason for the unexpected behaviour.

Jhakda Velu


If I become filthy rich, I'll sponsor research for painless dental treatment at Harvard Medical School. That's why I'm learning Java. I have 32 teeth; 22 are man-made.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 11671
character() call back method is called twice at times.


The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time.

It is up to the programmer to assemble the text properly.

Bill


Java Resources at www.wbrogden.com
Jhakda Velu
Ranch Hand

Joined: Feb 26, 2008
Posts: 158
Hi
Thanks a lot for the reply. It has cleared up my misconception. Any better way of handling the issue than the one I mentioned is welcome. Here is the part of the code containing my logic.
Thanks a lot.
Jhakda



// Class level variables
int iStartCallCounter = 0, iCharCallCounter = 0;
private String value = "";
private String oldValue = "";

public void startElement(String uri, String x, String qName, Attributes attributes) {
    // additional code
    iStartCallCounter++;
    // additional code
}

public void characters(char[] ch, int start, int length) {
    // additional code
    iCharCallCounter++;
    if (iStartCallCounter != iCharCallCounter) {
        // The counters differ, so this call is treated as a continuation:
        // append the new chunk to the text kept from the previous call.
        value = new String(ch, start, length);
        value = oldValue.concat(value);
        oldValue = value;
        iStartCallCounter = 0;
        iCharCallCounter = 0;
    }
}
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 11671
I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method.

When endElement occurs I use toString to get the assembled characters and then work on the logic. It appears you are trying to do logic inside the characters() method. There is no reason to do that; wait for endElement to do your logic.
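
A minimal sketch of the approach described above, assuming the <record> element from the original post and a parser that is not namespace-aware (so the element name arrives in qName). The class name RecordHandler and its parse helper are hypothetical names chosen for illustration, not code from this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <record> element.
class RecordHandler extends DefaultHandler {
    private final List<String> records = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <record>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("record".equals(qName)) {
            text = new StringBuilder(); // fresh buffer for each record
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may run several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("record".equals(qName)) {
            records.add(text.toString()); // the text is complete only here
            text = null;
        }
    }

    // Convenience helper (an assumption for this sketch): parse an XML string.
    static List<String> parse(String xml) throws Exception {
        RecordHandler handler = new RecordHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.records;
    }
}
```

Because the buffer is only read in endElement(), it does not matter how many times the parser splits the text across characters() calls, so no call counters are needed.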
Jhakda Velu
Ranch Hand

Joined: Feb 26, 2008
Posts: 158
Hi
That's a really cool way to do it. Thanks a ton!
So in the characters method, I keep appending the values received to the StringBuffer.
Once the end element is hit, I do the processing and at the end re-initialize the buffer to an empty string, right?
I was fixated on the impression that the characters method is called only once for every call to startElement.


Jhakda
Anand Gondhiya
Ranch Hand

Joined: Feb 24, 2004
Posts: 155
Thanks everyone. The code below worked for me.





Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 32418
Anand Gondhiya wrote: value = value + new String(ch, start, length).trim();

The call to trim is dangerous. What if you have an element that contains "Anand Gondhiya", and the parser decides to break it up before or after the space character? Then you'd be left with "AnandGondhiya" - not what you wanted.
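
To make the hazard concrete, here is a small sketch that simulates the parser delivering "Anand Gondhiya" in two chunks. The class TrimDemo and its method names are hypothetical, and the chunks are passed in directly rather than coming from a real parser, since where a parser splits the text cannot be predicted.

```java
// Hypothetical demo: compares trimming each chunk with trimming once at the end.
class TrimDemo {
    // Trims every chunk as it arrives, as in the quoted line.
    static String trimEachChunk(String... chunks) {
        String value = "";
        for (String chunk : chunks) {
            char[] ch = chunk.toCharArray();
            value = value + new String(ch, 0, ch.length).trim();
        }
        return value;
    }

    // Appends every chunk untouched and trims the assembled text once.
    static String trimAtEnd(String... chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            char[] ch = chunk.toCharArray();
            sb.append(ch, 0, ch.length);
        }
        return sb.toString().trim();
    }
}
```

With the chunks "Anand " and "Gondhiya", trimEachChunk returns "AnandGondhiya", while trimAtEnd returns "Anand Gondhiya". If trimming is wanted at all, doing it once on the assembled text (for example in endElement) avoids losing the inner space.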


Android apps | ImageJ plugins | Java web charts
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 32418
Since I just addressed that very same question today as well, I figure this is a FAQ. So I took the liberty of adding William's explanation to the XML FAQ: http://faq.javaranch.com/java/XmlFaq