SAX parser issue: characters() callback method being called twice
Jhakda Velu
Ranch Hand
Joined: Feb 26, 2008
Posts: 158
Hi all, I wrote a simple SAX parser to parse a document with the following format: <data> <element> <record> </record> <record> </record> </element> </data>
The data between the record tags consists of Unicode characters (Chinese). However, I sometimes run into a problem: the characters() callback method is occasionally called twice for the same element. It seems totally random and I can't predict it. So if my Unicode data is 1234 4567 1234, it is sometimes read as 1234 4 and then as 567 1234, so when I convert the Unicode back to a string I get special characters. I've checked the XML before sending it; it is proper and well formed. The converted text is added to an ArrayList. I'd be thankful if someone could throw some light on this.
In the meantime, I've added two int variables. I increment one of them when the startElement method is called and the other when the characters method is called. Before converting the Unicode to a string I check whether both are equal; if not, I remove the last added element from the ArrayList and concatenate it with the current one. This has solved my problem, but I want to know the reason for the improper behaviour.
Jhakda Velu
If I become filthy rich, I'll sponsor research for painless dental treatment at Harvard Medical School. Thats why,I'm learning Java.I have 32 teeth, 22 are man made.
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 11862
character() call back method is called twice at times.
The characters() method may be called any number of times within a single element because the SAX parser only handles one bufferload of input characters at a time.
It is up to the programmer to assemble the text properly.
Hi, thanks a lot for the reply. It has cleared up my misconception. Any better way of handling this than the one I mentioned is welcome. I'm adding the part of the code that contains my logic. Thanks a lot. Jhakda
// Class level variables
int iStartCallCounter = 0, iCharCallCounter = 0;
private String value = "";
private String oldValue = "";

public void startElement(String uri, String x, String qName, Attributes attributes) {
    // additional code
    iStartCallCounter++;
    // additional code
}

public void characters(char[] ch, int start, int length) {
    // additional code
    iCharCallCounter++;
    // Counters differ when characters() has fired more than once for one startElement()
    if (iStartCallCounter != iCharCallCounter) {
        value = new String(ch, start, length);
        value = oldValue.concat(value);
        oldValue = value;
        iStartCallCounter = 0;
        iCharCallCounter = 0;
    }
}
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 11862
I normally have a StringBuffer or StringBuilder reference that gets a new instance when the appropriate startElement() is hit and gets additions from each call to the characters() method.
When endElement occurs I use toString to get the assembled characters and then work on the logic. It appears you are trying to do logic inside the characters() method - there is no reason to do that, wait for endElement to do your logic.
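The approach described above might be sketched roughly as follows. This is only an illustration, not the code from this thread: the class name RecordHandler, the parse() helper and the sample XML are assumptions, and the element name "record" is taken from the format in the first post. It uses the standard javax.xml.parsers and org.xml.sax classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <record> element.
public class RecordHandler extends DefaultHandler {
    private final List<String> records = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <record>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("record".equals(qName)) {
            text = new StringBuilder(); // fresh buffer for each record
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so only append here.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("record".equals(qName)) {
            // The text is complete only now; do the real processing here.
            records.add(text.toString());
            text = null;
        }
    }

    public List<String> getRecords() {
        return records;
    }

    // Convenience helper (an assumption for this sketch): parse an XML string.
    public static List<String> parse(String xml) throws Exception {
        RecordHandler handler = new RecordHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getRecords();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data><element><record>1234 4567 1234</record>"
                + "<record>abc</record></element></data>";
        System.out.println(parse(xml));
    }
}
```

Because the buffer is only read in endElement(), it makes no difference whether the parser delivers the text of a record in one characters() call or in several.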
Jhakda Velu
Ranch Hand
Joined: Feb 26, 2008
Posts: 158
Hi, that's a really cool way to do it. Thanks a ton! So in the characters method I keep appending the values I receive to the StringBuffer; once the end element is hit, I do the processing and then re-initialize the buffer to an empty string, right? Actually, I was fixated on the impression that the characters method is called only once for every call to startElement.
Jhakda
Anand Gondhiya
Ranch Hand
Joined: Feb 24, 2004
Posts: 155
Thanks, everyone. The code below worked for me.
Ulf Dittmer
Marshal
Joined: Mar 22, 2005
Posts: 32765
Anand Gondhiya wrote: value = value + new String(ch, start, length).trim();
The call to trim is dangerous. What if you have an element that contains "Anand Gondhiya", and the parser decides to break it up before or after the space character? Then you'd be left with "AnandGondhiya" - not what you wanted.
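To make the problem concrete, here is a small sketch that simulates the parser delivering the text in two chunks. The class and method names (TrimPitfall, trimEachChunk, trimAtEnd) are made up for this illustration, and the chunk boundary is just one possible split; a real parser may break the text anywhere.

```java
// Hypothetical demo: trimming each chunk versus trimming the assembled text.
public class TrimPitfall {

    // Mirrors "value = value + chunk.trim()": every chunk is trimmed on arrival.
    public static String trimEachChunk(String... chunks) {
        String value = "";
        for (String chunk : chunks) {
            value = value + chunk.trim();
        }
        return value;
    }

    // Appends the chunks untouched and trims once, after the text is complete.
    public static String trimAtEnd(String... chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Simulated split right after the space character.
        String[] chunks = {"Anand ", "Gondhiya"};
        System.out.println(trimEachChunk(chunks)); // AnandGondhiya
        System.out.println(trimAtEnd(chunks));     // Anand Gondhiya
    }
}
```

If surrounding whitespace really has to go, trimming the assembled string once (for example in endElement) keeps the spaces inside the text intact.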