sax parser with cdata

 
Moieen Khatri
Ranch Hand
Posts: 144
Hi,

I am parsing the xml file shown below using sax parser:


<?xml version="1.0" encoding="UTF-8"?>
<prod12 xsi:noNamespaceSchemaLocation="CTXSYS.CTX_STOPWORDS.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<CTX_STOPWORDS>
<SPW_OWNER><![CDATA[CTXSYS]]></SPW_OWNER>
<SPW_STOPLIST><![CDATA[DEFAULT_STOPLIST]]></SPW_STOPLIST>
<SPW_TYPE><![CDATA[STOP_WORD]]></SPW_TYPE>
<SPW_WORD><![CDATA[after]]></SPW_WORD>
</CTX_STOPWORDS>
<CTX_STOPWORDS>
<SPW_OWNER><![CDATA[CTXSYS]]></SPW_OWNER>
<SPW_STOPLIST><![CDATA[DEFAULT_STOPLIST]]></SPW_STOPLIST>
<SPW_TYPE><![CDATA[STOP_WORD]]></SPW_TYPE>
<SPW_WORD><![CDATA[all]]></SPW_WORD>
</CTX_STOPWORDS>
<CTX_STOPWORDS>
<SPW_OWNER><![CDATA[CTXSYS]]></SPW_OWNER>
<SPW_STOPLIST><![CDATA[DEFAULT_STOPLIST]]></SPW_STOPLIST>
<SPW_TYPE><![CDATA[STOP_WORD]]></SPW_TYPE>
<SPW_WORD><![CDATA[also]]></SPW_WORD>
</CTX_STOPWORDS>
<CTX_STOPWORDS>
<SPW_OWNER><![CDATA[CTXSYS]]></SPW_OWNER>
<SPW_STOPLIST><![CDATA[DEFAULT_STOPLIST]]></SPW_STOPLIST>
<SPW_TYPE><![CDATA[STOP_WORD]]></SPW_TYPE>
<SPW_WORD><![CDATA[an]]></SPW_WORD>
</CTX_STOPWORDS>
<CTX_STOPWORDS>
<SPW_OWNER><![CDATA[CTXSYS]]></SPW_OWNER>
<SPW_STOPLIST><![CDATA[DEFAULT_STOPLIST]]></SPW_STOPLIST>
<SPW_TYPE><![CDATA[STOP_WORD]]></SPW_TYPE>
<SPW_WORD><![CDATA[and]]></SPW_WORD>
</CTX_STOPWORDS>
<CTX_STOPWORDS>
<SPW_OWNER><![CDATA[CTXSYS]]></SPW_OWNER>
<SPW_STOPLIST><![CDATA[DEFAULT_STOPLIST]]></SPW_STOPLIST>
<SPW_TYPE><![CDATA[STOP_WORD]]></SPW_TYPE>
<SPW_WORD><![CDATA[any]]></SPW_WORD>
</CTX_STOPWORDS>
<CTX_STOPWORDS>
<SPW_OWNER><![CDATA[CTXSYS]]></SPW_OWNER>
<SPW_STOPLIST><![CDATA[DEFAULT_STOPLIST]]></SPW_STOPLIST>
<SPW_TYPE><![CDATA[STOP_WORD]]></SPW_TYPE>
<SPW_WORD><![CDATA[are]]></SPW_WORD>
</CTX_STOPWORDS>
<CTX_STOPWORDS>
<SPW_OWNER><![CDATA[CTXSYS]]></SPW_OWNER>
<SPW_STOPLIST><![CDATA[DEFAULT_STOPLIST]]></SPW_STOPLIST>
<SPW_TYPE><![CDATA[STOP_WORD]]></SPW_TYPE>
<SPW_WORD><![CDATA[as]]></SPW_WORD>
</CTX_STOPWORDS>

</prod12>


Part of the XML content (shown in bold in the original post) is not parsed correctly. I get the following output:

SPW_OWNER SPW_STOPLIST SPW_TYPE SPW_WORD SPW_LANGUAGE

CTXSYS DEFAULT_STOPLIST STOP_WORD after
CTXSYS DEFAULT_STOPLIST P_WORD all
CTXSYS DEFAULT_STOPLIST STOP_WORD also
CTXSYS DEFAULT_STOPLIST STOP_WORD an
CTXSYS DEFAULT_STOPLIST STOP_WORD and
CTXSYS DEFAULT_STOPLIST STOP_WORD any
CTXSYS DEFAULT_STOPLIST STOP_WORD are
CTXSYS DEFAULT_STOPLIST STOP_WORD as



One line of the output shows P_WORD instead of STOP_WORD, which is the value present in the XML.

Could someone please suggest an approach? I am using sax2.jar to parse the XML file.

Thanks & Regards

All the data from the xml files appears correctly in the output except one line.
 
Ulf Dittmer
Rancher
Posts: 43016
That output is produced by your parsing code, so without seeing that code it's impossible to say what might be happening.
 
Moieen Khatri
Ranch Hand
Posts: 144
Dear Ulf,

The parser code is pasted below

Thanks & Regards

 
Ulf Dittmer
Rancher
Posts: 43016
I haven't looked at all the code, but one bug I noticed right away is in the characters method. It's the same one I pointed out to you in this topic.
 
Moieen Khatri
Ranch Hand
Posts: 144
Hi Ulf,

You are correct, the string is broken up. How can I concatenate the strings in case the characters method is called more than once for a particular tag?
Is there a standard approach to take care of this? Could you please provide some sample code if possible?

Thanks

 
Ulf Dittmer
Rancher
Posts: 43016
The usual way would be to have a StringBuilder as a field in that class. Then you can append to it at each call of the characters method. You would handle its contents in the endElement method when that's called for "SPW_TYPE". Don't forget to clear the StringBuilder after that, so that it's ready for the next text content.
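
A minimal sketch of that approach, assuming the handler only needs the SPW_TYPE values from the XML in the first post. The class name `StopWordTypeHandler`, the `types` list and the `parseTypes` helper are hypothetical names chosen for illustration; they are not taken from the original XmlUtility code, which is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every SPW_TYPE element.
class StopWordTypeHandler extends DefaultHandler {
    // Accumulates text across however many characters() calls the parser makes.
    private final StringBuilder text = new StringBuilder();
    // Hypothetical result holder, one entry per SPW_TYPE element.
    private final List<String> types = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Drop whitespace collected between elements before a new element starts.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only append here; the text may arrive in several chunks.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The complete text is only known once the end tag is reached.
        if ("SPW_TYPE".equals(qName)) {
            types.add(text.toString());
        }
        // Clear the buffer so it is ready for the next text content.
        text.setLength(0);
    }

    List<String> getTypes() {
        return types;
    }

    // Convenience helper (an assumption, for illustration): parse an XML string.
    static List<String> parseTypes(String xml) {
        StopWordTypeHandler handler = new StopWordTypeHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.getTypes();
    }
}
```

Clearing the buffer in startElement as well as in endElement suits leaf elements like the ones in this file; a document with mixed content (text and child elements inside the same parent) would need a different strategy.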
 
Moieen Khatri
Ranch Hand
Posts: 144
Dear Ulf,

Sorry for the late reply. Thanks for the suggestion to use the StringBuffer class. The XmlUtility works fine now and displays the correct output.
However, I am facing a SAX parser issue again, now in the SearchUtility class, which searches the rows of the XML file displayed by the XmlUtility using AND and OR logic. The code for the SearchUtility class is pasted below:





The search output produced by the above class is inconsistent. For example, if I search the XML file for rows where the DEK_ABR element's value = 1 AND the DAT_ABR element's value = 9710, it also returns some rows that have 9711.
Part of the search result is pasted below:

DEK_ABR DAT_ABR
1 9710
1 9710
1 9710
1 9711

What is the reason for this behaviour? Can I modify my SearchUtility class to take care of this inconsistency?

Please advise.

Many Thanks & Regards

Moieen
 
Ulf Dittmer
Rancher
Posts: 43016
You're not seriously expecting anyone to wade through more than 300 lines of code, are you?

I'd start by adding a lot more logging statements to the code, or by stepping through the code in a debugger.
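
As a rough illustration of the logging suggestion, the sketch below records every SAX callback in order, so it becomes visible where text arrives in pieces or where a value from one row could carry over into the next. `TracingHandler` and `traceOf` are hypothetical names; nothing here is taken from the SearchUtility class, whose code is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: logs each SAX event as it happens.
class TracingHandler extends DefaultHandler {
    private final List<String> trace = new ArrayList<>();

    private void log(String message) {
        trace.add(message);
        System.err.println(message); // plain stderr logging keeps the sketch dependency-free
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        log("start " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Brackets make whitespace-only and split chunks easy to spot.
        log("chars [" + new String(ch, start, length) + "]");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log("end " + qName);
    }

    // Convenience helper (an assumption, for illustration): trace an XML string.
    static List<String> traceOf(String xml) {
        TracingHandler handler = new TracingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.trace;
    }
}
```

Similar log calls placed next to the comparison logic of a search class would show which stored values each row is actually compared against.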
 