
Missing characters with SAX Parser

Mahesh Mamani
Ranch Hand

Joined: Jun 25, 2001
Posts: 110
I am using the Java SAX parser to parse an XML file and write its contents to a text file. My program works fine and the contents are retrieved correctly when the XML is small. But when the XML file is large (around 1.5 MB), a few characters go missing or get split at random places in the contents.


public void characters(char[] ch, int start, int length) throws SAXException {
    String str = new String(ch, start, length);
}

How can I overcome this issue?

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42929
    
From the javadocs of ContentHandler.characters:
SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks;

Is that what you're talking about? If so, then your code needs to account for this possibility.
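
To see the chunking for yourself, something like the following may help. It is only a minimal sketch: the class name ChunkLogger and the sample XML string are made up for illustration and are not from the code in this thread. It records every call to characters() so you can see how the parser delivered the text. How many chunks you get depends on the parser and the input, so the only thing to rely on is that the chunks, joined in order, give the complete text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of the original poster's code.
class ChunkLogger extends DefaultHandler {
    // Every chunk the parser passes to characters(), in delivery order.
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only the slice [start, start + length); the rest of ch is not ours.
        chunks.add(new String(ch, start, length));
    }

    // Parses an XML string and returns the chunks that were delivered.
    static List<String> chunksOf(String xml) {
        ChunkLogger handler = new ChunkLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.chunks;
    }

    public static void main(String[] args) {
        List<String> chunks = chunksOf("<Name>Tom &amp; Jerry</Name>");
        // The chunk count is parser-dependent; the joined text is not.
        System.out.println(chunks.size() + " chunk(s): " + chunks);
        System.out.println("joined: " + String.join("", chunks));
    }
}
```

Entity references and large documents are typical places where a parser may split the text, which would fit a problem that only shows up with the 1.5 MB file.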
Mahesh Mamani
Ranch Hand

Joined: Jun 25, 2001
Posts: 110
Do you mean we need to make changes to the characters method?

Can you please help? It's very urgent.

Thanks in advance,
Mahesh
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42929
    
Yes. The one you have apparently assumes that all text data is returned in a single chunk (I can't tell for sure since you didn't post the full code of the method). You need to change it so that it still works if the text is returned in several chunks. One way to do that would be to append the text to a StringBuffer (which would be a field in your handler class). Once the end tag of whatever element surrounds this text is reached, you know the text is complete and can handle it.
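
A minimal sketch of that approach, under some assumptions: the class name NameCollector, the names list and the namesIn helper are invented for this example, and it only collects the text of Name elements. The buffer is cleared in startElement, appended to in characters, and read in endElement.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; adapt the element names to your own XML.
class NameCollector extends DefaultHandler {
    // Accumulates text across however many characters() calls the parser makes.
    private final StringBuffer text = new StringBuffer();
    // Complete text of every Name element seen so far.
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        // A new element starts: discard whatever was collected before it.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append instead of overwriting, so no chunk is lost.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only at the end tag is the text guaranteed to be complete.
        if (qName.equals("Name")) {
            names.add(text.toString());
        }
    }

    // Parses an XML string and returns the collected names.
    static List<String> namesIn(String xml) {
        NameCollector handler = new NameCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        System.out.println(namesIn(
                "<Personnel><Employee><Name>Tom &amp; Jerry</Name></Employee></Personnel>"));
    }
}
```

Clearing the buffer in startElement works here because the elements of interest contain only text. For mixed content (text interleaved with child elements) you would need something more careful, such as a stack of buffers.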
Mahesh Mamani
Ranch Hand

Joined: Jun 25, 2001
Posts: 110
Below is the sample code I was referring to. Our actual code is based on it.

We added a System.out.println in the characters method, and even there it prints only part of the text.

I hope this information is enough.

Thanks again,

Mahesh

a) Create a SAX parser and parse the XML

private void parseDocument() {

    // get a factory
    SAXParserFactory spf = SAXParserFactory.newInstance();
    try {

        // get a new instance of parser
        SAXParser sp = spf.newSAXParser();

        // parse the file and also register this class for callbacks
        sp.parse("employees.xml", this);

    } catch (SAXException se) {
        se.printStackTrace();
    } catch (ParserConfigurationException pce) {
        pce.printStackTrace();
    } catch (IOException ie) {
        ie.printStackTrace();
    }
}




b) In the event handlers create the Employee object and call the corresponding setter methods.


// Event handlers
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    // reset
    tempVal = "";
    if (qName.equalsIgnoreCase("Employee")) {
        // create a new instance of employee
        tempEmp = new Employee();
        tempEmp.setType(attributes.getValue("type"));
    }
}


public void characters(char[] ch, int start, int length) throws SAXException {
    tempVal = new String(ch, start, length);
}

public void endElement(String uri, String localName,
        String qName) throws SAXException {

    if (qName.equalsIgnoreCase("Employee")) {
        // add it to the list
        myEmpls.add(tempEmp);

    } else if (qName.equalsIgnoreCase("Name")) {
        tempEmp.setName(tempVal);
    } else if (qName.equalsIgnoreCase("Id")) {
        tempEmp.setId(Integer.parseInt(tempVal));
    } else if (qName.equalsIgnoreCase("Age")) {
        tempEmp.setAge(Integer.parseInt(tempVal));
    }

}

c) Iterating and printing.


private void printData() {

    System.out.println("No of Employees '" + myEmpls.size() + "'.");

    Iterator it = myEmpls.iterator();
    while (it.hasNext()) {
        System.out.println(it.next().toString());
    }
}



employees.xml
<?xml version="1.0" encoding="UTF-8"?>
<Personnel>
    <Employee type="permanent">
        <Name>Seagull</Name>
        <Id>3674</Id>
        <Age>34</Age>
    </Employee>
    <Employee type="contract">
        <Name>Robin</Name>
        <Id>3675</Id>
        <Age>25</Age>
    </Employee>
    <Employee type="permanent">
        <Name>Crow</Name>
        <Id>3676</Id>
        <Age>28</Age>
    </Employee>
</Personnel>
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42929
    
Yes, this code does exactly what I suspected; you'll need to fix it in the way I outlined.
Mahesh Mamani
Ranch Hand

Joined: Jun 25, 2001
Posts: 110
Hi Ulf Dittmer,

Thanks a lot for your help and suggestions. I was able to resolve the issue by using a StringBuffer in the characters method.

Thanks again.

Mahesh
 