I need to parse XML documents and convert them into specific record types based on a test file. I built this application using a DOM parser, but I can process only 3 GB of data an hour. My boss wants me to use a SAX parser. I need to process a lot of data, probably around 300 GB a day. Each data file is in turn a group of multiple small files (each 200 KB to 1 MB). I split these into small files, parse them using DOM, and write the output to a text file. Do you think SAX is better than DOM? With SAX, do I still need to split the file, or can I just open the file and write the output to a text file? Thanks
posted 11 years ago
SAX parsers deliver the document as a stream of events instead of reading the whole document into memory at once, as DOM does. For large files that usually means much lower memory use and better throughput than DOM.
You could try to write a SAX handler (extend DefaultHandler or implement ContentHandler) which collects a single record (whatever that is) based on the events it receives from the SAX parser, writes that record into the output file, collects the next record based on events, writes that record, and so forth.
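Here is a rough sketch of what such a handler could look like. It is only an illustration: the element name "record", the pipe-delimited output format, and the names RecordHandler and convert are assumptions I made up for the example, and it assumes each record is a flat list of child elements. Your real record layout will need its own logic.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects one <record> at a time and writes it out
// as a single pipe-delimited line, so only one record is held in memory.
class RecordHandler extends DefaultHandler {
    private static final String RECORD_TAG = "record"; // assumed element name

    private final Writer out;
    private final StringBuilder text = new StringBuilder(); // text of the current element
    private final StringBuilder line = new StringBuilder(); // fields of the current record
    private boolean inRecord;

    RecordHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (RECORD_TAG.equals(qName)) {
            inRecord = true;
            line.setLength(0); // start a fresh record
        }
        text.setLength(0); // discard whitespace seen between elements
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append rather than assign.
        if (inRecord) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (!inRecord) {
            return;
        }
        if (RECORD_TAG.equals(qName)) {
            // Record complete: write it immediately, then forget it.
            try {
                out.write(line.append('\n').toString());
            } catch (IOException e) {
                throw new SAXException(e);
            }
            inRecord = false;
        } else {
            // A child element of the record ended: add its text as a field.
            if (line.length() > 0) {
                line.append('|');
            }
            line.append(text.toString().trim());
        }
        text.setLength(0);
    }

    // Convenience wrapper for trying the handler on an in-memory string.
    static String convert(String xml) throws Exception {
        StringWriter out = new StringWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new RecordHandler(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records>"
                + "<record><id>1</id><name>Ann</name></record>"
                + "<record><id>2</id><name>Bob</name></record>"
                + "</records>";
        System.out.print(convert(xml));
    }
}
```

For real input you would pass a File or InputStream to SAXParser.parse and a buffered file writer to the handler instead of the in-memory string and StringWriter used here. Because each record is written as soon as its end tag arrives, memory use should stay roughly constant no matter how large the input file is.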