Alasdair Jones

since Mar 04, 2008

Recent posts by Alasdair Jones

If anyone is interested here is my code to retrieve whole XML docs from the input stream. I realise it's probably not the most efficient method, but I didn't have much luck with StreamTokenizer, and regular expressions:

Do you think I will get the same problem of a parsing error sending multiple documents to a pure SAX and StAX parser as I did with JDOM?
OK, first, just to clarify what my project is:

Single socket connection between 2 applications. This will be opened at initialisation and will have to remain open for the duration of the data exchange.

The destination app which I am writing is the socket server.

The source app is the socket client. I have no control over how it sends messages; I can only receive data. Each message will be sent separately, but the source app can't guarantee that a message arrives in one continuous piece: it may be split into several segments, although these will arrive in the correct order. Also, the messages will be sent with no separator or header, so the stream I receive could contain multiple messages and/or message segments.

Just to get going I've built a prototype with a test socket client which sends a series of messages as a continuous stream. Unsurprisingly, JDOM throws a parse exception when it reaches the start of the next message ("<?xml..."), since it isn't expecting another declaration in what it believes is the same document. It did, however, cope with parsing the data in segments. And of course, this way, I will not be able to get at the data that has been passed...

I'm going to try with SAX/StAX now and then the SequenceInputStream/ByteArrayInputStream...
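For anyone wanting to reproduce the concatenated-document problem without a socket, here is a minimal sketch that feeds two documents back to back to the SAX and StAX implementations bundled with the JDK. The class and method names are made up for illustration, and other parser implementations may behave differently. A second "<?xml" declaration is not allowed after the root element, so both parsers should reject the combined input just as JDOM did.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class, not part of the original project. */
class ConcatenatedDocsDemo {

    /** True if the JDK's SAX parser reads the text as one well-formed document. */
    static boolean saxAccepts(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // DefaultHandler rethrows fatal well-formedness errors.
            return false;
        }
    }

    /** True if the JDK's StAX reader gets to the end of the text without an error. */
    static boolean staxAccepts(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next(); // pull every event; an error surfaces as an exception
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }
}
```

Calling either method with a single message and then with two messages joined together shows the difference. Note that StAX will still have delivered the events of the first message before it fails on the second declaration.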
Right, so if I understand correctly, it seems as though I'll have to do some low-level 'parsing' of my own to repackage the XML segments into whole documents, and only then can I send to an XML parser. In which case it really doesn't matter which API I use (streaming or DOM) because I'll have to wait until I've got the whole message before parsing anyway! Thanks for all the replies.
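The repackaging step described above could look roughly like the sketch below. It is not the code from the later post; the class name XmlMessageFramer and its methods are hypothetical. It assumes every message starts with an XML declaration ("<?xml"), which matches what the test client sends, and that the bytes have already been decoded to text (for example through an InputStreamReader on the socket).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: buffers text read from the socket and cuts it into
 * whole documents at each "<?xml" declaration.
 */
class XmlMessageFramer {
    private static final String DECL = "<?xml";
    private final StringBuilder buffer = new StringBuilder();

    /** Appends one segment and returns every document now known to be complete. */
    List<String> feed(String segment) {
        buffer.append(segment);
        List<String> complete = new ArrayList<>();
        int next;
        // Search from index 1 so the declaration opening the buffered document is skipped.
        while ((next = buffer.indexOf(DECL, 1)) > 0) {
            String piece = buffer.substring(0, next);
            if (!piece.trim().isEmpty()) { // ignore stray whitespace before the first message
                complete.add(piece);
            }
            buffer.delete(0, next);
        }
        return complete;
    }

    /** Returns whatever is still buffered, e.g. the last message when the stream ends. */
    String flush() {
        String rest = buffer.toString();
        buffer.setLength(0);
        return rest;
    }
}
```

Each string returned by feed() can be handed to any parser as a complete document. The trade-off is that a message is only released once the next declaration shows up (or flush() is called), so it does not deliver a message the moment its last byte arrives; detecting the closing root tag would be needed for that. It would also be fooled by "<?xml" appearing inside a comment or CDATA section, or by a "<?xml-stylesheet" instruction.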
Thanks, I would if I could. Unfortunately I have no control over the data I will receive and as I said, it will come in segments. Therefore I will have to parse the data before I can determine the message boundaries.

It does sound like I need to use SAX or StAX, although I still don't know if they can cope with incomplete documents, or segments spanning documents. BTW these are APIs not parsers, so my question of which parser to use stands.

Ideas?
Could anyone recommend an API and Parser for reading streaming XML?

I have to process a series of fairly small XML messages which I receive via a socket over a receive-only network connection.

My app needs to cope with messages arriving in quick succession over the connection, and I CAN'T afford to miss any of them. It will also have to cope with incomplete message segments and with segments that span multiple messages. I MUST also be able to process each complete message as soon as it arrives.

I have used the nice and easy JDOM with Xerces in the past, but I am not convinced that it will work here: I can't find anything that says whether JDOM can cope with incomplete documents from the input stream, and I have read that Xerces may close the socket if data is delayed, or close it prematurely, when I need to keep it open. StAX and SAX sound like interesting options... Any recommendations or experience would be very welcome.
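On the worry about a parser closing the socket: one common way to guard against that, whichever parser ends up being used, is to hand the parser a wrapper whose close() does nothing. This is only a sketch, and NonClosingInputStream is a made-up name. It does not make a parser tolerate incomplete documents; it only keeps the underlying stream open.

```java
import java.io.FilterInputStream;
import java.io.InputStream;

/** Hypothetical wrapper: passes reads through but ignores close(). */
class NonClosingInputStream extends FilterInputStream {

    NonClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() {
        // Deliberately empty: the owner of the socket decides when to close it.
    }
}
```

The server would pass new NonClosingInputStream(socket.getInputStream()) to the parser and close the socket itself once the exchange is finished.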

Thanks