I should have explained what XMPP-like meant. XMPP uses a single top-level (level-0) <stream> element for the duration of the connection (a separate stream in each direction). Throughout the lifetime of the connection (which can last minutes, hours, days, or years), various level-1 elements are exchanged between the client and server, for example the <message>, <presence>, and <iq> stanzas.
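To make the level-0/level-1 structure concrete, here is a minimal sketch of letting an XML parser spot when each level-1 element is complete. It assumes Python's standard-library `xml.etree.ElementTree.XMLPullParser`, and the function name `level1_elements` is hypothetical. The code tracks nesting depth from start/end events, hands back each child of the long-lived <stream> element as soon as its end tag has been parsed, and then detaches that child so a connection lasting days doesn't accumulate every element in memory.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def level1_elements(chunks: Iterable[str]) -> Iterator[ET.Element]:
    """Yield each level-1 element (child of the level-0 <stream>) once complete.

    Hypothetical helper: `chunks` stands in for data arriving from a socket,
    split at arbitrary points (XMLPullParser.feed also accepts bytes).
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    root = None  # the level-0 <stream> element, captured on its start event
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if depth == 0:
                    root = elem
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A level-1 end tag was just parsed: the element is complete.
                    yield elem
                    # Detach it from <stream> so the tree doesn't grow forever.
                    root.remove(elem)
```

Because the generator only pulls the next chunk when it has nothing complete to hand back, the caller can process each level-1 element while the stream itself stays open.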
My original thinking was to let an XML parser do the job of spotting when a level-1 element was complete so I could then go ahead and process it. Alternatively, I could find the level-1 end tag myself and then feed the level-1 element to the XML parser as a stand-alone document. Or, since I'm the only user of this protocol (I'm writing both the client and the server) and I don't care if it's not pure XML, I can just slap a length prefix in front of each level-1 element so the receiving side can trivially break the stream up into level-1 elements without having to look for end tags. Heck, I could get rid of the <stream> element entirely, since it's really just decorative: the level-1 elements are stand-alone entities and could just as well be promoted to level-0.
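The length-prefix variant can be sketched as follows. This assumes a 4-byte big-endian byte-count header; the header size and byte order, and the names `frame` and `Deframer`, are illustrative choices, not part of any protocol. Each complete frame is handed to `ET.fromstring` as a stand-alone document, so the receiver never searches for end tags and doesn't care how the bytes were split in transit.

```python
import struct
import xml.etree.ElementTree as ET

# Assumption: 4-byte unsigned big-endian length header before each element.
HEADER = struct.Struct("!I")


def frame(payload: bytes) -> bytes:
    """Prefix an encoded level-1 element with its length in bytes."""
    return HEADER.pack(len(payload)) + payload


class Deframer:
    """Hypothetical receiver that buffers bytes and emits complete elements."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> list:
        self._buf += data
        elements = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # frame not fully received yet; wait for more data
            body = bytes(self._buf[HEADER.size:end])
            del self._buf[:end]
            # Each frame is parsed as its own stand-alone XML document.
            elements.append(ET.fromstring(body))
        return elements
```

Because the header counts bytes rather than characters, the sender should measure the already-encoded payload.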
I call this XMPP-like because I'm not actually implementing XMPP. If I were, I'd obviously keep the <stream> element and look for the level-1 end tags myself rather than introducing the "ugly" (but efficient) length field before each level-1 element.
If I've failed to make all of this nonsense clear, just ignore me.
I'm satisfied with my hybrid solution, however horrifying it might be to the purist.