Need help on new JSR Proposal

Michael Ernest
High Plains Drifter
Sheriff

Joined: Oct 25, 2000
Posts: 7292

A big problem I'm having deals with reliable DTD messaging between systems where subclassing of objects is essential to the extensibility of a program design.
We're concerned in our current project with so-called long-term (sometimes called orthogonal) persistence. Think of this as a contrast to transitive persistence, which focuses on things like resolving the endianness of data between systems. With long-term persistence, we have to resolve the issues raised by classes being altered without breaking their association with already-serialized data.
This becomes a tangled issue when a complex object is frequently subclassed to allow for variation and flexibility, but in a way that doesn't impact the integrity of the parent or generic DTD.
To handle this situation, we initially tried ways to 'snip out' bytecode that was relevant to the subclass only and remove it from the datastream. Unfortunately, this compromises type-safety on the client side, so we had to send the whole class object over, then snip from the deserialized copy. Given our pre-existing concern with the potential size of these objects, we chose simply to mark up the actual bytecode in XML and transmit the binary contents.
Our subsequent design objective: receive the bytecode in native form, use the XML transform to 'snip' the offending subclass elements, then report to some logging utility which elements were cut -- just in case. We refer to this internally as a widowing service. To maintain the distinction between server- and client-side snipping, the formal term is local widowing service. And of course, since we've adopted the use of XML-transformed bytecode, you get the full project name: XML Bytecode Local Object Widowing Service.
What we're looking for are a few design reviewers who can trade their time and insights for the experience of working on a JSR that we are confident Sun will adopt quickly.
Please review the brief, admittedly choppy idea presented above and tell us if you'd like to be one of the people behind the move to ratify the XML Bytecode Local Object Widowing Service.
Thanks,


Make visible what, without you, might perhaps never have been seen.
- Robert Bresson
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
Well, if you want to get a response, you should probably formulate your design goals in more accessible language; it's hard to understand what you mean.
Some questions to clarify a few points:
Why did you choose XML as a long-term persistence mechanism in the first place? Maybe you would be better off with some kind of proprietary binary format? Then, what is the reason for keeping your meta-data in DTD format? XML Schema offers much better tools to work with hierarchical data!
This becomes a tangled issue when a complex object is frequently subclassed to allow for variation and flexibility,
Did you consider using XSLT instead of parsers? It is easier to write a generic XSL stylesheet that allows for extensions and won't break on new elements.
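A minimal sketch of such a stylesheet -- the classic XSLT 1.0 identity transform, which copies verbatim any node it has no more specific template for, so documents extended with new elements pass through untouched (the "snipped" element name below is purely hypothetical):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy every attribute and node as-is,
       so elements added by a subclass simply pass through. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Hypothetical override: an empty template "snips" an element, e.g.
  <xsl:template match="subclassOnlyElement"/>  -->
</xsl:stylesheet>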
I am not sure I understood the rest of your post
Ravi Veera
Ranch Hand

Joined: Jun 23, 2001
Posts: 127

I am not sure I understood the rest of your post

You will understand more if you look at your calendar.
Hey, this is a very nice one. I especially liked the "endianness between systems".
Michael Ernest
High Plains Drifter
Sheriff

Joined: Oct 25, 2000
Posts: 7292

Ravi is right: submissions of this type are best done in the fourth quarter of Sun's fiscal year, which we just started. They close the year's business on the last day of June. So while the proposal may be rough, this is the best time to submit. As with all things XML, a year is way too long to wait on something like this, so we're going with what we have in hand.
We did not consciously choose DTD over Schema; this is possibly a technical point for debate in committee.
We did consciously decide not to bother with XSLT, for a variety of reasons, not the least of which is that bytecode is already a portable format. We just needed to tag the data in transit, not transform it.
[ April 01, 2002: Message edited by: Michael Ernest ]
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
There is no question in my mind that the XML Bytecode Local Object Widowing Service will be quickly adopted by Sun. I would imagine that Map and a few others may have a few objections, but their objections should simply be ignored and XML BLOWS made a standard as soon as possible.


Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I'm sure this proposal will gain even more support if it's integrated with RFC-3252. And doubtless Parrot will be the scripting language of choice to work with these technologies.
[ April 02, 2002: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
RFC-3252: Binary Lexical Octet Ad-hoc Transport (BLOAT)! All great thinkers think the same!
For my part, I intuitively feel that Sean McGrath's innovative approach to XML validation can be more than successfully applied to Michael's proposal.
[ April 02, 2002: Message edited by: Mapraputa Is ]
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
I have a lingering feeling that this thread has a lot of material for deep thinking. Let's take only one sentence:
the least of which is that bytecode is already a portable format
I've been thinking about these words of yours since yesterday.
Is "portable" a Boolean property, "yes" or "no"? Can it be more complicated? Can we try to define what "portability" is and which degrees, stages, classes, levels, etc. of portability there are?
To start, any format is "portable" if there is a bridge to the target platform: software that can read the format and make sense of it because it simply "knows" this format. I would call this "level 0 portability", since there is nothing in the format itself that provides for portability.
Your "bytecode portability", in my understanding, means that what you call endianness is resolved, since there are no multi-byte artifacts, only a stream of bytes. Anything else? If we add an encoding specification, we will have level 1 portability: a format that defines its own symbol base.
XML documents are portable in the same sense, plus they are self-describing: anybody can see where one part of an XML document starts and where it ends, all thanks to the fact that some symbols ("<" and ">" most notably) serve as meta-symbols, a feature absent in bytecode formats. Formats that expose their structure this way I will define as "level 2 portability" compliant.
Thus, we can define portability by how much meta-information is included in the format.
I would say that XML exhibits a higher order of portability than simple bytecode.
The next level is achieved when the format carries information helping to define its own meaning, so reading agents can not only "read" it but "understand" it. This is what Tim Berners-Lee calls the "Semantic Web".
The highest level of portability is when the format totally defines and includes its own meaning. Such a format is self-sufficient and doesn't need any outside agent. Only in such a format does full portability find its magnificent embodiment.
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
Originally posted by Mapraputa Is:
The highest level of portability is when the format totally defines and includes its own meaning. Such a format is self-sufficient and doesn't need any outside agent. Only in such a format does full portability find its magnificent embodiment.

But how low a level must you go for something to be totally self-defining? A dictionary includes its own meaning, and yet if you don't speak English, an English dictionary would be useless. Would the message have to go down to the level of explaining the atoms that make up the very message?
Michael Ernest
High Plains Drifter
Sheriff

Joined: Oct 25, 2000
Posts: 7292

Originally posted by Mapraputa Is:

The highest level of portability is when the format totally defines and includes its own meaning. Such a format is self-sufficient and doesn't need any outside agent. Only in such a format does full portability find its magnificent embodiment.

OK, so! Time for questions:
1) How might a 'level 0' portability definition address ASCII, EBCDIC, or even Unicode, given that none of these are self-describing? Does XML provide for distinguishing among them?
2) Isn't one purpose of a standard/specification to reduce the metadata that would otherwise have to be described (another being to simply describe how some 'thing' meets some arbitrary technical criteria)? So if XML could be defined as 'no criteria at all,' other than what's needed to describe other criteria, is this a conscious or blind assumption that hiding metadata itself is an overtly obfuscatory and hostile act?
3) Ultimately, Thomas' point is inescapable. You cannot both describe and be one thing at one time. In computational terms, the most intuitive level at which to assert this point is the assembler instruction set understood by any one CPU. When drawing lines/levels of portability, how do you reconcile XML with its underlying dependencies? If XML really does sneer at the prospect of 'understood-but-not-explained' metadata, how does it justify the systems on which it makes its claims to 'ultimate portability'?
4) Contention: 'portability' is the buzzword XML uses to hide its own true agenda, which is to litter the network with countless bytes of crap that we once were content to agree didn't need to be put out there. I'll bet we find more people at Cisco, Juniper, Seagate, and Fujitsu excited by XML's popularity than anyone else.

:roll:
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
Originally posted by Thomas Paul:
But how low a level must you go for something to be totally self-defining? Would the message have to go down to the level of explaining the atoms that make up the very message?

Any self-respecting portable message has to define all the levels, yes. To continue your dictionary analogy: one can be an expert in English grammar, i.e., know how sentences are made out of words, but if s/he doesn't know the words, the meaning is lost. And you cannot learn words if you do not have an idea what all these As, Bs, and Cs are!
Back to portable messages. To get rid of the infamous endianness problem, all contemporary messages take the form of a stream of bytes (or a stream of bits in W3C terminology). The next level of interpretation is defined by the MIME type. Now, for text-based messages we need to map bytes to symbols, i.e. our message should specify an encoding. There can be more levels before the symbolic representation of the message is built; for example, for XML documents, escape sequences and entities must be resolved.
This level probably corresponds to "letters" in the natural-language analogy.
The next level, the structural one, is made of XML tags. What the meaning is, the application decides for itself.
I was kidding about the "self-sufficient" format, of course!
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
Originally posted by Michael Ernest:
1) How might a 'level 0' portability definition address ASCII, EBCDIC, or even Unicode, in that none of these are self-describing. Does XML provide for distinguishing among them?

"encoding" attribute in XML declaration:
<?xml version="1.0" encoding="ISO-8859-1"?>
UTF8/UTF-16 as default if none is specified.
2) Isn't one purpose of a standard/specification to reduce the metadata that would otherwise have to be described (another being to simply describe how some 'thing' meets some arbitrary technical criteria)?
Sure! And you know how fast it is to standardize something. :roll: XML, in my understanding, is a cheap tool for an arbitrary set of stakeholders to quickly agree on a vocabulary and grammar for their messages, with a format to express all the grammatical details and a parser ready to use.
So if XML could be defined as 'no criteria at all,' other than what's needed to describe other criteria, is this a conscious or blind assumption that hiding metadata itself is an overtly obfuscatory and hostile act?
I feel that there is some confusion in your idea of XML. Either that, or there is confusion in my idea about your idea of XML.
XML is similar to natural language in that it can be used to express all levels of meta-discourse. An instance document is XML; its meta-data, the Schema, is also XML. A language to transform data, XSLT, is XML as well. And someone has already written an XSLT stylesheet that reads a Schema and outputs a GUI in the form of HTML files, which, of course, is also XML! But all this doesn't mean that you have to include the Schema, and the XSLT to transform it, in each and every instance document! They simply happen to use the same syntax.
3) Ultimately, Thomas' point is inescapable. You cannot both describe and be one thing at one time.
Neither should you. Michael, it's entertaining to watch your attempts to set Tom and me against each other, but... hate to say it... your attempts are doomed. There is no way to arouse any serious disagreement out of two Hofstadter followers.
When drawing lines/levels of portability, how do you reconcile XML with its underlying dependencies?
By using an XML-spec-compliant parser.
If XML really does sneer at the prospect of 'understood-but-not-explained' metadata, how does it justify the systems on which it makes its claims to 'ultimate portability'?
Do you understand what you said here yourself?
OK. Now, a serious conversation.
Knowledge about how to interpret a message can be distributed between:
1) the message itself, in some form of metadata (like the encoding spec in an XML file)
2) the reading agent, in the form of "default" or "built-in" rules (the default encoding, the fact that the "<" symbol starts a new tag...)
3) some external file with extra specifications (DTD, Schema...)
Where to put most (or all) of this information depends on the "flexibility/redundancy" balance. One extreme would be to provide no meta-info at all and rely on the reading agent to interpret the message; a huge number of proprietary formats belong to this category. XML was an answer to the desire for something more flexible. The other extreme would be for the reading agent to make no assumptions and for the data to carry all the rules for lexical analysis/parsing with it, in the form of lex/yacc specification files, for example. XML chose a middle way, really. It fixed the rules for lexical analysis, so they are built into parsers and instance documents do not need to carry them around. The rules for parsing are expressed via tag enclosure and the attachment of attributes to tags. In this (and only in this) sense XML documents are self-describing. This is why XML is called a meta-language, or a "language to define languages". HTML is also capable of specifying its encoding, you know... :roll:
Part of the meta-information (data types, for example) is external to instance documents and is located in Schema files.
Overall, this gives us an even, balanced distribution of meta-information and a great (although not unlimited) amount of flexibility.
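A minimal Java sketch of those three locations, using JAXP's SAX API ("message.xml" is a hypothetical document that declares its own encoding and names a DTD):

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class WhereTheRulesLive {
    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // 3) external specification: validate against the DTD the document names
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        // 1) metadata in the message: the parser reads the encoding from
        //    the XML declaration of message.xml itself
        // 2) built-in rules: the lexical grammar ("<" opens a tag, etc.)
        //    ships inside the parser, not inside the document
        parser.parse(new File("message.xml"), new DefaultHandler());
    }
}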
Michael Ernest
High Plains Drifter
Sheriff

Joined: Oct 25, 2000
Posts: 7292


XML, in my understanding, is a cheap tool for an arbitrary set of stakeholders to quickly agree on a vocabulary and grammar for their messages, with a format to express all the grammatical details and a parser ready to use.

Do they really qualify as messages? That seems lofty.

XML is similar to natural language in that it can be used to express all levels of meta-discourse.

Oh, I get it: kind of like an Al Gore language, where every statement is repeated until some version of the original statement that the audience finds least offensive is deemed conclusive by the fact that no other revision follows it.

Neither should you. Michael, it's entertaining to watch your attempts to set Tom and me against each other, but... hate to say it... your attempts are doomed. There is no way to arouse any serious disagreement out of two Hofstadter followers.

It's only a matter of time -- stay on guard, sweetie.

Where to put most (or all) of this information depends on the "flexibility/redundancy" balance. One extreme would be to provide no meta-info at all and rely on the reading agent to interpret the message; a huge number of proprietary formats belong to this category. XML was an answer to the desire for something more flexible. The other extreme would be for the reading agent to make no assumptions and for the data to carry all the rules for lexical analysis/parsing with it, in the form of lex/yacc specification files, for example. XML chose a middle way, really.

That's an intriguing (albeit revisionist) summation of where XML stands. I like it.
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
Originally posted by Mapraputa Is:
To get rid of the infamous endianness problem, all contemporary messages take the form of a stream of bytes (or a stream of bits in W3C terminology).

This was a very stupid statement of mine. The good old "one character, one byte" assumption. Since contemporary text-based messages are expressed in Unicode-compliant encodings, they can use more than one byte per character, and the endianness problem gets a perfect chance to raise its ugly head.
To indicate the endianness, encodings use a byte order mark (BOM), as this wonderful article informs.
So XML in its serialized form still carries all the information needed for its interpretation, but it is used by different layers of software. I bet parsers do not deal with the endianness themselves, because "the character-code conversion utilities", as the article calls them, have already taken care of it.
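A minimal Java sketch of what such a utility does at the very start of a stream; the byte values come from the Unicode BOM definitions, and the class and method names here are just for illustration:

import java.io.IOException;
import java.io.PushbackInputStream;

public class BomSniffer {
    // Peek at the first bytes for a byte order mark; "in" must be built
    // with pushback capacity >= 3, e.g. new PushbackInputStream(raw, 3).
    public static String sniff(PushbackInputStream in) throws IOException {
        byte[] b = new byte[3];
        int n = in.read(b, 0, 3);
        if (n >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB && b[2] == (byte) 0xBF)
            return "UTF-8";                    // EF BB BF: UTF-8 BOM, consumed
        if (n >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) {
            in.unread(b, 2, n - 2);            // push back any content byte read past the BOM
            return "UTF-16BE";                 // FE FF: big-endian
        }
        if (n >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) {
            in.unread(b, 2, n - 2);
            return "UTF-16LE";                 // FF FE: little-endian
        }
        if (n > 0) in.unread(b, 0, n);         // no BOM: put everything back
        return "unknown";
    }
}

A parser sitting above such a layer never sees the BOM at all, which is exactly the division of labor the article describes.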
 