| Author |
Resolving relative URIs for Canonicalization
|
Mike Halford
Ranch Hand
Joined: Jul 13, 2005
Posts: 31
|
|
Howdy,
Looking at this (from w3.org stuff):
The difficulties arise due to the loss of the following information not available in the data model:
1. base URI, especially in content derived from the replacement text of external general parsed entity references
2. notations and external unparsed entity references
3. attribute types in the document type declaration
In the first case, note that a document containing a relative URI [URI] is only operational when accessed from a specific URI that provides the proper base URI. In addition, if the document contains external general parsed entity references to content containing relative URIs, then the relative URIs will not be operational in the canonical form, which replaces the entity reference with internal content (thereby implicitly changing the default base URI of that content). Both of these problems can typically be solved by adding support for the xml:base attribute [XBase] to the application, then adding appropriate xml:base attributes to document element and all top-level elements in external entities. In addition, applications often have an opportunity to resolve relative URIs prior to the need for a canonical form. For example, in a digital signature application, a document is often retrieved and processed prior to signature generation. The processing SHOULD create a new document in which relative URIs have been converted to absolute URIs, thereby mitigating any security risk for the new document.
Given that we have the following code flow:
<code>
SOAPMessageContext smc; // passed value
SOAPMessage msg = smc.getMessage();
Canonicalizer c14n = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_EXCL_OMIT_COMMENTS);
byte[] body = c14n.canonicalizeSubtree(msg.getSOAPBody());
</code>
This gives us the ol' 'relative URI being used in the namespace' chestnut.
<ns:ourLine xmlns:ns="www.ourco.com/release/v4.5" ...> more stuff here </ns:ourLine>
The message is
Node-Set contains relative namespace URI ns=www.ourco.com/release/v4.5
The code works ok when the namespace is changed to
<ns:ourLine xmlns:ns="http://www.ourco.com/release/v4.5" ...>
What's the best way of solving the relative URI problem, given that we can't change the passed doc?
|
 |
Peer Reynders
Bartender
Joined: Aug 19, 2005
Posts: 2906
|
|
If I understand your problem correctly, you are making the assumption that the namespace URI is in fact a URL (for validation?). A Uniform Resource Identifier as used for namespace disambiguation does not have to be a Uniform Resource Locator.
Ronald Bourret: XML Namespaces FAQ
14.5) What does the URI reference used as an XML namespace name point to? The URI reference used as an XML namespace name is simply an identifier. It is not guaranteed to point to anything and, in general, it is a bad idea to assume that it does. This point causes a lot of confusion, so we'll repeat it here: URI REFERENCES USED AS XML NAMESPACE NAMES ARE JUST IDENTIFIERS. THEY ARE NOT GUARANTEED TO POINT TO ANYTHING. While this might be confusing when URLs are used as namespace names, it is obvious when other types of URI references are used as namespace names.
James Clark: XML Namespaces
While it makes sense to find a canonical form of a URI that is a URL, there is no canonical form for a URI that isn't a URL. For validation you usually have to provide the schema file locations (one per namespace URI) separately.
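To illustrate the "just an identifier" point, here is a minimal sketch using the JDK's own DOM parser. The class name `NamespaceNameDemo` and the element name `ns:line` are made up for the example; only the namespace string comes from the thread. A namespace-aware parser hands the namespace name back exactly as written and never tries to fetch it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any library.
final class NamespaceNameDemo {

    // Parses the XML and returns the namespace name of the root element.
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // The namespace name is treated as an opaque string, scheme or no scheme.
        System.out.println(rootNamespace(
                "<ns:line xmlns:ns=\"www.ourco.com/release/v4.5\"/>"));
        System.out.println(rootNamespace(
                "<ns:line xmlns:ns=\"http://www.ourco.com/release/v4.5\"/>"));
    }
}
```

So plain parsing does not care about the missing scheme; any complaint comes from a later step that imposes stricter rules on namespace names.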
|
"Don't succumb to the false authority of a tool or model. There is no substitute for thinking."
Andy Hunt, Pragmatic Thinking & Learning: Refactor Your Wetware p.41
|
 |
Mike Halford
Ranch Hand
Joined: Jul 13, 2005
Posts: 31
|
|
Thanks, you've made a couple of assumptions about what I've assumed though. The problem we have is referred to on this page canonicalization. This is the relevant quote
The difficulties arise due to the loss of the following information not available in the data model : 1. base URI, especially in content derived from the replacement text of external general parsed entity references
Further on
The processing SHOULD create a new document in which relative URIs have been converted to absolute URIs, thereby mitigating any security risk for the new document.
I am aware of that
URI REFERENCES USED AS XML NAMESPACE NAMES ARE JUST IDENTIFIERS. THEY ARE NOT GUARANTEED TO POINT TO ANYTHING.
We have a relative URI in the doc that we are attempting to canonicalize, and that is more or less cast in stone. I was asking for suggestions on the most efficient way to work round the problem. The option suggested on w3, i.e. creating a new doc, does seem to be a bit of a performance hit.
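For reference, the "create a new document" option from the W3C text might look roughly like the sketch below. It uses only the JDK DOM API. The class `NamespaceRewriter` and its method names are hypothetical, and the absolute replacement URI is whatever the two parties agree on; nothing here comes from the canonicalizer library. Note that the rewritten copy canonicalizes to different bytes than the original, so this only helps if signer and verifier both apply the same rewrite.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: copies a subtree and swaps one namespace name for another.
final class NamespaceRewriter {

    private static DocumentBuilderFactory factory() {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f;
    }

    static Document parse(String xml) {
        try {
            return factory().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Leaves the source untouched; returns the root of a rewritten deep copy.
    static Element copyWithAbsoluteNamespace(Element source, String relativeNs, String absoluteNs) {
        Document copyDoc;
        try {
            copyDoc = factory().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        copyDoc.appendChild(copyDoc.importNode(source, true));
        rewrite(copyDoc.getDocumentElement(), relativeNs, absoluteNs);
        return copyDoc.getDocumentElement();
    }

    private static void rewrite(Element element, String relativeNs, String absoluteNs) {
        Document doc = element.getOwnerDocument();
        // Snapshot the attributes first, since renaming may reorder the live map.
        NamedNodeMap map = element.getAttributes();
        List<Attr> attrs = new ArrayList<>();
        for (int i = 0; i < map.getLength(); i++) {
            attrs.add((Attr) map.item(i));
        }
        for (Attr attr : attrs) {
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                // An xmlns / xmlns:prefix declaration: fix its value.
                if (relativeNs.equals(attr.getValue())) {
                    attr.setValue(absoluteNs);
                }
            } else if (relativeNs.equals(attr.getNamespaceURI())) {
                doc.renameNode(attr, absoluteNs, attr.getNodeName());
            }
        }
        Element current = element;
        if (relativeNs.equals(element.getNamespaceURI())) {
            // renameNode may return a replacement node, so keep its result.
            current = (Element) doc.renameNode(element, absoluteNs, element.getNodeName());
        }
        Node child = current.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // grab before the child is renamed
            if (child instanceof Element) {
                rewrite((Element) child, relativeNs, absoluteNs);
            }
            child = next;
        }
    }
}
```

The copy, rather than the original body, would then be what gets handed to the canonicalizer. The cost is one deep copy plus one walk over the subtree, which is the performance hit in question.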
|
 |
Peer Reynders
Bartender
Joined: Aug 19, 2005
Posts: 2906
|
|
Originally posted by Mike Halford: We have a relative URI in the doc that we are attempting to canonicalize.
Ronald Bourret: XML Namespaces FAQ
14.4) Can I use a relative URI reference as a namespace name? In version 1.1, the answer is no.
That relative URI may very well invalidate the entire document, so it wouldn't be possible to obtain canonical XML from it.
Originally posted by Mike Halford:
The message is
Node-Set contains relative namespace URI ns=www.ourco.com/release/v4.5
The code works ok when the namespace is changed to
<ns:ourLine xmlns:ns="http://www.ourco.com/release/v4.5" ...>
What does "the code works ok" mean? What failure occurs when the URI scheme name is omitted? The scheme name can also be interpreted as a transport protocol for resource access. So when "http://" is added the "code may start to work" because either:
* The addition of the scheme name makes the XML document valid so that it can be processed properly.
* Something is interpreting that URI as a URL and trying to access it (when it shouldn't).
[ October 03, 2007: Message edited by: Peer Reynders ]
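The difference the scheme name makes can be seen with `java.net.URI` alone. This is a small sketch, not the check the canonicalizer itself performs (that may well differ); the class name `NamespaceUriCheck` is invented for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only.
final class NamespaceUriCheck {

    // java.net.URI calls a URI "absolute" exactly when it has a scheme.
    static boolean isRelative(String namespaceName) {
        try {
            return !new URI(namespaceName).isAbsolute();
        } catch (URISyntaxException e) {
            // Assumption of this sketch: unparseable names count as relative.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRelative("www.ourco.com/release/v4.5"));        // true
        System.out.println(isRelative("http://www.ourco.com/release/v4.5")); // false
    }
}
```

Without the scheme, the whole string parses as a relative path, which is why the same text with "http://" in front is treated so differently. No network access is involved either way.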
|
 |
Mike Halford
Ranch Hand
Joined: Jul 13, 2005
Posts: 31
|
|
Thanks for that link, that's useful. Maybe the stone is going to have to crumble after all. It would seem that they are going to have to change it to a URN. Regarding
What does "the code works ok" mean? What failure occurs when the URI scheme name is omitted? The scheme name can also be interpreted as a transport protocol for resource access. So when "http://" is added the "code may start to work" because: * The addition of the scheme name makes the XML document valid so that it can be processed properly. * Something is interpreting that URI as an URL and tries to access it (when it shouldn't).
What I meant was that with the addition of the http:// the canonicalization works.
|