Resolving relative URIs for Canonicalization

Mike Halford
Ranch Hand

Joined: Jul 13, 2005
Posts: 31
Howdy,
Looking at this (from w3.org stuff):
The difficulties arise due to the loss of the following information not available in the data model:
1. base URI, especially in content derived from the replacement text of external general parsed entity references
2. notations and external unparsed entity references
3. attribute types in the document type declaration

In the first case, note that a document containing a relative URI [URI] is only operational when accessed from a specific URI that provides the proper base URI. In addition, if the document contains external general parsed entity references to content containing relative URIs, then the relative URIs will not be operational in the canonical form, which replaces the entity reference with internal content (thereby implicitly changing the default base URI of that content). Both of these problems can typically be solved by adding support for the xml:base attribute [XBase] to the application, then adding appropriate xml:base attributes to document element and all top-level elements in external entities. In addition, applications often have an opportunity to resolve relative URIs prior to the need for a canonical form. For example, in a digital signature application, a document is often retrieved and processed prior to signature generation. The processing SHOULD create a new document in which relative URIs have been converted to absolute URIs, thereby mitigating any security risk for the new document.
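For what it's worth, the "convert relative URIs to absolute URIs" step from the quote can be sketched with plain java.net.URI. This is only an illustration: the base URI http://example.org/docs/msg.xml, the reference schemas/order.xsd and the class name RelativeUriResolver are made up for the example and come from neither the spec nor the code in this thread.

```java
import java.net.URI;

// Hypothetical helper: resolves a possibly relative reference against a known base URI.
class RelativeUriResolver {

    static String toAbsolute(String baseUri, String reference) {
        URI base = URI.create(baseUri);
        if (!base.isAbsolute()) {
            throw new IllegalArgumentException("base URI must be absolute: " + baseUri);
        }
        // resolve() leaves an already absolute reference unchanged
        return base.resolve(reference).toString();
    }

    public static void main(String[] args) {
        // prints http://example.org/docs/schemas/order.xsd
        System.out.println(toAbsolute("http://example.org/docs/msg.xml", "schemas/order.xsd"));
    }
}
```

The catch is that a base URI has to be known at the point where the document is processed, which is exactly the information the quote says is lost in the data model.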

Given that we have the following code flow:
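<code>
// Canonicalizer appears to be Apache XML Security's
// org.apache.xml.security.c14n.Canonicalizer (an assumption based on the API used);
// that library needs org.apache.xml.security.Init.init() to be called once beforehand.
SOAPMessageContext smc; // passed value

SOAPMessage msg = smc.getMessage();

Canonicalizer c14n = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_EXCL_OMIT_COMMENTS);

// Canonicalize only the SOAP body subtree
byte[] body = c14n.canonicalizeSubtree(msg.getSOAPBody());

</code>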

This gives us the ol' 'relative URI being used in the namespace' chestnut.
<ns:ourLine xmlns:ns="www.ourco.com/release/v4.5" ...>
more stuff here
</ns:ourLine>

The message is: Node-Set contains relative namespace URI ns=www.ourco.com/release/v4.5

The code works OK when the namespace is changed to <ns:ourLine xmlns:ns="http://www.ourco.com/release/v4.5" ...>

What's the best way of solving the relative URI problem, given that we can't change the passed doc?
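To make the difference between the two namespace strings concrete, here is a minimal sketch using java.net.URI. It only shows that the first string has no scheme and is therefore a relative reference; it is not claimed to be the check the canonicalizer itself performs, and the class name NamespaceCheck is invented for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: reports whether a namespace name is an absolute URI (has a scheme).
class NamespaceCheck {

    static boolean isAbsolute(String namespaceName) {
        try {
            return new URI(namespaceName).isAbsolute();
        } catch (URISyntaxException e) {
            // Not parseable as a URI reference at all, so certainly not absolute
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAbsolute("www.ourco.com/release/v4.5"));        // prints false
        System.out.println(isAbsolute("http://www.ourco.com/release/v4.5")); // prints true
    }
}
```

Without a scheme, "www.ourco.com/release/v4.5" parses as a relative path, not as a host name, which is why adding http:// changes the outcome.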
Peer Reynders
Bartender

Joined: Aug 19, 2005
Posts: 2922
If I understand your problem correctly, you are making the assumption that the namespace URI is in fact a URL (for validation?).

A Uniform Resource Identifier as used for namespace disambiguation does not have to be a Uniform Resource Locator.

Ronald Bourret: XML Namespaces FAQ

14.5) What does the URI reference used as an XML namespace name point to?

The URI reference used as an XML namespace name is simply an identifier. It is not guaranteed to point to anything and, in general, it is a bad idea to assume that it does. This point causes a lot of confusion, so we'll repeat it here:

URI REFERENCES USED AS XML NAMESPACE NAMES ARE JUST IDENTIFIERS. THEY ARE NOT GUARANTEED TO POINT TO ANYTHING.

While this might be confusing when URLs are used as namespace names, it is obvious when other types of URI references are used as namespace names.

James Clark: XML Namespaces

While it makes sense to find a canonical form of a URI that is a URL, there is no canonical form for a URI that isn't a URL.

For validation you usually have to provide the schema file locations (one per namespace URI) separately.

Mike Halford
Thanks, you've made a couple of assumptions about what I've assumed though.
The problem we have is described on this page on canonicalization. This is the relevant quote:
The difficulties arise due to the loss of the following information not available in the data model:
1. base URI, especially in content derived from the replacement text of external general parsed entity references

Further on:
The processing SHOULD create a new document in which relative URIs have been converted to absolute URIs, thereby mitigating any security risk for the new document.

I am aware that
URI REFERENCES USED AS XML NAMESPACE NAMES ARE JUST IDENTIFIERS. THEY ARE NOT GUARANTEED TO POINT TO ANYTHING.


We have a relative URI in the doc that we are attempting to canonicalize, and that is more or less cast in stone. I was asking for suggestions on the most efficient way to work round the problem. The option suggested on w3, i.e. creating a new doc, does seem to be a bit of a performance hit.
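To help weigh that cost, this is roughly what the "create a new doc" option could look like with the standard DOM API: copy the tree into a fresh document and swap the relative namespace name for an absolute one on the way. It is a sketch under assumptions, not a recommended fix. The class name NamespaceRewriter and its method names are invented here, and only the document element's subtree is copied.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: copies a DOM tree, replacing one namespace name with another.
class NamespaceRewriter {
    private static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    static Document parse(String xml) {
        try {
            return factory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    static Document rewrite(Document src, String from, String to) {
        Document dst;
        try {
            dst = factory().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create document", e);
        }
        dst.appendChild(copy(src.getDocumentElement(), dst, from, to));
        return dst;
    }

    private static DocumentBuilderFactory factory() {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f;
    }

    private static Node copy(Node n, Document dst, String from, String to) {
        if (n.getNodeType() != Node.ELEMENT_NODE) {
            return dst.importNode(n, true); // text, comments, etc. are copied as-is
        }
        Element e = (Element) n;
        Element out = dst.createElementNS(map(e.getNamespaceURI(), from, to), e.getNodeName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            boolean isNsDecl = XMLNS_NS.equals(a.getNamespaceURI());
            // For xmlns declarations the value is the namespace name; otherwise map the attribute's own namespace
            String ns = isNsDecl ? XMLNS_NS : map(a.getNamespaceURI(), from, to);
            String value = isNsDecl ? map(a.getValue(), from, to) : a.getValue();
            out.setAttributeNS(ns, a.getNodeName(), value);
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            out.appendChild(copy(c, dst, from, to));
        }
        return out;
    }

    private static String map(String uri, String from, String to) {
        return from.equals(uri) ? to : uri;
    }

    public static void main(String[] args) {
        Document src = parse("<ns:a xmlns:ns=\"www.ourco.com/release/v4.5\"><ns:b>text</ns:b></ns:a>");
        Document dst = rewrite(src, "www.ourco.com/release/v4.5", "http://www.ourco.com/release/v4.5");
        System.out.println(dst.getDocumentElement().getNamespaceURI());
    }
}
```

Bear in mind that the rewritten document is no longer the document that was received, so anything computed over its canonical form only makes sense if the other party applies the same rewrite.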
Peer Reynders
Originally posted by Mike Halford:
We have a relative URI in the doc that we are attempting to canonicalize.


Ronald Bourret: XML Namespaces FAQ

14.4) Can I use a relative URI reference as a namespace name?

In version 1.1, the answer is no.


That relative URI may very well invalidate the entire document, so it wouldn't be possible to obtain canonical XML from it.


Originally posted by Mike Halford:
The message is: Node-Set contains relative namespace URI ns=www.ourco.com/release/v4.5
The code works OK when the namespace is changed to <ns:ourLine xmlns:ns="http://www.ourco.com/release/v4.5" ...>


What does "the code works ok" mean? What failure occurs when the URI scheme name is omitted? The scheme name can also be interpreted as a transport protocol for resource access. So when "http://" is added the "code may start to work" because:
  • The addition of the scheme name makes the XML document valid so that it can be processed properly.
  • Something is interpreting that URI as a URL and tries to access it (when it shouldn't).


[ October 03, 2007: Message edited by: Peer Reynders ]
Mike Halford
Thanks for that link, that's useful. Maybe the stone is going to have to crumble after all. It would seem that they are going to have to change it to a URN.

Regarding
What does "the code works ok" mean? What failure occurs when the URI scheme name is omitted? The scheme name can also be interpreted as a transport protocol for resource access. So when "http://" is added the "code may start to work" because:

* The addition of the scheme name makes the XML document valid so that it can be processed properly.
* Something is interpreting that URI as a URL and tries to access it (when it shouldn't).

What I meant was that with the addition of http:// the canonicalization works.
     