JavaRanch » Java Forums » Java » Web Services

Managing xsd types across multiple service projects

Marc Peabody
pie sneak
Sheriff

Joined: Feb 05, 2003
Posts: 4727

We're starting with .xsd files (canonical-first approach) to describe the data to be passed across the wire in our services.

Here's a problem:

Let's say Service A has xsd type Item and Service B has type Item. Item is the same in both instances... for now. As good OOAD folks would do, we're hoping to share this Item xsd definition across the two services. One consideration is to have a project that contains nothing but xsd definitions and each service can pick and choose what types they want.

The problem now is that if one service needs an extra field added to Item and we change the shared definition for Item, it would affect all dependent services because they all share the same xsd type. Should we allow that? Or should Item not be changed and we instead create a different ItemWithNewField or DeepItem or similar type to be used by only the service that needs it? If we go that route, would it make more sense to name the new version of type Item something else or simply give it a different namespace?


A good workman is known by his tools.
Peer Reynders
Bartender

Joined: Aug 19, 2005
Posts: 2922
Originally posted by Marc Peabody:
As good OOAD folks would do, we're hoping to share this Item xsd definition across the two services.


As a side note: We all know that OOAD can be hard - in my opinion SOAD is even harder because the guidelines are much more vague and it's even more of a balancing act between all the competing concerns than what you are already used to. So most decisions tend to be much less clear cut.

The problem now is that if one service needs an extra field added to Item and we change the shared definition for Item, it would affect all dependent services because they all share the same xsd type. Should we allow that? Or should Item not be changed and we instead create a different ItemWithNewField or DeepItem or similar type to be used by only the service that needs it? If we go that route, would it make more sense to name the new version of type Item something else or simply give it a different namespace?


The answer is, unfortunately, "it depends".

In a large organization it is unreasonable to expect total data type harmonization over the entire service inventory - ultimately that would prove counterproductive as it would continually hamper your efforts to evolve your inventory. However, you should divide your service inventory into distinct business domains (Domain Inventory Pattern). Within each domain you should try to harmonize common data types. However, on a case-by-case basis you may run into the odd service whose functional context needs to add data that other services are not interested in propagating (but some clients may be interested in). You may be able to manage some of these additions by only making the core document elements mandatory and making the additions optional (a lot of this depends on whether your web service stack will tolerate "extra" elements and simply ignore them - mitigating the need to update clients that aren't interested in new optional elements of the updated XSD).

In effect you need to spend some time up front in your analysis to develop your "service inventory blueprint". This sounds like BDUF (big design up front), but you aren't building the whole thing at once. The "service inventory blueprint" is a map of your long-term service inventory goals: the (non-overlapping) domains and the services with their (non-overlapping) functional contexts. That way you have a better notion of where all the boundaries are and whether a capability that fills an immediate need should be added to an existing service or belongs to a new service.
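A minimal sketch of the "mandatory core, optional additions" idea, assuming a hypothetical Item schema in the made-up namespace urn:example:item. The v2 schema adds warehouseCode with minOccurs="0", so documents written by v1-era clients still validate, while a document missing a core element does not. It uses only the JDK's javax.xml.validation API; whether your actual web service stack validates, ignores or rejects such elements is a separate question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Sketch: a hypothetical Item schema (version 2) where "sku" and "description"
// form the mandatory core and "warehouseCode" was added later as optional.
final class OptionalAdditionDemo {
    static final String ITEM_V2_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   targetNamespace="urn:example:item"
                   elementFormDefault="qualified">
          <xs:element name="Item">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="sku" type="xs:string"/>
                <xs:element name="description" type="xs:string"/>
                <!-- added in v2; minOccurs="0" keeps v1 documents valid -->
                <xs:element name="warehouseCode" type="xs:string" minOccurs="0"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    // Written by a client that only knows the v1 (core) elements.
    static final String V1_DOC =
        "<Item xmlns=\"urn:example:item\"><sku>A-1</sku><description>Widget</description></Item>";
    // Written by the one client that needs the new element.
    static final String V2_DOC =
        "<Item xmlns=\"urn:example:item\"><sku>A-1</sku><description>Widget</description>"
        + "<warehouseCode>W9</warehouseCode></Item>";
    // Missing a mandatory core element.
    static final String BROKEN_DOC =
        "<Item xmlns=\"urn:example:item\"><sku>A-1</sku></Item>";

    // Returns true if xml validates against xsd; a broken schema is treated as a programming error.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema is not valid XSD", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // any parse or validation failure counts as "not valid"
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(ITEM_V2_XSD, V1_DOC));     // true
        System.out.println(isValid(ITEM_V2_XSD, V2_DOC));     // true
        System.out.println(isValid(ITEM_V2_XSD, BROKEN_DOC)); // false
    }
}
```

This only covers old documents meeting a new schema. The reverse case (a v1 client receiving a v2 document) depends on how the client's stack treats elements it doesn't know about.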

Also be careful about which of the XML Schema features you employ.

SOA in Practice: The Art of Distributed System Design (p44):
Based on my own experience (and others), I recommend that you have a basic set of fundamental types that you can compose to other types (structures, records) and sequences (arrays). Be careful with enumerations (an anachronism from the time when each byte did count - use strings instead), inheritance and polymorphism (even when XML supports it).


In other words, web service stacks on less developed platforms, or those outside of your own service inventory, may not support all of those "fancy schmancy" XML Schema features. Enumerations are great in non-distributed OO; in a service contract, however, they place too much detail in the contract and restrict the service's evolution and lifespan. The trade-offs are more validation inside the services and the need for a richer fault message type system. And above all, resist the temptation to simply mimic your existing object models in the XML document schemas.
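A rough sketch of that trade-off, assuming the contract declares a hypothetical status field as a plain xs:string instead of an xs:enumeration. The accepted values move into the service logic, and an unknown value produces a descriptive fault instead of a schema validation error. OrderStatusService and InvalidStatusFault are illustrative names, not part of any framework.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Sketch: "status" arrives as a free-form string; the service checks it itself.
final class OrderStatusService {

    // Stand-in for a richer fault message that tells the consumer what went wrong.
    static final class InvalidStatusFault extends RuntimeException {
        final String rejectedValue;
        final Set<String> acceptedValues;

        InvalidStatusFault(String rejectedValue, Set<String> acceptedValues) {
            super("Unsupported status '" + rejectedValue + "', accepted values: " + acceptedValues);
            this.rejectedValue = rejectedValue;
            this.acceptedValues = acceptedValues;
        }
    }

    // Lives in the service logic, not the contract: adding a value later is a
    // service release rather than a schema change for every consumer.
    private final Set<String> accepted;

    OrderStatusService(Set<String> accepted) {
        this.accepted = new TreeSet<>(accepted); // sorted, so fault messages are stable
    }

    // Normalizes and checks a status string taken from the request payload.
    String validateStatus(String raw) {
        String value = raw == null ? "" : raw.trim().toUpperCase(Locale.ROOT);
        if (!accepted.contains(value)) {
            throw new InvalidStatusFault(raw, accepted);
        }
        return value;
    }

    public static void main(String[] args) {
        OrderStatusService service = new OrderStatusService(Set.of("OPEN", "SHIPPED", "CANCELLED"));
        System.out.println(service.validateStatus(" shipped ")); // SHIPPED
        try {
            service.validateStatus("LOST");
        } catch (InvalidStatusFault fault) {
            // Unsupported status 'LOST', accepted values: [CANCELLED, OPEN, SHIPPED]
            System.out.println(fault.getMessage());
        }
    }
}
```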

Given where you are right now you might want to have a look at Web Service Contract Design and Versioning for SOA and SOA Principles of Service Design.
[ December 19, 2008: Message edited by: Peer Reynders ]
Bob Jake
Greenhorn

Joined: Apr 15, 2002
Posts: 2
"In a large organization it is unreasonable to expect total data type harmonization over the entire service inventory - ultimately that would prove counterproductive as it would continually hamper your efforts to evolve your inventory. However, you should divide your service inventory into distinct business domains (Domain Inventory Pattern). Within each domain you should try to harmonize common data types. However, on a case-by-case basis you may run into the odd service whose functional context needs to add data that other services are not interested in propagating (but some clients may be interested in)."

The ideal situation would be a single canonical representation of the entity (within a distinct business domain) that the multiple systems requesting that entity can agree on. If the canonical is only reused between two systems, the organization will not get a return on its investment. That said, the SOA organization should use this exercise to get all the systems talking at the canonical-model level, treating special message interactions as exceptions, or as something to be worked on when more evolved canonicals are prepared at regular, pre-defined, planned, market-focused intervals.
"You may be able to manage some of these additions by only making the core document elements mandatory and making the additions optional (a lot of this depends on whether your web service stack will tolerate "extra" elements and simply ignore them - mitigating the need to update clients that aren't interested in new optional elements of the updated XSD)."

I would be very interested to know the various alternative ways in which the web service stack can be made to tolerate the extra elements. I see this as a transformation layer between the canonical and the special message at the service endpoint for that specific client.
[ December 21, 2008: Message edited by: Remote Reference ]
Peer Reynders
Bartender

Joined: Aug 19, 2005
Posts: 2922
"Remote Reference",

Please check your private messages regarding an important administrative matter.
Peer Reynders
Bartender

Joined: Aug 19, 2005
Posts: 2922
Originally posted by Remote Reference:
The ideal situation will be to achieve a single canonical representation for the entity (within a distinct business domain) on which general consensus can be agreed to by multiple systems requesting that entity.


This is unrealistic if those multiple systems are under the control of multiple owners (different departments, companies, etc.). Basically it's a balancing act between the extremes of total data harmonization and imposing data-transformation overhead on every service call. Usually a business domain does not extend outside of the organization's service inventory. If all the services within the business domain are under the control of the same owner, then total data harmonization may be attainable. If, however, services within the domain are controlled by different owners, then some data transformation may be necessary, as those owners may promote differing "views" of the same domain. (This could imply that the services in a single domain have to be under the control of the same "owner"; a different owner may automatically imply a distinct domain even if the services are "related".)

SOA in Practice: The Art of Distributed System Design, "Heterogeneous Data Types", pp. 38-41

... In fact, when object-orientation became mainstream, having a common business object model (BOM) became a general goal. But, it turned out that this approach was a recipe for disaster for large systems.

The first reason for the disaster was an organizational one: it was simply not possible to come to an agreement for harmonized types ... Either you didn't fulfill all interests, or your model became far too complicated, or it simply was never finished. This is a perfect example of "analysis paralysis": if you try to achieve perfection when analyzing all requirements, you'll never finish the job.

... different systems enhance differently. Say you create a harmonized data type for customers. Later, a billing system might need two new customer attributes to deal with different tax rates, while a CRM system might introduce new forms of electronic addresses, and an offering system might need attributes to deal with privacy protection. If a customer data type is shared among all your systems (including systems not interested in any one of these extensions), all the systems will have to be updated accordingly to reflect each change, and the customer data type will become more and more complicated.

Sooner or later, the price of harmonization becomes too high. Keeping all the systems in sync is simply too expensive in terms of time and money. ...

Common BOMs do not scale because they lead to a coupling of systems that is too tight. As a consequence, you have to accept the fact that data types on large distributed systems will not be harmonized. In decoupled large systems, data types differ.

... if data types are not harmonized, you need data type mappings (which include technical and semantic aspects). Although mapping adds complexity, it is a good sign in large systems because it demonstrates that components are decoupled.

... Note that a service consumer should avoid using the provider's data types in its own source code. Instead the consumer should have a thin mapping layer to map the provider's data types to its own data types. ...

... Having no common business data model has pros and cons:
  • The advantage is that systems can modify their data types without directly affecting other systems (modified service interfaces affect only corresponding consumers)
  • The drawback is that you have to map data types from one system to another.


... To promote loose coupling, fundamental data types harmonized for all services should usually be very basic. The most complicated common data type I've seen a phone company introduce in a SOA landscape was a data type for a phone number (a structure/record of country code, area code, and local number). The trial to harmonize a common type of address (customer address, invoice addresses, etc.) failed {how to deal with titles of nobility? disparate constraints of different systems and tools to process and print addresses}.

... If you are able to harmonize, do it. Harmonization helps. However, don't fall into the trap of requiring that data types be harmonized. This approach doesn't scale.

{To deal with differences between multiple related types (e.g. addresses) for consumers} ... introduce a composed service that allows you to query and modify addresses. The service then deals with the differences between the backend systems by mapping the data appropriately.

...
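A minimal sketch of such a composed service, assuming two hypothetical backends (billing and CRM) that each expose their own, never-harmonized address type. The record types and the lookup functions are invented for illustration and stand in for real service calls; only the mapping methods know how each backend represents an address.

```java
import java.util.List;
import java.util.function.Function;

// Sketch of a composed address service; all types and backends are hypothetical.
final class CustomerAddressService {

    // Two backend representations of "address" that were never harmonized.
    record BillingAddress(String street, String houseNumber, String zip, String city, String countryCode) {}
    record CrmAddress(List<String> lines, String postalCode, String town, String country) {}

    // The composed service's own, consumer-facing representation.
    record CustomerAddress(String source, String streetLine, String postalCode, String city, String country) {}

    private final Function<String, BillingAddress> billingBackend;
    private final Function<String, CrmAddress> crmBackend;

    CustomerAddressService(Function<String, BillingAddress> billingBackend,
                           Function<String, CrmAddress> crmBackend) {
        this.billingBackend = billingBackend;
        this.crmBackend = crmBackend;
    }

    // Consumers query one service and never see the backend types.
    List<CustomerAddress> addressesOf(String customerId) {
        return List.of(fromBilling(billingBackend.apply(customerId)),
                       fromCrm(crmBackend.apply(customerId)));
    }

    // Per-backend mapping rules: a backend can evolve its own type as long as
    // the matching method here is updated.
    static CustomerAddress fromBilling(BillingAddress b) {
        return new CustomerAddress("billing", b.street() + " " + b.houseNumber(),
                b.zip(), b.city(), b.countryCode());
    }

    static CustomerAddress fromCrm(CrmAddress c) {
        return new CustomerAddress("crm", String.join(", ", c.lines()),
                c.postalCode(), c.town(), c.country());
    }

    public static void main(String[] args) {
        CustomerAddressService service = new CustomerAddressService(
            id -> new BillingAddress("Main Street", "12", "12345", "Springfield", "US"),
            id -> new CrmAddress(List.of("c/o ACME", "12 Main Street"), "12345", "Springfield", "US"));
        service.addressesOf("C-42").forEach(System.out::println);
    }
}
```

Modifying addresses would go through the same class in the opposite direction, so consumers stay insulated from changes in either backend.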


I would be very interested to know the various alternative ways in which the web service stack can be made to tolerate the extra elements. I see this as a transformation layer between the canonical and the special message at the service endpoint for that specific client.

Unfortunately, the current toolsets that I am aware of don't seem to support the kind of features needed to minimize interdependencies between the provider and the consumer. Current tools focus more on features that let developers stay in the comfort zone of their primary IDE, implementation platform and implementation language without having to mentally shift into the "service-orientation paradigm" - usually with detrimental results. If your messages use XML payloads, then you need to understand the differences between strict, flexible and loose (with XML Schema wildcards) schema versioning, and between incompatible and compatible changes (e.g. the addition of optional XML components), when it comes to interface versioning.

  • The designer of the provider needs to design all the request messages with a minimal set of mandatory XML components (elements and attributes). This minimizes the "Defined Set" that every consumer that uses a request has dependencies on.
  • Short of hand-coding all the client code, WSDL2Code code generators need to acquire features that let the client customize the exposed part of the service contract. You should be able to select the request and response messages in the service contract that are actually relevant to the client and ignore those that aren't. Furthermore you should be able to identify exactly which portions of the response payload and exactly which optional XML components in the request payloads the client is actually interested in. This should allow the code generator to create binding code that is more resilient to changes in non-relevant (from the client perspective) areas in the service contract.


  • Also note that when you use a typical type-binding framework (like JAXB) your client type mapping layer will (or should) look something like this:

    XML documents <-[Generated XML/Object Binding Code]-> generated artifact objects <-[Hand-coded type-mapping "glue code"]-> Service Logic Objects

    The "generated artifact objects" have to be considered "provider data types" as they will change as the XML document definitions change - therefore the "generated artifact objects" shouldn't be used in the service logic.

    You can sometimes eliminate one layer by using a binding framework that lets you configure the structure mapping completely, such as JiBX:

    XML documents <-[structure mapping binding definition]-> Service Logic Objects
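A sketch of the hand-coded glue layer from the first diagram, assuming a hypothetical ItemType that stands in for a JAXB-generated class (real generated code also carries binding annotations) and a client-owned CatalogEntry record. Only the two mapping methods touch the generated type, so the service logic never depends on it.

```java
import java.math.BigDecimal;

// Sketch of the glue between generated artifact objects and service logic
// objects; all names are hypothetical.
final class ItemGlue {

    // Stand-in for a class a binding tool would generate from the provider's XSD.
    // It changes whenever the provider's XML document definitions change.
    static final class ItemType {
        private String sku;
        private String description;
        private BigDecimal unitPrice;
        private String warehouseCode; // optional element this client never reads

        public String getSku() { return sku; }
        public void setSku(String value) { this.sku = value; }
        public String getDescription() { return description; }
        public void setDescription(String value) { this.description = value; }
        public BigDecimal getUnitPrice() { return unitPrice; }
        public void setUnitPrice(BigDecimal value) { this.unitPrice = value; }
        public String getWarehouseCode() { return warehouseCode; }
        public void setWarehouseCode(String value) { this.warehouseCode = value; }
    }

    // Service logic object: owned by the client and named in its own terms.
    record CatalogEntry(String sku, String label, BigDecimal price) {}

    // Reads only what the service logic needs, so changes elsewhere in ItemType
    // (such as warehouseCode) are absorbed here rather than in the service logic.
    static CatalogEntry toServiceLogic(ItemType generated) {
        return new CatalogEntry(generated.getSku(), generated.getDescription(), generated.getUnitPrice());
    }

    // The reverse direction, for building request payloads.
    static ItemType toArtifact(CatalogEntry entry) {
        ItemType generated = new ItemType();
        generated.setSku(entry.sku());
        generated.setDescription(entry.label());
        generated.setUnitPrice(entry.price());
        return generated;
    }

    public static void main(String[] args) {
        ItemType fromWire = new ItemType();
        fromWire.setSku("A-1");
        fromWire.setDescription("Widget");
        fromWire.setUnitPrice(new BigDecimal("9.99"));
        fromWire.setWarehouseCode("W9");
        System.out.println(toServiceLogic(fromWire)); // CatalogEntry[sku=A-1, label=Widget, price=9.99]
    }
}
```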
Marc Peabody
pie sneak
Sheriff

Joined: Feb 05, 2003
Posts: 4727

First off, thanks for all the great information. This is terrific.

I believe I picked up from both of you that adding optional elements to a type can be a nice way of using the same type across multiple endpoints that communicate with slightly different versions of essentially the same data type.

The only major negative consequence I can see with this approach is that potential service clients who discover a service endpoint see the types defined in the request or response and assume that all of these "optional" elements are relevant to the endpoint.

What about just forgetting harmonization altogether? I imagine it would be tremendously frustrating for a client to call two different services with similar data but in one service a tag is organization while in the other service it is organisation or org or company. It would be terrific to at least keep tag names in harmony.

These two xsd design strategies, when contrasted, present a major tradeoff between xsd reusability and (human) clarity.

It would be great to have the best of both worlds:
1) An enterprise-wide harmony type that defines all possible elements, most of which are optional. This helps to standardize naming.
2) A service-specific type that adheres to the naming of its harmony type but lists only the elements that the service supports. This helps to communicate to the client exactly what the service supports.

Is there any xsd trick that supports this?
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61432

"Bobby J", please check your private messages for an important administrative matter. Again.
Bob Jake
Greenhorn

Joined: Apr 15, 2002
Posts: 2
Originally posted by Marc Peabody:

The only major negative consequence I can see with this approach is that potential service clients who discover a service endpoint see the types defined in the request or response and assume that all of these "optional" elements are relevant to the endpoint.

What about just forgetting harmonization altogether? I imagine it would be tremendously frustrating for a client to call two different services with similar data but in one service a tag is organization while in the other service it is organisation or org or company. It would be terrific to at least keep tag names in harmony.

Forgetting harmonization would be antithetical to the SOA philosophy. Using multiple schemas, however, poses an interesting problem. While each service is a contract, it would be useful for the domain client to work with a single entity representation, either by using only the extended schema (org/organisation/company) rather than the base representation (as the case may be), or by transforming at the client layer if multiple external services provide different representations.
Peer Reynders
Bartender

Joined: Aug 19, 2005
Posts: 2922
Originally posted by Marc Peabody:
I believe I picked up from both of you that adding optional elements to a type can be a nice way of using the same type across multiple endpoints that communicate with slightly different versions of essentially the same data type.

That is, if you practice flexible or loose versioning. With strict versioning, any change is an incompatible change and requires a schema update that propagates to all the dependent providers and consumers. Also, your ability to benefit from a flexible versioning strategy may be limited by your web service stack: what does it do when it is suddenly faced with an XML component that it didn't expect? Will it ignore it? Will it fail the parse? Can this behavior be configured? It's worth running some tests.

The only major negative consequence I can see with this approach is that potential service clients who discover a service endpoint see the types defined in the request or response and assume that all of these "optional" elements are relevant to the endpoint.

The fact that they are optional should signal that they are only meaningful in a limited range of contexts. If they are always relevant, they need to be mandatory. That is simply something that the consumer developers have to wrap their heads around. Of course, the significance of the optional elements needs to be documented somewhere.

What about just forgetting harmonization altogether? I imagine it would be tremendously frustrating for a client to call two different services with similar data but in one service a tag is organization while in the other service it is organisation or org or company. It would be terrific to at least keep tag names in harmony.

SOA is about balance. The right balance will tend to be different in different situations. So "harmonize where you can, but don't rely on total harmonization" pretty much sums it up.

Standardization of vocabularies is considered a supporting practice in this case. However, even if you align the local names of your XML components, the fully qualified names will still be distinct due to the use of distinct namespaces.
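A tiny illustration of that last point using the JDK's javax.xml.namespace.QName, with two hypothetical service namespaces: the local parts match, but the qualified names, which is what XML processing actually compares, do not.

```java
import javax.xml.namespace.QName;

// Sketch: two hypothetical services agree on the local name "organization",
// but each qualifies it with its own namespace.
final class QualifiedNameDemo {
    static final QName SERVICE_A_ORG = new QName("urn:example:service-a", "organization");
    static final QName SERVICE_B_ORG = new QName("urn:example:service-b", "organization");

    public static void main(String[] args) {
        // Same local name...
        System.out.println(SERVICE_A_ORG.getLocalPart().equals(SERVICE_B_ORG.getLocalPart())); // true
        // ...but different qualified names, so a consumer still sees two distinct elements.
        System.out.println(SERVICE_A_ORG.equals(SERVICE_B_ORG)); // false
        System.out.println(SERVICE_A_ORG); // {urn:example:service-a}organization
        System.out.println(SERVICE_B_ORG); // {urn:example:service-b}organization
    }
}
```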


These two xsd design strategies, when contrasted, present a major tradeoff between xsd reusability and (human) clarity.

It would be great to have the best of both worlds:
1) An enterprise-wide harmony type that defines all possible elements, most of which are optional. This helps to standardize naming.
2) A service-specific type that adheres to the naming of its harmony type but lists only the elements that the service supports. This helps to communicate to the client exactly what the service supports.

Ultimately I think it is unrealistic to expect harmonization of too many types over the entire enterprise (1). Once you constrain harmonization to a distinct business domain, you will probably find that the proliferation of optional XML components is kept in check, so that (2) is no longer a major issue.

Originally posted by Bob Jake:
Forgetting harmonization would be antithetical to the SOA philosophy.

That is a bit strong. SOA recognizes that total harmonization is unrealistic - it establishes practices to deal with heterogeneous environments and non-harmonized data.

Using multiple schemas, however, poses an interesting problem.

As pointed out above, a typical solution is a composed (delegating proxy) service at the business domain boundary, which transforms the requests and responses from one set of schemas to the other set of schemas.
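A rough sketch of the request-side half of such a delegating proxy, assuming hypothetical namespaces urn:example:domain-a and urn:example:domain-b and a made-up vocabulary map (org and company in domain B become organization in domain A). It only renames elements, using the standard DOM Document.renameNode method; a real proxy would also map responses back the other way and deal with structural differences, not just names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: translate a domain B request into domain A's vocabulary at the boundary.
final class BoundaryTranslator {
    static final String DOMAIN_A_NS = "urn:example:domain-a";
    static final String DOMAIN_B_NS = "urn:example:domain-b";

    // Domain B local name -> domain A local name; unlisted names are kept as-is.
    static final Map<String, String> B_TO_A = Map.of(
        "org", "organization",
        "company", "organization");

    static Document translate(String domainBXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-aware renaming
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(domainBXml)));
            translateElement(doc, doc.getDocumentElement());
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse domain B message", e);
        }
    }

    private static void translateElement(Document doc, Element element) {
        Element current = element;
        if (DOMAIN_B_NS.equals(element.getNamespaceURI())) {
            String localName = B_TO_A.getOrDefault(element.getLocalName(), element.getLocalName());
            // renameNode may return a different node than it was given, so keep the result.
            current = (Element) doc.renameNode(element, DOMAIN_A_NS, localName);
        }
        dropNamespaceDeclarations(current, DOMAIN_B_NS);
        // Snapshot the child elements before recursing into them.
        List<Element> children = new ArrayList<>();
        for (Node child = current.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element childElement) {
                children.add(childElement);
            }
        }
        for (Element child : children) {
            translateElement(doc, child);
        }
    }

    // Removes xmlns declarations that still point at domain B's namespace.
    private static void dropNamespaceDeclarations(Element element, String namespace) {
        NamedNodeMap attributes = element.getAttributes();
        List<Attr> stale = new ArrayList<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attr = (Attr) attributes.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())
                    && namespace.equals(attr.getValue())) {
                stale.add(attr);
            }
        }
        for (Attr attr : stale) {
            element.removeAttributeNode(attr);
        }
    }

    public static void main(String[] args) {
        Document doc = translate("<Customer xmlns=\"urn:example:domain-b\"><company>ACME</company></Customer>");
        Element root = doc.getDocumentElement();
        System.out.println(root.getNamespaceURI() + " " + root.getLocalName()); // urn:example:domain-a Customer
        Element org = (Element) root.getFirstChild();
        System.out.println(org.getLocalName() + " = " + org.getTextContent()); // organization = ACME
    }
}
```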
     
     