Lack of data support in contemporary CS

Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
I wasn't sure which forum this question belongs to, so posted it here.
There is a problem that has been tormenting me for several years. What is the most important aspect of data? You think "type" - all those integer/short/float/sunk? Nope. Dimension is most important. You can compare int to short and even float, but you do not want to compare dollars to pounds or miles. Yet no programming language supports units and dimensions. And what can we expect from general-purpose programming languages, if even data-oriented languages like SQL and XML maintain ignorance on the problem?
You think that Map made up some contrived problem to entertain the public. Nothing even close. I worked on a database that kept information about scientific projects sponsored by the Ministry of Agriculture of Russia. They wanted to be able to compare similar projects to decide which one to finance. Data were submitted in amazing units like "bacteria per square millimeter" or something. To be able to compare data we had to parse units, decide which dimension they belong to, keep reduction coefficients within dimensions etc. It was one of the most labor-consuming parts of the project. This was when I started to wonder how our industry has been living without such a basic tool for decades.
Well, either by assuming that data are submitted in proper units, or by having each application maintain units on its own. The "assuming" strategy, in our era of globalisation, internationalization, distributed computing and especially the recent service-discovery trend, is myopic, to say the least. A "home-made" solution will work, but this is not an application-level task! There is nothing application-specific, there is nothing even language-specific. There really should be some kind of independent specification, like the ISO standards for date/time/country codes etc.
I hoped XML Schema would address the issue somehow, but nope. To my surprise, I haven't seen a lot of discussion; the only mention I found is:
"My motivation was to specify an attribute as a numeric value with units of measure attached to it for absolute clarity. (Ask NASA how important this might be. They lost a Mars probe because numbers had no units associated with it and one group assumed metric while the other group was specifying English.)"
http://lists.xml.org/archives/xml-dev/200104/msg00525.html
Maybe I do not understand something. How common is the problem? Have you ever had to deal with units?
Kyle Brown
author
Ranch Hand

Joined: Aug 10, 2001
Posts: 3892
    
Map,
Unit conversion was a huge part of some Smalltalk programs that I worked with in the early 90's. In fact, Kent Beck wrote some wonderful little patterns about it that made it into his "Smalltalk Best Practice Patterns" book...
This turns out to be one of the places where polymorphism was a big win for us -- once we built units into our objects and allowed them to do conversion on their own, it made our life much easier...
Kyle
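A minimal Java sketch of the idea described above, where unit objects do their own conversion and callers rely on polymorphism. It is not taken from Kent Beck's book or from the Smalltalk programs mentioned; the names `LengthUnit`, `Meter`, `Foot` and `Length` are hypothetical and chosen only for illustration.

```java
// Hypothetical types: each unit knows how to convert to and from a common base (meters).
interface LengthUnit {
    double toMeters(double value);
    double fromMeters(double meters);
}

final class Meter implements LengthUnit {
    public double toMeters(double value) { return value; }
    public double fromMeters(double meters) { return meters; }
}

final class Foot implements LengthUnit {
    private static final double METERS_PER_FOOT = 0.3048; // exact by definition
    public double toMeters(double value) { return value * METERS_PER_FOOT; }
    public double fromMeters(double meters) { return meters / METERS_PER_FOOT; }
}

// A length carries its unit and converts itself on request.
final class Length {
    private final double value;
    private final LengthUnit unit;

    Length(double value, LengthUnit unit) {
        this.value = value;
        this.unit = unit;
    }

    // Convert through the base unit; the caller never needs a conversion table.
    Length to(LengthUnit target) {
        return new Length(target.fromMeters(unit.toMeters(value)), target);
    }

    double value() { return value; }
}

final class PolymorphicUnitsDemo {
    public static void main(String[] args) {
        Length tenFeet = new Length(10.0, new Foot());
        System.out.println(tenFeet.to(new Meter()).value() + " m");
    }
}
```

Adding a new unit means adding one class; no existing code has to change.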


Kyle Brown, Author of Persistence in the Enterprise and Enterprise Java Programming with IBM Websphere, 2nd Edition
See my homepage at http://www.kyle-brown.com/ for other WebSphere information.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Where does this appear in Smalltalk Best Practice Patterns? I have a (largely unread) copy - I can't seem to find anything about unit conversions in it. Of course, this may be in part because I haven't actually learned Smalltalk. :roll: Which is a pain in the butt sometimes when I'm looking through GoF and the like. One of these days...
Map - Martin Fowler does show some relevant ideas in his Analysis Patterns. See sections 3.1-3.3. Enjoy...


"I'm not back." - Bill Harding, Twister
Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
Thank you, Kyle and Jim!
You apparently suggest "implement it yourself", when my concern is how not to implement it myself.
I wondered why Java's huge API lacks such an obvious service. Then I thought: what good would such an API do if incoming data don't provide units?
I am especially puzzled by lack of units consideration in Web-services model. Web services are supposed to be discovered and used by just anybody. To transmit data they use SOAP protocol which is built on top of XML. This is an example from SOAP specification:

<SOAP-ENV:Body>
   <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
   </m:GetLastTradePriceResponse>
</SOAP-ENV:Body>

Now what is it - "price 34.5" ?
Or this:
<element name="height" type="float"/>
<height>5.9</height>

Awesome. And this is intended to serve international audience! :roll:
Thinking about how to improve the Web-services model, Map came to the conclusion that it should be fixed at the XML datatypes level, not in SOAP. There are only seven fundamental dimensions in SI - length, time, mass, electric current and a few others. We already have a "time" datatype; all we need now is to add a few more. Then, each dimension has a "base unit"; for length, for example, it is "meter". So instead of
<element name="height" type="float"/>
<height>5.9</height>

we would have
<element name="height" type="length"/>
<height>5.9</height>

which would unambiguously mean "5.9 m"
Tell me where I am wrong.
[ February 23, 2002: Message edited by: Mapraputa Is ]
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
Originally posted by Mapraputa Is:

<element name="height" type="length"/>
<height>5.9</height>

which would unambiguously mean "5.9 m"
Tell me where I am wrong.
Why would that be unambiguous? I assumed that it was inches... or was it feet.
How about:
<element name="height" type="length" measurement="feet"/>
<height>5.9</height>
Now I don't have to worry about how it is actually stored in a database.
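A rough Java sketch of how a consumer might read such a value and normalize it. Two assumptions are made for brevity: the `measurement` attribute sits on the instance element itself rather than on the schema declaration, and the unit names in the table ("meter", "feet", "inch") are illustrative, not part of any standard. The class `HeightReader` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class HeightReader {
    // Conversion factors to meters; the unit names are assumptions for this sketch.
    private static final Map<String, Double> METERS_PER_UNIT =
            Map.of("meter", 1.0, "feet", 0.3048, "inch", 0.0254);

    // Parses e.g. <height measurement="feet">5.9</height> and returns meters.
    static double heightInMeters(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        String unit = root.getAttribute("measurement"); // empty string if absent
        Double factor = METERS_PER_UNIT.get(unit);
        if (factor == null) {
            throw new IllegalArgumentException("Unrecognized or missing unit: '" + unit + "'");
        }
        return Double.parseDouble(root.getTextContent().trim()) * factor;
    }

    public static void main(String[] args) {
        System.out.println(heightInMeters("<height measurement=\"feet\">5.9</height>"));
    }
}
```

A value with no unit is rejected rather than guessed at, which is the point of carrying the unit with the data.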


Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog
Frank Carver
Sheriff

Joined: Jan 07, 1999
Posts: 6920
Hmm. I much prefer Map's suggestion of standardizing on a single unit set. Allowing arbitrary unit-specifiers in the request has a mess of problems:
  • It's very vulnerable to typos
  • Every application would have to have access to a huge all-to-all translation table
  • what should happen if one of the ends does not understand the named unit type. Would you bother supporting pecks, cubits, angstroms, fahrenheit ?
  • multi-service processes would end up dominated by unit conversion on every input and output
  • rounding errors would rapidly accumulate (degrees to radians, anyone ?)
  • and many more


    On the other hand, a standardization strategy only works where the conversion factors are static and well understood. Using SI units for physical measurements is cool, but won't work for money, for example, where the exchange factors are dynamic.
    I tend to think that this is a subset of a wider cultural inference problem. Web service designers are heading for a rude awakening when they try to deploy to a global market.
    Consider the following textbook web service: I wish to rent a vehicle for a foreign visit. The car rental agency provides a web service which takes basic parameters:
    start location, duration of rental, vehicle type, credit card details
    That's fine, until you realize that vehicle type must be one of "subcompact", "compact", "medium", "wagon", or "minivan"
    But these are purely US cultural terms, and mean almost nothing even to visitors from the UK - and at least we speak (mostly) the same language.
    Is there any answer to this sort of problem ?


    Read about me at frankcarver.me ~ Raspberry Alpha Omega ~ Frank's Punchbarrel Blog
    Johannes de Jong
    tumbleweed
    Bartender

    Joined: Jan 27, 2001
    Posts: 5089
    Is there any answer to this sort of problem ?
    Well, I don't know if you've lost any luggage yet when flying. But the guys at the lost-luggage counters have a sheet with drawings of the different shapes that luggage comes in. Funnily enough, they all adhere to some basic shapes. They could do the same for the cars and other "objects".
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    I don't think standardizing on a single unit set is going to fly for all applications - even when conversion factors are well-established constants. Leaving aside the issue of the US saddling itself with an archaic unit system, the simple fact is that SI isn't always used throughout the rest of the world. The most blatant example of this is time. "Seconds" is just one of the units in common usage. And which of these would you rather see?
    <element name="warranty_period" type="time"/>
    <warranty_period>63113000</warranty_period>

    or
    <element name="warranty_period" type="time" unit="year"/>
    <warranty_period>2</warranty_period>

    I'm sure archaeologists and geologists would be really thrilled to be stuck with the former system. :roll: There are lots of applications where non-SI units are more natural. Kilometers, light-years, or angstroms for length. Electron-volts or kW-hours for energy. Kelvin rather than Celsius for temperature. And good luck trying to get the world at large to use proper units for "weight" - most will try to use kilograms, not knowing any better. And what of angles? While most of the world would assume degrees, radians and milliradians are more useful for a lot of things.
    (As an aside, why does SI use kilogram as the base unit for mass, rather than gram? Seems pretty silly...)
    Addressing some of Frank's specific concerns...
    Allowing arbitrary unit-specifiers in the request has a mess of problems:
    I certainly wouldn't allow completely arbitrary unit specifiers. I imagine a master list could be made up, and a DTD placed at a readily available URL.
    It's very vulnerable to typos
    Hence the usefulness of a DTD listing all allowed values. But what isn't vulnerable to typos?
    Every application would have to have access to a huge all-to-all translation table
    I don't think it need be particularly huge. Each unit just needs a type (e.g. length) and a nice double-precision conversion factor to the appropriate SI unit (meter). From there it's simple to convert to any other unit of the same type. It's true that the complete set of possible conversions might get to be a bit much for, say, a J2ME app on your cell phone. But you could make stripped-down versions of the table, and throw an "unrecognized unit" error if someone tried to input length in cubits.
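The table described above can be sketched in a few lines of Java. Everything here is an assumption made for illustration: the class name `UnitTable`, the unit and dimension names, and the choice of which units go into the "stripped-down" table. The conversion factors themselves are the standard defined values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one entry per unit, holding its dimension and its factor to the SI unit.
final class UnitTable {
    private static final class Entry {
        final String dimension;
        final double toSi;
        Entry(String dimension, double toSi) {
            this.dimension = dimension;
            this.toSi = toSi;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    UnitTable register(String unit, String dimension, double toSi) {
        entries.put(unit, new Entry(dimension, toSi));
        return this;
    }

    // Any-to-any conversion goes through the SI unit, so N units need N factors, not N*N.
    double convert(double value, String from, String to) {
        Entry source = lookup(from);
        Entry target = lookup(to);
        if (!source.dimension.equals(target.dimension)) {
            throw new IllegalArgumentException("cannot convert " + from + " to " + to);
        }
        return value * source.toSi / target.toSi;
    }

    private Entry lookup(String unit) {
        Entry entry = entries.get(unit);
        if (entry == null) {
            throw new IllegalArgumentException("unrecognized unit: " + unit);
        }
        return entry;
    }

    // A deliberately small table, as a constrained client might ship.
    static UnitTable small() {
        return new UnitTable()
                .register("meter", "length", 1.0)
                .register("foot", "length", 0.3048)
                .register("mile", "length", 1609.344)
                .register("angstrom", "length", 1e-10)
                .register("kilogram", "mass", 1.0)
                .register("pound", "mass", 0.45359237);
    }
}
```

One limitation of a factor-only table: it cannot express scales with an offset, such as Fahrenheit or Celsius temperatures, which would need an extra term per unit.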
    what should happen if one of the ends does not understand the named unit type. Would you bother supporting pecks, cubits, angstroms, fahrenheit ?
    I would certainly support the latter two. Heck, how difficult can it be to calculate Ångstroms? But for the most part, I'd expect all unit conversions to be handled by a readily-available package. Anyone claiming to support this "XML Unit Standard" (whatever trendy name it might acquire) would just include that package in their code.
    multi-service processes would end up dominated by unit conversion on every input and output
    rounding errors would rapidly accumulate (degrees to radians, anyone ?)

    Could be a risk. There could be a standard "recommended" set of units which services are encouraged to use internally for maximum efficiency. If two services both use meters internally, and pass values using meters, then there's no conversion overhead. Just a tiny bit to read the units declaration and see that it's in meters - no more math to be done. If a client insisted on using the non-recommended units, they could be converted once on initial input. Any number of subsequent services could process the data with minimal overhead. Then maybe one more conversion on output, if desired. My thinking is that any application that chokes on unit conversion probably wasn't doing anything interesting with the numbers anyway. And note that many applications wouldn't need to convert the units anyway, if they're just moving the data around without doing any calculations. Services could adopt a sort of "lazy instantiation" policy, only converting units if they found it necessary, and then usually converting to the preferred units for the benefit of any subsequent services in the chain.
    and many more
    Well, maybe. I think there are many more reasons against expecting this standardization.
    [ February 28, 2002: Message edited by: Jim Yingst ]
    Rob Ross
    Bartender

    Joined: Jan 07, 2002
    Posts: 2205
    FYI : someone else has been thinking about this very same thing:
    http://servlet.java.sun.com/javaone/sf2002/conf/bofs/display-2034.en.lite


    Rob
    SCJP 1.4
    Mapraputa Is
    Leverager of our synergies
    Sheriff

    Joined: Aug 26, 2000
    Posts: 10065
    Aha! What did I say! And you thought Map is a nut. Thank you, Rob
    I did not know "horsepower" has a strict definition, though.
    Originally posted by Elliotte Rusty Harold:
    "This strikes me as a very interesting idea in principal. Unit analysis was one of the simplest and most powerful techniques I learned in physics, and I've long felt that the lack of support for units has been a major flaw in virtually all programming languages from Fortran through C#."
    http://www.cafeaulait.org/2001march.html

    See?
    Originally posted by Elliotte Rusty Harold:
    "Nonetheless I wonder if Java is really the right language in which to do this. While a class library for unit handling would certainly be useful, I'd be a lot more excited to see a language built around this idea in which all primitive numeric variables had units (or were explicitly dimensionless). Such a language would probably need to eliminate the distinction between primitive and object types to allow for new units to be attached to the base types. Does any such language already exist?"
    http://www.cafeaulait.org/2001march.html

    Mm hm... There are two kinds of data from a program's point of view: internal and external. I am not sure general purpose programming language should provide its primitive datatypes with units. IMHO, these are external, domain-specific data that need units. So "such a language" should be SQL/XML type.
    [ February 28, 2002: Message edited by: Mapraputa Is ]
    Mapraputa Is
    Leverager of our synergies
    Sheriff

    Joined: Aug 26, 2000
    Posts: 10065
    Originally posted by Frank Carver:
    On the other hand, a standardization strategy only works where the conversion factors are static and well understood. Using SI units for physical measurements is cool, but won't work for money, for example, where the exchange factors are dynamic.

    That's true that standard units are only a part of the problem. I was puzzled why even this well-defined and easy to implement part is in no way addressed, in Web-services model in particular.
    I tend to think that this is a subset of a wider cultural inference problem. Web service designers are heading for a rude awakening when they try to deploy to a global market.
    Absolutely!
    And I am sure there will be problems we do not realize until we start "to think globally". For example, in many cultures personal names are listed in "FamilyName GivenName" order, the reverse of the US standard. Usually people switch to US practice when they write their names in English, and I was amused to see that if you really want to keep your national identity, you can sign your posts as:
    MURATA Makoto (FAMILY Given)
    http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-Jan-2000/0611.html

    Is there any answer to this sort of problem?
    I came to the same problem from another side. For myself I call it the "Open society vs Federation" problem. Web services represent the "open society" model, while Jini is built around the idea of "federations". There are opinions that the "open society" model simply won't work.
    My intuition tells me that Tim Berners-Lee spoke about similar issues:
    "I have discussed elsewhere how we must avoid the two opposite social deaths of a global monoculture and a set of isolated cults, and how the fractal patterns found in nature seem to present themselves as a good compromise. It seems that the compromise between stability and diversity is served by there being the same amount of structure at all scales."
    Tim Berners-Lee. The Fractal nature of the Web
    Thomas Paul
    mister krabs
    Ranch Hand

    Joined: May 05, 2000
    Posts: 13974
    The idea of a standard unit doesn't work, of course. Since my data is in lbs and ozs, I would have to convert it to metric to get it into XML so I would have rounding errors. And what happens if I forget to convert it? What happens is that my satellite goes crashing into Mars. Saying that we have standardized units is no different than what we have today.
    Mapraputa Is
    Leverager of our synergies
    Sheriff

    Joined: Aug 26, 2000
    Posts: 10065
    The idea of a standard unit doesn't work, of course. Since my data is in lbs and ozs, I would have to convert it to metric to get it into XML so I would have rounding errors.
    I have a thermometer that shows temperature in both F and C, which effectively disproves your thesis about impossibility of unit conversion.
    This thermometer is much better than the formula some local scientists invented:
    C = (F - 30) / 2

    My point is: there is no point in achieving absolute precision because all measurements are imprecise by definition. There are always metering errors. Then, if you represent your even unconverted data as float numbers, they will be imprecise anyway...
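For comparison, a short Java sketch that puts the rule of thumb quoted above next to the exact conversion, C = (F - 32) × 5/9. The class and method names are hypothetical. The two formulas agree at exactly one point, 50 °F (10 °C), and drift apart on either side of it.

```java
final class TemperatureFormulas {
    // Exact conversion from degrees Fahrenheit to degrees Celsius.
    static double exactCelsius(double fahrenheit) {
        return (fahrenheit - 32.0) * 5.0 / 9.0;
    }

    // The rule of thumb from the post: subtract 30, then halve.
    static double roughCelsius(double fahrenheit) {
        return (fahrenheit - 30.0) / 2.0;
    }

    public static void main(String[] args) {
        for (double f : new double[] {0.0, 50.0, 100.0, 212.0}) {
            System.out.println(f + " F: exact " + exactCelsius(f) + " C, rough " + roughCelsius(f) + " C");
        }
    }
}
```

At the boiling point of water (212 °F) the rule of thumb gives 91 °C instead of 100 °C, so it is only reasonable near everyday outdoor temperatures.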
    And what happens if I forget to convert it? What happens is that my satellite goes crashing into Mars.
    Well, this argument of yours undermines the whole computer industry, not only XML with proposed units added. :roll: Why use Java if a programmer can still make a mistake in an algorithm?
    Saying that we have standardized units is no different than what we have today.
    I am saying that
    <element name="height" type="length" measurement="feet"/>
    <height>5.9</height>
    as you proposed, is much better than
    <element name="height" type="float"/>
    <height>5.9</height>

    Whether all units should be in SI is another question, that must be properly and thoroughly discussed.
    [ March 04, 2002: Message edited by: Mapraputa Is ]
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    Eh? Map, in your post on Feb 23, you were proposing
    <element name="height" type="length"/>
    <height>5.9</height>

    to "unambiguously" indicate 5.9 meters. (Or is that metres? ) Thomas and I disagreed with that idea. Are you saying you've changed your mind? Or are you trying to say you've always thought this way?
    Thomas Paul
    mister krabs
    Ranch Hand

    Joined: May 05, 2000
    Posts: 13974
    Originally posted by Mapraputa Is:
    I have a thermometer that shows temperature in both F and C, which effectively disproves your thesis about impossibility of unit conversion.
    My point is: there is no point in achieving absolute precision because all measurements are imprecise by definition. There are always metering errors. Then, if you represent your even unconverted data as float numbers, they will be imprecise anyway...
    Well, this argument of yours undermines the whole computer industry, not only XML with proposed units added. :roll: Why use Java if a programmer can still make a mistake in an algorithm?
    I am saying that
    <element name="height" type="length" measurement="feet"/>
    <height>5.9</height>
    as you proposed, is much better than
    <element name="height" type="float"/>
    <height>5.9</height>

    Whether all units should be in SI is another question, that must be properly and thoroughly discussed.

    I agree with everything you say. I was trying to show that Frank's post was incorrect. The problem is that unless the unit is specified, there will always be ambiguity as to which unit is being used. By specifying the unit we allow a DTD to validate that the unit we have chosen is valid, and we can use XSLT to convert it into whatever unit we desire.
    Mapraputa Is
    Leverager of our synergies
    Sheriff

    Joined: Aug 26, 2000
    Posts: 10065
    Sorry, my fault. I mixed several problems in my posts. I'll try to list them separately:
    1) Why is there no support for units in general-purpose programming languages? Is it a good idea to add dimension support to them?
    2) Why is there (after Rob's post I will say was) no support for units/unit conversion in the Java API?
    3) Why is there no notion of dimensions in the XML data model?
    4) Why does the Web services specification give an example of data interchange without noticing that the absence of units contradicts the very idea of Web services, which are supposed to be accessible to a world audience?
    5) If we agreed to introduce units in XML specifications, is it a good idea to require one set of them, say, SI?
    Are you saying you've changed your mind? Or are you trying to say you've always thought this way?
    Neither of your insinuations is true. I am only trying to make up my mind, and I am ready for any changes or adjustments in my vision.
    But I understand your frustration, Jim. Sorry, I did not mean to ignore you. I was going to respond to your post after I responded to Frank's, but it was late and I needed some sleep. The next few days I had a fight in MD, so I was distracted.
    I don't think standardizing on a single unit set is going to fly for all applications - even when conversion factors are well-established constants. Leaving aside the issue of the US saddling itself with an archaic unit system, the simple fact is that SI isn't always used throughout the rest of the world.
    I also do not like this idea of imposing one single unit set on all applications. However, thinking about Web-services, it's hard to expect users to be able to convert whatever unit you use into whatever units they use. If we required Web-services providers to use, say, SI, then they would have to support "local system - SI" conversion table, and the Web-service user would have to support "SI - another local system" conversion. SI here would serve the same function XML serves for other data formats - it is "interlingua".
    Hence the usefulness of a DTD listing all allowed values. But what isn't vulnerable to typos?
    But you already said - a DTD (or another validation tool) for all allowed values!
    I don't think it need be particularly huge. Each unit just needs a type (e.g. length) and a nice double-precision conversion factor to the appropriate SI unit (meter).
    This requires one global table for all possible world units. Hm... Maybe if your global conversion table isn't too big, it's not such an overhead... I guess the "length" and "mass" dimensions contribute a great part of unit diversity - for historical reasons. Other dimensions, an "electric charge unit set", for example, shouldn't be so huge.
    Anyone claiming to support this "XML Unit Standard" (whatever trendy name it might acquire) would just include that package in their code.
    So a query "give me all elements where XXX attribute > 10 gram" would yield the correct answer in whatever units XXX's attribute value was in the original document.
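A small Java sketch of such a unit-aware query, under stated assumptions: the `MassQuery` and `Sample` types are hypothetical, the unit names are illustrative, and the data is held in memory rather than in an XML document. Each value is normalized to grams only for the comparison; the stored value and unit are left untouched.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class MassQuery {
    // Factors to grams; "ounce" and "pound" are the avoirdupois units.
    private static final Map<String, Double> GRAMS_PER_UNIT = Map.of(
            "gram", 1.0,
            "kilogram", 1000.0,
            "ounce", 28.349523125,
            "pound", 453.59237);

    // A named measurement that keeps the unit it was submitted in.
    static final class Sample {
        final String name;
        final double value;
        final String unit;

        Sample(String name, double value, String unit) {
            this.name = name;
            this.value = value;
            this.unit = unit;
        }

        double grams() {
            Double factor = GRAMS_PER_UNIT.get(unit);
            if (factor == null) {
                throw new IllegalArgumentException("unrecognized unit: " + unit);
            }
            return value * factor;
        }
    }

    // Returns the names of all samples heavier than the threshold, whatever unit each one uses.
    static List<String> heavierThan(List<Sample> samples, double thresholdGrams) {
        return samples.stream()
                .filter(sample -> sample.grams() > thresholdGrams)
                .map(sample -> sample.name)
                .collect(Collectors.toList());
    }
}
```

A comparison on the raw numbers would get this wrong: 0.5 ounce is more than 10 grams, while 5 grams is not.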
    There could be a standard "recommended" set of units which services are encouraged to use internally for maximum efficiency.
    ... and for data conciseness. "Base" unit can be omitted.
    I think there are many more reasons against expecting this standardization
    Actually, "recommended" status for SI would be probably sufficient.
    indicate 5.9 meters. (Or is that metres?)
    And what is the difference?
    Thomas Paul
    mister krabs
    Ranch Hand

    Joined: May 05, 2000
    Posts: 13974
    Originally posted by Mapraputa Is:
    2) Why is there (after Rob's post I will say was) no support for units/unit conversion in the Java API?
    This is an interesting question. Java has done such amazing things with internationalization that it is surprising that nothing has been done in this area. We do date transformations from one locale to another, so why not unit transformations from one locale to another? Then we could ask our database how much Map weighs and Map would see the answer in kilograms, Angela would see the answer in stones, and I wouldn't see the answer because women never tell men their weight.
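A hedged Java sketch of what locale-driven unit display could look like. `java.util.Locale` is real, but the mapping from country to preferred unit is an assumption invented for this example (it is not part of the Java API), and `LocalizedWeight` is a hypothetical class.

```java
import java.util.Locale;

final class LocalizedWeight {
    private static final double KG_PER_POUND = 0.45359237; // exact by definition
    private static final double KG_PER_STONE = 6.35029318; // 14 pounds

    // The country-to-unit choice below is illustrative, not a standard mapping.
    static String format(double kilograms, Locale locale) {
        String country = locale.getCountry();
        if ("US".equals(country)) {
            return String.format(Locale.ROOT, "%.1f lb", kilograms / KG_PER_POUND);
        }
        if ("GB".equals(country)) {
            return String.format(Locale.ROOT, "%.1f st", kilograms / KG_PER_STONE);
        }
        return String.format(Locale.ROOT, "%.1f kg", kilograms);
    }

    public static void main(String[] args) {
        double stored = 63.5029318; // stored once, in kilograms
        System.out.println(format(stored, Locale.US));
        System.out.println(format(stored, Locale.UK));
        System.out.println(format(stored, Locale.FRANCE));
    }
}
```

The value is stored once in a single unit and converted only at display time, the same pattern Java already uses for dates.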
    Frank Carver
    Sheriff

    Joined: Jan 07, 1999
    Posts: 6920
    Map wrote: There are only seven fundamental dimensions in SI - length, time, mass, electric current and a few others. We already have a "time" datatype; all we need now is to add a few more. Then, each dimension has a "base unit"; for length, for example, it is "meter".
    ...
    I guess "length" and "mass" dimensions contribute great part of unit diversity - for historical reasons.

    This worries me too. It's all well and good to speak of seven basic units in the SI system, but other systems of measurement are not so lean and self-consistent. In particular, other systems have completely different scales for (say) area and volume measurement and don't use simple groupings of the basic linear measure.
    A UK "pint" is not the same as a US "pint" (and thus, a UK "gallon" is not the same as a US "gallon"), and none of them bear any obvious relationship to a foot, inch, yard, mile or meter (metre). In US measurement, "ounce" (or "oz") is both a mass/weight measure and a volume measure, but in UK measurement the volume measure is always "fluid ounce" ("floz"). And that's without counting the "troy ounce" which is a completely different mass/weight unit. Many traditions, because of different measurement technology, have quite different systems for dry volume and wet volume :- in the UK system, dry volumes include "peck", "bushel" etc., wet volumes include "pint", "fluid ounce", "gallon", "hogshead", "tun" etc.
    Note, also, that "tun" is a volume unit, but "ton" and "tonne" are (different) mass/weight units, even though all are pronounced the same (there is also a "displacement ton", which is actually a volume unit, expressed as an equivalent mass/weight of water). Can you honestly say that a DTD which includes all these will still be proof against typos and transcription errors? And before you protest, all four of the above units are still in colloquial and commercial use here in the UK.
    Mapraputa Is
    Leverager of our synergies
    Sheriff

    Joined: Aug 26, 2000
    Posts: 10065
    Originally posted by Frank Carver:
    It's all well and good to speak of seven basic units in the SI system, but other systems of measurement are not so lean and self-consistent.

    Actually, SI isn't a model of leanness (is there such a word?) and self-consistency either. Theoretically speaking, four base dimensions are enough (namely length, time, mass and electric charge); seven were introduced for "convenience", although I don't understand this logic. If we start to define "base" dimensions arbitrarily, then why not 8, 11 or 42? :roll:
    In particular, other systems have completely different scales for (say) area and volume measurement and don't use simple groupings of the basic linear measure.
    My idea was about grouping dimensions, not units. These can be grouped and expressed via base ones, regardless of the units they are represented in. My former co-worker published an article about a "structural formula", where all dimensions are presented and define a measurable quantity, like a molecular formula defines a substance. It looks very similar to the type systems used in programming languages. If we know the "structural formula", we know which units can be used with the data, and which cannot.
    What do we need to build this formula? Four dimensions and one operation: multiplication, with division added for convenience (addition doesn't yield a new dimension). XML Schema can accommodate it. However, for the aforementioned convenience we may want more predefined dimensions. MathCad, for example, lists 32: pressure, temperature, velocity etc. Then users can construct their own; "bacteria per square millimeter", for example, would be quantity/(length*length).
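The "structural formula" can be sketched in Java as a vector of integer exponents over the base dimensions, with multiplication adding exponents and division subtracting them. This is an illustration under assumptions, not the formulation from the article mentioned above: the class `Dim` is hypothetical, and `COUNT` is added as a fifth base entry so that "quantity" (a count of bacteria) has somewhere to live.

```java
import java.util.Arrays;

// Hypothetical sketch: a dimension is a vector of exponents over the base dimensions.
final class Dim {
    private static final int BASES = 5;

    static final Dim LENGTH = base(0);
    static final Dim TIME = base(1);
    static final Dim MASS = base(2);
    static final Dim CHARGE = base(3);
    static final Dim COUNT = base(4); // assumption: counts treated as their own base dimension

    private final int[] exponents;

    private Dim(int[] exponents) {
        this.exponents = exponents;
    }

    private static Dim base(int index) {
        int[] e = new int[BASES];
        e[index] = 1;
        return new Dim(e);
    }

    // Multiplying quantities adds the exponents of their dimensions.
    Dim times(Dim other) {
        int[] e = new int[BASES];
        for (int i = 0; i < BASES; i++) {
            e[i] = exponents[i] + other.exponents[i];
        }
        return new Dim(e);
    }

    // Dividing quantities subtracts the exponents.
    Dim per(Dim other) {
        int[] e = new int[BASES];
        for (int i = 0; i < BASES; i++) {
            e[i] = exponents[i] - other.exponents[i];
        }
        return new Dim(e);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Dim && Arrays.equals(exponents, ((Dim) o).exponents);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(exponents);
    }
}
```

With this, "bacteria per square millimeter" is `Dim.COUNT.per(Dim.LENGTH.times(Dim.LENGTH))`, and two values are comparable only when their `Dim` objects are equal, whatever units they were written in.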
    Note, also, that "tun" is a volume unit, but "ton" and "tonne" are (different) mass/weight units, even though all are pronounced the same (there is also a "displacement ton", which is actually a volume unit, expressed as an equivalent mass/weight of water). Can you honestly say that a DTD which includes all these will still be proof against typos and transcription errors? And before you protest, all four of the above units are still in colloquial and commercial use here in the UK.
    If mistyped units belong to the same dimension, nothing will help. If not, Schema could help. If our data is of type "volume", then "tun" is correct and "ton" is not. (In your example "ton" can be "volume" also, but I believe it's a relatively rare case).
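The check described above can be sketched in Java as a lookup of allowed units per dimension. The class `UnitValidator` and the short unit lists are assumptions for illustration; a real schema would carry a much longer list.

```java
import java.util.Map;
import java.util.Set;

final class UnitValidator {
    // Illustrative subset of units, grouped by the dimension they measure.
    private static final Map<String, Set<String>> UNITS_BY_DIMENSION = Map.of(
            "volume", Set.of("pint", "gallon", "hogshead", "tun"),
            "mass", Set.of("ounce", "pound", "ton", "tonne"));

    // True only if the unit is known and belongs to the declared dimension.
    static boolean isValid(String dimension, String unit) {
        Set<String> units = UNITS_BY_DIMENSION.get(dimension);
        return units != null && units.contains(unit);
    }

    public static void main(String[] args) {
        System.out.println(isValid("volume", "tun")); // a volume unit in a volume field
        System.out.println(isValid("volume", "ton")); // a mass unit in a volume field
    }
}
```

As the post says, this catches "ton" typed into a volume field, but it cannot catch "ton" typed where "tonne" was meant, because both are valid mass units.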
    Mapraputa Is
    Leverager of our synergies
    Sheriff

    Joined: Aug 26, 2000
    Posts: 10065
    Here it is!
    I knew there are other deep thinkers in this world, outside of JavaRanch!
    What XML Schema Designers Need to Know About Measurement Units
     
     