
URLyBird data file format

 
Greenhorn
Posts: 27
Hello there,
I have a real problem. In the definition of the format of the data file in the URLyBird notes it says this:

All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.

The poor English in the first sentence is not a typo on my part; it's an exact quote. I assume what they mean is:

All numeric values that are stored in the header information use the formats of the DataInputStream and DataOutputStream classes.

Or even:

All numeric values stored in the header information use the formats of the DataInputStream and DataOutputStream classes.

Either sentence would make sense. Although it's not a big deal, it makes me doubt the accuracy of what they say elsewhere in the paragraph (indeed, the whole document). Especially where it says:

The character encoding is 8 bit US ASCII.

I was going to use the constructor for String which takes a byte array and a charset name to convert an array of bytes into the correct character string. However, the documentation for the constructor referred me to the Charset class for a list of allowed charsets. The only US ASCII one was:

US-ASCII - Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set.

That's seven-bit US-ASCII, not EIGHT-bit US-ASCII. I assume if I use this charset to decode the bytes I'm going to end up with the wrong characters. Am I missing something here?
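
For anyone wanting to check this rather than assume, here is a minimal sketch comparing the two charsets on the same bytes. The class name, method names and sample bytes are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`; on older JDKs the `String(byte[], String)` constructor with the charset name would be the equivalent. The two decoders should agree on every byte below 0x80 and differ only when the high bit is set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the assignment: decodes the same bytes
// with the 7-bit and the 8-bit charset so the results can be compared.
final class CharsetComparison {

    // Decodes with US-ASCII; bytes above 0x7F are not valid in this charset.
    static String decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Decodes with ISO-8859-1; every byte maps to the code point of the same value.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] plain = {'P', 'a', 'l', 'a', 'c', 'e'};      // all bytes below 0x80
        byte[] highBit = {'C', 'a', 'f', (byte) 0xE9};      // last byte has the high bit set

        // Identical results when no byte uses the eighth bit.
        System.out.println(decodeAscii(plain).equals(decodeLatin1(plain)));  // true

        // US-ASCII substitutes the replacement character U+FFFD for 0xE9,
        // while ISO-8859-1 decodes it as U+00E9.
        System.out.println(decodeAscii(highBit).equals("Caf\uFFFD"));        // true
        System.out.println(decodeLatin1(highBit).equals("Caf\u00E9"));       // true
    }
}
```

So if the data file contains only characters in the 0 to 127 range, either charset should give the same strings; the choice matters only if a field ever holds a byte with the high bit set.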
 
Greenhorn
Posts: 13
Hi Jon,

Gosh! This puzzled me too, and I'm not a native English speaker.
I used the encoding "US-ASCII" and the results were fine. Just test it.


Regards,

Franky.
 
Ranch Hand
Posts: 783

Originally posted by Jon Poulton:
The character encoding is 8 bit US ASCII.

Hi Jon,
This has come up several times in the past. Some think it is a typo, others think Sun put it in to see if we would catch it (personally I think it is a typo). A little while ago (six months?) someone from this forum emailed Sun to inquire about it, and Sun replied saying that we could use ISO-8859-1. That is what I used in my assignment and I received max points in the Data Store area.

Do a search on "character encoding" or "8 bit US ASCII" and you will see several threads discussing this problem. Whatever you decide, remember to document this issue in your choices.txt!
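
As a rough sketch of the ISO-8859-1 approach, the snippet below reads one numeric header value with `DataInputStream` and then one fixed-length text field, stopping at the first null byte as the quoted format description says. The two-byte field length, the single-field layout and all class and method names here are assumptions for illustration only; the real layout comes from your own assignment instructions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical reader for a made-up layout: a 2-byte field length
// followed by one text field of that length.
final class FieldReader {

    // Builds sample bytes in memory: length 8, then "Palace" padded with two nulls.
    static byte[] sampleData() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeShort(8);
            out.write("Palace".getBytes(StandardCharsets.ISO_8859_1));
            out.write(new byte[2]); // null padding up to the field length
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the length from the header, then decodes the field up to the first null byte.
    static String readFirstField(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int fieldLength = in.readUnsignedShort();
            byte[] raw = new byte[fieldLength];
            in.readFully(raw);

            int end = 0;
            while (end < raw.length && raw[end] != 0) {
                end++;
            }
            return new String(raw, 0, end, StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirstField(sampleData())); // prints "Palace"
    }
}
```

Decoding only the bytes before the terminator keeps the null padding out of the resulting string, so no trimming is needed afterwards.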
 