
NX: US-ASCII confusion

Jacques Bosch
Ranch Hand

Joined: Dec 18, 2003
Posts: 319
Hi Guys.
I have never worked much with encoding, so please excuse...
My instructions have the following:

All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.

And the java doc has this:

US-ASCII Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set

My question is:
Is the *8 bit US ASCII* in the instructions the same thing as *US-ASCII Seven-bit*?
I.e. Will this be valid as *8 bit US ASCII*?


Jacques
MCP, SCJP, SCJD, SCWCD
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Jacques,
My question is:
Is the *8 bit US ASCII* in the instructions the same thing as *US-ASCII Seven-bit*?
I.e. Will this be valid as *8 bit US ASCII*?

Yes, you're correct. "7 bit US-ASCII" characters are stored in 8 bits anyway, with the high bit unused, hence the confusion.
Regards,
Phil.
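To make the 7-bit versus 8-bit point concrete, here is a minimal sketch. The class and method names are made up for illustration, and it assumes Java 7 or later for `StandardCharsets` (on older JVMs the charset name "US-ASCII" would be passed as a String instead). It shows that each ASCII character occupies one byte with the high bit clear, and that a byte with the high bit set is not valid US-ASCII: the decoder replaces it with U+FFFD.

```java
import java.nio.charset.StandardCharsets;

class AsciiWidthDemo {
    /** Decodes raw field bytes as US-ASCII. */
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    /** True if every byte has its high bit clear, i.e. is a 7-bit ASCII code. */
    static boolean isSevenBit(byte[] raw) {
        for (byte b : raw) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = {0x48, 0x69};           // "Hi"
        byte[] highBit = {0x48, (byte) 0xE9};  // second byte is outside 7-bit ASCII
        System.out.println(decode(plain));
        System.out.println(isSevenBit(plain));
        System.out.println(isSevenBit(highBit));
        // The decoder substitutes U+FFFD for bytes it cannot map.
        System.out.println(decode(highBit).charAt(1) == '\uFFFD');
    }
}
```

So a data file that really is "8 bit US ASCII" in the sense of the instructions should only ever contain byte values 0 to 127.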
Jacques Bosch
Ranch Hand

Joined: Dec 18, 2003
Posts: 319
Phil. Thanx again. You should get paid for this.
Another one:
Is the line *if (bytes[i] == (byte) 0) {* in the code below the correct way to find a null terminator that might exist?
Do these two methods look like a good way to read a field?
Xie Ruchang
Ranch Hand

Joined: Dec 25, 2003
Posts: 160

Don't forget the reverse from String to bytes.
Jacques Bosch
Ranch Hand

Joined: Dec 18, 2003
Posts: 319
No, I won't forget. Thanx.
Any comments on my two methods?
Philippe Maquet
Bartender

Joined: Jun 02, 2003
Posts: 1872
Hi Jacques,
Phil. Thanx again. You should get paid for this.

Thank you, Jacques, but *I am paid* for it: replying to other people's questions is the *best* way to learn more.
Do these two methods look like a good way to read a field?

Both look perfect! You will test them anyway, right?
Another technique could be to build a first String at full length, and then use indexOf() to "shorten" it (meaning building a second String, BTW). But yours should be more efficient. Nice shot!
Now I cannot read a piece of code without commenting on it, though (sorry for that):
readTextField():
  • What about a single "return new String(bytes, 0, findNullTerminator(bytes), "US-ASCII");" (2 lines saved)?
  • "US-ASCII" could be replaced by some CHARSET_NAME String constant.

findNullTerminator(): perfect IMO!
Regards,
Phil.
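Jacques's two methods did not survive in this copy of the thread, so the following is only a sketch of what the discussion describes, not his actual code. The method names `readTextField` and `findNullTerminator` come from Phil's reply; the class name, the third method, and the use of `StandardCharsets.US_ASCII` (Java 7+, which avoids the checked `UnsupportedEncodingException` of the String-name overload) are assumptions.

```java
import java.nio.charset.StandardCharsets;

class FieldReader {
    /** Returns the index of the first 0 byte, or bytes.length if there is none. */
    static int findNullTerminator(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == (byte) 0) {
                return i;
            }
        }
        return bytes.length;
    }

    /** Decodes the field up to (not including) the null terminator. */
    static String readTextField(byte[] bytes) {
        return new String(bytes, 0, findNullTerminator(bytes), StandardCharsets.US_ASCII);
    }

    /** The alternative Phil mentions: decode everything, then cut at the first NUL. */
    static String readTextFieldViaIndexOf(byte[] bytes) {
        String full = new String(bytes, StandardCharsets.US_ASCII);
        int end = full.indexOf('\0');
        return end < 0 ? full : full.substring(0, end);
    }

    public static void main(String[] args) {
        byte[] field = {'a', 'b', 0, 'x'};
        System.out.println(readTextField(field));
        System.out.println(readTextFieldViaIndexOf(field));
    }
}
```

Both variants return the same text. The first scans the bytes once and builds one String, while the second builds the full-length String before shortening it.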
    Xie Ruchang
    Ranch Hand

    Joined: Dec 25, 2003
    Posts: 160
    Hi,
    I examined the file given and I noticed that there are no null-terminated fields. They are all padded with trailing spaces. That makes programming easier, as we just need to use the trim() method. When I write the fields back, I pad them with trailing spaces too.
    What do both of you think of this approach, which follows the file supplied by Sun?
    Best Regards
    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319
    Phil.
    Yes, you are right. But I elected to leave it like this just for now because I was still contemplating playing with the string (trimming it). But I don't think I'm going to trim it.
    I'll use a constant.
    And please do comment away. That's how I learn.
    J
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Hi Frankie,
    What should you "believe"? The test file received from Sun or Sun's instructions? I remember a detailed discussion on this a long time ago, but I must go now so I cannot perform the search right now.
    Best,
    Phil.
    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319
    I think I've read what you are referring to.
    I think your opinion was not to trim.
    But trimming feels more natural.
    However, I think I'll not trim and say why in my docs.
    Jay Bromley
    Ranch Hand

    Joined: Aug 09, 2003
    Posts: 48
    Frankie,
    I noticed the same thing as you: my instructions say the fields are null-terminated strings, but when I actually looked at the database file there was not a single null terminated string.
    My take is that I've got to stick with the format of the database file (space-padded fields) because supposedly this database file is used by other applications (that's why the format must be maintained), and so it would be better to ignore the instructions and not change the format of the data file. If I did change everything to null terminated strings, this might risk breaking other apps that relied on the current file format.
    I documented this in my choices.txt, does it sound reasonable?
    Thanks and regards,
    Jay Bromley
    Xie Ruchang
    Ranch Hand

    Joined: Dec 25, 2003
    Posts: 160
    Hi Jay,
    I agree with you, but I go one step further after having been through this thread. The specification says

    "null terminated if less than the maximum length for the field."

    The database supplied may not be the only database used. Thus it may not have every case. The supplied database does not deviate from the specification because it chooses to pad with trailing spaces to the maximum length for the field.
    According to the spec, there is a possibility that one day a field with a null appears. Will our program be able to handle that?
    My current stand is to handle the null if it appears too.
    Best Regards
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    I think you're right, Frankie. The file given does not deviate from the specs, it's just a special case of them. So handling the null-termination and not trimming looks reasonable.
    Regards,
    Phil.
    Jay Bromley
    Ranch Hand

    Joined: Aug 09, 2003
    Posts: 48
    Hello,
    Frankie - good point, the strings in the database file we've received are all the maximum field length and hence no null-termination. I do have one question though: If a field is null terminated, what fills the remaining bytes in the field? I think that a well-designed app would in this case fill the fields with zeroes, though this certainly is not required and maybe can't even be expected of the other apps.
    Philippe - I'm not so sure about not trimming for a couple of reasons. First, I like to trim things so I'm not sending around a bunch of useless bytes. Second, assuming fields are null-terminated and zero-padded, String.trim should work the same as for space-padding (characters with codes less than or equal to 0x20 are removed), so no special handling has to go on.
    The only argument for not trimming that I can see is the case where junk is allowed after a null termination, such as might happen when short information overwrites longer information. In this case a null could be in the "middle" of a string and so trim would not work since it might find "valid" characters after the null. Was this your reasoning, Philippe?
    Thanks to both of you for pointing this out, it was an area I had glossed over.
    Regards,
    jb
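The behaviour Jay describes can be checked with a small sketch. The class and method names are hypothetical. It relies only on the documented behaviour of `String.trim()`, which strips leading and trailing characters with codes up to 0x20 (so both spaces and NULs), but leaves anything in the middle of the string alone.

```java
class TrimDemo {
    /** Plain trim: enough for space-padded or zero-padded fields. */
    static String trimField(String raw) {
        return raw.trim();
    }

    /** Cuts at the first NUL before trimming, so junk after the terminator is dropped. */
    static String cutAtNullThenTrim(String raw) {
        int end = raw.indexOf('\0');
        return (end < 0 ? raw : raw.substring(0, end)).trim();
    }

    public static void main(String[] args) {
        System.out.println(trimField("abc   ").length());           // space-padded
        System.out.println(trimField("abc\0\0\0").length());        // zero-padded
        System.out.println(trimField("abc\0xyz").length());         // junk after the NUL survives
        System.out.println(cutAtNullThenTrim("abc\0xyz").length()); // junk removed
    }
}
```

In other words, trim() alone covers the space-padded and zero-padded cases, but a field with leftover bytes after the terminator needs the explicit cut at the first NUL.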
    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319
    Hi guys.
    I have decided to trim after all. It's just common sense.
    Below is what I wrote in my design choices concerning this. (So please don't copy it directly.)

    Since it is a fair assumption that users of the software will only want to see and work with the value of the fields without their padding spaces, I have decided to trim all white space from the values when they are read from the database file. I.e. if a field in the database has the value "123 ", only the value "123" will be returned.
    To keep the data format consistent, I have resolved that when field values are written back to the database, they will be padded with spaces up to the length of the field.
    Since future formats of the database file might contain null terminated values in the text fields, I have included support for reading values of this format. The value up to the null terminator will be read, and returned after trimming the white space. However, when these values are written back to the database they will be written according to the de facto standard explained above (not with the null terminator, but padded with spaces).

    Regards.
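The write side of this design choice might look like the sketch below. It is an illustration only: the class name, method name, and the decision to reject over-long values with an exception are assumptions, not something stated in the thread. The value is padded as a String first and converted to US-ASCII bytes afterwards; for pure ASCII text the order makes no difference to the resulting bytes.

```java
import java.nio.charset.StandardCharsets;

class FieldWriter {
    /** Pads value with trailing spaces to fieldLength, then encodes it as US-ASCII. */
    static byte[] encodePadded(String value, int fieldLength) {
        if (value.length() > fieldLength) {
            throw new IllegalArgumentException(
                    "Value is longer than the field: " + value.length() + " > " + fieldLength);
        }
        StringBuilder padded = new StringBuilder(fieldLength);
        padded.append(value);
        while (padded.length() < fieldLength) {
            padded.append(' ');
        }
        return padded.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] field = encodePadded("123", 6);
        System.out.println(field.length);
        System.out.println("[" + new String(field, StandardCharsets.US_ASCII) + "]");
    }
}
```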
    Andrew Monkhouse
    author and jackaroo
    Marshal Commander

    Joined: Mar 28, 2003
    Posts: 11509
        

    Hi everyone,
    I think that dealing with the spaces the way it has been suggested here is a good way to handle it.
    You may be interested in Sun's reply when Gareth asked them about the nulls versus spaces.
    Regards, Andrew


    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319
    When I pad with spaces, should I first pad and then convert to a US-ASCII byte array? Or pad after converting?
    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319
    And, does this look right?

    Tnx much.
    Andrew Monkhouse
    author and jackaroo
    Marshal Commander

    Joined: Mar 28, 2003
    Posts: 11509
        

    Hi Jacques,
    One thing you might like to consider is that the physical writing to disk is the slowest part of the operation. So creating and writing one large byte array which contains all fields will be more efficient than doing multiple small writes.
    Regards, Andrew
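Andrew's suggestion can be sketched as follows. Everything here is hypothetical (class name, method names, space padding, and the use of `RandomAccessFile`); the point is only that the whole record is assembled in memory and handed to the file in a single `write` call rather than one call per field.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RecordWriter {
    /** Packs all field values into one record buffer, space-padded (an assumption). */
    static byte[] buildRecord(String[] values, int[] fieldLengths) {
        if (values.length != fieldLengths.length) {
            throw new IllegalArgumentException("One length per value is required");
        }
        int total = 0;
        for (int length : fieldLengths) {
            total += length;
        }
        byte[] record = new byte[total];
        Arrays.fill(record, (byte) ' ');
        int offset = 0;
        for (int i = 0; i < values.length; i++) {
            byte[] encoded = values[i].getBytes(StandardCharsets.US_ASCII);
            if (encoded.length > fieldLengths[i]) {
                throw new IllegalArgumentException("Field " + i + " is too long");
            }
            System.arraycopy(encoded, 0, record, offset, encoded.length);
            offset += fieldLengths[i];
        }
        return record;
    }

    /** Writes the whole record with a single write call at the given file position. */
    static void writeRecord(RandomAccessFile file, long position, byte[] record)
            throws IOException {
        file.seek(position);
        file.write(record);
    }
}
```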
    Philippe Maquet
    Bartender

    Joined: Jun 02, 2003
    Posts: 1872
    Here are a few arguments for *not trimming* the field values *nor padding* them with spaces:
  • The specs in our instructions are clear: field values are null-terminated.
  • The file contents are compatible with the specs (the space character is a normal one after all).
  • Trimming field values when reading, without padding them with spaces when writing, would have the following bad side effect: all unmodified field values would be altered in the file on the first record update.
  • Trimming (read) + padding (write) would be a *restriction* of the specs: saving the value "abc " in a 10-character field would be impossible, while the specs clearly allow it.

    Regards,
    Phil.
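Phil's last argument is easy to reproduce. The sketch below uses hypothetical names and a 10-byte field, and it assumes the value fits in the field. It stores "abc " (with one trailing space) once with space padding plus trim on read, and once with null termination plus a cut at the first NUL on read.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    static final int FIELD_LENGTH = 10;

    /** Puts value at the start of a fixed-width field and fills the rest with filler. */
    static byte[] store(String value, byte filler) {
        byte[] encoded = value.getBytes(StandardCharsets.US_ASCII);
        byte[] field = new byte[FIELD_LENGTH];
        Arrays.fill(field, filler);
        System.arraycopy(encoded, 0, field, 0, encoded.length);
        return field;
    }

    /** Space padding on write, trim on read. */
    static String viaTrimAndPad(String value) {
        return new String(store(value, (byte) ' '), StandardCharsets.US_ASCII).trim();
    }

    /** Null termination on write, cut at the first NUL on read. */
    static String viaNullTermination(String value) {
        String full = new String(store(value, (byte) 0), StandardCharsets.US_ASCII);
        int end = full.indexOf('\0');
        return end < 0 ? full : full.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + viaTrimAndPad("abc ") + "]");       // prints [abc]
        System.out.println("[" + viaNullTermination("abc ") + "]");  // prints [abc ]
    }
}
```

The trailing space is lost in the first case and preserved in the second, which is the restriction Phil is pointing at.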
    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319
    Hi Andrew.
    Good point. I'll do it like that.

    When I pad with spaces, should I first pad and then convert to a US-ASCII byte array? Or pad after converting?

    Probably pad first!?
    Phil, you said:

    Trimming (read) + padding (write) would be a *restriction* of the specs: saving the value "abc " in a 10-character field would be impossible, while the specs clearly allow it.

    I don't follow what you mean.
    If you save "abc " with padding, it will be returned as "abc" when read and trimmed. Is that what you mean? I still think that it's reasonable in the real world to assume that trailing spaces are not wanted since the specs aren't complete enough.
    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319
    When padding a US-ASCII byte array with spaces, is this OK?

    Or do I have to do something like this?
    Andrew Monkhouse
    author and jackaroo
    Marshal Commander

    Joined: Mar 28, 2003
    Posts: 11509
        

    Hi Jacques,
    Have you looked at what java.util.Arrays.fill() and System.arraycopy() can do for you?
    Regards, Andrew
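For reference, here is a minimal sketch of byte-level padding with those two methods. The class and method names are made up; the two library calls are the standard `Arrays.fill(byte[], int, int, byte)` overload and `System.arraycopy` (note the lowercase "c" in the real method name). Both work directly on primitive byte arrays.

```java
import java.util.Arrays;

class PadBytes {
    /** Returns a fieldLength-byte array holding encoded followed by ASCII spaces. */
    static byte[] padWithSpaces(byte[] encoded, int fieldLength) {
        if (encoded.length > fieldLength) {
            throw new IllegalArgumentException("Value does not fit in the field");
        }
        byte[] field = new byte[fieldLength];
        // Fill only the tail that the value will not occupy.
        Arrays.fill(field, encoded.length, fieldLength, (byte) ' ');
        System.arraycopy(encoded, 0, field, 0, encoded.length);
        return field;
    }

    public static void main(String[] args) {
        byte[] padded = padWithSpaces(new byte[] {'a', 'b'}, 4);
        System.out.println(Arrays.toString(padded));
    }
}
```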
    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319

    Have you looked at what java.util.Arrays.fill() and System.arraycopy() can do for you?

    I have played with System.arraycopy(), but I'd forgotten about java.util.Arrays.fill().
    But aren't both of them concerned only with objects, not primitives?
    But I'll check it out.
    Thanx for the reminder!
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    [Phil]: Here are a few arguments for *not trimming* the field values *nor padding* them with spaces:
    I agree with Phil here. I would also note that a literal reading of the specs doesn't seem to allow for trimming or padding within the Data class. However, you could do this *outside* the Data class.
    [Jacques]: If you save "abc " with padding, it will be returned as "abc" when read and trimmed. Is that what you mean?
    I'm not Phil, but yes.
    I still think that it's reasonable in the real world to assume that trailing spaces are not wanted since the specs aren't complete enough.
    Yes, but given Sun has said Data must implement a particular spec, I think it's safer to be very literal in the interpretation of that spec, and put corrections in other, outside classes.
    In the real world, it would be possible that the customer has (or plans to have) other applications which interact with the Data class but may not require or even allow padding/trimming. They didn't mention it in the specs, true, but they did give a precise API to follow. They're not required to explain the reasoning behind all their choices - it's enough that they've provided a requirement, and we implement it as written.
    ...
    [Jacques]: But aren't both of them concerned only with objects, not primitives?
    Nope. System.arraycopy() takes Objects as parameters, but that's because they wanted to be able to take either any primitive array type or Object[], and for whatever reason they didn't feel like making nine overloads of the method for all the possible parameter types (as is done in many other cases, including Arrays.fill()).


    "I'm not back." - Bill Harding, Twister
    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319

    Yes, but given Sun has said Data must implement a particular spec, I think it's safer to be very literal in the interpretation of that spec, and put corrections in other, outside classes.
    In the real world, it would be possible that the customer has (or plans to have) other applications which interact with the Data class but may not require or even allow padding/trimming. They didn't mention it in the specs, true, but they did give a precise API to follow. They're not required to explain the reasoning behind all their choices - it's enough that they've provided a requirement, and we implement it as written.

    OK, but then what about the answer Sun gave Gareth:

    a) The file is valid. You've just used the maximum length of the field by padding with spaces and I should do the same. i.e. You are trying to simulate typical semi-clueless customers, who say one thing but actually mean something else.
    They said go with (a).

    From: http://www.coderanch.com/t/183774/java-developer-SCJD/certification/URLyBird-data-file-Reply-SUN
    Jim Yingst
    Wanderer
    Sheriff

    Joined: Jan 30, 2000
    Posts: 18671
    I posted in the same thread you mention. Answer (a) was the best of the options Gareth provided, but that doesn't mean option (d) wasn't close, or that we should take the exact wording of (a) as gospel. They were Gareth's words, not Sun's, and Sun chose the best option. The problem with (d) IMO is that Gareth went too far when he said "when I write records I should always write null terminated Strings". You would only need to write null-terminated strings if the strings were shorter than the max field length, and you can arrange things so that it's a non-issue: when Data's update() is called, you always pass it Strings that have exactly the allowed length, because they've already been padded with spaces.
    Really, more than one approach will be acceptable here, as long as you document your reasoning. It's entirely possible Sun won't penalize you if you don't follow the spec exactly. But why rely on that, when there's a simple way to follow the spec and produce the behavior the customer appears to "really" want?
    Jacques Bosch
    Ranch Hand

    Joined: Dec 18, 2003
    Posts: 319
    Hi Jim.
    Thanx for your comments. Very valid.
    I'll have to reconsider what to do.
     