Hello, is it possible to calculate the maximum string length from the maximum string size in bytes? Since the byte count depends on the encoding, I'm not sure how the client side can validate input values (length, in this case) without knowledge of the db file encoding (which breaks all 2- or 3-tier logic). It seems many developers assume ASCII encoding (i.e. that it cannot be changed) and use the size in bytes to validate string length. Any comments? Thanks
Hi Gytis, With the encoding scheme used in our assignment (US-ASCII), there is a fixed 1:1 ratio between characters and bytes. Hence you may safely use String.length() to validate input values as far as field sizes are concerned. Now, what would happen if they changed the file's encoding scheme to one with a fixed 2:1 ratio between bytes and characters? The file would have to be converted anyway (its size being multiplied by 2), meaning that your previous use of String.length() would still be valid. Regards, Phil.
That's why I talked about a fixed ratio. UTF-8 is one of the encoding schemes which use a variable ratio between characters and bytes, and hence cannot be used to encode files made of fixed-length records. UTF-8 is just an example; there are many other encodings which couldn't be supported for the same reason. Regards, Phil.
Hi Philippe, thanks for your reply. It seems you are assuming that the field size given in the db file represents the actual string field length, rather than the size in bytes (as stated in the assignment document). In that case, changing the charset encoding shouldn't harm the system, i.e. a validation like this one will work:
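The code block referenced here did not survive in the thread. A hedged reconstruction of the kind of check meant (the class name and field size below are my own, hypothetical) might be:

```java
public class FieldValidator {

    // Hypothetical field size as read from the db file's schema section.
    // Under this reading, the value counts characters, not bytes.
    private static final int NAME_FIELD_SIZE = 64;

    /** Returns true if the value fits in the field, counting characters. */
    public static boolean isValidLength(String value) {
        return value != null && value.length() <= NAME_FIELD_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(isValidLength("Fred's Heating & Plumbing")); // true: 25 chars
        System.out.println(isValidLength("x".repeat(100)));             // false: 100 chars
    }
}
```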
But what if the field size with a UTF encoding is doubled too? Then that validation will definitely fail. This is my main concern.
It seems you are assuming that the field size given in the db file represents the actual string field length, rather than the size in bytes (as stated in the assignment document).
As our assignment also states that the character encoding is "US-ASCII", field sizes expressed in characters equal field sizes expressed in bytes.
But what if the field size with a UTF encoding is doubled too? Then that validation will definitely fail. This is my main concern.
I believe that supporting charsets where characters are encoded on 2 bytes is out of scope for this assignment. If you believe so too (and can justify it), your test below is OK:
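The quoted test was also lost from the post. Assuming it is the straightforward length check discussed above (method and parameter names are hypothetical), it would look something like:

```java
public class LengthCheck {

    /**
     * With US-ASCII's fixed 1:1 character-to-byte ratio, comparing the
     * character count directly against the field size in bytes is safe.
     */
    public static boolean fits(String value, int fieldSizeInBytes) {
        return value.length() <= fieldSizeInBytes;
    }

    public static void main(String[] args) {
        System.out.println(fits("Smallville", 32)); // true: 10 chars <= 32 bytes
        System.out.println(fits("Smallville", 8));  // false: 10 chars > 8 bytes
    }
}
```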
Now if you want to abstract the conversion between characters and bytes a bit, as far as field lengths are concerned, you *may* code it as:
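A sketch of that abstraction (the constant name BYTES_PER_CHAR comes from the post; the class and method names are my own, hypothetical):

```java
public class FieldLimits {

    // Currently 1 (US-ASCII). A move to a fixed two-byte encoding would
    // only require changing this value to 2, without touching the test.
    private static final int BYTES_PER_CHAR = 1;

    /** Converts a field size in bytes to a maximum character count. */
    public static int maxChars(int fieldSizeInBytes) {
        return fieldSizeInBytes / BYTES_PER_CHAR;
    }

    /** Returns true if the value fits in a field of the given byte size. */
    public static boolean fits(String value, int fieldSizeInBytes) {
        return value.length() <= maxChars(fieldSizeInBytes);
    }

    public static void main(String[] args) {
        System.out.println(fits("Smallville", 16)); // true: 10 chars, 16-byte field
    }
}
```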
where BYTES_PER_CHAR is currently 1 but could be 2 in the future without breaking your test. BTW, I implicitly suggested a constant, but it could be a variable as well, with the ratio being computed. Regards, Phil.