Niall Loughnane wrote: there are space characters 160 (non-breaking space) and 32 (space). Both represent a space, but they are distinct characters,
That isn't correct. ASCII only defines characters in the range 0 to 127. Unicode does define 160 as a non-breaking space, but it defines a whole lot of other space characters as well. Here is a document which lists twenty of them, but there could be others. As far as I can see, the canonical Unicode normalization forms (NFC and NFD) leave most of those space characters alone, though the compatibility forms (NFKC and NFKD) do fold several of them, 160 included, into the plain space. Speaking of normalization, have you built that into your specialized string comparison?
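To make the normalization question concrete, here is a minimal Python sketch (not the original poster's comparison code) showing that 160 and 32 compare as different characters, and which normalization form changes that. The helper name `fold_spaces` and the sample strings are made up for illustration; only `unicodedata.normalize` and `unicodedata.category` come from the standard library.

```python
import unicodedata

NBSP = "\u00a0"   # code point 160, NO-BREAK SPACE
SPACE = "\u0020"  # code point 32, SPACE


def fold_spaces(text: str) -> str:
    """Hypothetical helper: apply NFKC, which replaces NBSP with a plain space."""
    return unicodedata.normalize("NFKC", text)


a = "hello" + NBSP + "world"
b = "hello" + SPACE + "world"

# Plain comparison: the two strings differ at index 5.
raw_equal = a == b

# Canonical normalization (NFC) does not touch NBSP.
nfc_equal = unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Compatibility normalization (NFKC) folds NBSP into the plain space.
nfkc_equal = fold_spaces(a) == fold_spaces(b)

# Both characters are in the Unicode "Zs" (space separator) category.
categories = (unicodedata.category(NBSP), unicodedata.category(SPACE))

print(raw_equal, nfc_equal, nfkc_equal, categories)
```

Whether NFKC is appropriate depends on the comparison you want: it also folds other compatibility characters (ligatures, full-width forms), which may or may not be desirable.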
Paul is correct. The original American Standard Code for Information Interchange is a 7-bit code. The 8th bit was reserved as a parity bit for devices such as Teletype™ machines. The classic old modem settings reflect that history: "8N1" means 8 data bits, no parity bit, 1 stop bit (2 stop bits were needed for some slower devices), while "7E2" means 7 data bits with even parity and 2 stop bits.
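As a rough illustration of the even-parity scheme, the sketch below packs a 7-bit ASCII code and its parity bit into one byte. The function name `with_even_parity` is hypothetical, and real serial hardware handles the parity bit separately from the data bits rather than storing it like this; the point is only how the bit is computed.

```python
def with_even_parity(ch: str) -> int:
    """Return the 7-bit ASCII code of ch with an even-parity bit in bit 7."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    ones = bin(code).count("1")
    # The parity bit is 1 only when the data bits hold an odd number of ones,
    # so the total number of one-bits always comes out even.
    parity_bit = ones % 2
    return code | (parity_bit << 7)


print(hex(with_even_parity("A")))  # 0x41: 'A' is 1000001, already two ones
print(hex(with_even_parity("C")))  # 0xc3: 'C' is 1000011, three ones, so bit 7 is set
```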
When the IBM PC became popular, a new de facto standard emerged: "extended ASCII" (on the PC, code page 437), which assigned characters to the codes with the 8th bit set: graphics, accented text (including umlauts, etc.), and the non-breaking space for typesetting.
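What a byte with the 8th bit set actually means depends on which 8-bit character set you assume. A small Python sketch using the standard `cp437` and `latin-1` codecs shows the same byte, 160, decoding to different characters; the variable names are just for illustration.

```python
byte_160 = bytes([160])
byte_255 = bytes([255])

# In IBM PC code page 437, byte 160 is an accented letter, not a space.
cp437_160 = byte_160.decode("cp437")

# In ISO 8859-1 (Latin-1), byte 160 is the non-breaking space, matching Unicode 160.
latin1_160 = byte_160.decode("latin-1")

# Python's cp437 codec maps byte 255 to the non-breaking space.
cp437_255 = byte_255.decode("cp437")

print(repr(cp437_160), repr(latin1_160), repr(cp437_255))
```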
Science is the process of replacing what we "know" with what is TRUE. Politics, alas, often prefers to be the opposite.