JLS 3.2:
"
1. A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.
2. A translation of the Unicode stream resulting from step 1 into a stream of input characters and line terminators (§3.4).
3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (§3.5) which, after white space (§3.6) and comments (§3.7) are discarded, comprise the tokens (§3.5) that are the terminal symbols of the syntactic grammar (§2.3).
"

JLS 3.6:
"
White space is defined as the ASCII space, horizontal tab, and form feed characters, as well as line terminators.
"
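To see step 1 in action, here is a small sketch (the class and method names are made up for illustration). The escape \u0061 is the letter a, so the compiler should treat it as an ordinary identifier character before any tokens are formed:

```java
class UnicodeEscapeDemo {
    // The identifier in the declaration is written as an escape for the
    // letter 'a'. After step 1 the compiler sees "int a = 5;".
    static int viaEscape() {
        int \u0061 = 5;
        return a;
    }

    // The escape for 'A' (hex 0041) is translated before the
    // character literal is tokenized.
    static char letter() {
        return '\u0041';
    }

    public static void main(String[] args) {
        System.out.println(viaEscape());
        System.out.println(letter());
    }
}
```

Because the translation happens first, an escape is processed even inside a comment or a string literal, which is worth keeping in mind when pasting escapes into source files.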
How are you using them? Have you tried the StringTokenizer class to get the words?
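In case it helps, here is a minimal sketch of what I mean, assuming you just want whitespace-separated words (WordSplitter and words are hypothetical names, and the sample input is invented). With the one-argument constructor, StringTokenizer splits on space, tab, newline, carriage return and form feed, which lines up with the white space characters in the JLS quote:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class WordSplitter {
    // Splits text into words using StringTokenizer's default delimiters
    // (space, tab, newline, carriage return, form feed).
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input mixing spaces, a tab and a newline.
        System.out.println(words("int  x\t=\n42;"));
    }
}
```

Note that this only splits on white space; it does not separate punctuation from words, so "42;" stays as one token. If you need real Java tokens, you would need something closer to an actual lexer.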