
Java and C# encoding prob

Simon Harvey
Ranch Hand

Joined: Jan 26, 2003
Posts: 79
Hi everyone,
I'm doing some development in Java and C# for my final-year dissertation at university. I'm having a small problem transferring XML between the two applications.
I'm sending XML text from C# to Java.
When it arrives I use the SAX parser to split it up and then put the elements into a Vector. To get rid of newlines, in the characters method I have:
// Just ignore it

The problem I'm having is that this approach doesn't seem to catch all the newlines when the XML is sent from the C# application. Does anyone have any ideas as to why this might be happening? I imagine there might be a better way to deal with newline characters. I thought ignorableWhitespace would help, but it didn't fire once during my XML parse, so it must be for a different scenario.
I'm wondering if there is an escape character for a carriage return? I know about \n.
The only other thing I can think of is that C# might encode the text differently from Java. Could this cause the problem?
Any help would be gratefully received, and my thanks to anyone who can help.
Kindest Regards
Simon Harvey
Cindy Glass
"The Hood"

Joined: Sep 29, 2000
Posts: 8521
Did you check for \r?
From the JLS

3.4 Line Terminators
the ASCII LF character, also known as "newline"
the ASCII CR character, also known as "return"
the ASCII CR character followed by the ASCII LF character

3.10.6 Escape Sequences for Character and String Literals
The character and string escape sequences allow for the representation of some nongraphic characters as well as the single quote, double quote, and backslash characters in character literals (§3.10.4) and string literals (§3.10.5).
\b              /* \u0008: backspace BS */
\t              /* \u0009: horizontal tab HT */
\n              /* \u000a: linefeed LF */
\f              /* \u000c: form feed FF */
\r              /* \u000d: carriage return CR */
\"              /* \u0022: double quote " */
\'              /* \u0027: single quote ' */
\\              /* \u005c: backslash \ */
OctalEscape     /* \u0000 to \u00ff: from octal value */
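
As a rough illustration of the escapes above, here is a minimal sketch that drops all three kinds of line terminator (LF, CR, and CR followed by LF) from a string. The names `LineTerminators` and `strip` are made up for this example and do not come from the original post.

```java
// Hypothetical helper, for illustration only.
class LineTerminators {

    // Removes every carriage return ('\r', \u000d) and line feed ('\n', \u000a).
    // A CR LF pair disappears as well, because both of its characters are removed.
    static String strip(String s) {
        return s.replace("\r", "").replace("\n", "");
    }
}
```

Checking only for `'\n'` would leave the `'\r'` half of every CR LF pair in place, which matches the symptom described in the question.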

"JavaRanch, where the deer and the Certified play" - David O'Meara
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
I had a similar problem and it was the \r that was causing the problem.
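
Since the original characters code was not posted, the following is only a sketch of one way a SAX handler could skip both `\r` and `\n`; it is not the poster's code. The class name `NewlineStrippingHandler` and its `parse` helper are hypothetical. It uses a `List` where the post uses a Vector. It also assumes the carriage returns arrive as character references such as `&#13;`, because an XML parser normalizes a literal CR LF to a single LF before calling characters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of each element, minus CR and LF.
class NewlineStrippingHandler extends DefaultHandler {
    private final List<String> values = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Discard any text seen before this start tag (e.g. indentation).
        current.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may be called several times per element, so accumulate.
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (c != '\n' && c != '\r') {   // skip LF and CR alike
                current.append(c);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            values.add(text);
        }
        current.setLength(0);
    }

    List<String> getValues() {
        return values;
    }

    // Convenience entry point for the sketch: parses XML held in a String.
    static List<String> parse(String xml) {
        NewlineStrippingHandler handler = new NewlineStrippingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.getValues();
    }
}
```

Accumulating in a buffer and emitting the value in endElement matters here because the parser is free to deliver one element's text in several chunks, for example around a character reference.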

Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog