
reading and writing Strings with accent marks

 
Greenhorn
Posts: 4
I'm reading in Spanish words from a text file, with characters like á and ñ. Then I'm separating them into groups and writing them to new text files. The Strings with accent marks look fine in the input text file and look fine in the output text files. However, some of the logic that separates them into groups is behaving incorrectly, because accented characters are treated as multiple characters in the logic, i.e. á is treated as Ã¡. So, for example, the logic thinks cádiz and cáliz have the same first 3 letters (cÃ¡), when in fact they only share their first 2 letters (cá). As a result, cádiz and cáliz are both put in the cá output file, instead of cádiz in the cád file and cáliz in the cál file.

How do I get the logic to treat accented characters correctly? How do I get cádiz in the cád file and cáliz in the cál file?

Does it have something to do with Locale? If so, exactly what code do I need to write, because I've been messing around with Locale for a while now with no luck.
 
Sheriff
Posts: 3063
It might have something to do with character encodings, depending how the text is stored in the text files. Once you pull it into Java Strings though, everything should be Unicode, and those special characters should be single chars, not a combination of two. How exactly do you read the input files? A Reader? An InputStream? Some combination of the two?
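To illustrate how an encoding mismatch can split one accented letter into two chars, here is a minimal sketch. It assumes the file bytes are UTF-8 and are decoded as ISO-8859-1. The class name `MojibakeDemo` and the helper `misread` are made up for this illustration, and the accented letter is written as a Unicode escape so the source compiles regardless of the compiler's encoding.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes the same bytes as ISO-8859-1.
    // This mimics reading a UTF-8 file with the wrong charset (an assumption
    // about the cause, not something confirmed by the original poster's code).
    public static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E1 is a with acute accent, so these are the two words from the question.
        String cadiz = "c\u00e1diz";
        String caliz = "c\u00e1liz";

        System.out.println(cadiz.length());          // chars when decoded correctly
        System.out.println(misread(cadiz).length()); // chars after the wrong decoding

        // After the wrong decoding, the first three chars of both words coincide.
        String prefixA = misread(cadiz).substring(0, 3);
        String prefixB = misread(caliz).substring(0, 3);
        System.out.println(prefixA.equals(prefixB));
    }
}
```

The accented letter occupies two bytes in UTF-8, and ISO-8859-1 maps every byte to its own char, which is why a "first 3 letters" check stops one letter short.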
 
za zan
Greenhorn
Posts: 4
Here is my code for reading in the words.

Lots of words get printed out with Ã¡, none with á. However, the words in the input text file and later the output text files have á and not Ã¡. It looks like only Java is treating the accented character as 2 characters.
 
za zan
Greenhorn
Posts: 4
This forum will not allow me to attach files with extension .txt, .list, or no extension, so I've copied and pasted a sample of the input text file into this post, which contains some of the offending characters.

anexarán
anexarás
anexaría
anexasen
anexases
anexaste
anexemos
anexitis
anfibias
anfibios
angelita
angelito
angelote
angoleña
angoleño
 
Greg Charles
Sheriff
Posts: 3063
Well, I think the problem is that your files are encoded as UTF-8, and you are reading them as ISO-8859-1 (or Cp1252 if you're on Windows). I'm not sure why your files are encoded that way though, since the standard encoding should be fine for Spanish. In any case the FileReader will use the default encoding for the platform, so you want to do something like:
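A minimal sketch of that kind of fix, assuming the input is UTF-8 with one word per line. The class name `Utf8WordReader`, the helper `readWords`, and the in-memory sample are illustrative assumptions, not code from this thread. The point is to wrap the byte stream in an `InputStreamReader` with an explicit charset rather than rely on the platform default.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8WordReader {

    // Reads one word per line, decoding the bytes with the given charset.
    public static List<String> readWords(InputStream in, Charset charset) {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Sample bytes standing in for a UTF-8 file; U+00E1 is a with acute accent.
        byte[] sample = "c\u00e1diz\nc\u00e1liz\n".getBytes(StandardCharsets.UTF_8);
        // With a file path as the first argument, that file is read instead.
        try (InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(sample)) {
            for (String word : readWords(in, StandardCharsets.UTF_8)) {
                // Grouping prefix: up to the first 3 chars of the word.
                String prefix = word.substring(0, Math.min(3, word.length()));
                System.out.println(word + " -> " + prefix);
            }
        }
    }
}
```

On Java 7 and later, `Files.newBufferedReader(path, StandardCharsets.UTF_8)` is another way to get a reader with an explicit charset, and Java 11 added a `FileReader` constructor that takes a `Charset`.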

 
za zan
Greenhorn
Posts: 4
Thanks a lot. That seems to have solved my problem.
 
Marshal
Posts: 79151
Too difficult a question for the beginner's forum. Moving.

And have a look at the Joel Spolsky article: here.
 
Greenhorn
Posts: 2
Greetings to all,
This thread is old, but I think this problem comes up all the time. I had this issue myself, since my company mostly uses French, which has accented characters.
The solution I found is to use Unicode.
This is not from me; I found it at http://www.eteks.com/tips/tip3.html
It's in French, but the table is very clear. The author offers HTML codes for accents too.

Enjoy.
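A small sketch of what "using Unicode" can look like in Java source, assuming it means writing accented characters as `\u` escapes followed by four hex digits, so the source file stays plain ASCII. The class name `UnicodeEscapeDemo` and the helper `toEscapes` are hypothetical and are not taken from the linked page.

```java
public class UnicodeEscapeDemo {

    // U+00E9 is e with acute accent, written as an escape so the source stays ASCII.
    public static final String CAFE = "caf\u00e9";

    // Rewrites every non-ASCII char as a backslash, the letter u, and 4 hex digits,
    // which is handy for generating escaped literals from existing text.
    public static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(CAFE.length());   // the escape counts as a single char
        System.out.println(toEscapes(CAFE)); // the escaped, ASCII-only spelling
    }
}
```

Escapes only help with literals typed into source code. Text read from a file still has to be decoded with the right charset.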
 
Campbell Ritchie
Marshal
Posts: 79151
Welcome to the Ranch
 
Neo Zoon
Greenhorn
Posts: 2
thanks Ritchie, i hope i can help and get helped in here ^^
 