• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

non-ascii character in UTF-8 string

 
Ranch Hand
Posts: 384
MyEclipse IDE Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have a UTF-8 string in from which i want to find out which are non-ASCII characters.

lets say i have char arr[] = "x√ab c"; , and it has 1 non-ASCII character (√')

one way it to find the ascii characters from given UTF-8 string , excluding those i'll get the non-ASCII characters.

Given the following information from https://en.wikipedia.org/wiki/UTF-8#Description:
info 1:

One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0



info :2

another way is to find the UTF-8 code for a character. All ASCII character are range from U+0000 to U+007F



Using the any of the above info , how can i find non-ASCII character ? (or if there is any other way to find )

FYI:using gcc compiler

Thanks


 
Marshal
Posts: 79178
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
It says ckearky there which are ASCII characters. Anything < 128. So you can tell whether you have an ASCII character from the value of the corresponding char or *(myStringPointer + n)
 
Rancher
Posts: 280
VI Editor C++ Debian
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
If your string is UTF-8, using a char array is a bad idea. Use a wchar_t array instead.

Check if you have an "isascii" function.

Edit: Corrected wchar to whcar_t
 
Consider Paul's rocket mass heater.
reply
    Bookmark Topic Watch Topic
  • New Topic