
non-ascii character in UTF-8 string

naveen yadav
Ranch Hand

Joined: Oct 23, 2011
Posts: 384

I have a UTF-8 string, and I want to find out which of its characters are non-ASCII.

Let's say I have char arr[] = "x√ab c"; it has 1 non-ASCII character (√).

One way is to find the ASCII characters in the given UTF-8 string; excluding those, I'll get the non-ASCII characters.

Given the following information from https://en.wikipedia.org/wiki/UTF-8#Description:
Info 1:
One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0.

Info 2:
Another way is to find the code point of each character. All ASCII characters are in the range U+0000 to U+007F.


Using either piece of info above, how can I find the non-ASCII characters? (Or is there any other way to find them?)

FYI: I'm using the gcc compiler.

Thanks


Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 40034
    
It says clearly there which characters are ASCII: anything < 128. So you can tell whether you have an ASCII character from the value of the corresponding char or *(myStringPointer + n).
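
A minimal sketch of that byte-value check in C, assuming the string is NUL-terminated. The function name count_non_ascii_bytes is made up for illustration. One caveat: plain char is signed on many gcc targets, so each byte is cast to unsigned char before testing the high-order bit; without the cast, bytes of 128 and above show up as negative values and a plain "< 128" comparison is always true.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the bytes in s whose high-order bit is set,
 * i.e. the bytes that are not one-byte ASCII codes (0..127). */
static size_t count_non_ascii_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        /* Cast first: plain char may be signed, making bytes >= 128 negative. */
        if ((unsigned char)*s & 0x80) {
            n++;
        }
    }
    return n;
}
```

For the string from the question, written with explicit escapes as "x\xE2\x88\x9A" "ab c", this returns 3, because √ (U+221A) is encoded as the three bytes E2 88 9A. Note that it counts bytes, not characters.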
Anand Hariharan
Rancher

Joined: Aug 22, 2006
Posts: 258

If your string is UTF-8, using a char array is a bad idea. Use a wchar_t array instead.

Check if you have an "isascii" function.

Edit: Corrected wchar to wchar_t
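
On the "isascii" suggestion: isascii is a POSIX extension rather than ISO C, so it may not be available everywhere. The sketch below therefore uses its own small helpers, whose names (is_ascii_byte, is_continuation_byte, count_non_ascii_chars) are made up for illustration. It counts non-ASCII characters rather than bytes by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx), and it assumes the input is valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if b is a one-byte ASCII code (0..127). */
static int is_ascii_byte(unsigned char b)
{
    return b < 0x80;
}

/* Hypothetical helper: nonzero if b is a UTF-8 continuation byte (10xxxxxx). */
static int is_continuation_byte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Counts non-ASCII characters (code points), assuming valid UTF-8:
 * every multi-byte sequence is counted once, at its lead byte. */
static size_t count_non_ascii_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        unsigned char b = (unsigned char)*s;
        if (!is_ascii_byte(b) && !is_continuation_byte(b)) {
            n++;
        }
    }
    return n;
}
```

With the string from the question, "x\xE2\x88\x9A" "ab c", this returns 1, matching the single non-ASCII character √.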

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery
 