
non-ascii character in UTF-8 string

naveen yadav
Ranch Hand

Joined: Oct 23, 2011
Posts: 384

I have a UTF-8 string from which I want to find out which characters are non-ASCII.

Let's say I have char arr[] = "x√ab c"; and it has one non-ASCII character (√).

One way is to find the ASCII characters in the given UTF-8 string; excluding those, I'll get the non-ASCII characters.

Given the following information from
Info 1:
One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0.

Info 2:
Another way is to find the Unicode code point of a character. All ASCII characters lie in the range U+0000 to U+007F.

Using either piece of information above, how can I find the non-ASCII characters? (Or is there some other way to find them?)

FYI: I'm using the gcc compiler.


Campbell Ritchie

Joined: Oct 13, 2005
Posts: 46410
It says clearly there which are ASCII characters: anything < 128. So you can tell whether you have an ASCII character from the value of the corresponding char, or of *(myStringPointer + n).
Anand Hariharan

Joined: Aug 22, 2006
Posts: 272

If your string is UTF-8, using a char array is a bad idea. Use a wchar_t array instead.

Check if you have an "isascii" function.

Edit: Corrected wchar to wchar_t

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery