
non-ascii character in UTF-8 string

naveen yadav
Ranch Hand

Joined: Oct 23, 2011
Posts: 384

I have a UTF-8 string, and I want to find out which of its characters are non-ASCII.

Let's say I have char arr[] = "x√ab c"; and it has 1 non-ASCII character ('√').

One way is to find the ASCII characters in the given UTF-8 string; excluding those, I'll get the non-ASCII characters.

Given the following information from https://en.wikipedia.org/wiki/UTF-8#Description:
info 1:
One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0.


info 2:
Another way is to find the UTF-8 code for a character. All ASCII characters are in the range U+0000 to U+007F.


Using any of the above info, how can I find the non-ASCII characters? (Or is there any other way to find them?)

FYI: I'm using the gcc compiler.

Thanks


Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 36452
    
It says clearly there which are ASCII characters: anything < 128. So you can tell whether you have an ASCII character from the value of the corresponding char, or *(myStringPointer + n).
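
A minimal sketch of the "anything < 128" test, assuming the input is valid, NUL-terminated UTF-8. The helper names is_non_ascii_byte and count_non_ascii_chars are made up for illustration and do not come from any library. The cast to unsigned char matters because plain char is signed on many gcc targets, so bytes of 128 and above would otherwise show up as negative values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if the byte has its high-order bit set,
   i.e. it is part of a non-ASCII character. Cast through unsigned char
   because plain char may be signed. */
static int is_non_ascii_byte(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}

/* Hypothetical helper: counts non-ASCII characters (not bytes).
   Assumes valid UTF-8: the lead byte of a multi-byte sequence has its
   top two bits set (11xxxxxx), continuation bytes look like 10xxxxxx,
   so counting lead bytes counts each non-ASCII character once. */
static size_t count_non_ascii_chars(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) == 0xC0)
            ++count;
    }
    return count;
}
```

For the string from the question, written with explicit escapes as "x\xE2\x88\x9A" "ab c" (√ is U+221A, encoded as the three bytes E2 88 9A), count_non_ascii_chars returns 1, while is_non_ascii_byte is true for each of those three bytes. The literal is split in two so that the \x9A escape does not swallow the following "ab" as hex digits.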
Anand Hariharan
Rancher

Joined: Aug 22, 2006
Posts: 257

If your string is UTF-8, using a char array is a bad idea. Use a wchar_t array instead.

Check if you have an "isascii" function.
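
A rough sketch of the isascii route, working on a plain char string. Note that isascii is a POSIX extension declared in <ctype.h>, not part of ISO C, so it may be hidden when gcc is run in a strict mode such as -std=c11. The helper name first_non_ascii_index is hypothetical.

```c
#include <assert.h>
#include <ctype.h>   /* isascii(): POSIX extension, not ISO C */
#include <stddef.h>

/* Hypothetical helper: returns the index of the first byte for which
   isascii() is false, or -1 if every byte in the string is ASCII. */
static long first_non_ascii_index(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; ++i) {
        /* Pass the byte as an unsigned value in the range 0..255. */
        if (!isascii((unsigned char)s[i]))
            return (long)i;
    }
    return -1;
}
```

If isascii is not available, comparing the byte (as unsigned char) against 128 does the same job.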

Edit: Corrected wchar to wchar_t

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery
 