UTF vs Unicode (JQ+)

Mariusz Szurnacki
Ranch Hand

Joined: Sep 12, 2001
Posts: 44
Question ID :988397479953
Which of the following encoding schemes is used by the jvm internally for storing identifiers etc.?
- Unicode
- UTF8
- ASCII
- 8859_1
- It depends on the platform.
I answered Unicode, but that is marked as wrong: according to JQ+ it should be UTF8. I still think I was right, because
inside the JVM text is represented in 16-bit Unicode, and UTF is used for I/O.
Could you put me right?
Have a nice day,
Mariusz

Fei Ng
Ranch Hand

Joined: Aug 26, 2000
Posts: 1242
For efficiency reasons they use UTF8, since Unicode is not
particularly space efficient. But the VM does translate them back from UTF8 to Unicode efficiently.
Correct me if I am wrong.
Marcus Green
arch rival
Rancher

Joined: Sep 14, 1999
Posts: 2813
Characters stored as 16-bit Unicode, as in a Java char, always occupy 2 bytes. UTF8 is a variable-length way of storing Unicode text that stays compatible with ASCII. If a character is within the ASCII range it occupies 1 byte; if it falls outside that range it is encoded as a sequence of more than 1 byte.
As much of the world's text falls within the ASCII range, UTF8 offers considerable space savings whilst still allowing the huge character repertoire of Unicode.
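To make the size difference concrete, here is a minimal Java sketch that counts the bytes a few strings take in standard UTF-8 and in UTF-16. The class name EncodingSizes and the sample strings are made up for illustration; the only library calls are String.getBytes and StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes the text occupies when encoded as standard UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 (big-endian, no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // An ASCII word, e-acute (U+00E9) and the euro sign (U+20AC).
        String[] samples = {"moose", "\u00e9", "\u20ac"};
        for (String s : samples) {
            System.out.println(s + ": UTF-8=" + utf8Length(s)
                    + " bytes, UTF-16=" + utf16Length(s) + " bytes");
        }
    }
}
```

The ASCII word takes 5 bytes in UTF-8 against 10 in UTF-16, while the euro sign takes 3 bytes in UTF-8 against 2 in UTF-16, so the saving only holds for text that is mostly ASCII.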
Marcus

------------------

http://www.jchq.net Mock Exams, FAQ,
Tutorial, Links, Book reviews
Java 2 Exam Prep, 2nd Edition by Bill Brogden and Marcus Green
=================================================
JCHQ, Almost as good as JavaRanch
=================================================


SCWCD: Online Course, 50,000+ words and 200+ questions
http://www.examulator.com/moodle/course/view.php?id=5&topic=all
Mariusz Szurnacki
Ranch Hand

Joined: Sep 12, 2001
Posts: 44
Hi again!!!
Thanks for your answers. I know what Unicode and UTF are, but I'm still not sure about the right answer to my question: "Which of the following encoding schemes (Unicode or UTF) is used by the jvm internally for storing identifiers etc.?".
According to RHE:
"Java uses two kinds of text representation:
- Unicode for internal representation of characters and strings
- UTF for input and output.
(...)
The outside-the-computer format for Unicode is known as UTF.".
So I think we are all sure that Java's char data type uses Unicode encoding (and therefore the String class does too), and that UTF is used for I/O.
But which encoding is used by the jvm internally for storing identifiers?
Have a nice day,
Mariusz
Paul Anilprem
Enthuware Software Support
Ranch Hand

Joined: Sep 23, 2000
Posts: 3335
Please read section "4.4.7 The CONSTANT_Utf8_info Structure" of the JVM spec ( http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#7963 ).
-Paul.
------------------
SCJP2, SCWCD Resources, Free Question A Day, Mock Exam Results and More!
www.jdiscuss.com
Get Certified, Guaranteed!
JQPlus - For SCJP2
JWebPlus - For SCWCD
JDevPlus - For SCJD


Enthuware - Best Mock Exams and Questions for Oracle/Sun Java Certifications
Quality Guaranteed - Pass or Full Refund!
Paul Anilprem
Enthuware Software Support
Ranch Hand

Joined: Sep 23, 2000
Posts: 3335
The method names, field names, etc. are all represented using this CONSTANT_Utf8_info structure.
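One way to see this for yourself is to walk the constant pool of a compiled class and print its CONSTANT_Utf8_info entries. The following is a rough sketch, not a full class file parser: the class name ConstantPoolUtf8 and the method utf8Entries are invented for this example, and it stops reading once the constant pool ends. It relies on DataInputStream.readUTF, which expects the same layout as a CONSTANT_Utf8_info body (a 2-byte length followed by modified UTF-8 bytes).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ConstantPoolUtf8 {
    // Returns the decoded text of every CONSTANT_Utf8_info entry in a class file image.
    static List<String> utf8Entries(byte[] classFile) {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();             // minor_version
            in.readUnsignedShort();             // major_version
            int count = in.readUnsignedShort(); // constant_pool_count; valid indexes are 1..count-1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: // Utf8: u2 length, then modified UTF-8 bytes
                        result.add(in.readUTF());
                        break;
                    case 5: case 6: // Long, Double: 8 bytes, and they occupy two pool slots
                        in.skipBytes(8);
                        i++;
                        break;
                    case 15: // MethodHandle: 3 bytes
                        in.skipBytes(3);
                        break;
                    case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                        in.skipBytes(2);
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); // Integer, Float, the *ref entries, NameAndType, Dynamic, InvokeDynamic
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Dump the strings of this class's own constant pool (readAllBytes needs Java 9+).
        try (InputStream in = ConstantPoolUtf8.class.getResourceAsStream("ConstantPoolUtf8.class")) {
            if (in == null) {
                System.out.println("class file not found on the classpath");
                return;
            }
            utf8Entries(in.readAllBytes()).forEach(System.out::println);
        }
    }
}
```

Run against its own class file, the output should include names such as utf8Entries and main alongside type descriptors, which is where the identifiers in question end up.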
-Paul.
Marcus Green
arch rival
Rancher

Joined: Sep 14, 1999
Posts: 2813
I have read that explanation in RHE several times and have concluded that it doesn't tell me much at all. I did a lot of research on the web to find supporting information, without any luck. I have a copy of the excellent Rusty Harold I/O book, and that doesn't throw much light on the topic either.
I have not heard of this topic coming up in the exam, even though the objectives imply that it might.
Mr Earnest?
Marcus
[This message has been edited by Marcus Green (edited November 06, 2001).]
Mariusz Szurnacki
Ranch Hand

Joined: Sep 12, 2001
Posts: 44
Thanks Paul!!!
Jose Botella
Ranch Hand

Joined: Jul 03, 2001
Posts: 2120
But the JVM does not use UTF8 itself, only a modified version of it, so I don't think the answer UTF8 is right either.
I guess this question does not have an exact answer, and besides, it is not likely to appear in the exam.

I think that a SCJP-to-be should know that the source of a Java program can use Unicode for String and character literals, identifiers and comments. But I don't think that she/he should need to know the exact format in which descriptors, special strings and the contents of Strings are stored within the JVM.


SCJP2. Please Indent your code using UBB Code
Paul Anilprem
Enthuware Software Support
Ranch Hand

Joined: Sep 23, 2000
Posts: 3335
Well yes, the spec does say that this is a little different from "standard" UTF8, but all over the place they still call it UTF8.
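For anyone curious what "a little different" means in practice, here is a small Java sketch comparing the two encodings. It uses DataOutputStream.writeUTF, which is documented to write modified UTF-8, as a stand-in for the class file encoding. The class name ModifiedUtf8Demo and its helper methods are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8Demo {
    // Encodes with DataOutputStream.writeUTF and returns the payload length,
    // excluding the 2-byte length prefix that writeUTF emits first.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    // Length of the same text in standard UTF-8.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String nul = "\0";                     // the null character, U+0000
        String supplementary = "\uD83D\uDE00"; // U+1F600, a surrogate pair in a Java String
        for (String s : new String[] {ascii, nul, supplementary}) {
            System.out.println("modified=" + modifiedUtf8Length(s)
                    + " standard=" + standardUtf8Length(s));
        }
    }
}
```

The two encodings agree for ordinary text. They differ for the null character, which modified UTF-8 writes as 2 bytes instead of 1, and for characters above U+FFFF, which it writes as two 3-byte surrogates (6 bytes) instead of a single 4-byte sequence.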
 