Serious Question About PrintStream JavaDocs

 
Jesse Silverman
Saloon Keeper
I was trying to get clear in my mind the distinction between PrintStream and PrintWriter.

I already think I am usually going to be leaning towards PrintWriter, but want to fully understand the distinction.

Looking at the Javadocs page for this:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/PrintStream.html

(Java SE 11 here because that is the latest LTS/covered on certifications)

We see originally that a PrintStream could only be created using default encoding.

That was a pretty big limitation, I think, so we see that Java 1.4 gained a constructor taking an encoding name, Java 1.5 gained constructors taking a file or file name together with a charset name, and just recently Java 10 gained even more constructors that take a Charset object.

So far it looks like Oracle has bent over backwards to free us from the "Tyranny of the Default Charset", so far so good.
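
As a minimal sketch of the two styles of charset-aware constructor mentioned above (the class and method names are hypothetical, and a ByteArrayOutputStream stands in for a real file or socket), the name-based form from Java 1.4 can throw a checked exception, while the Charset-based form from Java 10 cannot:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class CharsetConstructors {
    // Java 1.4 style: the encoding is passed by name, so a checked exception is possible.
    static PrintStream byName(ByteArrayOutputStream sink) {
        try {
            return new PrintStream(sink, false, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Java 10 style: a Charset object is passed, so there is nothing to catch.
    static PrintStream byCharset(ByteArrayOutputStream sink) {
        return new PrintStream(sink, false, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        try (PrintStream p1 = byName(a); PrintStream p2 = byCharset(b)) {
            p1.print("\u00e9"); // e with acute accent
            p2.print("\u00e9");
        }
        // Both streams were told to use UTF-8, so both sinks hold the same number of bytes.
        System.out.println(a.size() + " " + b.size());
    }
}
```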

Trying to understand the differences between PrintStream and PrintWriter, I examined the methods offered for writing and found (among others):
public void print(boolean b)
Prints a boolean value. The string produced by String.valueOf(boolean) is translated into bytes according to the platform's default character encoding, and these bytes are written in exactly the manner of the write(int) method.

public void print(int i)
Prints an integer. The string produced by String.valueOf(int) is translated into bytes according to the platform's default character encoding, and these bytes are written in exactly the manner of the write(int) method.

public void print(long l)
Prints a long integer. The string produced by String.valueOf(long) is translated into bytes according to the platform's default character encoding, and these bytes are written in exactly the manner of the write(int) method.

public void print(double d)
Prints a double-precision floating-point number. The string produced by String.valueOf(double) is translated into bytes according to the platform's default character encoding, and these bytes are written in exactly the manner of the write(int) method.

Now, I guess these are going to be the same byte values regardless (unless the character encoding were UTF-16, in which case wouldn't even these be different? Maybe that never happens, and UTF-16 text files may be getting rarer, but I know I have worked with them), so maybe I am just getting nervous about nothing here...

But seeing:
public void print(char c)
Prints a character. The character is translated into one or more bytes according to the platform's default character encoding, and these bytes are written in exactly the manner of the write(int) method.

public void print(char[] s)
Prints an array of characters. The characters are converted into bytes according to the platform's default character encoding, and these bytes are written in exactly the manner of the write(int) method.

public void print(String s)
Prints a string. If the argument is null then the string "null" is printed. Otherwise, the string's characters are converted into bytes according to the platform's default character encoding, and these bytes are written in exactly the manner of the write(int) method.

Wait, if those descriptions are true, and they just always use the same default character encoding, what the heck was the purpose of all the fancy new constructors that allowed us to "Escape the tyranny of default encoding"??

Are the Java SE 11 Javadocs wrong, or am I misunderstanding the whole purpose of overriding the default character set encoding in PrintStream constructors?
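
One way to settle this kind of question empirically is a small in-memory check. The sketch below is only an illustration (the class name and the helper printWith are hypothetical): it sends the same one-character string through print(String) on two PrintStreams built with different encodings and compares the bytes. If print() really always used the platform default, both results would be identical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class PrintHonorsCharset {
    // Encodes text through PrintStream.print(String) on a stream created with the named charset.
    static byte[] printWith(String charsetName, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charsetName)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // e with acute accent: 1 byte in ISO-8859-1, 2 bytes in UTF-8
        System.out.println("ISO-8859-1: " + printWith("ISO-8859-1", text).length + " byte(s)");
        System.out.println("UTF-8:      " + printWith("UTF-8", text).length + " byte(s)");
    }
}
```

The byte counts differ regardless of the JVM's default charset, which suggests the constructor argument, not the default, drives the conversion.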

As a bonus, the original question I was trying to answer was when one might opt to use a PrintStream instead of a PrintWriter.  Looking at these docs just made me confused about stuff I already thought I knew so far.
 
Jesse Silverman
Even more confused now, I have all the same (or nearly equivalent) questions about PrintWriter too!
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/PrintWriter.html

If all the nice overloaded print() methods use the default encoding, what is the point of giving us nifty overloaded constructors to specify an encoding in several different ways??

studyStream.setError();
 
Tim Holloway
Saloon Keeper
Writers (and Readers) are more abstract than I/O Streams. A Stream is basically a transport channel to connect to the I/O endpoint (file or device), and you'd bind a Writer onto a Stream to get the high-level functionality specific to the format of the data that the Stream will be transporting.
 
Jesse Silverman

Tim Holloway wrote:Writers (and Readers) are more abstract than I/O Streams. A Stream is basically a transport channel to connect to the I/O endpoint (file or device), and you'd bind a Writer onto a Stream to get the high-level functionality specific to the format of the data that the Stream will be transporting.



I know that is generally true and is a quotable quote.

PrintStream itself has a lot of high-level functionality.  I am currently confused about how it works, and writing code to tell what the heck is going on, but my confusion extends equally to PrintStream and PrintWriter classes, each with respect to print() overload behavior on instances created with other than the platform default character set selected.
 
Tim Holloway
Well, my memory is fuzzy, but I think that PrintStream came first and then PrintWriter was developed as part of the process of making things more abstract. But there's also another matter. I'm pretty sure that I/O buffers attach to Streams and not to Writers/Readers. And a particular quirk of print output is that unlike generic buffers, you don't wait for the buffer to completely fill before committing the data downstream. Instead, the output processor is sensitized to end-of-line indicators so that when you say "print 'enter command:'", abstractly speaking, the command prompt won't sit in the buffer and be invisible to the end user.

In other words, end-of-line automatically invokes the flush() method.
 
Mike Simmons
Master Rancher
According to the PrintStream javadoc, most PrintStream constructors create a stream without automatic flushing.  The exceptions are the constructors that include a boolean argument, tellingly named "autoFlush" - if you set that to true, you get automatic flushing.  Otherwise, not.
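
The difference is easy to observe with a buffered target. This is a sketch under stated assumptions (hypothetical class and method names, and a BufferedOutputStream over a ByteArrayOutputStream standing in for a console or file): it checks how many bytes have reached the final sink immediately after println, before any explicit flush().

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class AutoFlushDemo {
    // Returns how many bytes reached the sink right after println, with no explicit flush.
    static int bytesVisibleAfterPrintln(boolean autoFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(new BufferedOutputStream(sink), autoFlush);
        out.println("enter command:");
        int visible = sink.size(); // measured before close() forces a flush
        out.close();
        return visible;
    }

    public static void main(String[] args) {
        System.out.println("autoFlush=false: " + bytesVisibleAfterPrintln(false) + " bytes visible");
        System.out.println("autoFlush=true:  " + bytesVisibleAfterPrintln(true) + " bytes visible");
    }
}
```

Without autoFlush the prompt stays in the intermediate buffer; with autoFlush set to true, println pushes it through.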

PrintStream did come before PrintWriter (or any Writers).  When Writers came out, they were intended to better handle character-based data, while OutputStreams and InputStreams were more byte-based.  Unfortunately, PrintStream already had many character-based methods (e.g. println(String)), and the frequently-used System.out remains a PrintStream rather than PrintWriter.  So the attempt to get us to always use Writer for character-based output was always a bit schizophrenic, and I guess they eventually added further upgrades to PrintStream's character handling (e.g. for Charsets) because it just wouldn't die.
 
Tim Holloway
Yes, and since Java isn't Microsoft, false starts tend to be supported forever. For example, the (year, month, date) constructor for java.util.Date has been deprecated since JDK 1.1, but it still works (you just get yelled at by the compiler).

In fact, while in most cases, we do want to read and write text in the most natural form for the operating system and hardware we're talking to, there are cases where it's still likely to be useful to think in terms of bytes. So I doubt you'll see extreme deprecation for PrintStream. It's also a base class for LogStream. Which is deprecated.

And to go straight to the authority:

javadocs wrote:
All characters printed by a PrintStream are converted into bytes using the given encoding or charset, or platform's default character encoding if not specified. The PrintWriter class should be used in situations that require writing characters rather than bytes.



Which is their official backing for what I just proposed.
 
Jesse Silverman

Tim Holloway wrote:Yes, and since Java isn't Microsoft, false starts tend to be supported forever. For example, the (year, month, date) constructor for java.util.Date has been deprecated since JDK 1.1, but it still works (you just get yelled at by the compiler).

In fact, while in most cases, we do want to read and write text in the most natural form for the operating system and hardware we're talking to, there are cases where it's still likely to be useful to think in terms of bytes. So I doubt you'll see extreme deprecation for PrintStream. It's also a base class for LogStream. Which is deprecated.

And to go straight to the authority:

javadocs wrote:
All characters printed by a PrintStream are converted into bytes using the given encoding or charset, or platform's default character encoding if not specified. The PrintWriter class should be used in situations that require writing characters rather than bytes.



Which is their official backing for what I just proposed.



All of that is super-interesting, and I do care about it, but doesn't the last part of what you quoted scream out loud that all of the method descriptions are INCORRECT?
Which was my original point.

I made and ate lunch with my wife and did two long dog walks since I posted.

Also, I wrote a bunch of test code (so far just with PrintWriter).

I wrote a program that tries to write a bunch of Unicode to a text file, first using the default encoding and then explicitly opening the PrintWriter with another Charset.
This was clearly Windows-1252 using defaults running Eclipse native on Windows 10.

I ran it there and got the expected fails:
stdout:
windows-1252
windows-1252

file contents:
Holy beans, batman!??? <-- useless!!

I changed my source code in Eclipse to UTF-8, which I think actually changes the whole project including how Eclipse invokes Java.
At this point running in Eclipse gave me:
stdout:
UTF-8
UTF-8

File contents:
PS F:\Java\UTF8Write\bin> type ../../default.txt
Holy beans, batman!😉🤷‍♀️🎂😎🐱‍🚀

Well, that didn't directly address my point because, as I guessed at lunch and stdout showed, when I changed the source code encoding Eclipse also changed how it invoked Java, which changed the default code page (oops, I mean Charset) itself.  I got the same garbage when I invoked Java and my class from the Windows 10 command line (even in Windows Terminal).

So I opened the PrintWriter() specifying the different character sets and found:
PS F:\Java\UTF8Write\bin> type ../../default.txt
Holy beans, batman!???
whether the default due to how I invoked Java was windows-1252 or UTF-8, as long as I opened the PrintWriter with "windows-1252"

and the output was:
PS F:\Java\UTF8Write\bin> type ../../default.txt
Holy beans, batman!😉🤷‍♀️🎂😎🐱‍🚀
whether the default was windows-1252 or UTF-8, as long as I opened the PrintWriter with "UTF-8"

So the lines in the docs are wrong.
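
The original test program is not reproduced in this thread, so what follows is only a rough reconstruction of the idea, not the code that was actually run. The class name, the temp-file prefix and the helper writeWith are hypothetical, and US-ASCII is swapped in for windows-1252 because it is guaranteed to exist on every JVM. It writes the same text through PrintWriter.print(String) with two explicitly chosen charsets and reads the raw bytes back.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class WriterCharsetExperiment {
    // Writes text to a temp file via PrintWriter.print(String) and returns the file's raw bytes.
    static byte[] writeWith(String charsetName, String text) {
        try {
            File file = File.createTempFile("charset-demo", ".txt");
            try {
                try (PrintWriter out = new PrintWriter(file, charsetName)) {
                    out.print(text);
                }
                return Files.readAllBytes(file.toPath());
            } finally {
                Files.deleteIfExists(file.toPath());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Holy beans, batman!\uD83D\uDE09"; // ends with one emoji (U+1F609)
        byte[] utf8 = writeWith("UTF-8", text);
        byte[] ascii = writeWith("US-ASCII", text);
        // The UTF-8 file decodes back to the original text; the ASCII file cannot hold the emoji.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text));
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));
    }
}
```

Because both charsets are named explicitly, the result should not depend on whether the JVM was started with windows-1252 or UTF-8 as its default.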

There is no deep significance to the emojis chosen, I was in a rush because I couldn't stop wondering how this all worked...

The answer to my original question is that the words:

All characters printed by a PrintStream are converted into bytes using the given encoding or charset, or platform's default character encoding if not specified.

are correct, all descriptions of the behavior of individual methods cited by me are INCORRECT.

And me being confused about the use cases for PrintStream vs. PrintWriter is less about me being dense than the weird, wild, wooly history of how IOStreams evolved.  The take-away is:
PrintStream can be used to mix binary bytes of any variety with encoded strings, if you really know what you are doing and have some valid use case for that (or just wish to be perverse).  For the encoded Strings, it will use the encoding you specified if valid, or just default to the default character encoding the JVM started up with on that run if you didn't specify a Charset using any of the several constructor overloads enabling you to do so.

PrintWriter converts EVERYTHING first to Java Strings and then writes those Strings out with the encoding of your choice, or the default character encoding the JVM started up with if you were too lazy to specify it, or wrote your code before that was an option.
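
To make the "mix raw bytes with encoded strings" point concrete, here is a small sketch (hypothetical class and method names; UTF-16BE is chosen only because it makes the encoded part easy to spot and adds no byte-order mark). The raw byte goes out untouched, while the string goes through the charset given to the constructor.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class MixedOutput {
    // Writes one raw byte followed by one encoded character and returns everything written.
    static byte[] rawThenText() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, false, "UTF-16BE")) {
            out.write(0xFF); // raw byte, bypasses the charset entirely
            out.print("A");  // encoded with UTF-16BE: 00 41
        } catch (UnsupportedEncodingException e) {
            // UTF-16BE is one of the charsets every Java platform must support.
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : rawThenText()) {
            System.out.printf("%02x ", b & 0xFF);
        }
        System.out.println();
    }
}
```

PrintWriter has no counterpart to the write(0xFF) call here: its write(int) treats the argument as a character and encodes it.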
 
Jesse Silverman

Tim Holloway wrote:Yes, and since Java isn't Microsoft, false starts tend to be supported forever....

And to go straight to the authority:

javadocs wrote:
All characters printed by a PrintStream are converted into bytes using the given encoding or charset, or platform's default character encoding if not specified. The PrintWriter class should be used in situations that require writing characters rather than bytes.



Which is their official backing for what I just proposed.



I am still confused by the quote you gave from PrintStream.
All the print() and println() methods are going to behave the same in both classes, unless I am still confused (easily possible!)

I see some extra write() overloads regarding char, char[] and String missing in PrintStream compared to PrintWriter, but who needs them, as you can just use the print() overloads anyway??

PrintStream will allow you to write raw random bytes, PrintWriter will not.

So, the quote should be something like:

javadocs wrote:
All characters printed by a PrintStream are converted into bytes using the given encoding or charset, or platform's default character encoding if not specified. PrintStream also provides overloads of write() allowing writing of raw bytes and arrays of raw bytes not emitted by any encoding.  The PrintWriter class should be used in situations that only require writing characters rather than raw bytes.

 
Mike Simmons
Note that PrintStream's write(int) method still (confusingly but correctly) follows the contract of OutputStream's write method, which silently ignores all but the 8 lowest-order bits.  Which may happen to match the default encoding in many cases, but certainly may be different in others.  It's a nasty gotcha if you accidentally use write() rather than print().
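
A short sketch of that gotcha, with hypothetical class and method names and UTF-8 picked as the stream's encoding for illustration: the euro sign (U+20AC) is passed once to write(int) and once to print(char) on otherwise identical streams.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class WriteVersusPrint {
    private static PrintStream utf8Stream(ByteArrayOutputStream sink) {
        try {
            return new PrintStream(sink, false, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // write(int): the char is widened to int and only the low 8 bits are written.
    static byte[] viaWrite(char c) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = utf8Stream(sink)) {
            out.write(c);
        }
        return sink.toByteArray();
    }

    // print(char): the char is encoded with the stream's charset.
    static byte[] viaPrint(char c) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = utf8Stream(sink)) {
            out.print(c);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        char euro = '\u20AC';
        System.out.println("write(): " + viaWrite(euro).length + " byte(s)"); // just 0xAC
        System.out.println("print(): " + viaPrint(euro).length + " byte(s)"); // E2 82 AC
    }
}
```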
 
Stephan van Hulst
Saloon Keeper
Hmm I just looked at the source code for PrintStream and it appears to just use the encoding passed into the constructor, regardless of what the Javadoc says.

This seems to me like they forgot to update the Javadoc of the methods after they added the constructors.

Congrats, I think it's rare to find something like this in the Javadocs. File a report with Oracle/OpenJdk maybe?
 
Jesse Silverman

Stephan van Hulst wrote:Hmm I just looked at the source code for PrintStream and it appears to just use the encoding passed into the constructor, regardless of what the Javadoc says.

This seems to me like they forgot to update the Javadoc of the methods after they added the constructors.

Congrats, I think it's rare to find something like this in the Javadocs. File a report with Oracle/OpenJdk maybe?



Oracle seems to be useless on this, it was the most depressing non-political thing I've seen in some time, tho I am a bit worried about the Delta variant of COVID (not for myself, I am recently immunized):
https://coderanch.com/t/743470/java/Submitting-errata-Javadocs

Maybe there is an OpenJDK process, independent from Oracle, that will actually fix serious doc bugs in less than 19 years?  I wrote code to confirm that the Javadoc was wrong and that I could actually do the things that Oracle's hard-working programmers now let us do. The more I read the docs, the less sure I was, until I wrote and ran the dang code.
 
Jesse Silverman
They finally fixed part of the broken docs recently (only for Java 14 and up) and they did leave out other stuff that is still broken in the docs:
https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/io/FileWriter.html

We can pretend that we don't care about the encoding of numerics and boolean values because they are all "plain ASCII" so don't need to respect encoding anyway, so it is all moot.

Except that while UTF-16 isn't everyone's favorite, it is still a thing, still legal in most states and still used in some contexts.  Even UTF-32 is de-criminalized.

I confirmed that (of course) even numeric data DOES respect the chosen encoding for the PrintWriter/PrintStream, and even the docs they updated do not show this:
(run with UTF-16)
0000000: feff 0048 006f 006c 0079 0020 0062 0065  ...H.o.l.y. .b.e
0000010: 0061 006e 0073 002c 0020 0062 0061 0074  .a.n.s.,. .b.a.t
0000020: 006d 0061 006e 0021 0031 0032 0033 0034  .m.a.n.!.1.2.3.4
0000030: 0035 0036 d83d de09 d83e dd37 200d 2640  .5.6.=...>.7 .&@
0000040: fe0f d83c df82 d83d de0e d83d dc31 200d  ...<...=...=.1 .
0000050: d83d de80 0d0a                           .=....

I learned a lot today, but am a little jaded about the health of the Javadocs for PrintStream/PrintWriter

Unless writing strings in a non-default encoding is somehow considered truly arcane, it seems bad.
Running Java in "windows-1252" mode while living in a now "UTF-8" Universe, it doesn't seem even semi-weird to me.
I guess it will be less relevant when the whole world is universally uniformly UTF-8...
 
Jesse Silverman
My bad, guys -- this clearly belonged in the other forum:
https://coderanch.com/f/38/java-io

I normally don't make this mistake.

I should have known something was up when Rob Spoor didn't notice these.

If someone wants to move it for posterity, as I think the things that confused me about this could easily hit other people, I am good with that.

I will remain careful about posting to the most appropriate forum. I am not sure how I missed this: after realizing it isn't specific to Certification, I just jumped to "Java in General", which would have been correct if there were not a separate forum about IOStreams!
 