• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Paul Clapham
  • Ron McLeod
  • Tim Cooke
  • Junilu Lacar
Sheriffs:
  • Rob Spoor
  • Devaka Cooray
  • Jeanne Boyarsky
Saloon Keepers:
  • Jesse Silverman
  • Stephan van Hulst
  • Tim Moores
  • Carey Brown
  • Tim Holloway
Bartenders:
  • Jj Roberts
  • Al Hobbs
  • Piet Souris

Re-ordering in streams??

 
Saloon Keeper
Posts: 1606
52
Eclipse IDE Postgres Database C++ Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi:

First, three things out of the way up front:
1. I am very well aware that arguably the biggest advantage of Streams is increased readability.  On a Jane Street interview, where they went with someone else for that position, and was quite challenging packing many things we regularly discuss here in a very short time, by the way...they answered my question "Why do you use Functional Programming for almost everything here?"  The answer was that their math people / quants / business analysts used to not be able to participate in code reviews, the code was so far out of the problem domain that it was unrecognizable to them.  After they switched to functional, the quants and business analysts can and do participate in all the code reviews, because they can just look at it and see if the code expresses what they intended.  Case closed, nothing about parallelization, etc. it was "business and maths guys can read our code" FTW.

2. Premature optimization is dumb and a waste of time.

3. The declarative style of Streams processing directly leads to benefit 1 and probably precludes 2, but...

are some people taking this too far?  I mean, this isn't turning our Java into SQL, is it?

I mean specifically this.

In SQL, SELECT with ORDER is 100.00% declarative and non-imperative.

I can't repeat my mnemonic for SFWGHO in a family-friendly environment, but it is THE SYNTAX, nothing to think about

SELECT name, grade FROM grades
WHERE grade > 85
ORDER BY grade


Back to Streams...

I see lots of people doing this or the exact equivalent:



or the like.

My procedural brain screams "Why did you sort all of them just to throw away most of those results??!  sort the select few after your fussy predicate rejects most of the original stream, dummy!"

It makes no difference if we only have 30 students, of course.  The time I took typing this and you took reading it will never be made back in better runtime regardless of the answer.

But sometimes the original stream is VERY large and the predicate is VERY picky.

I know you aren't supposed to presume that every stage in your pipeline will get run, Java may optimize them out like if you do:



Java says "Dummy, I don't need to extract the actual grade or to sort anything to tell ya the count, bro.  I'll just go right to the .count() and ignore your silly unnecessary .map() and .sorted() steps."  Or maybe it doesn't, but it could, so don't presume.  If the steps aren't there we know for SURE Java won't run them, if they are there and unnecessary then maybe, maybe not...

What about my example of .sorted() before finicky filter?  Are others:
1. just smarter because they know it rarely makes much difference / premature optimization
2. smarter because they know that Java may choose to re-order steps in a pipeline because it sees they are equivalent in results and a different order is cheaper? (sort of like SQL)
3. well, their code is fine for small examples but really would not scale very well if for instance, they were both filtering and sorting a HUGE data set with a finicky filter.

I guess that 1 is not exclusive to 2 or 3, but I do go on Big Data interviews too, and even for just doing HackerRank problems, unnecessary work often means "no full points for your problem, several test cases timed out".  I am planning to start defaulting to Streams on HackerRank problems too...

 
Marshal
Posts: 74341
334
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Would you want to use .map() in the first place?
Does Grade implement Comparable<? super Grade>, or do you have a Comparator<? super Grade>? You need one or the other for .sorted().
What you could do is, move the .sorted() call to after .filter(...).
Do we know that the .sorted() call is elided in your second example?
I don't know whether the calls will be reordered as a JIT optimisation. Remember the following two code blocks are not mutually equivalent:-
 
Jesse Silverman
Saloon Keeper
Posts: 1606
52
Eclipse IDE Postgres Database C++ Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Stacking and unstacking LIFO-style:

I am watching a lot of videos to help things sink in, and they are helpful to me whether they contain mistakes obvious to me, things I wonder about, or things that I never thought of.

I just need a lot of practice looking at other people's streams as well as writing them until it all becomes second nature.  Different mindset.  Not there yet, enjoying getting there.

What you could do is, move the .sorted() call to after .filter(...).


I would always do that by nature, not sure if it actually matters, depends on whether Java ever re-orders your pipelines, it seems like it could but likely would not -- the nominal question of the post.

Do we know that the .sorted() call is elided in your second example?


I dunno, do we?  It seems clear to me that by static analysis (no intermediate steps would change the count) it COULD be, as the stream is dead at the moment .count() returns, but I really don't know.  A related question I was asking, since taking pains to eliminate unnecessary steps is more or less useful based on the answer to that.

I don't know whether the calls will be reordered as a JIT optimisation.


Neither do I.  I think the suggestion is "don't depend on them being either elided or executed, your code had best work right either way"?

Remember the following two code blocks are not mutually equivalent:-


I definitely recognize that.  On a large or infinite stream, the size of the resulting list will ALWAYS be a maximum of 123L in the first case, irrespective of the data and the test.

On an infinite stream, the second pipeline could hang forever if we aren't getting anything thru the filter, or even if we get less than 123 elements thru before we never see one again.  So whether it even ever terminates in the second case is dependent upon the combination of the data in the stream and the test.

Assuming they even terminate, the first one will never exceed a size of 123 for the list but could often be much smaller, as we are filtering down after, possibly reducing the size.
The second one will collect up to the first 123 elements that pass thru the filter, so a very large stream would be more likely to actually result in a List of size 123...

Would you want to use .map() in the first place?


I keep seeing streams examples that seem to contain unnecessary work in them.  In this case, .sorted() would either be operating on Student elements or their grades, which could yield different ordering or the same depending on what is passed as an explicit parameter to .sorted().

Does Grade implement Comparable<? super Grade>, or do you have a Comparator<? super Grade>? You need one or the other for .sorted().


I actually saw this from one of the people who do lots of work and lots of teaching and never properly implement Comparable or Comparator by the standards we both share.
They characteristically neglect to override .equals() to be consistent with either.
I could have immediately said "Don't watch or read anything from this guy" but still feel I have a lot to learn from their presentations despite that.
In this case they implemented Comparable to yield a partial sort that is not compatible with .equals() YUCK, and note that they are using that broken Comparableto sort Student elements in the stream.
 
Master Rancher
Posts: 4052
56
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Jesse Silverman wrote:

Do we know that the .sorted() call is elided in your second example?


I dunno, do we?  It seems clear to me that by static analysis (no intermediate steps would change the count) it COULD be, as the stream is dead at the moment .count() returns, but I really don't know.  A related question I was asking, since taking pains to eliminate unnecessary steps is more or less useful based on the answer to that.


I know that it's elided on my machine.  (Use a debugger, or implement compareTo() to always throw a RuntimeException, which I see is never thrown.) We have no way of knowing if it's elided on other JVM / library implementations.  But the JavaDoc for Stream.count() does specifically note that it's possible to skip operations in the pipeline, if it can be determined that they aren't necessary to determine the count.

@Jesse, once again I don't think it's premature optimization to worry about these things.  You aren't making your code more complex to optimize here - at most, you're reordering a few things, and in other cases, you would simply eliminate some unnecessary calls.  So there's no downside to taking steps that result in code that is either equal to or better than your existing code.  Even if the system does optimize the differences away, since we don't know in advance that it will do that, I sleep better knowing I have already organized the code sensibly, rather than relying on optimizations that may or may not happen.
 
Jesse Silverman
Saloon Keeper
Posts: 1606
52
Eclipse IDE Postgres Database C++ Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The questions "What is Premature Optimization?" and the related "No, wait, what even is Optimization?" have both come up recently.

I really think the former category covers writing nearly-unreadable, over-complicated code in the perhaps Quixotic pursuit of Raw Speed, extra PO points if you break corner cases or mislead the next person maintaining the code causing them to do so and thereby PO your users.

Spending more time thinking before you check-in, start running your tests, or in this case, show your code to 10000 people in a video is not what I consider Premature Optimization.

Now, Tim has made clear a few times that there is an economic factor, many shops prefer the person who gets their code checked in the fastest, not she who checks in the fastest code.

That is real and needs to be recognized and dealt with (unless retired), in fact many places will cut corners on correctness too, not just performance...

But yeah, I think the answer to "Are these people treating Java streams like SQL taking the advantage of a more declarative style (a good thing) too literally?" is probably a "Yup".

It seems common and feels weird because we DO want them to be moving towards a more declarative style, but take a look at your pipeline and ask yourself "Does it make sense?"

Side note: I notice tons and tons and tons of examples showing optional.get() or optional.getAsInt() etc. in every single example and not even mentioning "Careful, this will blow up your program on an empty result set."  Maybe don't mention it every time, but don't reinforce a worst practice while showing one new place to apply it after another...
 
Don't get me started about those stupid light bulbs.
reply
    Bookmark Topic Watch Topic
  • New Topic