Java 8 - Filter one list based on another

 
Mike London
Bartender
Posts: 1971
Say you have a book or other text that you want to do frequency analysis on, but first you want to filter out words that shouldn't be considered.

Below, I have three possible approaches I've been working on, but none of them correctly filters the "allWordsList" to exclude the "avoidWordList" words.

The output should exclude the words "one" and "two", but when I run this test code, I either get the entire "allWordsList" back or nothing at all.

Thanks in advance for suggestions.

- mike


 
 
Mike London
Bartender
Posts: 1971
I finally got one of the approaches above to work.

Update on word list definitions:

 List<String> avoidWordList = Arrays.asList("one", "two");
 List<String> allWordsList = Arrays.asList("There", "were", "two", "people", "who", "always", "said", "one", "thing", "over", "and", "over");

.
.
.
// show all words except those in the avoidWordList

       allWordsList.stream()
               .filter(word -> !avoidWordList.stream()
                       .collect(Collectors.toSet())
                       .contains(word))
               .collect(Collectors.toList())
               .forEach(System.out::println);
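
For the original frequency-analysis goal, the same kind of filter can feed a grouping collector. This is only a rough sketch, assuming the two lists defined above plus a java.util.Map import; counting is case-sensitive:

       // count how often each remaining word occurs (sketch)
       Map<String, Long> frequencies = allWordsList.stream()
               .filter(word -> !avoidWordList.contains(word))
               .collect(Collectors.groupingBy(word -> word, Collectors.counting()));

       frequencies.forEach((word, count) -> System.out.println(word + ": " + count));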

--

All the permutations of putting these stream pieces together can be very confusing and time-consuming.

-- mike
 
Stephan van Hulst
Saloon Keeper
Posts: 15510
That algorithm performs in O(m*n) though, where m is the size of your avoid list, and n is the size of your word list. Why not just build your avoid set right away?

This algorithm runs in O(n).
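
A sketch of that idea, assuming the same two lists as in the earlier post (plus java.util.Set and java.util.HashSet imports): build the Set a single time before streaming, so the filter only does a constant-time lookup per word.

       // build the avoid set once, up front (O(m)), instead of inside the filter
       Set<String> avoidWords = new HashSet<>(avoidWordList);

       allWordsList.stream()
               .filter(word -> !avoidWords.contains(word))   // O(1) lookup per word
               .forEach(System.out::println);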
 
Mike London
Bartender
Posts: 1971

Stephan van Hulst wrote:That algorithm performs in O(m*n) though, where m is the size of your avoid list, and n is the size of your word list. Why not just build your avoid set right away?

This algorithm runs in O(n).



That's cool. How can you tell my posted code runs in O(m*n)? I don't see that.

There are a bazillion ways to put these streams together (many of which don't work as expected), and your posted one works fine -- though I don't know how long it would have taken me to stumble onto your nice code without you posting it.

I really like the declarative model, but until you're an "expert", it takes more time than just writing the $*)#*$@# (imperative) code.

Thanks,

- mike
 
Stephan van Hulst
Saloon Keeper
Posts: 15510
Your algorithm runs in O(m*n), because for every word in your n-sized word list, the filter rebuilds the entire m-sized avoid set, and then throws it away again. When you build the set only once, the complexity becomes O(m+n), which is the same as O(n) if the avoid set is always shorter than the word list.

You should consider most terminal operations on streams (like collect()) to be equivalent to a for-loop. That means that if you perform a terminal operation inside a stream pipeline that is also terminated, it's essentially equivalent to a nested for-loop. Your method is equivalent to:
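
Roughly the following, in sketch form (names like avoidSet and result are illustrative); note how the avoid set is rebuilt from scratch on every pass of the outer loop:

       List<String> result = new ArrayList<>();
       for (String word : allWordsList) {               // outer loop: n iterations
           Set<String> avoidSet = new HashSet<>();      // rebuilt for every single word
           for (String avoid : avoidWordList) {         // inner loop: m iterations
               avoidSet.add(avoid);
           }
           if (!avoidSet.contains(word)) {
               result.add(word);
           }
       }
       for (String word : result) {
           System.out.println(word);
       }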

It all becomes much easier once you get a lot of practice with all of the stream operations and collectors. After a while all these operations become very natural to use, and piping them together will feel as straightforward as writing procedural code.
 
Mike London
Bartender
Posts: 1971

Stephan van Hulst wrote:Your algorithm runs in O(m*n), because for every word in your n-sized word list, the filter rebuilds the entire m-sized avoid set, and then throws it away again. When you build the set only once, the complexity becomes O(m+n), which is the same as O(n) if the avoid set is always shorter than the word list.

You should consider most terminal operations on streams (like collect()) to be equivalent to a for-loop. That means that if you perform a terminal operation inside a stream pipeline that is also terminated, it's essentially equivalent to a nested for-loop. Your method is equivalent to:

It all becomes much easier once you get a lot of practice with all of the stream operations and collectors. After a while all these operations become very natural to use, and piping them together will feel as straightforward as writing procedural code.



Stephan,

That's a very helpful explanation, especially in helping me better understand terminal operations, and it's at the right level for where I am.

Thank you very much for your patience and terrific help!

- mike
 