Win a copy of Think Java: How to Think Like a Computer Scientist this week in the Java in General forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Removing from a collection

 
Luke Murphy
Ranch Hand
Posts: 300
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Suppose, I have a fixed of collection of library books. Someone can remove a book from the list.

I wish to be able to figure out which book(s) were removed quickly.

So very simply, you could have two collections. One for your total set of books and one for your current set of books.
To figure what was gone, you could clone the total set and then invoke removeAll() for the current set.

Anyone think of a more per formant way of doing it and why?

Thanks.

 
fred rosenberger
lowercase baba
Bartender
Posts: 12122
30
Chrome Java Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Can you simply have a flag on each item that marks it as 'present' or 'removed'? I see difficulty with trying to maintain two lists
 
Paul Clapham
Sheriff
Posts: 21107
32
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Take the sum of the book numbers from the complete list. Then take the sum of the book numbers from the potentially incomplete list. If the two numbers are the same then nothing is missing. If they aren't, then the difference is the book number of the missing book.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34375
345
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Paul,
That wouldn't work if two books were removed.

Luke,
I like Fred's solution. For yours, why not keep a list of the removed books too? Then you can get the info instantly? You'd have to have a wrapper class to keep the lists in sync. But you need to do that for two lists as much as you do for three.
 
Luke Murphy
Ranch Hand
Posts: 300
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Jeanne Boyarsky wrote:Paul,
That wouldn't work if two books were removed.

Luke,
I like Fred's solution. For yours, why not keep a list of the removed books too? Then you can get the info instantly? You'd have to have a wrapper class to keep the lists in sync. But you need to do that for two lists as much as you do for three.


Yeah I thought about the sum one on the train home.

Give every book a unique number.
Then the difference between the sum before and sum after is the missing book. It just fails when you have more than 2 books missing. Unless you do something fancy with prime numbers.

It depends on how long it takes to sum. I presume very fast. As opposed to doing equals equals equals :-)
 
Martin Vajsar
Sheriff
Posts: 3752
62
Chrome Netbeans IDE Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I too think that much more practical is to maintain a list of removed books. But to elaborate on the numbers trick: assign each book a number starting from 0 (or 1, that does not matter) and represent a collection of books using BitSet. Operations on BitSets are done using bitwise instructions, so comparing two BitSets is pretty fast, as is iterating through it.

And unless you want to solve knapsack problems, which is pretty hard sort of mathematical puzzles, this is probably as efficient way to store such information (both performance- and space- wise) as any other solution based on sum of numbers.
 
Pat Farrell
Rancher
Posts: 4678
7
Linux Mac OS X VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Martin Vajsar wrote:represent a collection of books using BitSet. Operations on BitSets are done using bitwise instructions, so comparing two BitSets is pretty fast, as is iterating through it.

Fairly fast, but a bitset is big, as you only get 32 values per long, or 8 per byte. Depending on the application, you might have a lot of books. There are about a million "books in print" today. Using 128K of memory may not be a good use of space.

A better answer is that you can't define a good solution without knowing the usage characteristics of the collection. If it represented the books in a public library, there would be a lot of books added and deleted over time. Or consider books in my house: they come in but rarely leave. So a good design would focus on being fast and compact for the 99% static books, and not care very much about tiny number deleted.
 
Lalit Mehra
Ranch Hand
Posts: 384
Eclipse IDE Firefox Browser Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
why not use two static variables ...

1 that contains the total number of books
2 that contains the current number of books in the library

 
fred rosenberger
lowercase baba
Bartender
Posts: 12122
30
Chrome Java Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Lalit Mehra wrote:why not use two static variables ...

1 that contains the total number of books
2 that contains the current number of books in the library



The OP asked to figure out WHICH books...not just how many.
 
Martin Vajsar
Sheriff
Posts: 3752
62
Chrome Netbeans IDE Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Pat Farrell wrote:Fairly fast, but a bitset is big, as you only get 32 values per long, or 8 per byte. Depending on the application, you might have a lot of books. There are about a million "books in print" today. Using 128K of memory may not be a good use of space.

First and foremost, my reply was reaction to the immediately preceding post (Luke Murphy, February 23, 2011 20:19:14) and should be taken strictly in that context, having the original question in mind.

As you have mentioned too, the requirements are hazy. However, no external storage was discussed, so I assumed that list of all books will be kept in memory. As Luke dismissed the idea to keep removed books in separate list (which I would strongly prefer), one more bit per book already in memory is pretty thrifty.

Moreover, bitsets are one of most compact structures. 128KB of memory is not much by today's standard. And it would be just overhead of keeping 32K items in an array list (32K pointers, 4 bytes each). Set or Map would be several times worse still (at least three pointers per item, if I'm not mistaken, for the hash varieties, more for trees). A million books array would take 4MB of memory just for their references, and depending on the attributes kept (title, ISBN, publisher, author, ...), some 100-200 MB in total. A tenth of a percent overhead of that bitset is therefore quite good, I'd say.
 
Luke Murphy
Ranch Hand
Posts: 300
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Lads,
I have an even quicker answer.

Say you have two lists:

list1 {1, 2, 3, 4, 5, 6, 7...}
list2 {1, 2, 3, 4, 5, 6...}

list1 is the full set of numbers.
list2 is the full set of numbers minus the number that has been removed.

if you "exclusive or", the set of all numbers in {list1 + list2} (i.e. both lists together} all pairs cancel each other out and you are left with the odd number out.

This is using bitwise operation and I think will be much faster.

That's a class little riddle.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic