Suppose, I have a fixed of collection of library books. Someone can remove a book from the list.

I wish to be able to figure out which book(s) were removed quickly.

So very simply, you could have two collections. One for your total set of books and one for your current set of books.
To figure what was gone, you could clone the total set and then invoke removeAll() for the current set.

Anyone think of a more per formant way of doing it and why?

Take the sum of the book numbers from the complete list. Then take the sum of the book numbers from the potentially incomplete list. If the two numbers are the same then nothing is missing. If they aren't, then the difference is the book number of the missing book.

Paul,
That wouldn't work if two books were removed.

Luke,
I like Fred's solution. For yours, why not keep a list of the removed books too? Then you can get the info instantly? You'd have to have a wrapper class to keep the lists in sync. But you need to do that for two lists as much as you do for three.

Jeanne Boyarsky wrote:Paul,
That wouldn't work if two books were removed.

Luke,
I like Fred's solution. For yours, why not keep a list of the removed books too? Then you can get the info instantly? You'd have to have a wrapper class to keep the lists in sync. But you need to do that for two lists as much as you do for three.

Yeah I thought about the sum one on the train home.

Give every book a unique number.
Then the difference between the sum before and sum after is the missing book. It just fails when you have more than 2 books missing. Unless you do something fancy with prime numbers.

It depends on how long it takes to sum. I presume very fast. As opposed to doing equals equals equals :-)

I too think that much more practical is to maintain a list of removed books. But to elaborate on the numbers trick: assign each book a number starting from 0 (or 1, that does not matter) and represent a collection of books using BitSet. Operations on BitSets are done using bitwise instructions, so comparing two BitSets is pretty fast, as is iterating through it.

And unless you want to solve knapsack problems, which is pretty hard sort of mathematical puzzles, this is probably as efficient way to store such information (both performance- and space- wise) as any other solution based on sum of numbers.

Martin Vajsar wrote:represent a collection of books using BitSet. Operations on BitSets are done using bitwise instructions, so comparing two BitSets is pretty fast, as is iterating through it.

Fairly fast, but a bitset is big, as you only get 32 values per long, or 8 per byte. Depending on the application, you might have a lot of books. There are about a million "books in print" today. Using 128K of memory may not be a good use of space.

A better answer is that you can't define a good solution without knowing the usage characteristics of the collection. If it represented the books in a public library, there would be a lot of books added and deleted over time. Or consider books in my house: they come in but rarely leave. So a good design would focus on being fast and compact for the 99% static books, and not care very much about tiny number deleted.

Pat Farrell wrote:Fairly fast, but a bitset is big, as you only get 32 values per long, or 8 per byte. Depending on the application, you might have a lot of books. There are about a million "books in print" today. Using 128K of memory may not be a good use of space.

First and foremost, my reply was reaction to the immediately preceding post (Luke Murphy, February 23, 2011 20:19:14) and should be taken strictly in that context, having the original question in mind.

As you have mentioned too, the requirements are hazy. However, no external storage was discussed, so I assumed that list of all books will be kept in memory. As Luke dismissed the idea to keep removed books in separate list (which I would strongly prefer), one more bit per book already in memory is pretty thrifty.

Moreover, bitsets are one of most compact structures. 128KB of memory is not much by today's standard. And it would be just overhead of keeping 32K items in an array list (32K pointers, 4 bytes each). Set or Map would be several times worse still (at least three pointers per item, if I'm not mistaken, for the hash varieties, more for trees). A million books array would take 4MB of memory just for their references, and depending on the attributes kept (title, ISBN, publisher, author, ...), some 100-200 MB in total. A tenth of a percent overhead of that bitset is therefore quite good, I'd say.

list1 is the full set of numbers.
list2 is the full set of numbers minus the number that has been removed.

if you "exclusive or", the set of all numbers in {list1 + list2} (i.e. both lists together} all pairs cancel each other out and you are left with the odd number out.

This is using bitwise operation and I think will be much faster.