wood burning stoves 2.0*
The moose likes Performance and the fly likes Removing from a collection Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Performance
Bookmark "Removing from a collection" Watch "Removing from a collection" New topic
Author

Removing from a collection

Luke Murphy
Ranch Hand

Joined: May 12, 2010
Posts: 300
Suppose, I have a fixed of collection of library books. Someone can remove a book from the list.

I wish to be able to figure out which book(s) were removed quickly.

So very simply, you could have two collections. One for your total set of books and one for your current set of books.
To figure what was gone, you could clone the total set and then invoke removeAll() for the current set.

Anyone think of a more per formant way of doing it and why?

Thanks.

fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11479
    
  16

Can you simply have a flag on each item that marks it as 'present' or 'removed'? I see difficulty with trying to maintain two lists


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18907
    
    8

Take the sum of the book numbers from the complete list. Then take the sum of the book numbers from the potentially incomplete list. If the two numbers are the same then nothing is missing. If they aren't, then the difference is the book number of the missing book.
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30938
    
158

Paul,
That wouldn't work if two books were removed.

Luke,
I like Fred's solution. For yours, why not keep a list of the removed books too? Then you can get the info instantly? You'd have to have a wrapper class to keep the lists in sync. But you need to do that for two lists as much as you do for three.


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
Luke Murphy
Ranch Hand

Joined: May 12, 2010
Posts: 300
Jeanne Boyarsky wrote:Paul,
That wouldn't work if two books were removed.

Luke,
I like Fred's solution. For yours, why not keep a list of the removed books too? Then you can get the info instantly? You'd have to have a wrapper class to keep the lists in sync. But you need to do that for two lists as much as you do for three.


Yeah I thought about the sum one on the train home.

Give every book a unique number.
Then the difference between the sum before and sum after is the missing book. It just fails when you have more than 2 books missing. Unless you do something fancy with prime numbers.

It depends on how long it takes to sum. I presume very fast. As opposed to doing equals equals equals :-)
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3611
    
  60

I too think that much more practical is to maintain a list of removed books. But to elaborate on the numbers trick: assign each book a number starting from 0 (or 1, that does not matter) and represent a collection of books using BitSet. Operations on BitSets are done using bitwise instructions, so comparing two BitSets is pretty fast, as is iterating through it.

And unless you want to solve knapsack problems, which is pretty hard sort of mathematical puzzles, this is probably as efficient way to store such information (both performance- and space- wise) as any other solution based on sum of numbers.
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4659
    
    5

Martin Vajsar wrote:represent a collection of books using BitSet. Operations on BitSets are done using bitwise instructions, so comparing two BitSets is pretty fast, as is iterating through it.

Fairly fast, but a bitset is big, as you only get 32 values per long, or 8 per byte. Depending on the application, you might have a lot of books. There are about a million "books in print" today. Using 128K of memory may not be a good use of space.

A better answer is that you can't define a good solution without knowing the usage characteristics of the collection. If it represented the books in a public library, there would be a lot of books added and deleted over time. Or consider books in my house: they come in but rarely leave. So a good design would focus on being fast and compact for the 99% static books, and not care very much about tiny number deleted.
Lalit Mehra
Ranch Hand

Joined: Jun 08, 2010
Posts: 384

why not use two static variables ...

1 that contains the total number of books
2 that contains the current number of books in the library



http://plainoldjavaobject.blogspot.in
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11479
    
  16

Lalit Mehra wrote:why not use two static variables ...

1 that contains the total number of books
2 that contains the current number of books in the library



The OP asked to figure out WHICH books...not just how many.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3611
    
  60

Pat Farrell wrote:Fairly fast, but a bitset is big, as you only get 32 values per long, or 8 per byte. Depending on the application, you might have a lot of books. There are about a million "books in print" today. Using 128K of memory may not be a good use of space.

First and foremost, my reply was reaction to the immediately preceding post (Luke Murphy, February 23, 2011 20:19:14) and should be taken strictly in that context, having the original question in mind.

As you have mentioned too, the requirements are hazy. However, no external storage was discussed, so I assumed that list of all books will be kept in memory. As Luke dismissed the idea to keep removed books in separate list (which I would strongly prefer), one more bit per book already in memory is pretty thrifty.

Moreover, bitsets are one of most compact structures. 128KB of memory is not much by today's standard. And it would be just overhead of keeping 32K items in an array list (32K pointers, 4 bytes each). Set or Map would be several times worse still (at least three pointers per item, if I'm not mistaken, for the hash varieties, more for trees). A million books array would take 4MB of memory just for their references, and depending on the attributes kept (title, ISBN, publisher, author, ...), some 100-200 MB in total. A tenth of a percent overhead of that bitset is therefore quite good, I'd say.
Luke Murphy
Ranch Hand

Joined: May 12, 2010
Posts: 300
Lads,
I have an even quicker answer.

Say you have two lists:

list1 {1, 2, 3, 4, 5, 6, 7...}
list2 {1, 2, 3, 4, 5, 6...}

list1 is the full set of numbers.
list2 is the full set of numbers minus the number that has been removed.

if you "exclusive or", the set of all numbers in {list1 + list2} (i.e. both lists together} all pairs cancel each other out and you are left with the odd number out.

This is using bitwise operation and I think will be much faster.

That's a class little riddle.
 
 
subject: Removing from a collection