HashMap interview Question.

Guy Emerson
Greenhorn

Joined: Dec 14, 2010
Posts: 27
Hi all,
In a recent interview I was given this scenario: a database contains millions of records, including duplicate records. If a Java program reads these records, how would I ensure that the data I am working with contains no duplicates?

They hinted that Maps could be used, or maybe the hashCode() method.

Now the question is: how?

Thanks

naved momin
Ranch Hand

Joined: Jul 03, 2011
Posts: 688

Guy Emerson wrote:Hi all,
In a recent interview I was given this scenario: a database contains millions of records, including duplicate records. If a Java program reads these records, how would I ensure that the data I am working with contains no duplicates?

They hinted that Maps could be used, or maybe the hashCode() method.

Now the question is: how?

Thanks


A HashMap stores unique keys but allows duplicate values.
So if empid is unique, use it as the key; that way your HashMap will never hold two entries with the same key.
For example, a HashMap cannot keep both of these entries, because the second put replaces the value stored by the first:
hashmap.put(1,"naved");
hashmap.put(1,"sam");
But this is possible:
hashmap.put(2,"naved");

Alternatively, you can compare the hash codes of two objects: if the objects are equal, their hash codes will be the same (although equal hash codes alone do not prove that the objects are equal).
However, I'm not very sure about the hashCode part, but a HashMap will be the solution for this problem.
Let's see what others say about hashCode...
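To make the behaviour above concrete, here is a minimal sketch. The class name PutOverwriteDemo and its methods are made up for illustration; only HashMap.put and String.hashCode come from the JDK. It shows that a second put with the same key replaces the earlier value, and that two different strings can share a hash code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the ids and names are just the sample data from the post.
class PutOverwriteDemo {

    static Map<Integer, String> build() {
        Map<Integer, String> hashmap = new HashMap<>();
        hashmap.put(1, "naved");
        // Same key again: put() replaces the value and returns the previous one.
        String previous = hashmap.put(1, "sam");
        System.out.println("replaced value: " + previous);
        // Different key, same value as before: allowed.
        hashmap.put(2, "naved");
        return hashmap;
    }

    // "Aa" and "BB" are different strings that happen to share a hash code,
    // so matching hash codes are only a hint, never proof of equality.
    static boolean sameHashButNotEqual() {
        return "Aa".hashCode() == "BB".hashCode() && !"Aa".equals("BB");
    }

    public static void main(String[] args) {
        Map<Integer, String> hashmap = build();
        System.out.println(hashmap.size() + " entries, key 1 -> " + hashmap.get(1));
        System.out.println("same hash, not equal: " + sameHashButNotEqual());
    }
}
```

This is why a HashMap checks equals() as well as hashCode() before it treats two keys as the same.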


The Only way to learn is ...........do!
Visit my blog http://inaved-momin.blogspot.com/
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3606
    

I don't know what the interviewer wanted to hear, but the correct way would be to obtain the data without duplicates right from the database, using the proper SQL construct (i.e. select distinct or group by). Databases were designed to handle tasks like these.

Now if I were forced to do this in Java, putting the records into a HashSet would be the natural way of doing it. Yes, hashCode() plays a role in this. If you're unsure why or how, I'd recommend reading any Java Collections Framework tutorial. You should definitely do it before your next interview.
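As a rough sketch of the HashSet approach, assuming a made-up row type EmployeeRow with two columns (empId and name, neither of which comes from the original question): the set only recognises duplicates if the row class overrides both equals() and hashCode().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical row type; the columns are assumptions made for this example.
final class EmployeeRow {
    final int empId;
    final String name;

    EmployeeRow(int empId, String name) {
        this.empId = empId;
        this.name = name;
    }

    // Two rows are duplicates when every column matches.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EmployeeRow)) return false;
        EmployeeRow other = (EmployeeRow) o;
        return empId == other.empId && Objects.equals(name, other.name);
    }

    // Must agree with equals(): equal rows produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(empId, name);
    }
}

class HashSetDedup {
    // Copying into a HashSet drops rows that are equal to one already present.
    static Set<EmployeeRow> distinct(List<EmployeeRow> rows) {
        return new HashSet<>(rows);
    }

    public static void main(String[] args) {
        List<EmployeeRow> rows = Arrays.asList(
                new EmployeeRow(1, "naved"),
                new EmployeeRow(1, "naved"),
                new EmployeeRow(2, "sam"));
        System.out.println(distinct(rows).size() + " distinct rows");
    }
}
```

Without the two overrides, the set would fall back to object identity and keep all three rows.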

If the question actually was to detect whether there are duplicates in the database, even this could (and should) be handled by pure SQL, e.g. select key_columns from my_table group by key_columns having count(*) > 1, though, depending on the database and other requirements, better approaches might exist. A pure Java solution in this case might actually employ a Map. Was this the gist of the question?
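One possible shape for the Map-based detection, offered as a sketch rather than as what the interviewer had in mind: count how often each key value occurs and report the keys seen more than once, which mirrors the group by ... having count(*) > 1 query. The class name DuplicateKeyFinder and the use of plain strings as keys are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; keys are plain strings here to keep the example short.
class DuplicateKeyFinder {

    static Set<String> duplicatedKeys(List<String> keys) {
        // Count occurrences per key, like "group by key_columns".
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        // Keep only keys that occur more than once, like "having count(*) > 1".
        Set<String> duplicated = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicated.add(entry.getKey());
            }
        }
        return duplicated;
    }

    public static void main(String[] args) {
        System.out.println(duplicatedKeys(Arrays.asList("a", "b", "a", "c", "b", "a")));
    }
}
```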

Anyway, it seems to me that either the question was more twisted, or your interviewer knew neither Java nor databases.
Guy Emerson
Greenhorn

Joined: Dec 14, 2010
Posts: 27
Thanks Naved and Martin for your replies.

Guy Emerson
Alok Aparanji
Greenhorn

Joined: Jan 08, 2012
Posts: 4
Holding millions of records in memory takes far too much heap, which is rarely workable in practice. Duplicates have to be eliminated while fetching from the database.

To answer the interview question: have a well-defined key and put each row into the map as an object under that key. Duplicates will be eliminated, since a HashMap cannot have multiple entries with the same key. The older value will be overwritten (the older and newer values should be the same in your case).
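A small sketch of that idea, assuming a made-up row class KeyedRow whose empId column serves as the well-defined key (both names are hypothetical): each row is put into the map under its key, so a later row with the same key simply replaces the earlier one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical row type; empId is assumed to be the well-defined key.
final class KeyedRow {
    final int empId;
    final String name;

    KeyedRow(int empId, String name) {
        this.empId = empId;
        this.name = name;
    }
}

class MapDedup {
    // Rows are consumed one at a time; the map keeps one row per key.
    static Map<Integer, KeyedRow> uniqueByKey(Iterable<KeyedRow> rows) {
        Map<Integer, KeyedRow> unique = new HashMap<>();
        for (KeyedRow row : rows) {
            // A repeated key overwrites the older row with the newer one.
            unique.put(row.empId, row);
        }
        return unique;
    }

    public static void main(String[] args) {
        Map<Integer, KeyedRow> unique = uniqueByKey(Arrays.asList(
                new KeyedRow(1, "naved"),
                new KeyedRow(2, "sam"),
                new KeyedRow(1, "naved")));
        System.out.println(unique.size() + " unique keys");
    }
}
```

Note that the map still holds one row per distinct key, so the heap concern remains when most rows are unique.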
Pavan Kumar Dittakavi
Ranch Hand

Joined: Feb 12, 2011
Posts: 104

I have one question here. With a hashmap, you can have a key that points to the value. And we can have two cases in this problem.

1. Database schema having a primary key:

In this case, we can capture the primary key of the database in the KEY position of the map. To be honest, though, there won't be a need for the KEY, as a db with a primary key ensures that no duplicate records are stored in it. So this case is handled automatically.

2. Database schema not having a primary key:

In this case, since the db does not have a primary key, what should be the KEY for the map? Surely this is something that can't be determined.

So, the appropriate way for doing this should be at the query level.

[ Experts please drop your views on this one ].


Thanks,
Pavan.
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3606
    

Pavan Kumar Dittakavi wrote:
2. Database schema not having a primary key:

In this case, since the db does not have a primary key, what should be the KEY for the map? Surely this is something that can't be determined.

Of course that can be determined. When constructing a query to detect or remove duplicates, you need to know which columns uniquely identify each row object, so that you can group by them or select distinct on them. These columns would then be used as the map's key. You'd need to create a class for them and define the hashCode() and equals() methods properly, of course, but that is true for any multi-column key.

So, the appropriate way for doing this should be at the query level.

I agree, but it does not mean it is not doable from Java.

Edited for clarity: the key columns do not uniquely identify rows, because if they did, no duplicate rows could occur. Since there can be duplicate rows, the key columns must be chosen so that they identify the "objects" that are stored in the table.
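The multi-column key described in this post could look roughly like the following. The key columns (name and email), the CustomerKey class and the String[] row layout are all assumptions chosen for illustration. Rows that agree on the key columns land in the same map entry, even if another column differs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key built from two assumed key columns.
final class CustomerKey {
    private final String name;
    private final String email;

    CustomerKey(String name, String email) {
        this.name = name;
        this.email = email;
    }

    // Keys are equal when all key columns are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomerKey)) return false;
        CustomerKey other = (CustomerKey) o;
        return Objects.equals(name, other.name) && Objects.equals(email, other.email);
    }

    // Built from the same columns as equals(), as the contract requires.
    @Override
    public int hashCode() {
        return Objects.hash(name, email);
    }
}

class CompositeKeyDemo {
    // Each row is {name, email, note}; only the first two columns form the key.
    static Map<CustomerKey, List<String[]>> groupByKey(List<String[]> rows) {
        Map<CustomerKey, List<String[]>> groups = new HashMap<>();
        for (String[] row : rows) {
            CustomerKey key = new CustomerKey(row[0], row[1]);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Ann", "ann@example.com", "first import"},
                new String[] {"Ann", "ann@example.com", "second import"},
                new String[] {"Bob", "bob@example.com", "first import"});
        System.out.println(groupByKey(rows).size() + " distinct objects");
    }
}
```

Any entry whose list holds more than one row marks a duplicated "object" in the sense of the edit note above.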
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38033
    
Welcome to the Ranch, Alok Aparanji
Alok Aparanji
Greenhorn

Joined: Jan 08, 2012
Posts: 4
Pavan Kumar Dittakavi wrote:I have one question here. With a hashmap, you can have a key that points to the value. And we can have two cases in this problem.

1. Database schema having a primary key:

In this case, we can capture the primary key of the database in the KEY position of the map. To be honest, though, there won't be a need for the KEY, as a db with a primary key ensures that no duplicate records are stored in it. So this case is handled automatically.

2. Database schema not having a primary key:

In this case, since the db does not have a primary key, what should be the KEY for the map? Surely this is something that can't be determined.

So, the appropriate way for doing this should be at the query level.

[ Experts please drop your views on this one ].


Thanks,
Pavan.


+1 to your first point.

For the second point, there can be several reasons why the DB might not have a primary / unique key
1) There is a column which is actually unique but is not declared to be the primary key.
2) The DB might be de-normalized for performance reasons.
There could be more reasons, I'm sure. But if you have domain knowledge about the table, you can be pretty sure whether a column or a combination of columns could work like a unique key. It sounds like the interviewer's question was meant to test the candidate's collections knowledge rather than to describe a real-world scenario.

Campbell Ritchie wrote:Welcome to the Ranch Alok Aparanji

Thanks Campbell. Hope to learn lots of stuff here.
Matthew Brown
Bartender

Joined: Apr 06, 2010
Posts: 4343
    

Alok Aparanji wrote:
For the second point, there can be several reasons why the DB might not have a primary / unique key
1) There is a column which is actually unique but is not declared to be the primary key.
2) The DB might be de-normalized for performance reasons.

3) The database designer doesn't know what they're doing. While 1 and 2 may be valid, in my experience 3 is more common.
 