
HashMap interview question

 
Ranch Hand
Posts: 42
Hi all,
In a recent interview I was given a scenario in which a database contains millions of records, including duplicate records. If a Java program reads these records, how would I ensure that the data I am accessing contains no duplicates?

They hinted that Maps could be used, or maybe the hashCode() method!

The question is: how?

Thanks

 
Ranch Hand
Posts: 692

Guy Emerson wrote: Hi all,
In a recent interview I was given a scenario in which a database contains millions of records, including duplicate records. If a Java program reads these records, how would I ensure that the data I am accessing contains no duplicates?

They hinted that Maps could be used, or maybe the hashCode() method!

The question is: how?

Thanks


A HashMap stores unique keys, but its values may repeat.
So if empId is unique, use it as the key; that way the HashMap will never hold two entries with the same key.
For example, after
hashMap.put(1, "naved");
hashMap.put(1, "sam");
the map still has only one entry for key 1 (the second put replaces "naved" with "sam"), but this is allowed:
hashMap.put(2, "naved");

Alternatively, you can compare the hash codes of two objects: if the objects are equal, their hash codes will be the same (although equal hash codes alone don't prove the objects are equal).
However, I'm not very sure about the hashCode approach, but a HashMap will be the solution to this problem.
Let's see what others say about hashCode...
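A small sketch of the put() behaviour described above (the class and method names are made up for illustration). Note that the second put() for key 1 isn't rejected; it silently replaces the old value and returns it, so the map still ends up with one entry per key:

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateKeyDemo {
    // Builds the map from the example above: key 1 is put twice, key 2 once.
    static Map<Integer, String> buildMap() {
        Map<Integer, String> employees = new HashMap<>();
        employees.put(1, "naved");
        // The second put() for key 1 does not fail; it replaces "naved"
        // and returns the value it replaced.
        String previous = employees.put(1, "sam");
        System.out.println("replaced: " + previous); // replaced: naved
        employees.put(2, "naved"); // duplicate values under different keys are fine
        return employees;
    }

    public static void main(String[] args) {
        Map<Integer, String> employees = buildMap();
        System.out.println(employees.get(1)); // sam
        System.out.println(employees.size()); // 2
    }
}
```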
 
Sheriff
Posts: 3837
I don't know what the interviewer wanted to hear, but the correct way would be to obtain the data without duplicates directly from the database, using a proper SQL construct (i.e. SELECT DISTINCT or GROUP BY). Databases were designed to handle tasks like these.

Now, if I were forced to do this in Java, putting the records into a HashSet would be the natural way of doing it. Yes, hashCode() plays a role in this. If you're unsure why or how, I'd recommend reading any Java Collections Framework tutorial. You should definitely do so before your next interview.
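A minimal sketch of the HashSet approach, assuming each database row is mapped to a Java object whose equals()/hashCode() cover the columns that define a duplicate. The Employee record is hypothetical; records (Java 16+) generate both methods from their components. Note that this holds every distinct row in memory:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetDedup {
    // Hypothetical row type; a record gets equals() and hashCode()
    // generated from all of its components.
    record Employee(int id, String name) {}

    static Set<Employee> distinct(List<Employee> rows) {
        // HashSet uses hashCode() to pick a bucket and equals() to check
        // whether an equal element is already present; duplicates are dropped.
        return new HashSet<>(rows);
    }

    public static void main(String[] args) {
        List<Employee> rows = List.of(
                new Employee(1, "naved"),
                new Employee(2, "sam"),
                new Employee(1, "naved")); // duplicate row
        System.out.println(distinct(rows).size()); // 2
    }
}
```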

If the question was actually to detect whether there are duplicates in the database, even this could (and should) be handled in pure SQL, e.g. select key_columns from my_table group by key_columns having count(*) > 1, though, depending on the database and other requirements, better approaches might exist. A pure Java solution in this case might actually employ a Map. Was this the gist of the question?
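For the duplicate-detection variant, a pure Java sketch could count occurrences per key in a Map, mirroring the GROUP BY ... HAVING COUNT(*) > 1 query. The keys are plain strings here for brevity (an assumption); in practice they would be an object built from the key columns:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {
    // Returns each key that occurs more than once, with its count:
    // the Java counterpart of GROUP BY key_columns HAVING COUNT(*) > 1.
    static <K> Map<K, Integer> duplicates(List<K> keys) {
        Map<K, Integer> counts = new HashMap<>();
        for (K key : keys) {
            counts.merge(key, 1, Integer::sum); // add 1 to this key's count
        }
        counts.values().removeIf(c -> c < 2); // keep only keys seen 2+ times
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("A-1", "B-2", "A-1", "C-3", "A-1");
        System.out.println(duplicates(keys)); // {A-1=3}
    }
}
```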

Anyway, it seems to me that either the question was more complicated than that, or your interviewer knew neither Java nor databases.
 
Guy Emerson
Ranch Hand
Posts: 42
Thanks, Naved and Martin, for your replies.

Guy Emerson
 
Greenhorn
Posts: 4
A million records in memory would take far too much heap, so it isn't practical in a real scenario. Duplicates have to be eliminated while fetching from the database.

To answer the interview question: have a well-defined key, and put each row into the map as an object under that key. Duplicates will be eliminated, since a HashMap cannot hold multiple entries with the same key. The older value will be overwritten (in your case the older and newer values should be the same).
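A sketch of this keyed approach, assuming a hypothetical Row record whose id is the well-defined key. Collectors.toMap needs an explicit merge function here: without one it throws IllegalStateException on a duplicate key, so the lambda below spells out the "newer value overwrites older" rule:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KeyedRows {
    // Hypothetical row type with a well-defined key (id).
    record Row(int id, String name) {}

    static Map<Integer, Row> byKey(List<Row> rows) {
        return rows.stream().collect(Collectors.toMap(
                Row::id,                 // key: the well-defined key column
                row -> row,              // value: the whole row as an object
                (older, newer) -> newer, // same key seen again: newer overwrites older
                LinkedHashMap::new));    // keep first-seen key order
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, "naved"), new Row(2, "sam"), new Row(1, "naved"));
        Map<Integer, Row> unique = byKey(rows);
        System.out.println(unique.size());        // 2
        System.out.println(unique.get(1).name()); // naved
    }
}
```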
 
Ranch Hand
Posts: 112
I have one question here. A HashMap maps each key to a value, and there are two cases to consider in this problem.

1. Database schema having a primary key:

In this case, we can put the table's primary key in the KEY position of the map. To be honest, though, there won't be any need for the map's KEY here, because a table with a primary key ensures that no duplicate records are stored in it. So this case is handled automatically.

2. Database schema not having a primary key:

In this case, since the table has no primary key, what should be the KEY for the map? Surely this is something that can't be determined.

So the appropriate way to do this is at the query level.

[Experts, please share your views on this one.]


Thanks,
Pavan.
 
Martin Vashko
Sheriff
Posts: 3837

Pavan Kumar Dittakavi wrote:
2. Database schema not having a primary key:

In this case, since the table has no primary key, what should be the KEY for the map? Surely this is something that can't be determined.


Of course it can be determined. When constructing a query to detect or remove duplicates, you need to know which columns uniquely identify each row object, so that you can GROUP BY them or select DISTINCT on them. These columns would then be used as the map's key. You'd need to create a class for them and define its hashCode() and equals() methods properly, of course, but that is true of any multi-column key.
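A sketch of such a key class, with hypothetical columns (first name, last name, date of birth) standing in for whatever columns identify the objects in your table. The essential rule is that hashCode() uses the same fields as equals(); on Java 16+ a record would generate both for you:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CompositeKeyDemo {
    // Hypothetical multi-column key built from the columns that identify
    // an "object" in the table.
    static final class PersonKey {
        private final String firstName;
        private final String lastName;
        private final String birthDate;

        PersonKey(String firstName, String lastName, String birthDate) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthDate = birthDate;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof PersonKey)) return false;
            PersonKey other = (PersonKey) o;
            // Objects.equals tolerates NULL column values
            return Objects.equals(firstName, other.firstName)
                    && Objects.equals(lastName, other.lastName)
                    && Objects.equals(birthDate, other.birthDate);
        }

        @Override
        public int hashCode() {
            // Same fields as equals(), so equal keys always hash alike.
            return Objects.hash(firstName, lastName, birthDate);
        }
    }

    public static void main(String[] args) {
        Map<PersonKey, String> rows = new HashMap<>();
        rows.put(new PersonKey("Ann", "Lee", "1990-01-01"), "row 1");
        rows.put(new PersonKey("Ann", "Lee", "1990-01-01"), "row 2"); // same key
        System.out.println(rows.size()); // 1
    }
}
```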


So the appropriate way to do this is at the query level.


I agree, but that doesn't mean it can't be done in Java.

Edited for clarity: the key columns do not uniquely identify rows; if they did, no duplicate rows could occur. Since there can be duplicate rows, the key columns must be chosen so that they identify the "objects" that are stored in the table.
 
Marshal
Posts: 79177
Welcome to the Ranch, Alok Aparanji
 
Alok Aparanji
Greenhorn
Posts: 4

Pavan Kumar Dittakavi wrote: I have one question here. A HashMap maps each key to a value, and there are two cases to consider in this problem.

1. Database schema having a primary key:

In this case, we can put the table's primary key in the KEY position of the map. To be honest, though, there won't be any need for the map's KEY here, because a table with a primary key ensures that no duplicate records are stored in it. So this case is handled automatically.

2. Database schema not having a primary key:

In this case, since the table has no primary key, what should be the KEY for the map? Surely this is something that can't be determined.

So the appropriate way to do this is at the query level.

[Experts, please share your views on this one.]


Thanks,
Pavan.



+1 to your first point.

As for the second point, there can be several reasons why a table might not have a primary/unique key:
1) There is a column which is actually unique but is not declared as the primary key.
2) The DB might be denormalized for performance reasons.
There could be more reasons, I'm sure. But if you have domain knowledge about the table, you can be fairly sure whether a column or a combination of columns could work as a unique key. It sounds like the interviewer's question was just meant to test the candidate's knowledge of collections, not a real-world scenario.

Campbell Ritchie wrote: Welcome to the Ranch, Alok Aparanji


Thanks, Campbell. I hope to learn lots of stuff here.
 
Bartender
Posts: 4568

Alok Aparanji wrote:
As for the second point, there can be several reasons why a table might not have a primary/unique key:
1) There is a column which is actually unique but is not declared as the primary key.
2) The DB might be denormalized for performance reasons.


3) The database designer doesn't know what they're doing. While 1 and 2 may be valid, in my experience 3 is more common.
 
Don't get me started about those stupid light bulbs.