Win a copy of Design for the Mind this week in the Design forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

What's the suitable Data structure

 
Usha Pnatha
Greenhorn
Posts: 27
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello Ranchers,

I am doing a small text file search engine and I have to make a good selection of data structures for this assignment.

Data structures are used for the purpose of storing words from text files & the particular word's location(i.e:path& name of the text file it contains).
So, there will be duplicate keys since the same word can repeat more than once. Also both the key&value will be strings.
I have to do searching also. But for that I have to implement the searching algorithm in my own.

So, I need to select suitable data structures to fulfill the above requirement.
Please note that I have to implement all those data structures using Java (i.e: I can't use the Collections in Java directly)

I am planning to use a HashTable mainly and for duplicate entries a LinkedList.

Can anyone please advice me on this? Is my selection correct or are there any more appropriate selections?

Also, if this topic is not suitable for this forum, please excuse me & direct me to the corrected forums.

Thanks in Advance,
Usha

 
Campbell Ritchie
Sheriff
Posts: 48652
56
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I think this is the correct place to ask that sort of question.

I think you are on the right lines, but be less specific and "program to the interface."If you wish to avoid duplicates, instead of using a List<URL>, try a Set<URL>.

I don't know why I am seeing >< because I only wrote < Sorry for that.
 
Campbell Ritchie
Sheriff
Posts: 48652
56
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you have to implement the data structures yourself, you might have to write your own Map and List and Set interfaces.
 
Steve Luke
Bartender
Posts: 4181
21
IntelliJ IDE Java Python
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Campbell Ritchie wrote:
I don't know why I am seeing >< because I only wrote < Sorry for that.


Not your fault, forum bug: See
 
Usha Pnatha
Greenhorn
Posts: 27
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you very much Campbell Ritchie .
Yes, I have to implement the data structures in my own.

What's the different between HashMap and HashTable? Is there any useful features in Map?
I read something about synchronization issues with hashtable.
Is it correct to implement the Hashtable and for duplicate entries, using a linked list as the value in the hashtable entry?


Thanks & Regards,
Usha
 
Parambir Singh
Ranch Hand
Posts: 40
Android Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Since you'll be implementing HashTable and HashMap on your own, you can consider them to be the same. Hashtable (along with Vector) were the datastructures available in Java originally before the Collections API was introduced. Hashtable is thread safe (i.e. more than one thread can access/update it without and problems of data corruption). However, this degrades its performace. HashMap is not threadsafe. Infact, none of java collection APIs are thread safe.

You can consider using a combination of HashMap and List to store the data. The HashMap's key would be the string that you need to search for and the List can contain all the places that string was found.
 
Campbell Ritchie
Sheriff
Posts: 48652
56
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you can't use the Collections framework (as Parambir Singh suggests), then you will have to forget all about Hashtable, HashMap, etc.

By the way: there was a third collection class before the Collections Framework (Stack). There are actually some thread-safe versions of the Collections framework classes, which you can (probably) read about here.
 
Parambir Singh
Ranch Hand
Posts: 40
Android Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Ritchie... I didn't know about these classes!
 
Campbell Ritchie
Sheriff
Posts: 48652
56
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You're welcome
 
Usha Pnatha
Greenhorn
Posts: 27
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you very much Campbell Ritchie and Parambir Singh.

Yes, I have to implement the data structures by my own. So, I can't use any predefined classes in Java.
The link from Campbell Ritchie was new to me thank you very much for sharing it.

So, I can implement a HashMap and List to solve the problem.

Thank you for clarifying about HashMap & HashTable. It was very useful.

Thanks & Regards,
Usha
 
Usha Pnatha
Greenhorn
Posts: 27
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello Ranchers,

With the idea I got from my previous posts, I managed to create a Hash table by my own and used a Linked list for duplicate entries.

Now, the problem I have is: In my hash table I used a hashing function which will results hashcodes form 0-16.
So, there will be 17 keys. A a value will be stored in the place of its key. The main purpose of my program is to search for values & to get the result, using a method which will take a lesser running time. So, when I search for a string, if I convert it, I will get a key & that key may contains several different strings stored in a liked list.
What is the best way to retrieve the particular word, with minimum running time?
Do I need to use another hash table to store values of a key(using a different hashing function)?
If so, normally how many level's I have to go like this?

The main requirement of my program is to retrieve the stored data with minimum running time.

Please help me in this regard as soon as possible.


Thank you,
Usha
 
Usha Pnatha
Greenhorn
Posts: 27
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
In addition to my earlier post:
There will be at most 10,000 unique keys.
So can any one tell me suitable hashing function which avoids collisions when using two levels of hashing?

Regards,
Usha
 
Campbell Ritchie
Sheriff
Posts: 48652
56
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Surely hash codes should be evenly distributed across the whole range of the int primitive data type?
You increase capacity by repeated doublings, 16->32->64->128->256 etc. Then you use

. . . hashcode & (capacity - 1) . . . //I think the () are redundant)

instead of the % operator to get the right-hand end of the hash code and use that as the index to insert the key into. Look in books like Effective Java and Thinking in Java. By the way: it used to be possible to find Thinking in Java 3rd edition free of charge on the internet.
 
Usha Pnatha
Greenhorn
Posts: 27
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you very much Campbell Ritchie.
I searched in the net and found those books. They are very useful not only for this program but to sharpen my knowledge.

But, I couldn't find a hashing method there. If you don't mind, can you please give me a code example? Since I've to handle upto 10,000 different keys, I think going up to
16->32->64 is enough to fulfill the requirement.


Another question: Mainly I will have any array of 16 keys; then each element of that array should contain another array of 32keys. How can I link the element of the first level array and second level array? Is it just a object reference like: nodes1[0]=new Object[32] ? or is there any other way?

Do you think, the liked list I implemented earlier would be useful in the program if I'm using multiple level of hashing to handle all possible no of keys? Can't I use only object arrays to implement the multilevel hashtable?

Please clarify me as early as possible.

Thank you.

Regards,
Usha
 
Campbell Ritchie
Sheriff
Posts: 48652
56
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Keeping it simple, let's restrict ourselves to a class whose fields are all <= 32 bits wide, or are reference types which already have well-formed hashCode methodsThere are other ways to do it and there are ways to get hash codes from longs, floats, etc, which you should find in both those books.

You can go 256->512->1024->2048->4096->8192->16384, so you can design Maps any size up to 2^30 (1073741824) before you run out of ints for your indices. Whenever you find size > capacity * loadFactor, you double the size of the backing MapPair[] array. Every time you double the size, you would have to redistribute all your existing pairs, using 1 more bit of the hash code, so it is quicker to start with capacity 16384.
 
Usha Pnatha
Greenhorn
Posts: 27
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you very much Campbell Ritchie.
The code will definitely help me to understand hashing and implement my program.

Could you please give me some answers to my other question of the earlier post?
Another question: Mainly I will have any array of 16 keys; then each element of that array should contain another array of 32keys. How can I link the element of the first level array and second level array? Is it just a object reference like: nodes1[0]=new Object[32] ? or is there any other way?

Do you think, the liked list I implemented earlier would be useful in the program if I'm using multiple level of hashing to handle all possible no of keys? Can't I use only object arrays to implement the multilevel hashtable?


Thank you very much.

Regards,
Usha
 
Campbell Ritchie
Sheriff
Posts: 48652
56
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You can have an array of arrays, by all means.

But you are probably better off simply using a single array, of MapPairs or something like that. You can make each MapPair into a LinkedList node as well. Have you seen how it is done in HashMap? It is worthwhile reading the source for it.

I presume you know there is a file called src.zip in your Java installation folder, and unzipping that will allow you to read the source code of all the well-known Java classes. HashMap is among them.
 
Rob Spoor
Sheriff
Pie
Posts: 20511
54
Chrome Eclipse IDE Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Campbell Ritchie wrote:Keeping it simple, let's restrict ourselves to a class whose fields are all <= 32 bits wide, or are reference types which already have well-formed hashCode methods

You can use the following list to calculate hash codes for any field f:
  • boolean: (f ? 1 : 0)
  • byte, char, short or int: f
  • long: (int)(f ^ (f >>> 32))
  • float: FloatToIntBits(f)
  • double: Double.toLongBits(f), then use the long way
  • object reference: (f == null ? 0 : f.hashCode())

  • Taken from "Effective Java" by Joshua Block.
     
    Usha Pnatha
    Greenhorn
    Posts: 27
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Thank you very much Campbell Ritchie and Rob Prime,

    @ Campbell Ritchie : Yes, I've refereed some of the classes in the src.zip folder. I'll go through the HashMap implementation and try to get the ideas. Thank you very much for answering all my questions immediately. You support it highly appreciated for beginners of Java like me. Thank you.

    @Rob Prime: Thank you for sharing that point on hashing. It's useful for me. Thank you.


    Thanks & Regards,
    Usha
     
    Campbell Ritchie
    Sheriff
    Posts: 48652
    56
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    You're welcome
    And Rob has saved you the trouble of looking up Bloch's book; he has told you about all the other types > 32 bits.
     
    • Post Reply
    • Bookmark Topic Watch Topic
    • New Topic