aspose file tools*
The moose likes Java in General and the fly likes What's the suitable Data structure Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "What Watch "What New topic
Author

What's the suitable Data structure

Usha Pnatha
Greenhorn

Joined: Aug 01, 2008
Posts: 27
Hello Ranchers,

I am doing a small text file search engine and I have to make a good selection of data structures for this assignment.

Data structures are used for the purpose of storing words from text files & the particular word's location(i.e:path& name of the text file it contains).
So, there will be duplicate keys since the same word can repeat more than once. Also both the key&value will be strings.
I have to do searching also. But for that I have to implement the searching algorithm in my own.

So, I need to select suitable data structures to fulfill the above requirement.
Please note that I have to implement all those data structures using Java (i.e: I can't use the Collections in Java directly)

I am planning to use a HashTable mainly and for duplicate entries a LinkedList.

Can anyone please advice me on this? Is my selection correct or are there any more appropriate selections?

Also, if this topic is not suitable for this forum, please excuse me & direct me to the corrected forums.

Thanks in Advance,
Usha

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39810
    
  28
I think this is the correct place to ask that sort of question.

I think you are on the right lines, but be less specific and "program to the interface."If you wish to avoid duplicates, instead of using a List<URL>, try a Set<URL>.

I don't know why I am seeing >< because I only wrote < Sorry for that.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39810
    
  28
If you have to implement the data structures yourself, you might have to write your own Map and List and Set interfaces.
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Campbell Ritchie wrote:
I don't know why I am seeing >< because I only wrote < Sorry for that.


Not your fault, forum bug: See


Steve
Usha Pnatha
Greenhorn

Joined: Aug 01, 2008
Posts: 27
Thank you very much Campbell Ritchie .
Yes, I have to implement the data structures in my own.

What's the different between HashMap and HashTable? Is there any useful features in Map?
I read something about synchronization issues with hashtable.
Is it correct to implement the Hashtable and for duplicate entries, using a linked list as the value in the hashtable entry?


Thanks & Regards,
Usha
Parambir Singh
Ranch Hand

Joined: Sep 05, 2004
Posts: 40

Since you'll be implementing HashTable and HashMap on your own, you can consider them to be the same. Hashtable (along with Vector) were the datastructures available in Java originally before the Collections API was introduced. Hashtable is thread safe (i.e. more than one thread can access/update it without and problems of data corruption). However, this degrades its performace. HashMap is not threadsafe. Infact, none of java collection APIs are thread safe.

You can consider using a combination of HashMap and List to store the data. The HashMap's key would be the string that you need to search for and the List can contain all the places that string was found.


SCJP, SCMAD
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39810
    
  28
If you can't use the Collections framework (as Parambir Singh suggests), then you will have to forget all about Hashtable, HashMap, etc.

By the way: there was a third collection class before the Collections Framework (Stack). There are actually some thread-safe versions of the Collections framework classes, which you can (probably) read about here.
Parambir Singh
Ranch Hand

Joined: Sep 05, 2004
Posts: 40

Thanks Ritchie... I didn't know about these classes!
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39810
    
  28
You're welcome
Usha Pnatha
Greenhorn

Joined: Aug 01, 2008
Posts: 27
Thank you very much Campbell Ritchie and Parambir Singh.

Yes, I have to implement the data structures by my own. So, I can't use any predefined classes in Java.
The link from Campbell Ritchie was new to me thank you very much for sharing it.

So, I can implement a HashMap and List to solve the problem.

Thank you for clarifying about HashMap & HashTable. It was very useful.

Thanks & Regards,
Usha
Usha Pnatha
Greenhorn

Joined: Aug 01, 2008
Posts: 27
Hello Ranchers,

With the idea I got from my previous posts, I managed to create a Hash table by my own and used a Linked list for duplicate entries.

Now, the problem I have is: In my hash table I used a hashing function which will results hashcodes form 0-16.
So, there will be 17 keys. A a value will be stored in the place of its key. The main purpose of my program is to search for values & to get the result, using a method which will take a lesser running time. So, when I search for a string, if I convert it, I will get a key & that key may contains several different strings stored in a liked list.
What is the best way to retrieve the particular word, with minimum running time?
Do I need to use another hash table to store values of a key(using a different hashing function)?
If so, normally how many level's I have to go like this?

The main requirement of my program is to retrieve the stored data with minimum running time.

Please help me in this regard as soon as possible.


Thank you,
Usha
Usha Pnatha
Greenhorn

Joined: Aug 01, 2008
Posts: 27
In addition to my earlier post:
There will be at most 10,000 unique keys.
So can any one tell me suitable hashing function which avoids collisions when using two levels of hashing?

Regards,
Usha
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39810
    
  28
Surely hash codes should be evenly distributed across the whole range of the int primitive data type?
You increase capacity by repeated doublings, 16->32->64->128->256 etc. Then you use

. . . hashcode & (capacity - 1) . . . //I think the () are redundant)

instead of the % operator to get the right-hand end of the hash code and use that as the index to insert the key into. Look in books like Effective Java and Thinking in Java. By the way: it used to be possible to find Thinking in Java 3rd edition free of charge on the internet.
Usha Pnatha
Greenhorn

Joined: Aug 01, 2008
Posts: 27
Thank you very much Campbell Ritchie.
I searched in the net and found those books. They are very useful not only for this program but to sharpen my knowledge.

But, I couldn't find a hashing method there. If you don't mind, can you please give me a code example? Since I've to handle upto 10,000 different keys, I think going up to
16->32->64 is enough to fulfill the requirement.


Another question: Mainly I will have any array of 16 keys; then each element of that array should contain another array of 32keys. How can I link the element of the first level array and second level array? Is it just a object reference like: nodes1[0]=new Object[32] ? or is there any other way?

Do you think, the liked list I implemented earlier would be useful in the program if I'm using multiple level of hashing to handle all possible no of keys? Can't I use only object arrays to implement the multilevel hashtable?

Please clarify me as early as possible.

Thank you.

Regards,
Usha
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39810
    
  28
Keeping it simple, let's restrict ourselves to a class whose fields are all <= 32 bits wide, or are reference types which already have well-formed hashCode methodsThere are other ways to do it and there are ways to get hash codes from longs, floats, etc, which you should find in both those books.

You can go 256->512->1024->2048->4096->8192->16384, so you can design Maps any size up to 2^30 (1073741824) before you run out of ints for your indices. Whenever you find size > capacity * loadFactor, you double the size of the backing MapPair[] array. Every time you double the size, you would have to redistribute all your existing pairs, using 1 more bit of the hash code, so it is quicker to start with capacity 16384.
Usha Pnatha
Greenhorn

Joined: Aug 01, 2008
Posts: 27
Thank you very much Campbell Ritchie.
The code will definitely help me to understand hashing and implement my program.

Could you please give me some answers to my other question of the earlier post?
Another question: Mainly I will have any array of 16 keys; then each element of that array should contain another array of 32keys. How can I link the element of the first level array and second level array? Is it just a object reference like: nodes1[0]=new Object[32] ? or is there any other way?

Do you think, the liked list I implemented earlier would be useful in the program if I'm using multiple level of hashing to handle all possible no of keys? Can't I use only object arrays to implement the multilevel hashtable?


Thank you very much.

Regards,
Usha
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39810
    
  28
You can have an array of arrays, by all means.

But you are probably better off simply using a single array, of MapPairs or something like that. You can make each MapPair into a LinkedList node as well. Have you seen how it is done in HashMap? It is worthwhile reading the source for it.

I presume you know there is a file called src.zip in your Java installation folder, and unzipping that will allow you to read the source code of all the well-known Java classes. HashMap is among them.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19760
    
  20

Campbell Ritchie wrote:Keeping it simple, let's restrict ourselves to a class whose fields are all <= 32 bits wide, or are reference types which already have well-formed hashCode methods

You can use the following list to calculate hash codes for any field f:
  • boolean: (f ? 1 : 0)
  • byte, char, short or int: f
  • long: (int)(f ^ (f >>> 32))
  • float: FloatToIntBits(f)
  • double: Double.toLongBits(f), then use the long way
  • object reference: (f == null ? 0 : f.hashCode())

  • Taken from "Effective Java" by Joshua Block.


    SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
    How To Ask Questions How To Answer Questions
    Usha Pnatha
    Greenhorn

    Joined: Aug 01, 2008
    Posts: 27
    Thank you very much Campbell Ritchie and Rob Prime,

    @ Campbell Ritchie : Yes, I've refereed some of the classes in the src.zip folder. I'll go through the HashMap implementation and try to get the ideas. Thank you very much for answering all my questions immediately. You support it highly appreciated for beginners of Java like me. Thank you.

    @Rob Prime: Thank you for sharing that point on hashing. It's useful for me. Thank you.


    Thanks & Regards,
    Usha
    Campbell Ritchie
    Sheriff

    Joined: Oct 13, 2005
    Posts: 39810
        
      28
    You're welcome
    And Rob has saved you the trouble of looking up Bloch's book; he has told you about all the other types > 32 bits.
     
     
    subject: What's the suitable Data structure