• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • Liutauras Vilda
  • Jeanne Boyarsky
  • paul wheaton
Sheriffs:
  • Ron McLeod
  • Devaka Cooray
  • Henry Wong
Saloon Keepers:
  • Tim Holloway
  • Stephan van Hulst
  • Carey Brown
  • Tim Moores
  • Mikalai Zaikin
Bartenders:
  • Frits Walraven

Fast searching Algoritham

 
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi all,

My application needs searching on ZipCode.
We have larger number of zip codes in our application n for zipcode there is some value assigned. We need to get the value of perticular zipcode entered by user.

Can anyone suggest which would be faster.

I am thinking if I can group zipcodes. Say zipcode ending with 1 in one HashMap then zipcode ending with 2 in another hashMap and then search accordingly.

Please suggest me. Performance is a criteria in this case.

Thanks in advance
Suhem
 
Rancher
Posts: 43081
77
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Assuming that the zip codes in question are the ones used in the USA, there are less than 100000 of those. I wouldn't think that a HashMap of that size causes performance problems. You could experiment with different load factors, like 0.75 and 0.9, but first of all: Have you established that the HashMap is actually a performance bottleneck? You know, "premature optimization..." and all that.
 
suhem chauhan
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi Ulf,

Thanks a lot for the reply! Yes! These are just US zipcodes only. Thing is even if we are getting some 100 miliseconds difference in performance it is add on.
 
suhem chauhan
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi Ulf

I do not think so it is a premature optimization. As our application is already into production we are getting around 3 seconds of responce time but we want to reduce it as much as we can. I am not sure if HashMap is bottleneck, but the design is so that with each entry by user zipcodes are ckecked twice. So we thought may be we can optimize our zipcode entries.
 
Ulf Dittmer
Rancher
Posts: 43081
77
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Sounds like you need to find out if the data structure really is the bottleneck. Why are you accessing it twice?

If it actually turns out to be problematic, you might try using an array with 100000 elements instead.
 
Ranch Hand
Posts: 368
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
If you only nead to read zip codes(not write), you can consider putting them in a sorted form in an array and then use binary search to fetch them.
 
Ulf Dittmer
Rancher
Posts: 43081
77
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
There's no need to use binary search - the zip code can be used as the index for direct access.
 
Satya Maheshwari
Ranch Hand
Posts: 368
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Ulf Dittmer:
There's no need to use binary search - the zip code can be used as the index for direct access.



Agreed
 
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I'm not real clear on why you are looking up the zip code - is there some auxilliary data attached or are you just trying to see if it is a legal string in the zip code field?

If the question is legality of the number, a boolean[100000] array indexed by the code would do the trick. Of course you first catch non-numeric String input, then index in bounds.

Bill
 
Ulf Dittmer
Rancher
Posts: 43081
77
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The first post stated that there is value assigned to each zip code, which is what is of interest; possibly a string containing city and state.
 
suhem chauhan
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi all,

Thanks a lot for replying!

William, Ulf is right there is some value attached to each ZipCode and we need to fetch that value depending on the ZipCode entered by user.

Ulf, Using array, will it not make performance even more slow? I am not sure about it.
 
Ulf Dittmer
Rancher
Posts: 43081
77
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You mean compared to a HashMap? With an array you get direct access - exactly one lookup. With a HashMap there will be collisions, taking up more time. But you can easily run timing tests, although I'm certain that using arrays will be faster.
 
lowercase baba
Posts: 13091
67
Chrome Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
the term "premature optimization" generally means 'optimizing something that isn't really the problem". In other words, your code is probably doing LOTS of stuff, the lookup is one part. if it's taking 100 ms to do that lookup, but something else is taking 500 ms, you might be better off optimizing that part of the code. Almost inevitably, the bottlenecks are not where you think they are.

as to the array vs hashmap... Arrays are VERY FAST at doing lookups. They are poor when you have to add something into the middle. Since your array is pretty small, and you won't be inserting a lot of new elements (there aren't too many new zip codes between 63122 and 63123), it probably would be faster.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I downloaded this zip code file.
It has 29470 lines so the 100000 element array would be largely empty - now I am curious as to how many unique city names there are!

Other sources quote 40,000 zip codes total - hmm

Bill
[ September 15, 2008: Message edited by: William Brogden ]
 
Rancher
Posts: 4804
7
Mac OS X VI Editor Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by William Brogden:
now I am curious as to how many unique city names there are!


There are not many unique city names. If you mean city/state pairs, there are more. For example, there are something like 13 "Arlington" city names, I've been to three, one outside Boston, one outside Washington DC and one down by Dallas.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
After a little fun with Java collections I found:

29467 unique zip codes
16698 unique city names
3 cases where a city appears in two states but keeps the same zip code, for example Pinetta on the FL / GA border.

So - if you really needed a compact representation of the city and state corresponding to a zip code with rapid lookup, there is a lot of room for optimization.

Bill
 
suhem chauhan
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by William Brogden:

So - if you really needed a compact representation of the city and state corresponding to a zip code with rapid lookup, there is a lot of room for optimization.

Bill[/QB]



Hi William,

I would appreciate it if you could explain little bit more on this. Where exactly it need to be optimized.

Thanks you all for replying. Your replies are really very much helpful for me

Suhem
 
Ulf Dittmer
Rancher
Posts: 43081
77
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by suhem chauhan:
Where exactly it need to be optimized.


Well, that's the question for you to answer - does it need to be optimized further? Using an array instead of a Map is a speed optimization, which you said is what you're after.

Optimizing the code to use 30000 elements instead of 100000 elements in the array is a memory optimization; it might well come at the expense of speed, thereby defeating your primary reason for optimization.

So, before even thinking along these lines, you should implement the array solution, and then run tests under realistic conditions to determine whether you need further optimizations, either in speed or memory usage. And even then you need to keep in mind that reducing one will frequently increase the other.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Where exactly it need to be optimized.



Ulf is exactly right, first you need to figure out what needs to be optimized.

Shrinking the memory used by the data can only be accomplished by spending extra CPU cycles when retrieving a value.

On the other hand, if your system has plenty of memory, the value stored could be something that will be useful for the fastest possible response.

It would help if we knew exactly what you need to do with the result of looking up a zip code.

Bill
 
suhem chauhan
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi all,

Thanks a lot for your cooperation! I am convinced that using array would be a better option in my case or I can once try using Hashmap with load factor .75.


Thank you all once again
Suhem
 
I need a new interior decorator. This tiny ad just painted every room in my house purple.
Gift giving made easy with the permaculture playing cards
https://coderanch.com/t/777758/Gift-giving-easy-permaculture-playing
reply
    Bookmark Topic Watch Topic
  • New Topic