US Address Standardization in java

 
Greenhorn
Posts: 2
Hi,

I am trying to standardize US postal addresses in Java. I started by exploring street suffixes, secondary unit designators, highways, etc. But it is too much to code, and we do not want to connect to third-party APIs since it is not that critical. Have any of you worked on something similar?
 
Bartender
Posts: 565
What exactly are you trying to do?
 
Marshal
Posts: 74376
Welcome to the Ranch

To expand on what Al said, please write down what your definition of an American address is, and how it is displayed. That is the first stage; after that you can readily design some nice object‑oriented code to encapsulate that design.
Please check carefully that not using third‑party code doesn't cross the nice divide between not wishing to appear timid and making life unnecessarily difficult for yourself.
 
Saloon Keeper
Posts: 24560
It is my understanding that way back when, IBM had some sort of address format standard so that addresses could be normalized as to form, abbreviations used and so forth. In fact, I think I've even seen a reference card with it once.

That was just a standard, however, and I don't think they ever had an actual API for it (and it probably would have been in COBOL if it was). But a web search may turn up a modern-day library.

A related issue is reconciling ZIP codes with street addresses, and there are some services for that. I think in fact, you can get something from the US Postal Service.
 
Marshal
Posts: 26912
It's certainly too much to code, since in real life there's going to be a huge number of special cases like addresses which aren't on streets (maybe on boats or on mountaintops) and who knows what else. The third-party API would take care of all of that for you.

But if you don't mind doing a half-done job which works for your set of addresses, I would start by looking for the official specs on the web -- which are likely to be large because they are comprehensive -- and implement the parts of those specs which look useful to you.
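As a rough illustration of implementing only one useful part of such a spec, here is a minimal sketch that maps a spelled-out street suffix to its abbreviation. It assumes the spec in question is USPS Publication 28 (Postal Addressing Standards); the class name `SuffixNormalizer`, its method, and the handful of suffixes shown are illustrative, not a complete table.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: covers a tiny, hand-picked subset of street suffixes.
final class SuffixNormalizer {

    // Spelled-out suffix -> standard abbreviation (illustrative subset only).
    private static final Map<String, String> SUFFIXES = Map.of(
            "LANE", "LN",
            "STREET", "ST",
            "AVENUE", "AVE",
            "ROAD", "RD",
            "DRIVE", "DR",
            "BOULEVARD", "BLVD",
            "COURT", "CT");

    static String normalize(String address) {
        // Upper-case, turn punctuation into spaces, collapse repeated whitespace.
        String cleaned = address.toUpperCase(Locale.ROOT)
                .replaceAll("[^A-Z0-9 ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
        if (cleaned.isEmpty()) {
            return cleaned;
        }
        String[] tokens = cleaned.split(" ");
        // Assumption: the suffix is the last token. A trailing unit such as
        // "APT 4" breaks this and would need its own handling.
        int last = tokens.length - 1;
        tokens[last] = SUFFIXES.getOrDefault(tokens[last], tokens[last]);
        return String.join(" ", tokens);
    }
}
```

With this, `SuffixNormalizer.normalize("180 Elan Village Lane")` and `SuffixNormalizer.normalize("180 Elan Village Ln.")` both give `180 ELAN VILLAGE LN`. Misspellings such as "Lan" pass through unchanged, so this only handles the clean cases.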

It's also possible that there exist services where you can send them your complete set of addresses and they will return a cleaned-up version. This of course works better if you don't get frequent additions to the set.
 
Venkattaramana Santhababu
Greenhorn
Posts: 2
Hi,

I am really sorry for the late response. I was off work for a couple of days. We have a fraud prevention system that receives manipulated addresses (like the ones below), and I am trying to detect addresses that have been slightly manipulated so I can run velocity checks on them.
The addresses below are actually the same address, with slight modifications in the spelling. Our upstream system doesn't do address standardization, so when an address reaches us, our system should detect that it is the same address, run a counter on it, and block it if the count reaches, say, 2 or 3.

Sample address: 180 Elan Village Lane -> 180 Elan Village Ln -> 180 Elan Villag Lane -> 180 Elan Village Lan

I know this seems like a big ask. But if there is some basic library that I can use to solve 50% of this problem that would be great too. Thanks and let me know if you have any questions.
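To make the counter idea concrete, here is a minimal sketch of what I mean, assuming the addresses can be reduced to some comparable key. The class `AddressVelocity`, the `key` method and the threshold value are all hypothetical, and a real velocity check would also need a time window, which is left out here.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical counter: blocks once the same address key has been seen
// `threshold` times. No time window or expiry is implemented.
final class AddressVelocity {

    private final Map<String, Integer> counts = new ConcurrentHashMap<>();
    private final int threshold;

    AddressVelocity(int threshold) {
        this.threshold = threshold;
    }

    // Naive key: upper-case and drop everything except letters and digits.
    // This does NOT unify "Lane" with "Ln" or catch misspellings.
    static String key(String address) {
        return address.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]", "");
    }

    // Records one sighting and returns true once the count reaches the threshold.
    boolean recordAndCheck(String address) {
        int seen = counts.merge(key(address), 1, Integer::sum);
        return seen >= threshold;
    }
}
```

With a threshold of 2, the first `recordAndCheck("180 Elan Village Lane")` returns false and a second call with `"180 elan village lane."` returns true. The hard part is making the key tolerant of the variations in my sample.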
 
Marshal
Posts: 8127
I think you underestimate how "big" a task we are talking about here. To name one well-known provider of address hygiene, standardization and related services: a company called Pitney Bowes made a revenue of 3.4 billion USD back in 2016, providing the kind of services you wish to implement.

Now, I'm not a data scientist, but there are techniques such as fuzzy matching, which handle the cases you describe, where some characters are misspelled, transposed, removed, doubled, you name it. That is very complex in its own way, though. And for consolidation tasks, where you want to consolidate the data by address, you'd first need to clean and standardize it in some way, which takes us back to where we started: it's not simple.

So I think what you are left with, if it has to be an in-house implementation, is to do some tricks along these lines:

1. Take the address, remove all spaces (and maybe special characters), sort the characters in lexicographic order, and calculate the Levenshtein distance between the two addresses. That's the first blunt solution that comes to my mind. But it's not immune to false positives: two addresses in the same block that differ only in, say, the flat number would have a distance of 1, so what do you do then? You would get the same distance if one copy of the exact same address were missing some insignificant character, so the decision is yours. You could add more complexity to this logic, for example by checking the distance only on non-numeric characters. Many options, many things to consider.

2. Just removing special characters from the addresses and upper-casing them could also cover some percentage of cases, but would leave out many others, e.g. missing or misplaced characters.
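The first trick above could be sketched roughly like this. It is only a sketch under my own assumptions: the class and method names are made up, digits are required to match exactly (so different flat or house numbers never match), and the distance is computed on the sorted letters only. The Levenshtein routine is the standard dynamic-programming version.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper illustrating "normalize, sort, then Levenshtein".
final class AddressFuzzyMatch {

    // Upper-cases and keeps only characters in the given class, e.g. "A-Z" or "0-9".
    static String keep(String s, String charClass) {
        return s.toUpperCase(Locale.ROOT).replaceAll("[^" + charClass + "]", "");
    }

    // Sorts the characters of a string in lexicographic order.
    static String sortChars(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Classic edit distance (insert, delete, substitute), two-row variant.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Digits must match exactly; letters may differ by up to maxDistance edits.
    static boolean probablySame(String a, String b, int maxDistance) {
        if (!keep(a, "0-9").equals(keep(b, "0-9"))) {
            return false;
        }
        String lettersA = sortChars(keep(a, "A-Z"));
        String lettersB = sortChars(keep(b, "A-Z"));
        return levenshtein(lettersA, lettersB) <= maxDistance;
    }
}
```

For the sample addresses, `probablySame("180 Elan Village Lane", "180 Elan Villag Lane", 1)` is true, while "Lane" versus "Ln" needs a `maxDistance` of 2. Sorting the characters throws away their order, so anagrams such as "Elan" and "Lane" become indistinguishable; that is one more source of false positives to weigh up.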


Venkattaramana Santhababu wrote:But if there is some basic library that I can use to solve 50% of this problem


I'm not aware of one. Maybe there are some; google it. Google has created a phone number library that deals with normalisation and standardization, but it doesn't deal with missing or misplaced digits within phone numbers (which is essentially what you are looking for), as that would be guessing.
 
Tim Holloway
Saloon Keeper
Posts: 24560
I think you vastly mis-under-estimate the number of times I can mistype my own name and address.

Is this an ongoing problem? Sounds more like a dictionary attack than anything else. While many scammers apparently deliberately mangle text to weed out the intelligent, I'd think that someone targeting a particular street address would be more direct.
 
Al Hobbs
Bartender
Posts: 565
Easiest way would be if the app has access to the internet, it can check the input on a map site.
 
Tim Holloway
Saloon Keeper
Posts: 24560

Al Hobbs wrote:Easiest way would be if the app has access to the internet, it can check the input on a map site.



Maps, however, incorporate deliberate errors to detect copyright violations. In fact, one showed a street running through my own house.

So not the most authoritative resource there is.
 