Just wrote some code at work to solve this problem, and I found the solution quite pleasing and elegant to my eye. I'll post the problem and see if anyone comes up with the same thing. (I'm running the data conversion on several thousand rows as I write.)

In the database, names are stored in a single field, NAME, in the form "<lastname> <firstname>" (space separated). We want to split it into two fields. The first name can be more than one word, but we assume that everything after the last name is the first name. The difficulty is finding the last name. We assume that any last name consisting of several words starts with one of these prefixes: di, de, du, da, mac, van den, van der, van, de la. In that case, the word following the prefix is also part of the last name.

Well, I enjoyed it, anyway. :roll:
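The original code wasn't posted. Below is a minimal Java sketch of one way to implement the rule described above. It is not the poster's solution. It assumes NAME is whitespace-separated, that prefixes match case-insensitively as whole words, and that a last name with no prefix is just the first word. The class `NameSplitter` and its `split` method are hypothetical names chosen for illustration. Prefixes are tried longest first, so "van der" wins over "van" and "de la" over "de".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class NameSplitter {

    // Prefixes from the post; multi-word entries are space separated.
    private static final List<String> PREFIXES = Arrays.asList(
            "di", "de", "du", "da", "mac", "van den", "van der", "van", "de la");

    // Each prefix split into words, longest first, so that "van der" is
    // checked before "van" and "de la" before "de".
    private static final List<String[]> PREFIX_WORDS = PREFIXES.stream()
            .map(p -> p.split(" "))
            .sorted(Comparator.comparingInt((String[] w) -> w.length).reversed())
            .collect(Collectors.toList());

    /** Returns {lastName, firstName}; firstName is "" when there is none. */
    static String[] split(String name) {
        String[] words = name.trim().split("\\s+");
        int lastNameWords = 1; // assumption: no prefix means a one-word last name
        for (String[] prefix : PREFIX_WORDS) {
            // The prefix must be followed by at least one more word (the name proper).
            if (prefix.length < words.length && startsWith(words, prefix)) {
                lastNameWords = prefix.length + 1; // prefix words plus the next word
                break;
            }
        }
        String lastName = String.join(" ", Arrays.copyOfRange(words, 0, lastNameWords));
        String firstName = String.join(" ", Arrays.copyOfRange(words, lastNameWords, words.length));
        return new String[] {lastName, firstName};
    }

    // Case-insensitive, whole-word comparison of the leading words against a prefix.
    private static boolean startsWith(String[] words, String[] prefix) {
        for (int i = 0; i < prefix.length; i++) {
            if (!words[i].equalsIgnoreCase(prefix[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"Smith John Paul", "van der Berg Anna", "De La Cruz Maria", "Dupont"};
        for (String n : samples) {
            String[] parts = split(n);
            System.out.println("[" + parts[0] + "] [" + parts[1] + "]");
        }
    }
}
```

Running `main` prints `[Smith] [John Paul]`, `[van der Berg] [Anna]`, `[De La Cruz] [Maria]` and `[Dupont] []`. Because the prefixes live in a list, covering more cases only means adding entries to `PREFIXES`. The longest-first sort keeps the matching correct as overlapping entries are added.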
Jim Yingst
Wanderer
Sheriff
Joined: Jan 30, 2000
Posts: 18670
You might want to add "del", "della", "de los", "de las", and "mc". And who knows what else once we move beyond European languages.
[ June 03, 2003: Message edited by: Jim Yingst ]
"I'm not back." - Bill Harding, Twister
SJ Adnams
Ranch Hand
Joined: Sep 28, 2001
Posts: 925
(An aside) Our database holds European data. We have three columns: firstname, lastname, fullname. There are so many exceptions, and so many ways people prefer to be referred to, that we have to do this.
David Hibbs
Ranch Hand
Joined: Dec 19, 2002
Posts: 374
Originally posted by Simon Lee: (An aside) Our database holds European data. We have three columns: firstname, lastname, fullname. There are so many exceptions, and so many ways people prefer to be referred to, that we have to do this.
This is why there are commercial packages built specifically for data cleansing. A quick-and-dirty approach is fine, but getting it right is another thing entirely.
"Write beautiful code; then profile that beautiful code and make little bits of it uglier but faster." --The JavaPerformanceTuning.com team, Newsletter 039.