Just wrote some code at work to solve this problem, and I found the solution quite pleasing and elegant to my eye. I'll drop the problem here and see if anyone gets the same thing. (I'm running the data conversion on several thousand rows as I write.)

In the database, names are stored as a single field NAME, in the form "<lastname> <firstname>" (space separated). We want to try to separate it into two fields. The firstname can be more than one word, but we assume that anything after the lastname is the firstname. The difficulty is finding the lastname. We assume that any lastname that consists of several words is prefixed by one of: di, de, du, da, mac, van den, van der, van, de la. Then the next word is part of the lastname too. Well, I enjoyed it, anyway. :roll:
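For anyone who wants something concrete to compare against, here is a rough sketch of the rules as stated above, in Python. It is not the original poster's solution, which was not shown. The function name `split_name` is made up for illustration, and two things are assumptions on my part: prefixes are matched case-insensitively, and longer prefixes are tried first so that "van der" wins over "van" and "de la" over "de".

```python
# Prefixes listed in the post, stored as tuples of lowercase words.
# Sorted longest first so multi-word prefixes are tried before "van" or "de".
PREFIXES = sorted(
    (tuple(p.split()) for p in
     ["di", "de", "du", "da", "mac", "van den", "van der", "van", "de la"]),
    key=len,
    reverse=True,
)


def split_name(name):
    """Split '<lastname> <firstname>' into (lastname, firstname)."""
    words = name.split()
    lowered = [w.lower() for w in words]  # assumption: case-insensitive match
    last_len = 1  # default: the lastname is just the first word
    for prefix in PREFIXES:
        n = len(prefix)
        # The prefix must be followed by at least one more word,
        # which then also belongs to the lastname.
        if tuple(lowered[:n]) == prefix and len(words) > n:
            last_len = n + 1
            break
    return " ".join(words[:last_len]), " ".join(words[last_len:])


print(split_name("Smith John"))               # ('Smith', 'John')
print(split_name("van der Berg Jan Willem"))  # ('van der Berg', 'Jan Willem')
print(split_name("de la Cruz Maria"))         # ('de la Cruz', 'Maria')
```

Note that this only follows the stated assumptions; a name such as "De Witt" stored without a firstname, or a prefix not in the list, will still need manual checking.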
Originally posted by Simon Lee: (an aside) Our database holds European data. We have three columns: firstname, lastname, fullname. There are so many exceptions, and so many ways people like to be referred to, that we have to do this.
This is why there are commercial packages made specifically for data cleansing. A quick and dirty way is fine, but getting it right is another thing entirely.
"Write beautiful code; then profile that beautiful code and make little bits of it uglier but faster." --The JavaPerformanceTuning.com team, Newsletter 039.