Jack Bush wrote: Yes, it is a requirement to identify that it is a property number from its sequence (first number) as well as ensuring that it consists of at least one digit. Below is an example of what this input data is made up of: The first 2 words is the district name which can start from 1 – 3 letters or more...
Actually, it isn't. From what I can see from your original data (the one you posted in your last thread), the district ends at the first word that has a digit as its first character (which seems, pretty consistently, to be the start of an address).
That word seems to contain some combination of the following:
1. A house/building number, which may be suffixed with a letter (eg, '43a').
2. A range of building numbers, separated by a hyphen (-).
3. A suite or apartment number plus a building number (or range), separated by a forward slash (/).
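To illustrate the "first word that starts with a digit" rule, here's a rough sketch. The class name, method names and the sample line are all made up for illustration (I don't have your real data in front of me), and it assumes words are separated by whitespace:

```java
import java.util.Arrays;

// Hypothetical helper: splits a line into "district" and "the rest"
// at the first word whose first character is a digit.
class DistrictSplitter {

    // Index of the first word starting with 0-9, or -1 if there is none.
    static int addressStart(String[] words) {
        for (int i = 0; i < words.length; i++) {
            String w = words[i];
            if (!w.isEmpty() && w.charAt(0) >= '0' && w.charAt(0) <= '9') {
                return i;
            }
        }
        return -1;
    }

    // Everything before the first address word, joined with single spaces.
    // Assumption: if no word starts with a digit, the whole line is returned.
    static String district(String line) {
        String[] words = line.trim().split("\\s+");
        int start = addressStart(words);
        int end = (start < 0) ? words.length : start;
        return String.join(" ", Arrays.copyOfRange(words, 0, end));
    }

    public static void main(String[] args) {
        // Invented sample line, not taken from the original data.
        System.out.println(district("North Shore 12/43a Example Road"));
    }
}
```

Note that this doesn't care how many words the district has, which is the point: the digit decides where it ends, not a word count.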
I reckon Fred's right. Once you've identified the first word of the address, I think you might be better off breaking down the possibilities, maybe with String.split(), and then using individual regexes to validate/extract the actual numbers.
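Something along these lines is what I have in mind. It's only a sketch: the AddressWord class and its fields are invented for the example, and it assumes the unit comes before the slash, that there's at most one '/' and one '-', and that the unit follows the same "digits plus optional letter" shape as a building number. Adjust to whatever your data really does:

```java
import java.util.regex.Pattern;

// Hypothetical holder for the pieces of an address word such as "12/43a-45".
class AddressWord {
    // One number with an optional single-letter suffix, e.g. "43" or "43a".
    private static final Pattern NUMBER = Pattern.compile("[0-9]+[A-Za-z]?");

    final String unit; // suite/apartment number, or null if there is no '/'
    final String from; // first (or only) building number
    final String to;   // last building number; same as 'from' if not a range

    private AddressWord(String unit, String from, String to) {
        this.unit = unit;
        this.from = from;
        this.to = to;
    }

    static AddressWord parse(String word) {
        // Limit -1 keeps trailing empty strings, so "12/" fails validation
        // instead of silently losing the empty part.
        String[] slashParts = word.split("/", -1);
        if (slashParts.length > 2) {
            throw new IllegalArgumentException("Too many '/' in: " + word);
        }
        String unit = (slashParts.length == 2) ? slashParts[0] : null;
        String building = slashParts[slashParts.length - 1];

        String[] range = building.split("-", -1);
        if (range.length > 2) {
            throw new IllegalArgumentException("Too many '-' in: " + word);
        }
        String from = range[0];
        String to = range[range.length - 1];

        // Validate each piece on its own with one small regex.
        if (unit != null) {
            check(unit, word);
        }
        check(from, word);
        check(to, word);
        return new AddressWord(unit, from, to);
    }

    private static void check(String part, String word) {
        if (!NUMBER.matcher(part).matches()) {
            throw new IllegalArgumentException("Bad number '" + part + "' in: " + word);
        }
    }
}
```

So AddressWord.parse("12/43-45") would give unit "12", from "43" and to "45", while a plain "43a" gives no unit and from/to both "43a". Each step only has to deal with one separator, which is a lot easier to read (and debug) than one monster regex.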
BTW, as far as I can see, the document is also consistent about having all that "numeric stuff" in a single word (ie, no spaces), so if you don't need to actually parse the contents, you could simply use "[0-9][^ ]*" to get the whole word.
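For what it's worth, here's a minimal sketch of using that pattern with Matcher.find(). The class and method names are made up, and so is the sample line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that grabs the first "numeric word" from a line.
class NumericWordFinder {
    // A digit followed by any run of non-space characters.
    private static final Pattern NUMERIC_WORD = Pattern.compile("[0-9][^ ]*");

    // Returns the first match, or null if the line contains no digit.
    static String firstNumericWord(String line) {
        Matcher m = NUMERIC_WORD.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Invented sample line, not taken from the original data.
        System.out.println(firstNumericWord("North Shore 12/43a Example Road"));
    }
}
```

One thing to watch: find() will happily start a match at a digit in the middle of a word, so if a district name could ever contain a digit you'd want to anchor the pattern to the start of a word, for example with a lookbehind such as "(?<![^ ])[0-9][^ ]*".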
Winston