My mind is saturated; I am deep in a total rewrite. I need to build a few lines of java.util.regex to walk through a large buffer and pick up words, dropping the 's on plurals. This gets involved and not all sources are consistent, so I seek suggestions and comment.
Are you talking about plurals, or possessives? Or both? Possessives have apostrophes, while plurals do not.
If you're talking about plurals, I think it's going to be nearly impossible to do this with a regex that is accurate in all cases, as there are too many special cases. Will you be OK with an expression that just often gets it right?
If you need to eliminate possessives but not plurals, that's probably more feasible, as there are fewer special cases there.
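To make the possessive-only idea concrete, here is a minimal sketch, not a definitive answer. It assumes a "word" is just a run of Unicode letters, and it treats a trailing 's (straight or curly apostrophe) as something to match but not keep. The class name WordScanner and the method words are made up for illustration. Plurals are left alone, and contractions such as "don't" will come out split in two, which may or may not be acceptable for an 80/20 pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordScanner {
    // Group 1 is the word itself: one or more Unicode letters.
    // The optional non-capturing group swallows a possessive 's so that it
    // is consumed but never returned. The negative lookahead makes sure the
    // 's is only swallowed when it ends the word.
    private static final Pattern WORD =
        Pattern.compile("(\\p{L}+)(?:['\\u2019][sS])?(?!\\p{L})");

    // Hypothetical helper: returns every word found in the buffer, in order.
    public static List<String> words(CharSequence buffer) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(buffer);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [The, dog, bone, and, the, cats, toys]
        System.out.println(words("The dog's bone and the cats' toys"));
    }
}
```

Because Matcher.find() simply jumps to the next match, digits, punctuation and control characters are skipped without any extra code.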
"I'm not back." - Bill Harding, Twister
I want to zip right past any issues that slow the search, using the 80/20 rule. For example, just now I was looking at several pages written by degreed authors in HTML 1.1, to have clean sample text to test on. It occurred to me that discerning what is inside a pair of <> (along with any punctuation, pictures, graphics and control characters that need to be discarded) is the immediate next phase of regex building. Right now, for purposes of this post, we want a Bottle Rocket driven blind watchmaker on skids hot bonded to polytetrafluoroethylene pads running on floating rails covered with polished ice. In other words: skip fast over anything that does not fit the definition of a word, leaving off possessives, plurals, special cases and any permutation thereof I did not think of.
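One possible way to get the "skip what is inside <>" behaviour in the same pass is an alternation: the first branch consumes a whole tag and captures nothing, the second branch captures a word. This is only a sketch under loose assumptions. A regex is not an HTML parser, so comments, script bodies, attribute values containing ">" and entities such as &nbsp; are not handled. The class name TagSkippingScanner is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSkippingScanner {
    // Branch 1: anything from '<' to the next '>' is consumed and ignored.
    // Branch 2: a run of Unicode letters is captured in group 1, and a
    // trailing possessive 's is consumed without being captured.
    private static final Pattern TOKEN =
        Pattern.compile("<[^>]*>|(\\p{L}+)(?:['\\u2019][sS])?(?!\\p{L})");

    // Hypothetical helper: returns the words outside of tags, in order.
    public static List<String> words(CharSequence buffer) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(buffer);
        while (m.find()) {
            String word = m.group(1);   // null when the tag branch matched
            if (word != null) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Java, regex, engine]
        System.out.println(words("<p class=\"intro\">Java's <b>regex</b> engine</p>"));
    }
}
```

Doing it as one alternation keeps the scan to a single walk over the buffer, which seems to fit the "fast, 80/20" goal better than stripping tags first and matching words second.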
This is the IV of a feedback loop to populate the registers. Human intervention can occur after we come up with something to look at. I have a 700 page book on Swing at hand, along with fifteen browser windows open, and I am working on some ideas for a 2-dimensional language to take the feed from the operator. In this phase, I just want to populate the registers with something that looks like a word to the normal human mind, Unicode or otherwise. [ February 26, 2008: Message edited by: Nicholas Jordan ]
subject: Regular expression to find words in a String