So basically I just want to find out whether "any" of the keywords are present in the file, returning true on the first occurrence.
I imagine writing this algorithm would be easy enough, but I don't trust myself to write an efficient one.
I had in mind this:
I want to know whether there is a library that would let me run a method like that very efficiently, ideally with native methods.
I know about com.eaio.stringsearch.StringSearch, but that is far beyond my level of understanding.
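One classic algorithm for "does any of several thousand keywords occur in this text" is Aho-Corasick. It is swapped in here as an illustration, not as what StringSearch does internally. It builds an automaton from the dictionary once, then scans each text in a single pass whose cost grows with the text length rather than with the number of keywords. A minimal sketch, assuming plain substring matching with no case folding and no word boundaries (so a keyword like "ass" also fires inside "class"); the class name `AhoCorasick` is hypothetical:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Minimal Aho-Corasick automaton: builds a trie of all keywords once,
// then scans a text in one pass and stops at the first match.
// Assumption: raw substring matching (no case folding, no word boundaries).
final class AhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;          // longest proper suffix that is also a trie path
        boolean terminal;   // some keyword ends here (directly or via fail links)
    }

    private final Node root = new Node();

    AhoCorasick(Iterable<String> keywords) {
        // 1. Build the trie of keywords.
        for (String word : keywords) {
            if (word.isEmpty()) continue;
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.terminal = true;
        }
        // 2. Compute failure links breadth-first, so shallower nodes are done first.
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Map.Entry<Character, Node> e : current.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = current.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // A keyword ending at the fail target also ends here.
                child.terminal |= child.fail.terminal;
                queue.add(child);
            }
        }
    }

    /** Returns true as soon as any keyword occurs anywhere in the text. */
    boolean containsAny(CharSequence text) {
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            if (node.terminal) {
                return true; // early exit on the first occurrence
            }
        }
        return false;
    }
}
```

The automaton would be built once per dictionary and reused for every text; only `containsAny` runs per row.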
If I had to break the string up into a list, I would be better off working with StringBuffer or StringBuilder.
List-to-list operations are nice, but they don't apply to String objects. I know the collections framework has a ton of useful stuff; what I'm looking for is more in the domain of text processing.
This is really intended to filter strings of about 20,000 characters on an application server against a dictionary of a couple of thousand words, so performance is an issue. The sentence is just an example.
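For that workload, one plain-Java option is to load the dictionary into a `HashSet` once and scan each text character by character, looking up each word as soon as it ends. Set lookups take expected constant time, and the method returns on the first hit without building an intermediate list. A sketch, assuming a "word" is a run of letters or digits and that dictionary entries are stored in lower case; `WordScanner` and `containsAnyWord` are hypothetical names:

```java
import java.util.Locale;
import java.util.Set;

final class WordScanner {
    /**
     * Returns true on the first whole word of {@code text} found in
     * {@code dictionary}. Words are maximal runs of letters or digits;
     * every other character acts as a delimiter.
     * Assumption: dictionary entries are lower case.
     */
    static boolean containsAnyWord(String text, Set<String> dictionary) {
        int start = -1; // index where the current word began, or -1 if none
        // Loop one past the end so a word at the very end is also checked.
        for (int i = 0; i <= text.length(); i++) {
            boolean wordChar = i < text.length()
                    && Character.isLetterOrDigit(text.charAt(i));
            if (wordChar) {
                if (start < 0) start = i;
            } else if (start >= 0) {
                String word = text.substring(start, i).toLowerCase(Locale.ROOT);
                if (dictionary.contains(word)) {
                    return true; // stop at the first hit
                }
                start = -1;
            }
        }
        return false;
    }
}
```

Unlike a substring search, this only matches whole words, so "badge" does not trigger on "bad".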
It doesn't feel efficient the way I envision it, but could you give an example of pseudocode that describes how to break the "sentence" up into a List<String>, assuming there can be multiple delimiters between words, not just whitespace?
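A minimal sketch of splitting on several delimiters at once with a precompiled regular expression, assuming that any run of characters that are neither letters nor digits counts as one delimiter. `Tokenizer` and `toWords` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class Tokenizer {
    // One or more characters that are NOT letters or decimal digits form a
    // single delimiter, so ", ", "--" and "\n\t" all separate words the same way.
    private static final Pattern DELIMITERS = Pattern.compile("[^\\p{L}\\p{Nd}]+");

    static List<String> toWords(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : DELIMITERS.split(sentence)) {
            // A delimiter at the very start yields one empty leading token.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex for every text, which matters when processing many rows.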
I also had in mind something more complex that would memorize words that have already passed the test and use prefetching to improve performance.
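A minimal sketch of the "memorize words that already passed the test" idea, assuming the per-token test is lower-casing plus a `HashSet` lookup. That lookup is already cheap, so a cache like this only pays off if the real test is costlier (heavier normalization, fuzzy matching). The cache is an unbounded, non-thread-safe `HashMap`; on an application server a bounded or concurrent cache would be safer. Prefetching is not covered, and `MemoizingChecker` is a hypothetical name:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class MemoizingChecker {
    private final Set<String> dictionary;                        // lower-case entries
    private final Map<String, Boolean> seen = new HashMap<>();   // raw token -> verdict

    MemoizingChecker(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Runs the test once per distinct raw token and remembers the verdict. */
    boolean isFlagged(String token) {
        return seen.computeIfAbsent(token,
                t -> dictionary.contains(t.toLowerCase(Locale.ROOT)));
    }

    /** Number of distinct tokens remembered so far. */
    int cacheSize() {
        return seen.size();
    }
}
```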
I will give this a try. I didn't realize that only the words from the file would stay in the collection; I guess I'm too tired today. This is even better than I thought. I'm working on a scheduled profanity filter that will notify an admin via email when something somewhere needs attention.
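Keeping only the words that are also in the dictionary is what `Collection.retainAll` does: it removes every element not contained in the argument collection. A sketch of using it to gather all distinct matches for the admin notification, assuming the text is already tokenized and the dictionary is a `Set` (so each `contains` check inside `retainAll` is a hash lookup rather than a linear scan). This is not necessarily the code from the earlier answer, and `MatchCollector` is a hypothetical name:

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

final class MatchCollector {
    /**
     * Returns the distinct dictionary words found among the tokens, in
     * first-seen order. A copy is made so the caller's tokens are not modified.
     */
    static Set<String> matches(Collection<String> tokens, Set<String> dictionary) {
        Set<String> found = new LinkedHashSet<>(tokens); // de-duplicates, keeps order
        found.retainAll(dictionary);                     // drop non-dictionary words
        return found;
    }
}
```

An empty result means the text is clean; a non-empty one gives the words to list in the email.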
Thanks for the sample code. I'll see if I can find faster native libraries to do what your sample does, since I'm talking about several gigabytes of MySQL data that will be processed every night.
I use Hibernate, and I'm not an experienced programmer with it either; I pretty much just run select and update queries.
I have no idea how to do a join and an index on TEXT fields of about 20 KB each. Do you have a link where I could take a look at an example?
Thank you so much for sticking around and helping me.
I'm not experienced with it either, and I don't know how to do it with Hibernate (I don't think Hibernate supports a string-split function, because not all databases support that feature).
But here is an example for PostgreSQL:
If that doesn't work out, you could try a batch-processing approach based on the example I gave you earlier.
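A minimal sketch of batch processing, assuming rows are pulled in fixed-size pages so that only one page of 20 KB texts sits in memory at a time. The page source is a hypothetical `fetchPage(offset, limit)` function; in a real job it might wrap a Hibernate query using `setFirstResult(offset)` and `setMaxResults(limit)`, with the session cleared between pages so its first-level cache does not keep growing. For very large tables, paging by the last seen primary key is usually faster than large offsets. `BatchScanner` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Predicate;

final class BatchScanner {
    /**
     * Pulls rows page by page from fetchPage(offset, limit) and returns the
     * row offsets whose text the predicate flags. Stops after a short page.
     */
    static List<Integer> scan(BiFunction<Integer, Integer, List<String>> fetchPage,
                              int pageSize, Predicate<String> flagged) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Integer> hits = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<String> page = fetchPage.apply(offset, pageSize);
            for (int i = 0; i < page.size(); i++) {
                if (flagged.test(page.get(i))) {
                    hits.add(offset + i);
                }
            }
            if (page.size() < pageSize) {
                break; // short or empty page: no more rows
            }
            offset += pageSize;
        }
        return hits;
    }
}
```

The predicate can be any of the per-text checks discussed in this thread, for example a whole-word dictionary lookup.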
I tried some native database functions such as regexp_split_to_table, and it seems to be very slow with MySQL. However, when I do the processing in memory, with a few tweaks to the JVM's memory settings, it is pretty fast.