Parsing words

Maki Jav
Ranch Hand

Joined: May 09, 2002
Posts: 435
Hi,

I am trying to get a clean 15 characters before and after a target word, but if the word is repeated within those characters on either side, I want to select only the characters up to the repeat. I am trying to do this in the second method, "java", but without much success. Please read the code and help.





The class ConditionPosition is just a wrapper. See it here:



Thanks in advance,

Maki Jav


Help gets you when you need it!
Bill Shirley
Ranch Hand

Joined: Nov 08, 2007
Posts: 457
0th suggestion: use CODE tags rather than quote tags

First suggestion: use a List of Integers rather than an array of ints. I doubt that super-speed is an issue here.

If you're using Java 5.0 or higher:



With autoboxing, you can add an int to the list:
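The code originally posted here is not available. As a rough illustration only, a List of Integers holding word positions might look like the sketch below. The class name `WordPositions` and the method `find` are hypothetical and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the original poster's code.
class WordPositions {
    // Returns the start index of every non-overlapping occurrence of word in text.
    static List<Integer> find(String text, String word) {
        List<Integer> positions = new ArrayList<Integer>();
        if (word.length() == 0) {
            return positions; // guard: an empty word would match at every index
        }
        int from = text.indexOf(word);
        while (from >= 0) {
            positions.add(from); // autoboxing converts the int to an Integer
            from = text.indexOf(word, from + word.length());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(find("WORDxxxxWORDxxWORD", "WORD")); // prints [0, 8, 14]
    }
}
```

The list grows as needed, so there is no need to count occurrences up front or to resize an int array by hand.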


[ February 07, 2008: Message edited by: Bill Shirley ]

Bill Shirley - bshirley - frazerbilt.com
if (Posts < 30) you.read( JavaRanchFAQ);
Maki Jav
Ranch Hand

Joined: May 09, 2002
Posts: 435
Hi,

I used code tags

I really need top-notch speed because this method will be used in an applet that will be searching
thousands of files on the client. Don't worry about permissions; I have tested that already.
So you see, I want an int array.

Your reply is not related to my problem.

Thanks,

Maki Jav
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41867
    
I haven't looked at what the code does in detail, but if you're reading from that many files, then file I/O speed is going to drown out any difference there could conceivably be between using arrays and using collections.


Ping & DNS - my free Android networking tools app
Bill Shirley
Ranch Hand

Joined: Nov 08, 2007
Posts: 457
Yes, I agree with Ulf. If files are involved, the I/O cost will trump any reasonable processing.

I know this is NotACodeMill, but I was inspired to rework some of your code. Here's my cut (it could always be better) at the base of your code (with the crux of your problem not solved).



Edit: I think for readability's sake, a while loop is desired here:


[ February 08, 2008: Message edited by: Bill Shirley ]
Maki Jav
Ranch Hand

Joined: May 09, 2002
Posts: 435
Well guys,

I have worked on the I/O, and I was able to search for a word in a 2 MB MS Word file in 500 milliseconds. Everything counts!

Thank you,

Maki Jav
Maki Jav
Ranch Hand

Joined: May 09, 2002
Posts: 435
I want a sentence in three parts

e.g.:
xxxxxxWORDxxxxx

or, if there is WORDxxxxWORDxxWORD,

then three lines as a String[] array:
1) WORDxxxx
2) xxxxWORDxx
3) xxWORD
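
One possible reading of this requirement, sketched in Java: for each occurrence of the word, take up to `window` characters on each side, but stop at the neighbouring occurrence if it falls inside that range. This is only a sketch under that assumption, not the method from the original post. The names `ContextSnippets`, `split`, and `window` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the splitting rule described above.
class ContextSnippets {
    // Returns one snippet per non-overlapping occurrence of word in text.
    // Each snippet holds the word plus at most `window` characters on each
    // side, cut short where the previous or next occurrence begins or ends.
    static String[] split(String text, String word, int window) {
        List<Integer> starts = new ArrayList<Integer>();
        if (word.length() > 0) {
            for (int i = text.indexOf(word); i >= 0; i = text.indexOf(word, i + word.length())) {
                starts.add(i);
            }
        }
        String[] parts = new String[starts.size()];
        for (int k = 0; k < parts.length; k++) {
            int start = starts.get(k);
            int end = start + word.length();
            // Left edge: window characters back, but not past the previous occurrence.
            int left = Math.max(0, start - window);
            if (k > 0) {
                left = Math.max(left, starts.get(k - 1) + word.length());
            }
            // Right edge: window characters forward, but not into the next occurrence.
            int right = Math.min(text.length(), end + window);
            if (k + 1 < parts.length) {
                right = Math.min(right, starts.get(k + 1));
            }
            parts[k] = text.substring(left, right);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Prints WORDxxxx, xxxxWORDxx and xxWORD on separate lines.
        for (String part : split("WORDxxxxWORDxxWORD", "WORD", 15)) {
            System.out.println(part);
        }
    }
}
```

With a window of 15, the single-occurrence input xxxxxxWORDxxxxx comes back whole as one snippet.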

Thank you,

Maki Jav
Bill Shirley
Ranch Hand

Joined: Nov 08, 2007
Posts: 457
Sounds great. If you've got the I/O solutions under your belt, that's more than half the hurdle. There are certainly a million and one buffering, et al. solutions you can apply to those.

Still, I'd attack the problem as
1) solve your algorithm
2) test the speed
3) profile to find the culprit
4) refactor the snails out

If you solve it first in a "more OO" way, it should make your conversion to the bare-bones implementation much easier.

You could also code regression/unit tests against your code, so that when you start super-charging it you already have those sanity checks in place.

Just my 2¢.
Maki Jav
Ranch Hand

Joined: May 09, 2002
Posts: 435
Thank you for your advice Bill.

I think I am just having a bad patch programming-wise, so I have not been able to get the results I am looking for in the second method, namely "java".

Maki Jav
Maki Jav
Ranch Hand

Joined: May 09, 2002
Posts: 435
Hi,

This change to the method has done the trick

Thanks,

Maki Jav
[ February 09, 2008: Message edited by: Maki Jav ]
Maki Jav
Ranch Hand

Joined: May 09, 2002
Posts: 435
Bill,

I just tested using BufferedReader to read a text file with the following properties:

location: d:/dump.txt
it contains 39245 lines - size 1.00 MB (1,048,576 bytes)

time taken by my code under the following conditions:
median 469 milliseconds -
BufferedReader.readLine(); // read 39245 lines
// checked them all for condition
if(String.toLowerCase().indexOf("behave")>-1)

median 344 milliseconds -
BufferedReader.readLine();// read 39245 lines only.

That is a 125-millisecond difference for the if statement alone.

Results are on an Intel PIII 797 MHz with 256 MB of RAM.
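
A comparison like the one above could be set up roughly as follows. This is a sketch, not the code that produced the numbers in this post. The class `ReadTiming` and the method `scan` are hypothetical names, and the in-memory test data is an assumption standing in for the real file; to time actual file I/O, pass `new BufferedReader(new FileReader("d:/dump.txt"))` instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical timing harness for "read only" versus "read and check".
class ReadTiming {
    // Reads every line. When needle is non-null, also counts the lines that
    // contain it, ignoring case. The needle is expected in lower case.
    static int scan(BufferedReader in, String needle) throws IOException {
        int hits = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (needle != null && line.toLowerCase().indexOf(needle) > -1) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: synthetic lines stand in for the 39245-line file.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 39245; i++) {
            sb.append("Please Behave yourself, line ").append(i).append('\n');
        }
        String data = sb.toString();

        long t0 = System.nanoTime();
        scan(new BufferedReader(new StringReader(data)), null);
        long t1 = System.nanoTime();
        int hits = scan(new BufferedReader(new StringReader(data)), "behave");
        long t2 = System.nanoTime();

        System.out.println("read only:    " + (t1 - t0) / 1000000 + " ms");
        System.out.println("read + check: " + (t2 - t1) / 1000000 + " ms, hits=" + hits);
    }
}
```

A single run like this is noisy because of JIT warm-up and disk caching, so repeating each measurement several times and taking the median, as done above, gives a fairer figure.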

So you see, it is good performance.

What do you say?

Maki Jav
 