I have a program that parses through a file and checks for multiples using two nested loops. The outer loop reads the file (about 15,000 KB) line by line, taking the current line and carrying it into the inner loop, which opens and parses through that same file again, checking for multiple records. Within these loops I have if statements printing records with no multiples to one log and records with multiples (more than 1) to another log. I also have a third log open that records miscellaneous data as the program runs. I believe my code works, but the run time is terribly slow (24+ hours to finish). Is there any way (hopefully simple) I can improve the speed of my program?
The size of the file probably doesn't matter, but the number of lines does. How many are in your file? Based on how you describe your algorithm, you have an O(n^2) design, which is going to get slow quickly as the file grows.
What you can do may also depend on what the data is. You may be able to use a Map: the line itself would be the key, and you check whether that key already exists. If not, add it with a value of 1; if so, increment the existing value.
Then, when you are done, you can iterate through the Map and print the appropriate data to the appropriate file.
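A minimal sketch of that idea, assuming each line of the file is one complete record and duplicates are exact string matches (the file names here are placeholders, not from the original program):

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateCounter {

    // Tally how many times each line occurs -- a single pass over the data.
    static Map<String, Integer> countLines(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "records.txt" and the log names below are hypothetical placeholders.
        Map<String, Integer> counts =
                countLines(Files.readAllLines(Paths.get("records.txt")));

        // One pass over the map to split records into the two logs.
        try (PrintWriter singles = new PrintWriter("singles.log");
             PrintWriter multiples = new PrintWriter("multiples.log")) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() > 1) {
                    multiples.println(e.getKey() + " (" + e.getValue() + " occurrences)");
                } else {
                    singles.println(e.getKey());
                }
            }
        }
    }
}
```

This reads the file exactly once and does the rest in memory, so the work is roughly O(n) instead of O(n^2).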
If the file is only 15 MB, you should be able to hold the whole thing in RAM. The inner loop shouldn't have to reopen and reread the file on every iteration; file I/O is one of the slowest (if not the slowest) things a program can do.
You can also remove duplicates as you find them; how effective this is depends on the data structure used. An ArrayList, for example, is more expensive because every element after the removal point has to be shifted when you delete from the middle. A LinkedList, on the other hand, only has to update a couple of references to remove an element. Removing items from a fixed-size array is pointless.
This assumes you want to do a nested loop solution similar to what you already have.
Fred's suggestion would likely run faster, since it reduces the number of passes over the data to two (one to read and count, one to iterate the map and write). If you need the items in the logs to be in the same order as the file, use a LinkedHashMap.
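The only change from the plain HashMap version of that approach would be the constructor; a LinkedHashMap keeps its keys in insertion order, so the logs come out in the order records first appear in the file. A small sketch:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderedCounts {

    // Same counting idea as with a HashMap, but iteration order over the
    // resulting map matches the order lines were first seen in the input.
    static Map<String, Integer> countInOrder(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        return counts;
    }
}
```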