I have 21 files containing 500,000 lines of ids, and I need to make a REST call to a server to convert these ids into a specific String value. I know the server can handle about 100,000 queries per second, but my problem is that my program isn't even able to query that quickly.
Basically, my program puts each file into its own thread and queries the server, but the performance I get is about 100,000 queries per minute. I'm just wondering what the best ways to improve the performance are.
So I don't know whether I should look into Java 7's parallel processing, or if plain threads are good enough?
Apart from CPU and multithreading, another important factor for this kind of operation is secondary storage latency.
Are those 21 files on different storage devices? I doubt it. If they are on the same physical storage, then simultaneously reading from those files has a speed limit regardless of how powerful your CPU is. So, if I were you, I wouldn't read all those files in parallel.
One approach I can suggest is:
1) Open one file.
2) Read a big chunk of data - making a 'read' call for each record is simply a performance killer.
3) Spawn multiple threads and hand those records to them - that way, the threads access data from the heap, and primary memory is far faster than secondary storage.
Of course, there are other approaches (which might be better than this), but that's just my two cents.
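The three steps above could be sketched roughly like this - a single reader pulls batches of lines from one file and fans the work out to a thread pool. The class name, batch size, and `convert` method are all illustrative; `convert` is a local stand-in for the real REST call:

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedLookup {

    // Hypothetical stand-in for the real REST call that converts an id.
    static String convert(String id) {
        return "VAL-" + id;
    }

    // Submit one batch of records to the pool as a single task.
    static Future<List<String>> submit(ExecutorService pool, final List<String> batch) {
        return pool.submit(new Callable<List<String>>() {
            @Override
            public List<String> call() {
                List<String> out = new ArrayList<String>(batch.size());
                for (String id : batch) {
                    out.add(convert(id));
                }
                return out;
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // Create a sample id file (stand-in for one of the 21 real files).
        Path file = Files.createTempFile("ids", ".txt");
        List<String> ids = new ArrayList<String>();
        for (int i = 0; i < 10_000; i++) {
            ids.add(Integer.toString(i));
        }
        Files.write(file, ids, StandardCharsets.UTF_8);

        final int batchSize = 1_000; // one big chunk per task, not one read per record
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<List<String>>> results = new ArrayList<Future<List<String>>>();

        // Single reader: sequential I/O on one file, work fanned out to the pool.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            List<String> batch = new ArrayList<String>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    results.add(submit(pool, batch));
                    batch = new ArrayList<String>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                results.add(submit(pool, batch));
            }
        }

        int total = 0;
        for (Future<List<String>> f : results) {
            total += f.get().size();
        }
        pool.shutdown();

        if (total != ids.size()) {
            throw new AssertionError("expected " + ids.size() + " conversions, got " + total);
        }
        System.out.println("converted " + total + " ids");
    }
}
```

Note that the expensive sequential read happens in one place, while the per-record work (which for you is the REST call, not `convert`) is what gets parallelized.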
What is the size (in bytes) of these files? 500K numbers isn't really a big deal, and I don't think you would hit a file-system bottleneck, as the data will typically end up in the file system cache (for a properly configured file system).
I would also look at the number of threads and the amount of context switching your program does during execution. A high rate of context switches will lead to poor performance.
So, I would first check whether your process is CPU- or I/O-bound (top on *nix systems, with the thread view) and then look at the context switches.
If CPU-bound, I would keep the number of threads close to the number of cores on the box. If I/O-bound, I would tune the I/O (which I doubt it is).
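A minimal sketch of sizing pools along those lines - the class name, the multiplier for the I/O pool, and the pool variables are illustrative, not a rule:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSizing {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();

        // CPU-bound work: roughly one thread per core keeps
        // context switching to a minimum.
        ExecutorService cpuPool = Executors.newFixedThreadPool(cores);

        // I/O-bound work (e.g. threads blocked waiting on REST responses):
        // a larger pool can be justified, since most threads are idle at any moment.
        ExecutorService ioPool = Executors.newFixedThreadPool(cores * 4);

        cpuPool.shutdown();
        ioPool.shutdown();
        cpuPool.awaitTermination(1, TimeUnit.SECONDS);
        ioPool.awaitTermination(1, TimeUnit.SECONDS);

        if (cores < 1) {
            throw new AssertionError("availableProcessors() must be >= 1");
        }
        System.out.println("cores=" + cores);
    }
}
```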
Edit: Reading from the files in bigger chunks is a good idea in any case, as Anayonkar pointed out.