What is the size (in bytes) of these files? 500K numbers isn't really a big deal: at, say, 10 bytes per number stored as text, that is only about 5 MB. I don't think you will hit a file system bottleneck, since the data will typically end up in the file system cache (on a properly configured file system).
I would also look at the number of threads and the number of context switches your program makes during execution. A high number of context switches leads to poor performance.
So I would first check whether your process is CPU-bound or I/O-bound (for example with top and its thread view on *nix systems, top -H on Linux) and then look at the context switches.
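If you'd rather measure this from inside the program than eyeball top, here is a rough sketch in Python. It assumes a Unix system (the standard resource module is not available on Windows), and measure_workload is a hypothetical helper name. It runs a piece of work and reports wall time, CPU time and the process's context-switch counts. As a heuristic, CPU time per wall second close to the number of busy threads suggests CPU-bound work, while a ratio well below 1 means the process spends most of its time waiting (I/O, locks). Many involuntary switches usually mean more runnable threads than cores.

```python
import resource
import time


def measure_workload(workload):
    """Run workload() and report wall time, CPU time and context switches (Unix only)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.perf_counter()
    workload()
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_SELF)

    # User + system CPU time consumed by all threads of this process.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "wall_s": wall,
        "cpu_s": cpu,
        # ~N for N busy threads => CPU-bound; well below 1 => mostly waiting.
        "cpu_per_wall": cpu / wall if wall > 0 else 0.0,
        # Voluntary: a thread blocked (I/O, lock). Involuntary: a thread was preempted.
        "voluntary_ctx_switches": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary_ctx_switches": after.ru_nivcsw - before.ru_nivcsw,
    }
```

For example, measure_workload(lambda: time.sleep(0.2)) shows a cpu_per_wall near 0, which is what a process blocked on I/O looks like.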
If it is CPU-bound, I would keep the number of threads close to the number of cores on the box. If it is I/O-bound, I would tune the I/O (though I doubt that is the problem here).
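A minimal sketch of capping the worker count at the core count, assuming the files contain whitespace-separated integers; sum_file and sum_files are hypothetical names. One caveat specific to Python: in CPython the GIL keeps pure-Python parsing from running on several cores at once, so for truly CPU-bound work concurrent.futures.ProcessPoolExecutor (which takes the same max_workers argument) is the usual swap. The sizing rule is the same either way.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def sum_file(path):
    """Sum the whitespace-separated integers in one file."""
    with open(path, "rb") as f:
        return sum(int(tok) for tok in f.read().split())


def sum_files(paths):
    # Never run more workers than cores (os.cpu_count() may return None)
    # or than there are files to process; always use at least one.
    workers = min(len(paths), os.cpu_count() or 1) or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_file, paths))
```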
Edit: As Anayonkar suggested, reading the files in bigger chunks is a good idea anyway.
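As a sketch of what chunked reading could look like in Python, again assuming whitespace-separated integers: iter_numbers is a hypothetical helper, and the 1 MiB default chunk size is an arbitrary starting point to tune. The only subtle part is a number that straddles two chunks, which is held back and joined with the next read.

```python
def iter_numbers(path, chunk_size=1 << 20):
    """Yield integers from a whitespace-separated file, reading chunk_size bytes at a time."""
    leftover = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data = leftover + chunk
            tokens = data.split()
            # If the chunk ends mid-number, keep the last token for the next read.
            if not data[-1:].isspace():
                leftover = tokens.pop()
            else:
                leftover = b""
            for tok in tokens:
                yield int(tok)  # int() accepts bytes such as b"-42"
    if leftover:
        yield int(leftover)  # last number when the file has no trailing newline
```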