How to decide the optimal number for "maximum threads to be running at a time".

Monica. Shiralkar
Ranch Hand

Joined: Jul 07, 2012
Posts: 670
Using a thread pool, I created a multithreaded application that reads millions of files sequentially; different threads then process these files and write to different output files. The multiple threads are there to finish the overall task faster. There is a parameter for the maximum number of threads running at any time, and I have set it to some number. How do I decide what this number should be? Should it be 25, 50, 100 or 1000? Please advise.

thanks
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42946
    
If they're all writing to the same hard disk, then a number closer to 10 would probably be more adequate - the number of write heads the disk has plays a role in this. It also depends on the ratio of processing to file I/O those threads are doing. If, for example, the processing time is 10 times as large as the I/O time, it may make sense to use more threads. Then it also depends on how many CPU cores the machine in question has.

This is an area where you should do tests with 5, 10, 25, 50, 100 threads to see what problems arise.
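
Such a test could be sketched roughly as below. This is only an illustration, not a proper benchmark: the class name PoolSizeBenchmark, the method timeRun and the sleeping placeholder task are assumptions made up for this example, and the placeholder would have to be replaced by the real read/process/write work on the real hardware.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical harness: times the same batch of tasks with different pool sizes.
class PoolSizeBenchmark {

    // Runs taskCount copies of the task on a fixed-size pool and returns elapsed milliseconds.
    static long timeRun(int threads, int taskCount, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(task));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface any task failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): stands in for handling one real file.
        Runnable task = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int threads : new int[] {5, 10, 25, 50, 100}) {
            System.out.println(threads + " threads: " + timeRun(threads, 200, task) + " ms");
        }
    }
}
```

Running each setting several times and on the production machine matters more than the harness itself, since a single run can easily be distorted by disk caches or other processes.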
Mike. J. Thompson
Ranch Hand

Joined: Apr 17, 2014
Posts: 305
    
You may find that having multiple threads writing to the same disk is worse than having one thread writing, because it causes the write head to constantly move around whenever a thread context switch happens. You might want to try having multiple threads doing the processing, then a single thread writing to the disk from a queue.

The only way you'll know for sure is to perform tests on the hardware this will run on in production.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 40052
    
… and the optimum number of threads may change when you buy a new server.
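
One way to cope with that is to avoid hard-coding the number. The sketch below reads the pool size from a system property and falls back to the core count; the class name PoolSizeConfig and the property name app.maxThreads are invented for this example, and the fallback is only a starting point for testing, not a recommendation.

```java
// Hypothetical helper: resolves the thread limit at startup instead of compiling it in.
class PoolSizeConfig {

    // Assumed property name; set it with -Dapp.maxThreads=15 on the command line.
    static final String PROPERTY = "app.maxThreads";

    static int resolve() {
        // Fallback when the property is absent or not a valid integer.
        int fallback = Runtime.getRuntime().availableProcessors();
        int configured = Integer.getInteger(PROPERTY, fallback);
        return Math.max(1, configured); // never return a non-positive pool size
    }
}
```

The result can be passed straight to Executors.newFixedThreadPool(PoolSizeConfig.resolve()), so a new server only needs a changed startup flag and a fresh round of tests.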
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19073
    

It is also dependent on what other programs (or the OS) are doing.


As a side story, years ago, I was asked to check the same thing. In this case, it was an application that was doing a ridiculous amount of I/O. Lots of messages coming in from the network, that needed to get to the disk, with almost zero processing. What is the optimum number of threads for the disk write side? In that case, the recommendation was for the disk writes to be single threaded.

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Monica. Shiralkar
Ranch Hand

Joined: Jul 07, 2012
Posts: 670
Henry Wong wrote:In that case, the recommendation was for the disk writes to be single threaded.

How can I make this happen? The processing happens in the run method of each thread, and each run method also writes to a file. Since the writing is inside the run method, the writing is multithreaded too. Once the run method has been called for each thread, how can I make the writing part single-threaded?

thanks
Mike. J. Thompson
Ranch Hand

Joined: Apr 17, 2014
Posts: 305
    
You would need an extra thread responsible for writing data to files on disk. This thread would take jobs from a queue and process them. All of your data-processing threads would add file-writing jobs to the queue ready to be written.

Do remember though, we are only saying this might be more efficient. It will depend on what system you are running on, and all the other factors mentioned above.
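
A minimal sketch of that arrangement follows, assuming each queued job carries both the target file and the content to write. The names SingleWriter and WriteJob, the queue capacity of 1000 and the choice to append as UTF-8 text are all assumptions for illustration, not something from the original application.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-writer: many threads queue jobs, one thread touches the disk.
class SingleWriter {

    // One unit of work: which file to write to and what to put in it.
    static final class WriteJob {
        final Path target;
        final String content;

        WriteJob(Path target, String content) {
            this.target = target;
            this.content = content;
        }
    }

    // Sentinel job that tells the writer thread to stop (a "poison pill").
    private static final WriteJob STOP = new WriteJob(null, null);

    // Bounded so that fast producers block instead of exhausting memory.
    private final BlockingQueue<WriteJob> queue = new LinkedBlockingQueue<>(1000);
    private final Thread writer = new Thread(this::drain, "file-writer");

    void start() {
        writer.start();
    }

    // Called by the processing threads; blocks while the queue is full.
    void submit(Path target, String content) {
        put(new WriteJob(target, content));
    }

    // Queues the stop marker behind all pending jobs and waits for the writer to finish.
    void stopAndWait() {
        put(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writer", e);
        }
    }

    private void put(WriteJob job) {
        try {
            queue.put(job);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing", e);
        }
    }

    // Runs on the writer thread only: takes jobs in FIFO order and appends them to their files.
    private void drain() {
        try {
            while (true) {
                WriteJob job = queue.take();
                if (job == STOP) {
                    return;
                }
                try {
                    Files.write(job.target, job.content.getBytes(StandardCharsets.UTF_8),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                } catch (IOException e) {
                    // Sketch-level error handling: report and keep serving other jobs.
                    System.err.println("Failed to write " + job.target + ": " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Each processing thread would call submit(outputPath, result) at the end of its run method instead of writing the file itself, and the main thread would call stopAndWait() once all processing has finished.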
Monica. Shiralkar
Ranch Hand

Joined: Jul 07, 2012
Posts: 670

Mike. J. Thompson wrote:You would need an extra thread responsible for writing data to files on disk. This thread would take jobs from a queue and process them. All of your data-processing threads would add file-writing jobs to the queue ready to be written.

Best performance in case of this application was seen with 15 threads.

How do I write the contents of each thread to a different file using a queue? In what form would the data be stored in the queue? While writing from the queue, how will the program know which data goes into which separate file?
Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    

Monica. Shiralkar wrote:How to write contents of each thread in a different file using queue? In what way data would be stored in a queue? While writing from queue how will program know which data will go in which separate file?

Those are pretty good questions - things you will have to answer in order to come up with a design. But instead of asking us to design it for you, why don't you ask the questions to yourself and see if you can come up with a design. Then you can come back to us with your design and we can comment on it for you?


Steve
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8427
    

Steve Luke wrote:Those are pretty good questions - things you will have to answer in order to come up with a design. But instead of asking us to design it for you, why don't you ask the questions to yourself and see if you can come up with a design. Then you can come back to us with your design and we can comment on it for you?

@Monica: And I'd go a step further and ask yourself: why would I want to do this?

You may have perfectly good reasons, but it sounds to me like you're coming up with an implementation before you've worked out WHAT you're trying to solve; and that's generally not a good way to work.

Make your design fit the problem; not the other way around.
Specifically, don't say: "I'll use multi-threading[, or reflection, or a Queue...] to solve this", unless you've already thought through at least some other alternatives and can actually justify your choice. Multi-threading (and reflection) are NOT simple, and can lead to all sorts of obscure and difficult-to-detect errors if you're not very careful. They're also infernally difficult to test properly.

So my general rule of thumb is: Use only if absolutely necessary.

It's possibly also worth noting that a queue is a single pipeline, which would seem to defeat the object of using threads to begin with; but as I say, I could well be wrong, and you may have a perfectly good reason for doing it.
Using threads for direct I/O, on the other hand, makes a lot of sense, since I/O tends to be very slow compared to program code, so if you can have several threads working in parallel, some of them can continue while others are blocked.

Winston

Isn't it funny how there's always time and money enough to do it WRONG?
Articles by Winston can be found here
Monica. Shiralkar
Ranch Hand

Joined: Jul 07, 2012
Posts: 670
Thanks all for all the help.

Winston Gutkowski wrote:Multi-threading (and reflection) are NOT simple, and can lead to all sorts of obscure and difficult-to-detect errors if you're not very careful

Using multithreading was a decision and design of senior people in our company. What are some of the things one has to be careful about when using multithreading?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42946
    
Lots more than can easily be explained in forum posts. Work through one of the two books mentioned in the ThreadsAndSynchronizationFaq, and you should have a better grasp on the issues. http://docs.oracle.com/javase/tutorial/essential/concurrency/index.html is also a good start, but it's more about what features Java has when it comes to multi-threading, than discussing the pitfalls, and when to use what.
 