
How to decide the optimal number for "maximum threads to be running at a time".

 
Monica Shiralkar
Ranch Hand
Posts: 825
1
Using a thread pool, I created a multithreaded application that reads millions of files sequentially; different threads then process these files and write to different output files. The multiple threads are there to get the overall task done faster. There is a parameter for the maximum number of threads running at any time, which I have set to some number. How do I decide what this number should be? Should it be 25, 50, 100 or 1000? Please advise.

thanks
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
If they're all writing to the same hard disk, then a number closer to 10 would probably be more adequate; the number of write heads the disk has plays a role in this. It also depends on the ratio of processing to file I/O those threads are doing. If, for example, the processing time is 10 times as large as the I/O time, it may make sense to use more threads. It also depends on how many CPU cores the machine in question has.

This is an area where you should do tests with 5, 10, 25, 50, 100 threads to see what problems arise.
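
A minimal sketch of what such a test could look like, assuming a fixed thread pool from java.util.concurrent. The class name PoolSizeTrial and the method processOne are hypothetical; processOne is only a stand-in for the real per-file work and does no disk I/O, so the numbers it prints say nothing about your application until the real task is plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PoolSizeTrial {

    // Hypothetical stand-in for "process one file"; replace with the real work.
    static int processOne(int fileIndex) {
        int sum = 0;
        for (int i = 0; i < 10_000; i++) {
            sum += (fileIndex + i) % 7;
        }
        return sum;
    }

    // Runs taskCount tasks on a fixed pool of poolSize threads
    // and returns the elapsed time in nanoseconds.
    static long timeRun(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int fileIndex = i;
                results.add(pool.submit(() -> processOne(fileIndex)));
            }
            for (Future<Integer> f : results) {
                f.get(); // wait for completion and surface any task failure
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        int[] candidates = {5, 10, 25, 50, 100};
        System.out.println("cores: " + Runtime.getRuntime().availableProcessors());
        for (int size : candidates) {
            long nanos = timeRun(size, 2_000);
            System.out.printf("%d threads: %d ms%n", size, nanos / 1_000_000);
        }
    }
}
```

Run each candidate several times and ignore the first pass, since JIT warm-up and file-system caching can distort a single measurement.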
 
Mike. J. Thompson
Bartender
Pie
Posts: 689
17
You may find that having multiple threads writing to the same disk is worse than having one thread writing, because it causes the write head to move around constantly whenever there is a thread context switch. You might want to try having multiple threads doing the processing and a single thread, fed from a queue, writing to the disk.

The only way you'll know for sure is to perform tests on the hardware this will run on in production.
 
Campbell Ritchie
Sheriff
Pie
Posts: 47281
52
… and the optimum number of threads may change when you buy a new server.
 
Henry Wong
author
Marshal
Pie
Posts: 20836
75
It is also dependent on what other programs (or the OS) are doing.


As a side story, years ago, I was asked to check the same thing. In this case, it was an application that was doing a ridiculous amount of I/O. Lots of messages coming in from the network, that needed to get to the disk, with almost zero processing. What is the optimum number of threads for the disk write side? In that case, the recommendation was for the disk writes to be single threaded.

Henry
 
Monica Shiralkar
Ranch Hand
Posts: 825
1
Henry Wong wrote: In that case, the recommendation was for the disk writes to be single threaded.


How do I make this happen? The processing happens in the run method of each thread, and each run method also writes to a file, so the writing part is multi-threaded too. Once the run method has been called for each thread, how can the writing part be made single threaded?

thanks
 
Mike. J. Thompson
Bartender
Pie
Posts: 689
17
You would need an extra thread responsible for writing data to files on disk. This thread would take jobs from a queue and process them. All of your data-processing threads would add file-writing jobs to the queue, ready to be written.

Do remember though, we are only saying this might be more efficient. It will depend on what system you are running on, and all the other factors mentioned above.
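
A minimal sketch of that arrangement, assuming a BlockingQueue shared between the processing threads and one writer thread. Everything named here (SingleWriterSketch, WriteJob, POISON, the out-N.txt file names) is hypothetical and only for illustration. Each job carries its own target path, which is one way for the writer to know which data belongs in which file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class SingleWriterSketch {

    // One unit of work for the writer: where to write and what to write.
    static final class WriteJob {
        final Path target;
        final String content;
        WriteJob(Path target, String content) {
            this.target = target;
            this.content = content;
        }
    }

    // Sentinel object that tells the writer thread to stop.
    static final WriteJob POISON = new WriteJob(null, null);

    // Writer loop: the only code that writes output to disk.
    static void drain(BlockingQueue<WriteJob> queue) {
        try {
            while (true) {
                WriteJob job = queue.take();
                if (job == POISON) {
                    return;
                }
                Files.write(job.target, job.content.getBytes(StandardCharsets.UTF_8));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Processes inputCount hypothetical inputs on workerCount threads,
    // producing one output file per input via the single writer thread.
    static void run(Path outputDir, int inputCount, int workerCount) throws InterruptedException {
        // Bounded, so the processing threads block instead of outrunning the disk.
        BlockingQueue<WriteJob> queue = new LinkedBlockingQueue<>(1_000);
        Thread writer = new Thread(() -> drain(queue), "disk-writer");
        writer.start();

        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < inputCount; i++) {
            final int id = i;
            workers.execute(() -> {
                String result = "processed input " + id; // stand-in for real processing
                try {
                    queue.put(new WriteJob(outputDir.resolve("out-" + id + ".txt"), result));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        queue.put(POISON); // all producers are done; the writer stops after draining
        writer.join();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("writer-sketch");
        run(dir, 20, 4);
        System.out.println("wrote files to " + dir);
    }
}
```

Error handling is deliberately thin here: if a write fails, the writer thread simply dies, so a real version would need to report the failure back to the producers.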
 
Monica Shiralkar
Ranch Hand
Posts: 825
1

Mike. J. Thompson wrote: You would need an extra thread responsible for writing data to files on disk. This thread would take jobs from a queue and process them. All of your data-processing threads would add file-writing jobs to the queue ready to be written.

Best performance for this application was seen with 15 threads.

How do I write the output of each thread to a different file using a queue? In what form would the data be stored in the queue? When writing from the queue, how will the program know which data goes into which file?
 
Steve Luke
Bartender
Pie
Posts: 4181
21
Monica. Shiralkar wrote: How to write contents of each thread in a different file using queue? In what way data would be stored in a queue? While writing from queue how will program know which data will go in which separate file?

Those are pretty good questions - things you will have to answer in order to come up with a design. But instead of asking us to design it for you, why don't you ask the questions to yourself and see if you can come up with a design. Then you can come back to us with your design and we can comment on it for you?
 
Winston Gutkowski
Bartender
Pie
Posts: 9480
50
Steve Luke wrote: Those are pretty good questions - things you will have to answer in order to come up with a design. But instead of asking us to design it for you, why don't you ask the questions to yourself and see if you can come up with a design. Then you can come back to us with your design and we can comment on it for you?

@Monica: And I'd go a step further and ask yourself: why would I want to do this?

You may have perfectly good reasons, but it sounds to me like you're coming up with an implementation before you've worked out WHAT you're trying to solve; and that's generally not a good way to work.

Make your design fit the problem; not the other way around.
Specifically, don't say: "I'll use multi-threading[, or reflection, or a Queue...] to solve this", unless you've already thought through at least some other alternatives and can actually justify your choice. Multi-threading (and reflection) are NOT simple, and can lead to all sorts of obscure and difficult-to-detect errors if you're not very careful. They're also infernally difficult to test properly.

So my general rule of thumb is: Use only if absolutely necessary.

It's possibly also worth noting that a queue is a single pipeline, which would seem to defeat the object of using threads to begin with; but as I say, I could well be wrong, and you may have a perfectly good reason for doing it.
Using threads for direct I/O, on the other hand, makes a lot of sense, since I/O tends to be very slow compared to program code, so if you can have several threads working in parallel, some of them can continue while others are blocked.
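
As a rough illustration of why blocked threads leave room for more of them, one widely quoted rule of thumb (it is not from this thread, and it ignores disk contention entirely) estimates the thread count as cores × target utilization × (1 + wait time / compute time). A small sketch, where the class name ThreadCountEstimate and the timings in main are made-up assumptions:

```java
class ThreadCountEstimate {

    // Rule of thumb: cores * targetUtilization * (1 + waitTime / computeTime).
    // waitTime and computeTime must be in the same unit; only their ratio matters.
    static int estimate(int cores, double targetUtilization, double waitTime, double computeTime) {
        double raw = cores * targetUtilization * (1.0 + waitTime / computeTime);
        return Math.max(1, (int) Math.round(raw));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Hypothetical measurements: each task waits 30 ms on I/O and computes for 10 ms.
        System.out.println("suggested starting point: " + estimate(cores, 1.0, 30.0, 10.0));
    }
}
```

Treat the result only as a starting point for measurement, not as an answer.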

Winston
 
Monica Shiralkar
Ranch Hand
Posts: 825
1
Thanks, all, for all the help.


Winston Gutkowski wrote: Multi-threading (and reflection) are NOT simple, and can lead to all sorts of obscure and difficult-to-detect errors if you're not very careful.


Using multi-threading was a decision and design of senior people in our company. What are some of the things one has to be careful about when using multi-threading?
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
Lots more than can easily be explained in forum posts. Work through one of the two books mentioned in the ThreadsAndSynchronizationFaq, and you should have a better grasp on the issues. http://docs.oracle.com/javase/tutorial/essential/concurrency/index.html is also a good start, but it's more about what features Java has when it comes to multi-threading, than discussing the pitfalls, and when to use what.
 