ParallelStream and jdbc connection

 
Ranch Hand
Posts: 36
Hi everyone,

I have a question about running several database tasks, let's say a database cleanup, with parallelStream.

Let's say:

copyOnWriteArrayList.parallelStream().forEach(
                    item -> dbConnectionAndCleanup());
...
void dbConnectionAndCleanup() {
    try (var c = getConnection();
         ... ) {
        ...
        performDbCleanup();
        ...
    }
}

A lot of data has to be cleaned in the database, so I am wondering how best to improve performance.
I am not sure whether opening a connection inside parallelStream is a good idea.
Or is there an alternative?

Thanks
 
Sheriff
Posts: 28329
I see more than one question here:

1. Should I use a Stream and call parallelStream for the purpose of creating several threads to run parallel database updates?

2. Should I try to do these database tasks in several threads?

3. Is running database tasks in several threads going to improve performance?

Let me try to respond to those:

1. Personally I would use a more obvious way of creating several threads -- create an ExecutorService and have it create the threads and run the tasks for you. But on the other hand parallelStream will decide on your behalf how many parallel threads it uses. This is good because you don't have to make that decision, but it's also bad because you don't get to make that decision.

2. Databases are designed to receive many requests and process them in parallel. It doesn't make a difference that several requests are all coming from you, unless the requests interfere with each other. Presumably when these updates are running, other people should not be using the database to do regular business, but that's true no matter how you run the updates.

3. Impossible to tell. There's a good chance the answer is yes, but a lot depends on how the database is configured.
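
A minimal sketch of the trade-off in point 1, not taken from the original poster's code: it records which threads do the work in each approach. The class and method names are made up for illustration. A parallel stream normally borrows threads from the shared ForkJoinPool.commonPool(), whose default parallelism is derived from the number of available processors, while a fixed thread pool uses exactly the number you pass in.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

class WhoPicksTheThreadCount {

    // Names of the threads a parallel stream happened to use for these items.
    static Set<String> threadsUsedByParallelStream(List<String> items) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        items.parallelStream().forEach(item -> names.add(Thread.currentThread().getName()));
        return names;
    }

    // Names of the threads used when the caller fixes the pool size explicitly.
    static Set<String> threadsUsedByExecutor(List<String> items, int threads)
            throws InterruptedException {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (String item : items) {
            executor.execute(() -> names.add(Thread.currentThread().getName()));
        }
        executor.shutdown();                            // accept no new tasks
        executor.awaitTermination(1, TimeUnit.MINUTES); // wait for queued tasks to finish
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> items = List.of("T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8");
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("parallel stream threads: " + threadsUsedByParallelStream(items));
        System.out.println("fixed pool threads:      " + threadsUsedByExecutor(items, 3));
    }
}
```

The fixed pool never uses more than the three threads requested; how many threads the parallel stream uses depends on the machine it runs on.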
 
Saloon Keeper
Posts: 28321
Obviously you can run multiple database requests concurrently or no database-intensive web application would function effectively.

On the other hand, running multiple parallel requests on the same Connection might be problematic. Otherwise webapp servers wouldn't use Connection Pools.

Bear in mind, also, that each Connection is attached to a different network reply port on the database client machine. The more concurrent Connections, the more you are taking ports from the limited supply available.

Finally, note that you should NOT multi-thread requests in a web application, period. The JEE standard absolutely forbids servlet services from spawning threads. Failure to heed this stricture can potentially crash your server at unpredictable, but almost certainly inconvenient, times. The threads that servlets run under are only on temporary loan for the life of the request.
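
To illustrate the point about not sharing one Connection between threads, here is a hedged sketch in which every task borrows its own connection and gives it back when done. The class name, the method name and the idea that a javax.sql.DataSource (pooled or not) is available are assumptions, not something from the original post.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

class PerTaskConnection {

    // Intended to be called from each worker thread. Every call obtains its own
    // Connection, so no Connection object is ever used by two threads at once.
    static int cleanTable(DataSource dataSource, String cleanupSql) throws SQLException {
        // try-with-resources closes the Statement first, then the Connection;
        // with a pooled DataSource, closing hands the connection back to the pool.
        try (Connection connection = dataSource.getConnection();
             Statement statement = connection.createStatement()) {
            return statement.executeUpdate(cleanupSql);
        }
    }
}
```

With this shape the number of connections open at the same time is bounded by the number of worker threads, which is one reason to keep that number under your own control.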
 
Björn Björnsen
Ranch Hand
Posts: 36


Thanks everyone for the reply,

Maybe I should give a little more detail:
Let's say the service, which runs only against the database, uses the Locker framework, a CopyOnWriteArrayList, an ExecutorService and parallelStream, like the code below.



I am not an expert in Java concurrency. How close is the logic above to the solution that you suggested?
If the solution looks feasible, what is your opinion about a fixed thread pool (5) that matches the parallel degree of the SQL statement on the database side? Should they match, or does it not matter, and might a cached thread pool speed up the process?

Thanks in advance for your reply
 
Paul Clapham
Sheriff
Posts: 28329
If your createDbConnectionAndexecuteSqlStatement is simply going to spawn a new thread, then it makes essentially no difference if you use parallelStream or not. If it isn't clear why, it's because starting a new thread takes very little time so starting 5 threads in parallel takes 1 * very little time whereas starting them sequentially takes 5 * very little time. Not a thing that's worth optimizing as it just raises questions.

Hopefully you're using a JDBC connection pool as Tim says.

As for my opinion about what decisions may or may not optimize your operations, I have no opinion. It would take somebody with knowledge of your database environment to produce an educated guess, and even so "guess" is an important word in that phrase. You might be better off trying various options and see what works better.
 
Tim Holloway
Saloon Keeper
Posts: 28321

Paul Clapham wrote:Hopefully you're using a JDBC connection pool as Tim says.



This of course depends on whether the project is a stand-alone Java application or a web application. Again, spawning threads, much less setting up any sort of thread management system is absolutely forbidden in the processing code of a servlet/JSP which would render the whole idea moot. There are places in a webapp outside of request processing code that are not restricted, but that's another matter.

JDBC Connection pools are very important in web applications, but most stand-alone apps don't need one. Although if you do, the Connection Pool libraries that Apache provides can also be used in stand-alone Java apps.
 
Björn Björnsen
Ranch Hand
Posts: 36
Thanks for the reply,
I am using JDBC to create a connection to an Oracle DB.
Inside the method createDbConnectionAndexecuteSqlStatement() it simply creates a statement to clean some tables (and it calls executeUpdate; sorry, my mistake, it is not executeQuery).
As I mentioned, the SQL statement does the cleanup in parallel with degree (5),
so JDBC asks Oracle to use parallelism to run the task.
My question is about the Java side: is it OK to use ExecutorService like that?
I am wondering whether creating a service and having parallelStream inside it is a good idea for the purpose that I explained.
 
Paul Clapham
Sheriff
Posts: 28329

Björn Björnsen wrote: My question is about the Java side: is it OK to use ExecutorService like that?


An ExecutorService is, these days, the standard tool for creating thread pools and using them to run tasks in parallel. My question would be why would it be necessary to use anything else.

Björn Björnsen wrote: I am wondering whether creating a service and having parallelStream inside it is a good idea for the purpose that I explained.


It just seems weird to me. (And it doesn't let you control the number of concurrent threads either, which seems like it may be a requirement.) Seems to me the obvious way to process these updates is to put them into a Stream and have the forEach() method call something which gets a thread from the ExecutorService and runs the update on that thread. Actually you don't even need a Stream because List already has the forEach method.
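
The approach described above could look roughly like this. It is a sketch under assumptions: cleanTable is a hypothetical stand-in for the real JDBC work, the list is assumed to hold table names, and the pool size of 5 is just the figure mentioned earlier in the thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SubmitEachItem {

    // Hypothetical placeholder for the real cleanup (open a connection, run the SQL).
    static void cleanTable(String tableName) {
        try {
            Thread.sleep(50); // pretend the database is busy for a moment
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hands every list element to the pool and waits until all tasks are done.
    static List<String> cleanAll(List<String> tableNames, int threads)
            throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<String> cleaned = new CopyOnWriteArrayList<>();
        // Plain List.forEach: the list element is passed into the submitted task.
        tableNames.forEach(name -> executor.execute(() -> {
            cleanTable(name);
            cleaned.add(name);
        }));
        executor.shutdown();
        if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
            executor.shutdownNow(); // give up on tasks that are still running
        }
        return cleaned;
    }

    public static void main(String[] args) throws InterruptedException {
        // The order of the printed names may differ from run to run.
        System.out.println(cleanAll(List.of("ORDERS", "AUDIT_LOG", "SESSIONS"), 5));
    }
}
```

The submitting loop itself is sequential and cheap; the parallelism comes entirely from the pool, so its size is the one knob to tune.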
 
Björn Björnsen
Ranch Hand
Posts: 36
Ok great,
so it looks like something like Executors.newCachedThreadPool and the collection's forEach() would be enough for that, right?
The reason I used parallelStream was to make the process faster. (Also, the number of threads is not important on the Java side; it is the Oracle side that should be degree 4.)
E.g. if the CopyOnWriteArrayList has about 100 table names to clean, I do not want to wait to pick them one by one from the list.
 
Tim Holloway
Saloon Keeper
Posts: 28321

I'm not sure that I follow that last. The Oracle server is very heavily parallelized automatically, and if it were not, then running multiple threads on the Java side would just end up queuing work anyway.

The only justification for parallelizing on the Java side would be if you are executing long-running requests, since JDBC is a synchronous protocol.

And even then there are limits. If one or more of your Java threads was running a transaction that interlocked with another thread's transaction, then one thread would be held up by the Oracle server, thus holding up the Java client thread. This could especially be a problem if you're attempting to parallel-load a single table, which would definitely risk dragging down the Oracle server's performance.

All in all, we're really not seeing the advantages that the more complicated multi-threaded approach would allow. Or at least, based on anything I've ever done, I certainly don't.

It also might be worth considering whether a JDBC feature such as batching would help.
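
For reference, JDBC batching looks roughly like the sketch below. Statement.addBatch and Statement.executeBatch are standard JDBC; the class name, the method name and the assumption that the cleanup consists of plain DELETE-style statements are mine. Whether a given driver actually sends the batch in fewer round trips is something to measure, not assume.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class BatchedCleanup {

    // Queues all cleanup statements on one Statement and sends them together.
    // Returns the update counts in the same order as the statements were added.
    static int[] runBatch(Connection connection, List<String> cleanupStatements)
            throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String sql : cleanupStatements) {
                statement.addBatch(sql); // queued locally, nothing executed yet
            }
            return statement.executeBatch(); // submits the whole batch
        }
    }
}
```

This keeps everything on one connection and one thread, so it is an alternative to the multi-threaded design rather than an addition to it.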
 
Björn Björnsen
Ranch Hand
Posts: 36
OK, so I will drop the parallel stream and go with Executors.newFixedThreadPool(5).
 
Björn Björnsen
Ranch Hand
Posts: 36
I am back again

I have created a test class to roughly simulate what we do in the real scenario:



In the results, using a parallel stream makes the process much faster,
and using ExecutorService makes no difference at all.
It looks like dropping ExecutorService and just invoking a parallel stream does what I need.

What do you think?
 
Paul Clapham
Sheriff
Posts: 28329

Björn Björnsen wrote: In the results, using a parallel stream makes the process much faster,
and using ExecutorService makes no difference at all.



Indeed that's true if "using ExecutorService" refers to your commented-out code, which simply creates a single thread to run everything one element at a time. That's a pretty biased comparison, compared to a design where "runCleanupTask" actually submits one of the database update processes which you're running, so that they run in parallel threads.

And when you use something which takes 1 second to run as your test task, it's also hardly surprising that it runs very quickly. I'm assuming that your database update processes take more than one second to run?

Anyway what I'd suggest is this: Write code which you like and understand. Run it and see what happens. If it's good enough then there's no need to mess with it any more. If not, then evaluate in what way it isn't good enough and address that specific problem.
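
The point about the biased comparison can be reproduced without a database. This is a sketch with made-up names, in which a 100 ms sleep stands in for one cleanup task (an assumption, chosen only to keep the run short): the same ten tasks go once to a single-thread executor and once to a pool of five.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BiasedBenchmark {

    // Stand-in for one database cleanup: just blocks for a while.
    static void fakeCleanup() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submits the given number of tasks and returns the wall-clock time in ms.
    // The executor is shut down here, so pass in a fresh one for each call.
    static long millisFor(ExecutorService executor, int tasks) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            executor.execute(BiasedBenchmark::fakeCleanup);
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("single thread: "
                + millisFor(Executors.newSingleThreadExecutor(), 10) + " ms");
        System.out.println("five threads:  "
                + millisFor(Executors.newFixedThreadPool(5), 10) + " ms");
    }
}
```

On an otherwise idle machine the first figure should be close to ten sleeps and the second close to two, because the single-thread executor runs the tasks one after another.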
 
Paul Clapham
Sheriff
Posts: 28329
As for what I would do, here's my version (based on your example code):


Notice that you have to get whatever it is which is in your list (maybe it's an SQL statement?) and pass it via parameters through to the task which is submitted to the ExecutorService.
 
Enthuware Software Support
Posts: 4885
Another point that you might want to consider is that just creating multiple threads does not ensure parallel execution in practice, because parallel execution depends on the runtime environment of the JVM (specifically, CPU cores).
So, using a copyOnWritearraylist.parallelStream.forEach({ dbConnectionAndCleanup() } )   (copied from your first post)  might be a better option than using any kind of thread pool because elements of a parallelStream will be processed by an optimum number of threads.

There are, of course, many unknowns in your example as others have pointed out, so it may not be possible to determine which approach will be better for sure unless you see the actual results.
 
Paul Clapham
Sheriff
Posts: 28329

Paul Anilprem wrote:So, using a copyOnWritearraylist.parallelStream.forEach({ dbConnectionAndCleanup() } )   (copied from your first post)  might be a better option than using any kind of thread pool because elements of a parallelStream will be processed by an optimum number of threads.



True. But also note: all of those threads will be doing nothing but waiting for a long-running JDBC execute() command to complete. So the optimum number of those threads depends entirely on what else the computer is being used for at that time. Maybe the parallelStream() method takes that into account? I don't know. This is another of those unknowns you mentioned, so I'm even more of the mind to just try something plausible and see how it goes.

Of course if the first iteration of "see how it goes" takes the database down for several hours then that would be a bad thing, but then this project does need a fair amount of considering the effect on the database server since that's where all of the work gets done.
 