JavaRanch » Java Forums » Databases » JDBC
Approaches for OLTP on huge datasets in RDBMS

Paul Anilprem
Enthuware Software Support
Ranch Hand

Joined: Sep 23, 2000
Posts: 3308
    
Hi,

We are currently working on a project where the application needs to manipulate millions of rows at a time. The queries join multiple tables, some of which contain more than 100 million rows. For example, we have one query that selects a million rows based on some criteria and then inserts those rows again with changes to a few fields. The select and insert happen in the same statement. We have several such queries, which are formed dynamically, and a single query can run to a few hundred lines.
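A stripped-down sketch of that select-and-insert-in-one-statement pattern, for concreteness (table, column, and sequence names here are invented for illustration):

```sql
-- Select the matching rows and re-insert them with some fields changed,
-- all in a single statement -- no row-by-row work in the application.
INSERT INTO orders (order_id, customer_id, status, created_at)
SELECT order_seq.NEXTVAL, customer_id, 'REPROCESSED', SYSDATE
FROM   orders
WHERE  status = 'FAILED'
AND    created_at > SYSDATE - 30;
```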

The queries are optimized, the tables are partitioned, temp tables are used, indexes are in place, and sessions are parallelized. In short, we are doing what we can, but the dataset itself is really huge. We add more than a million rows per day to some tables.

We are using Oracle's Exadata servers to host the database, which is quite expensive. Things are working, but really slowly.

I am curious to learn what other approaches might be useful for developing this kind of functionality, where you have to manipulate millions of rows without resorting to batch mode.

Also, I am interested in knowing whether free RDBMSs such as MySQL and PostgreSQL are capable of handling such sizes, and in what ways other people have leveraged these databases to do similar work.

thank you,
Paul.


Enthuware - Best Mock Exams and Questions for Oracle/Sun Java Certifications
Quality Guaranteed - Pass or Full Refund!
Steven Squeers
Ranch Hand

Joined: Aug 19, 2013
Posts: 62
Sounds like you have a performance tuning problem. There are so many ways that SQL and PL/SQL (if you're using Oracle) can be optimised; is it possible that you've not exhausted the possibilities?
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1728
    

Sounds like you've already done many of the things I would normally suggest, and I don't know what optimisations Exadata might already be performing. But here are a few common tricks you might be able to try.

If you're doing a lot of inserts, then you might actually be better off with fewer indexes, as they have to be updated when you insert records (or update indexed columns). You might consider dropping/re-building some indexes after big transactions - this will cause the indexes to be re-built for all your changes in one go instead of for each individual record, but it may take a long time.
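A sketch of the rebuild approach in Oracle (the index name is invented):

```sql
-- Let the bulk load skip maintenance on the disabled index.
ALTER SESSION SET skip_unusable_indexes = TRUE;
ALTER INDEX orders_status_ix UNUSABLE;

-- ... run the big INSERT / UPDATE here ...

-- One rebuild picks up all the new rows in a single pass.
ALTER INDEX orders_status_ix REBUILD PARALLEL 4 NOLOGGING;
```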

Also think about disabling/re-enabling FK constraints before/after big transactions e.g. foreign keys can be checked after you've inserted all the data instead of for each record individually.
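For example (table and constraint names invented):

```sql
ALTER TABLE orders DISABLE CONSTRAINT orders_customer_fk;

-- ... bulk insert ...

-- ENABLE re-validates every existing row; ENABLE NOVALIDATE trusts the
-- existing data and only checks future changes -- faster, but riskier.
ALTER TABLE orders ENABLE CONSTRAINT orders_customer_fk;
```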

Make sure your table and index partition keys are consistent e.g. you can partition indexes in the same way as tables, so the index/table can be queried by the partition key instead of having to search all partitions. Inserts may also be faster if the table/index partition keys are consistent, as the DB only needs to modify the relevant partitions.
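In Oracle terms that's a LOCAL index, which is automatically equipartitioned with its table (names invented):

```sql
-- Assuming orders is range-partitioned on created_at: this index gets
-- one partition per table partition, so pruning applies to both.
CREATE INDEX orders_cust_ix ON orders (customer_id) LOCAL;
```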

Look at how your inserts are being performed - can you use the APPEND hint for direct-path writes to disk (assuming Exadata works the same as Oracle in this respect)?
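A sketch, assuming plain Oracle direct-path behaviour (names invented):

```sql
-- Direct-path insert: blocks are written above the high-water mark,
-- bypassing the buffer cache and generating minimal undo.
INSERT /*+ APPEND */ INTO orders_archive
SELECT * FROM orders WHERE status = 'CLOSED';
COMMIT;  -- the session cannot re-read the table until it commits
```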

You might also want to look at your data model, as performing a lot of joins with these big operations is going to be slower than working on a simpler table structure e.g. denormalised (or star schema). If you have to keep joining the same data to perform these big operations, consider storing it in one place and accept the overhead of having some duplication in your data and extra maintenance to keep it up to date.

Dynamic SQL is usually slower than static SQL, because it cannot be parsed until runtime (it's also a maintenance nightmare). Make sure you are using proper bind variables in your dynamic SQL, so that the statement doesn't have to be re-parsed if you call it several times with different parameters.
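In PL/SQL terms (names invented):

```sql
-- Bad: the literal is glued into the text, so every distinct value
-- produces a brand-new statement and a fresh hard parse.
EXECUTE IMMEDIATE 'DELETE FROM orders WHERE region_id = ' || p_region;

-- Better: one shared statement, re-executed with different bind values.
EXECUTE IMMEDIATE 'DELETE FROM orders WHERE region_id = :r' USING p_region;
```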

Consider how you might use PL/SQL (is it available on Exadata?) to reduce your use of dynamic SQL e.g. you can pass REF CURSOR variables around to provide alternative queries at runtime, which might allow you to replace some dynamic queries.
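A minimal sketch of the REF CURSOR idea (procedure and table names invented):

```sql
CREATE OR REPLACE PROCEDURE process_orders(p_recent_only IN BOOLEAN) IS
  rc   SYS_REFCURSOR;
  l_id orders.order_id%TYPE;
BEGIN
  -- Pick the query at runtime without concatenating SQL strings.
  IF p_recent_only THEN
    OPEN rc FOR SELECT order_id FROM orders WHERE created_at > SYSDATE - 7;
  ELSE
    OPEN rc FOR SELECT order_id FROM orders;
  END IF;
  LOOP
    FETCH rc INTO l_id;
    EXIT WHEN rc%NOTFOUND;
    NULL;  -- process l_id here
  END LOOP;
  CLOSE rc;
END process_orders;
```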

As for PostgreSQL etc, my very limited experience of PostgreSQL is that it's a good enterprise-level RDBMS for general use, but it lacks a lot of the more sophisticated mechanisms that Oracle provides for managing really large volumes of data e.g. partitioning is much more primitive than in Oracle.


No more Blub for me, thank you, Vicar.
chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1728
    

Couple more thoughts:

It sounds like your queries are reading lots of data from, and writing to, the same table. So you're going to have problems with I/O contention if your data is physically located in the same place, so look at partitioning/striping your data across separate disks etc?
Reading/writing lots of data in a single SQL statement is usually the most efficient way to do it, but sometimes you can hit memory bottlenecks, in which case you might need to look at batching the records e.g. using PL/SQL BULK operations to move smaller sets of records in each iteration. Not sure if this is likely to be an issue on Exadata, but it's something I've run into on regular Oracle RDBMS occasionally. It's usually easier just to give it more memory of course!
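The batched version looks something like this (names invented; note that the per-batch COMMIT trades away single-statement atomicity for a small, steady memory and undo footprint):

```sql
DECLARE
  CURSOR src IS SELECT * FROM orders WHERE status = 'FAILED';
  TYPE t_rows IS TABLE OF src%ROWTYPE;
  l_rows t_rows;
BEGIN
  OPEN src;
  LOOP
    FETCH src BULK COLLECT INTO l_rows LIMIT 10000;  -- bounded memory
    EXIT WHEN l_rows.COUNT = 0;
    FORALL i IN 1 .. l_rows.COUNT
      INSERT INTO orders_retry VALUES l_rows(i);
    COMMIT;  -- keeps undo small per iteration
  END LOOP;
  CLOSE src;
END;
```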
Paul Anilprem
Enthuware Software Support
Ranch Hand

Joined: Sep 23, 2000
Posts: 3308
    
Steven Squeers wrote:Sounds like you have a performance tuning problem. There are so many ways that SQL and PL/SQL (if you're using Oracle) can be optimised; is it possible that you've not exhausted the possibilities?


Yes, it is possible that there is scope for more optimization, and we are working on it. But it doesn't change the fact that there are a million rows that need to be updated and/or inserted in a single query.
Paul Anilprem
Enthuware Software Support
Ranch Hand

Joined: Sep 23, 2000
Posts: 3308
    
Thanks for the suggestions, Chris. We are working on these.

Paul Anilprem
Enthuware Software Support
Ranch Hand

Joined: Sep 23, 2000
Posts: 3308
    
Optimizing the current system (architecture, database structure, and queries) is one line of action that we are already working on.

What I am trying to get at is this: Is there any radically different approach that can do this kind of thing better? If you were to develop an application with similar requirements, what would you do? This is not a data warehouse application. It is a regular OLTP application, but with a huge amount of data. So we thought we needed the best OLTP database, and we chose the Exadata platform (which is basically the Oracle database on top of Oracle's hardware optimized for the database). It is working. The whole thing is a lot faster than when we were using the Oracle database on our regular machines.

But the question is: is that the best approach? Is there any approach we can use to achieve the same result with a cheaper alternative?


The second question is: is it possible to use MySQL or PostgreSQL on hardware similar to that used in Exadata and achieve the same performance? This is basically a comparison of the capabilities of the database software itself. Has anyone here used these databases for managing such large datasets on commodity hardware?
Steven Squeers
Ranch Hand

Joined: Aug 19, 2013
Posts: 62
How Do I Make It Run Faster?
The question in the heading is one I get asked all the time. Everyone is looking for the fast = true switch, assuming "database tuning" means that you tune the database. In fact, it is my experience that more than 80 percent (frequently 100 percent) of all performance gains are to be realized at the application design and implementation level - not the database level. You can't tune a database until you have tuned the applications that run on the database...
Note There are no silver bullets, none. If there were, they would be the default behavior and you would never hear about them...
You will only achieve large increments in performance by fixing the application, perhaps by making it do significantly less I/O.


Sir Thomas of Kyte (Expert Oracle Database Architecture)

There are sooooo many ways you can tune SQL, even silly little things like not using SELECT * if you only need to access a couple of columns, or not updating all the columns in a row when you only need to update one column (because the db won't figure out for you that you're sucking up and overwriting duplicate data). Doing stuff in memory by creating associative arrays, caching data (there are so many ways to cache data - per session, cross session, PL/SQL cache, SQL result cache, whatever). There are whole books on this: on the SQL aspect, on the PL/SQL aspect, on the indexing aspect, on the holistic "tuning Oracle" aspect.
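Two of those little things, sketched (names invented; RESULT_CACHE assumes Oracle 11g or later):

```sql
-- Name the columns instead of SELECT * -- less data read and shipped.
SELECT order_id, status FROM orders WHERE customer_id = 42;

-- Cache a small, hot lookup result across sessions.
SELECT /*+ RESULT_CACHE */ region_id, region_name FROM regions;
```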

I'm sorry I can't myself be of more (or even any) help (sounds like a great challenge though); I hope though that you've also asked this question in a DBA or database forum, because this is a database issue; I'm not sure it's really got anything to do with JDBC.
Best of luck.



chris webster
Bartender

Joined: Mar 01, 2009
Posts: 1728
    

Steven Squeers wrote:I hope though that you've also asked this question in a DBA or database forum, because this is a database issue; I'm not sure it's really got anything to do with JDBC.
Best of luck.

Good point: Have you tried Oracle Technet?
Paul Anilprem
Enthuware Software Support
Ranch Hand

Joined: Sep 23, 2000
Posts: 3308
    
I guess I am not putting my questions in a coherent, unambiguous way. Let me selectively quote from what I wrote above, so as to remove all the noise from my lengthy post:

Yes, it is possible that there is scope for more optimization and we are working on it. Optimizing the current system (architecture, database structure, and queries) is one line of action that we are already working on.



What I am trying to get at is this: Is there any radically different approach that can do these kind of things better? If you were to develop an application that had similar requirement, what would you do?



Second question is, is it possible to use MySQL or PostgreSQL on a similar hardware as the one used in Exadata and achieve the same performance?


So again, yes, thanks a lot for the tips on SQL tuning. We ARE working on it. But I'm just wondering what other approaches people have taken in this situation (besides tuning queries). For example, if you want certain performance from your system, you can either spend money (cost of labor) and time on tuning the queries and database, or spend money on expensive hardware (or a combination of the two).


Yes, this has nothing to do with JDBC. I was looking for a database forum to post it but this was the closest I could find.

-Paul.

Steven Squeers
Ranch Hand

Joined: Aug 19, 2013
Posts: 62
Paul Anilprem wrote:

Yes, this has nothing to do with JDBC. I was looking for a database forum to post it but this was the closest I could find.



https://forums.oracle.com/community/developer/english

There are many subforums. Pretty sure you should be able to find your way around. You'll need to register of course before you can post a question.
Also, presumably you have a paid Oracle support contract, no? I don't know how they work but presumably you're entitled to assistance through that.

As to this:

For example, if you want certain performance from your system you can either - spend money (cost of labor) and time on tuning the queries and database or you can spend money on expensive hardware (Or a combination of the two).


Received wisdom is that you should always try to do the former (but if you can't, there's always Exadata...but then you're already using that).

As to your questions: *we get it*, we *know* that what you really want is somebody to come on here and say "yes, we're running MySQL and it happily chomps through millions of rows per second/minute/whatever". It's just not likely, I think, that you're going to get such a response here. I'm new here, but it seems to me this is a JDBC developer's forum. What you're asking is an architect-level/system design/implementation/enterprise DBA question. Try the Oracle forums; get on a MySQL forum, there must be a few of them around. One quick search on Google reveals this one: http://forums.mysql.com/ with 58k questions, for starters.
One thing you can say for sure is, somebody on this blue marble either is or is not using MySQL with data loads at least as large as your own. Best ask away in the mysql forum, on stackoverflow.com, dbforums.com, wherever, I dunno - there are millions of forums. If I were you I'd be asking the same question on 10 different forums.
One thing NOBODY can say for sure, apart from your team, is what options are open to you besides switching your DB. To answer that question requires an intimate familiarity with your set-up. And that means paying somebody who "knows stuff" (if you don't have such a person available) to come and take a look.
There are many ways to skin a rabbit eh.

Like I say, best of luck, sorry I can't be of any help.
Cheers
SS





Paul Anilprem
Enthuware Software Support
Ranch Hand

Joined: Sep 23, 2000
Posts: 3308
    
Steven Squeers wrote:
Paul Anilprem wrote:

Yes, this has nothing to do with JDBC. I was looking for a database forum to post it but this was the closest I could find.


Thank you for your suggestions.
 