Problem with large number of results

Abdullah Khalaf
Greenhorn

Joined: Dec 02, 2009
Posts: 12
Dear all,
Hope this finds you well.
I posted this in the Other JSE/JEE APIs forum.
I have an issue and I need an architectural solution or opinion on it.
I have these three tables:
1- Users table 2- Groups table 3- UserGroups table

The issue is that I have a service that allows users (from the Users table) to subscribe to specific groups (from the Groups table); I manage this in the UserGroups table.
The UserGroups table contains two columns: group id (which is unique in the Groups table) and user id (which is unique in the Users table) -- a many-to-many relationship --
The Users table may contain over 1,000,000 users.
The group id is a number (1, 2, ..., 8), so I use this SQL statement

I'm using this SQL statement to get the whole thing as one result set.
I'm using JDBC (a prepared statement and result set), calling setFetchSize on both with a specific parameter.
The return type is like this: "Map<UserTO, List<GroupsTO>>". UserTO contains the details of the user, and each GroupsTO contains a subscribed group.
The problem is that when I execute this task, it takes a very, very long time and then a timeout exception is thrown!
Any better solution? Or any suggestion? From both sides, the database and the Java code!
My environment is: database: 9.5, application server: IBM WebSphere Application Server 7; the task is hosted by an EJB component deployed as a scheduler in the application server.

Your quick help is highly appreciated.
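The regrouping step described above (turning the flat user/group join rows into one map entry per user) can be sketched like this; simplified String and Integer values stand in for UserTO and GroupsTO, and plain arrays stand in for the JDBC ResultSet, so all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupCollector {
    // Regroups flat (userId, groupId) join rows into one entry per user,
    // the same shape as the Map<UserTO, List<GroupsTO>> in the post.
    static Map<String, List<Integer>> collect(String[] userIds, int[] groupIds) {
        Map<String, List<Integer>> byUser = new LinkedHashMap<>();
        for (int i = 0; i < userIds.length; i++) {
            byUser.computeIfAbsent(userIds[i], k -> new ArrayList<>())
                  .add(groupIds[i]);
        }
        return byUser;
    }
}
```

Note that the grouping itself is cheap; the cost in the scenario above is fetching and holding a million joined rows at once.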
Wendy Gibbons
Bartender

Joined: Oct 21, 2008
Posts: 1107

You are selecting every row in the users table multiplied by the rows in the user_groups table.

I have re-read this, and why do you want every row?
What is your business reason for wanting every single row from the database in memory at once?
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3611

"Database 9.5" - is it DB2?

The query fetches the data of all users, right? That means your result set will contain about a million rows, all of them the result of a group by operation. I cannot imagine how that could be made (and processed) fast.

Why are you loading all users at once? I believe proper indexes might help return the data fast for individual users, but not for all of them at the same time.
Abdullah Khalaf
Greenhorn

Joined: Dec 02, 2009
Posts: 12
Wendy, thank you for your quick response. I select them all in order to send emails containing information related to the group(s), one email per user.
For example, suppose user1 subscribed to group1, group2 and group3; I have to send user1 an email containing content related to these groups, which I read from IBM WCM.

Martin, thank you too for your quick response. I'm sorry I didn't mention that; it is DB2 9.5.
And yes, you are right, about or even more than one million rows. Can you explain more about the indexes? How might they help us?
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3611

Indexes would help if you only wanted the data for one user. When you actually want all the data, the DB will simply read the whole table; that is faster than using an index in such a case.

You should probably find out which component causes the timeout (you cannot send a million emails in a few minutes anyway) and adjust that.

Also, you might tell the database to use a "first row" approach. Some databases allow you to specify whether you'll read all rows or only the first few rows of a result set, and they optimize for that: "all rows" minimizes the total time/work of the DB, but it might take a long while before the DB returns the first row, whereas "first row" minimizes the time to obtain the first row, but the total time/work to obtain all rows this way may be significantly higher than with the "all rows" approach. Since the processing of individual rows will take some time anyway, the "first row" approach might be quite good for you. I don't know whether DB2 can do this (although I suppose it does), and in any case you'd have to test it thoroughly.
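For what it's worth, DB2 does have such a mechanism: an OPTIMIZE FOR n ROWS clause can be appended to a SELECT to tell the optimizer to favor returning the first rows quickly. A minimal sketch (the query text is illustrative, not the poster's original statement):

```java
public class FirstRowHint {
    // Appends DB2's "first row" optimizer hint. With OPTIMIZE FOR n ROWS
    // the optimizer favors a fast time-to-first-row over minimal total
    // query cost; pairing it with Statement.setFetchSize(n) also keeps
    // the driver's network round trips small.
    static String withFirstRowHint(String sql, int n) {
        return sql + " OPTIMIZE FOR " + n + " ROWS";
    }
}
```

For example, `withFirstRowHint("SELECT user_id, group_id FROM user_groups", 20)` yields a query the DB2 optimizer will plan for quick delivery of the first 20 rows.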
E Armitage
Rancher

Joined: Mar 17, 2012
Posts: 892
Just do a few records at a time using the ValueListHandler pattern. http://java.sun.com/blueprints/corej2eepatterns/Patterns/ValueListHandler.html
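One way to sketch the "few records at a time" idea is keyset pagination: in DB2 syntax the page query would look something like `SELECT user_id FROM users WHERE user_id > ? ORDER BY user_id FETCH FIRST n ROWS ONLY`. Below, an in-memory sorted list stands in for the table so the paging logic itself is visible; all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class KeysetPager {
    // Simulates one page of a keyset-paginated query: take the first
    // pageSize ids strictly greater than afterId from an already-sorted
    // id list (the real version would run the DB2 query shown above).
    static List<Integer> page(List<Integer> sortedIds, int afterId, int pageSize) {
        List<Integer> page = new ArrayList<>();
        for (int id : sortedIds) {
            if (id > afterId) {
                page.add(id);
                if (page.size() == pageSize) break;
            }
        }
        return page;
    }
}
```

The caller loops, passing the last id of each page as `afterId` for the next, until an empty page comes back; each page stays small regardless of table size.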
Abdullah Khalaf
Greenhorn

Joined: Dec 02, 2009
Posts: 12
E Armitage, thank you for your help; I'll try this soon.
But I have a question: is my scenario a good solution for huge numbers of records? Is there any other solution?
Jesus Angeles
Ranch Hand

Joined: Feb 26, 2005
Posts: 2057
Your data may grow even more in the future. You will then be affected by memory and processing speed, and by your DB's limitations.

You can try it like this (which I think will be at least 100 times faster, in total DB-related time spent, than what you are doing now):

1. Select all users, e.g. into a List<String> containing only the USERID. (If this list is still humongous, you can break it down based on other business factors, like per department.)
2. Sort the List if you want to.
3. Loop through this List:
3.a. Use your same query, but add the 'where userid = ?' that Wendy mentioned.
3.b. Process this single user (i.e. send an email to him, etc.).

Good luck!
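The steps above can be sketched as a small driver; the Consumer stands in for steps 3.a/3.b (run the per-user 'where userid = ?' query and send the email), and all names here are illustrative:

```java
import java.util.List;
import java.util.function.Consumer;

public class PerUserMailer {
    // Steps 1-3 from the post: the caller supplies the list of user ids
    // (step 1, already sorted if desired, step 2) and a per-user action
    // (steps 3.a/3.b). Returns the number of users processed, so a run
    // cut short by a failure is easy to spot.
    static int processAll(List<String> userIds, Consumer<String> perUser) {
        int processed = 0;
        for (String id : userIds) {
            perUser.accept(id); // e.g. run the per-user query, send the email
            processed++;
        }
        return processed;
    }
}
```

Because each iteration touches only one user's rows, memory stays flat no matter how large the Users table grows, and a failure partway through loses only the current user.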