Resultset with million rows

Pinik Tilli
Greenhorn

Joined: Jul 19, 2006
Posts: 19
Hi,

I would be glad if someone could provide a solution to this problem.

working ENV:
Jdeveloper 10.1.3 - Development
Oracle 9i - Database
Windows XP - OS

I want to retrieve about 30 rows at a time. The table contains over 2 million rows. I would like to begin at the first row and pull 30 rows over the network, then get the next 30, and repeat the process until all rows are displayed.

I tried 2 approaches for this:
1) cache based
2) query based

Using the cache-based approach, I got an out-of-memory error because all 2 million rows can't be cached. That's obvious. This approach worked well for tens of thousands of rows.

Using the query-based approach, I was able to retrieve all the rows by limiting the query; I run the query every time to get the next set of rows. This works absolutely fine, but the performance is not good, and this kind of approach is not acceptable at my workplace.

So I need to work with the result set itself. I tried to set max rows on the statement using:
prepareStatment.setMaxRows(10);
resultset = prepareStatment.executeQuery();

Here I can obtain the first 10 rows, but how do I then get rows 11-20, 21-30, and so on?

Any other suggestions are also acceptable.
Looking forward to your reply.

Thanks.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

What does "displayed" mean in your post? Surely you don't propose for some human being to sit there and click the "Next" button 600,000 times?
Scott Johnson
Ranch Hand

Joined: Aug 24, 2005
Posts: 518
Before you try to cache data, try to improve the performance of your query.

Run an explain and review it. Can you add an index, eliminate a join, or add a where condition to improve performance?

If the above doesn't work, maybe you could run a query for just the keys and cache the keys on the client. Then add the key values to the where condition for each query. (Hopefully the keys are something like integers. 2 million integers don't take up as much memory.)
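That key-caching idea could be sketched like this (the table and column names are invented for illustration; string concatenation is tolerable here only because the keys are numeric — otherwise bind variables are a must):

```java
import java.util.List;
import java.util.stream.Collectors;

// Page through a table by slicing a cached list of integer keys and
// querying only the rows whose keys fall in the current slice.
class KeyPager {
    // Build the query for one 0-based page of the cached key list.
    static String pageQuery(List<Integer> cachedKeys, int page, int pageSize) {
        int from = page * pageSize;
        int to = Math.min(from + pageSize, cachedKeys.size());
        String in = cachedKeys.subList(from, to).stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        return "SELECT * FROM orders WHERE id IN (" + in + ") ORDER BY id";
    }
}
```

The key list itself would come from a one-time query such as `SELECT id FROM orders ORDER BY id`, run once and held on the client.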
Brian Mozhdehi
Ranch Hand

Joined: Aug 17, 2006
Posts: 81
You don't mention whether or not the server where the query runs is hardware-limited in terms of RAM, etc. That said, I'm going to assume it was made relatively recently, and as such there is no reason the server application could not cache 2 million records.

I wonder if you have tried the -Xms and -Xmx memory settings when you start the server-side Java application? By default, Java doesn't grab all available system memory unless you specify the sizing with these settings. -Xms is the initial heap size and -Xmx is the maximum.

Try java -Xms32m -Xmx1024m YourApp (the flags go before the class name, with no space before the size) to allow up to a gig of RAM to be used by your server application. Adjust accordingly if you run out of memory again. If performance slows but you don't run out of memory, raise the -Xms setting as well, say to 64m.

If your machine is hardware-bound to less RAM, and the page size cannot be adjusted, then the last post is the only approach I can think of. But I don't recommend doing 600,000 database I/O operations in anything that needs to be remotely "fast". To that point, I'm just curious why 2 million rows are being retrieved 30 at a time?
Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687

Paul, nice catch


[My Blog]
All roads lead to JavaRanch
steve souza
Ranch Hand

Joined: Jun 26, 2002
Posts: 860
A quick calculation shows

66,667 pages = 2,000,000 rows / 30 rows per page

So if hitting "next" takes 1 second per page, then at 60 seconds a minute and 60 minutes an hour it would take about 18.5 hours for a user to scroll through all the pages (66,667 / (60*60)). (Of course, with lunch and bathroom breaks it would take even longer.)

Anyway, back to the point. Most users don't want to look at that much data. When there is so much data to choose from, you must somehow figure out what they are after. For example, would it help to look up info by customer, or by day, or something like that? Presumably they want to take action on the data. If it is a DSS system, they more likely care about summarization.

With proper indexing, performance can be good even on tables with 2 million rows.
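That back-of-the-envelope calculation can be checked in a couple of lines (class name is made up):

```java
// Quick sanity check of the paging arithmetic in the post above.
class PagingMath {
    // Ceiling division: how many pages of `perPage` rows cover `rows` rows?
    static long pages(long rows, long perPage) {
        return (rows + perPage - 1) / perPage;
    }

    public static void main(String[] args) {
        long pages = pages(2_000_000, 30);
        double hours = pages / 3600.0; // one "next" click per second
        System.out.println(pages + " pages, roughly " + hours + " hours of clicking");
    }
}
```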


http://www.jamonapi.com/ - a fast, free open source performance tuning api.
JavaRanch Performance FAQ
Scott Selikoff
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3704

I think for the person who does reach all 2 million records (after days of clicking) you should add a nice easter egg... like a built-in version of tetris.

In all seriousness, what I think everyone is driving toward is that while your problem may be theoretically solvable, it is unlikely in the real world. It's much more likely that the original problem you're trying to solve (such as exporting all the records in a table) would be accomplished in a completely different way, because even if this did work, you'd kill your network bandwidth and server resources (*especially* if multiple users were performing these queries).


My Blog: Down Home Country Coding with Scott Selikoff
Pinik Tilli
Greenhorn

Joined: Jul 19, 2006
Posts: 19
Hi,

Thanks for all your replies, but unfortunately none of them were helpful to me.

You all have the same question: why do I need to retrieve only 30 rows at a time? That was just an example. The number will be given by the user, so it could be in the hundreds. And I know very well that no one would sit and click "Next" that many times.

So I request that you provide a solution rather than writing unnecessary stuff.

Thanks again.
Pinik Tilli
Greenhorn

Joined: Jul 19, 2006
Posts: 19
Hi Scott,

Thanks for your kind reply.

I worked using the query-based approach. Here is an example of my query:

select * from (select rownum as num, a.* from tablename a) where num >= 1 and num <= 100

NOTE: "100" rows is just an example. It could be any number given by the user.

I run this query for the next 100 rows and repeat the process until all the rows are read. This works fine, but is not acceptable, as I said already.

So I need to work with the result set somehow, and I am trying to retrieve like this. As the given input query is dynamic, I create bean classes using "RowSetDynaClass".

prepareStatment= conn.prepareStatement(sql.toString(),
ResultSet.TYPE_SCROLL_INSENSITIVE,java.sql.ResultSet.CONCUR_READ_ONLY);
prepareStatment.setMaxRows(10);
resultset = prepareStatment.executeQuery();
rowsdc = new RowSetDynaClass(resultset);

Here I can create 10 DynaBeans for the first 10 rows. How can I then retrieve the next 10 and create DynaBeans for those?

Note: The query should be executed only once, and I should be able to move back and forth within the result set to retrieve the rows.

If not this approach, let me know some other feasible approach.

Thanks.
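For what it's worth, that wrapped query can be generalized into a small helper (a sketch with made-up names; real code should bind the bounds instead of concatenating them, and the inner `ROWNUM <=` filter lets Oracle stop fetching early):

```java
// Wrap an arbitrary base query so that only rows [lower, upper] come back.
// ROWNUM must be captured under an alias before a lower bound above 1
// can be applied, hence the two levels of nesting.
class RownumPager {
    static String wrap(String baseSql, int lower, int upper) {
        return "SELECT * FROM (SELECT ROWNUM AS num, a.* FROM (" + baseSql
                + ") a WHERE ROWNUM <= " + upper + ") WHERE num >= " + lower;
    }
}
```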
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Originally posted by Pinik Tilli:
You all have the same question: why do I need to retrieve only 30 rows at a time? That was just an example. The number will be given by the user, so it could be in the hundreds. And I know very well that no one would sit and click "Next" that many times.

So I request that you provide a solution rather than writing unnecessary stuff.
No. The question is why you need to retrieve all 2 million rows. It doesn't matter whether you retrieve them 1 at a time or 1,000 at a time. The whole idea is infeasible and user-hostile.

I'm sorry if you think that is unhelpful. But I don't feel that I should be enabling such a bad idea. I believe that encouraging you to rewrite the design so that it could actually be used by people is helpful.
Pinik Tilli
Greenhorn

Joined: Jul 19, 2006
Posts: 19
Thanks for the reply, Paul. Please read the reply I wrote above regarding this issue. Maybe you can suggest a solution.

Thanks again.
seshu Palamanti
Ranch Hand

Joined: Jul 03, 2004
Posts: 59
I think Pinik wants something like this: Pinik has a method which accepts a SQL query as an input parameter and the number of results the user wants.

public RowSetDynaClass getResults(String sql, int resultNumber)
{
    prepareStatment = conn.prepareStatement(sql,
        ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
    prepareStatment.setMaxRows(resultNumber);
    resultset = prepareStatment.executeQuery();
    rowsdc = new RowSetDynaClass(resultset);
    return rowsdc;
}

So the user gets back a result with the first 10 rows; now the user requests another set of 10 rows. This method is called again, and here is where the real problem starts: he now wants rows 11-20. Simply put, the question is how to maintain the position in the results without modifying the query the user inputs?
[ August 30, 2006: Message edited by: seshu Palamanti ]
Pinik Tilli
Greenhorn

Joined: Jul 19, 2006
Posts: 19
Thanks, Seshu.

I was looking for something like this. Please let me know if you can write a solution.
Purushoth Thambu
Ranch Hand

Joined: May 24, 2003
Posts: 425
In short, I think you can't achieve this without modifying the query and retaining the position in the user session/request context. If you want to pull records incrementally from the DB, then you need to use ROWNUM to restrict the query, something like:

select * from (select rownum as num, a.* from (select * from tablename order by some_key) a) where num between :1 and :2

The important thing is to use ORDER BY in the inner query, to make sure we actually get the next set of records.

All you need to do is bind the values of "num", like (1, 30), (31, 60), in the JDBC call.
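Those bind values follow directly from a 1-based page number; a tiny sketch (names invented):

```java
// Bind values for page n of size s: page 1 -> (1, 30), page 2 -> (31, 60), ...
class PageBounds {
    static int[] bounds(int page, int pageSize) {
        int lower = (page - 1) * pageSize + 1;
        int upper = page * pageSize;
        return new int[] { lower, upper };
    }
}
```

The two values would then be bound with `ps.setInt(1, lower)` and `ps.setInt(2, upper)` before executing the query.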
[ August 30, 2006: Message edited by: Purushothaman Thambu ]
Scott Johnson
Ranch Hand

Joined: Aug 24, 2005
Posts: 518
Why is the extra select necessary? Shouldn't this work:

select * from tablename where rownum between :1 and :2

I'd have to look at the explain to be sure, but it seems like that would be more efficient.
Scott Selikoff
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3704

I think one thing people might be forgetting is that this last technique is unstable. The data could change between queries, so that rows 11-20 on the second query may not contain the data I thought they did. For example, if a record were inserted between rows 4 and 5, then row 11 would be a repeat of row 10. Further, if a record were deleted, say record 5, then all the items would move up and I would never see the previously numbered record 11.

Oh, and to Pinik: speak clearly and say what you mean when posting questions. It sounds like you want to iterate over an arbitrary number of records, but you originally indicated you wanted to iterate over 2 million records 30 at a time. That caught everyone's attention, because it is such a high number that I can think of no viable case for it. There are limits to modern computing, and because of those limits, solutions (especially in the JDBC world) are often based on practical performance considerations rather than on what you would do in a perfect theoretical world.
[ August 30, 2006: Message edited by: Scott Selikoff ]
Scott Johnson
Ranch Hand

Joined: Aug 24, 2005
Posts: 518
Pinik,

Have you researched how to improve the query performance as previously suggested?

Assuming you are not taking Paul's advice, I can think of four options:

1. Run the query once and store the entire result set in memory. Obviously, for a large result set this will use a significant amount of memory.

2. Run a single query, keep the result set open, and iterate through it as the user clicks next. This is inefficient and will not scale well, because you are holding resources (memory, DB connections, etc.) for a potentially long time.

3. Query for a list of key values and cache those on the client. Then run the query (with a list of keys in the where condition) for the next set of data values when the user hits next. This allows you to release connection/result-set resources between queries, but could still use a fair amount of memory for the key list. It also assumes the key list is much smaller than a result set containing all of the data; otherwise you aren't gaining much.

4. Rerun the query each time the user clicks next, as you are doing, but this isn't performing well.

Pick your poison.
Purushoth Thambu
Ranch Hand

Joined: May 24, 2003
Posts: 425
Scott,

4. Rerun the query each time a user clicks next as you are doing, but this isn't performing well.


Why do you think this won't perform? Before we get into the performance part, I first want to share something that was used for exactly this problem in big products like ERP software.

- Each fetch from the DB usually gets not just one page of records but 'n' pages, and those 'n' pages are stored in the session. I have typically seen around 300 records fetched in a single JDBC call. Once the user exhausts those records, another call is made to the DB for the next 300, and this continues up to some maximum configured number of rows (like 5,000). This of course has a small issue:
- The user might miss a record if it is inserted into an already-visited bucket, and an insertion may also cause the user to see duplicate rows on consecutive pages. This is not really a drawback; on the positive side, if the record is inserted after the current bucket/page, the user will see it.
- Trying to achieve read consistency over a large table may not be realistic. I believe this approach is quite acceptable.

Now to the performance of a query that fetches in buckets: I don't know about other databases, but Oracle does this well.

The query:

SELECT * FROM (SELECT a.*, ROWNUM AS num FROM (SELECT * FROM EMP, DEPT, LOC ORDER BY EMP.ENAME) a WHERE ROWNUM <= :2) WHERE num >= :1

(Note that ROWNUM has to be aliased in an inner query; a bare "WHERE ROWNUM BETWEEN :1 AND :2" returns no rows once the lower bound is above 1.)

Oracle will optimize the query so that retrieval from the tables stops when the upper bound on ROWNUM is hit. On the first call the inner query fetches 300 rows, on the 2nd call 600, the 3rd 900, and so on; the outer query trims the rows and gives back just the 300 rows in the requested bucket. If you generate the explain plan you will see something like "COUNT STOPKEY", which indicates rows are fetched with the upper bound as a stop key.

Better still, if you have an index on EMP.ENAME, the inner query's ORDER BY can use it. I got records back as fast as one could expect, with a very low number of consistent and physical gets and without any internal sorts.
[ August 31, 2006: Message edited by: Purushothaman Thambu ]
Scott Johnson
Ranch Hand

Joined: Aug 24, 2005
Posts: 518
Why do you think this won't perform?


Because Pinik said it wasn't performing.

I made some suggestions on tuning the query, but Pinik has not posted the results.
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
Here are some notes I drag out now & then. Have we added anything new here?


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Christophe Verré
Sheriff

Joined: Nov 24, 2005
Posts: 14687

A good question is never answered.

Neither is a misleading one.
Maria Smith
Ranch Hand

Joined: Apr 01, 2004
Posts: 40
Hi everyone

Thankfully I don't have to retrieve 2 million rows. But I need to retrieve 1,000 rows and display 20 rows per page.
I am using a DB2 database and tried this query to limit the number of rows:

select emp_name
,rownumber() OVER (ORDER BY emp_name) AS ROW_NEXT
from EMPLOYEE
ROW_NEXT BETWEEN 20 and 40

But it doesn't seem to work, and I got the following error:
[Error Code: -104, SQL State: 42601] [IBM][CLI Driver][DB2] SQL0104N An unexpected token "(" was found following "". Expected tokens may include: ", FROM INTO ". SQLSTATE=42601


Then I tried this:

select emp_name
,rownumber() OVER ORDER BY emp_name AS ROW_NEXT
from EMPLOYEE
ROW_NEXT BETWEEN 20 and 40


and got this error:
[Error Code: -199, SQL State: 42601] [IBM][CLI Driver][DB2] SQL0199N The use of the reserved word "ORDER" following "" is not valid. Expected tokens may include: ", FROM INTO ". SQLSTATE=42601


I also tried using com.sun.rowset.CachedRowSetImpl, creating a CachedRowSet like this:

CachedRowSet crs = new CachedRowSetImpl();
crs.setPageSize(20);
but I get an error on the setPageSize method.

Someone please help with this, or let me know if there is a better way of doing it.

I really appreciate your help

Maria
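For what it's worth, DB2 only lets you filter on ROW_NUMBER() from an enclosing subselect, and the posted attempts are also missing the WHERE keyword; an untested sketch of how that query could be written:

```sql
SELECT emp_name
FROM (
    SELECT emp_name,
           ROW_NUMBER() OVER (ORDER BY emp_name) AS row_next
    FROM employee
) AS t
WHERE row_next BETWEEN 21 AND 40
```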
seshu Palamanti
Ranch Hand

Joined: Jul 03, 2004
Posts: 59
What really happens if there are 1 million records in a table and you try to retrieve them all through the ResultSet? Does the ResultSet hold all the results in memory (the cause of the OutOfMemoryError)? Let's say we restrict the minimum and maximum rows retrieved: doesn't the ResultSet flush the previous data from memory and fetch more from the connection every time we call rs.next()? Is there any way we can flush the result set before it fetches more from the connection, and possibly avoid the OutOfMemoryError?
Pinik Tilli
Greenhorn

Joined: Jul 19, 2006
Posts: 19
Hi,

Firstly, I thank you all for helping me.

Following Stan James's notes, I attempted the first 3 algorithm approaches as stated. The first 2 approaches do not seem feasible. The 3rd algorithm seems fine, and I have been working on it for the last 2 days.

But even this approach is giving me an out-of-memory error. I am able to retrieve 21,000 rows from the result set, and from then on I get an OutOfMemoryError.

As Seshu suggested, is there any way to flush previous data from memory? If that is possible, then I can retrieve all the rows and the problem will be solved.

Please suggest a solution to overcome this error.

Thanks Again.
Scott Selikoff
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3704
    
    5

You could retrieve them in chunks using multiple result sets. After you finish 20k records, for example, execute a new query and create a new result set. Just make sure your previous object is closed and/or set to null so that it is not reachable. From there you can call System.gc() and hope that it clears the memory properly (not guaranteed).

What are you doing with the data from the result set? If you are storing it all in memory, that may be the source of your memory error rather than the result set itself.
Maria Smith
Ranch Hand

Joined: Apr 01, 2004
Posts: 40
Can someone help with my DB2 issue?

Thanks
seshu Palamanti
Ranch Hand

Joined: Jul 03, 2004
Posts: 59
Thanks for your suggestion, Scott Selikoff, but if we create a new query altogether then we would have to start again from the 1st row of the result, not from the 21,001st row of the result set. Yes, Pinik can verify whether her data structures are causing the OutOfMemoryError, maybe by printing out the result set. So how does this ResultSet work? If we use setFetchSize, does it mean that every time we call rs.next() it fetches the specified number of rows and flushes the old ones? If not, is there any way we can flush the result set?
[ September 01, 2006: Message edited by: seshu Palamanti ]
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30352

Originally posted by Maria Smith:
Can someone help with my DB2 issue?

Thanks

Maria,
The db2 clause for first X rows is "fetch first X rows only"
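As a sketch (hypothetical employee table), that clause looks like:

```sql
SELECT emp_name
FROM employee
ORDER BY emp_name
FETCH FIRST 20 ROWS ONLY
```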


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30352

Originally posted by seshu Palamanti:
Thanks for your suggestion, Scott Selikoff, but if we create a new query altogether then we would have to start again from the 1st row of the result, not from the 21,001st row of the result set.

Seshu,
If you can return the data in a sorted order, you can add a clause to your query to return only the next X results after the last one returned. Suppose you are returning usernames, and the first 20 are adams through charlie. Then you do your next query and add "where ... and username > 'charlie'".
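A sketch of that keyset-style query (hypothetical table and column; the parameter is bound to the last username shown on the previous page):

```sql
SELECT username
FROM users
WHERE username > ?      -- last value from the previous page
ORDER BY username
FETCH FIRST 20 ROWS ONLY
```

Because the query seeks past the last-seen key instead of numbering rows, it stays stable even if rows are inserted or deleted between pages.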
Maria Smith
Ranch Hand

Joined: Apr 01, 2004
Posts: 40
Thank you for your reply. But how do I get the next set of rows?
[ September 05, 2006: Message edited by: Maria Smith ]
Jeanne Boyarsky
internet detective
Marshal

Joined: May 26, 2003
Posts: 30352

You do the same query using the username that was last returned from the previous call.
Maria Smith
Ranch Hand

Joined: Apr 01, 2004
Posts: 40
Thank you for your reply.
That looks very expensive to me, because every time I would be running my query against a million rows just to get my next 20 rows.
Is it possible to write a cursor like "FOR SELECT OF" in DB2? I am using Java and I am unfamiliar with writing cursors for DB2.
Please give me an example of creating a cursor for DB2.
Scott Selikoff
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3704

Have you tried adding indexes to the table so that each query is not expensive? Have you run performance benchmarks to determine whether or not it really is expensive?
Maria Smith
Ranch Hand

Joined: Apr 01, 2004
Posts: 40
I tried CachedRowSet and it is working in my Java class.

crs.setPageSize(20);

while (crs.nextPage())
{
    System.out.println("Page Number: " + n++);
    crs.beforeFirst();
    while (crs.next())
    {
        System.out.println(crs.getString(1));
    }
}

How do I call this method from a JSP so that it gets only 20 rows each time I click the NEXT button?

Gurus, please help me.
[ September 06, 2006: Message edited by: Maria Smith ]
Yellapa Adepu
Ranch Hand

Joined: Jan 17, 2006
Posts: 34
Hi Pinik,
Did you get a solution to the problem? I am in need of the same kind of solution. If you did, can you please share it?

Thanks,


Yellapa Adepu, SCEA