jQuery in Action, 3rd edition
The moose likes JDBC and Relational Databases and the fly likes handling large datasets Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Databases » JDBC and Relational Databases
Bookmark "handling large datasets" Watch "handling large datasets" New topic

handling large datasets

nobu taku

Joined: Aug 06, 2003
Posts: 17
what is the best practice for presenting large datasets to users in a web application, say a 'product search'?
Do you obtain the entire dataset at once and then show 'x' at time?
Do you perform queries that only obtain 'x-sized' chunks of the dataset?
Adeel Ansari
Ranch Hand

Joined: Aug 15, 2004
Posts: 2874
first one is fine and right.
second one didn't make sense to me. sorry.
Bear Bibeault
Author and ninkuma

Joined: Jan 10, 2002
Posts: 63838

The second option makes perfect sense with a database that supports it. Why send all the data if you are only going to throw away most of it? (In-memory caching is another issue).

PostgreSQL, for example, offers the keywords OFFSET and LIMIT with which you can easily obtain only the data necessary for the "current page".

[Asking smart questions] [About Bear] [Books by Bear]
nilesh Katakkar
Ranch Hand

Joined: Oct 27, 2004
Posts: 35
Yeah agree with Bear Bibeault ! Second approach is the only correct approach !

If you have let's say 100,000 rows or 1 million rows and you are only showing lets say 25 rows at a time, you probably will end up overkilling your application by fetching all the data. Just caching all rows on middle tier wont make sense if your data is real time one. Once the data is send back to the client client may want to see next 25 records or provide another filter criteria to reduce the rows retrieved or whatever. Plus with web apps you will never know, how long it will take client to decide to click next. He may he may not. How long will you keep cached data on middle tier ? What happens when some other users modifies the data in the DB ? You will have stale copy of data. Now add 100 users simulteneously quering same data... # of copies on middle tier will increase linearly.

Bear in his message told the way you can do it in POSTgreSQL. In MS Sql server I think there's a TOP N clause. In oracle you can try using Analytical functions rank () etc. or.. other way is to use inline views..

( SELECT rownum r, a.* r FROM
( <YOUR SQL Statement Here Including WHERE GROUP ORDER BY> A
WHERE rownum < :toRow)
WHERE r > :fromRow
[ November 08, 2004: Message edited by: nilesh Katakkar ]

nilesh<br />neilindallas@hotmail.com
Jeanne Boyarsky
author & internet detective

Joined: May 26, 2003
Posts: 33102

Bear is definitely correct that it is best to get less rows.

Note that Nilesh's query still needs to do the whole query. It just returns less data. This can still be a big savings, but limit is often more efficient (if supported.)

[OCA 8 book] [Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Other Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, TOGAF part 1 and part 2
Rudy Simon Yeung

Joined: Jun 06, 2003
Posts: 15
Caching and memory are very expensive. If possible, I would also suggest you setting up validation rules to narrow down the search. This would reduce the number of records returned.


Adeel Ansari
Ranch Hand

Joined: Aug 15, 2004
Posts: 2874
Originally posted by adeel ansari:
first one is fine and right.
second one didn't make sense to me. sorry.

i think i was lost somewhere while replying this.
pardon. was really not in my senses though.

second one is far-better, as Bear and Jeanne said. i used to suggest the same thing, but dont know where were me. anyways thanks for all the comments to wake me up.
I agree. Here's the link: http://aspose.com/file-tools
subject: handling large datasets
It's not a secret anymore!