
handling large datasets

 
nobu taku
Greenhorn
Posts: 17
What is the best practice for presenting large datasets to users in a web application, say a 'product search'?
Do you fetch the entire dataset at once and then show 'x' rows at a time?
Or do you run queries that only retrieve 'x'-sized chunks of the dataset?
 
Adeel Ansari
Ranch Hand
Posts: 2874
The first one is fine and right.
The second one didn't make sense to me. Sorry.
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 64205
The second option makes perfect sense with a database that supports it. Why send all the data if you are only going to throw away most of it? (In-memory caching is another issue).

PostgreSQL, for example, offers the keywords OFFSET and LIMIT with which you can easily obtain only the data necessary for the "current page".
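For example, a minimal JDBC sketch of that approach (the 'product' table, its columns, and the DAO method name are made up for this sketch):

import java.sql.*;
import java.util.*;

public class ProductSearchDao {

    // Fetch only the rows for the requested page; PostgreSQL applies
    // LIMIT/OFFSET so the rest of the result set never leaves the database.
    public List<String> findPage(Connection con, String nameFilter,
                                 int page, int pageSize) throws SQLException {
        String sql = "SELECT name FROM product WHERE name LIKE ? "
                   + "ORDER BY name LIMIT ? OFFSET ?";
        List<String> names = new ArrayList<String>();
        PreparedStatement ps = con.prepareStatement(sql);
        try {
            ps.setString(1, "%" + nameFilter + "%");
            ps.setInt(2, pageSize);         // rows per page
            ps.setInt(3, page * pageSize);  // rows to skip (page is zero-based)
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                names.add(rs.getString("name"));
            }
        } finally {
            ps.close();
        }
        return names;
    }
}

The total row count for the pager would come from a separate SELECT COUNT(*) with the same WHERE clause.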
 
nilesh Katakkar
Ranch Hand
Posts: 35
Yeah, I agree with Bear Bibeault! The second approach is the only correct one!

If you have, let's say, 100,000 or 1 million rows and you are only showing 25 rows at a time, you will probably end up overloading your application by fetching all the data. Caching all the rows on the middle tier won't make sense if your data is real-time, either. Once the data is sent back, the client may want to see the next 25 records, or supply another filter criterion to reduce the rows retrieved, or whatever. Plus, with web apps you never know how long the client will take to decide to click 'next'. He may, he may not.

How long will you keep cached data on the middle tier? What happens when another user modifies the data in the DB? You will be left with a stale copy. Now add 100 users simultaneously querying the same data... the number of copies on the middle tier will grow linearly.

Bear already showed how to do it in PostgreSQL. In MS SQL Server I think there is a TOP N clause. In Oracle you can try analytic functions such as RANK(), or another way is to use inline views:

SELECT * FROM
( SELECT a.*, ROWNUM r FROM
  ( <YOUR SQL Statement Here Including WHERE GROUP ORDER BY> ) a
  WHERE ROWNUM < :toRow )
WHERE r > :fromRow
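If you run that from plain JDBC (which has no named binds like :fromRow and :toRow), you would use positional '?' parameters instead; a rough sketch, with a made-up product query inlined where the real statement would go:

import java.sql.*;

public class OraclePagingExample {

    // Oracle-style inline-view pagination from plain JDBC. The inner
    // "product" query is a placeholder for the real statement, and the
    // caller is responsible for closing the statement and result set.
    public static ResultSet findRows(Connection con, int fromRow, int toRow)
            throws SQLException {
        String sql =
              "SELECT * FROM"
            + " ( SELECT a.*, ROWNUM r"
            + "   FROM ( SELECT * FROM product ORDER BY name ) a"
            + "   WHERE ROWNUM < ? )"  // :toRow
            + " WHERE r > ?";          // :fromRow
        PreparedStatement ps = con.prepareStatement(sql);
        ps.setInt(1, toRow);
        ps.setInt(2, fromRow);
        return ps.executeQuery();
    }
}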
[ November 08, 2004: Message edited by: nilesh Katakkar ]
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 33703
Nobu,
Bear is definitely correct that it is best to fetch fewer rows.

Note that Nilesh's query still has to execute the whole inner query; it just returns less data. That can still be a big savings, but LIMIT is often more efficient (if supported).
 
Rudy Simon Yeung
Greenhorn
Posts: 15
Caching and memory are very expensive. If possible, I would also suggest setting up validation rules to narrow down the search. This would reduce the number of records returned.
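A rough sketch of that idea (the field names and minimum length are made up for illustration):

public class SearchValidator {

    // Refuse to run an unfiltered search, so the query never has to
    // return the whole product table.
    public static void validateSearch(String nameFilter, String category) {
        boolean hasName = nameFilter != null && nameFilter.trim().length() >= 3;
        boolean hasCategory = category != null && category.trim().length() > 0;
        if (!hasName && !hasCategory) {
            throw new IllegalArgumentException(
                "Enter at least 3 characters of the product name "
              + "or pick a category before searching.");
        }
    }
}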

Regards,

Rudy
 
Adeel Ansari
Ranch Hand
Posts: 2874
Originally posted by Adeel Ansari:
The first one is fine and right.
The second one didn't make sense to me. Sorry.


I think I got lost somewhere while replying to this. Pardon, I was really not in my senses.

The second one is far better, as Bear and Jeanne said. I used to suggest the same thing myself, but I don't know where my head was. Anyway, thanks for all the comments that woke me up.
 