There are frequent questions in the web app forums about pagination. When a user search yields many rows, show them page 1 with 1..10, page 2 with 2..20, etc. Nearly always somebody says this is frequently asked and suggests searching for previous replies. I did some of that, and found the number of solution roughly equals the number of replies, and found a few heated arguments as to their value. Could I invite some folks with solid experience to contribute what has really worked for them? I'll start by trying to categorize some of what I've found:
A few notes: Some factors depend heavily on user habits - eg if they rarely page forward or only to page 2 of 100, caching may not pay off. Cached results will not reflect recent changes. That may be good for one app, bad for another, or the changes may be rare enough to ignore. So waddya all think? Can we add more algorithms, fine tune the pros & cons to help ourselves choose in the future?
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
Stan, This looks well researched. One thing I would add is that number two seems to have two parts. The cache could occur in the memory of the app server. It could also occur in a temporary database table (sort of like the table used to persist http sessions.)
Another tactic that I've employed is only available with certain DBMS systems. PostgreSQL for example provides the LIMIT and OFFSET clauses so that you can have the DB itself "slice" the dataset into pages for you. Pros: limits data xfer to and from 2nd tier, easy to program, efficient Cons: SQL statements are database-dependent bear
Been trying to post this for a few hours... Some other factors that effect the final decision are the number of requests returned, the likelihood viewing all results, and the behaviour of the data. Number of requests: if (based on sample or expected queries from users) the results are reasonably small, you may decide to push the load to the server rather than the database. Consider a shopping site (a few hundred items total) versus Google (how many billions?). Viewing all results: if you are in a situation where users are unlikely to view all results returned, only cache the ones they are likely to view. Imagine Google caching 1.7 million page results just in case you want them all. Unlikely. Behaviour of the data: If the results are unlikely to change, it removes on of the CONs for option 1. eg Shopping sites may only change on a monthly basis. There are also a couple of extra factors that can be looked at. One that I hinted at is that based on behaviour of your search, you might want to combine options to enhance performance. If the search size is extremely large, cache as much as you think is reasonable, then load the rest if required. You can also share the load between the server and the database by caching the primary key only and loading each item when it is required. Note also that the search at Google is a bit fuzzy - they don't actually know how many options are returned or how many pages they fill. Have you ever gotten to the second last page of a Google search and found that the link to the last page is no longer available? Another useful limiting factor is only allowing users to jump forward a maximum of 10 pages - this can be used to set the upper limit i you only want to cache part of the result. As you are obviously aware any decision must be based on plenty of factors, which is why I'm not a fan of people saying 'cache the results and load them into page1...10'. Dave
Thanks all for jumping in. Jeanne, I was hinting at a temp db cache with the "or something close to it". There are probably a few fast cache options just off main memory. Bear, we tried some row limiting options on huge queries, but they still buried the db because an ORDER BY clause forced the db to really complete the query even when we only got a few rows. Do the LIMIT and OFFSET type options have similar problems? Is there similar syntax in other db vendors?
Excellent well thought out question. I'll add a couple points. If possible I always prefer no caching. When to refresh a cache can be complex and so I try to avoid it. I have done a hybrid of the approaches (reexecuting the query and caching) when the query itself is expensive. For example instead of returning 1 page with my quer I return a 10 page buffer each time the query is executed. If the user goes past page 10 I reexecute the query and repopulate the buffer. I cache ResultSets by converting them to Object arrays. You can also use CachedRowSet objects, but for several reasons I prefer arrays. 1) You have access to all the array APIs such as sorting and searching. In several of my apps I allow users to sort the cached results by clicking on the column header. 2) The CachedRowSet is not available in all JDBC drivers for all DB vendors. 3) One Advantage of returning Object is all JDBC can be hidden and so you can get the ResultSet more simply than when using JDBC directly. The ResultSetConverter object below is part of my FormattedDataSet API and is a thin wrapper for an array for the ResultSetMetaData and another one for the ResultSet.
Also I rarely give the users the ability to go to a specific page number. Think about it this way if you have 500 pages what would ever make you want to go to page 343 directly? You would have no idea what was on that page. Usually the user wants to go to a page with some specific data on it (salaries over $100,000 for example) and a search capability would serve them better. The first and last pages may be exceptions, but going to numbered pages in general is usually pointless.