Systems Engineering and RDBMS

One more Paging Records Dilemma

Posted by decipherinfosys on September 4, 2007

We have blogged about the server side (DB side) paging of the record-sets before – you can read more on these at these links – link1 and link2. There are many other ways of addressing paging as well – one of those methods is the usage of CTEs (Common Table Expressions) and then using the data from the CTEs for retrieval of the records. CTEs are supported in all three major RDBMS (Oracle, MS SQL Server and DB2 LUW). However, that is not a highly performant solution due to the usage of multiple statements (we had shown that a single statement can help get the paging logic on the server side) and because as the queries become more complex and return a lot of data records, the CTE solution does not scale as well as the alternatives that we had presented before. We will look at CTE and the execution plans in one of our future posts. In this post, we will look at another common dilemma in paging of the record sets. One common requirement when paging the record-sets is to find out the total number of records that are being fetched by the query – this is done to display the x number of records and to form the page links. One common way that this is addressed by developers and DBAs is to issue a separate SQL statement using a count(*) over the query in order to get the total count of the records. Example: We will use the same example as we did for the links above:

select * from (
select x.*, RowNum row_num
from (select *, row_number() over (order by FirstName, LastName) as RowNum from big_test_tbl) x
where RowNum < 26
) as y where row_num >= 1
go

select count(*) from (select * from big_test_tbl) x

The count returned in this case is: 2,336,724

Not only is this slow, it can be wrong as well since the actual query and the execution of the count(*) query is done at two different points in time, this value could easily be wrong representation of the data due to the ongoing DML operations in the system. Another problem is the slowness of this method since the count(*) has to get to the last record for the result set and if the query returns a lot of records, this can be a problem. Remember that in paging, we are trying to optimize the query so that the first x records are returned back faster to us (read the COUNT(STOPKEY) optimization in one of the above links).

So, what could be the potential design changes or potential alternatives? Some that we have used successfully in our client engagements are given below. Not all are acceptable to all businesses in which case, we have to stick with the Count(*) approach.

Alternative #1: When a user runs a query or a search via the web application, you are providing them with this information:

a) The query ran successfully,

b) Here are the top N records (where N is the pagesize that was requested by the user – this is typically configurable per user in a good designed application),

c) If there are N+1 records, you show the “Next” link.

i.e. in this case, you do not tell the end user that you got 1 million records for this search. Chances of an end user going to more than the 5th or the 6th page are very less. They would typically refine their search criteria.

So, If the display of the count of records is not important, one can show the next button by doing this logic: When retrieving the records for the page, one can fetch an extra record and if the record exists, then use it as a flag to show the “next” link for paging. And do not print that extra row on that screen.  That way, you can easily page through to the first page, previous page and the next page at any time.  And there is no hit incurred for doing the counts.

Alternative #2: If one does need to show the number of records that a query qualifies, then one can use guesstimates. What we mean by that is that if you go to google.com and search for “Microsoft”, you will get an approximation – at the top right corner, you will see:

Results 110 of about 605,000,000 for Microsoft.

The keyword in the above statement is “about“. It is a guesstimate since till you get to the last record, there is no way that one can tell how many records were there since those can easily change between page clicks as well. You would have noticed it many times – at times, google results indicate that there are “1-10 of about 100” and if you click on page 10, you would be re-directed to say page 8 since it figures out that there were say only 85 hits. So, how can one do a guesstimate? In Oracle, you can use Oracle Text functionality (using the text index) and use “ctx_query.count_hits” in order to get that guesstimate. It is better than doing count(*) since it is less resource intensive. Alternatively, in Oracle you can also get the estimated cardinality from v$sql_plan view as well. Remember that these are just guesstimates and not real numbers.

Sorry, the comment form is closed at this time.

 
%d bloggers like this: