Systems Engineering and RDBMS

More on Rebuild/Reorganize: Deletion of data and reuse of the space

Posted by decipherinfosys on February 8, 2008

Yesterday, we had discussed the differences between Index Rebuilds and Index Reorganize. In this post, we will talk about how/when the data space created by the deletion of the data gets re-used and when it does not get re-used. That could also effect your decision of when to do the rebuild/re-organize operation. When data is deleted from a table, there is space that gets created on those pages. However, when a subsequent insert of the same data is made, then that same space gets re-used by the system.

Let’s see that with an example by creating a dummy table:

CREATE TABLE DECIPHER_DATA (COL1 INT, COL2 NVARCHAR(500));

And let’s look at the meta-data:

SELECT
alloc_unit_type_desc
,page_count
,avg_page_space_used_in_percent
,record_count
,index_level
,index_depth
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N’dbo.DECIPHER_DATA’), NULL, NULL, ‘Detailed’);

alloc_unit_type_desc                                         page_count           avg_page_space_used_in_percent record_count         index_level index_depth

 ------------------------------------------------------------ -------------------- ------------------------------ -------------------- ----------- -----------

 IN_ROW_DATA                                                  0                    0                              0                    0           0

There is no data in the table, so that is what is being displayed above as well when we queried the DMV. Now, let’s insert some data using a loop:

SET NOCOUNT ON
GO
DECLARE @I INT
SET @I = 1
WHILE (@I <= 500)
BEGIN
INSERT INTO dbo.DECIPHER_DATA (COL1, COL2) VALUES (@I, REPLICATE(‘A’, @I))
SET @I = @I + 1
END

And now, when we execute the same query again:

alloc_unit_type_desc                                         page_count           avg_page_space_used_in_percent record_count         index_level index_depth

 ------------------------------------------------------------ -------------------- ------------------------------ -------------------- ----------- -----------

 IN_ROW_DATA                                                  46                   71.4184457622931               500                  0           1

Let’s also create an unique index on this now:

CREATE UNIQUE INDEX PK_DECIPHER_DATA ON DECIPHER_DATA (COL1);
UPDATE STATISTICS DECIPHER_DATA WITH FULLSCAN, ALL;

The execution of the same command now gives:

alloc_unit_type_desc                                         page_count           avg_page_space_used_in_percent record_count         index_level index_depth

 ------------------------------------------------------------ -------------------- ------------------------------ -------------------- ----------- -----------

 IN_ROW_DATA                                                  46                   71.4184457622931               500                  0           1

 IN_ROW_DATA                                                  3                    55.5720286632073               500                  0           2

 IN_ROW_DATA                                                  1                    0.370644922164566              3                    1           2

You can see the additional recordsa above that are now displayed for the index. Let’s delete the data now and see how many pages do we have left.

DELETE FROM DECIPHER_DATA WHERE COL1 <= 250;
UPDATE STATISTICS DECIPHER_DATA WITH FULLSCAN, ALL;

And now, the meta-data shows:

alloc_unit_type_desc                                         page_count           avg_page_space_used_in_percent record_count         index_level index_depth

 ------------------------------------------------------------ -------------------- ------------------------------ -------------------- ----------- -----------

 IN_ROW_DATA                                                  46                   52.6176179886336               250                  0           1

 IN_ROW_DATA                                                  3                    18.6393254262417               250                  0           2

 IN_ROW_DATA                                                  1                    0.568322213985668              2                    1           2

As you can see from above, the page_count remains the same but the record_count is now reflecting the delete as well. Now, let’s insert the same data again to see whether the same space will get

used or will we see an increase in the page_count:

SET NOCOUNT ON
GO
DECLARE @I INT
SET @I = 1
WHILE (@I <= 250)
BEGIN
INSERT INTO dbo.DECIPHER_DATA (COL1, COL2) VALUES (@I, REPLICATE(‘A’, @I))
SET @I = @I + 1
END

UPDATE STATISTICS DECIPHER_DATA WITH FULLSCAN, ALL;

alloc_unit_type_desc                                         page_count           avg_page_space_used_in_percent record_count         index_level index_depth

 ------------------------------------------------------------ -------------------- ------------------------------ -------------------- ----------- -----------

 IN_ROW_DATA                                                  46                   71.4496046454164               500                  0           1

 IN_ROW_DATA                                                  3                    51.4537558685446               500                  0           2

 IN_ROW_DATA                                                  1                    0.568322213985668              3                    1           2

As you can see from above, the space was re-used. The page_count as well as the record_count is the same as before which means that the space that was created by the delete, since we inserted the same data again, the same space got re-used. And now, if we try to insert more data, we should see an increase:

SET NOCOUNT ON
GO
DECLARE @I INT
SET @I = 501
WHILE (@I <= 1000)
BEGIN
INSERT INTO dbo.DECIPHER_DATA (COL1, COL2) VALUES (@I, SUBSTRING(REPLICATE(‘A’, @I), 1, 500))
SET @I = @I + 1
END

UPDATE STATISTICS DECIPHER_DATA WITH FULLSCAN, ALL;

alloc_unit_type_desc                                         page_count           avg_page_space_used_in_percent record_count         index_level index_depth

 ------------------------------------------------------------ -------------------- ------------------------------ -------------------- ----------- -----------

 IN_ROW_DATA                                                  116                  83.2163948603904               1000                 0           1

 IN_ROW_DATA                                                  5                    70.39782554979                 1000                 0           2

 IN_ROW_DATA                                                  1                    0.963676797627873              5                    1           2

The page_count as well as the record_count has increased now.

There are a couple of key things to understand here. One, that if a delete operation happens and you have a chance of re-using that deleted space in subsequent operations, there is no need to do the rebuild of the index. However, if there are going to be holes created because of the deletes that are never going to be filled up, then you should do the reorganize/rebuild to reclaim that space and reduce fragmentation. As an example: Suppose we have a system in which there is Employee information. EMP_MASTER_ID is a sequentially created number using IDENTITY (or a SEQUENCE in the case of Oracle/DB2), then if there are deletes from this table, then these holes will never get filled because the new employees will be assigned new ID values which will be of a higher number. Now, consider that there is an index on EMP_FIRST_NAME and suppose we have data like:

Matt
Rocky
Nancy
.
.
.

And say that Rocky leaves the company and the record is now removed from this table and is archived. There is a gap (hole) now. And say we hire someone called Ray now – when we make the entry into the system, this entry will make use of the space in the index that was left vacant when Rocky left the company. So, bottom line is that you need to understand your data and how the indexes are used. In one scenario above (EMP_MASTER_ID), one should look for a particular threshold and then rebuild/reorganize the index while in the other case (EMP_FIRST_NAME), one should probably not rebuild it. There is no reason to get rid of that space only to re-use it at a later stage. So, if there is a good probability that the data that you are indexing can re-use the slots, rebuilding can actually slow one down over a period of time since it takes time to split the pages. With empty slots, the chances of splitting are less (this is also controlled by the fillfactor in SQL Server). But like any other advise, please benchmark this in your environment. Capture the performance statistics, IO’s etc. against the index and then after rebuilding compare the results again. If you do not see any performance advantage, that defeats the rebuild under such scenarios.

Sorry, the comment form is closed at this time.

 
%d bloggers like this: