The choice – use CTAS or Delete rows
Posted by decipherinfosys on February 27, 2009
Last night while working on a client issue, we had a need to purge a large amount of data from a big table in the Oracle schema. Here is what we were dealing with:
- This was on Oracle 10gR2,
- A single table with a little over 20 million rows,
- Total size was close to 100GB,
- It also had one column with the BLOB data-type,
- It had 5 indexes and was also involved in foreign key relationships – 2 tables referencing it and it referencing 1 table,
- We needed to delete close to 5 million records because of an issue with the interface code that pumped those records into this table,
- We needed to reclaim the space as well, and
- We had a maintenance window of 59 minutes to do so since every hour, the scheduled job pumps in the data records into this table and then the processing logic starts.
Now, three thoughts came to mind –
a) We can use CTAS (Create Table AS), or
b) The regular delete operation (followed by ALTER TABLE MOVE & a lob move & index rebuild or followed by an online redef), or
c) We could: create a dummy table with the same structure, use direct path insert qualifying the records that we needed, create indexes, drop the old table and then re-name the dummy table and collect stats.
Partitioning was not an option here since this was an un-partitioned table – we will be changing that in the days to come. Other options like export/import and unloading the data and then using SQL *LDR, or using the COPY command were ruled out of consideration because of obvious reasons.
Now, when trying out these approaches on the staging system, the gut feeling was that the delete will be slower than the CTAS. However, when I tried this on their staging system (everything is a clone of the production environment – all the settings both at the Unix level as well as the Instance/Database level and the data is a clone from prior night), much to my surprise, I found that the delete for 5 million records was getting done in close to 45 minutes where as CTAS was taking over an hour. Of course, we had taken all the good practices (that we knew) into consideration like using parallel operation for CTAS as well as the nologging option. So, after the client DBAs and we tried out a couple of things, we placed a call to Oracle support to help understand what was going on here. Did take a look at the tkprof trace files including 10053 but nothing was conclusive.
Came to know that the issue was really because of the blob data-type column in the table. When we do a CTAS, it has to copy the blob from one point to the other. When we do a delete, it simply removes the pointer to the blob and makes the space available. And quoting the Oracle support team here: “The lobs are not stored in the rollback since they are versioned in the lob space and if you use nologging, then of course they do not generate redo.” So, essentially, the delete operation had a lesser amount of work to do as compared to the CTAS which is the reason for the time gap difference between the two approaches. Learnt something new. They also mentioned about the “Order BY” and using an index hint when doing the CTAS operation which would improve the speed. Ultimately, using a combination of those hints and the ORDER BY clause, we were able to bring the time with the CTAS down to under an hour and went with that approach.