How do you shrink *right* when required?


I’ve been reading all the articles (and comments) linked from “Stop Shrinking Your Database Files. Seriously. Now.” I understand that shrinking is bad.

What I have a hard time figuring out is how to do it right when it’s actually required.

The backstory is that I have inherited this archive DB with a handful of tables containing PDFs that should have been expired after 5 years, but the procedures to do this on a schedule were never set up. I don’t have the row counts at the moment, but it’s a few million rows and about 3 TB. I guesstimate about half of this will be deleted and never come back.

I’ve been talking to our infrastructure provider’s DBA, but I find the advice they give (delete, reindex, then shrink) to be highly dubious. I’m a system consultant, not a DBA, but it seems I get to play one this time :-/

Paul Randal recommends two approaches:

1

  • Create a new filegroup
  • Move all affected tables and indexes into the new filegroup
    using the CREATE INDEX … WITH (DROP_EXISTING = ON) ON syntax, to move the
    tables and remove fragmentation from them at the same time
  • Drop the old filegroup that you were going to shrink anyway (or shrink it way down if it’s the primary filegroup)
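A minimal sketch of that approach, assuming a new filegroup named ARCHIVE_NEW and an affected table dbo.ArchivedDocuments clustered on DocumentId (the database, file, table, and index names here are illustrative, not from the question):

-- Illustrative names, paths, and sizes throughout; adjust to your environment.
ALTER DATABASE ArchiveDB ADD FILEGROUP ARCHIVE_NEW;

ALTER DATABASE ArchiveDB
    ADD FILE (NAME = N'ArchiveData2', FILENAME = N'D:\SQLData\ArchiveData2.ndf', SIZE = 100GB)
    TO FILEGROUP ARCHIVE_NEW;

-- Rebuilding the clustered index onto the new filegroup moves the table's in-row data
-- and removes fragmentation in one pass.
CREATE UNIQUE CLUSTERED INDEX PK_ArchivedDocuments
    ON dbo.ArchivedDocuments (DocumentId)
    WITH (DROP_EXISTING = ON)
    ON ARCHIVE_NEW;

-- Caveat: off-row LOB data (such as a varbinary(max) PDF column) is tied to the table's
-- TEXTIMAGE_ON filegroup and is not moved by this rebuild.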

Unfortunately I don’t have access to the actual server, just SSMS, and this is quite a bit outside my comfort zone anyway. I’m not actually a DBA…

2

DBCC INDEXDEFRAG or ALTER INDEX … REORGANIZE.
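For reference, the basic form of that second option looks like this (the table name is illustrative):

-- REORGANIZE compacts pages in place and is always an online operation;
-- LOB_COMPACTION also reclaims space from large-object data such as varbinary(max).
ALTER INDEX ALL ON dbo.ArchivedDocuments
    REORGANIZE WITH (LOB_COMPACTION = ON);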

Yes, but how? There are quite a few parameters to choose from, and this seems optimized to get the size absolutely minimal, which is not what I need; what I need is to get rid of the bloat while keeping the DB as healthy as possible.

In other words, I don’t care if the DB grows some; it needs space to operate, and it will grow as document production increases year by year, but the nightly, weekly, or monthly purging will now keep this in check.

I should also add that I can take this DB offline at night without much trouble.

How to solve:


Method 1

The recommendation from your provider’s DBA is sufficient for your needs. I would add the additional step of checking to make sure you don’t have any heaps, however. When you delete records from heaps, the space is not always made available again, and you won’t have a good way of reorganizing it. If you can, add a clustered index to the heap before deleting (or after; dealer’s choice). If you cannot, I would delete, add a clustered index, and then remove it.

-- List user tables stored as heaps (no clustered index).
SELECT T.name AS TableName
    , I.*
FROM sys.indexes AS I
    INNER JOIN sys.tables AS T ON T.object_id = I.object_id
WHERE I.type_desc = 'HEAP';
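If you do find heaps and are able to change them, adding a clustered index is a single statement (the table and column names are illustrative):

-- Illustrative: give the heap a clustered index so that space freed by deletes can be
-- reorganized and released; drop the index afterwards if the table must remain a heap.
CREATE CLUSTERED INDEX CX_ArchiveBlobs_DocumentId
    ON dbo.ArchiveBlobs (DocumentId);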

So, my steps would be:

  • Check for heaps and decide on a course of action.
  • Purge data (preferably in chunks), checkpointing or backing up the log as applicable; a sketch of this follows after this list.
  • Defragment indexes (Ola Hallengren’s scripts will help here if you don’t have your own). Do this before checking the file size so that pages can be readjusted to the correct fill factor. Otherwise, the space-used estimate will be off, since a partially used page is counted as a full page, and you will probably have lots of partially full pages that can be consolidated without impact.
  • Look at the current database file size and space used, then shrink to an appropriate size (I like to leave at least 20% empty space in the file); a sketch follows the space-used query below.
  • Defragment your indexes again since the shrink moved everything around.
  • Take the time to evaluate file growth settings to make sure they make sense for your environment. It is usually better to grow by a fixed size rather than by a percentage, and make sure you have instant file initialization turned on.
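
A minimal sketch of the purge-in-chunks step, assuming the archive table is dbo.ArchivedDocuments with a CreatedDate column and the database uses the FULL recovery model (the names, batch size, retention predicate, and backup path are all illustrative):

DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    -- Delete in batches so the transaction log and locking stay manageable.
    DELETE TOP (50000) FROM dbo.ArchivedDocuments
    WHERE CreatedDate < DATEADD(YEAR, -5, SYSDATETIME());

    SET @rows = @@ROWCOUNT;

    -- FULL recovery: back up the log between batches so the space can be reused.
    BACKUP LOG ArchiveDB TO DISK = N'D:\Backups\ArchiveDB_purge.trn';
    -- SIMPLE recovery: a CHECKPOINT between batches serves the same purpose.
    -- CHECKPOINT;
END;

-- Afterwards, defragment the affected indexes (or run your regular index maintenance,
-- such as Ola Hallengren's IndexOptimize).
ALTER INDEX ALL ON dbo.ArchivedDocuments REBUILD;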

This will tell you how much space is actually being used in the file.

-- Per-file size, growth settings, and internal space used; size values are reported
-- in 8 KB pages, so dividing by 128.0 converts them to MB.
SELECT DB_NAME() AS DatabaseName
    , F.name
    , F.type_desc
    , F.state_desc
    , F.file_id
    , F.growth
    , F.max_size
    , F.is_percent_growth
    , F.physical_name
    , F.size / 128.0 AS CurrentSize_MB
    , FILEPROPERTY(F.name, 'SpaceUsed') / 128.0 AS SpaceUsed_MB
FROM sys.database_files AS F;
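
Once you know the space used, the shrink and growth-setting steps might look like this; the logical file name and the target size are illustrative and should come from the query above plus whatever headroom you want to keep:

-- Shrink the data file to a target size in MB (roughly space used plus ~20% headroom),
-- rather than shrinking it to the absolute minimum.
DBCC SHRINKFILE (N'ArchiveData', 1800000);

-- Grow by a fixed increment instead of a percentage.
ALTER DATABASE ArchiveDB
    MODIFY FILE (NAME = N'ArchiveData', FILEGROWTH = 1024MB);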

Method 2

I’ve been talking to our infrastructure provider’s DBA, but I find the advice they give (delete, reindex, then shrink) to be highly dubious.

Here is why they’re likely recommending that sequence: a shrink operation can only reorganize and release completely empty 8k pages back to the OS. Before you can do that, you have to make sure that the tables and indexes have released their internal free space back to the database. That’s what the reindex (or defrag) does.

Think of a storage unit, where each “unit” is an 8k database page:


A shrink operation will only move entire units around, but a reindex operation can combine contents from partially empty units (from the same table or index, of course).

Now this depends a lot on the kind of deletes you are doing. If you are dropping or truncating entire tables or deleting vast swaths of contiguous rows, it is pretty likely that those pages are being completely emptied, and you can do the shrink right after the delete. (Followed, ideally, by a reindex to fix the fragmentation the shrink helped create.)

In other situations, that won’t be the case. Let’s say you are deleting every other row of a billion-row table. Chances are that each 8k data page still contains some rows for that table, and you won’t see any difference in the amount of reported free space in the data file until you reindex.

Another relevant example: I recently dropped a very large column in an existing table, and since each data page still had all the other columns, I didn’t see any difference in free space until I did a reindex.

In your case, I can’t know for sure, but since you are deleting huge binary values, it seems likely that many of those pages will be completely emptied. Just take a close look at your total internal free space after you do your delete.
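One way to check that internal free space is sys.dm_db_index_physical_stats (the table name here is illustrative; avg_page_space_used_in_percent is only populated in SAMPLED or DETAILED mode):

-- Low avg_page_space_used_in_percent after the delete means a reindex is needed
-- before a shrink will return much space to the OS.
SELECT OBJECT_NAME(ips.object_id) AS TableName
    , ips.index_id
    , ips.index_type_desc
    , ips.alloc_unit_type_desc          -- IN_ROW_DATA vs LOB_DATA
    , ips.page_count
    , ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
        DB_ID(), OBJECT_ID(N'dbo.ArchivedDocuments'), NULL, NULL, 'SAMPLED') AS ips;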

Method 3

I think you need to start facing the real problem here:

archive DB with a handful of tables containing PDFs

While compression won’t do much for you, is there any way to stop storing the PDFs in the DB and replace the blob with a link to an on-disk location?
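That change goes beyond the shrink question itself, but as a rough illustration (the table and column names are hypothetical, and FILESTREAM or FileTable may be worth evaluating as alternatives):

-- Illustrative only: store a pointer to the document on disk instead of the blob itself.
ALTER TABLE dbo.ArchivedDocuments ADD FilePath nvarchar(400) NULL;

-- ...export each PdfBlob value to disk, populate FilePath, and then drop the blob column:
ALTER TABLE dbo.ArchivedDocuments DROP COLUMN PdfBlob;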

Note: Method 1 is the most complete walkthrough and the recommended starting point.
Thank you 🙂

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.
