Space consumed by a column in a nonclustered index


I'm planning to add a key column to an existing nonclustered index on a large table with millions of rows, rather than create a new nonclustered index. Is there any way to know how much extra space the index will consume once the new key column is added? Also, are there any best practices for the data-to-index ratio? As far as I know, it is better not to have more than 5 indexes per table.

Just to be clear, I need to tell the storage team how much extra space the index will consume after the new column is added; only then can I add the column to the existing index.

How to solve:


Method 1

It depends on the size of the values in the column you are adding. As David suggests, the most accurate way to find out is to create the index in a dev or test environment and see what effect it has there.

You can estimate, though. If the added column is 8 bytes long (a datetime column, for instance) and there are 100M rows, then you can expect it to add approximately 800,000,000 bytes (roughly 763 MB) to the index's leaf pages. If it is a variable-width column, then you need to estimate from likely data lengths, or, if you can run a query against the production DB, you can read it from real data using SELECT SUM(DATALENGTH(ColumnBeingAddedToIndex)) FROM TheTable.

This only accounts for the extra data added to the leaf pages in the index, but it should be accurate enough as an estimate as that will be by far the largest factor. There will be a little extra space taken by non-leaf pages too.
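
As a rough sketch of that estimate, the query below totals the bytes the new column would add to the leaf level. The names dbo.TheTable and ColumnBeingAddedToIndex are placeholders rather than objects from the question, and the result ignores per-row overhead, non-leaf pages, fill factor, and compression.

```sql
-- Hypothetical names: replace dbo.TheTable and ColumnBeingAddedToIndex
-- with the real table and the column being added to the index.
SELECT
    COUNT_BIG(*) AS row_count,
    -- Cast to bigint so the total cannot overflow an int on large tables.
    SUM(CAST(DATALENGTH(ColumnBeingAddedToIndex) AS bigint)) AS added_bytes,
    SUM(CAST(DATALENGTH(ColumnBeingAddedToIndex) AS bigint))
        / 1024.0 / 1024.0 AS added_mb
FROM dbo.TheTable;
```

This scans the whole table, so on a table with millions of rows it is probably best run off-peak or against a restored copy. NULL values contribute nothing to the sum, because DATALENGTH returns NULL for them and SUM skips NULLs.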

Also the above does not take compression into account, if that is enabled for your index. Compressed data can be much more difficult to model, so in that case you are back to testing by creating the index on realistic data as the only really accurate way to go.

Without knowing your table/index definitions it is not possible to give a more precise answer.

Method 2

"Is there any way to know how much extra space the index will consume once the new key column is added?"

Create a new index and check its size:

-- Size and row density of every index on user tables
-- (one row per partition for partitioned indexes).
-- Pages are 8 KB, so pages * 8 / 1024 gives megabytes.
select schema_name(t.schema_id) schema_name,
       t.name table_name,
       i.name index_name,
       ps.used_page_count * 8. / 1024 size_mb,
       ps.row_count,
       ps.row_count * 1. / nullif(ps.used_page_count,0) rows_per_page
from sys.tables t
join sys.indexes i
  on t.object_id = i.object_id
join sys.dm_db_partition_stats ps
  on i.object_id = ps.object_id
 and i.index_id = ps.index_id
order by table_name, index_name

"Are there any best practices for the data-to-index ratio? As far as I know, it is better not to have more than 5 indexes per table."

Start with indexes for your candidate keys and foreign keys, then add additional indexes as necessary for query performance, monitoring both missing indexes and unused indexes.
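
For the unused-index side of that monitoring, a minimal sketch is to compare reads and writes per nonclustered index using sys.dm_db_index_usage_stats. The column aliases and sort order are illustrative choices, not something prescribed by the answer.

```sql
-- Nonclustered indexes in the current database, least-read first.
-- An index with many writes and few or no reads is a candidate for review.
SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
    OBJECT_NAME(i.object_id)        AS table_name,
    i.name                          AS index_name,
    ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0)
        + ISNULL(us.user_lookups, 0) AS reads,
    ISNULL(us.user_updates, 0)      AS writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.type_desc = 'NONCLUSTERED'
ORDER BY reads ASC, writes DESC;
```

These counters are reset when the SQL Server instance restarts, so the numbers only mean something after the server has been up through a representative workload period.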

Method 3

If you can't create a new index or alter the index in a lower environment, you can take a representative sample, say 100,000 rows, copy it into a new table, and create your index on that smaller table. If your primary concern is space, use the smallest representative sample that you feel comfortable with. Keep in mind: the smaller the sample, the poorer your final estimate will be.

Once you have the size of the index on your sample table, you can ballpark the size for the real table by multiplying it by the number of rows in the real table and dividing by the number of rows in the sample table.

But again, creating the index on a full-sized table in a lower environment would be the best solution. Keep in mind that the space required will also change as the index becomes fragmented over time as updates, inserts, and deletes occur.
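
The sampling approach described in this method can be sketched as follows. Every object name (dbo.TheTable, dbo.TheTable_Sample, ExistingKeyCol, ColumnBeingAddedToIndex, and the two index names) is a hypothetical placeholder, and using TABLESAMPLE to draw the sample is an assumption; any representative sampling method would do.

```sql
-- All object names here are hypothetical placeholders.
-- 1. Copy roughly 100,000 rows into a scratch table. TABLESAMPLE picks whole
--    pages, so the row count is approximate and the sample is not perfectly random.
SELECT ExistingKeyCol, ColumnBeingAddedToIndex
INTO   dbo.TheTable_Sample
FROM   dbo.TheTable TABLESAMPLE (100000 ROWS);

-- 2. Build the current and the proposed index definitions side by side.
CREATE NONCLUSTERED INDEX IX_Sample_Current
    ON dbo.TheTable_Sample (ExistingKeyCol);
CREATE NONCLUSTERED INDEX IX_Sample_Proposed
    ON dbo.TheTable_Sample (ExistingKeyCol, ColumnBeingAddedToIndex);

-- 3. Scale each sample index size up to the real table's row count.
DECLARE @real_rows bigint =
    (SELECT SUM(row_count)
     FROM   sys.dm_db_partition_stats
     WHERE  object_id = OBJECT_ID('dbo.TheTable')
       AND  index_id IN (0, 1));   -- heap or clustered index rows only

SELECT i.name                          AS index_name,
       ps.row_count                    AS sample_rows,
       ps.used_page_count * 8 / 1024.0 AS sample_mb,
       ps.used_page_count * 8 / 1024.0
           * @real_rows / NULLIF(ps.row_count, 0) AS estimated_real_mb
FROM   sys.indexes AS i
JOIN   sys.dm_db_partition_stats AS ps
  ON   ps.object_id = i.object_id
 AND   ps.index_id  = i.index_id
WHERE  i.object_id = OBJECT_ID('dbo.TheTable_Sample')
  AND  i.name IN ('IX_Sample_Current', 'IX_Sample_Proposed');

-- 4. Clean up the scratch table.
DROP TABLE dbo.TheTable_Sample;
```

The difference between the two estimated_real_mb values approximates the extra space the new key column would need. Because the scratch table is a heap, both sample indexes carry an 8-byte row identifier instead of the real table's clustering key, so the difference between them is likely more trustworthy than either absolute figure.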


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
