If a DB table is 3 TB in size and read speed is 10 MB/s, does it mean that 83 hours are necessary to create a new index?


One of our custom log tables is 3 TB in size. We need to create a new index. Read speed on the disk itself is 10 MB/s.

Does this mean that the index creation process will take 3,000,000 MB / 10 MB/s = 300,000 seconds, or about 83 hours? The index is written to another filegroup on a fast disk, so write time is not a factor.

How to solve:


Method 1

It’s more complicated than that. When you create an index (a B-tree), not only does all the data covered by the index need to be read, but some of it is read multiple times as it is converted to a B-tree. This is because SQL Server specifically uses a balanced B-tree. I’m not sure if there’s any documentation on Microsoft’s exact algorithm under the hood, but this article discusses one practical way of converting an unordered set of data to a balanced B-tree, which is a two-step process.
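To make the "read more than once" point concrete, here is a minimal Python sketch of one common two-step approach: sort the keys, then pack them into nodes from the bottom up. This is an assumption for illustration only. It is not necessarily the method the linked article describes, and it is certainly not SQL Server's actual algorithm. The function name `build_levels` and the `fanout` parameter are hypothetical.

```python
from typing import List


def build_levels(keys: List[int], fanout: int = 4) -> List[List[List[int]]]:
    """Build a balanced, B-tree-like structure as a list of levels.

    levels[0] is the leaf level and levels[-1] is the root level.
    Each level is a list of nodes; each node is a list of keys.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")

    # Step 1: sort the unordered input (first pass over all the data).
    sorted_keys = sorted(keys)

    # Step 2: pack the sorted keys into leaf nodes (second pass over the data).
    level = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
    levels = [level]

    # Keep building parent levels until a single root node remains.
    while len(level) > 1:
        # Each parent entry is the smallest key of one child node.
        separators = [node[0] for node in level]
        level = [separators[i:i + fanout] for i in range(0, len(separators), fanout)]
        levels.append(level)

    return levels


levels = build_levels([7, 3, 9, 1, 5, 8, 2, 6, 4], fanout=3)
for depth, level in enumerate(levels):
    print(f"level {depth}: {level}")
```

Even in this toy version every key is touched at least twice, once by the sort and once when it is packed into a leaf, which is why a single-scan estimate tends to undercount the work.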

Additionally, if any re-sorting occurs during creation, that’ll also add more time to the creation of the index. One example where this can happen is if you’re doing an online index build operation in SQL Server and new rows are added to the table during the operation.

As others have pointed out, 10 MB/s is suspiciously slow for a hard drive. For reference, currently the slowest EBS storage on AWS averages around 65 MB/s (Previous Gen – between 40–90 MiB/s) and I don’t think regular hard drives have been as slow as 10 MB/s since the 90s (though I couldn’t find any sources for this).
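For a sense of scale, the short Python sketch below computes how long one sequential pass over the table would take at the two throughput figures mentioned here. It assumes 3 TB means 3,000,000 MB (decimal units) and ignores everything except raw read time, so treat the result as a rough lower bound for a single scan, not a prediction of index build time. The function name `scan_hours` is made up for this example.

```python
def scan_hours(table_mb: float, read_mb_per_s: float) -> float:
    """Hours needed to read table_mb megabytes once at read_mb_per_s."""
    if read_mb_per_s <= 0:
        raise ValueError("read speed must be positive")
    return table_mb / read_mb_per_s / 3600


TABLE_MB = 3_000_000  # assumption: 3 TB in decimal units

for rate in (10, 65):
    print(f"{rate} MB/s -> {scan_hours(TABLE_MB, rate):.1f} hours")
# 10 MB/s -> 83.3 hours
# 65 MB/s -> 12.8 hours
```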

Depending on the purpose of your index, perhaps a filtered index would be useful to you, if you don’t need to index all rows. In theory it should be faster to create, since it only indexes the subset of the data that matches the filter in its definition. This is useful in cases where you mainly query only a subset of the data, e.g. everything since 2015 or whatever static criteria you want to define in the filter. It also saves space, since less data is being indexed. But please be aware of the limitations as well, which Brent Ozar discusses in What You Can (and Can’t) Do With Filtered Indexes.
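The idea can be tried out without a SQL Server instance. The sketch below uses Python's built-in `sqlite3` module and SQLite's partial index, which is an analogous feature and not SQL Server's filtered index itself. Both take a `WHERE` clause on `CREATE INDEX`, but their limitations differ. The table name `app_log`, its columns, and the 2015 cutoff are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_log (id INTEGER PRIMARY KEY, logged_at TEXT, message TEXT)"
)

# One sample row per year from 2010 through 2020.
rows = [(f"{year}-06-01", f"event {year}") for year in range(2010, 2021)]
conn.executemany("INSERT INTO app_log (logged_at, message) VALUES (?, ?)", rows)

# Partial index: only rows matching the WHERE clause are indexed.
conn.execute(
    "CREATE INDEX ix_app_log_recent ON app_log (logged_at) "
    "WHERE logged_at >= '2015-01-01'"
)

total_rows = conn.execute("SELECT COUNT(*) FROM app_log").fetchone()[0]
indexed_rows = conn.execute(
    "SELECT COUNT(*) FROM app_log WHERE logged_at >= '2015-01-01'"
).fetchone()[0]

print(f"{indexed_rows} of {total_rows} rows fall inside the index filter")
```

On a real log table, the smaller the share of rows that match the filter, the less data the index build has to sort and write.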


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
