Non-clustered Primary Key and Clustered Index

It is my understanding that in SQL Server, you can have a Primary Key that is non-clustered, and have another index that is the clustered one.

To me, this seems the same as just having a Primary Key, and an extra UNIQUE key.

So I have two questions:

  1. If a Primary Key is non-clustered, does it store all the columns with it? Or only the Primary Key columns and the columns referencing the clustered index?

  2. I’ve just read that if the PK isn’t the clustered index, then the clustered index does NOT have to be UNIQUE (but it’s highly encouraged). Does this mean then that the table could be "randomly sorted" on the rows with the same key?

How to solve:

Method 1

if a Primary Key is non-clustered, does it store all the columns with it?

Nope, that is a characteristic of the clustered index, not the primary key. A nonclustered primary key will only store the fields it is defined on plus the clustered index’s key fields.
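To make the answer above concrete, here is a minimal sketch in Python. It is a toy model only, not SQL Server's actual page layout, and the table, column names, and both helper functions are hypothetical. It shows the leaf level of the clustered index holding whole rows, while the leaf level of a nonclustered primary key holds just the primary key columns plus the clustered index key columns.

```python
# Toy model only: this is not how SQL Server lays out pages on disk.
# Table contents and column names are made up for illustration.
ROWS = [
    {"order_id": 3, "customer_id": 10, "order_date": "2024-01-02", "notes": "gift wrap"},
    {"order_id": 1, "customer_id": 11, "order_date": "2024-01-01", "notes": ""},
    {"order_id": 2, "customer_id": 10, "order_date": "2024-01-03", "notes": "fragile"},
]

CLUSTERED_KEY = ("order_date",)  # hypothetical clustered index key
PRIMARY_KEY = ("order_id",)      # hypothetical nonclustered primary key


def clustered_leaf(rows, clustered_key):
    """Leaf level of the clustered index: the full rows, ordered by the key."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in clustered_key))


def nonclustered_leaf(rows, index_key, clustered_key):
    """Leaf level of a nonclustered index: the index key columns plus the
    clustered key columns (the row locator), ordered by the index key."""
    columns = list(index_key) + [c for c in clustered_key if c not in index_key]
    entries = [{c: r[c] for c in columns} for r in rows]
    return sorted(entries, key=lambda e: tuple(e[c] for c in index_key))


data_leaf = clustered_leaf(ROWS, CLUSTERED_KEY)
pk_leaf = nonclustered_leaf(ROWS, PRIMARY_KEY, CLUSTERED_KEY)

for entry in pk_leaf:
    print(entry)  # only order_id and order_date appear here
```

In this model, reading `notes` for a given `order_id` means finding the entry in `pk_leaf` and then using its `order_date` to locate the full row in `data_leaf`.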

…the clustered index does NOT have to be UNIQUE (but it’s highly encouraged). Does this mean then that the table could be "randomly sorted" on the rows with the same key?

The rows of the table will always be logically sorted by the clustered index fields. In the case where the clustered index is not unique and two or more rows share the same clustered index key values, SQL Server adds a hidden 4-byte uniquifier behind the scenes to tell them apart (it is only added to the duplicate key rows, not to rows whose key is already unique). The uniquifier is the determining factor in how those rows are sorted relative to each other.
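The ordering described above can be sketched with a small Python toy model. Everything here is an assumption made for illustration: the `assign_uniquifiers` helper is hypothetical, the first row with a given key is modelled with a uniquifier of 0 (standing in for "no uniquifier stored"), and later duplicates are simply numbered in insertion order. Real storage details are more involved.

```python
from collections import defaultdict


def assign_uniquifiers(rows, key):
    """Return (key_value, uniquifier, row) tuples in insertion order.

    Simplifying assumption: the first row with a given key gets 0
    (modelling "no uniquifier stored"); later duplicates get 1, 2, ...
    """
    seen = defaultdict(int)
    entries = []
    for row in rows:
        value = row[key]
        entries.append((value, seen[value], row))
        seen[value] += 1
    return entries


# Hypothetical rows, listed in the order they were inserted.
inserted = [
    {"city": "Oslo", "name": "first Oslo row"},
    {"city": "Bergen", "name": "only Bergen row"},
    {"city": "Oslo", "name": "second Oslo row"},
    {"city": "Oslo", "name": "third Oslo row"},
]

entries = assign_uniquifiers(inserted, "city")

# Logical order of the clustered index: key first, then uniquifier.
logical_order = sorted(entries, key=lambda e: (e[0], e[1]))
names = [row["name"] for _, _, row in logical_order]
print(names)
```

The point of the sketch is that rows sharing a key are not ordered arbitrarily: the pair (key, uniquifier) is unique, so every row has a well-defined position.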

One reason to make the primary key a nonclustered index is that there can be performance benefits in having the data logically sorted by fields other than the ones that identify uniqueness for the row (regardless of whether the clustered index is unique or not).

An example would be a table whose primary key is a GUID in a UNIQUEIDENTIFIER column. GUIDs are great for uniqueness (most times) but aren’t usually a great way to keep your data sorted because of their random-like values. Instead, you may have a natural set of fields in that table that your queries typically join or filter on, and that aren’t necessarily guaranteed to be unique, which make for a good clustered index. Or a set of fields you typically order on in your queries could be a candidate for the clustered index, to eliminate a heavy sort operation from the query plan. The data will then be sorted in an order that makes sense instead of by the semi-random values of a GUID.

Please see SQL Server Clustered Indexes internals with examples for more information on how clustered indexes work.

Method 2

if a Primary Key is non-clustered, does it store all the columns with
it? Or only the Primary Key columns and the columns referencing the
clustered index?

The index type, clustered or non-clustered, implicitly determines the non-key columns stored in the b-tree index leaf nodes. This is true regardless of whether the index is unique or not, or supports a primary key or unique constraint.

A clustered index organizes the table itself as a b-tree index (in contrast with a heap when no clustered index exists). All columns are stored in the leaf nodes of a clustered index because those are the actual data pages/rows. The unique row locator is the clustered index key, plus an internal uniquifier for non-unique key values when applicable.

A non-clustered index stores the non-clustered index key columns, row-locator (clustered index key), plus included columns (if specified) in the index leaf nodes.
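As a rough illustration of the rule above, the following Python sketch lists which columns a non-clustered index leaf entry would carry and checks whether a query can be answered from those columns alone. It is a toy model with hypothetical names (`leaf_columns`, `needs_lookup`, and the example columns are all invented for this sketch), not a description of how the query optimizer actually decides.

```python
def leaf_columns(index_key, clustered_key, included=()):
    """Columns present in a non-clustered index leaf entry in this toy model:
    index key columns, then the clustered key (row locator), then any
    included columns, without repeating a column."""
    columns = list(index_key)
    for column in list(clustered_key) + list(included):
        if column not in columns:
            columns.append(column)
    return columns


def needs_lookup(query_columns, index_key, clustered_key, included=()):
    """True if the query needs a column the index leaf does not carry,
    meaning the full row must be fetched via the row locator."""
    available = set(leaf_columns(index_key, clustered_key, included))
    return not set(query_columns) <= available


# Hypothetical index on email, with display_name as an included column,
# on a table clustered by customer_id.
cols = leaf_columns(("email",), ("customer_id",), included=("display_name",))
print(cols)

covered = needs_lookup(["email", "display_name"], ("email",), ("customer_id",),
                       included=("display_name",))
not_covered = needs_lookup(["email", "phone"], ("email",), ("customer_id",),
                           included=("display_name",))
print(covered, not_covered)
```

Under these assumptions, a query touching only `email` and `display_name` is satisfied by the index leaf, while one that also needs `phone` has to follow the row locator back to the clustered index.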

I’ve just read that if the PK isn’t the clustered index, then the
clustered index does NOT have to be UNIQUE (but it’s highly
encouraged). Does this mean then that the table could be "randomly
sorted" on the rows with the same key?

A table is logically ordered by the clustered index key (unique or not). Rows with the same clustered index key will be adjacent, not random.

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.