## All we need is an easy explanation of the problem, so here it is.

I have been doing some testing to better understand how SQL Server uses a histogram to estimate the number of rows that will match an equality predicate, and also a `<` or `>` predicate.

I am using the AdventureWorks2016 OLTP database.

I think I understand SQL Server’s estimation process for `=` and `>` predicates. First I update the statistics with a full scan:

```
/* update stats with fullscan first */
UPDATE STATISTICS Production.TransactionHistory WITH FULLSCAN
```

Then I can see the histogram for the column `TransactionHistory.Quantity`:

```
DBCC SHOW_STATISTICS (
'Production.TransactionHistory',
'Quantity')
```

The below screenshot is the top end of the histogram where I have run my tests:

The following query estimates 6 rows. The value in the predicate is a RANGE_HI_KEY, so the estimate uses the EQ_ROWS for that bucket:

```
SELECT *
FROM Production.TransactionHistory
WHERE Quantity = 2863
```

The following query estimates 1.36 rows. The value is not a RANGE_HI_KEY, so the estimate uses the AVG_RANGE_ROWS for the bucket it falls in:

```
SELECT *
FROM Production.TransactionHistory
WHERE Quantity = 2862
```
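The two equality cases above can be checked against the histogram directly. The following is a minimal sketch, assuming SQL Server 2016 SP1 CU2 or later (where `sys.dm_db_stats_histogram` is available) and assuming a statistics object exists with `Quantity` as its leading column. The variable names are illustrative only.

```sql
-- Assumption: a statistics object with Quantity as its leading column exists
-- (for example, one auto-created by the queries above).
DECLARE @object_id int = OBJECT_ID(N'Production.TransactionHistory');
DECLARE @stats_id int =
(
    SELECT TOP (1) sc.stats_id
    FROM sys.stats_columns AS sc
    JOIN sys.columns AS c
        ON c.object_id = sc.object_id
        AND c.column_id = sc.column_id
    WHERE sc.object_id = @object_id
        AND c.name = N'Quantity'
        AND sc.stats_column_id = 1
    ORDER BY sc.stats_id
);
DECLARE @value int = 2862; -- try 2863 for the RANGE_HI_KEY case

-- A value belongs to the first step whose high key is >= the value.
SELECT TOP (1)
    h.step_number,
    h.range_high_key,
    h.equal_rows,
    h.average_range_rows,
    expected_estimate = CASE
        WHEN CONVERT(int, h.range_high_key) = @value THEN h.equal_rows
        ELSE h.average_range_rows
    END
FROM sys.dm_db_stats_histogram(@object_id, @stats_id) AS h
WHERE CONVERT(int, h.range_high_key) >= @value
ORDER BY h.step_number;
```

With freshly updated full-scan statistics, `expected_estimate` should line up with the estimates described above. If the table has changed since the statistics were built, the optimizer may scale the numbers.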

The following "greater than" query estimates 130 rows, which appears to be the sum of the RANGE_ROWS and the EQ_ROWS for all the buckets with a RANGE_HI_KEY > 2863:

```
SELECT *
FROM Production.TransactionHistory
WHERE Quantity > 2863
```
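The "sum of RANGE_ROWS and EQ_ROWS" reading can be tested by summing the histogram steps above the boundary. This is a sketch under the same assumptions as any query against `sys.dm_db_stats_histogram`: SQL Server 2016 SP1 CU2 or later, and a statistics object whose leading column is `Quantity`. The variable names are illustrative.

```sql
-- Assumption: a statistics object with Quantity as its leading column exists.
DECLARE @object_id int = OBJECT_ID(N'Production.TransactionHistory');
DECLARE @stats_id int =
(
    SELECT TOP (1) sc.stats_id
    FROM sys.stats_columns AS sc
    JOIN sys.columns AS c
        ON c.object_id = sc.object_id
        AND c.column_id = sc.column_id
    WHERE sc.object_id = @object_id
        AND c.name = N'Quantity'
        AND sc.stats_column_id = 1
    ORDER BY sc.stats_id
);

-- Every step with a high key above 2863 lies entirely above the predicate
-- boundary, so all of its range rows and equal rows are counted.
SELECT SUM(h.range_rows + h.equal_rows) AS rows_above_boundary
FROM sys.dm_db_stats_histogram(@object_id, @stats_id) AS h
WHERE CONVERT(int, h.range_high_key) > 2863;
```

If the interpretation in the question is right, this should return a value close to the 130-row estimate.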

The query below is similar, but the value is not a RANGE_HI_KEY in the histogram. SQL Server again estimates 130 rows and appears to use the same method as above:

```
SELECT *
FROM Production.TransactionHistory
WHERE Quantity > 2870
```

This all makes sense so far, so my testing moved on to a "less than" query:

```
SELECT *
FROM Production.TransactionHistory
WHERE Quantity < 490
```

For this query, SQL Server estimates 109,579 rows, but I can’t work out where it has got that from:

EQ_ROWS + RANGE_ROWS of all buckets up to and including RANGE_HI_KEY 470 = 109,566, so we are 11 short somewhere.
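The 109,566 figure is the contribution of the steps that lie wholly below the predicate value. A sketch of that sum is below, again assuming `sys.dm_db_stats_histogram` is available (SQL Server 2016 SP1 CU2 or later) and that a statistics object has `Quantity` as its leading column. The variable names are illustrative.

```sql
-- Assumption: a statistics object with Quantity as its leading column exists.
DECLARE @object_id int = OBJECT_ID(N'Production.TransactionHistory');
DECLARE @stats_id int =
(
    SELECT TOP (1) sc.stats_id
    FROM sys.stats_columns AS sc
    JOIN sys.columns AS c
        ON c.object_id = sc.object_id
        AND c.column_id = sc.column_id
    WHERE sc.object_id = @object_id
        AND c.name = N'Quantity'
        AND sc.stats_column_id = 1
    ORDER BY sc.stats_id
);
DECLARE @value int = 490;

-- Steps whose high key is below the predicate value match in full.
-- The step that contains the value (the partial step) is excluded here.
SELECT SUM(h.range_rows + h.equal_rows) AS rows_in_full_steps
FROM sys.dm_db_stats_histogram(@object_id, @stats_id) AS h
WHERE CONVERT(int, h.range_high_key) < @value;
```

Whatever remains between this sum and the optimizer's estimate has to come from the partial step that contains 490.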

How does SQL Server use the histogram to estimate the number of rows that will be returned by a "less than" predicate?

## How to solve :


### Method 1

> for this query, SQL Server estimates 109,579 rows but I can’t work out where it has got that from:
>
> EQ_ROWS + RANGE_ROWS of all buckets up to and including RANGE_HI_KEY 470 = 109,566 so we are 11 short somewhere.

You’re **13 short**, not 11: 109,579 – 109,566 = 13.

The general idea, as shown in my related answer, is to use linear interpolation within the partial step, assuming uniformity.

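In your case, the predicate value 490 falls inside the histogram step with `RANGE_HI_KEY` 500, which follows the step with `RANGE_HI_KEY` 470. Going by the values used in the calculation below, that step has 23 `RANGE_ROWS` spread over 6 `DISTINCT_RANGE_ROWS`.

So the question is how many of those 23 `RANGE_ROWS` do we expect to match the predicate `< 490` when they are assumed to be distributed uniformly within the histogram step with `RANGE_HI_KEY` 500: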

```
DECLARE
@ARR float = 23e0 / 6e0, -- AVG_RANGE_ROWS
@DRR float = 6e0, -- DISTINCT_RANGE_ROWS
@PR float = 490 - 470, -- predicate range
@SR float = 499 - 470 -- whole step range (excluding high key)
SELECT (@DRR - 1) * ((@PR - 1) / @SR) / ((@SR - 1) / @SR) * @ARR;
```

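This computation gives **13.00595**. Added to the 109,566 rows from the steps up to and including `RANGE_HI_KEY` 470, that gives 109,579.00595, which matches the estimate of 109,579.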

The `-1` factors account for using `<`, which is assumed to exclude a `DISTINCT_RANGE_ROW` row. When `<=` is used, that row is assumed to match the predicate.

The whole thing is a modification of applying the fraction of the range you are asking for versus the range covered by the histogram step. Without excluding the unmatched value, it would be simply `@PR/@SR`.
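For readers who want to check the arithmetic by hand, the expression in the T-SQL snippet can be written out with the values substituted. This is only a restatement of that calculation, not an additional rule: the step range of 29 appears in both the numerator and the denominator, so it cancels.

```latex
\[
(6-1)\cdot\frac{(20-1)/29}{(29-1)/29}\cdot\frac{23}{6}
  = 5\cdot\frac{19}{28}\cdot\frac{23}{6}
  = \frac{2185}{168}
  \approx 13.00595
\]
```

In other words, the result depends on the predicate range and step range only through the ratio (20 - 1) / (29 - 1).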


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.