SQL Server Select orders which have all of a set of items

All we need is an easy explanation of the problem, so here it is.

I have the following tables, Order and OrderLine:

Order:

 id | total
 ----------
  1 | 55.09
  2 | 62.42

OrderLine:

 order_id | line_number | item      | qty
 ----------------------------------------
  1       | 1           | Product A | 50
  1       | 2           | Product B | 15
  2       | 1           | Product A | 23

I am looking to construct a query that will select all Orders which contain both Product A and Product B.

Some caveats:

  1. There may be multiple lines that have Product A and Product B. For instance, there could be a third line in the order 1 that has Product A
  2. There can also be situations with many items that need to be requested, not just 2 as it in this case.
  3. I am also looking to query by total quantity. So, for example, only orders which have more than 20 units of Product A in total (across all lines).

My first thought was do an inner join per item, so something like:

SELECT
    T0.id
FROM
    Order T0
    INNER JOIN OrderLine T1 on T1.order_id = T0.id AND T1.ItemCode = 'Product A'
    INNER JOIN OrderLine T2 on T1.order_id = T0.id AND T2.ItemCode = 'Product B'
GROUP BY
    T0.id

However, I’m not sure how to extend this to have the ability to select based on total quantity. Possibly using SUM and HAVING?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

This is a typical example of Relational Division With Remainder, where the number of divisors is unknown.

There are a number of ways to cut this, but fundamentally, the key to solving this neatly is to put your input data into tabular form. This could be a temp table, table variable or Table Valued Parameter.

Here is one typical Relational Division solution. Note that the OUTER APPLY could also be done with a grouped LEFT JOIN.

DECLARE @input TABLE (item varchar(20) PRIMARY KEY, quantity int);
INSERT @input VALUES
('Product A',20),('Product B',NULL);

SELECT
  o.id
FROM Order_tbl o 
WHERE EXISTS (SELECT 1
    FROM @input i
    OUTER APPLY (
        SELECT ol.item
        FROM OrderLine ol
        WHERE ol.item = i.item AND o.id = ol.order_id
        GROUP BY
          ol.item
        HAVING SUM(ol.qty) > i.quantity OR i.quantity IS NULL
    ) ol
    HAVING COUNT(ol.item) = COUNT(*)
);

Another method is a double NOT EXISTS although this is commonly not efficient.

SELECT
  o.id
FROM Order_tbl o 
WHERE NOT EXISTS (SELECT 1
    FROM @input i
    WHERE NOT EXISTS (SELECT 1
        FROM OrderLine ol
        WHERE ol.item = i.item AND o.id = ol.order_id
        GROUP BY
          ol.item
        HAVING SUM(ol.qty) > i.quantity OR i.quantity IS NULL
    )
);

A final option, which is partially used by one of the other answers, is to pre-calculate the count of input values, and then use a normal join. This is often the most efficient.

Again the INNER JOIN could be a CROSS APPLY.

DECLARE @count int = (SELECT COUNT(*) FROM @input);

SELECT
  o.id
FROM Order_tbl o 
WHERE EXISTS (SELECT 1
    FROM (
        SELECT
          ol.item,
          ol.order_id,
          SUM(ol.qty) totalQty
        FROM OrderLine ol
        GROUP BY
          ol.item,
          ol.order_id
    ) ol
    JOIN @input i ON ol.item = i.item AND o.id = ol.order_id
    HAVING COUNT(*) = COUNT(CASE WHEN ol.totalQty > i.quantity OR i.quantity IS NULL THEN 1 END)
       AND COUNT(*) = @count
);

db<>fiddle

Method 2

You can use having count with sum.

Try:

select o.id,o.total
from Order_tbl o 
inner join ( select order_id
             from OrderLine
             where item in ('Product A','Product B')
             group by order_id
             having count(distinct item)=2  and  sum(case when item = 'Product A' then qty end ) > 20
            ) as ol on o.id=ol.order_id;

having count(distinct item)=2 will select only the order_id which have both Product A and Product B values. If you would add new products , include the new one on the where condition and increase 2 with the total number of products .

sum(case when item = 'Product A' then qty end ) > 20 will return only the records where total SUM of Product A > 20

Testing fiddle

Method 3

use a GROUP BY query and place the required condition in the HAVING clause

Condition : Orders which contain both Product A and Product B

having sum(case when l.item = 'Product A' then 1 else 0 end) > 0
and    sum(case when l.item = 'Product B' then 1 else 0 end) > 0

Condition : Orders which have more than 20 units of Product A in total (across all lines).

sum(case when l.item = 'Product A' then qty else 0 end) >= 20

Query:

select o.id
from   [Order] o
       inner join [OrderLine] l on o.id = l.order_id
group by o.id
having sum(case when l.item = 'Product A' then 1 else 0 end) > 0
and    sum(case when l.item = 'Product B' then 1 else 0 end) > 0
and    sum(case when l.item = 'Product A' then qty else 0 end) >= 20

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply