I have the following tables:
Vehicles(v͟i͟n͟, model,category)
Sales(s͟a͟l͟e͟I͟D͟, staffID,customerID,date)
vehicleSold(saleID,v͟i͟n͟,salePrice)
When I join these tables using:
select YEAR(Sales.saleDate)
, Vehicles.model
, count(Vehicles.model) 'Sold'
, Vehicles.category
from Vehicles
JOIN vehicleSold
on Vehicles.vin = vehicleSold.vin
JOIN Sales
on Sales.saleID = vehicleSold.saleID
group
by YEAR(Sales.saleDate)
, Vehicles.model
, Vehicles.category;
Result is:
+----------------------+-------------+------+----------------+
| YEAR(Sales.saleDate) | model       | Sold | category       |
+----------------------+-------------+------+----------------+
|                 2020 | Altima      |    1 | car            |
|                 2020 | Flying Spur |    2 | car            |
|                 2020 | Lifan E3    |    3 | Electric Moped |
|                 2020 | Ridgeline   |    2 | truck          |
|                 2020 | Shiver      |    4 | motorbike      |
+----------------------+-------------+------+----------------+
From this result I want the model that sold the most in each category. In this case, for the car category I want only the row 2020, Flying Spur, car, because it was the best-selling car in 2020. I tried a subquery with MAX(COUNT(*)), but I guess nesting aggregates like that is not supported in MySQL. If anyone could point out my mistake or has any idea how to do this, that would be a big help!
How to solve :
Method 1
This is a good example of why you should upgrade your servers regularly: this would be relatively trivial using window functions (available from MySQL 8.0; see below). However, it is also possible using version 5.6 of MySQL, as follows (all the code below can also be found in this fiddle):
I did this using "pure" SQL – for solutions using MySQL variables, see here (quite a good site for queries generally), or do a search for "greatest-n-per-group MySQL 5.6" or similar terms.
Just to show how much easier this is with window functions, take a look here: they reduce the SQL from 37 lines to 17. The plan for the 5.6 (no window functions) solution is far more complex (I used MySQL 8.0.23; EXPLAIN ANALYZE is available from 8.0.18), and in profiling (8 runs, with the order changed) the 5.6 version takes 50% to 100% longer, sometimes more. The SQL is at the bottom of this answer, and in the fiddle here.
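For readers who want to reproduce the plan comparison, a minimal sketch follows. It assumes a MySQL 8.0 server recent enough to support EXPLAIN ANALYZE, and the vehicle and vehicle_sale tables created further down. The query shown is only the per-model count, not either full solution.

```sql
-- EXPLAIN ANALYZE executes the statement and prints the plan it actually ran,
-- annotated with measured timings and row counts for each step.
EXPLAIN ANALYZE
SELECT v.category, v.model, COUNT(*) AS cnt
FROM vehicle_sale vs
INNER JOIN vehicle v
  ON vs.vs_vin = v.vin
GROUP BY v.category, v.model;
```

Prefixing each of the two full solutions in the same way gives the two plans to compare.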
Solution using SQL – no window functions (fiddle here):
Firstly, I put together the tables as per the question:
CREATE TABLE vehicle
(
vin VARCHAR (50) NOT NULL PRIMARY KEY,
model VARCHAR (30) NOT NULL,
category VARCHAR (30) NOT NULL
);
and:
CREATE TABLE sale
(
sale_id INTEGER NOT NULL PRIMARY KEY
-- nothing useful in sale - maybe rethink design?
);
and:
CREATE TABLE vehicle_sale
(
vs_vin VARCHAR (50) NOT NULL,
vs_sale_id INTEGER NOT NULL,
vs_price INTEGER NOT NULL,
CONSTRAINT vs_vin_fk FOREIGN KEY (vs_vin) REFERENCES vehicle (vin),
CONSTRAINT vs_s_id_fk FOREIGN KEY (vs_sale_id) REFERENCES sale (sale_id)
);
And populate them:
INSERT INTO vehicle VALUES
('v1', 'm1', 'car'),
('v2', 'm2', 'car'),
('v3', 'm2', 'car'),
('v4', 'm2', 'car'),
('v5', 'm5', 'elmo'),
('v6', 'm6', 'elmo'),
('v7', 'm6', 'elmo'),
('v8', 'm6', 'elmo'),
('v9', 'm9', 'truck'),
('v10', 'm10', 'truck'),
('v11', 'm10', 'truck'),
('v12', 'm10', 'truck'),
('v13', 'm10', 'truck'),
('v14', 'm10', 'truck');
INSERT INTO sale
VALUES
(1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (12), (13), (14);
INSERT INTO vehicle_sale
VALUES
('v1', 1, 100), -- m1
('v2', 2, 200), -- m2
('v3', 3, 300), -- m2
('v4', 4, 400), -- m2
('v5', 5, 500), -- m5
('v6', 6, 600), -- m6
('v7', 7, 700), -- m6
('v9', 9, 800), -- m9
('v12', 12, 900), -- m10
('v13', 13, 1000), -- m10
('v14', 14, 1100); -- m10
I ran a few exploratory queries first, just to show the process. They can be skipped if you're experienced.
First SQL:
SELECT
v.vin, v.model, v.category,
s.sale_id,
vs.vs_vin, vs.vs_sale_id, vs_price
FROM
sale s
INNER JOIN vehicle_sale vs
ON s.sale_id = vs.vs_sale_id
INNER JOIN vehicle v
ON vs.vs_vin = v.vin;
Result:
vin  model  category  sale_id  vs_vin  vs_sale_id  vs_price
v1   m1     car       1        v1      1           100
v2   m2     car       2        v2      2           200
v3   m2     car       3        v3      3           300
v4   m2     car       4        v4      4           400
v5   m5     elmo      5        v5      5           500
v6   m6     elmo      6        v6      6           600
v7   m6     elmo      7        v7      7           700
v9   m9     truck     9        v9      9           800
v12  m10    truck     12       v12     12          900
v13  m10    truck     13       v13     13          1000
v14  m10    truck     14       v14     14          1100
Now, I don't really understand why you have both a vehicle_sale and a sale table: there is a 1-to-1 correspondence between them, so one of them is surplus to requirements. It also makes the SQL messier, because relating vehicles to sales always takes an extra table, which means more JOINing and more subqueries. Anyway, it can still be done.
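The correspondence between the two tables can be checked rather than assumed. The two queries below are a sketch against the sample tables in this answer (vehicle_sale and sale); the column alias vehicles_in_sale is made up for illustration.

```sql
-- Sales that cover more than one vehicle (would break the 1-to-1 assumption).
SELECT vs.vs_sale_id, COUNT(*) AS vehicles_in_sale
FROM vehicle_sale vs
GROUP BY vs.vs_sale_id
HAVING COUNT(*) > 1;

-- Sales that have no vehicle attached at all.
SELECT s.sale_id
FROM sale s
LEFT JOIN vehicle_sale vs
  ON vs.vs_sale_id = s.sale_id
WHERE vs.vs_sale_id IS NULL;
```

With the sample rows inserted above, the first query returns nothing (no sale covers more than one vehicle), while the second lists sale_id 8 and 10, which were inserted into sale without a matching vehicle_sale row. Inner joins drop those two rows, so they do not affect the counts that follow.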
There are a couple more queries in the fiddle (ask about them if you need to), but our first requirement is a table (i.e. a query result) of models with the number of each model sold, as follows:
SELECT
v.category, v.model,
COUNT(v.model) AS cnt
FROM vehicle_sale vs
INNER JOIN vehicle v
ON vs.vs_vin = v.vin
INNER JOIN sale s
ON vs.vs_sale_id = s.sale_id
GROUP BY v.category, v.model
ORDER BY v.category, cnt DESC;
Result:
category  model  cnt
car       m2     3
car       m1     1
elmo      m6     2
elmo      m5     1
truck     m10    3
truck     m9     1
So, we see that in the car category, we have 3 sales of model m2 and 1 sale of model m1.
BUT we need the greatest-n-per-group of this, where n = 1 and the group is the category (i.e. the top-selling model within each category). So, as a first step, we find the highest count in each category:
SELECT category, MAX(cnt) AS mcnt
FROM
(
SELECT
v.category, v.model,
COUNT(v.model) AS cnt
FROM vehicle_sale vs
INNER JOIN vehicle v
ON vs.vs_vin = v.vin
INNER JOIN sale s
ON vs.vs_sale_id = s.sale_id
GROUP BY v.category, v.model
ORDER BY v.category, cnt DESC
) AS a
GROUP BY category
ORDER BY category;
Result:
category  mcnt
car       3
elmo      2
truck     3
We now have to join this result back to the table (i.e. query result above) containing the models to achieve our final result:
SELECT
a.category, a.model, a.cnt,
y.category, y.mcnt -- the y values are not required anymore! Left in for explanation
FROM
(
SELECT
v.category, v.model,
COUNT(v.model) AS cnt
FROM vehicle_sale vs
INNER JOIN vehicle v
ON vs.vs_vin = v.vin
INNER JOIN sale s
ON vs.vs_sale_id = s.sale_id
GROUP BY v.category, v.model
ORDER BY v.category, cnt DESC
) AS a
JOIN
(
SELECT category, MAX(cnt) AS mcnt
FROM
(
SELECT
v.category, v.model,
COUNT(v.model) AS cnt
FROM vehicle_sale vs
INNER JOIN vehicle v
ON vs.vs_vin = v.vin
INNER JOIN sale s
ON vs.vs_sale_id = s.sale_id
GROUP BY v.category, v.model
ORDER BY v.category, cnt DESC
) AS x
GROUP BY category
ORDER BY category
) AS y
ON a.category = y.category AND a.cnt = y.mcnt
ORDER BY a.category, a.cnt;
Result:
category  model  cnt  category  mcnt
car       m2     3    car       3
elmo      m6     2    elmo      2
truck     m10    3    truck     3
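For actual use, the explanatory y columns and the ORDER BY clauses inside the derived tables can be dropped, since only the outermost ORDER BY determines the order of the final result. The following is a trimmed sketch of the same join-back technique, assuming the sample tables above. It uses nothing newer than derived tables, so it should run on MySQL 5.6.

```sql
SELECT a.category, a.model, a.cnt
FROM
(
  -- number of sales per model
  SELECT v.category, v.model, COUNT(v.model) AS cnt
  FROM vehicle_sale vs
  INNER JOIN vehicle v ON vs.vs_vin = v.vin
  INNER JOIN sale s ON vs.vs_sale_id = s.sale_id
  GROUP BY v.category, v.model
) AS a
INNER JOIN
(
  -- highest per-model count within each category
  SELECT x.category, MAX(x.cnt) AS mcnt
  FROM
  (
    SELECT v.category, v.model, COUNT(v.model) AS cnt
    FROM vehicle_sale vs
    INNER JOIN vehicle v ON vs.vs_vin = v.vin
    INNER JOIN sale s ON vs.vs_sale_id = s.sale_id
    GROUP BY v.category, v.model
  ) AS x
  GROUP BY x.category
) AS y
  ON a.category = y.category AND a.cnt = y.mcnt
ORDER BY a.category;
```

On the sample data this should give the same three rows as the result above, with only the category, model and cnt columns.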
Solution using window functions (fiddle here):
SELECT a.category, a.model, a.cnt
FROM
(
SELECT
v.category, v.model,
COUNT(v.model) AS cnt,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY COUNT(v.model) DESC) AS rn
FROM vehicle_sale vs
INNER JOIN vehicle v
ON vs.vs_vin = v.vin
INNER JOIN sale s
ON vs.vs_sale_id = s.sale_id
GROUP BY v.category, v.model
ORDER BY v.category, cnt DESC
) AS a
WHERE a.rn = 1
ORDER BY a.category;
Result: the same three rows as above (category, model and cnt columns only).
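The two solutions differ when models tie for first place in a category. The join on MAX(cnt) returns every tied model, whereas ROW_NUMBER() returns exactly one row per category and picks arbitrarily among the ties. If all tied models are wanted from the window-function version, RANK() can be swapped in. This is a sketch assuming MySQL 8.0 or later and the same sample tables; the alias rnk is made up for illustration.

```sql
SELECT a.category, a.model, a.cnt
FROM
(
  SELECT
    v.category, v.model,
    COUNT(*) AS cnt,
    -- RANK() gives tied counts the same rank, so every joint winner gets rnk = 1
    RANK() OVER (PARTITION BY v.category ORDER BY COUNT(*) DESC) AS rnk
  FROM vehicle_sale vs
  INNER JOIN vehicle v
    ON vs.vs_vin = v.vin
  GROUP BY v.category, v.model
) AS a
WHERE a.rnk = 1
ORDER BY a.category, a.model;
```

The sample data has no ties, so there it returns one row per category, just like the ROW_NUMBER() version.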
Method 2
You can use a window function like ROW_NUMBER() to generate a unique ID for each record within a grouping (PARTITION) of each value of category like so:
SELECT SalesDateYear, model, Sold, category
FROM
(
select YEAR(Sales.saleDate) SalesDateYear
, Vehicles.model
, count(Vehicles.model) 'Sold'
, Vehicles.category
, ROW_NUMBER() OVER (PARTITION BY Vehicles.category ORDER BY COUNT(Vehicles.model) DESC) SortId
from Vehicles
JOIN vehicleSold
on Vehicles.vin = vehicleSold.vin
JOIN Sales
on Sales.saleID = vehicleSold.saleID
group
by YEAR(Sales.saleDate)
, Vehicles.model
, Vehicles.category
) Results
WHERE SortId = 1
(I normally like to use a CTE instead of a subquery for these kinds of things, but I don't think your version of MySQL has CTEs, so the above uses a subquery.)
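For servers that do have CTEs (MySQL 8.0 or later), a sketch of the CTE form follows. It uses the table and column names from the question's query; the CTE names model_sales and ranked are made up for illustration. It also partitions by year as well as category, on the assumption that a winner is wanted per category within each year. The query above partitions by category only, which gives one row per category across all years.

```sql
WITH model_sales AS (
  -- number of sales per year, category and model
  SELECT YEAR(Sales.saleDate) AS SalesDateYear,
         Vehicles.category,
         Vehicles.model,
         COUNT(*) AS Sold
  FROM Vehicles
  JOIN vehicleSold
    ON Vehicles.vin = vehicleSold.vin
  JOIN Sales
    ON Sales.saleID = vehicleSold.saleID
  GROUP BY YEAR(Sales.saleDate), Vehicles.category, Vehicles.model
),
ranked AS (
  -- number the models within each year and category, best seller first
  SELECT ms.SalesDateYear, ms.category, ms.model, ms.Sold,
         ROW_NUMBER() OVER (PARTITION BY ms.SalesDateYear, ms.category
                            ORDER BY ms.Sold DESC) AS SortId
  FROM model_sales ms
)
SELECT SalesDateYear, model, Sold, category
FROM ranked
WHERE SortId = 1
ORDER BY SalesDateYear, category;
```

Splitting the count and the ranking into separate CTEs keeps the window function free of aggregate expressions, which some readers find easier to follow.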
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.