All we need is an easy explanation of the problem, so here it is.
Assume that there is a table
with these data:
If the data of the row date is empty color is not picked-up yet.
The information overview i need to view on website php page is:
Picked-up of RED: 2
Total RED: 2
Picked-up of Yellow: 0
Total Yellow: 1
Picked-up of Pink: 3
Total Pink: 3
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
Hi and welcome to the forum!
Your schema requires a totally radical overhaul. This image gives some idea of what’s wrong with it:
Also, and very important, you should store
DATEs – you will see this discussed when you read around EAV – data types go out the window and this means that the query optimiser can’t generate an optimal plan. This applies to other types as well.
You can see this effect here:
CREATE TABLE date_test (x VARCHAR(10));
INSERT INTO date_test VALUES ('15-05-2021'), ('25-10-2000');
SELECT * FROM date_test ORDER BY x;
x 15-05-2021 -- <<==== WRONG order! 25-10-2000
One of the reasons that correct data types are important! Even if you are obliged to store dates as strings (I realise that programmers are subject to constraints), you should use the ISO date format (i.e. YYYY-MM-DD).
TL;DR – the answer!
If you just want the answer, then it’s:
SELECT dt.color AS "Color", COUNT(dt.c_date) AS "Collected", COUNT(*) AS "Total" FROM demo_tab dt GROUP BY dt.color ORDER BY dt.color;
Color Collected Total PINK 3 3 RED 2 2 YELLOW 1 2
It also has the advantage of being the fastest query of those proposed – see the
PROFILING section below!
Tackling the schema:
What you should do is (see fiddle here – adapted from @Akina’s – I added a collected garment of yellow colour. I don’t know what’s going on with the
id column and I’ve ignored it hereafter).
So, you create a
VIEW – if you can’t change the schema due to other constraints – I appreciate that DBAs/programmers don’t always have (complete) control over their environments.
CREATE VIEW demo_view AS ( SELECT x.user_id AS "uid", x.data AS "color", CAST( CONCAT ( SUBSTRING(y.data, 7, 10), SUBSTRING(y.data, 3, 4), SUBSTRING(y.data, 1, 2) ) AS DATE ) AS c_date FROM demo x JOIN demo y ON x.user_id = y.user_id AND x.title = 'COLOR' AND y.title = 'DATE' );
CREATE TABLE demo_tab AS ( SELECT * FROM demo_view -- -- Could also construct it as above! -- );
Then your query on the table (could be on the view also – see fiddle) will be as simple as:
SELECT dt.color AS "Color", SUM(CASE WHEN dt.c_date IS NOT NULL THEN 1 ELSE 0 END ) AS "Collected", COUNT(*) AS "Total" FROM demo_tab dt GROUP BY dt.color;
Result (identical for all queries):
Color Collected Total RED 2 2 YELLOW 1 2 PINK 3 3
Now, why, you may ask am I using the "complex"
SUM(CASE... instead of the relatively "easy"
SUM(t1.data IS NOT NULL)... of Akina’s answer here. If you try and run his fiddle using PostgreSQL (see here), it fails. The
SUM(... IS NOT NULL) construct is MySQL specific whereas the other works with PostgreSQL, SQLServer and MySQL! If in doubt – test on PostgreSQL – if it doesn’t work there, there’s usually something fishy going on!
If you want to (or are obliged to) persist with the schema as it currently is, then run this:
SELECT t2.data AS "Color", SUM(CASE WHEN t1.data IS NOT NULL THEN 1 ELSE 0 END ) AS "Collected", COUNT(*) Total, CAST(SUM(CASE WHEN t1.data IS NOT NULL THEN 1 ELSE 0 END ) AS FLOAT) * 100 / COUNT(*) AS "% collected" FROM demo t1 JOIN demo t2 ON t1.user_id = t2.user_id WHERE t1.title = 'DATE' AND t2.title = 'COLOR' GROUP BY t2.data ORDER BY 1;
Result – added a percentage collected field to it!
Color Collected Total % collected PINK 3 3 100 RED 2 2 100 YELLOW 1 2 50
I did a performance analysis – see fiddle – using profiling – check the manual:
SET PROFILING = 1;
Run your queries
Result (typical one shown – ran several times – may vary! Queries may be truncated):
Query_ID Duration Query 1 0.00056325 SELECT dt.color AS "Color", COUNT(dt.c_date) AS "Collected", COUNT(*) AS "Total" FROM demo_tab dt GROUP BY dt.color ORDER BY dt.color Query_ID Duration Query 2 0.00064050 SELECT dt.color AS "Color", SUM(CASE WHEN dt.c_date IS NOT NULL THEN 1 ELSE 0 END ) AS "Collected", COUNT(*) AS "Total" FROM demo_tab dt GROUP BY dt.color 3 0.00074950 SELECT t2.data AS "Color", SUM(CASE WHEN t1.data IS NOT NULL THEN 1 ELSE 0 END ) AS "Collected", COUNT(*) Total, CAST(SUM(CASE WHEN t1.data IS NOT NULL THEN 1 ELSE 0 END ) AS FLOAT) * 100 / COUNT(*) AS "% collected" FROM demo t1 JOIN demo t2 ON t1.user_id = 4 0.00075000 SELECT v.color AS "Color", SUM(CASE WHEN v.c_date IS NOT NULL THEN 1 ELSE 0 END ) AS "Collected", COUNT(*) AS "Total" FROM demo_view v GROUP BY v.color
SET PROFILING = 0;
The table-based query very frequently has the smallest duration (~ 90% of the time – we don’t know what’s going on with the rest of the dbfiddle server). The query plan for the table-based query is much shorter (
EXPLAIN ANALYZE – see fiddle) than the others – good rule of thumb, the shorter the plan, the faster the query!
If you want to use standard SQL (and you should!), you can use this command (see bottom of fiddle) – you can issue the
SET sql_mode='ANSI'; command – from the aforementioned Bill Karwin’s answer here.
user_id title data w x 1 DATE 01-01-2021 2021-01-01 2021-01-01 2 DATE NULL NULL NULL 3 DATE 04-01-2021 2021-01-04 2021-01-04 4 DATE 04-01-2021 2021-01-04 2021-01-04 5 DATE 02-01-2021 2021-01-02 2021-01-02 6 DATE 08-01-2021 2021-01-08 2021-01-08 7 DATE 07-07-2027 2027-07-07 2027-07-07
I leave it as an exercise for the reader to put what are now valid dates into the SQL above – it should work the exact same as the MySQL specific construct – you can see this in the PostgreSQL fiddle here.
I also leave it up to you to put in indexes on the necessary fields for large datasets – if you’re having problems, get back to us here with a new question containing a link to the this one!
EXPLAIN ANALYZE should give a few hints in this regard!
SELECT t2.data Color, SUM(t1.data IS NOT NULL) PickedUp, COUNT(*) Total FROM demo t1 JOIN demo t2 USING (userID) WHERE t1.title = 'DATE' AND t2.title = 'COLOR' GROUP BY t2.data;
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂