The database is MariaDB 10.3.25.
I have two relevant columns:
UserID and LastUpdate
UserID is an email address (user@domain).
LastUpdate is a date field.
Here is my issue. I have the current query:
select a.UserID,
       substring_index(a.UserID, '@', -1),
       max(a.lastupdate)
from MyTable a
group by a.UserID
having max(a.lastupdate) < '2020-03-31'
This shows every user who hasn’t updated in just over a year, along with the domain. However, there is the following scenario that I want to account for:
UserID LastUpdate
[email protected] 2020-08-16
[email protected] 2019-05-16
[email protected] 2021-05-05
With the current query, the user last updated on 2019-05-16 will be captured, and therefore domain A.com will be reported as not in use, even though another user in that domain (last updated on 2021-05-05) is still active. I want to do a MAX(a.lastupdate) grouped by substring_index(a.UserID, '@', -1), but also output all the users for that domain.
I’m sure the answer is staring me in the face…
How to solve :
Method 1
The best way to do this is probably the following. This solution makes use of the GENERATED COLUMN functionality of MariaDB >= 10.3 (also in MySQL >= 5.7). It is really very handy for queries like this, and it also makes the query much more readable.
All the code below is available on the fiddle here:
CREATE TABLE login
(
    user_id     VARCHAR (255) NOT NULL PRIMARY KEY,
    last_login  TIMESTAMP NOT NULL,
    domain_name VARCHAR (255)
        GENERATED ALWAYS AS ((SUBSTRING_INDEX(user_id, '@', -1))) VIRTUAL, -- can be STORED
    INDEX (last_login), -- indexing up to you if you have lots of data...
    INDEX (domain_name)
);
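If the table already exists, as MyTable does in the question, the same generated column can probably be added in place instead of recreating the table. This is only a sketch: MyTable and UserID are the names from the question, while domain_name and idx_domain_name are hypothetical names chosen for illustration.

```sql
-- Assumption: MyTable(UserID, LastUpdate) exists as described in the question.
-- domain_name and idx_domain_name are hypothetical names.
ALTER TABLE MyTable
    ADD COLUMN domain_name VARCHAR(255)
        GENERATED ALWAYS AS (SUBSTRING_INDEX(UserID, '@', -1)) VIRTUAL;

-- Optional: index the generated column if the table is large.
ALTER TABLE MyTable
    ADD INDEX idx_domain_name (domain_name);
```

A VIRTUAL column is computed when read, so adding it does not rewrite the stored rows; STORED would persist the value instead.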
And then some sample records:
INSERT INTO login (user_id, last_login)
VALUES
('[email protected]', '2021-03-06'),
('[email protected]', '2021-03-06'),
('[email protected]', '2021-03-06'),
('[email protected]', '2021-03-06'),
('[email protected]', '2021-03-06'),
('[email protected]', '2020-03-20'),
('[email protected]', '2020-03-15'),
('[email protected]', '2020-03-14'),
('[email protected]', '2020-02-12'),
('[email protected]', '2020-01-31');
And then run the query:
SELECT * FROM
(
    SELECT domain_name, MAX(last_login) AS last_login_by_domain
    FROM login
    GROUP BY domain_name
) AS tab
WHERE last_login_by_domain < DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)
ORDER BY last_login_by_domain DESC;
-- DESC because you might want to deal with recent ones more urgently?
-- renew subscription... whatever
Result:
domain_name last_login_by_domain
xyz.ie 2020-03-20 00:00:00
This works on all versions of MariaDB >= 10.3 from dbfiddle.uk (a great resource) and also on versions of MySQL >= 5.7.
It also works when MySQL’s ONLY_FULL_GROUP_BY is set! This is very important: if that mode is not set, queries can (and will) return erroneous results (see my comment on the other answer to this question).
For an example, just look at the bottom of the fiddle, where I’ve pointed out the problem that arose with the other answer in this (relatively simple) case. In a complex statement, this issue can lead to all sorts of hard-to-find bugs. Caveat emptor!
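To make the ONLY_FULL_GROUP_BY point concrete, here is a small illustration against the login table above. It is a hypothetical query written for this explanation, not the one from the other answer.

```sql
-- Note: this replaces the session's sql_mode with just ONLY_FULL_GROUP_BY.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- user_id is neither aggregated nor part of GROUP BY, so with the mode set
-- this statement is rejected (error 1055) instead of returning a result.
SELECT user_id, domain_name, MAX(last_login)
FROM login
GROUP BY domain_name;

-- With the mode cleared, the same statement runs, but user_id is taken from
-- an arbitrary row of each group, not necessarily the row with MAX(last_login).
SET SESSION sql_mode = '';
SELECT user_id, domain_name, MAX(last_login)
FROM login
GROUP BY domain_name;
```

The second form is the dangerous one: it looks as if it returns the most recently active user per domain, but nothing guarantees that.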
EDIT: After looking at the comments on the question, this lists every user that belongs to an inactive domain (fiddle):
SELECT user_id
FROM login
WHERE domain_name IN
(
    SELECT domain_name FROM
    (
        SELECT
            domain_name, MAX(last_login) AS last_login_by_domain
        FROM login
        GROUP BY domain_name
    ) AS tab
    WHERE last_login_by_domain < DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)
);
Result:
user_id
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
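If each user should be shown together with the domain's most recent login, a join against the same per-domain aggregate is one possible variant. This is a sketch under the assumption that the login table above is used; the aliases l and d are just illustrative.

```sql
-- Each user of an inactive domain, with the user's own last login and the
-- latest login seen anywhere in that domain.
SELECT l.user_id,
       l.domain_name,
       l.last_login,
       d.last_login_by_domain
FROM login AS l
JOIN (
    -- One row per domain with its most recent login
    SELECT domain_name, MAX(last_login) AS last_login_by_domain
    FROM login
    GROUP BY domain_name
) AS d ON d.domain_name = l.domain_name
WHERE d.last_login_by_domain < DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)
ORDER BY l.domain_name, l.last_login DESC;
```

Because only aggregated or grouped columns appear inside the derived table, this form should also be accepted when ONLY_FULL_GROUP_BY is set.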
Method 2
See if this is what you want:
-- Main query: list every user that belongs to one of the inactive domains
SELECT a.UserID
     , SUBSTRING_INDEX(a.UserID, '@', -1) AS domain
     , MAX(a.lastupdate) AS last_update
FROM MyTable a
WHERE SUBSTRING_INDEX(a.UserID, '@', -1) IN (
    -- Subquery: domains whose most recent update is before the cutoff date
    SELECT SUBSTRING_INDEX(b.UserID, '@', -1)
    FROM MyTable b
    GROUP BY SUBSTRING_INDEX(b.UserID, '@', -1)
    HAVING MAX(b.lastupdate) < '2020-03-31'
)
-- GROUP BY is required because of MAX(); without it the result collapses to a single row
GROUP BY a.UserID
ORDER BY domain, last_update
Method 3
(The main question is handled by subqueries, as seen in the other answers. Here are some side issues.)
INDEX(UserID, lastupdate)
will help performance.
And why not use
HAVING MAX(a.lastupdate) < NOW() - INTERVAL 1 YEAR
so that you don’t have to keep changing the query?
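Put together, the two suggestions might look like the following on the table from the question. This is only a sketch: idx_user_lastupdate is a hypothetical index name, and whether the index actually helps depends on the data and the optimizer.

```sql
-- Composite index as suggested above (the index name is hypothetical).
ALTER TABLE MyTable
    ADD INDEX idx_user_lastupdate (UserID, lastupdate);

-- Inactive domains, using a cutoff relative to the current time
-- instead of a hard-coded date.
SELECT SUBSTRING_INDEX(a.UserID, '@', -1) AS domain,
       MAX(a.lastupdate) AS last_update
FROM MyTable a
GROUP BY SUBSTRING_INDEX(a.UserID, '@', -1)
HAVING MAX(a.lastupdate) < NOW() - INTERVAL 1 YEAR;
```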
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.