Find Max() updated time for a group of users


Database is MariaDB 10.3.25

I have 2 columns that are relevant:

UserID and LastUpdate

UserID is an email address of the form user@domain.
LastUpdate is a date field.

Here is my issue – I have the current query:

select a.UserID, 
substring_index(a.UserID, '@', -1), 
max(a.lastupdate) 
from MyTable a 
group by a.UserID
having max(a.lastupdate) < '2020-03-31' 

This shows every user who hasn't updated in just over a year, along with the user's domain. However, there is the following scenario that I want to account for:

UserID LastUpdate
[email protected] 2020-08-16
[email protected] 2019-05-16
[email protected] 2021-05-05

With the current query, [email protected] will be captured, and therefore domain A.com will be captured as not in use, even though the user [email protected] is still active. I want to take Max(a.lastupdate) grouped by substring_index(a.UserID, '@', -1), but also list all the users for that domain.
I'm sure the answer is staring me in the face…

How to solve :


Method 1

The best way to do this is probably the following. This solution uses the generated column functionality of MariaDB >= 10.3 (also available in MySQL >= 5.7). It is very handy for queries like this, and it also makes the query much more readable.

All the code below is available on the fiddle here:

CREATE TABLE login
(
  user_id VARCHAR (255) NOT NULL PRIMARY KEY,
  last_login TIMESTAMP NOT NULL,
  domain_name VARCHAR (255) 
    GENERATED ALWAYS AS ((SUBSTRING_INDEX(user_id, '@', -1))) VIRTUAL,  -- can be STORED

  INDEX (last_login),  -- indexing up to you if you have lots of data...
  INDEX (domain_name)
);
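If you would rather keep your existing table than create a new one, a generated column can be added in place. This is only a sketch: it assumes the table and column names from the question (MyTable, UserID), and the VARCHAR(255) length and the index name idx_domain_name are my own placeholders.

```sql
-- Sketch: add a virtual domain column to the table from the question.
-- VARCHAR(255) and idx_domain_name are assumptions, adjust to your schema.
ALTER TABLE MyTable
  ADD COLUMN domain_name VARCHAR(255)
    GENERATED ALWAYS AS (SUBSTRING_INDEX(UserID, '@', -1)) VIRTUAL;

-- Index the generated column separately so the GROUP BY can use it.
ALTER TABLE MyTable
  ADD INDEX idx_domain_name (domain_name);

-- Quick check that the column holds what you expect.
SELECT UserID, domain_name
FROM MyTable
LIMIT 10;
```

The two ALTER statements are kept separate on purpose, so that a problem with the index does not block adding the column.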

And then some sample records:

INSERT INTO login (user_id, last_login) 
VALUES 
('[email protected]', '2021-03-06'), 
('[email protected]', '2021-03-06'), 
('[email protected]', '2021-03-06'), 
('[email protected]', '2021-03-06'), 
('[email protected]', '2021-03-06'),
('[email protected]', '2020-03-20'), 
('[email protected]', '2020-03-15'), 
('[email protected]', '2020-03-14'), 
('[email protected]', '2020-02-12'), 
('[email protected]', '2020-01-31');

And then run the query:

SELECT * FROM 
(
  SELECT domain_name, MAX(last_login) AS last_login_by_domain
  FROM login
  GROUP BY domain_name
) AS tab
WHERE last_login_by_domain < DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)
ORDER BY last_login_by_domain DESC;

-- DESC because you might want to deal with recent ones more urgently?
-- renew subscription... whatever  

Result:

domain_name   last_login_by_domain       
     xyz.ie    2020-03-20 00:00:00

This works on all versions of MariaDB >= 10.3 from dbfiddle.uk (a great resource) and also on versions of MySQL >= 5.7.

It also works when MySQL's ONLY_FULL_GROUP_BY is set! This is very important: if that mode is not set, queries can (and will) return erroneous results (see my comment on the other answer to this question).

For an example, just look at the bottom of the fiddle, where I've pointed out the problem that arose with the other answer in this (relatively simple) case. In a complex statement, this issue can lead to all sorts of hard-to-find bugs. Caveat emptor!
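To see the difference for yourself, something like the following should do. It is a sketch against the login table from this answer; the first SELECT is expected to fail with an error once the mode is on, which is the point.

```sql
-- Turn on strict grouping for this session only, keeping any modes already set.
SET SESSION sql_mode =
  CONCAT_WS(',', NULLIF(@@sql_mode, ''), 'ONLY_FULL_GROUP_BY');

-- Expected to be rejected: user_id is neither aggregated nor in the GROUP BY.
-- Without the mode, the server returns an arbitrary user_id per domain.
SELECT domain_name, user_id, MAX(last_login)
FROM login
GROUP BY domain_name;

-- Accepted: every selected column is either grouped or aggregated.
SELECT domain_name, MAX(last_login) AS last_login_by_domain
FROM login
GROUP BY domain_name;
```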

EDIT: After looking at the comments on the question, this lists every user in the domains that have been inactive for over a year (fiddle):

SELECT user_id 
FROM login
WHERE domain_name IN
(
  SELECT domain_name FROM
  (
    SELECT 
      domain_name, MAX(last_login) AS last_login_by_domain
    FROM login
    GROUP BY domain_name
  ) AS tab
  WHERE last_login_by_domain < DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)
);

Result:

user_id
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
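If you also want each user's own last login next to the domain's latest one, a window function is one possible alternative to the nested subqueries. This is a sketch rather than part of the tested fiddle: it relies on window function support, which exists in MariaDB >= 10.2 and MySQL >= 8.0 but not in MySQL 5.7.

```sql
-- Sketch: one row per user, with the domain-wide maximum attached to each row.
SELECT user_id, domain_name, last_login, last_login_by_domain
FROM
(
  SELECT
    user_id,
    domain_name,
    last_login,
    MAX(last_login) OVER (PARTITION BY domain_name) AS last_login_by_domain
  FROM login
) AS tab
WHERE last_login_by_domain < DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)
ORDER BY domain_name, last_login DESC;
```

The filter has to sit in the outer query because a window function cannot be referenced in the WHERE clause of the same SELECT.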

Method 2

See if this is what you want:

-- The main query returns every user in the domains found by the subquery
SELECT a.UserID
    , substring_index(a.UserID, '@', -1) AS domain
    , MAX(a.lastupdate) AS last_update
FROM MyTable a
WHERE substring_index(a.UserID, '@', -1) IN (

    -- The subquery finds every domain with no update since the cutoff date
    SELECT substring_index(b.UserID, '@', -1)
    FROM MyTable b
    GROUP BY substring_index(b.UserID, '@', -1)
    HAVING MAX(b.lastupdate) < '2020-03-31'

)
-- One row per user; without this GROUP BY, MAX() collapses the result to a single row
GROUP BY a.UserID
ORDER BY domain, last_update

Method 3

(The main Question is handled by subqueries, as seen in other Answers. Here are some side issues.)

INDEX(UserID, lastupdate)

will help performance.

And why not use

HAVING MAX(a.lastupdate) < NOW() - INTERVAL 1 YEAR

so that you don't have to keep changing the query?
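Put together, the two suggestions might look like this. It is a sketch against the table from the question; the index name idx_user_lastupdate is a placeholder of mine.

```sql
-- Composite index so the per-user MAX(lastupdate) can be read from the index.
-- idx_user_lastupdate is a hypothetical name.
ALTER TABLE MyTable
  ADD INDEX idx_user_lastupdate (UserID, lastupdate);

-- The original per-user query with a rolling one-year cutoff
-- instead of a hard-coded date.
SELECT a.UserID,
       substring_index(a.UserID, '@', -1) AS domain,
       MAX(a.lastupdate) AS last_update
FROM MyTable a
GROUP BY a.UserID
HAVING MAX(a.lastupdate) < NOW() - INTERVAL 1 YEAR;
```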


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
