Delete duplicate rows, keeping the 10 most recent, using MySQL

I have a table with many fields, but the important ones (those used in the query) are:

phone_number varchar(20),
entry_date datetime

This table contains millions of records, and each phone_number has multiple entries. I want to delete the surplus entries for each phone_number but keep its 10 most recent ones.

Example:

SELECT phone_number,COUNT(*) 
FROM test_table 
GROUP BY phone_number 
HAVING COUNT(*) > 10 
ORDER BY COUNT(*) DESC;

Result:

[Image: query output listing each phone_number with its row count]

Requirement:

I want a query that reduces all these counts to 10, keeping only the 10 most recent records for each phone_number.

How to solve:

Method 1

In addition to FaNo_FN's answer:

If a high percentage of the rows is to be deleted, it can be faster to copy the rows to be kept into a new table and replace the source table with the new one.

For MySQL 5.6 this can be done with the following query:

INSERT INTO new_test
SELECT *
FROM ( SELECT CASE WHEN (@count := ((@phone = phone) * @count + 1)) > 3
                   THEN NULL
                   ELSE @phone := phone END new_phone,
              entry_date
       FROM test
       CROSS JOIN (SELECT @phone := 0, @count := 0) init_variables
       ORDER BY phone, entry_date DESC ) data
WHERE new_phone IS NOT NULL;

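The number of rows kept per phone is set in the CASE condition (3 in this query; use 10 for the requirement in the question). The most recent rows are kept, as determined by the ORDER BY clause in the subquery. The query uses a table named test with a column named phone; substitute your own names (test_table and phone_number in the question).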

fiddle (ignore the MySQL version shown in the fiddle; the features specific to version 8+ are used only to generate the sample data).
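
The copy-and-replace step described at the start of this method could look roughly like the sketch below. It assumes the source table is named test, as in the query above, and has only the two columns phone and entry_date (the INSERT selects exactly two columns). The name test_old is hypothetical. Try it on a copy of the data first.

```sql
-- Sketch only: assumes `test` has just the columns (phone, entry_date).

-- 1. Create an empty table with the same structure and indexes.
CREATE TABLE new_test LIKE test;

-- 2. Fill new_test with the rows to keep, using the
--    INSERT INTO new_test ... SELECT query from this method.

-- 3. Swap the tables in a single atomic statement.
--    `test_old` is a hypothetical name for the retired table.
RENAME TABLE test TO test_old, new_test TO test;

-- 4. Once the result has been verified, drop the old data.
DROP TABLE test_old;
```

Keeping test_old around until the new table has been checked leaves a way back if the row counts do not look right.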

Method 2

You have to assign row numbers to the entries of each phone_number, sorted by descending entry_date. Then run a query that returns only rows 1-10 for each phone_number. This operation does not need COUNT(*) or GROUP BY. Something like this will probably work:

SELECT phone_number, entry_date FROM
(SELECT phone_number, entry_date,
        -- newest entry gets row number 1
        ROW_NUMBER() OVER (PARTITION BY phone_number ORDER BY entry_date DESC) AS rownum
 FROM test_table) AS subquery1
WHERE rownum <= 10
ORDER BY phone_number, entry_date DESC;

The ROW_NUMBER() function is available in MySQL version 8+.
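
If the goal is to delete the surplus rows in place rather than select the ones to keep, the same numbering can drive a DELETE on MySQL 8+. This is only a sketch: it assumes test_table has a unique key column, here called id, which the question does not mention. Without some unique key there is no reliable way to tell apart two rows that share the same phone_number and entry_date.

```sql
-- Sketch for MySQL 8+; `id` is an assumed unique key, not part of the question.
DELETE t
FROM test_table AS t
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY phone_number
                              ORDER BY entry_date DESC, id DESC) AS rn
    FROM test_table
) AS ranked ON ranked.id = t.id
WHERE ranked.rn > 10;   -- rows 1-10 are the most recent; delete the rest
```

Afterwards, the COUNT(*) query from the question should return no rows. On a table with millions of records it is probably worth testing on a copy first.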

For older MySQL versions, you can emulate ROW_NUMBER() using this method:

SELECT phone_number, entry_date FROM
(SELECT phone_number, entry_date,
    @row_number:=CASE WHEN @pn = phone_number THEN @row_number + 1
                      ELSE 1 END AS rownum,
    @pn:=phone_number PN
FROM test_table t
CROSS JOIN (SELECT @pn:=0, @row_number:=0) as n
-- rows of one phone_number must be contiguous, newest first
ORDER BY phone_number, entry_date DESC) AS subquery1
WHERE rownum <= 10
ORDER BY phone_number, entry_date DESC;

Then simply add INSERT INTO new_table .. on top of the query for the insert process.
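
Put together, the insert might look like the sketch below, here using the MySQL 8+ variant. It assumes new_table already exists with at least the columns phone_number and entry_date; any further columns of test_table would have to be added to both column lists.

```sql
-- Sketch: `new_table` is assumed to exist with these two columns.
INSERT INTO new_table (phone_number, entry_date)
SELECT phone_number, entry_date
FROM (SELECT phone_number, entry_date,
             ROW_NUMBER() OVER (PARTITION BY phone_number
                                ORDER BY entry_date DESC) AS rownum
      FROM test_table) AS subquery1
WHERE rownum <= 10;   -- keep only the 10 most recent rows per phone_number
```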

Demo fiddle

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
