MYSQL 5.7 Partition a table based on partition key of another related table

All we need is an easy explanation of the problem, so here it is.

Please understand a little background before getting to know the problem.

TABLE transaction (

t_id: AUTO INCREMENT BIGINT

t_execute_on_date: DATE,

timestamp: DATETIME,

... other columns
)

The table transaction has 300M (three hundred million) rows and growing (1M rows daily) and growing and we need to drop old data based on the column t_execute_on_date. I plan to drop old data (older than 2 years. approx 40M rows) on the basis of creation monthly range partitions based on KEY t_execute_on_date. I have tested this on a test server while using the procedure suggested in @RickJames blog post PARTITION Maintenance in MySQL: create a new partitioned table, with optimised indexes according to queries and then reinserting records, and drop old partitions, and this approach seems fine to me. Any suggestions in this approach are also welcome.

I have done this in master-master replication with one master offline, reorganising data, and recovering replication lag, and then drop the old master, bring the traffic to new one, if everything is fine.

TABLE transaction_history (

th_id: AUTO INCREMENT BIGINT,

t_id: FOREIGN KEY (t_id of TABLE transaction)

timestamp: DATETIME,

... other columns
)

This table has 1.5B+ (One billion five hundred million) rows growing 15M+ daily.
This table has related history of transaction and contains multiple rows corresponding to one t_id.

Once old transaction data from table transaction is dropped, I would like to drop the same from this table transaction_history as well.

This table does not have any key like t_execute_on_date based on which I can partition.
This table has 1.5B+ rows I think that adding an additional column t_execute_on_date is just not possible.

Like the approach discussed above would I have to manually pull data from table transaction_history based on t_id of TABLE transaction (reorganised ,old data dropped) and reinserting only the selected records into the new table transaction_history_partitioned?

The PROBLEM I face is how to purge old data from table transaction_history?

Using MYSQL 5.7
Running on
[2.8T HDD, Amazon Linux AMI 2017.03, 16 core, 63GB RAM]

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

NOTE: FOREIGN KEYS are not allowed in partitioned tables. Simply drop the FKs but keep the indexes that they generated. Presumably, the FK checks are redundant now that you have debugged your app.

adding an additional column t_execute_on_date is just not possible.

See pt-online-schema-change or gh-ost.

Using PARTITIONs is the optimal way to delete "old" data from the history table, but there are other approaches: http://mysql.rjweb.org/doc.php/deletebig

The one that comes to mind is to have a continually-running job that keeps looking at the "next 1000" rows to see if any of them should be deleted. (It must walk through the table using the PRIMARY KEY.) It might do a LEFT JOIN to see which ones are gone, as hinted at in this pseudo-code:

$a = 0;
Loop...
    $z = SELECT th_id FROM transaction_history order by th_id LIMIT 1000,1;
    exit loop if $z IS NULL
    SELECT th_id
        FROM      transaction_history AS the
        LEFT JOIN transaction AS t  ON th.t_id = t.t_id
        WHERE t.t_id IS NULL
          AND th.th_id >= $a
          AND th.t_id   < $z
    DELETE those rows -- probably better to do this as a single query
$a = $z;

When you exit the Loop, simply start over.

The hope is that this approach to deleting "old stuff" will be fast enough to keep up, though there will be some lag.

Still, I would hope that pt-online-schema-change or gh-ost can turn a non-partitioned table into a partitioned table with minimal downtime, regardless of table size. (No FKs.)

Caution: The optimal indexes for a partitioned table are usually different than the equivalent non-partitioned table. Toss any unused tables when adding partitioning; there is some overhead when building a table with the secondary indexes already in place.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply