Date based partitioning

All we need is an easy explanation of the problem, so here it is.

I have a table with a last_update column, I want to add a partition based on the last_update column to remove old values.

My problem is that I have a unique key in the table which does not include the last_update column.

What are my options?

Should I change the way I work with this table and remove the unique index?


CREATE TABLE `fileso` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `hostId` int(11) unsigned NOT NULL COMMENT 'code',
  `sha256` binary(32) NOT NULL COMMENT 'meir',
  `fileName` varchar(150) DEFAULT NULL COMMENT 'meir',
  `fullPath` varchar(350) DEFAULT NULL COMMENT 'meir',
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `uniqueness` binary(32) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniqueness` (`uniqueness`),

The unique key is an Hash representation of a couple of columns in the table(sha256, fullPath,hostId).

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

If you are going to remove ‘old’ records, then you are defeating the “uniqueness” of uniqueness — since you cannot check against already-deleted rows.

So, make that just a INDEX, not UNIQUE.

Meanwhile, of what use is id? You may as well promote uniqueness to be the PRIMARY KEY. (Yeah, this contradicts my previous recommendation.)

But, back to the question of PARTITIONing forcing the inclusion of last_update on all unique keys…

Plan A:

no `id`
PRIMARY KEY (uniqueness, last_update)

Uniqueness would need to be checked with a SELECT, not INSERT IGNORE.

Plan B:

no `id`
INDEX (uniqueness)

This has problems in Replication and other situations when you need to specifically identify a row.

Plan C:

keep `id`
PRIMARY KEY(id, last_update)

Uniqueness would need to be checked with a SELECT, not INSERT IGNORE.

I prefer Plan A, unless there is somthing to indicate that id is necessary.

Meanwhile, see my advice on using PARTITIONing for a sliding dataset — here


With Plan A, do something like this to prevent a race condition:

$result = SELECT uniqueness FROM tbl WHERE uniqueness = $x FOR UPDATE;
if $result, you have a dup; ROLLBACK and do whatever is appropriate.
INSERT INTO tbl (...) VALUES (...);

Be sure to check for errors at each step — deadlocks and timeouts can happen (rarely). If so, start over at the BEGIN.

If practical, consider batching the SELECT and INSERT so you can check and insert multiple rows at the same time. (I assume dups are rare?)

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from or, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply