All we need is an easy explanation of the problem, so here it is.
I have a table with a last_update
column, I want to add a partition based on the last_update
column to remove old values.
My problem is that I have a unique key in the table which does not include the last_update
column.
What are my options?
Should I change the way I work with this table and remove the unique index?
EDIT
CREATE TABLE `fileso` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`hostId` int(11) unsigned NOT NULL COMMENT 'code',
`sha256` binary(32) NOT NULL COMMENT 'meir',
`fileName` varchar(150) DEFAULT NULL COMMENT 'meir',
`fullPath` varchar(350) DEFAULT NULL COMMENT 'meir',
`datein` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`uniqueness` binary(32) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
PRIMARY KEY (`id`),
UNIQUE KEY `uniqueness` (`uniqueness`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The unique key is an Hash representation of a couple of columns in the table(sha256, fullPath,hostId).
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
Method 1
If you are going to remove ‘old’ records, then you are defeating the “uniqueness” of uniqueness
— since you cannot check against already-deleted rows.
So, make that just a INDEX
, not UNIQUE
.
Meanwhile, of what use is id
? You may as well promote uniqueness
to be the PRIMARY KEY
. (Yeah, this contradicts my previous recommendation.)
But, back to the question of PARTITIONing
forcing the inclusion of last_update
on all unique keys…
Plan A:
no `id`
PRIMARY KEY (uniqueness, last_update)
Uniqueness would need to be checked with a SELECT
, not INSERT IGNORE
.
Plan B:
no `id`
INDEX (uniqueness)
This has problems in Replication and other situations when you need to specifically identify a row.
Plan C:
keep `id`
PRIMARY KEY(id, last_update)
INDEX(uniqueness)
Uniqueness would need to be checked with a SELECT
, not INSERT IGNORE
.
I prefer Plan A, unless there is somthing to indicate that id
is necessary.
Meanwhile, see my advice on using PARTITIONing
for a sliding dataset — here
Addenda
With Plan A, do something like this to prevent a race condition:
BEGIN;
$result = SELECT uniqueness FROM tbl WHERE uniqueness = $x FOR UPDATE;
if $result, you have a dup; ROLLBACK and do whatever is appropriate.
INSERT INTO tbl (...) VALUES (...);
COMMIT;
Be sure to check for errors at each step — deadlocks and timeouts can happen (rarely). If so, start over at the BEGIN
.
If practical, consider batching the SELECT
and INSERT
so you can check and insert multiple rows at the same time. (I assume dups are rare?)
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0