MySQL Deadlock on DELETE

I recently updated my production code with a new query that is supposed to clean up orphaned records. It runs as the last step within a transaction.
As a result, I am seeing occasional deadlock exceptions coming from MySQL:

org.springframework.dao.DeadlockLoserDataAccessException: PreparedStatementCallback;
SQL [
DELETE ts FROM topic_subscriptions AS ts LEFT JOIN 
endpoint_to_topic_subscription_associations AS ctsa 
ON ts.id=topic_subscription_id WHERE ctsa.topic_subscription_id IS NULL
]; (conn=2699877) Deadlock found when trying to get lock; try restarting transaction; nested exception is java.sql.SQLTransactionRollbackException: (conn=2699877) Deadlock found when trying to get lock; try restarting transaction

Here are the DDLs for the 2 tables in question:

CREATE TABLE `topic_subscriptions` (
  `id` varchar(16) NOT NULL,
  `org_id` varchar(255) NOT NULL,
  `topic_subscription` varchar(255) NOT NULL,
  `virtual_broker_id` varchar(16) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UK_ts_vbid_on_topic_subscriptions` (`topic_subscription`,`virtual_broker_id`),
  KEY `FK_topic_subscriptions_references_organizations_table` (`org_id`),
  KEY `FK_topic_subs_refs_virtual_brokers_table` (`virtual_broker_id`),
  CONSTRAINT `FK_topic_subs_refs_virtual_brokers_table` FOREIGN KEY (`virtual_broker_id`) REFERENCES `virtual_brokers` (`id`),
  CONSTRAINT `FK_topic_subscriptions_references_organizations_table` FOREIGN KEY (`org_id`) REFERENCES `organizations` (`org_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `endpoint_to_topic_subscription_associations` (
  `endpoint_id` varchar(16) NOT NULL,
  `topic_subscription_id` varchar(16) NOT NULL,
  `org_id` varchar(255) NOT NULL,
  PRIMARY KEY (`endpoint_id`,`topic_subscription_id`),
  KEY `FK_endpoint_to_topic_subscription_ref_org_tbl` (`org_id`),
  KEY `FK_endpoint_to_topic_subscription_ref_topic_subscriptions_tbl` (`topic_subscription_id`),
  CONSTRAINT `FK_endpoint_to_topic_subscription_ref_endpoints` FOREIGN KEY (`endpoint_id`) REFERENCES `endpoints` (`id`),
  CONSTRAINT `FK_endpoint_to_topic_subscription_ref_org_tbl` FOREIGN KEY (`org_id`) REFERENCES `organizations` (`org_id`),
  CONSTRAINT `FK_endpoint_to_topic_subscription_ref_topic_subscriptions_tbl` FOREIGN KEY (`topic_subscription_id`) REFERENCES `topic_subscriptions` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

As you can see, I am not doing any explicit locking or SELECT ... FOR UPDATE in the query above. I do use SELECT ... FOR UPDATE in other transactions, but they don’t involve the mentioned tables directly.

Also, the default isolation level is REPEATABLE_READ.

Please help me understand the issue here.
Thanks!

==================UPDATE===================

------------------------
LATEST DETECTED DEADLOCK
------------------------
2021-04-13 08:34:28 0x2b2a3b191700
*** (1) TRANSACTION:
TRANSACTION 22501318, ACTIVE 0 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 24 lock struct(s), heap size 1136, 881 row lock(s), undo log entries 8
MySQL thread id 2699877, OS thread handle 47459092735744, query id 13624170039 172.25.150.126 [db_name] Sending data
DELETE ts FROM topic_subscriptions AS ts LEFT JOIN endpoint_to_topic_subscription_associations AS ctsa ON ts.id=topic_subscription_id WHERE ctsa.topic_subscription_id IS NULL
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 824 page no 5 n bits 568 index FK_endpoint_to_topic_subscription_ref_topic_subscriptions_tbl of table [db_name].`endpoint_to_topic_subscription_associations` trx id 22501318 lock mode S waiting
Record lock, heap no 427 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 12; hex 32326672616638696f746770; asc 22fraf8iotgp;;
 1: len 11; hex 7638313876336d30307566; asc v818v3m00uf;;

*** (2) TRANSACTION:
TRANSACTION 22501319, ACTIVE 0 sec starting index read
mysql tables in use 2, locked 2
24 lock struct(s), heap size 1136, 21 row lock(s), undo log entries 13
MySQL thread id 2699872, OS thread handle 47460380120832, query id 13624170054 172.25.150.126 [db_name] Sending data
DELETE ts FROM topic_subscriptions AS ts LEFT JOIN endpoint_to_topic_subscription_associations AS ctsa ON ts.id=topic_subscription_id WHERE ctsa.topic_subscription_id IS NULL
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 824 page no 5 n bits 568 index FK_endpoint_to_topic_subscription_ref_topic_subscriptions_tbl of table [db_name].`endpoint_to_topic_subscription_associations` trx id 22501319 lock_mode X locks rec but not gap
Record lock, heap no 427 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 12; hex 32326672616638696f746770; asc 22fraf8iotgp;;
 1: len 11; hex 7638313876336d30307566; asc v818v3m00uf;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 861 page no 5 n bits 192 index PRIMARY of table [db_name].`topic_subscriptions` trx id 22501319 lock_mode X waiting
Record lock, heap no 103 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 12; hex 323266723836636b67376572; asc 22fr86ckg7er;;
 1: len 6; hex 0000015694e6; asc    V  ;;
 2: len 7; hex e200000231014f; asc     1 O;;
 3: len 7; hex 6d616173646576; asc maasdev;;
 4: len 18; hex 746f706c6576656c2f6e6578746c6576656c; asc toplevel/nextlevel;;
 5: len 12; hex 323266723836636b67373666; asc 22fr86ckg76f;;

*** WE ROLL BACK TRANSACTION (1)

How to solve:

Method 1

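Those two DELETEs are each doing a full table scan looking for the same rows to delete, which is redundant. InnoDB detects the deadlock and automatically rolls back one of the transactions (transaction 1 in the output above); the other goes on to do the assigned task. Note that the rollback applies to the victim's whole transaction, not just the DELETE, so any earlier work in that transaction is lost unless the application retries it.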

Just looking at those Deletes, I see several issues and options:

  • Potentially inefficient way to delete rows. If this is a big table, consider walking through it in chunks.
  • Unnecessarily repeating the command from separate connections. Have only one task in the background doing the delete. When it finishes, it starts over.
  • Perhaps a need for code elsewhere to prevent the need for the delete. When the row is deleted from ctsa, delete the corresponding row from ts.
  • Toss the deletes. Check the other code: perhaps it works correctly even when there is a missing row from ctsa. If not, maybe a minor change will make it work ‘correctly’.

Which of those would you like to discuss further?
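
The third option above (delete the ts row at the moment its last ctsa row goes away) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 with an in-memory database as a runnable stand-in for MySQL, the schema is reduced to the key columns, and the names remove_association and make_assoc_demo_db are hypothetical. InnoDB's locking differs from SQLite's, so treat it as a sketch of the idea rather than a drop-in fix.

```python
import sqlite3


def make_assoc_demo_db():
    """Hypothetical demo data: one subscription referenced by two endpoints."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE topic_subscriptions (id TEXT PRIMARY KEY);
        CREATE TABLE endpoint_to_topic_subscription_associations (
            endpoint_id TEXT NOT NULL,
            topic_subscription_id TEXT NOT NULL,
            PRIMARY KEY (endpoint_id, topic_subscription_id));
        INSERT INTO topic_subscriptions VALUES ('ts1');
        INSERT INTO endpoint_to_topic_subscription_associations
            VALUES ('ep1', 'ts1'), ('ep2', 'ts1');
    """)
    return conn


def remove_association(conn, endpoint_id, topic_subscription_id):
    """Delete one association and, if it was the last one, its subscription.

    Returns True when the subscription row was removed as well.
    """
    with conn:  # one transaction: both deletes commit or roll back together
        conn.execute(
            "DELETE FROM endpoint_to_topic_subscription_associations "
            "WHERE endpoint_id = ? AND topic_subscription_id = ?",
            (endpoint_id, topic_subscription_id))
        # Only touches a single row by primary key, instead of scanning
        # the whole table for orphans.
        cur = conn.execute(
            "DELETE FROM topic_subscriptions WHERE id = ? "
            "AND NOT EXISTS ("
            "  SELECT 1 FROM endpoint_to_topic_subscription_associations a "
            "  WHERE a.topic_subscription_id = ?)",
            (topic_subscription_id, topic_subscription_id))
        return cur.rowcount == 1
```

With this approach each transaction addresses rows by key rather than sweeping both tables, so the table-wide cleanup DELETE would no longer need to run at the end of every transaction.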

If you don’t need the Deletes to be performed immediately, you could have a single background task that continually does such Deletes: delete some rows, sleep a few seconds, loop. If the table is, say, bigger than 100 rows, search for rows to delete in chunks. See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks
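
A minimal sketch of such a chunked loop is shown below. It makes several assumptions: Python's built-in sqlite3 with an in-memory database stands in for MySQL so the example is runnable, the schema is cut down to the key columns, and CHUNK_SIZE, delete_orphans_in_chunks and make_chunk_demo_db are hypothetical names. The single-table DELETE with NOT EXISTS is used instead of the multi-table DELETE ... LEFT JOIN from the question so that each statement is limited to one primary-key range; it has not been verified against MySQL here.

```python
import sqlite3
import time

CHUNK_SIZE = 100  # hypothetical value; tune to the table


def make_chunk_demo_db(total=250, keep_every=5):
    """Hypothetical demo data: every keep_every-th subscription has an association."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE topic_subscriptions (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE endpoint_to_topic_subscription_associations ("
        "endpoint_id TEXT NOT NULL, topic_subscription_id TEXT NOT NULL, "
        "PRIMARY KEY (endpoint_id, topic_subscription_id))")
    with conn:
        for i in range(total):
            ts_id = f"ts{i:05d}"
            conn.execute("INSERT INTO topic_subscriptions VALUES (?)", (ts_id,))
            if i % keep_every == 0:
                conn.execute(
                    "INSERT INTO endpoint_to_topic_subscription_associations "
                    "VALUES (?, ?)", ("ep1", ts_id))
    return conn


def delete_orphans_in_chunks(conn, chunk_size=CHUNK_SIZE, pause_seconds=0.0):
    """Walk topic_subscriptions in primary-key order, deleting orphans per chunk.

    Returns the total number of rows deleted.
    """
    deleted = 0
    last_id = ""  # ids are non-empty strings, so "" sorts before all of them
    while True:
        # Fetch the next slice of primary keys in index order.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM topic_subscriptions WHERE id > ? "
            "ORDER BY id LIMIT ?", (last_id, chunk_size))]
        if not ids:
            return deleted
        first_id, last_id = ids[0], ids[-1]
        # Delete only the orphans inside this key range, in a short transaction.
        with conn:
            cur = conn.execute(
                "DELETE FROM topic_subscriptions "
                "WHERE id BETWEEN ? AND ? AND NOT EXISTS ("
                "  SELECT 1 FROM endpoint_to_topic_subscription_associations a "
                "  WHERE a.topic_subscription_id = topic_subscriptions.id)",
                (first_id, last_id))
            deleted += cur.rowcount
        time.sleep(pause_seconds)  # give other transactions room between chunks
```

A background task could call delete_orphans_in_chunks(conn, pause_seconds=2.0) in a loop from a single connection, so that only one cleaner runs at a time and each transaction stays short.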

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.