Optimising MySQL query with lots of self-joins

All we need is an easy explanation of the problem, so here it is.

I have an entity_relationship table that describes relationships between entities. Each relationship will have 2+ entities involved.

In some cases it can be said that one entity is a constituent of a constituency. This is stored in the entity_relationship table.

I then intend to add entities to the relationships based on this: If entity Joe Bloggs is a constituent of constituency University of Life, and the University of Life is itself a constituent of constituency Made Up Universities then I will add Joe Bloggs into the 2nd relationship as an implied-constituent.

When all the implied-consituent relationships are populated (by running the query until no records are added) I’ll have a quick way to find out who’s linked to what without needing to do recursion at that stage.

The entity_relationship table looks like:

+-----------------+---------------+
| Field           | Type          |
+-----------------+---------------+
| entity_id       | int(10)       |
| relationship_id | int(10)       |
| type            | enum(...)     |
+-----------------+---------------+

and has keys (currently LOADS of ’em, trying to optimise!)

+------------+----------+--------------+-----------------+-------------+------+
| Non_unique | Key_name | Seq_in_index | Column_name     | Cardinality | Null |
+------------+----------+--------------+-----------------+-------------+------+
|          0 | PRIMARY  |            1 | entity_id       |      179429 |      |
|          0 | PRIMARY  |            2 | relationship_id |      179429 |      |
|          1 | r_t_e    |            1 | relationship_id |      179429 |      |
|          1 | r_t_e    |            2 | type            |      179429 | YES  |
|          1 | r_t_e    |            3 | entity_id       |      179429 |      |
|          1 | t_r_e    |            1 | type            |           8 | YES  |
|          1 | t_r_e    |            2 | relationship_id |      179429 |      |
|          1 | t_r_e    |            3 | entity_id       |      179429 |      |
|          1 | t_e_r    |            1 | type            |           6 | YES  |
|          1 | t_e_r    |            2 | entity_id       |      179429 |      |
|          1 | t_e_r    |            3 | relationship_id |      179429 |      |
+------------+----------+--------------+-----------------+-------------+------+

And then the query I’m trying is:

INSERT INTO entity_relationship  
SELECT lt.entity_id entity_id, 
       py.relationship_id relationship_id,
       'implied-constituent' `type` 
FROM entity_relationship lt,
     entity_relationship ly,
     entity_relationship pt,
     entity_relationship py
WHERE lt.type='constituent'
  AND lt.relationship_id = ly.relationship_id
  AND ly.type='constituency'
  AND ly.entity_id = pt.entity_id 
  AND pt.type='constituent' 
  AND pt.relationship_id = py.relationship_id 
  AND py.type='constituency';

The problem is this is taking 42s to run (even when the query results in zero rows to insert). The output of EXPLAIN (on the SELECT) shows:

+--------+-----+----+-------+---+------------------------+-----+------------------------+
| s._type|table|type|key    |len|ref                     |rows |Extra                   |
+--------+-----+----+-------+---+------------------------+-----+------------------------+
| SIMPLE |lt   |ref |t_r_e  |2  |const                   |89714|Using where; Using index|
| SIMPLE |ly   |ref |r_t_e  |6  |lt.relationship_id,const|    1|Using where; Using index|
| SIMPLE |pt   |ref |PRIMARY|4  |ly.entity_id            |    1|Using where             |
| SIMPLE |py   |ref |r_t_e  |6  |pt.relationship_id,const|    1|Using where; Using index|
+--------+-----+----+-------+---+------------------------+-----+------------------------+

Which looks OK – except perhaps the 3rd line where it does not say Using Index.

Can anyone see a way to optimise this?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

How to fix it in MySQL

As can be seen from line 3 of the EXPLAIN statement, a [edit:]covering index is not being used here. It needs to join on the entity_id and the type and no key is available for that.

ALTER TABLE entity_relationship ADD KEY e_t_r 
   (entity_id, type, relationship_id);

This makes the key available, but MySQL chooses not to use it. It can be forced with: USE INDEX (e_t_r) :

SELECT lt.entity_id entity_id, 
       py.relationship_id relationship_id,
       'implied-constituent' `type` 
FROM entity_relationship lt,
     entity_relationship ly,
     entity_relationship pt USE INDEX (e_t_r),
     entity_relationship py
WHERE lt.type='constituent'
  AND lt.relationship_id = ly.relationship_id
  AND ly.type='constituency'
  AND ly.entity_id = pt.entity_id 
  AND pt.type='constituent' 
  AND pt.relationship_id = py.relationship_id 
  AND py.type='constituency';

This now runs in 0.8s. (compare to 17.7s without forcing it to use that index.) This is with MySQL 5.1.63.

How to fix it in MariaDb

Well, you don’t have to!

MariaDb executes the query very fast with or without the USE INDEX intervention (0.25s, but it’s on a different host so I would expect this to be nearer the 0.8s in the optimised version above).

Also, MariaDb does not use the new index and is quite happy with the t_e_r index, including using it as a covering index (i.e. uses the index for the entire data fetch).

I’m getting more and more impressed with MariaDb and now considering switching.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply