MySQL: How to optimize a certain SELECT statement which causes very high load?

I have a table with 25,000,000 entries and the following query, which runs roughly every 2 seconds and causes very high load (up to 40 AAS on Amazon AWS). Execution takes anywhere from 20 seconds to 5 minutes, which causes timeouts in user browsers and a high drop rate.

SELECT COUNT(*) AS `chk` 
  FROM ( SELECT `item_id` 
           FROM `items` 
          WHERE `item_status` IN (...) 
            AND `item_type` = ? 
            AND `user_id` != ? 
            AND `item_name` IN (...) 
          LIMIT 3 
       ) AS OTHERS
;

Index optimization is already done – user_id, item_name, item_type and item_status are all indexed (one single index per column).

More information:

  • One user has between 1 and 1,000,000 entries
  • item_name is a varchar 128
  • item_type has a cardinality of 7
  • item_status has also a cardinality of 7
  • Just need to know if there are three or more matches

Please note that in ~50% of the cases MySQL has to check the complete table, because fewer than 3 items are found. So the limit only helps when there are 3 or more matching items.

While I do Redis caching for certain results, that is not possible for this query because an exact result is always needed. With the database growing by ~1 entry per second, query performance is getting worse pretty fast.

While I could solve most problems with indexing in the past, I have a serious problem here. Thought about solving the problem with some trigger or a view, but I am not sure if that helps? Doing such a change without knowing if it solves the problem is dangerous in a highly active production database.

My question to the pros here is: How can I solve that problem with MySQL?

How to solve :

Method 1

You could try using a composite index.

Your index could look like this:

INDEX (item_type, item_name, item_status, user_id)

This index should allow MySQL to find all rows efficiently and without having to lookup any rows from the actual table.

The order of the columns is important here:

  • item_type comes first since you have an equality filter on the column.
  • item_name and item_status come second and third since both use an IN. The order of these two columns could be swapped, but since item_name probably has a higher cardinality than item_status, having it first may be a bit faster.
  • user_id comes last since it uses an inequality filter which can not be efficiently handled with an index.
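As a minimal sketch, the index described above could be created like this. The index name idx_items_type_name_status_user is made up for illustration, and the ALGORITHM and LOCK clauses assume an InnoDB table on MySQL 5.6 or later; try it on a copy of the table first, since building an index on 25,000,000 rows takes time and I/O.

```sql
-- Hypothetical index name; the column order follows the reasoning above:
-- equality column first, then the two IN columns, then the inequality column.
ALTER TABLE items
  ADD INDEX idx_items_type_name_status_user
      (item_type, item_name, item_status, user_id),
  ALGORITHM = INPLACE,  -- assumption: InnoDB online DDL is available
  LOCK = NONE;          -- keep the table readable and writable during the build
```

If the server rejects ALGORITHM = INPLACE or LOCK = NONE, the statement fails instead of silently falling back to a blocking table copy, which is the safer behaviour on a busy production table.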

In general MySQL will only use 1 index per table¹, so having more than 1 index does not help.

Also, MySQL will not use an index if it does not look like the index will filter out enough rows (with "enough" being calculated by the query planner based on table and index statistics). The reason for this is that when using an index MySQL still needs to look up the original row in the table (unless you have a covering index²), which is somewhat expensive, so using an index to look up a lot of rows can actually be slower than doing a full table scan.

Given the number of rows in your table, most if not all of your 4 existing indexes will probably have too many rows per value and won’t even be considered by MySQL for this query. Though this really depends on the data distribution and cardinality.

You can check which indexes are used, and whether they are covering or not, using EXPLAIN. For example:

EXPLAIN SELECT COUNT(*) AS `chk` FROM ( SELECT `item_id` FROM `items` WHERE `item_status` IN (...) AND `item_type` = ? AND `user_id` != ? AND `item_name` IN (...) LIMIT 3 ) AS OTHERS;

If you use MySQL 8.0.18 or later, you can also use EXPLAIN ANALYZE, which actually runs the query and reports how MySQL reads and filters the data, with timings.
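A sketch of what that could look like with the placeholders filled in. All literal values below are invented for illustration, since the question does not show the real ones or the column types. Note that EXPLAIN ANALYZE executes the statement, so on the current table it may take as long as the slow query itself.

```sql
-- All literal values are hypothetical stand-ins for the question's placeholders.
EXPLAIN ANALYZE
SELECT COUNT(*) AS chk
  FROM (SELECT item_id
          FROM items
         WHERE item_status IN (1, 2)
           AND item_type = 3
           AND user_id != 42
           AND item_name IN ('foo', 'bar')
         LIMIT 3
       ) AS others;
```

In the plain EXPLAIN output, "Using index" in the Extra column indicates that the index is covering for the query.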

Rick James has written a great document on how to build the best index for a SELECT which I recommend for you to read. He also wrote a document about compound (composite) indexes which I can also recommend.

In the future please try to also provide at least the table schema and indexes (you can use SHOW CREATE TABLE ? for this) and the output of running the problematic query with EXPLAIN as well as the MySQL version. This reduces the amount of guessing we have to do and allows us to give more concrete and better answers.
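For reference, the information asked for above could be collected with statements like the following, assuming the table is called items as in the question:

```sql
-- Table definition including all indexes
SHOW CREATE TABLE items;

-- Per-index details, including the estimated cardinality of each column
SHOW INDEX FROM items;

-- Server version
SELECT VERSION();
```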

¹ There is an optimization where MySQL can use more than 1 index called Index Merge Optimization, but it only works in some rare cases and should be avoided.

² A covering index is an index that contains all columns used in a query. Since all columns are already in the index, MySQL does not need to fetch other columns from the table itself. The index I proposed in this answer is an example of a covering index, assuming item_id is the table's primary key (InnoDB stores the primary key in every secondary index).

Method 2

On top of a composite index, try making sure that all columns you are indexing are foreign keys to another table. This should make the index on your large table smaller, assuming your lookup tables have significantly fewer records than your main table. Even if they don't, it may still work:

For example:

5,000 KB/sec disk scan speed (a 100 GB disk is the lowest you can reasonably commission on AWS)
1,000,000 records in item_name, < 100 in item_status:

  • item_name table is 125 KB, index scan would take 25 ms
  • main table indexes are ~400 KB, index scan would take 80 ms

If you cannot change your data model you can force this way of approaching the query by using a temporary table:

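-- holds the ids of all rows whose item_name is in the search list
create temporary table search_item_name(
    id int(11) not null auto_increment,
    primary key(id)
);

-- fill it once per search; (...) stands for the list of names
insert into search_item_name
select id from main_table where item_name in (...);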

Then in your main query, join search_item_name with an inner join.
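A rough sketch of that join, reusing the answer's generic names main_table and id together with the filter columns from the question. The literal values are invented for illustration.

```sql
-- search_item_name already restricts the rows by item_name,
-- so only the remaining filters are applied here.
SELECT COUNT(*) AS chk
  FROM (SELECT m.id
          FROM main_table AS m
         INNER JOIN search_item_name AS s ON s.id = m.id
         WHERE m.item_status IN (1, 2)  -- hypothetical values
           AND m.item_type = 3          -- hypothetical value
           AND m.user_id != 42          -- hypothetical value
         LIMIT 3
       ) AS others;
```

A temporary table is only visible to the session that created it, so the insert and this query have to run on the same connection.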


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
