Why does GROUP BY on table A cause “using temporary; using filesort” on table B?


Given the following contrived example code:

SELECT      A.author_name
          , COUNT(T.title) AS num_titles

FROM        authors  A
JOIN        titles   T  ON  T.authorFK  =  A.authorPK

GROUP BY    A.authorPK

In the EXPLAIN output, the Extra column shows Using temporary; Using filesort on the row for table titles.

If I delete the COUNT and the GROUP BY, Extra is now blank.

This is obviously a contrived simple example; I can give a more real code sample, but I’m hoping someone knows what I’m talking about and can educate me based on this. Thank you!

Using the current version of MariaDB.

How to solve:


Method 1

  • "Using temporary; Using filesort" is usually reported on the first row of the EXPLAIN output, regardless of which table actually needs them.
  • There could be more than one "sort". Use EXPLAIN FORMAT=JSON SELECT ... to get this type of detail.
  • The "file" in "filesort" does not necessarily mean that the sorting is done on disk. When possible, it is actually done in RAM.
  • COUNT(T.title) could probably be replaced by COUNT(*). As long as title is never NULL, the result is the same, and the server no longer has to check title for being non-NULL.
  • If an author might have zero titles, you won't see that from your query: the inner JOIN produces no row for that author. (Cf. LEFT JOIN)
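To see which step actually needs the temporary table and the sort, the question's query can be run through the JSON form of EXPLAIN mentioned above. This is only a sketch using the table and column names from the question; the shape of the JSON output depends on the server version, so none is shown here.

```sql
-- Ask the optimizer for a structured plan instead of the tabular one.
-- The JSON nests the temporary-table and filesort steps around the
-- tables they apply to, which the tabular Extra column cannot show.
EXPLAIN FORMAT=JSON
SELECT      A.author_name
          , COUNT(*) AS num_titles   -- COUNT(*) assumes title is never NULL

FROM        authors  A
JOIN        titles   T  ON  T.authorFK  =  A.authorPK

GROUP BY    A.authorPK;
```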

Method 2

What catches my eye with your sample query is "there is no WHERE clause".

That being the case, the query has to join every row of authors to its matching rows in titles; nothing narrows down either table.

Even if authorFK is indexed in the titles table, the query optimizer evidently decided that Using temporary; Using filesort is cheaper than trying to use the index in the join process. If your real query shows Using temporary; Using filesort in the middle of the EXPLAIN plan, then it is what it is.

You could do one tangible thing without changing the query: increase tmp_table_size and max_heap_table_size so that the temporary table stays in RAM instead of going to disk. This is contingent on your DB server having enough RAM. It might help a simple query (such as the one you posted here), but may be counterproductive if Using temporary; Using filesort appears multiple times in an EXPLAIN plan (implying more consumed RAM).
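As a rough sketch of that tuning step, both limits can be raised for the current session only, and the temporary-table counters compared before and after running the query. The 64 MiB figure is purely an illustrative assumption, not a recommendation; the smaller of the two variables is the one that takes effect for in-memory temporary tables.

```sql
-- Baseline: how many temporary tables this session has created so far,
-- and how many of them went to disk (Created_tmp_disk_tables).
SHOW SESSION STATUS LIKE 'Created_tmp%tables';

-- Illustrative value only (64 MiB); size this to the RAM you really have.
SET SESSION tmp_table_size      = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;

-- ... run the GROUP BY query here ...

-- If Created_tmp_disk_tables did not grow, the temp table stayed in RAM.
SHOW SESSION STATUS LIKE 'Created_tmp%tables';
```

Setting the variables with SESSION scope keeps the experiment from affecting other connections.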

Also keep in mind that each JOIN clause that does not use an index creates an additional join buffer (See join_buffer_size documentation paragraph 1). Thus, more RAM is still needed.
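Before tuning any buffers, it may be worth confirming whether the join column is indexed at all and what the join buffer is currently set to. A minimal check, using the table name from the question:

```sql
-- List the indexes on titles; look for one whose first column is authorFK.
SHOW INDEX FROM titles;

-- Current size of the buffer allocated for joins that cannot use an index.
SHOW VARIABLES LIKE 'join_buffer_size';
```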

You may want to change the JOIN into a LEFT JOIN. The EXPLAIN plan may remain the same, but performance might change. Note that a LEFT JOIN also changes the result: authors with no titles are then included.
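A sketch of the LEFT JOIN variant, again with the names from the question. Here COUNT(T.title) is kept deliberately: for an author with no titles the joined row has T.title as NULL, so the count comes out as 0, whereas COUNT(*) would count that row and report 1.

```sql
SELECT      A.author_name
          , COUNT(T.title) AS num_titles   -- 0 for authors without titles

FROM        authors  A
LEFT JOIN   titles   T  ON  T.authorFK  =  A.authorPK

GROUP BY    A.authorPK;
```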

I hope this gives you a start in improving your overall real query.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
