Where is MySQL taking time: in disk reads or in computation?


From an SQL case study:

System configuration:

Software: Percona Server 5.6.15-63.0. Hardware: Supermicro X8DTG-D; 48GB of RAM; 24 x Intel(R) Xeon(R) CPU L5639 @ 2.13GHz; 1 x SSD drive (250GB)

The query is SELECT yeard, COUNT(*) FROM ontime GROUP BY yeard, where yeard is an indexed column.
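For reference, here is a minimal sketch of what such a table and index might look like (the column types, the id column, and the index name are assumptions for illustration; the real ontime table from the flight on-time dataset has many more columns):

 CREATE TABLE ontime (
   id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
   yeard SMALLINT NOT NULL,   -- year of the flight
   -- ... many more columns omitted ...
   KEY idx_yeard (yeard)      -- secondary index that the GROUP BY can use
 ) ENGINE=InnoDB;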

The query is simple; however, it has to scan 150M rows. Here is the result of the query (cached): it took 54 seconds and utilized only 1 CPU core.

My understanding: per my calculation, it should have finished in much less time, given the system configuration above and the amount of data to scan.
I know I am wrong and missing something, but what is it?

Here is my calculation, in seconds:

  1. For an HDD, the average time to read 100MB of data from disk is around 1 second; an SSD is 5 to 10x faster. Still, conservatively assuming a read speed of
    100MB per second, the time to read the data is (data size in MB) / 100 seconds. The data size is 150 * 10^6 * 4 / 10^6 = 600 MB,
    assuming each year value is 4 bytes long, so the total time to read the data from disk should be 600 / 100 = 6 seconds.

  2. It is a 2.13GHz CPU, which means roughly 2 billion cycles per second, or on average about 2 billion instructions per second per core. The actual
    query took 54 seconds, so the time spent on computation was approximately 54 - 6 = 48 seconds. Does that mean it had to execute around 48 * 2 = 96 billion instructions
    just to calculate the count and group by year, or am I missing something? (The same arithmetic is written out as a query after this list.)
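The back-of-envelope arithmetic above can be written out directly as a query (this only restates the assumed numbers: the 4-byte year, the 100MB/s read speed, and the 2 billion instructions per second per core are inputs, not measurements):

 -- Back-of-envelope check of the numbers in the question
 SELECT
   150e6 * 4 / 1e6        AS data_mb,            -- 600 MB of 4-byte year values
   150e6 * 4 / 1e6 / 100  AS disk_read_seconds,  -- 6 s at 100 MB/s
   (54 - 6) * 2           AS implied_billions_of_instructions;  -- ~96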

Maybe more time goes into the disk reads, or the count looks simple but internally involves a large number of instructions?

How to solve:


Method 1

Rule of Thumb: If the necessary data is cached, the query will run 10x faster than if not. I have seen this in a wide variety of queries. (Of course, the “10” varies a lot, too.)

Assuming

 SELECT yeard, COUNT(*) FROM t GROUP BY yeard;

 INDEX(yeard)

The Optimizer will scan the entire INDEX(yeard) of 150M ‘rows’. It does not have to do much more (at a high level) than count rows until yeard changes. That is, the INDEX and the GROUP BY work well together; they are not two separate steps.

  • About 400 index entries per 16KB block. (A rough check against the actual index size is sketched after this list.)
  • 100 blocks/second is a more realistic estimate for InnoDB hitting an HDD. (This cost goes away for any blocks that are already cached.)
  • 150M / 400 / 100 = 3750 seconds for an uncached HDD scan. So 54s for an SSD seems optimistic; or some of the blocks were cached.
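One way to sanity-check the entries-per-block arithmetic on a real server is to take the index size reported by information_schema and divide by InnoDB's 16KB page size (the database name below is an assumption; adjust it to wherever ontime lives):

 -- Approximate number of 16KB pages in the table's secondary indexes
 SELECT table_name,
        index_length,                      -- bytes in secondary indexes
        index_length / 16384 AS approx_16kb_pages
 FROM   information_schema.TABLES
 WHERE  table_schema = 'ontime_db'         -- assumption: your database name
   AND  table_name   = 'ontime';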

If you run the query a second time (after all the blocks are cached), it may run 10x faster.
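To see whether the blocks are coming from the buffer pool or from disk, compare InnoDB's read counters before and after running the query (these are standard status variables in MySQL / Percona Server 5.6):

 SHOW VARIABLES LIKE 'innodb_buffer_pool_size';               -- cache size, in bytes
 SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical page reads
 SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that missed the cache and hit disk

If a second run of the query barely moves Innodb_buffer_pool_reads, the index blocks are cached and the speedup comes from avoiding disk entirely.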

It’s hard to say how many CPU cycles a query will take. The code to “get the next row” (even of an index) is somewhat generic, and meanwhile, it is building (probably) an in-memory hash of the results. There are also steps to parse the query, decide how to optimize it, deliver the results, etc.

The EXPLAIN in this example will say “Using index”, meaning that it only used the index’s BTree and did not need to touch the data’s BTree.
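You can check this on your own table (treat the expected output as a sketch; the exact plan depends on server version and table definition):

 EXPLAIN SELECT yeard, COUNT(*) FROM ontime GROUP BY yeard;
 -- Expect: type=index, key=<the yeard index>, Extra: Using index
 -- "Using index" means the covering secondary index satisfies the query,
 -- so the clustered (data) BTree is never read.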

The BTree is really a B+Tree, meaning that consecutive blocks are linked, making linear scans (as with your query) efficient.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.
