# Where is MySQL spending its time: disk reads or computation?

From a SQL case study

The system configuration is:

`Software: Percona 5.6.15-63.0. Hardware: Supermicro; X8DTG-D; 48G of RAM; 24xIntel(R) Xeon(R) CPU L5639 @ 2.13GHz, 1xSSD drive (250G)`

The query is `select yeard, count(*) from ontime group by yeard`, where `yeard` is an indexed column.

The query is simple; however, it has to scan 150M rows. Here are the results of the query (cached):
`The query took 54 seconds and utilized only 1 CPU core`

My understanding: by my calculation, the query should have finished in much less time, given the system configuration above and the amount of data to scan.
I know I am wrong and missing something, but what is it?

Here is my calculation of the time in seconds:

1. For an HDD, reading 100 MB of data from disk takes about 1 second on average. An SSD is 5 to 10x faster. Still, conservatively assuming a speed of 100 MB per second, the time to read the data is `size of data in MB/100` seconds. The size of the data is `150 * 10^6 * 4/10^6 = 600 MB`, assuming each year value is 4 bytes long. So the total time to read the data from disk should be `600/100 = 6 secs`.

2. The CPU runs at 2.13 GHz, which means about 2 billion cycles per second and, on average, roughly 2 billion instructions per second per core. The query actually took 54 seconds, so the time spent computing was approximately `54-6 = 48 seconds`. Does that mean it had to execute around `48 * 2 = 96 billion instructions` just to calculate the count and group by year, or am I missing something?
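The two estimates above can be written out as a short calculation. This is only a sketch of my own arithmetic: every input is an assumption taken from the figures in this question (4 bytes per value, 100 MB/s, a clock rounded to 2 billion cycles per second, one instruction per cycle), and the variable names are made up for illustration.

```python
# Back-of-envelope estimate; every input is an assumption from the question.
ROWS = 150_000_000          # rows the query has to scan
BYTES_PER_VALUE = 4         # assumed size of one yeard value
READ_MB_PER_SEC = 100       # conservative read rate assumed above
OBSERVED_SECONDS = 54       # reported query time
CYCLES_PER_SEC = 2e9        # 2.13 GHz rounded down, as in the text

data_mb = ROWS * BYTES_PER_VALUE / 10**6        # 600.0 MB
read_seconds = data_mb / READ_MB_PER_SEC        # 6.0 s
cpu_seconds = OBSERVED_SECONDS - read_seconds   # 48.0 s
total_cycles = cpu_seconds * CYCLES_PER_SEC     # 9.6e10, i.e. 96 billion
cycles_per_row = total_cycles / ROWS            # 640.0

print(f"data to read:    {data_mb:.0f} MB")
print(f"read time:       {read_seconds:.0f} s")
print(f"compute time:    {cpu_seconds:.0f} s")
print(f"cycles per row:  {cycles_per_row:.0f}")
```

Under these assumptions, 96 billion instructions over 150M rows comes to roughly 640 cycles per row.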

Maybe more time goes into disk reads, or the count looks simple but internally involves a large number of instructions?

## How to solve

### Method 1

Rule of Thumb: If the necessary data is cached, the query will run 10x faster than if not. I have seen this in a wide variety of queries. (Of course, the “10” varies a lot, too.)

Assuming

``````
SELECT yeard, COUNT(*) FROM t GROUP BY yeard;

INDEX(yeard)
``````

The Optimizer will scan the entire `INDEX(yeard)` of 150M ‘rows’. It does not have to do much more (at a high level) than count rows until `yeard` changes. That is, the `INDEX` and the `GROUP BY` work well together; they are not two separate steps.
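As a rough illustration of "count rows until `yeard` changes", here is a minimal Python sketch. It is a conceptual model only, not MySQL's actual code path; the function name `count_by_sorted_key` and the sample values are hypothetical. The point is that when keys arrive in index order, grouping needs a single pass and only one running counter.

```python
from typing import Iterable, Iterator, Tuple


def count_by_sorted_key(keys: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Yield (key, count) pairs, assuming keys arrive already sorted."""
    current = None
    count = 0
    for key in keys:
        # A change in key value closes the previous group.
        if count and key != current:
            yield current, count
            count = 0
        current = key
        count += 1
    # Emit the final group, if any rows were seen.
    if count:
        yield current, count


# Hypothetical index entries, in the order an index scan would return them.
index_scan = [1988, 1988, 1989, 1989, 1989, 1990]
result = list(count_by_sorted_key(index_scan))
print(result)  # [(1988, 2), (1989, 3), (1990, 1)]
```

No sort and no lookup table are needed in this model, because the index already delivers the values in order.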

• About 400 index entries per 16KB block.
• 100 blocks/second is a more realistic estimate for InnoDB hitting HDD. (This cost goes away for any blocks that are cached.)
• 150M/400/100 = 3750 seconds. So 54s on SSD seems optimistic, unless some of the blocks were cached.
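The bullets above can be reproduced with a few lines of Python. This is a sketch under the same rule-of-thumb figures (400 entries per 16KB block, 100 blocks/second on HDD); the index size and the block rate implied by 54 seconds are derived from those figures, not measured.

```python
# Rule-of-thumb inputs from the bullets above (assumptions, not measurements).
ROWS = 150_000_000
ENTRIES_PER_BLOCK = 400        # index entries per 16KB InnoDB block
BLOCK_BYTES = 16 * 1024
HDD_BLOCKS_PER_SEC = 100       # uncached reads from HDD
OBSERVED_SECONDS = 54

blocks = ROWS // ENTRIES_PER_BLOCK                    # 375000 blocks to scan
hdd_seconds = blocks / HDD_BLOCKS_PER_SEC             # 3750.0 s if nothing is cached
index_gib = blocks * BLOCK_BYTES / 2**30              # approximate index size
blocks_per_sec_for_54s = blocks / OBSERVED_SECONDS    # rate implied by the observed time

print(f"blocks to scan:           {blocks}")
print(f"uncached HDD estimate:    {hdd_seconds:.0f} s")
print(f"approx. index size:       {index_gib:.1f} GiB")
print(f"blocks/s needed for 54 s: {blocks_per_sec_for_54s:.0f}")
```

At these figures the index is several GiB rather than 600 MB, because each block holds about 400 entries instead of 4,096 bare 4-byte values.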

If you run the query a second time (after all the blocks are cached), it may run 10x faster.

It’s hard to say how many CPU cycles a query will take. The code to “get the next row” (even of an index) is somewhat generic, and meanwhile, it is building (probably) an in-memory hash of the results. There are also steps to parse the query, decide how to optimize it, deliver the results, etc.

The `EXPLAIN` in this example will say “Using index”, meaning that it only used the index’s BTree and did not need to touch the data’s BTree.

The BTree is really a B+Tree, meaning that consecutive blocks are linked, making linear scans (as with your query) efficient.

