Why linear read-ahead (prefetch) may improve performance

In MySQL:

Linear read-ahead is a technique that predicts what pages might be needed soon based on pages in the buffer pool being accessed sequentially… If you set innodb_read_ahead_threshold to 48, InnoDB triggers a linear read-ahead request only when 48 pages in the current extent have been accessed sequentially.

With the default settings, an extent contains 64 pages, so the prefetch mechanism will "asynchronously read-ahead the entire following extent" (i.e., 64 – 48 = 16 pages). The assumption behind linear read-ahead is that the remaining 16 pages are very likely to be read after 48 sequential page reads.
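To make the trigger condition concrete, here is a deliberately simplified, hypothetical model of it. It is not InnoDB's actual algorithm: the function name `triggers_linear_read_ahead` is made up, only ascending access order is considered, and a "sequential run" simply means each access is the page right after the previous one. The default threshold of 56 and the allowed range of 0 to 64 follow the MySQL documentation for `innodb_read_ahead_threshold`.

```python
EXTENT_PAGES = 64  # pages per extent with the default 16 KB page size


def triggers_linear_read_ahead(accesses, threshold=56):
    """Simplified, hypothetical model of the linear read-ahead trigger.

    `accesses` is the order in which page offsets (0..63) within one
    extent were touched. A run counts as sequential while each access is
    the page right after the previous one (ascending order only). Once the
    run reaches `threshold` pages, read-ahead of the following extent fires.
    """
    if not 0 <= threshold <= EXTENT_PAGES:
        raise ValueError("innodb_read_ahead_threshold must be in 0..64")
    run = 0
    prev = None
    for page in accesses:
        if prev is not None and page == prev + 1:
            run += 1  # still reading the extent in order
        else:
            run = 1  # a jump restarts the sequential run
        prev = page
        if run >= threshold:
            return True
    return False


if __name__ == "__main__":
    print(triggers_linear_read_ahead(range(48), threshold=48))
    print(triggers_linear_read_ahead(list(range(30)) + list(range(40, 64)), threshold=48))
```

With `threshold=48`, reading offsets 0 through 47 in order fires the read-ahead, while reading offsets 0 to 29 and then 40 to 63 does not, because the jump splits the accesses into runs of 30 and 24 pages.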

Question

Without prefetching, even though the remaining 16 pages are going to be read next, MySQL can simply read them sequentially, which should perform the same as prefetching them. So what are the benefits of linear prefetching? Is it only meaningful when the remaining 16 pages may be accessed randomly?

Thanks!

How to solve:

Method 1

Prefetching can help the OS disk-access scheduler, especially on mechanical drives: because it knows the other pages will need to be read, it may be able to save some head movements. It can similarly help other caching layers that may be present in the I/O subsystem, though you don’t want it to be too aggressive about this, as it could risk dropping useful things from the cache to replace them with pages that won’t be needed.
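The caveat about being too aggressive can be illustrated with a toy least-recently-used cache. This sketch is hypothetical and does not model any particular OS or InnoDB cache: a four-page cache holds a "hot" working set, and prefetching four pages that are never actually read pushes that working set out each time.

```python
from collections import OrderedDict


class LRUCache:
    """Toy page cache that evicts the least recently used page when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # order of keys = recency, oldest first
        self.misses = 0

    def _insert(self, page):
        self.pages[page] = True
        self.pages.move_to_end(page)  # newest entry is most recently used
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page

    def read(self, page):
        """A demand read: counts a miss if the page was not cached."""
        if page in self.pages:
            self.pages.move_to_end(page)
        else:
            self.misses += 1
            self._insert(page)

    def prefetch(self, pages):
        """Speculatively load pages; no misses are counted for these."""
        for page in pages:
            if page not in self.pages:
                self._insert(page)


def misses_for(rounds, useless_prefetch):
    cache = LRUCache(capacity=4)
    hot = [1, 2, 3, 4]  # working set that is read again every round
    for r in range(rounds):
        for page in hot:
            cache.read(page)
        if useless_prefetch:
            # Four hypothetical pages that are never read afterwards.
            cache.prefetch([100 + 10 * r + i for i in range(4)])
    return cache.misses


if __name__ == "__main__":
    print("misses without prefetch:", misses_for(3, useless_prefetch=False))
    print("misses with useless prefetch:", misses_for(3, useless_prefetch=True))
```

Over three rounds this reports 4 misses without prefetching and 12 with it: every speculative batch evicts the hot pages, so each round has to read them again.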

If the pages are accessed immediately one after another, there is likely to be little difference. But if there is some latency between the page reads (perhaps some CPU processing is done between them, which takes long enough for another thread to request pages, or maybe the calling application is cursoring through results and paginating), the potential to avoid head movements and similar overheads can make a measurable difference.
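The head-movement argument can be sketched with a toy disk model under loudly simplified assumptions: a read costs one "seek" whenever the requested page is not physically adjacent to the previous one, and two threads take turns requesting pages from distant regions of the disk. The page numbers and function names are hypothetical. Without read-ahead the disk alternates between the two regions; with read-ahead, thread A's first request fetches all of its pages in one batch.

```python
def disk_seeks(disk_requests):
    """Count seeks for a sequence of page reads issued to the disk in order."""
    seeks = 0
    head = None  # no position yet, so the first read always seeks
    for page in disk_requests:
        if head is None or page != head + 1:
            seeks += 1  # head must move: page is not adjacent to the last one
        head = page
    return seeks


def interleave(a, b):
    """Two threads taking turns: a[0], b[0], a[1], b[1], ..."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out


def without_prefetch(stream_a, stream_b):
    # Every logical read goes straight to disk in arrival order.
    return interleave(stream_a, stream_b)


def with_prefetch(stream_a, stream_b):
    # Assumption: the first read of stream A triggers a read-ahead that
    # fetches all of stream A's pages in one batch; later A reads are
    # served from memory and never reach the disk.
    disk = []
    cached = set()
    for page in interleave(stream_a, stream_b):
        if page in cached:
            continue
        if page == stream_a[0]:
            disk.extend(stream_a)
            cached.update(stream_a)
        else:
            disk.append(page)
    return disk


if __name__ == "__main__":
    a = list(range(0, 16))       # pages one thread reads sequentially
    b = list(range(1000, 1016))  # another thread's pages elsewhere on disk
    print("seeks without prefetch:", disk_seeks(without_prefetch(a, b)))
    print("seeks with prefetch:", disk_seeks(with_prefetch(a, b)))
```

Run as a script, this reports 32 seeks without read-ahead and 2 with it. Real disk schedulers reorder and merge requests, so actual numbers depend heavily on the workload and hardware; the toy only shows where the saving comes from.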

Method 2

When InnoDB asks for an "extent" to be allocated, and the OS is willing to provide "adjacent" blocks on disk, reading that extent can be a lot faster. But that applies to HDDs (spinning drives), not SSDs. Since most disks these days are SSDs, such adjacency is not relevant.

On the other hand, if "readahead" really refers to starting the I/O before it has been asked for, then it may provide some overlap between CPU and I/O. This applies to both HDDs and SSDs, except that SSDs are something like 10 times as fast.
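The CPU/I/O overlap can be shown with a small timing model instead of real I/O. Both functions are hypothetical: `elapsed_sync` reads a page and processes it before requesting the next one, while `elapsed_prefetch` assumes the reads are issued back-to-back on a single I/O channel (as a read-ahead of the whole batch would) and each page is processed as soon as its read has finished and the previous page is done.

```python
def elapsed_sync(n_pages, io_ms, cpu_ms):
    """Read a page, process it, then request the next: nothing overlaps."""
    return n_pages * (io_ms + cpu_ms)


def elapsed_prefetch(n_pages, io_ms, cpu_ms):
    """Reads issued back-to-back on one I/O channel, overlapping CPU work.

    Page i can be processed once its own read has finished and the CPU
    has finished page i - 1.
    """
    io_done = 0   # time at which the current page's read completes
    cpu_done = 0  # time at which the most recent page has been processed
    for _ in range(n_pages):
        io_done += io_ms
        cpu_done = max(cpu_done, io_done) + cpu_ms
    return cpu_done


if __name__ == "__main__":
    print("synchronous:", elapsed_sync(16, io_ms=5, cpu_ms=3), "ms")
    print("read-ahead:", elapsed_prefetch(16, io_ms=5, cpu_ms=3), "ms")
```

For 16 pages at 5 ms of I/O and 3 ms of CPU each, the synchronous loop takes 16 × 8 = 128 ms, while the overlapped one takes 83 ms (16 × 5 ms of reading plus 3 ms to process the last page). With a single page there is nothing to overlap, and both take 8 ms.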

In my experience, I/O is the main factor in the speed of a query.

If a query needs to read megabytes of data, look at the query first and see whether there are better ways to formulate and/or index it. That may speed up the query more than relying on read-ahead.

For data warehousing, building and maintaining a "summary table" can avoid massive table scans, leading to perhaps a 10x speedup.
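A minimal sketch of a summary table, assuming a hypothetical `sales` table that is reported per day. It uses Python's built-in `sqlite3` module only so the example runs anywhere; the same idea applies in MySQL, though SQL details differ (for example, `INSERT OR IGNORE` is SQLite syntax). Each insert updates one summary row in the same transaction, so a daily report reads a few small rows instead of scanning every sale.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (sale_day TEXT NOT NULL, amount INTEGER NOT NULL);
    CREATE TABLE sales_daily (
        sale_day TEXT PRIMARY KEY,
        total    INTEGER NOT NULL,
        n_sales  INTEGER NOT NULL
    );
    """
)


def record_sale(day, amount):
    """Insert a raw row and update that day's summary row atomically."""
    with conn:  # commits both writes together, or rolls back on error
        conn.execute("INSERT INTO sales VALUES (?, ?)", (day, amount))
        # Create the summary row on the first sale of the day...
        conn.execute("INSERT OR IGNORE INTO sales_daily VALUES (?, 0, 0)", (day,))
        # ...then fold this sale into it.
        conn.execute(
            "UPDATE sales_daily SET total = total + ?, n_sales = n_sales + 1 "
            "WHERE sale_day = ?",
            (amount, day),
        )


def daily_report():
    """Reads only the small summary table; no scan of `sales` is needed."""
    return conn.execute(
        "SELECT sale_day, total, n_sales FROM sales_daily ORDER BY sale_day"
    ).fetchall()


if __name__ == "__main__":
    for day, amount in [("2024-01-01", 10), ("2024-01-01", 5), ("2024-01-02", 7)]:
        record_sale(day, amount)
    print(daily_report())  # [('2024-01-01', 15, 2), ('2024-01-02', 7, 1)]
```

The trade-off is a little extra work on every write in exchange for reports that no longer depend on scanning the large table.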

(And see David’s Answer, which goes into more detail on some of what I say.)

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
