Why do we need multiple levels of cache memory?

In general, a cache is useful because the processor is faster than RAM (both keep getting faster, but the gap remains). Reducing the number of main-memory accesses is therefore desirable for performance.

My question is: why do we need multiple levels of cache (L1, L2, L3) rather than just one?

I know that L1 is the fastest and smallest, L2 is a bit slower and a bit bigger, and so on… but why is the hardware designed this way?

Answer

This is to do with the physical size of the cache on the die. Each bit in a cache is held by one or more transistors, so a lot of cache requires a lot of transistors. A cache's speed also depends on how close it is to the unit that wants to access it: in devices this small, communication gets slower as the signal path gets longer. This is partly down to substrate impedance, but at this scale there are more complex physical phenomena involved as well.

If we want a single large cache, it has to be within a very short distance of the MMU, the ALU, and so on, all at the same time. This makes the physical design of the processor quite difficult, as a large cache takes up a lot of space: to make the cache “local” to these subunits, you have to sacrifice the locality of the subunits to one another.

By using a small, fast local cache (L1) we maximise locality and speed, but give up capacity. We then use a secondary cache (L2) to hold larger amounts of data, with a slight sacrifice in locality (and therefore speed). This gives us the best of both worlds: we can store a lot of data while still having a very fast local cache for processor subunits to use.
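
To see why the hierarchy wins, here is a back-of-the-envelope average memory access time (AMAT) comparison. All latencies and hit rates below are illustrative assumptions, not measurements from any real CPU:

```c
#include <stdio.h>

int main(void) {
    /* Assumed latencies (CPU cycles) and hit rates; real values
       vary widely between processors. */
    double l1_lat = 4,   l1_hit = 0.90;
    double l2_lat = 12,  l2_hit = 0.95;   /* of accesses that miss L1 */
    double mem_lat = 200;

    /* One big cache: more capacity, but farther away, so slower per hit. */
    double big_lat = 20, big_hit = 0.97;
    double amat_single = big_lat + (1 - big_hit) * mem_lat;

    /* Two levels: AMAT = L1 + miss(L1) * (L2 + miss(L2) * memory). */
    double amat_two = l1_lat + (1 - l1_hit) * (l2_lat + (1 - l2_hit) * mem_lat);

    printf("single large cache: %.1f cycles\n", amat_single); /* 26.0 */
    printf("L1 + L2 hierarchy : %.1f cycles\n", amat_two);    /* 6.2  */
    return 0;
}
```

Even though the tiny L1 misses more often, most accesses still complete in a few cycles, so the hierarchy comes out well ahead under these numbers.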

In multicore processors, the L3 cache is usually shared between cores. In this type of design, each core has its own L1 and L2 caches, and the L3 cache sits between the cores. This gives reasonable locality to each core, but also allows for a very large cache.
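
On Linux you can inspect this layout yourself: the kernel exposes each cache level's size and the set of logical CPUs that share it under /sys/devices/system/cpu/. A minimal sketch (Linux-only; the number of index entries varies by machine):

```c
#include <stdio.h>
#include <string.h>

/* Print CPU 0's cache hierarchy from sysfs. On a typical multicore
   machine, L1/L2 report a short shared_cpu_list (private to one
   core) while L3 lists many or all logical CPUs. */
static void show(int idx, const char *field) {
    char path[128], buf[256];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu0/cache/index%d/%s", idx, field);
    FILE *fp = fopen(path, "r");
    if (fp) {
        if (fgets(buf, sizeof buf, fp)) {
            buf[strcspn(buf, "\n")] = '\0';
            printf("  %s: %s\n", field, buf);
        }
        fclose(fp);
    }
}

int main(void) {
    char path[128];
    for (int idx = 0; ; idx++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/level", idx);
        FILE *probe = fopen(path, "r");
        if (!probe) break;            /* no more cache levels listed */
        fclose(probe);
        printf("index%d:\n", idx);
        show(idx, "level");
        show(idx, "type");            /* Data, Instruction, or Unified */
        show(idx, "size");
        show(idx, "shared_cpu_list"); /* which logical CPUs share it */
    }
    return 0;
}
```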

The way caches work in modern processors is very complicated, so I won't attempt a proper description, but a very simplified picture is that a target address is looked up in L1, then L2, then L3, before resorting to a fetch from system memory. Once that fetch completes, the data is pulled back up through the caches.
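
As a rough sketch of that lookup order, here is a toy direct-mapped model; the sizes are made up and the whole thing ignores tags, ways, eviction policy, and coherence:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Toy caches: each level just remembers the last address seen in
   each slot. Sizes are invented; only the lookup order matters.
   (A zero address would spuriously hit the zeroed arrays, so the
   demo avoids address 0.) */
#define L1_SETS 8
#define L2_SETS 64
#define L3_SETS 512

static uint64_t l1[L1_SETS], l2[L2_SETS], l3[L3_SETS];

static bool hit(const uint64_t *c, int sets, uint64_t addr) {
    return c[addr % sets] == addr;
}
static void fill(uint64_t *c, int sets, uint64_t addr) {
    c[addr % sets] = addr;
}

/* Look in L1, then L2, then L3, then "memory"; on the way back,
   install the line in every level so the next access hits in L1. */
static const char *access_addr(uint64_t addr) {
    const char *src;
    if      (hit(l1, L1_SETS, addr)) src = "L1";
    else if (hit(l2, L2_SETS, addr)) src = "L2";
    else if (hit(l3, L3_SETS, addr)) src = "L3";
    else                             src = "memory";
    fill(l3, L3_SETS, addr);
    fill(l2, L2_SETS, addr);
    fill(l1, L1_SETS, addr);
    return src;
}

int main(void) {
    printf("first access : %s\n", access_addr(0x1234)); /* memory */
    printf("second access: %s\n", access_addr(0x1234)); /* L1 */
    return 0;
}
```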

This answer was sourced from stackoverflow.com or stackexchange.com and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.
