10B rows table in MySQL

All we need is an easy explanation of the problem, so here it is.

We have a table in MySQL (RDS) with 4B rows. It has 3 indexes, one of which is UNIQUE. Table size including indexes is around 400GB. Currently it’s on db.r5.2x – 8 CPUs, 64GB RAM.
Write patterns are a mix of INSERT and UPDATES, no DELETE.
The table is always queried using one of indexes. Most queries return 1-5 rows.

Both write and read performance is OK, and scales reasonably well (sub linear) when instance size increases.

We can live with expensive schema changes and inability to run select count(*).

My questions are:

  1. Are there hidden performance penalties for such a large table?
  2. When we will hit a wall, what will be the reason? Storage over >X GB or # rows > 10B ?
  3. Will this continue to work with 10B rows? 20B rows?


CREATE TABLE `xyz_s3_metadata` (
  `rowId` bigint(20) NOT NULL AUTO_INCREMENT,
  `sId` varchar(50) NOT NULL,
  `ruid` varchar(200) NOT NULL,
  `s3Pointer` varchar(250) NOT NULL,
  `latest` tinyint(1) NOT NULL DEFAULT '1',
  `version` int(11) NOT NULL,
  PRIMARY KEY (`rowId`),
  UNIQUE KEY `ruidId_UNIQUE` (`ruid`),
  KEY `updateTime_idx` (`updateTime`),
  KEY `Id_idx` (`sId`))

Most of the queries are

select s3Pointer 
  from xyz_s3_metadata
where  sId = :x

select s3Pointer 
  from xyz_s3_metadata
where  ruid = :x
  and  latest

The table is updated by

INSERT INTO {table} (sId, ruid, s3Pointer, latest, version)
VALUES (%s, %s, %s, %s, %s)
    s3Pointer = IF(VALUES(version) >= version,
    latest = IF(VALUES(version) >= version,
    updateTime = IF(VALUES(version) >= version,
    sId = IF(VALUES(version) >= version,
    version =   IF(VALUES(version) >= version,



How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

Short answer: No problem

Long answer:

  • 32TB for table size is a brick wall, but you are a long way from that. That might be at about 300B rows.

  • INT UNSIGNED AUTO_INCREMENT hits a brick wall at a little more than 4B. I assume you don’t have such.

  • Every factor of (about) 100 in size increases the depth of the BTrees by 1. The depth is currently about 5. (This is not a big deal.) (Yes, this disagrees with J.D. But there are about 100 rows per block, so it is log-base-100, not "2". Sure, the 100 rows in the leaf block need to be searched, but that is much less effort than getting another block.)

  • Disk space for the table is an obvious limit. And you presumably know that certain ALTERs temporarily use extra disk space about equal to the size of the table.

  • Non-leaf-node blocks tend to stay cached in the buffer_pool. If your accesses are "random", then queries probably depend on how many leaf nodes blocks need to be fetched from disk. When the table is a lot bigger than the buffer pool, a non-leaf-node may need to be fetched from disk. This could lead to a discontinuity in the graph of table size and speed. But not a "brick wall". That is, if a 5-random-row query is currently hitting 5 leaf node blocks, it might jump to 10 blocks as the table grows past 100 times the RAM size. [There is a lot of ‘hand-waving’ in this analysis.]

  • Some schema changes benefit greatly from pt-online-query-digest or gh-ost.

  • If we could see SHOW CREATE TABLE and the main queries, we may have suggestions.

  • Building and maintaining summary table(s) can make COUNT(*) and other aggregates somewhat more manageable at your scale.

Method 2

In theory, if your data is indexed appropriately for the queries you need to run against it, and the predicates of your queries result in a cardinality that invokes an index seek type of operation, then it is unlikely for you to grow in data size that would outpace the ability of your server’s hardware, in a reasonable timeframe.

To dive into the above, a B-Tree based index uses a logical data structure whose search algorithm results in a Big O of log2(n) (base 2) time. Essentially the depth of a B-Tree grows only 1 level deeper every time the amount of data doubles. What this generally means is the search time on an index is only marginally increased for every magnitude in growth in the amount of data it covers. Note that log2(n) is a generalized theoretical Big O function, and it’s actually much better than that in practice, as noted in Rick James’s answer which goes more into the specifics on the calculation.

Putting some numbers to the conceptualization, log2(4 billion rows) is roughly 32, meaning the longest branch in a B-Tree that could be seeked on, for an index covering 4 billion rows of data, is 32 nodes deep. Multiply that amount of data by 1,000 and log2(4 trillion rows) is roughly 42, meaning the 42 nodes is the most amount of nodes that could be seeked on. As you can see, the growth of the search time is logarithmically much less than the growth of the data itself. Seeking on these trivial numbers of nodes is no problem on any modern hardware, and hopefully makes it apparent that big data doesn’t have to be a big problem.

Now the above is a bit of a generalization regarding the theory on how indexing works, and why it’s awesomely efficient, even with large amounts of data. But there are some other factors at play in practice on a MySQL server instance that could potentially affect performance. What will work for your specific case in the future is hard to concretely say without testing, but based on your use cases (low cardinality selects) and the performance you’re currently seeing, there’s no reason (in theory) why you shouldn’t see the same outcome for the same use cases, all things being equal, but with any amount of more data.

The tagline here is that it is neither the total number of rows a table has or its total number of bytes of data, that affects performance of index that covers that data, reasonably and theoretically speaking. Here’s an article on B-Tree Indexing that goes more in depth on how it works under the hood.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply