How can I determine size of database prior to importing from SQL dump?


I had to manually duplicate a server for development, and the only way I could duplicate the database was via SQL dump. I was importing into MySQL (the mysql client identifies itself as Ver 14.14; there is no server version 14.14) on a virtual machine running Ubuntu 14. (Yeah, really old stuff, but the client isn’t ready to allow an upgrade.)

The SQL dump came to about 6GB of uncompressed, plain ANSI text. Since the dump bloats the data with query syntax, character-escape sequences, and textual representations of binary data, the resulting database ought to be smaller than the dump, right? I figured 6GB for the temporary SQL dump, plus 6GB for the database, plus another 8GB for the rest of the system (20GB total) ought to be more than enough. It wasn’t.

I rebuilt my virtual machine with a 30GB disk, thinking surely that would be enough, but it still wasn’t. So I rebuilt it again with 50GB, and ended up with 16GB left over when all was said and done. It turns out the database blew up to 20.5GB!

Somewhere along the way I figured out how to check the size of an existing database in MySQL. What I would like to know is this: when the original database is not available to query, is there an application or MySQL command that can passively process a SQL dump, without building the database, to estimate the size of the resulting database?
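(For reference, that existing-database size check is usually a query against information_schema.tables; a minimal sketch, where the schema name mydb is a placeholder:

mysql -e "
  SELECT ROUND(SUM(data_length + index_length)/1024/1024/1024, 2) AS size_gb
  FROM information_schema.tables
  WHERE table_schema = 'mydb';"
)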

How to solve:


Method 1

Look at the disk footprint for the database. That might be within a factor of 2 of the size of the output of mysqldump.
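A quick way to make that comparison, sketched here assuming the default datadir /var/lib/mysql and a schema directory named mydb (both are placeholders):

du -sh /var/lib/mysql/mydb    # on-disk footprint of the database
ls -lh dump.sql               # size of the dump, for comparison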

Your example of 6GB -> 20.5GB (a ratio of about 3.4) suggests that there are a lot of BIGINTs holding small numbers and/or CHARs that should be VARCHARs.

For example: a BIGINT takes 8 bytes of data in the database, plus overhead of maybe another 6 bytes. A single-digit number appears in the dump as something like "5," (2 bytes). That is a 7:1 ratio between the database and the dump.
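If you suspect those data types, the candidate columns can be listed straight from the catalog; a sketch (mydb is again a placeholder):

mysql -e "
  SELECT table_name, column_name, data_type, character_maximum_length
  FROM information_schema.columns
  WHERE table_schema = 'mydb' AND data_type IN ('bigint', 'char');"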

Your specific question probably calls for Plan B (below) to help avoid this kind of surprise bloat.

A one-row table will take 16KB. A PARTITIONed table wastes at least 4MB per partition.
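With innodb_file_per_table enabled (an assumption; it is the default in later 5.x versions), that per-table and per-partition overhead is visible directly in the sizes of the .ibd files:

ls -lhS /var/lib/mysql/mydb/*.ibd    # one file per table (or partition), largest first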

So, providing some of the CREATE TABLEs can help in guessing whether there are extreme cases that push the database-to-dump ratio well beyond the simple 2:1 that I started with.
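The CREATE TABLE statements can be extracted from the dump itself, without loading anything; a minimal sketch:

awk '/^CREATE TABLE/,/;$/' dump.sql    # print each CREATE TABLE ... ; block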

Plan A: Write the dump to another machine. For example, do this on another machine ("host2"):

host2$ mysqldump -h host1 ... >dump.sql

Note that this would have avoided your problem: by treating the VM as a separate "host", you would not waste the 6GB inside it for the dump.

Plan B: If your "VM" is actually a Docker container, note that you can "mount a volume"; that is, have dump.sql live outside the container but access it from within.
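A sketch of that mount, where the image tag, the password, the host path /srv/dumps, and the schema name mydb are all illustrative assumptions:

docker run -d --name mysql-dev -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=mydb -v /srv/dumps:/dumps mysql:5.6
# once the server is up, load the dump from the mounted path
docker exec -i mysql-dev sh -c 'mysql -uroot -psecret mydb < /dumps/dump.sql'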

Plan C: Never write the dump; simply consume it immediately:

mysqldump -h host1 | mysql -h host2

will mostly clone the data onto another machine. (That can be run on either host.)
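A slightly fuller version of the same pipe, with commonly used (and optional) flags; mydb is a placeholder, and the target database must already exist on host2:

mysqldump -h host1 --single-transaction --routines --triggers mydb | mysql -h host2 mydb

Here --single-transaction gives a consistent InnoDB snapshot without locking tables while the dump runs.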

Plan D: Compress on the fly:

mysqldump | gzip >dump.sql.gzip

Ordinary text compresses about 3:1. (But if you have only numbers, I can’t predict the compression factor.)

Reloading that later:

gunzip <dump.sql.gzip | mysql
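A side benefit: gzip records the uncompressed size, so you can check how large the dump really is without unpacking it:

gzip -l dump.sql.gzip          # fast, but the stored size field wraps at 4GB
zcat dump.sql.gzip | wc -c     # accurate for any size, but reads the whole file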

Method 2

I don’t know of any tool that can do what you ask.

The space required can vary due to many factors:

  • Storage engines
  • Number of indexes per table
  • Data types
  • Character sets
  • Table options (for example row format and compression)
  • Numerous MySQL Server configuration options
  • Table encryption

If you are trying to optimize the storage, try launching a cloud instance with a modest instance type but over-provisioned with plenty of storage space. Then restore your database dump file, and measure the actual size of the restored database. Use that information to provision another cloud instance with exactly the storage you think is appropriate. Then drop the first one, since you only used it to get the size estimate.
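Once the restore finishes, a per-table breakdown makes it easier to see where the space went before you size the real instance; a sketch (the schema name mydb is a placeholder):

mysql -e "
  SELECT table_name,
         ROUND(data_length/1024/1024)  AS data_mb,
         ROUND(index_length/1024/1024) AS index_mb
  FROM information_schema.tables
  WHERE table_schema = 'mydb'
  ORDER BY data_length + index_length DESC;"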

Another consideration: do you really want to provision only enough space for the current database? Do you expect it to stop growing? (Most databases keep growing, in my experience.) Does your storage volume have enough space for binary logs? For temporary tables needed during ALTER TABLE? For temporary tables needed during queries?

It’s usually a good idea to make sure you have at least double the storage space of your current database size.
