Load huge mysqldump (500G)


I have a 500G mysqldump to load and it seems to be taking forever. My server has an Intel i9-10920X @ 3.5GHz and 128GB RAM, and the database lives on an HDD. I have MySQL (Ver 8.0.29-0ubuntu0.20.04.3) configured as follows:

innodb_buffer_pool_size = 32G
innodb_log_buffer_size = 512M
innodb_log_file_size = 2G
innodb_write_io_threads = 32
innodb_flush_log_at_trx_commit = 0
innodb_doublewrite = 0

I also set the following before sourcing the .sql file:

SET global max_allowed_packet=1000000000;
SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;

Right now it reads about 20k rows per second. How can I optimize further? Thanks!

How to solve:


Method 1

How long do you mean when you say forever? Hours? Days? I wouldn’t be surprised if it takes days.

Importing an .sql dump file is notoriously time-consuming. It is bound to be single-threaded, so you can only use one CPU core, no matter what type of CPU you have.

The I/O system is important. As you fill the 2G InnoDB log file, the dirty pages from the buffer pool must flush to disk. Using a fast directly-attached disk system like NVMe can help. Using a striping RAID-0 or RAID-10 can help. Using remote storage (for example AWS EBS) is bad for latency.

Using minimal indexes in your tables can help. Think about every row write being multiplied by the number of indexes in the table. The table itself is stored as a clustered index, and that’s one write. Then each secondary index is an additional write. Unique indexes must be synchronous writes (though if you set unique_checks=off, this is relaxed). Non-unique indexes can be deferred by change buffering, but they do need to get merged into the tablespace eventually.
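
One way to apply this is to defer secondary index creation until after the bulk load. The table and index names below are hypothetical, just to sketch the pattern:

```sql
-- Sketch: drop secondary indexes before the import (names are illustrative).
ALTER TABLE big_table DROP INDEX idx_created_at;

-- ... run the import ...

-- Rebuild afterward: one sorted bulk build per index is much cheaper
-- than maintaining the index row-by-row during the load.
ALTER TABLE big_table ADD INDEX idx_created_at (created_at);
```

For InnoDB, ADD INDEX is done as an in-place sorted build, which is far faster than the incremental maintenance that happens when the index exists during the load.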

Instead of loading a .sql file, it could be much faster to use LOAD DATA [LOCAL] INFILE. See my presentation Load Data Fast!. But you can’t use that with .sql dump files. It’s for CSV files and similar.

Loading multiple tables in parallel using LOAD DATA [LOCAL] INFILE is probably the best you can get for bulk loading large data sets. This is the idea behind the MySQL Shell Parallel Table Import Utility.
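
As a sketch of what a per-table load looks like (the file path, table name, and delimiters are assumptions, not from the original post):

```sql
-- Sketch: bulk-load one table from a CSV file; path and table are hypothetical.
-- LOCAL requires local_infile=1 on the server (and --local-infile on the client).
LOAD DATA LOCAL INFILE '/data/big_table.csv'
INTO TABLE big_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```

MySQL Shell's parallel table import utility (util.importTable()) wraps this same statement, chunking the input file so several loads run concurrently.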

To get any faster, you’d have to resort to physical backups instead of dump files (i.e. Percona XtraBackup), or filesystem snapshots.

Method 2

  1. A 500G file is very large for a single mysql import. It is recommended to split the file into smaller chunks.

  2. Either leave autocommit on or commit in batches, rather than wrapping the whole import in one huge transaction.

  3. Set innodb_buffer_pool_size larger, and increase the log buffer (innodb_log_buffer_size) as well.
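
Point 2 can look like this in practice (the table, values, and batch size below are illustrative): with autocommit off, issue an explicit COMMIT every few thousand rows instead of one commit at the very end.

```sql
-- Sketch: batched commits during a manual import (names and sizes illustrative).
SET autocommit = 0;

INSERT INTO big_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- ... more multi-row INSERTs ...
COMMIT;   -- commit roughly every 10k-50k rows

INSERT INTO big_table VALUES (4, 'd'), (5, 'e'), (6, 'f');
COMMIT;
```

This keeps undo log growth and crash-recovery cost bounded, unlike a single transaction spanning the entire 500G load.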


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0, or cc by-sa 4.0.
