All we need is an easy explanation of the problem, so here it is.
COPY is documented as not being WAL-logged if it’s loaded on the same transactions..
In minimal level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster (see Section 14.4.7). Operations in which this optimization can be applied include: CREATE TABLE AS,
CREATE INDEX,
CLUSTER,
COPY into tables that were created or truncated in the same transaction
With that said, do any of the non-durability features speed up a pg_restore -1
?
-1 –single-transaction Execute the restore as a single transaction (that is, wrap the emitted commands in BEGIN/COMMIT). This ensures that either all the commands complete successfully, or no changes are applied. This option implies –exit-on-error.
COPY
is a single transaction, willsynchronous_commit
do anything? Or does that only speed up batch-INSERTS
?- Does making the initial table
UNLOGGED
matter ifCOPY
ignores the WAL log anyway? - What about
max_wal_size
andcheckpoint_timeout
? I assume they would do nothing without WAL-logging? - To what extent does turning off
fsync
do anything other than playing with fire? Are we likely to see a speed up by turning it off?
question inspired by @Craig Ringer’s answer on SO
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
Method 1
The answer you refer to is talking about bulkloading a DB, not restoring a dump.
synchronous_commit
will have minimal to no effect on COPY
, since there’s only one commit at the end.
COPY
can only reduce WAL, not ignore it, and only if wal_level = minimal
. Most production systems run with wal_level = hot_standby
(in pg10, wal_level = replica
) or higher. UNLOGGED
tables absolutely do benefit COPY
at higher wal_level
, but only if you do not later plan on making the table logged. If you do, you’re just deferring the WAL cost, since it’ll be paid when you convert the UNLOGGED
table to LOGGED
instead of at COPY
time. Use UNLOGGED
tables if you can regenerate and reload the data later in case of crash and you don’t need it to replicate. pg_restore
will restore unlogged tables as unlogged, and logged tables as logged, so it’s kind of a moot point.
Don’t turn fsync
off unless you really know what you’re doing, or you don’t care about your data. For bulk loading a new blank PostgreSQL instance, wal_level = minimal
and synchronous_commit = off
are probably going to be just as good. Better, even, since you avoid a whole bunch of WAL writing.
max_wal_size
won’t make much difference with bulk loading if wal_level = minimal
. If you’re using a higher WAL level you do need to tune checkpoints for best results. checkpoint_timeout
doesn’t really matter since it generally only comes into play on idle-ish systems.
So if you’re using pg_restore
to a system configured for physical replication (hot standby/warm standby/archive PITR) you’ve got to pay the costs for that, and there’s not much you can do to speed things up on top of what COPY
already does. pg_restore
uses COPY
.
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0