To what extent do the "non-durability features" affect COPY's performance?

All we need is an easy explanation of the problem, so here it is.

COPY is documented as not being WAL-logged if it’s loaded on the same transactions..

In minimal level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster (see Section 14.4.7). Operations in which this optimization can be applied include: CREATE TABLE AS,
CREATE INDEX,
CLUSTER,
COPY into tables that were created or truncated in the same transaction

With that said, do any of the non-durability features speed up a pg_restore -1?

-1 –single-transaction Execute the restore as a single transaction (that is, wrap the emitted commands in BEGIN/COMMIT). This ensures that either all the commands complete successfully, or no changes are applied. This option implies –exit-on-error.

  • COPY is a single transaction, will synchronous_commit do anything? Or does that only speed up batch-INSERTS?
  • Does making the initial table UNLOGGED matter if COPY ignores the WAL log anyway?
  • What about max_wal_size and checkpoint_timeout? I assume they would do nothing without WAL-logging?
  • To what extent does turning off fsync do anything other than playing with fire? Are we likely to see a speed up by turning it off?

question inspired by @Craig Ringer’s answer on SO

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

The answer you refer to is talking about bulkloading a DB, not restoring a dump.


synchronous_commit will have minimal to no effect on COPY, since there’s only one commit at the end.

COPY can only reduce WAL, not ignore it, and only if wal_level = minimal. Most production systems run with wal_level = hot_standby (in pg10, wal_level = replica) or higher. UNLOGGED tables absolutely do benefit COPY at higher wal_level, but only if you do not later plan on making the table logged. If you do, you’re just deferring the WAL cost, since it’ll be paid when you convert the UNLOGGED table to LOGGED instead of at COPY time. Use UNLOGGED tables if you can regenerate and reload the data later in case of crash and you don’t need it to replicate. pg_restore will restore unlogged tables as unlogged, and logged tables as logged, so it’s kind of a moot point.

Don’t turn fsync off unless you really know what you’re doing, or you don’t care about your data. For bulk loading a new blank PostgreSQL instance, wal_level = minimal and synchronous_commit = off are probably going to be just as good. Better, even, since you avoid a whole bunch of WAL writing.

max_wal_size won’t make much difference with bulk loading if wal_level = minimal. If you’re using a higher WAL level you do need to tune checkpoints for best results. checkpoint_timeout doesn’t really matter since it generally only comes into play on idle-ish systems.

So if you’re using pg_restore to a system configured for physical replication (hot standby/warm standby/archive PITR) you’ve got to pay the costs for that, and there’s not much you can do to speed things up on top of what COPY already does. pg_restore uses COPY.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply