Does UPDATE write a new row version for unchanged, TOASTed values?

All we need is an easy explanation of the problem, so here it is.

I am working with a PostgreSQL table with a large TEXT field, which is theoretically updated on a regular basis. I have thought about storing the data directly in the filesystem, but with TOAST, the data is already being stored off-page and compressed in the database so I figured I would keep things simple and just use the database storage.

To give some context, I am indexing RSS feeds. I will have a script run every 24h that pulls the RSS feeds and potentially updates the table. This can lead to a lot of dead tuples, and thus lots of space being used on disk. Of course, autovacuum will take care of this eventually but it has the potential to be a lot of data (many GB) and I want to make sure I know what will happen when I am doing lots of updates on this very large table.

One solution I have is to only update the TEXT field (storing the RSS data) if there are certain substantial changes to the feed, e.g. a new post on the website. This means that I could avoid doing the UPDATE unless I really have to. However, I still want to update the table (to keep track of when I most recently did an HTTP request). This will create a dead tuple with the old version of the row’s data.

What will happen to the TEXT data, if it isn’t actually changed? Will the UPDATE also duplicate the TEXT data, when it creates a dead tuple? Or will the TEXT data be left untouched, because it wasn’t changed and it is stored off-page?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

That’s the one major shortcut an UPDATE takes in Postgres’ MVCC model (as compared to DELETE + INSERT): Fields that are stored out-of-line (TOASTed) and not changed in the UPDATE are kept as is. Meaning, the old (soon to be dead) row version and the new row version in the main relation point to the same TOASTed values, no additional bloat there.

Or as the manual puts it:

During an UPDATE operation, values of unchanged fields are normally
preserved as-is; so an UPDATE of a row with out-of-line values
incurs no TOAST costs if none of the out-of-line values change.

As a_horse_with_no_name pointed out:
"Not changed in the UPDATE", or "unchanged" as the manual puts it, means "not targeted in a SET clause of the UPDATE". Postgres does not verify whether a new column value actually differs from the previous row version.

If possible, skip rows that do not change to begin with by adding a WHERE clause. That still leaves cases updating multiple columns at once where some stay unchanged. If that applies to a your big column, it might pay to update it separately, and only where it actually changes. See:

If, OTOH, you update big, TOASTed fields on a regular basis, consider the LZ4 compression algorithm (new in Postgres 14) for those columns. Disk footprint is a bit bigger, but performance is much better. See:

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply