Do I need an explicit FOR UPDATE lock in a CTE in UPDATE?


In Postgres 13, I have a table which gets updated frequently. However, the update query is rather complicated and uses the same values multiple times. So, using a CTE seems quite a logical thing to do.

A simplified example looks like this:

WITH my_cte AS (
    SELECT
          my_id,
          CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END AS my_addition     
    FROM my_table      
    WHERE my_id = $1
)
UPDATE my_table
        SET my_value1 = my_table.my_value1 + my_cte.my_addition,
            my_value2 = my_table.my_value2 + my_cte.my_addition
FROM my_cte
WHERE my_table.my_id = my_cte.my_id

Now I’m wondering: what would happen if, between the SELECT in the CTE and the UPDATE, another query updated the table and changed my_value1, so that the calculated my_addition was outdated and wrong by the time the UPDATE happens? Can such a situation occur? Or does Postgres set an implicit lock automatically?

If Postgres does no magic here and I need to take care of it myself: Would it be sufficient to do FOR UPDATE in the SELECT of the CTE?

Sorry if I did not make myself clear here: It’s not that I want to "see" those concurrent modifications. I want to prevent them, i.e. once the calculation in the SELECT is done, no other query may modify that very row until the UPDATE is done.

In real life, what I mocked here by CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END is about 20 lines long and I need it at about 5 places in the UPDATE. Since I’m a big fan of "Do not repeat yourself", I think a CTE is the way to go. Or is there a better way to avoid copy & pasting in an UPDATE without a CTE?

How to solve :


Method 1

Postgres uses a multiversion model (Multiversion Concurrency Control, MVCC).

In default READ COMMITTED isolation level, each separate query effectively sees a snapshot of the database as of the instant the query begins to run. Subsequent queries – even within the same transaction – can see a different snapshot if concurrent transactions are committed in between. (Plus what has been done in the same transaction so far.)

However, as far as CTEs are concerned, all sub-statements in WITH are executed concurrently with each other and with the outer statement, so they effectively see the same snapshot of the database. The whole statement, CTEs included, is considered a single query for this purpose.

So, no, you don’t need an explicit lock to stay consistent.

Encapsulating the logic in a function may be convenient for a number of reasons, but that has no effect whatsoever on concurrency. Aside: a CTE with a volatile function is never inlined.

A plain SELECT does not lock the rows it reads, and Postgres allows concurrent UPDATEs. An UPDATE, however, locks its target rows. Concurrent transactions trying to write to the same rows have to wait until the locking transaction has finished.
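The behaviour described above can be sketched as a two-session timeline. This is only an illustration under the default READ COMMITTED level; the id value 1 and the amounts added are made up for the example.

```sql
-- Session A
BEGIN;
UPDATE my_table SET my_value1 = my_value1 + 10 WHERE my_id = 1;  -- locks the row

-- Session B, while A is still open
SELECT my_value1 FROM my_table WHERE my_id = 1;  -- not blocked; sees the last committed value
UPDATE my_table SET my_value1 = my_value1 + 1 WHERE my_id = 1;   -- waits for A

-- Session A
COMMIT;  -- B's UPDATE now continues, working from the row version A committed
```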

If you want to forbid writes to rows (columns) that have only been selected from while your UPDATE is in progress, you may want to take locks anyway (or use a stricter isolation level). Maybe FOR UPDATE locks, or maybe a weaker lock. That depends on details and requirements not given in the question.

Also (though you did not ask for that), if multiple concurrent transactions may be writing to overlapping rows (more than one at a time), be sure to adhere to the same, consistent order of rows to avoid deadlocks.
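One common way to get a consistent order is to lock the rows explicitly, sorted by key, before writing to them. The following is a minimal sketch, assuming every writing transaction follows the same convention and that my_id itself is never updated; the id list and the increment are hypothetical.

```sql
BEGIN;

-- Take the row locks in my_id order, so two transactions touching
-- overlapping rows queue up behind each other instead of deadlocking.
SELECT my_id
FROM   my_table
WHERE  my_id IN (1, 2, 3)   -- hypothetical ids
ORDER  BY my_id
FOR    NO KEY UPDATE;

-- The rows are already locked by this transaction, so the write
-- does not have to wait for any other writer.
UPDATE my_table
SET    my_value1 = my_value1 + 10
WHERE  my_id IN (1, 2, 3);

COMMIT;
```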

Method 2

Building on what a_horse_with_no_name said:

I would put such a condition into a (SQL) function. Another alternative to locking (if you expect this to occur rarely) would be to use the serializable isolation level and re-run the UPDATE if an error occurs.
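The serializable alternative could look roughly like this. It is a sketch only: the retry itself has to live in the calling application, which is not shown, and $1 is the same placeholder parameter as in the question.

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

WITH my_cte AS (
    SELECT my_id,
           CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END AS my_addition
    FROM   my_table
    WHERE  my_id = $1
)
UPDATE my_table
SET    my_value1 = my_table.my_value1 + my_cte.my_addition,
       my_value2 = my_table.my_value2 + my_cte.my_addition
FROM   my_cte
WHERE  my_table.my_id = my_cte.my_id;

COMMIT;

-- If any statement, including COMMIT, fails with SQLSTATE 40001
-- (serialization_failure), issue ROLLBACK and rerun the whole transaction.
```

No rows are locked ahead of time here. A conflicting concurrent write surfaces as an error instead of a wait, which is why this suits cases where conflicts are rare.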

Put the addition logic into a function, and then call that function each time you want to set a new value. This will help you in two ways.

  1. This allows you to avoid duplicating the addition logic each time you use it.
  2. This makes for a very simple update statement that can get in quick, lock just a few rows, and get out.

Something like this should work.

CREATE FUNCTION fn_my_addition(my_value int)
RETURNS int
LANGUAGE sql
AS
$$
  -- Addition logic written once; reused in every SET clause that needs it
  SELECT CASE WHEN my_value > 100 THEN 50 ELSE 10 END;
$$;

UPDATE my_table
SET my_value1 = my_value1 + fn_my_addition(my_value1),
    my_value2 = my_value2 + fn_my_addition(my_value1)  -- same addition as above, derived from my_value1 as in the question
WHERE my_id = $1;

Method 3

If you want to prevent concurrent statements from modifying the rows that the CTE selects before they get updated, you need to use SELECT ... FOR NO KEY UPDATE in the CTE.
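Applied to the query from the question, that could look as follows. This is a sketch rather than a tested statement; the only change from the original query is the added locking clause.

```sql
WITH my_cte AS (
    SELECT my_id,
           CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END AS my_addition
    FROM   my_table
    WHERE  my_id = $1
    FOR    NO KEY UPDATE   -- lock the row as soon as the CTE reads it
)
UPDATE my_table
SET    my_value1 = my_table.my_value1 + my_cte.my_addition,
       my_value2 = my_table.my_value2 + my_cte.my_addition
FROM   my_cte
WHERE  my_table.my_id = my_cte.my_id;
```

FOR NO KEY UPDATE is the lock strength an ordinary UPDATE takes when it does not modify key columns, so it blocks other writers to the row without blocking inserts into tables that reference it through a foreign key. The lock is held until the surrounding transaction ends.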


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
