In Postgres 13, I have a table which gets updated frequently. However, the update query is rather complicated and uses the same values multiple times. So, using a CTE seems quite a logical thing to do.
A simplified example looks like this:
WITH my_cte AS (
    SELECT my_id,
           CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END AS my_addition
    FROM   my_table
    WHERE  my_id = $1
)
UPDATE my_table
SET    my_value1 = my_table.my_value1 + my_cte.my_addition,
       my_value2 = my_table.my_value2 + my_cte.my_addition
FROM   my_cte
WHERE  my_table.my_id = my_cte.my_id;
Now I’m wondering: What would happen if, between the SELECT in the CTE and the UPDATE, the table is updated by another query that changes my_value1, so that the calculation of my_addition is outdated and wrong by the time the UPDATE happens? Can such a situation occur? Or does Postgres set an implicit lock automatically?
If Postgres does no magic here and I need to take care of it myself: Would it be sufficient to use FOR UPDATE in the SELECT of the CTE?
Sorry if I did not make myself clear here: It’s not that I want to "see" those concurrent modifications. I want to prevent them, i.e. once the calculation in the SELECT is done, no other query may modify that very row until the UPDATE is done.
In real life, what I mocked here by CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END is about 20 lines long, and I need it in about 5 places in the UPDATE. Since I’m a big fan of "Do not repeat yourself", I think a CTE is the way to go. Or is there a better way to avoid copying and pasting in an UPDATE without a CTE?
How to solve :
Postgres uses a multiversion model (Multiversion Concurrency Control, MVCC).
In the default READ COMMITTED isolation level, each separate query effectively sees a snapshot of the database as of the instant the query begins to run. Subsequent queries, even within the same transaction, can see a different snapshot if concurrent transactions are committed in between. (Plus what has been done in the same transaction so far.)
However, as far as CTEs are concerned, all sub-statements in WITH are executed concurrently with the outer statement, so they effectively see the same snapshot of the database. All of it is considered a single query for this purpose.
So, no, you don’t need an explicit lock to stay consistent.
Encapsulating the logic in a function may be convenient for a number of reasons, but that has no effect whatsoever on concurrency. Aside: a CTE with a volatile function is never inlined.
A plain SELECT does not lock the rows it reads, so Postgres allows concurrent writes to them. UPDATE locks its target rows. Concurrent transactions trying to write to the same rows have to wait until the locking transaction has finished.
If you want to forbid writes to rows (columns) that have only been selected from while your UPDATE is in progress, you may want to take locks anyway (or use a stricter isolation level). Maybe FOR UPDATE locks, or maybe a weaker lock. That depends on details and requirements that your question does not give.
Also (though you did not ask for that), if multiple concurrent transactions may be writing to overlapping rows (more than one at a time), be sure to adhere to the same, consistent order of rows to avoid deadlocks.
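A minimal sketch of locking rows in a consistent order, assuming a multi-row variant of the update where $1 is an integer array of ids (that parameter shape and the "+ 1" are illustrative assumptions, not part of the question). Every transaction that locks rows this way acquires them in ascending my_id order:

```sql
-- Lock the target rows in a fixed order (ascending my_id) before updating.
-- Assumption for illustration: $1 is an integer array of ids.
WITH locked AS (
    SELECT my_id
    FROM   my_table
    WHERE  my_id = ANY ($1)
    ORDER  BY my_id
    FOR    UPDATE
)
UPDATE my_table t
SET    my_value1 = t.my_value1 + 1   -- placeholder for the real calculation
FROM   locked l
WHERE  t.my_id = l.my_id;
```

This only helps if all writers touching these rows follow the same ordering.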
Building on what a_horse_with_no_name said:
I would put such a condition into a (SQL) function. Another alternative to locking (if you expect this to occur rarely) would be to use the serializable isolation level and re-run the UPDATE if an error occurs.
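As a rough sketch of the isolation-level approach, the statement from the question can be wrapped in a serializable transaction. The retry itself has to live in the client application, which is assumed here and not shown:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

WITH my_cte AS (
    SELECT my_id,
           CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END AS my_addition
    FROM   my_table
    WHERE  my_id = $1
)
UPDATE my_table
SET    my_value1 = my_table.my_value1 + my_cte.my_addition,
       my_value2 = my_table.my_value2 + my_cte.my_addition
FROM   my_cte
WHERE  my_table.my_id = my_cte.my_id;

COMMIT;

-- If any statement above (including COMMIT) fails with SQLSTATE 40001
-- (serialization_failure), issue ROLLBACK and run the whole transaction again.
```

No explicit row locks are taken up front here. A conflicting concurrent write is reported as an error instead, so this fits best when conflicts are rare.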
Put the addition logic into a function, and then call that function each time you want to set a new value. This will help you in two ways.
- This allows you to avoid duplicating the addition logic each time you use it.
- This makes for a very simple update statement that can get in quick, lock just a few rows, and get out.
Something like this should work.
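CREATE FUNCTION fn_my_addition(my_value int)
RETURNS int
LANGUAGE sql
AS $$
    SELECT CASE WHEN my_value > 100 THEN 50 ELSE 10 END;
$$;

-- Both columns get the same addition, derived from my_value1 as in the question.
-- Expressions in SET see the row's old values, so both calls return the same result.
UPDATE my_table
SET    my_value1 = my_value1 + fn_my_addition(my_value1),
       my_value2 = my_value2 + fn_my_addition(my_value1)
WHERE  my_id = $1;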
If you want to prevent concurrent statements from modifying the rows that the CTE selects before they get updated, you need to use SELECT ... FOR NO KEY UPDATE in the CTE.
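Applied to the simplified query from the question, that could look roughly like this (a sketch, not tested against the real 20-line expression):

```sql
WITH my_cte AS (
    SELECT my_id,
           CASE WHEN my_value1 > 100 THEN 50 ELSE 10 END AS my_addition
    FROM   my_table
    WHERE  my_id = $1
    FOR    NO KEY UPDATE   -- lock the row as soon as the CTE reads it
)
UPDATE my_table
SET    my_value1 = my_table.my_value1 + my_cte.my_addition,
       my_value2 = my_table.my_value2 + my_cte.my_addition
FROM   my_cte
WHERE  my_table.my_id = my_cte.my_id;
```

FOR NO KEY UPDATE is the same lock strength an UPDATE takes when it does not change key columns, so it blocks other writers to that row without blocking inserts that merely reference it through a foreign key. The lock is held until the end of the transaction.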