Which of these two forms is generally more efficient for batch updates performing the same operation?
UPDATE table SET a = val.b FROM (VALUES (0,1), (2,3)) AS val(b,id) WHERE table.id = val.id
UPDATE table SET a = 0 WHERE table.id = 1; UPDATE table SET a = 2 WHERE table.id = 3;
I found this in the docs:

When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_item list, and each output row of the join represents an update operation for the target table.

Does this mean that the two options are equivalent under the hood?
For option 1, I wonder whether Postgres updates the indexes only once at the end, after all the rows are updated, or once for each row being updated. The former should be more efficient than the latter, which would make option 1 better than option 2.
Answer:
Merely delaying the index maintenance to the end is unlikely to increase efficiency; that would require an algorithm change, not just a delay. The same work still has to be done. For GIN indexes there is a "fast update" mechanism, but if it is turned on it applies to both cases. If more or better "fast update"-like features get implemented, they will likely be implemented in a similar way, so that they improve both cases.
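As an illustration, the GIN fast-update mechanism mentioned above is controlled per index; the table, column, and index names below are placeholders:

```sql
-- fastupdate is on by default for GIN indexes: new entries go into a
-- pending list and are merged into the main index structure in bulk later.
CREATE INDEX items_tags_gin ON items USING gin (tags) WITH (fastupdate = on);

-- The size of the pending list can be tuned per index (value in kB):
ALTER INDEX items_tags_gin SET (gin_pending_list_limit = 8192);
```

Note that this batching happens inside the index regardless of whether the updates arrive as one statement or many, which is why it does not favor either form.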
The big advantages of the first method are that it doesn't need to flush WAL after each update, doesn't need to parse/analyze/plan each update, and doesn't need an IPC/context switch/network round trip to the client for each update. But the client could ameliorate these issues for the second method by using explicit transactions, prepared statements, or pipelining.
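A sketch of that mitigation for the second form (table and column names are placeholders): wrapping the per-row updates in one explicit transaction leaves a single WAL flush at commit, and a prepared statement is parsed and planned once rather than per execution:

```sql
BEGIN;

-- Parse/analyze/plan once, then execute many times.
PREPARE upd(int, int) AS
    UPDATE t SET a = $1 WHERE t.id = $2;

EXECUTE upd(0, 1);
EXECUTE upd(2, 3);

COMMIT;  -- WAL is flushed once here, not after each UPDATE
DEALLOCATE upd;
```

The network round trip per statement remains, which is where client-side pipelining or batching helps.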
If the VALUES list is quite long, it might be faster to put it into a hash table and then scan the whole target table, probing the hash table for each row, rather than looping over the VALUES list with a nested index scan on the target table. The second method is effectively forced into the nested-loop approach even when that is slower.
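One way to see which strategy the planner chose for the first form (using the placeholder table from the question):

```sql
EXPLAIN
UPDATE t SET a = v.b
FROM (VALUES (0, 1), (2, 3)) AS v(b, id)
WHERE t.id = v.id;
-- A long VALUES list may yield a Hash Join over a scan of t,
-- while a short one typically yields a Nested Loop with an
-- index scan on t.id. The planner picks based on estimated cost.
```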