UPDATE on big table in PostgreSQL randomly takes too long


I’m trying to figure out why an UPDATE statement takes too long (>30 sec).

This happens randomly: in most cases the statement finishes in under 100 ms, but sometimes it takes more than 30 seconds to complete.

Some specifics:

  • I’m using PostgreSQL 12 (actually, AWS Aurora)
  • I’m trying this in a database with no traffic, so it isn’t affected by other queries running at the same time. I’m also monitoring the logs, and I don’t see anything else running.
  • I’ve tried REINDEXing and VACUUMing (including VACUUM ANALYZE), with no improvement.
  • I’ve checked for locks (log_lock_waits is enabled) and I don’t see any.
  • The query is run in a loop from a Python app. It performs ~5000 updates, and some of them (they don’t seem to follow a pattern) take a huge amount of time to complete.
  • I’ve tried running them in batches, but again, some batches randomly take too long.
  • The table is fairly big: ~10,000,000 rows and ~25 indexes.

The query:

UPDATE "my_table" SET "match_request_id" = 'c607789f-4816-4a38-844b-173fa7bf64ed'::uuid WHERE "my_table"."id" = 129624354;


 Update on public.my_table  (cost=0.56..8.58 rows=1 width=832) (actual time=34106.965..34106.966 rows=0 loops=1)
   Buffers: shared hit=431280 read=27724
   I/O Timings: read=32469.021
   ->  Index Scan using my_table_pkey on public.my_table  (cost=0.56..8.58 rows=1 width=832) (actual time=0.100..0.105 rows=1 loops=1)
         Output: (...)
         Index Cond: (my_table.id = 130561719)
         Buffers: shared hit=7
 Planning Time: 23.872 ms
 Execution Time: 34107.047 ms

Note that this is EXPLAIN ANALYZE output. I’m baffled, since the estimated cost is really low while the actual running time is huge!

I’m trying to understand whether this is expected and whether I can improve the situation somehow. Any ideas would be welcome; I’m running out of my own!

EDIT: Adding some more info requested by comments:

A query plan for a "normal" update

 Update on public.my_table  (cost=0.43..8.45 rows=1 width=837) (actual time=2.037..2.037 rows=0 loops=1)
   Buffers: shared hit=152 read=1
   I/O Timings: read=1.225
   ->  Index Scan using my_table_pkey on public.my_table  (cost=0.43..8.45 rows=1 width=837) (actual time=0.024..0.026 rows=1 loops=1)
         Output: (...)
         Index Cond: (my_table.id = 129624354)
         Buffers: shared hit=4
 Planning Time: 1.170 ms
 Execution Time: 2.133 ms
(9 rows)

The table has 23 indexes and 6 foreign-key constraints: 3 BRIN, 1 GIN, and the rest B-tree. I’m not sure how to check for fastupdate on the GIN index; the output of \d+ index_name is:

 Column | Type | Key? | Definition | Storage  | Stats target
--------+------+------+------------+----------+--------------
 search | text | yes  | search     | extended |
gin, for table "public.my_table"
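
For reference, fastupdate is an index storage parameter, so it shows up in pg_class.reloptions rather than in the \d+ column listing. One way to check it (my_gin_index here is a placeholder for the real index name):

```sql
-- Storage parameters set on the index; a NULL or empty reloptions array
-- means the defaults apply (fastupdate = on, pending-list limit taken
-- from the gin_pending_list_limit setting).
SELECT c.relname, c.reloptions
FROM pg_class c
WHERE c.relname = 'my_gin_index';

-- The global pending-list limit (in kB):
SHOW gin_pending_list_limit;
```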

Answer:

GIN indexes have a "fastupdate" mechanism where newly indexed entries are first appended to an unsorted pending list rather than inserted into the main index structure. Once that list exceeds a set size (gin_pending_list_limit, which is set globally but can be overridden per index), the next process that happens to write to the index gets assigned the task of merging those entries into the main index, which can lead to long freezes for that process. For freezes of the length you are seeing, you must have a pretty high gin_pending_list_limit.
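
To confirm this is what’s happening, the pgstattuple extension’s pgstatginindex() function reports the current size of a GIN index’s pending list (the index name is a placeholder, and the extension must be available in your environment):

```sql
-- pgstatginindex() is provided by the pgstattuple extension.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- pending_pages / pending_tuples show how much has accumulated in the
-- pending list since the last merge into the main index structure.
SELECT version, pending_pages, pending_tuples
FROM pgstatginindex('my_gin_index');
```

Watching pending_pages grow during the update loop and then drop to near zero at the moment of a slow statement would match the fastupdate explanation.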

If consistent latency matters more to you than overall insert/update throughput, you can disable fastupdate. Alternatively, you could lower gin_pending_list_limit to keep some of the benefit while shortening the freeze-ups.
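
Hedged sketches of both options (again, my_gin_index is a placeholder for the real index name):

```sql
-- Option A: disable fastupdate. Every update pays the full indexing cost
-- immediately, but no statement is ever stuck merging a large pending list.
ALTER INDEX my_gin_index SET (fastupdate = off);

-- The ALTER does not flush entries already in the pending list, so clean
-- it once by hand afterwards:
SELECT gin_clean_pending_list('my_gin_index'::regclass);

-- Option B: keep fastupdate but cap the pending list, e.g. at 512 kB
-- (the setting is in kB; the global default is 4 MB), bounding the
-- worst-case stall:
ALTER INDEX my_gin_index SET (gin_pending_list_limit = 512);
```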

If you need both consistent latency and overall efficiency, you could run a separate process/thread that opens its own connection and calls select gin_clean_pending_list(...) on the index in a loop while the main update workload is running. That way you keep the benefit of the pending list, while the latency of cleaning it up is shifted onto a background process where it doesn’t matter. (VACUUM and autovacuum also clean up the pending list, but you can’t really get them to run often enough to do a sufficient job of it, because of all the other work they need to do. Hence gin_clean_pending_list.)
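
The statement such a background cleaner would issue is just this, run from its own connection every few seconds while the update loop is in progress (the index name and the interval are assumptions to adapt):

```sql
-- Moves pending-list entries into the main index structure, so the
-- foreground updates never have to do the merge themselves. Returns the
-- number of pages removed from the pending list.
SELECT gin_clean_pending_list('my_gin_index'::regclass);
```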


This answer was sourced from stackoverflow.com / stackexchange.com and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
