Autovacuum is not clearing dead tuples from the database.

PostgreSQL 10.18, AWS RDS (2 vCPU, 8 GB RAM, 1100 GiB gp2 SSD)
Stats from pg_stat_user_tables:

relid               | 16556
schemaname          | public
relname             | spree_datafeed_products
seq_scan            | 20
seq_tup_read        | 365522436
idx_scan            | 962072108
idx_tup_fetch       | 9929276855
n_tup_ins           | 2846455
n_tup_upd           | 35778058
n_tup_del           | 284291955
n_tup_hot_upd       | 0
n_live_tup          | 3546840
n_dead_tup          | 338790851
n_mod_since_analyze | 307930753
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    | 2022-04-29 13:01:43.985749+00
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 1
Table and index sizes:

indexname                                                    | size
index_spree_datafeed_products_on_updated_at                  | 48 GB
index_spree_datafeed_products_on_state                       | 35 GB
index_spree_datafeed_products_on_size_variant_field          | 40 GB
index_spree_datafeed_products_on_product_id                  | 32 GB
index_spree_datafeed_products_on_original_id                 | 31 GB
index_spree_datafeed_products_on_datafeed_id                 | 42 GB
index_spree_datafeed_products_on_datafeed_id_and_original_id | 31 GB
index_spree_datafeed_products_on_data_hash                   | 39 GB
spree_datafeed_products_pkey                                 | 18 GB

pg_size_pretty - 419 GB
The autovacuum worker in pg_stat_activity:

datid           | 16404
datname         | milanstyle_production
pid             | 2274
backend_start   | 2022-05-01 19:52:00.066097+00
xact_start      | 2022-05-01 19:52:00.23692+00
query_start     | 2022-05-01 19:52:00.23692+00
state_change    | 2022-05-01 19:52:00.236921+00
wait_event_type |
wait_event      |
state           | active
backend_xid     |
backend_xmin    | 1301636863
query           | autovacuum: VACUUM ANALYZE public.spree_datafeed_products
backend_type    | autovacuum worker
Autovacuum settings:

autovacuum                          | on
autovacuum_analyze_scale_factor     | 0.05
autovacuum_analyze_threshold        | 50
autovacuum_freeze_max_age           | 200000000
autovacuum_max_workers              | 3
autovacuum_multixact_freeze_max_age | 400000000
autovacuum_naptime                  | 30
autovacuum_vacuum_cost_delay        | 20
autovacuum_vacuum_cost_limit        | -1
autovacuum_vacuum_scale_factor      | 0.1
autovacuum_vacuum_threshold         | 50
Our cleanup script has deleted a large number of rows, and the dead tuples have accumulated. We have been waiting for autovacuum to clean them up for more than a week. What is the problem? Why is the database not recovering?
How to solve:
It will take a very long time to vacuum a table of that size with those settings (assuming the ones you don't show are at their defaults).
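For a rough sense of scale, assuming the defaults for the settings not shown (vacuum_cost_limit = 200, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20): with autovacuum_vacuum_cost_delay = 20 ms, the worker earns about 200 cost units per 20 ms cycle, i.e. 10,000 units per second, and a page that must be read from disk and dirtied costs around 30 units:

```sql
-- Back-of-the-envelope autovacuum throughput under default cost settings.
-- 200 cost units per 20 ms sleep cycle = 10,000 units per second.
-- A page read from disk (miss = 10) and then dirtied (dirty = 20) costs 30.
SELECT 200 / 0.020                 AS cost_units_per_sec,   -- 10,000
       (200 / 0.020) / 30          AS dirty_pages_per_sec,  -- ~333
       ((200 / 0.020) / 30) * 8192 AS bytes_per_sec;        -- ~2.7 MB/s
```

At roughly 2.7 MB/s, a single pass over the ~419 GB shown above takes on the order of days, and a vacuum may need several index passes when autovacuum_work_mem is too small to hold all the dead tuple IDs at once.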
If the live tuples really are only about 1% of the dead tuples (about 3.5 million live vs. 339 million dead), the easiest way out of this hole might be a VACUUM FULL of the table.
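A sketch of that recovery step, assuming you can afford the downtime (VACUUM FULL takes an ACCESS EXCLUSIVE lock, so all reads and writes on the table are blocked, and the server needs enough free disk to hold the compacted copy while it is being written):

```sql
-- Rewrites the table into a new compact file and rebuilds all of its
-- indexes from scratch. Blocks all access to the table until it finishes.
VACUUM (FULL, VERBOSE, ANALYZE) spree_datafeed_products;
```

With only ~3.5 million live rows, the rewritten table and indexes should end up at a small fraction of the current ~419 GB.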
To avoid getting back into that hole again, you should at least drop autovacuum_vacuum_cost_delay to 2 and increase autovacuum_work_mem to at least 256MB (but I would probably do 1GB).
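A sketch of those changes (values as suggested above; note that on RDS the global settings must be changed through the DB parameter group rather than ALTER SYSTEM, and autovacuum_work_mem is global only, so it cannot be set per table):

```sql
-- On self-managed Postgres; on RDS, set these in the parameter group instead.
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
ALTER SYSTEM SET autovacuum_work_mem = '1GB';
SELECT pg_reload_conf();

-- Per-table override for the one problem table (storage parameter, in ms):
ALTER TABLE spree_datafeed_products
  SET (autovacuum_vacuum_cost_delay = 2);
```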
I would also look through the logs to see if autovacuum runs have a history of getting cancelled before finishing (look for "canceling autovacuum task" errors).
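While a vacuum is running you can also watch it directly: PostgreSQL 9.6+ exposes pg_stat_progress_vacuum, and a worker that sits in the same phase for days, or a climbing index_vacuum_count, tells you where the time is going:

```sql
-- One row per running VACUUM worker. index_vacuum_count is the number of
-- index-cleanup passes so far; each pass rescans every index on the table.
SELECT p.pid,
       p.relid::regclass AS table_name,
       p.phase,
       p.heap_blks_scanned,
       p.heap_blks_total,
       p.index_vacuum_count,
       p.num_dead_tuples
FROM pg_stat_progress_vacuum AS p;
```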