At work, we have a database table we use for queued jobs, so it sees a lot of throughput. One issue we've run into is that after a weekend without any code changes, the indexes on this table fill with dead tuples. When we run VACUUM VERBOSE ANALYZE, this shows up as "600461 dead row versions cannot be removed yet, oldest xmin: 902335252" (see the VACUUM output below).
When we looked for what was preventing the vacuum from clearing those, it pointed to a query running LISTEN, using Postgres' pub/sub feature. This had been running for several days, which I'd think is an expected way to use it.
So one way to solve this is to make sure our application servers restart regularly, so they can't be listening for so long, or to have them restart any LISTENs running for longer than some time period. That said, I was wondering if there is an easy way to handle this within Postgres. Is there some way to configure the transaction so it doesn't block vacuuming of the dead tuples? Could something be wrong in our application code that causes LISTEN to behave this way?
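The time-based workaround mentioned above (restarting any LISTEN that has been around too long) could be sketched in SQL roughly as follows. This is only a sketch under stated assumptions. The one-day threshold is arbitrary. The query matches on the query column, which for an idle session shows the last statement it ran, so a listener that has since run other statements will not be caught. Also, pg_terminate_backend only works for a superuser, a member of pg_signal_backend, or a role with the target session's privileges.

```sql
-- Hypothetical workaround sketch: terminate sessions whose most recent
-- statement was a LISTEN and whose state has not changed for over a day.
-- The '1 day' threshold is an arbitrary assumption; adjust as needed.
SELECT pid,
       client_addr,
       state_change,
       pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE query ILIKE 'LISTEN%'
  AND state_change < now() - interval '1 day';
```

The application would then need to reconnect and re-issue its LISTEN. Notifications sent while it is disconnected are not delivered to it later.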
"Don't have long-running transactions" makes sense to me as a general principle, but it's fairly awkward for LISTEN, which is sort of intended as a way to get streaming updates, so I was hoping Postgres might have a nice solution.
    => SELECT *
    -> FROM pg_stat_activity
    -> WHERE backend_xmin = '902335252';
    -[ RECORD 1 ]----+------------------------------
    datid            | 16404
    datname          | company_web_backend
    pid              | 8936
    leader_pid       |
    usesysid         | 16388
    usename          | company_web_backend
    application_name |
    client_addr      | 10.0.1.80
    client_hostname  |
    client_port      | 56654
    backend_start    | 2021-07-24 01:21:28.270245+00
    xact_start       | 2021-07-24 01:21:28.279008+00
    query_start      | 2021-07-24 01:21:28.279008+00
    state_change     | 2021-07-24 01:21:28.281313+00
    wait_event_type  | Client
    wait_event       | ClientWrite
    state            | idle
    backend_xid      |
    backend_xmin     | 902335252
    query            | LISTEN queued_jobs
    backend_type     | client backend
    server=> VACUUM VERBOSE ANALYZE queued_jobs;
    INFO: vacuuming "public.queued_jobs"
    INFO: launched 2 parallel vacuum workers for index cleanup (planned: 2)
    INFO: "queued_jobs": found 0 removable, 5324589 nonremovable row versions in 553064 out of 8685508 pages
    DETAIL: 600461 dead row versions cannot be removed yet, oldest xmin: 902335252
    There were 9061227 unused item identifiers.
    Skipped 3 pages due to buffer pins, 5619824 frozen pages.
    0 pages are entirely empty.
    CPU: user: 1.42 s, system: 1.25 s, elapsed: 3.10 s.
    INFO: vacuuming "pg_toast.pg_toast_37823"
    INFO: "pg_toast_37823": found 0 removable, 414848 nonremovable row versions in 79545 out of 16530493 pages
    DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 902335252
    There were 0 unused item identifiers.
    Skipped 0 pages due to buffer pins, 14394456 frozen pages.
    0 pages are entirely empty.
    CPU: user: 0.46 s, system: 0.12 s, elapsed: 0.58 s.
    INFO: analyzing "public.queued_jobs"
    INFO: "queued_jobs": scanned 30000 of 8685508 pages, containing 311159 live rows and 2103 dead rows; 30000 rows in sample, 90085799 estimated total rows
    VACUUM
How to solve:
The client seems to have issued a LISTEN command but is not actually listening; that is, it is not reading the data being sent to it. (That is what ClientWrite means: the server is trying to send it more data, but the send buffer is full.) Once the client processes the notifications (or whatever other data is waiting to be sent), the xmin should advance automatically.
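To see which sessions are holding back the xmin horizon, and whether any are stuck the same way as this one, a query along these lines against pg_stat_activity may help. It is a sketch that assumes Postgres 10 or later, where ClientWrite is reported as a wait event, and it uses only standard pg_stat_activity columns plus the built-in age() function.

```sql
-- List sessions that pin the xmin horizon, oldest first.
-- A row with wait_event = 'ClientWrite' is blocked sending data
-- to a client that is not reading it.
SELECT pid,
       client_addr,
       state,
       wait_event_type,
       wait_event,
       backend_xmin,
       age(backend_xmin) AS xmin_age,
       query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
```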
Could something be wrong in our application code that causes LISTEN to have this behavior?
Yes, most likely. You can log onto 10.0.1.80 and try to debug it. Or you can just kill the backend and see if anyone calls to complain. They probably won't, since they have probably forgotten about whatever they were doing and aren't paying attention to it anymore (kind of like what the process itself did).
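Killing the backend from SQL could look like the following sketch. It uses pid 8936 from the pg_stat_activity output in the question. On a live system, look the pid up again first, because it changes if the client reconnects.

```sql
-- Terminate the stuck LISTEN session (pid 8936 in the pg_stat_activity
-- output in the question). Returns true if the signal was sent.
SELECT pg_terminate_backend(8936);

-- Afterwards, re-run VACUUM; the dead rows should become removable
-- unless another session still holds an older xmin.
VACUUM VERBOSE ANALYZE queued_jobs;
```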
By the way, this is not in a transaction. Plain "idle" means the session is between transactions; if it were in one, the state would be "idle in transaction". With LISTEN, though, a session can be outside any transaction and still hold the xmin horizon hostage.