Adding a WHERE clause slows down query

We run a database on

PostgreSQL 12.6 on x86_64-pc-linux-gnu, compiled by Debian clang version 10.0.1, 64-bit

We are trying to get the top rows of a result set where the most recent entry has a status of ‘Failed’.

Main table is product_tracking:

                                          Table "public.product_tracking"
    Column     |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
---------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 product_id    | character varying(512)   |           | not null |         | extended |              |
 delivery_name | character varying(512)   |           | not null |         | extended |              |
 feed_gid      | uuid                     |           | not null |         | plain    |              |
 status        | status                   |           | not null |         | plain    |              |
 updated_at    | timestamp with time zone |           |          | now()   | plain    |              |
 errors        | text                     |           |          |         | extended |              |
 created_at    | timestamp with time zone |           |          |         | plain    |              |
Indexes:
    "product_tracking_pkey" PRIMARY KEY, btree (product_id, delivery_name, feed_gid)
    "product_tracking_created_at_idx" btree (created_at)
    "product_tracking_created_at_product_idx" btree (created_at, product_id, delivery_name, feed_gid)
    "product_tracking_delivery_name_idx" btree (delivery_name)
    "product_tracking_feed_gid_idx" btree (feed_gid)
    "product_tracking_product_id_idx" btree (product_id)

We most recently added the index product_tracking_created_at_product_idx in hopes that this would speed up the ordering of the new query.

Original query and plan:

explain analyze
WITH required_feeds AS (
    SELECT gid, name feed_name
    FROM feeds
)
SELECT product_id, feed_gid, errors
FROM product_tracking pt1
         INNER JOIN required_feeds f ON (pt1.feed_gid = f.gid)
WHERE updated_at = (SELECT MAX(updated_at)
                    FROM product_tracking pt2
                    WHERE pt1.product_id = pt2.product_id
                      AND pt1.feed_gid = pt2.feed_gid)
  AND status = 'Failed'
ORDER BY created_at desc, product_id desc, delivery_name desc, feed_gid desc
LIMIT 100
Limit  (cost=0.56..23433928.19 rows=100 width=138) (actual time=0.245..4.107 rows=100 loops=1)
  ->  Nested Loop  (cost=0.56..111311156.82 rows=475 width=138) (actual time=0.244..4.092 rows=100 loops=1)
        Join Filter: (pt1.feed_gid = feeds.gid)
        Rows Removed by Join Filter: 4888
        ->  Index Scan Backward using product_tracking_created_at_product_idx on product_tracking pt1  (cost=0.56..111309426.64 rows=475 width=138) (actual time=0.210..3.190 rows=100 loops=1)
              Filter: ((status = 'Failed'::status) AND (updated_at = (SubPlan 1)))
              Rows Removed by Filter: 1515
              SubPlan 1
                ->  Aggregate  (cost=8.65..8.66 rows=1 width=8) (actual time=0.017..0.017 rows=1 loops=100)
                      ->  Index Scan using product_tracking_pkey on product_tracking pt2  (cost=0.56..8.65 rows=1 width=8) (actual time=0.013..0.016 rows=1 loops=100)
                            Index Cond: (((product_id)::text = (pt1.product_id)::text) AND (feed_gid = pt1.feed_gid))
        ->  Materialize  (cost=0.00..6.64 rows=243 width=16) (actual time=0.000..0.004 rows=50 loops=100)
              ->  Seq Scan on feeds  (cost=0.00..5.43 rows=243 width=16) (actual time=0.009..0.054 rows=234 loops=1)
Planning Time: 1.855 ms
Execution Time: 4.238 ms

Adding a filter on required_feeds produces an execution plan that looks like this. I’m using feed1 and feed2 as an example; in reality these feeds vary.

explain analyze
WITH required_feeds AS (
    SELECT gid, name feed_name
    FROM feeds
    WHERE name in ('feed1', 'feed2')
)
SELECT product_id, feed_gid, errors
FROM product_tracking pt1
         INNER JOIN required_feeds f ON (pt1.feed_gid = f.gid)
WHERE updated_at = (SELECT MAX(updated_at)
                    FROM product_tracking pt2
                    WHERE pt1.product_id = pt2.product_id
                      AND pt1.feed_gid = pt2.feed_gid)
  AND status = 'Failed'
ORDER BY created_at desc, product_id desc, delivery_name desc, feed_gid desc
LIMIT 100
Limit  (cost=1485702.73..1485702.74 rows=4 width=138) (actual time=2265.019..2265.047 rows=100 loops=1)
  ->  Sort  (cost=1485702.73..1485702.74 rows=4 width=138) (actual time=2265.018..2265.034 rows=100 loops=1)
        Sort Key: pt1.created_at DESC, pt1.product_id DESC, pt1.delivery_name DESC, pt1.feed_gid DESC
        Sort Method: top-N heapsort  Memory: 72kB
        ->  Nested Loop  (cost=2183.11..1485702.69 rows=4 width=138) (actual time=269.623..2260.551 rows=13814 loops=1)
              ->  Seq Scan on feeds  (cost=0.00..6.04 rows=2 width=16) (actual time=0.013..0.069 rows=2 loops=1)
                    Filter: (name = ANY ('{pkinteractive,iok}'::text[]))
                    Rows Removed by Filter: 243
              ->  Bitmap Heap Scan on product_tracking pt1  (cost=2183.11..742848.30 rows=3 width=138) (actual time=142.994..1127.950 rows=6907 loops=2)
                    Recheck Cond: (feed_gid = feeds.gid)
                    Rows Removed by Index Recheck: 2814781
                    Filter: ((status = 'Failed'::status) AND (updated_at = (SubPlan 1)))
                    Rows Removed by Filter: 1751784
                    Heap Blocks: exact=99595 lossy=98823
                    ->  Bitmap Index Scan on product_tracking_feed_gid_idx  (cost=0.00..2183.11 rows=71806 width=0) (actual time=124.666..124.666 rows=1799676 loops=2)
                          Index Cond: (feed_gid = feeds.gid)
                    SubPlan 1
                      ->  Aggregate  (cost=8.65..8.66 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=16149)
                            ->  Index Scan using product_tracking_pkey on product_tracking pt2  (cost=0.56..8.65 rows=1 width=8) (actual time=0.012..0.012 rows=2 loops=16149)
                                  Index Cond: (((product_id)::text = (pt1.product_id)::text) AND (feed_gid = pt1.feed_gid))
Planning Time: 1.840 ms
Execution Time: 2265.242 ms

Possibly useful statistics:

SELECT count(*) AS row_count,
       avg(length(product_id)) AS avg_prod_len,
       avg(length(delivery_name)) AS avg_delivery_len,
       count(*) FILTER (WHERE status = 'Failed') AS ct_failed,
       count(DISTINCT (product_id)) AS distinct_product_id,
       count(DISTINCT (delivery_name)) AS distinct_delivery_name,
       count(DISTINCT (feed_gid)) AS distinct_feed_gid,
       count(DISTINCT (status)) AS distinct_status,
       count(DISTINCT (updated_at)) AS distinct_updated_at,
       count(DISTINCT (errors)) AS distinct_errors,
       count(DISTINCT (created_at)) AS distinct_created_at
FROM product_tracking;

row_count = 11601030
avg_prod_len = 12.48
avg_delivery_len = 17.298
ct_failed = 74881
distinct_product_id = 8638613
distinct_delivery_name = 7315794
distinct_feed_gid = 245
distinct_status = 3
distinct_updated_at = 9096954
distinct_errors = 16664
distinct_created_at = 8772269

Table feeds has only 245 rows with all values currently unique.

What’s causing the slow down and what suggestions do you have?

Additional comments:

feed_gid sadly does have to be a UUID, this is a mapping from a downstream system we do not own. status can be ‘Failed’, ‘Pending’ or ‘Succeeded’, so a boolean is not quite possible. There are several transactions against this table every minute throughout the day so reordering or adapting the table design is out of the question for the short term.


Thanks, Erwin. Based on your input I managed to change my query to:

SELECT * FROM product_tracking pt1
JOIN feeds f ON (f.gid = feed_gid)
WHERE updated_at = (SELECT MAX(updated_at)
                    FROM product_tracking pt2
                    WHERE pt1.feed_gid IN (
                        SELECT gid
                        from feeds
                        where name IN ('feed1')
                    )
                      AND pt1.product_id = pt2.product_id
                      AND pt1.feed_gid = pt2.feed_gid
)
  AND status = 'Failed'
ORDER BY created_at desc, product_id desc, delivery_name desc, feed_gid desc
LIMIT 100

Limit  (cost=0.56..23431662.26 rows=100 width=176) (actual time=1599.557..4797.615 rows=4 loops=1)
  Buffers: shared hit=12319324
  ->  Nested Loop  (cost=0.56..112003343.48 rows=478 width=176) (actual time=1599.556..4797.610 rows=4 loops=1)
        Join Filter: (pt1.feed_gid = f.gid)
        Rows Removed by Join Filter: 432
        Buffers: shared hit=12319324
        ->  Index Scan Backward using product_tracking_created_at_product_idx on product_tracking pt1  (cost=0.56..112001602.36 rows=478 width=150) (actual time=1599.483..4797.469 rows=4 loops=1)
              Filter: ((status = 'Failed'::status) AND (updated_at = (SubPlan 2)))
              Rows Removed by Filter: 12781499
              Buffers: shared hit=12319323
              SubPlan 2
                ->  Aggregate  (cost=8.65..8.66 rows=1 width=8) (actual time=0.000..0.001 rows=1 loops=96451)
                      Buffers: shared hit=227
                      ->  Result  (cost=0.56..8.65 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=96451)
                            One-Time Filter: (hashed SubPlan 1)
                            Buffers: shared hit=227
                            ->  Index Scan using product_tracking_pkey on product_tracking pt2  (cost=0.56..8.65 rows=1 width=8) (actual time=0.027..0.030 rows=3 loops=31)
                                  Index Cond: (((product_id)::text = (pt1.product_id)::text) AND (feed_gid = pt1.feed_gid))
                                  Buffers: shared hit=224
                            SubPlan 1
                              ->  Seq Scan on feeds  (cost=0.00..6.04 rows=1 width=16) (actual time=0.018..0.035 rows=1 loops=1)
                                    Filter: (name = 'absolute'::text)
                                    Rows Removed by Filter: 244
                                    Buffers: shared hit=3
        ->  Materialize  (cost=0.00..6.64 rows=243 width=26) (actual time=0.007..0.020 rows=109 loops=4)
              Buffers: shared hit=1
              ->  Seq Scan on feeds f  (cost=0.00..5.43 rows=243 width=26) (actual time=0.021..0.031 rows=109 loops=1)
                    Buffers: shared hit=1
Planning Time: 0.583 ms
Execution Time: 4797.695 ms

This makes queries very fast when the backward scan doesn’t have to walk far down product_tracking_created_at_product_idx. For a feed with relatively recent failures, the query takes about half a second. For a feed whose most recent failures are over a year old, it can take up to 5 seconds (as in the plan above).

Answer:

There can be many reasons for the switched query plan. A very bad plan typically indicates inaccurate column statistics and / or cost constants. It starts with the bad estimate for the index scan on product_tracking_feed_gid_idx:

(cost=0.00..2183.11 rows=71806 width=0) (actual time=124.666..124.666 rows=1799676 loops=2)

The scan produces many more rows than Postgres expected (1,799,676 actual vs. 71,806 estimated), and it does not stop there. There are countless related answers about this; running ANALYZE, raising per-column statistics targets, and adjusting cost settings are the usual starting points.
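As a hedged starting sketch (the target of 1000 is an arbitrary value to tune, not a recommendation): raising the statistics target for the badly estimated column and re-analyzing can improve such estimates, and pg_stats shows what the planner then knows.

```sql
-- Assumption: feed_gid is unevenly distributed (245 distinct values
-- across ~11.6 M rows), so a higher statistics target may help.
ALTER TABLE product_tracking ALTER COLUMN feed_gid SET STATISTICS 1000;
ANALYZE product_tracking;

-- Inspect the collected column statistics the planner works from:
SELECT n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'product_tracking'
AND    attname   = 'feed_gid';
```

Compare the row estimates in EXPLAIN before and after; if they barely move, the problem is more likely cost settings than statistics.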

On top of that, the combination of a filter on one table, a join to another, further filtering (taking the max of one column), ordering by other columns, and then a LIMIT is notoriously hard to plan. Depending on the details of the data distribution, very different query plans can be preferable. With inaccurate cost settings and column statistics (especially with uneven data distribution), this can go south quickly.

Beware of cached query plans for tricky queries like this one, like for prepared statements or queries nested in PL/pgSQL functions. Normally, Postgres wouldn’t fall for that, but if the first couple of calls work fine with the same query plan, it might. (Dynamic SQL with EXECUTE can be a workaround for PL/pgSQL.)
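A minimal sketch of that PL/pgSQL workaround (the function name and signature are made up for illustration): RETURN QUERY EXECUTE re-plans the query on every call, so the planner sees the actual feed names instead of reusing a cached generic plan.

```sql
-- Hypothetical wrapper; the query body mirrors the rewritten query below.
CREATE OR REPLACE FUNCTION failed_products(_feed_names text[])
  RETURNS TABLE (product_id character varying, feed_gid uuid, errors text)
  LANGUAGE plpgsql AS
$$
BEGIN
   RETURN QUERY EXECUTE  -- dynamic SQL: fresh plan per call
   'SELECT pt1.product_id, pt1.feed_gid, pt1.errors
    FROM   product_tracking pt1
    WHERE  pt1.status = ''Failed''
    AND    pt1.feed_gid IN (SELECT gid FROM feeds WHERE name = ANY ($1))
    AND    NOT EXISTS (
       SELECT FROM product_tracking pt2
       WHERE  pt2.product_id = pt1.product_id
       AND    pt2.feed_gid   = pt1.feed_gid
       AND    pt2.updated_at > pt1.updated_at)
    ORDER  BY pt1.created_at DESC, pt1.product_id DESC,
              pt1.delivery_name DESC, pt1.feed_gid DESC
    LIMIT  100'
   USING _feed_names;
END
$$;

-- Usage: SELECT * FROM failed_products('{feed1,feed2}');
```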

Faster query

After seeing the value frequencies you added, I expect this equivalent query to perform much better:

SELECT product_id, feed_gid, errors
FROM   product_tracking  pt1
WHERE  status = 'Failed'  -- very selective (< 1% qualify)
AND    feed_gid IN (SELECT gid FROM feeds WHERE name IN ('feed1', 'feed2'))  -- also selective
AND    NOT EXISTS (  -- no later update
   SELECT FROM product_tracking pt2
   WHERE  pt2.product_id = pt1.product_id
   AND    pt2.feed_gid   = pt1.feed_gid
   AND    pt2.updated_at > pt1.updated_at
   )
ORDER  BY created_at DESC, product_id DESC, delivery_name DESC, feed_gid DESC
LIMIT  100;

I removed the pointless CTE.

The NOT EXISTS is constructed to allow multiple rows with the same updated_at to qualify, like your original.

Crucially, add this (small in size!) multicolumn partial index to make it fast:

CREATE INDEX product_tracking_failed_idx ON product_tracking (created_at DESC, product_id DESC, delivery_name DESC, feed_gid DESC)
WHERE status = 'Failed';

It replaces your existing index product_tracking_created_at_product_idx for the purpose of this query (also tailored to the outer ORDER BY clause), but it’s much smaller and faster for the purpose.

Maybe this simpler index will do the job, too. (Test it!):

CREATE INDEX product_tracking_failed_idx ON product_tracking (created_at DESC)
WHERE status = 'Failed';

And this index to support NOT EXISTS:

CREATE INDEX product_tracking_feed_gid_product_id_updated_at_idx
ON product_tracking (feed_gid, product_id, updated_at DESC NULLS LAST);

I see additional potential in optimizing column types, column order and indexes of your table.
If updated_at and created_at are never NULL, they should be declared NOT NULL.
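If a check confirms there are no NULLs (and no writes insert NULLs going forward), that could look like:

```sql
-- Verify first; both counts must be 0 before adding the constraints:
SELECT count(*) FILTER (WHERE updated_at IS NULL) AS null_updated,
       count(*) FILTER (WHERE created_at IS NULL) AS null_created
FROM   product_tracking;

ALTER TABLE product_tracking
   ALTER COLUMN updated_at SET NOT NULL,
   ALTER COLUMN created_at SET NOT NULL;
```

NOT NULL constraints give the planner more to work with and allow simpler index definitions (no NULLS LAST considerations).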

As for:

status can be Failed, Pending or Succeeded so a boolean is not quite possible

Possible, too:

succeeded boolean
   true - 'Succeeded'
   false - 'Failed'
   NULL - 'Pending'

1 byte instead of 7 – 10 bytes, faster and simpler in queries.
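A sketch of that mapping as a plain expression (purely illustrative, since the column rework is out of scope here):

```sql
-- Three-state status collapsed into a nullable boolean:
SELECT CASE status
          WHEN 'Succeeded' THEN true
          WHEN 'Failed'    THEN false
          ELSE NULL        -- 'Pending'
       END AS succeeded
FROM   product_tracking;
```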

But that’s beyond the scope of this question – and not an option anyway, apparently.


This content was sourced from stackoverflow.com or stackexchange.com and is licensed under cc by-sa 2.5, cc by-sa 3.0, and cc by-sa 4.0.