Why is my nested loop taking so much time?


I have the following query. In a nutshell, wallets_history represents the transactions that happen in a wallet, and the goal is to get one point every in_interval from in_from to in_to.

The points will eventually be displayed to the user as a histogram. (I simplified the query to make it easier to read; this doesn’t affect the issue.)

CREATE OR REPLACE FUNCTION get_wallets_histogram(
  in_wallet_id TEXT, 
  in_code TEXT, 
  in_asset_id TEXT, 
  in_from TIMESTAMPTZ, 
  in_to TIMESTAMPTZ, 
  in_interval INTERVAL)
RETURNS TABLE(date TIMESTAMPTZ, balance_change BIGINT) AS $$
  SELECT s AS time,
        change::bigint
  FROM generate_series(in_from,
                       in_to - in_interval,
                       in_interval) s
  LEFT JOIN LATERAL (
      SELECT s, COALESCE(SUM(balance_change),0) change FROM wallets_history
      WHERE  s <= seen_at AND seen_at < s + in_interval
  ) q USING (s)
$$ LANGUAGE SQL STABLE;

Then I run this (note that the interval is 16 days, so the lateral query produces only a single row in this simplified example):

SELECT * FROM get_wallets_histogram(
  'WHALE', 
  'BTC',  
  '',  
  '2022-01-01'::timestamptz,  
  '2022-02-01'::timestamptz,  
  interval '16 day');
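The plan below presumably came from running the call under EXPLAIN; the exact options are an assumption on my part (VERBOSE matches the "Output:" lines shown). Since the function is a STABLE SQL function, it gets inlined and the full plan is visible:

```sql
-- Assumed invocation; ANALYZE gives actual times, VERBOSE the Output: lines.
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM get_wallets_histogram(
  'WHALE', 'BTC', '',
  '2022-01-01'::timestamptz,
  '2022-02-01'::timestamptz,
  interval '16 day');
```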

The query plan is

 Nested Loop Left Join  (cost=2526.62..2526662.88 rows=1000 width=16) (actual time=109.301..109.303 rows=1 loops=1)
   Output: s.s, (q.change)::bigint
   Inner Unique: true
   ->  Function Scan on pg_catalog.generate_series s  (cost=0.01..10.01 rows=1000 width=8) (actual time=104.854..104.855 rows=1 loops=1)
         Output: s.s
         Function Call: generate_series('2022-01-01 00:00:00+00'::timestamp with time zone, ('2022-02-01 00:00:00+00'::timestamp with time zone - '16 days'::interval), '16 days'::interval)
   ->  Subquery Scan on q  (cost=2526.62..2526.64 rows=1 width=40) (actual time=4.442..4.443 rows=1 loops=1)
         Output: q.s, q.change
         Filter: (s.s = q.s)
         ->  Aggregate  (cost=2526.62..2526.63 rows=1 width=40) (actual time=4.440..4.441 rows=1 loops=1)
               Output: s.s, COALESCE(sum(wallets_history.balance_change), '0'::numeric)
               ->  Bitmap Heap Scan on public.wallets_history  (cost=27.85..2523.83 rows=1115 width=5) (actual time=0.842..2.570 rows=23040 loops=1)
                      Output: wallets_history.wallet_id, wallets_history.code, wallets_history.asset_id, wallets_history.tx_id, wallets_history.seen_at, wallets_history.blk_height, wallets_history.blk_idx, wallets_history.balance_change, wallets_history.balance_total, wallets_history.nth
                     Recheck Cond: ((s.s <= wallets_history.seen_at) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
                     Heap Blocks: exact=421
                     ->  Bitmap Index Scan on wallets_history_by_seen_at  (cost=0.00..27.57 rows=1115 width=0) (actual time=0.807..0.807 rows=23040 loops=1)
                           Index Cond: ((wallets_history.seen_at >= s.s) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
 Planning Time: 0.443 ms
 JIT:
   Functions: 13
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.408 ms, Inlining 9.035 ms, Optimization 59.083 ms, Emission 36.578 ms, Total 107.104 ms
 Execution Time: 111.823 ms

If I read the query plan correctly, the first row generated by generate_series doesn’t appear until 104 ms into the query!

What I don’t understand, though, is why the lateral query seems to have a startup time before the startup time of generate_series.

I can’t find the bottleneck in this query. There is a huge gap between when the generate_series startup begins and when the bitmap heap scan of the inner lateral query finishes.

Strangest of all, when I execute the lateral query manually, outside the function, it performs fast. However, as soon as I left join it to generate_series (even if only one timestamp is emitted), it takes more than 100 ms!

There isn’t any node that is obviously making things slow.

How to solve


Method 1

Summary of the fix

I found out that JIT compilation was taking most of the time in the query, then I tried to track down why JIT was triggered for such a simple query.

It turns out that generate_series over timestamps confuses the planner: the planner always estimates that it returns 1000 rows.
So I created the generate_series_fixed function below, which adds a LIMIT to generate_series, and used it as a drop-in replacement.

The rest of this answer is the history of my findings.
Here is the drop-in replacement if you need it.

-- Technically this LIMIT clause is useless. However, without it the query planner
-- is unable to correctly estimate the number of rows in generate_series,
-- which causes JIT compilation, slowing down the query considerably.
-- See https://dba.stackexchange.com/questions/310235/why-is-my-nested-loop-taking-so-much-time/310242#310242
CREATE OR REPLACE FUNCTION generate_series_fixed(in_from TIMESTAMPTZ, in_to TIMESTAMPTZ, in_interval INTERVAL) RETURNS TABLE(s TIMESTAMPTZ) AS $$
  SELECT generate_series(in_from, in_to, in_interval)
  LIMIT  (EXTRACT(EPOCH FROM (in_to - in_from))/EXTRACT(EPOCH FROM in_interval)) + 1; -- I am unsure about the exact formula, but overestimating by one row is fine
$$ LANGUAGE SQL STABLE;
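A quick usage sketch (the dates here are my own example, not from the question): with daily buckets over one month, the LIMIT should let the planner derive an estimate of about 32 rows instead of the default 1000 for generate_series:

```sql
-- Drop-in usage: same arguments as generate_series, but the LIMIT gives
-- the planner a realistic row count to work with.
SELECT * FROM generate_series_fixed(
  '2022-01-01'::timestamptz,
  '2022-02-01'::timestamptz,
  interval '1 day');
```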

So it seems the culprit was… the JIT!
Using SET jit = off completely solves the issue.
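For completeness, jit is an ordinary setting, so it doesn’t have to be disabled session-wide; it can be attached to just this function with standard ALTER FUNCTION … SET syntax (the signature below is the one from this question):

```sql
-- Session-wide:
SET jit = off;

-- Or only for this function, leaving JIT enabled everywhere else:
ALTER FUNCTION get_wallets_histogram(TEXT, TEXT, TEXT, TIMESTAMPTZ, TIMESTAMPTZ, INTERVAL)
  SET jit = off;
```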

Plan with the JIT on

 WindowAgg  (cost=2543486.92..2543511.92 rows=1000 width=24) (actual time=305.248..305.252 rows=1 loops=1)
   Output: s.s, (q.change)::bigint, ((sum(q.change) OVER (?) + COALESCE($0, '0'::numeric)))::bigint
   InitPlan 1 (returns $0)
     ->  Limit  (cost=8414.21..8414.21 rows=1 width=23) (actual time=29.647..29.648 rows=1 loops=1)
           Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
           ->  Sort  (cost=8414.21..8662.97 rows=99505 width=23) (actual time=29.646..29.646 rows=1 loops=1)
                 Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
                 Sort Key: wallets_history_1.seen_at DESC, wallets_history_1.blk_height DESC, wallets_history_1.blk_idx DESC
                 Sort Method: top-N heapsort  Memory: 25kB
                 ->  Bitmap Heap Scan on public.wallets_history wallets_history_1  (cost=1871.58..7916.68 rows=99505 width=23) (actual time=3.425..19.254 rows=98750 loops=1)
                       Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
                       Recheck Cond: (wallets_history_1.seen_at < '2022-01-01 00:00:00+00'::timestamp with time zone)
                       Filter: ((wallets_history_1.wallet_id = 'WHALE'::text) AND (wallets_history_1.code = 'BTC'::text) AND (wallets_history_1.asset_id = ''::text))
                       Heap Blocks: exact=1800
                       ->  Bitmap Index Scan on wallets_history_by_seen_at  (cost=0.00..1846.71 rows=99505 width=0) (actual time=3.272..3.272 rows=98750 loops=1)
                             Index Cond: (wallets_history_1.seen_at < '2022-01-01 00:00:00+00'::timestamp with time zone)
   ->  Sort  (cost=2535072.71..2535075.21 rows=1000 width=40) (actual time=275.590..275.592 rows=1 loops=1)
         Output: s.s, q.change
         Sort Key: s.s
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Left Join  (cost=2534.99..2535022.88 rows=1000 width=40) (actual time=275.557..275.559 rows=1 loops=1)
               Output: s.s, q.change
               Inner Unique: true
               ->  Function Scan on pg_catalog.generate_series s  (cost=0.01..10.01 rows=1000 width=8) (actual time=268.958..268.959 rows=1 loops=1)
                     Output: s.s
                     Function Call: generate_series('2022-01-01 00:00:00+00'::timestamp with time zone, ('2022-02-01 00:00:00+00'::timestamp with time zone - '16 days'::interval), '16 days'::interval)
               ->  Subquery Scan on q  (cost=2534.98..2535.00 rows=1 width=40) (actual time=6.586..6.587 rows=1 loops=1)
                     Output: q.s, q.change
                     Filter: (s.s = q.s)
                     ->  Aggregate  (cost=2534.98..2534.99 rows=1 width=40) (actual time=6.583..6.584 rows=1 loops=1)
                           Output: s.s, COALESCE(sum(wallets_history.balance_change), '0'::numeric)
                           ->  Bitmap Heap Scan on public.wallets_history  (cost=27.85..2532.19 rows=1115 width=5) (actual time=0.901..4.737 rows=23040 loops=1)
                                  Output: wallets_history.wallet_id, wallets_history.code, wallets_history.asset_id, wallets_history.tx_id, wallets_history.seen_at, wallets_history.blk_height, wallets_history.blk_idx, wallets_history.balance_change, wallets_history.balance_total, wallets_history.nth
                                 Recheck Cond: ((s.s <= wallets_history.seen_at) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
                                 Filter: ((wallets_history.wallet_id = 'WHALE'::text) AND (wallets_history.code = 'BTC'::text) AND (wallets_history.asset_id = ''::text))
                                 Heap Blocks: exact=421
                                 ->  Bitmap Index Scan on wallets_history_by_seen_at  (cost=0.00..27.57 rows=1115 width=0) (actual time=0.864..0.864 rows=23040 loops=1)
                                       Index Cond: ((wallets_history.seen_at >= s.s) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
 Planning Time: 2.333 ms
 JIT:
   Functions: 27
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.050 ms, Inlining 40.980 ms, Optimization 127.174 ms, Emission 100.448 ms, Total 270.652 ms
 Execution Time: 334.052 ms
(44 rows)

Plan with the JIT off

 WindowAgg  (cost=2543486.92..2543511.92 rows=1000 width=24) (actual time=39.259..39.264 rows=1 loops=1)
   Output: s.s, (q.change)::bigint, ((sum(q.change) OVER (?) + COALESCE($0, '0'::numeric)))::bigint
   InitPlan 1 (returns $0)
     ->  Limit  (cost=8414.21..8414.21 rows=1 width=23) (actual time=32.375..32.376 rows=1 loops=1)
           Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
           ->  Sort  (cost=8414.21..8662.97 rows=99505 width=23) (actual time=32.374..32.374 rows=1 loops=1)
                 Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
                 Sort Key: wallets_history_1.seen_at DESC, wallets_history_1.blk_height DESC, wallets_history_1.blk_idx DESC
                 Sort Method: top-N heapsort  Memory: 25kB
                 ->  Bitmap Heap Scan on public.wallets_history wallets_history_1  (cost=1871.58..7916.68 rows=99505 width=23) (actual time=3.147..21.561 rows=98750 loops=1)
                       Output: wallets_history_1.balance_total, wallets_history_1.seen_at, wallets_history_1.blk_height, wallets_history_1.blk_idx
                       Recheck Cond: (wallets_history_1.seen_at < '2022-01-01 00:00:00+00'::timestamp with time zone)
                       Filter: ((wallets_history_1.wallet_id = 'WHALE'::text) AND (wallets_history_1.code = 'BTC'::text) AND (wallets_history_1.asset_id = ''::text))
                       Heap Blocks: exact=1800
                       ->  Bitmap Index Scan on wallets_history_by_seen_at  (cost=0.00..1846.71 rows=99505 width=0) (actual time=3.005..3.005 rows=98750 loops=1)
                             Index Cond: (wallets_history_1.seen_at < '2022-01-01 00:00:00+00'::timestamp with time zone)
   ->  Sort  (cost=2535072.71..2535075.21 rows=1000 width=40) (actual time=6.875..6.877 rows=1 loops=1)
         Output: s.s, q.change
         Sort Key: s.s
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Left Join  (cost=2534.99..2535022.88 rows=1000 width=40) (actual time=6.870..6.873 rows=1 loops=1)
               Output: s.s, q.change
               Inner Unique: true
               ->  Function Scan on pg_catalog.generate_series s  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.023..0.023 rows=1 loops=1)
                     Output: s.s
                     Function Call: generate_series('2022-01-01 00:00:00+00'::timestamp with time zone, ('2022-02-01 00:00:00+00'::timestamp with time zone - '16 days'::interval), '16 days'::interval)
               ->  Subquery Scan on q  (cost=2534.98..2535.00 rows=1 width=40) (actual time=6.846..6.848 rows=1 loops=1)
                     Output: q.s, q.change
                     Filter: (s.s = q.s)
                     ->  Aggregate  (cost=2534.98..2534.99 rows=1 width=40) (actual time=6.845..6.846 rows=1 loops=1)
                           Output: s.s, COALESCE(sum(wallets_history.balance_change), '0'::numeric)
                           ->  Bitmap Heap Scan on public.wallets_history  (cost=27.85..2532.19 rows=1115 width=5) (actual time=1.324..4.837 rows=23040 loops=1)
                                  Output: wallets_history.wallet_id, wallets_history.code, wallets_history.asset_id, wallets_history.tx_id, wallets_history.seen_at, wallets_history.blk_height, wallets_history.blk_idx, wallets_history.balance_change, wallets_history.balance_total, wallets_history.nth
                                 Recheck Cond: ((s.s <= wallets_history.seen_at) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
                                 Filter: ((wallets_history.wallet_id = 'WHALE'::text) AND (wallets_history.code = 'BTC'::text) AND (wallets_history.asset_id = ''::text))
                                 Heap Blocks: exact=421
                                 ->  Bitmap Index Scan on wallets_history_by_seen_at  (cost=0.00..27.57 rows=1115 width=0) (actual time=1.290..1.290 rows=23040 loops=1)
                                       Index Cond: ((wallets_history.seen_at >= s.s) AND (wallets_history.seen_at < (s.s + '16 days'::interval)))
 Planning Time: 0.451 ms
 Execution Time: 39.424 ms
(40 rows)

Tested on Postgres 12.9.
This came as a surprise. Since wallets_history is a materialized view, the query didn’t seem complex enough to me to warrant JIT.

I tried again on Postgres 14.2: same problem.

I will keep posting here if I find the root cause of why the JIT is causing problems. I am marking this question solved in the meantime.

EDIT: I learned that JIT kicks in when the estimated cost of the query is high enough (jit_above_cost, 100000 by default).

Looking at the query plan, the Nested Loop Left Join is costed at about 2,500,000, pushing it above the threshold.
I don’t know yet why the nested loop left join is so expensive.
It may be related to the fact that the planned row count of generate_series is really wrong… digging.
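The cost thresholds involved are ordinary settings, so they can be inspected, and raised instead of turning JIT off entirely; the names below are from the stock PostgreSQL configuration:

```sql
SHOW jit_above_cost;          -- triggers JIT compilation; default 100000
SHOW jit_inline_above_cost;   -- triggers inlining; default 500000
SHOW jit_optimize_above_cost; -- triggers expensive optimization; default 500000

-- Raising the trigger keeps JIT available for genuinely huge queries
-- while skipping it for this mis-estimated nested loop:
SET jit_above_cost = 5000000;
```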

Here is another question that seems related: Left join with generate_series misleads the planner

EDIT 2:

I found a hack that works: I basically wrap generate_series in a subquery with a LIMIT clause to help the query planner.

For example, in my non-simplified query:

CREATE OR REPLACE FUNCTION get_wallets_histogram(in_wallet_id TEXT, in_code TEXT, in_asset_id TEXT, in_from TIMESTAMPTZ, in_to TIMESTAMPTZ, in_interval INTERVAL)
RETURNS TABLE(date TIMESTAMPTZ, balance_change BIGINT, balance BIGINT) AS $$
  SELECT s AS time,
        change::bigint,
        (SUM (q.change) OVER (ORDER BY s) + COALESCE((SELECT balance_total FROM wallets_history WHERE seen_at < in_from AND wallet_id=in_wallet_id AND code=in_code AND asset_id=in_asset_id ORDER BY seen_at DESC, blk_height DESC, blk_idx DESC LIMIT 1), 0))::BIGINT  AS balance
  FROM (-- Technically this LIMIT clause is useless. However, without it the query planner
        -- is unable to correctly estimate the number of rows in generate_series,
        -- which causes JIT compilation, slowing down the query considerably.
        SELECT * FROM generate_series(in_from,
        in_to - in_interval,
        in_interval) s LIMIT (EXTRACT(EPOCH FROM (in_to - in_from))/EXTRACT(EPOCH FROM in_interval))) s
  LEFT JOIN LATERAL (
      SELECT s, COALESCE(SUM(balance_change),0) change FROM wallets_history
      WHERE  s <= seen_at AND seen_at < s + in_interval AND wallet_id=in_wallet_id AND code=in_code AND asset_id=in_asset_id
  ) q USING (s)
$$ LANGUAGE SQL STABLE;
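An alternative sketch to the LIMIT hack (untested against this schema): set-returning SQL functions accept a constant ROWS estimate at creation time, so a thin wrapper can tell the planner to expect far fewer than the default 1000 rows. The wrapper name and the value 50 are my own choices:

```sql
-- Hypothetical wrapper: ROWS must be a literal constant, so pick a value
-- close to the typical number of histogram buckets.
CREATE OR REPLACE FUNCTION generate_series_estimated(
  in_from TIMESTAMPTZ, in_to TIMESTAMPTZ, in_interval INTERVAL)
RETURNS SETOF TIMESTAMPTZ AS $$
  SELECT generate_series(in_from, in_to, in_interval)
$$ LANGUAGE SQL STABLE ROWS 50;
```

The downside is that the estimate cannot adapt to the arguments, which is exactly what the LIMIT trick provides.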

EDIT 3:

In case somebody runs into the same issue: the generate_series_fixed function quoted at the top of this answer can be used in place of the original generate_series.


