Postgres query with a large IN list is slow, and joining on a temporary table doesn't seem to help

EDIT: The query plans in the question body come from EXPLAIN, but as @jjanes suggested, EXPLAIN (ANALYZE, BUFFERS) may be more useful. Since the output is very large, I have uploaded the plans here: https://gist.github.com/vr2262/ab3cfb69ac758b5161e27d9cb77ad05f
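
For context, plans like the ones in that gist come from wrapping the query in EXPLAIN (ANALYZE, BUFFERS). A minimal sketch (the count(*) query below is just a stand-in, not the actual query from this question):

-- ANALYZE executes the query for real, so this takes as long as the query itself
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM my_table WHERE joined_table_2_id = 1;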

I have a query that selects records from a table by an indexed bigint column using WHERE ... IN. Up to a certain number of values (7835, as it happens), the query is fast (about 150 milliseconds for sequential IDs, and about 1 second for random IDs), but adding just one more value results in a different query plan, and the query takes around 150 seconds. I looked around for other answers, and the solution suggested in https://dba.stackexchange.com/a/91254 (and elsewhere) is to insert the values into an indexed temporary table and join on it. However, that actually made the query a little slower.

Here is the original query:

SELECT
  my_table.id AS my_table_id,
  my_table.joined_table_2_id AS my_table_joined_table_2_id,
  my_table.big_where_id AS my_table_big_where_id,
  ST_AsGeoJSON(my_table.geog) AS unrelated_geog,
  joined_table_1.id AS joined_table_1_id
FROM
  my_table
LEFT OUTER JOIN
  joined_table_a AS joined_table_1 ON my_table.id = joined_table_1.my_table_id
LEFT OUTER JOIN
  joined_table_b AS joined_table_2 ON joined_table_2.id = my_table.joined_table_2_id
WHERE
  my_table.joined_table_2_id = 1
  AND my_table.big_where_id IN (1, 2, 3, ..., 7835);

… and the associated fast query plan:

Gather  (cost=36576.06..15864926.71 rows=44743 width=139)
  Workers Planned: 2
  ->  Hash Left Join  (cost=35576.06..15859452.41 rows=18643 width=139)
        Hash Cond: (my_table.joined_table_2_id = joined_table_2.id)
        ->  Nested Loop Left Join  (cost=35574.99..15854534.26 rows=18643 width=246)
              ->  Parallel Bitmap Heap Scan on my_table  (cost=35574.42..89742.05 rows=2845 width=201)
                    Recheck Cond: ((joined_table_2_id = 1) AND (big_where_id = ANY ('{1,2,3,...}'::bigint[])))
                    ->  Bitmap Index Scan on my_table_joined_table_2_id_big_where_id_key  (cost=0.00..35572.71 rows=6829 width=0)
                          Index Cond: ((joined_table_2_id = 1) AND (big_where_id = ANY ('{1,2,3,...}'::bigint[])))
              ->  Index Scan using ix_joined_table_a_my_table_id on joined_table_a joined_table_1  (cost=0.57..5512.89 rows=2834 width=53)
                    Index Cond: (my_table_id = my_table.id)
        ->  Hash  (cost=1.05..1.05 rows=1 width=14)
              ->  Seq Scan on joined_table_b joined_table_2  (cost=0.00..1.05 rows=1 width=14)
                    Filter: (id = 1)

With one additional value of big_where_id, the query plan changes to:

Hash Left Join  (cost=50982.39..15870462.06 rows=44750 width=139)
  Hash Cond: (my_table.joined_table_2_id = joined_table_2.id)
  ->  Hash Right Join  (cost=50981.33..15858658.19 rows=44750 width=246)
        Hash Cond: (joined_table_1.my_table_id = my_table.id)
        ->  Seq Scan on joined_table_a joined_table_1  (cost=0.00..14184914.72 rows=618195072 width=53)
        ->  Hash  (cost=50895.95..50895.95 rows=6830 width=201)
              ->  Index Scan using my_table_joined_table_2_id_big_where_id_key on my_table  (cost=0.57..50895.95 rows=6830 width=201)
                    Index Cond: ((joined_table_2_id = 1) AND (big_where_id = ANY ('{1,2,3,...}'::bigint[])))
  ->  Hash  (cost=1.05..1.05 rows=1 width=14)
        ->  Seq Scan on joined_table_b joined_table_2  (cost=0.00..1.05 rows=1 width=14)
              Filter: (id = 1)

I tried using a temporary table like so:

CREATE TEMPORARY TABLE temp_table (id INTEGER PRIMARY KEY);
INSERT INTO temp_table (id) SELECT generate_series(1, 7836);
SELECT
  my_table.id AS my_table_id,
  my_table.joined_table_2_id AS my_table_joined_table_2_id,
  my_table.big_where_id AS my_table_big_where_id,
  ST_AsGeoJSON(my_table.geog) AS unrelated_geog,
  joined_table_1.id AS joined_table_1_id
FROM
  my_table
LEFT OUTER JOIN
  joined_table_a AS joined_table_1 ON my_table.id = joined_table_1.my_table_id
LEFT OUTER JOIN
  joined_table_b AS joined_table_2 ON joined_table_2.id = my_table.joined_table_2_id
JOIN
  temp_table ON my_table.big_where_id = temp_table.id
WHERE
  my_table.joined_table_2_id = 1;

…but as stated it was a little slower than before. Here is the query plan (using EXPLAIN on the SELECT):

Hash Left Join  (cost=126858.69..28741416.19 rows=138238 width=139)
  Hash Cond: (my_table.joined_table_2_id = joined_table_2.id)
  ->  Hash Right Join  (cost=126857.60..28706108.24 rows=138238 width=246)
        Hash Cond: (joined_table_1.my_table_id = my_table.id)
        ->  Seq Scan on joined_table_a joined_table_1  (cost=0.00..14184914.72 rows=618195072 width=53)
        ->  Hash  (cost=125995.86..125995.86 rows=21099 width=201)
              ->  Nested Loop  (cost=0.57..125995.86 rows=21099 width=201)
                    ->  Seq Scan on temp_table  (cost=0.00..159.75 rows=11475 width=4)
                    ->  Index Scan using ix_my_table_big_where_id on my_table  (cost=0.57..10.95 rows=2 width=201)
                          Index Cond: (big_where_id = temp_table.id)
  ->  Hash  (cost=1.04..1.04 rows=4 width=14)
        ->  Seq Scan on joined_table_b joined_table_2  (cost=0.00..1.04 rows=4 width=14)

Maybe a regular JOIN on the temporary table isn’t right? I haven’t had any luck trying other joins either, though.
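
An aside that may or may not matter here: autovacuum never processes temporary tables, and the plan above estimates 11475 rows in temp_table even though only 7836 were inserted, so the planner is working from default statistics rather than real ones. A small sketch of the same setup with a manual ANALYZE added:

CREATE TEMPORARY TABLE temp_table (id INTEGER PRIMARY KEY);
INSERT INTO temp_table (id) SELECT generate_series(1, 7836);
-- temporary tables are invisible to autovacuum, so gather statistics by hand
ANALYZE temp_table;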

How to solve:

Method 1

This looks like a case of unfair measurement. You probably ran the same query over and over again, just adding another element to the IN list each time. But that means almost all of the data needed for the "fast" plan is frequently used and already cached. If you change the value tested against joined_table_2_id on every run (rather than using 1 all the time), or pick a different set of about 7000 random values for the IN list on each execution rather than always using the series 1..7NNN, is the fast plan still fast?
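
A sketch of building such a randomized IN list (the range 1..100000 is an assumption; substitute the actual spread of big_where_id values):

-- Emit ~7000 random IDs as a comma-separated list to paste into the IN (...) clause.
SELECT string_agg(id::text, ', ')
FROM (
  SELECT id
  FROM generate_series(1, 100000) AS id
  ORDER BY random()
  LIMIT 7000
) AS random_ids;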

If it is still much faster than the alternative even with random parameters, that suggests the setting for random_page_cost is too high relative to seq_page_cost for your storage system. The defaults (4 and 1, respectively) are generally suited to spinning hard drives, not SSDs.
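
A sketch of inspecting and experimenting with those settings (the value 1.1 is a commonly suggested starting point for SSDs, not something specific to this question):

-- Inspect the current planner cost settings (defaults: 4 and 1).
SHOW random_page_cost;
SHOW seq_page_cost;
-- Try a lower value in the current session before changing it globally.
SET random_page_cost = 1.1;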
