How do I debug Postgres' join strategy?

All we need is an easy explanation of the problem, so here it is.

In the process of optimizing a Postgres query, I ran into a situation where two equivalent, simple queries execute in very different amounts of time. While I can’t provide a reproducible example for this part, the queries look like:

Query 1, fast:

select a.id
  from a
  join b on a.id = b.id
  where a.id in ('1', '2', '3')

Query 2, slow:

select a.id
  from a
  join b on a.id = b.id
  where b.id in ('1', '2', '3')

The only difference being whether a.id or b.id is referenced in the where clause. (b.id is unique but a.id is not, if it matters.) Both tables are quite large, but the id columns are indexed.

The reason why Query 2 is running slowly is because Postgres is using a hash join. In Query 1, the planner chooses a nested loop, which is obviously much faster with such a selective condition.

Looking at the query plans (provided below) I can see that the predicted selectivities / row counts are relatively accurate – actually more accurate in the slower query. In other words, Postgres is not choosing a hash join strategy because it thinks there are more results for the condition than there are.

So, my question is: how can I debug why Postgres chose this hash join strategy? How do I find out why the query planner doesn’t consider (or overestimates) Query 1’s plan when presented with Query 2? EXPLAIN ANALYZE is great but it doesn’t allow me to see the plans that weren’t used.

Query plans

Query 1 (fast):

QUERY PLAN                                                                                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.99..1528.79 rows=344 width=17) (actual time=0.042..0.210 rows=21 loops=1)                                                                    
  ->  Index Only Scan using a_id_idx on a  (cost=0.56..19.71 rows=344 width=17) (actual time=0.027..0.049 rows=21 loops=1)
        Index Cond: (id = ANY ('{"1","2","3"}'::bpchar[]))                                            
        Heap Fetches: 0                                                                                                                                           
  ->  Index Only Scan using b_pkey on b  (cost=0.43..4.39 rows=1 width=17) (actual time=0.007..0.007 rows=1 loops=21)                   
        Index Cond: (id = (b.id)::bpchar)                                                                                                 
        Heap Fetches: 0                                                                                                                                           
Planning Time: 0.136 ms                                                                                                                                           
Execution Time: 0.227 ms                                                                                                                                          

Query 2 (slow):

QUERY PLAN                                                                                                                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1013.81..292692.84 rows=20 width=17) (actual time=704.293..1847.182 rows=21 loops=1)                                                                                             
  Workers Planned: 2                                                                                                                                                                           
  Workers Launched: 2                                                                                                                                                                          
  ->  Hash Join  (cost=13.81..291690.84 rows=8 width=17) (actual time=669.269..1822.987 rows=7 loops=3)                                                                                        
        Hash Cond: ((a.id)::bpchar = b.id)                                                                                                                             
        ->  Parallel Index Only Scan using a_id_idx on a  (cost=0.43..281065.62 rows=4042599 width=17) (actual time=0.019..1086.251 rows=3238563 loops=3)
              Heap Fetches: 0                                                                                                                                                                  
        ->  Hash  (cost=13.34..13.34 rows=3 width=17) (actual time=0.072..0.073 rows=3 loops=3)                                                                                                
              Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                                                                     
              ->  Index Only Scan using b_pkey on b  (cost=0.43..13.34 rows=3 width=17) (actual time=0.040..0.068 rows=3 loops=3)                                    
                    Index Cond: (id = ANY ('{"1","2","3"}'::bpchar[]))                                                                  
                    Heap Fetches: 0                                                                                                                                                            
Planning Time: 0.391 ms                                                                                                                                                                        
Execution Time: 1847.219 ms

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

How do I find out why the query planner doesn’t consider (or overestimates) Query 1’s plan when presented with Query 2? EXPLAIN ANALYZE is great but it doesn’t allow me to see the plans that weren't used.

You can "disable" various planner methods to see alternative plans. "Disable" quoted, because those methods are then not actually disabled, just estimated to be ridiculously high, so that any other plan seems favorable. Only for debugging! Not meant for permanent use.

There is a list in the manual in the chapter Planner Method Configuration. Consider the advice at the top:

These configuration parameters provide a crude method of influencing
the query plans chosen by the query optimizer. If the default plan
chosen by the optimizer for a particular query is not optimal, a
temporary solution is to use one of these configuration parameters to force the optimizer to choose a different plan. Better ways to
improve the quality of the plans chosen by the optimizer include
adjusting the planner cost constants (see Section 19.7.2),
running ANALYZE manually, increasing the value of the
default_statistics_target configuration parameter, and
increasing the amount of statistics collected for specific columns
using ALTER TABLE SET STATISTICS.

In your particular case, to understand why Postgres chose the this plan and not some other plan, compare the estimated costs before and after disabling the hash join strategy. Spoiler alert: it’s because it (falsely) projects a lower cost. For instance:

EXPLAIN <Query 2>;  -- take note
SET enable_hashjoin = off;
EXPLAIN <Query 2>;  -- compare

Or use EXPLAIN (ANALYZE, BUFFERS) for more details. You seem to be familiar with EXPLAIN.

Another plan will be chosen, and the estimated cost will be higher. And one or both estimates are pretty far from reality. Why Postgres arrives at bad estimates is another question. Basic starters are in the advice from the manual above.

Maybe your installation is (also) too optimistic about costs for parallelism. There are a number of GUC parameters to adjust that. You can disable parallelism for testing with:

SET max_parallel_workers_per_gather = 0;

Force a particular order of joins?

For the added question in the comment:

see the costs for performing scans in a Nested Loop in the opposite order?

You could experiment with join_collapse_limit:

Setting it to 1 prevents any reordering of explicit JOINs. Thus, the
explicit join order specified in the query will be the actual order in
which the relations are joined. Because the query planner does not
always choose the optimal join order, advanced users can elect to
temporarily set this variable to 1, and then specify the join order
they desire explicitly. For more information see Section 14.3.

So:

SET join_collapse_limit = 1;
EXPLAIN <Query 2>; 
EXPLAIN <Query 2 with tables reversed>;  -- compare

Related:

Don’t use data type char(n)!

One avoidable problem in your setup becomes obvious in the added casts in the EXPLAIN plan:

 Index Cond: (id = (b.id)::bpchar)

Etc.

At least one id column is of type character. bpchar is the internal name for the outdated SQL data type character. "bpchar" for "blank padded character". Never use it. Especially not for PK columns. The type character is a Zombi you shoot in the head on sight.

Do not confuse character / char / bpchar with the useful data types text or varchar or "char" (with quotes!). See:

If your id columns hold integer numbers, use integer (or bigint). Else text or whatever is appropriate.

Might contribute to the inferior query plans.
Turned out to be the main issue. See OP’s follow-up with a bug report.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply