How does Bitmap heap scan as the inner node under Nested loop work?

All we need is an easy explanation of the problem, so here it is.

I understand generally how Bitmap Index Scan followed by Bitmap Heap Scan works.

However, I’m not sure how it work when a Bitmap Heap Scan chosen as the inner node of a Nested Loop.

 Nested Loop  (cost=86.73..369.99 rows=1 width=37) (actual time=8.487..1823.876 rows=1800 loops=1)
   ->  Bitmap Heap Scan on workspace_history_sets  (cost=11.69..141.51 rows=3 width=32) (actual time=6.046..14.218 rows=636 loops=1)
         Recheck Cond: (workspace_uid = '+++'::uuid)
         Filter: (((project_uid)::character varying)::text = '---'::text)
         Heap Blocks: exact=7
         ->  Bitmap Index Scan on workspace_history_set_idx_workspace_history_set  (cost=0.00..11.69 rows=641 width=0) (actual time=0.198..0.198 rows=636 loops=1)
               Index Cond: (workspace_uid = '+++'::uuid)
   ->  Bitmap Heap Scan on sheets  (cost=75.03..76.15 rows=1 width=90) (actual time=2.818..2.836 rows=3 loops=636)
         Recheck Cond: ((history_set_uid = workspace_history_sets.history_set_uid) AND (project_uid = '---'))
         Heap Blocks: exact=1800
         ->  BitmapAnd  (cost=75.03..75.03 rows=1 width=0) (actual time=2.752..2.752 rows=0 loops=636)
               ->  Bitmap Index Scan on sheets_idx_history_set_uid  (cost=0.00..8.00 rows=551 width=0) (actual time=0.041..0.041 rows=3 loops=636)
                     Index Cond: (history_set_uid = workspace_history_sets.history_set_uid)
               ->  Bitmap Index Scan on sheets_idx_project_uid  (cost=0.00..66.39 rows=1573 width=0) (actual time=2.645..2.645 rows=4612 loops=636)
                     Index Cond: (project_uid = '---')

From EXPLAIN above, we can see that Bitmap Heap Scan is executed 636 times (loops=636).

  1. Am I correct to say that each time it runs Bitmap Heap Scan and its children nodes for just 1 key(index) retrieved from the outer node?
  2. If above is true, then isn’t the stats of Bitmap Heap Scan and its children different every loop? Then what do the numbers shown in those nodes e.g. (actual time=2.818..2.836 rows=3 loops=636) mean? Are they averaged across all loops?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

Yes, the bitmap index scan on sheets_idx_history_set_uid is done once for each 636 found values of workspace_history_sets.history_set_uid. The numbers reported for actual time and rows are an average.

The bitmap index scan on sheets_idx_project_uid is just repeated identically 636 times, since it doesn’t reference any value from the outer. It could just run the scan once and reuse the bitmap each time, but no one has implemented that optimization. Which is unfortunate as that is the dominate time spent in your query.

If the planner had more accurate stats, it would probably just scan sheets_idx_history_set_uid as a regular index scan, then test for project_uid = '---' as a filter. Using the BitmapAnd for 551 rows seems more reasonable than using it for 3 rows does. All of your other stats seem wonky too, you should VACUUM ANALYZE both tables to see if that changes things.

A multicolumn index on (history_set_uid, project_uid) should offer a better plan which would also be more resistant to the effects of bad stats.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply