All we need is an easy explanation of the problem, so here it is.
I have two tables :
assortment_product. They are connected between them with an
assortment_uuid. I also have:
CREATE UNIQUE INDEX assortment_product_assortment_uuid_article_number_idx ON public.assortment_product USING btree (assortment_uuid, article_number) CREATE UNIQUE INDEX customer_assortment_assortment_uuid_published_to_idx ON public.customer_assortment USING btree (assortment_uuid, published_to) CREATE INDEX assortment_product_article_number_idx ON public.assortment_product USING btree (article_number)
Having a list of customers and a list or articles, I want to get the entries where these 3 things match:
- assortment_uuid - published_to (customer) - article_number
My query looks like this:
select ass_product.article_number as articleNumber, cust_assortment.published_to as customerId, array_agg(cast (cust_assortment.assortment_uuid as varchar) order by cust_assortment.id) as uuids, array_agg(cust_assortment.assortment_name order by cust_assortment.id) as names from customer_assortment cust_assortment join assortment_product ass_product on cust_assortment.assortment_uuid = ass_product.assortment_uuid # there are max 10 values here where cust_assortment.published_to = any(values ('7000014910'), ('7000014940') ... ) # there a max 2000 values here and ass_product.article_number = any(values ('FK5327'),('GG1114') ... ) group by ass_product.article_number, cust_assortment.published_to;
This takes tens of seconds to get.
My understanding of a query plan is somehow limited, but if it helps, here it is:
GroupAggregate (cost=92861.99..94421.60 rows=47988 width=81) (actual time=4107.602..4382.830 rows=20000 loops=1) | Group Key: ass_product.article_number, cust_assortment.published_to | -> Sort (cost=92861.99..92981.96 rows=47988 width=61) (actual time=4107.518..4123.623 rows=153760 loops=1) | Sort Key: ass_product.article_number, cust_assortment.published_to | Sort Method: quicksort Memory: 27767kB | -> Nested Loop (cost=30.99..89130.83 rows=47988 width=61) (actual time=2.175..3632.692 rows=153760 loops=1) | -> HashAggregate (cost=30.00..32.00 rows=200 width=32) (actual time=2.090..3.603 rows=2000 loops=1) | Group Key: "*VALUES*_1".column1 | -> Values Scan on "*VALUES*_1" (cost=0.00..25.00 rows=2000 width=32) (actual time=0.004..1.048 rows=2000 loops=1) | -> Nested Loop (cost=0.99..445.39 rows=10 width=61) (actual time=0.034..1.797 rows=77 loops=2000) | -> HashAggregate (cost=0.15..0.25 rows=10 width=32) (actual time=0.000..0.006 rows=10 loops=2000) | Group Key: "*VALUES*".column1 | -> Values Scan on "*VALUES*" (cost=0.00..0.12 rows=10 width=32) (actual time=0.002..0.006 rows=10 loops=1) | -> Nested Loop (cost=0.84..44.34 rows=17 width=61) (actual time=0.031..0.176 rows=8 loops=20000) | -> Index Scan using assortment_product_article_number_idx on assortment_product ass_product (cost=0.29..10.01 rows=4 width=22) (actual time=0.013..0.021 rows=10 loops=20000) | Index Cond: ((article_number)::text = "*VALUES*_1".column1) | -> Index Scan using customer_assortment_assortment_uuid_published_to_idx on customer_assortment cust_assortment (cost=0.55..8.57 rows=1 width=55) (actual time=0.014..0.015 rows=1 loops=195740)| Index Cond: ((assortment_uuid = ass_product.assortment_uuid) AND ((published_to)::text = "*VALUES*".column1)) | Planning time: 3.601 ms | Execution time: 4385.493 ms |
Can anybody bring some sanity into my life please?
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
You would get better performance with an index-only scan on
customer_assortment. For that, create the index as follows:
CREATE UNIQUE INDEX customer_assortment_assortment_uuid_published_to_idx ON public.customer_assortment (assortment_uuid, published_to) INCLUDE (id, assortment_name);
Then, make sure the visibility map is up to date:
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂