Duplicate Conditions in PostgreSQL Query

All we need is an easy explanation of the problem, so here it is.

Can using duplicate conditions in PostgreSQL cause any problems?

For example, I have a purchases table with a customer id, and my framework in some cases duplicates a lookup condition, like the example below.

SELECT p.*
FROM purchases p
WHERE p.customer_id = 123 
AND p.customer_id = 123;

OR

SELECT p.*
FROM purchases p
WHERE p.customer_id = 123 
AND p.salesman = 456 
AND p.departament = 789 
AND p.customer_id = 123;

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

Yes, that can cause bad query estimates. If PostgreSQL thinks that WHERE p.customer_id = 123 will return 10% of the rows from the table, it will think that WHERE p.customer_id = 123 AND p.customer_id = 123 will only return 1% of the table, because it does not realize that the conditions are actually the same and treats them as statistically independent.

Bad estimates can lead to bad plan choices and bad performance.

Perhaps that does not matter so much in your case: PostgreSQL puts a lower limit of 1 on each row count estimate, so if your de-duplicated query is already estimated to return very few rows, not much harm will be done.

Try using EXPLAIN!

Method 2

According to my tests on 12.6, it depends on the predicate:

db=# explain select * from books where product_form = 'BB';
                                                  QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
Index Scan using index_books_on_client_id_and_product_form on books  (cost=0.29..7273.75 rows=3604 width=23686)
  Index Cond: ((product_form)::text = 'BB'::text)
(2 rows)

db=# explain select * from books where product_form = 'BB' and product_form = 'BB' ;
                                                  QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
Index Scan using index_books_on_client_id_and_product_form on books  (cost=0.29..7273.75 rows=3604 width=23686)
  Index Cond: ((product_form)::text = 'BB'::text)
(2 rows)

db=# explain select * from books where product_form = 'BB' and product_form = 'BB' and product_form = 'BB' ;
                                                  QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
Index Scan using index_books_on_client_id_and_product_form on books  (cost=0.29..7273.75 rows=3604 width=23686)
  Index Cond: ((product_form)::text = 'BB'::text)
(2 rows)

So in that case, no problem. The row estimate and costs are constant.

Possibly this is because an index is involved, because an ilike predicate is negatively affected by this:

db=# explain select * from books where title ilike '%phys%';
                          QUERY PLAN                          
--------------------------------------------------------------
Seq Scan on books  (cost=0.00..7613.31 rows=539 width=23686)
  Filter: ((title)::text ~~* '%phys%'::text)
(2 rows)

db=# explain select * from books where title ilike '%phys%' and  title ilike '%phys%';
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
Seq Scan on books  (cost=0.00..7676.98 rows=11 width=23686)
  Filter: (((title)::text ~~* '%phys%'::text) AND ((title)::text ~~* '%phys%'::text))
(2 rows)

db=# explain select * from books where title ilike '%phys%' and  title ilike '%phys%' and title ilike '%phys%';
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
Seq Scan on books  (cost=0.00..7740.64 rows=1 width=23686)
  Filter: (((title)::text ~~* '%phys%'::text) AND ((title)::text ~~* '%phys%'::text) AND ((title)::text ~~* '%phys%'::text))
(2 rows)

Note the slight increase in cost and the reductction in estimate row count.

I think this could well be version sensitive also, so I would suggest checking is on your version with the predicates and schema that you’re concerned with.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply