Optimization of recursive view


I have an optimization problem when using a recursive view on PostgreSQL. When I perform a trivial query using this view, the execution time is abnormally long.

To illustrate the problem, here is a database with the view and the query that causes it: http://sqlfiddle.com/#!17/9d39e/13

The main table is v_univ_st and the view is called v_univ_bf.

EXPLAIN

I ran EXPLAIN ANALYZE in PostgreSQL; here is the result (the table I’m working on is much bigger than the one in the fiddle):

"Hash Right Join  (cost=4724036209.97..4757553915.67 rows=1510012 width=23746) (actual time=5172.917..24833.100 rows=1869 loops=1)"
"  Hash Cond: ((recipes_flat."PRODUCT_ID")::text = (u_sample_tasks."PRODUCT_ID")::text)"
"  ->  CTE Scan on recipes_flat  (cost=4723991547.19..4728513371.25 rows=164429966 width=23705) (actual time=0.197..19761.488 rows=367312 loops=1)"
"        CTE recipes_flat"
"          ->  Recursive Union  (cost=0.00..4723991547.19 rows=164429966 width=15645) (actual time=0.181..17845.438 rows=367312 loops=1)"
"                ->  Seq Scan on v_univ_st  (cost=0.00..8024046.44 rows=279636 width=1824) (actual time=0.171..3060.524 rows=279684 loops=1)"
"                      SubPlan 1"
"                        ->  Aggregate  (cost=28.54..28.55 rows=1 width=0) (actual time=0.007..0.007 rows=1 loops=279684)"
"                              ->  Index Only Scan using idx_recipe_blends_ingredient_id on v_univ_st sr  (cost=0.42..28.52 rows=6 width=0) (actual time=0.004..0.006 rows=0 loops=279684)"
"                                    Index Cond: ("INGREDIENT_ID" = (v_univ_st."PRODUCT_ID_COMP")::text)"
"                                    Heap Fetches: 68468"
"                ->  Nested Loop  (cost=0.42..471267890.14 rows=16415033 width=15645) (actual time=0.262..799.437 rows=9736 loops=9)"
"                      ->  WorkTable Scan on recipes_flat s  (cost=0.00..55927.20 rows=2796360 width=14718) (actual time=0.010..22.153 rows=40812 loops=9)"
"                      ->  Index Scan using idx_recipe_blends_ingredient_id on v_univ_st e  (cost=0.42..0.58 rows=6 width=999) (actual time=0.005..0.008 rows=0 loops=367312)"
"                            Index Cond: (("INGREDIENT_ID")::text = (s."PRODUCT_ID_COMP")::text)"
"                      SubPlan 2"
"                        ->  Aggregate  (cost=28.54..28.55 rows=1 width=0) (actual time=0.038..0.038 rows=1 loops=87628)"
"                              ->  Index Only Scan using idx_recipe_blends_ingredient_id on v_univ_st sr_1  (cost=0.42..28.52 rows=6 width=0) (actual time=0.007..0.037 rows=0 loops=87628)"
"                                    Index Cond: ("INGREDIENT_ID" = (e."PRODUCT_ID_COMP")::text)"
"                                    Heap Fetches: 19160"
"  ->  Hash  (cost=44661.15..44661.15 rows=131 width=9) (actual time=4983.487..4983.487 rows=129 loops=1)"
"        Buckets: 1024  Batches: 1  Memory Usage: 14kB"
"        ->  Seq Scan on u_sample_tasks  (cost=0.00..44661.15 rows=131 width=9) (actual time=954.011..4983.337 rows=129 loops=1)"
"              Filter: (("EXAMPLE")::text = 'EXAMPLE'::text)"
"              Rows Removed by Filter: 365517"
"Planning time: 193.957 ms"
"Execution time: 24903.973 ms"

The same plan was also rendered with https://explain.depesz.com/ (screenshot not included).

Indexes are used. I have already tried:

SET enable_seqscan = OFF;
SET enable_nestloop = OFF;

but this does not help; the result is even worse.
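An experiment like the one above can be confined to a single transaction, so that the planner settings do not leak into the rest of the session. The sketch below assumes the view is queried with a plain SELECT *; substitute the real query from the fiddle.

```sql
BEGIN;

-- SET LOCAL only lasts until the end of the current transaction.
SET LOCAL enable_seqscan = off;
SET LOCAL enable_nestloop = off;

-- ANALYZE actually runs the query and reports real timings;
-- BUFFERS adds shared-buffer hit/read counts for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM v_univ_bf;   -- placeholder query, an assumption

-- Nothing was modified, so simply end the transaction;
-- both settings revert to their previous values.
ROLLBACK;
```

These enable_* switches only discourage a plan type by making it look very expensive to the planner, so a worse result with them turned off is not unusual.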

The table is also available on dbfiddle.uk. I use PostgreSQL 9.6.

On this small sample database the query is not slow. On my real PostgreSQL database the table is 261 MB and has about 279,000 rows.

v_univ_st is the table itself. It has no primary key, because I work with tables that are data extracts rather than strictly "relational" tables. The view built on it is v_univ_bf, and it is in this view that I want to create columns giving the "depth" level of each ingredient. In the simplified example I traverse the table recursively to get this information.

How to solve:

Method 1

By using a materialized view and EXISTS, I reduced the execution time of the query by about 96%.
I changed:

CREATE OR REPLACE VIEW public.v_univ_bf

To

CREATE MATERIALIZED VIEW public.v_univ_bf

A materialized view is not recomputed by each query that uses it; it must be refreshed explicitly whenever the underlying table has changed.

For my problem, I refresh the materialized view once a day with a query in my ETL; other ways of refreshing a materialized view are covered in a separate topic.
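The daily refresh mentioned above might look like the following sketch. The index name idx_v_univ_bf_id_product is hypothetical, and indexing id_product is only an assumption about which column later queries filter or join on.

```sql
-- Run once, after creating the materialized view:
-- unlike a regular view, a materialized view can carry its own indexes.
-- (hypothetical index name; id_product is a column of v_univ_bf)
CREATE INDEX idx_v_univ_bf_id_product
    ON public.v_univ_bf (id_product);

-- Run from the ETL whenever v_univ_st has changed:
-- recomputes the recursive query and replaces the stored rows.
-- A plain refresh blocks reads of the materialized view while it runs.
REFRESH MATERIALIZED VIEW public.v_univ_bf;
```

REFRESH MATERIALIZED VIEW CONCURRENTLY avoids blocking readers, but it requires a unique index on the materialized view, which may not be possible here if the extracted rows are not unique.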

Method 2

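I see that the OP has already solved the problem with a materialized view.

Even so, I see one more optimization. COUNT has to visit every matching row, whereas EXISTS can stop at the first one, so (SELECT COUNT(1) ...) = 0 can be replaced with NOT EXISTS: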


CREATE OR REPLACE VIEW public.v_univ_bf AS
WITH RECURSIVE recipes_flat AS (
    -- Non-recursive term: every row of v_univ_st enters at level 1.
    SELECT v_univ_st.PRODUCT_ID,
           v_univ_st.NAME_PRODUCT,
           v_univ_st.INGREDIENT_ID,
           v_univ_st.PRODUCT_ID_COMP,
           /* Original expression, replaced by NOT EXISTS below:
           (( SELECT count(1) AS count
                FROM v_univ_st sr
               WHERE sr.INGREDIENT_ID::text = v_univ_st.PRODUCT_ID_COMP::text)) = 0 AS last_comp_blend,
           */
           NOT EXISTS (
               SELECT 1
                 FROM v_univ_st sr
                WHERE sr.INGREDIENT_ID::text = v_univ_st.PRODUCT_ID_COMP::text
           ) AS last_comp_blend,
           1 AS level_blend
      FROM v_univ_st
    UNION ALL
    -- Recursive term: follow PRODUCT_ID_COMP -> INGREDIENT_ID one level deeper.
    SELECT e.PRODUCT_ID,
           s.NAME_PRODUCT,
           s.INGREDIENT_ID,
           e.PRODUCT_ID_COMP,
           NOT EXISTS (
               SELECT 1
                 FROM v_univ_st sr
                WHERE sr.INGREDIENT_ID::text = e.PRODUCT_ID_COMP::text
           ) AS last_comp_blend,
           /* Original expression, replaced by NOT EXISTS above:
           (( SELECT count(1) AS count
                FROM v_univ_st sr
               WHERE sr.INGREDIENT_ID::text = e.PRODUCT_ID_COMP::text)) = 0 AS last_comp_blend,
           */
           s.level_blend + 1
      FROM v_univ_st e
      JOIN recipes_flat s ON s.PRODUCT_ID_COMP::text = e.INGREDIENT_ID::text
)
SELECT recipes_flat.PRODUCT_ID AS id_product,
       recipes_flat.NAME_PRODUCT AS name_product,
       recipes_flat.INGREDIENT_ID AS ingredient_id,
       recipes_flat.PRODUCT_ID_COMP AS product_id_comp,
       recipes_flat.level_blend,
       recipes_flat.last_comp_blend,
       CASE
           WHEN recipes_flat.last_comp_blend AND recipes_flat.level_blend = 1 THEN 'CF'::text
           WHEN recipes_flat.last_comp_blend AND recipes_flat.level_blend > 0 THEN 'F'::text
           WHEN NOT recipes_flat.last_comp_blend AND recipes_flat.level_blend = 1 THEN 'C'::text
           ELSE ''::text
       END AS compact_flat_view
  FROM recipes_flat;

-- Prefix with EXPLAIN ANALYZE to measure the query:
SELECT * FROM v_univ_bf;

http://sqlfiddle.com/#!17/9d39e/45

I’m curious how much this speeds up the original view, or REFRESH MATERIALIZED VIEW, on the real data.
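To check that the rewrite does not change the result, the two expressions can be compared side by side on a toy table. This is only a sketch: demo_st and its three rows are hypothetical and stand in for v_univ_st.

```sql
-- Hypothetical three-row chain: A contains B, B contains C, C contains nothing.
CREATE TEMP TABLE demo_st (
    ingredient_id   text,
    product_id_comp text
);

INSERT INTO demo_st VALUES
    ('A', 'B'),
    ('B', 'C'),
    ('C', NULL);

-- Both columns answer: "is this row's component a leaf,
-- i.e. does it never appear as an ingredient_id itself?"
SELECT d.ingredient_id,
       d.product_id_comp,
       (SELECT count(1)
          FROM demo_st sr
         WHERE sr.ingredient_id = d.product_id_comp) = 0 AS via_count,
       NOT EXISTS (
           SELECT 1
             FROM demo_st sr
            WHERE sr.ingredient_id = d.product_id_comp
       ) AS via_not_exists
  FROM demo_st d
 ORDER BY d.ingredient_id;
```

On this data both columns are false for rows A and B and true for row C, so the two forms agree, including the case where product_id_comp is NULL. The difference is only in the work done: the subquery under NOT EXISTS can stop at the first match.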


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
