Is a correlated function in the FROM clause executed for every row?


I have a heavy function, let’s call it fcalc(x,y) -> my_z. I need the result my_z to both be a filter (too low and the row is discarded) and in the result set (so my client can see it). I write the query like so:

SELECT *, my_z
FROM big_table t, (SELECT * FROM fcalc(t.x, t.y)) as my_z
WHERE condition1 AND condition2 AND ... AND my_z > $threshold

My question is: will all the other conditions be applied first, which should filter out a very large number of rows, before fcalc is applied? I’m very new to databases.

How to solve :


Method 1

No. Postgres typically does not evaluate the function in the LATERAL subquery for all rows.
It will apply simple filters on big_table first and execute the function only for rows still in the race.

Fix question

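Your query is syntactically invalid: a subquery in the FROM list can only reference columns of a preceding FROM item (here t.x and t.y) if it is marked LATERAL.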
Assuming fcalc() returns a single value, this would work:

SELECT * , my_z
FROM big_table t, LATERAL (SELECT * FROM fcalc(t.x, t.y)) AS f(my_z)
WHERE $condition1 AND $condition2 AND ... AND f.my_z > $threshold

And should be untangled to just:

SELECT *
FROM   big_table t
JOIN   LATERAL fcalc(t.x, t.y) AS f(my_z) ON f.my_z > $threshold
WHERE  $condition1
AND    $condition2
AND ... 

Moving the f.my_z > $threshold from the WHERE clause to the join condition makes the query easier to read and has no effect on the query plan whatsoever (while using [INNER] JOIN). This produces the exact same query plan:

SELECT *
FROM   big_table t, fcalc(t.x, t.y) f(my_z)
WHERE  $condition1
AND    f.my_z > $threshold
AND    $condition2
AND ... 

Query plan

Either of the fixed queries will first apply predicates filtering rows in big_table, before executing fcalc() and filtering on the result.

You can check with EXPLAIN ANALYZE. Say, your big_table has 8 rows, 5 of which don’t pass your $conditionN filters, and 1 of the remaining 3 does not pass f.my_z > $threshold. You’ll see something like:

Nested Loop  (cost=0.00..1.17 rows=3 width=79) (actual time=0.026..0.027 rows=1 loops=1)
  ->  Seq Scan on big_table t  (cost=0.00..1.10 rows=3 width=75) (actual time=0.007..0.009 rows=3 loops=1)
        Filter: (id > 5)
        Rows Removed by Filter: 5
  ->  Function Scan on fcalc f  (cost=0.00..0.02 rows=1 width=4)
                                (actual time=0.005..0.005 rows=0 loops=3)
        Filter: (my_z > 9)
        Rows Removed by Filter: 1
Planning Time: 0.101 ms
Execution Time: 0.043 ms

Meaning, fcalc() was only executed 3 times in the example (loops=3 on the Function Scan node). In reality, you should see index scans on the big table, but the principle is the same.
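To reproduce a plan of this shape, a minimal sketch follows. The table definition, the sample data, the body of fcalc() and the literal values 5 and 20 are all assumptions made up for illustration; only the names big_table, fcalc, x, y and my_z come from the question.

```sql
-- Hypothetical demo table: 8 rows, as in the example above.
CREATE TABLE big_table (id int PRIMARY KEY, x int, y int);

INSERT INTO big_table (id, x, y)
SELECT g, g, g * 2
FROM   generate_series(1, 8) g;

-- Cheap stand-in for the expensive function (assumed signature).
-- Written in plpgsql so it cannot be inlined into the calling query.
CREATE FUNCTION fcalc(_x int, _y int, OUT my_z int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   my_z := _x + _y;
END
$func$;

-- The filter on big_table (id > 5) leaves 3 rows,
-- so the Function Scan node should report loops=3.
EXPLAIN (ANALYZE)
SELECT *
FROM   big_table t
JOIN   LATERAL fcalc(t.x, t.y) AS f(my_z) ON f.my_z > 20
WHERE  t.id > 5;
```

Timings and cost estimates will differ from the plan shown above; the number to look at is loops on the Function Scan node, which counts how often fcalc() was called.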

You can further verify this if you set the GUC track_functions to pl before executing the query with or without EXPLAIN ANALYZE. The manual:

Enables tracking of function call counts and time used. Specify pl
to track only procedural-language functions, all to also track SQL
and C language functions. The default is none, which disables
function statistics tracking. Only superusers can change this setting.

Note

SQL-language functions that are simple enough to be “inlined” into the
calling query will not be tracked, regardless of this setting.

Then check how often your function has actually been called, before and after executing your query:

SELECT calls
FROM   pg_catalog.pg_stat_user_functions
WHERE  funcid = 'fcalc'::regproc;

Consult the manual for details about the cast 'fcalc'::regproc.
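Put together, the check could look like the sketch below. It assumes that fcalc is a uniquely named procedural-language function, that your role is allowed to change track_functions, and it uses the literal 9 in place of $threshold; the omitted $conditionN filters would go into the WHERE clause.

```sql
-- Enable tracking for the current session only.
SET track_functions = 'pl';

-- Call count before the query. No row is returned
-- if no call of the function has been recorded yet.
SELECT calls
FROM   pg_catalog.pg_stat_user_functions
WHERE  funcid = 'fcalc'::regproc;

-- The query under test (add your own filters on big_table).
SELECT *
FROM   big_table t
JOIN   LATERAL fcalc(t.x, t.y) AS f(my_z) ON f.my_z > 9;

-- Call count after the query; the difference between the two
-- counts is the number of times fcalc() was executed.
SELECT calls
FROM   pg_catalog.pg_stat_user_functions
WHERE  funcid = 'fcalc'::regproc;
```

Run the statements one at a time rather than inside a single transaction. Depending on the Postgres version, the counters may only become visible after a short delay.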

Aside

Postgres will also prioritize filters on the same level by their estimated cost. You can verify with the tools I laid out above. Tinker with the COST setting of simple plpgsql functions …
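As a sketch of such tinkering, the statements below raise the estimated cost of fcalc and then read the setting back. The argument types (int, int) and the value 10000 are assumptions for illustration; use the real signature of your function.

```sql
-- COST is the planner's estimate of execution cost per call,
-- in units of cpu_operator_cost. The default is 100 for
-- functions that are not written in C.
ALTER FUNCTION fcalc(int, int) COST 10000;

-- Inspect the current setting.
SELECT proname, procost
FROM   pg_catalog.pg_proc
WHERE  oid = 'fcalc'::regproc;
```

After changing the setting, re-run EXPLAIN ANALYZE on the query to see whether the order of the filters changes.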


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
