Query result precedence if filter columns have values


I have a table "customer_config" with these columns:

company_code (varchar)
warehouse (numeric)
section (numeric)
config_keyword (varchar)
config_value (varchar)

The two config_* columns can apply to an entire company (warehouse and section are null), an entire warehouse within a company (section is null), or a section within a warehouse.

So we could have a default row for the company, and then one or more rows that override a configuration value for a specific warehouse or warehouse & section.

I want to return only the most specific row for a given company, warehouse, and section. Something like this pseudocode:

results = select * from customer_config where (all match)
if results empty
    results = select * from customer_config where (company_code and warehouse match)
if results empty
    results = select * from customer_config where (company_code matches)

The most specific row shall take precedence.

Update

There can be multiple entries for the same config_keyword on the same level.
Is it also possible to return multiple rows for a single keyword?

How to solve:

Method 1

"Only the most specific row"

While looking for a single result row (like your original question indicated):

The following query is a bit verbose, but very clear, and probably the fastest option if supported by an index on (company_code, warehouse, section), which you should probably have anyway (depends on undisclosed info).

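SELECT * FROM customer_config
WHERE  (company_code, warehouse, section) = ($1, $2, $3)  -- exact match

UNION ALL
SELECT * FROM customer_config
WHERE  (company_code, warehouse) = ($1, $2)
AND    section IS NULL    -- warehouse-level row only, not another section

UNION ALL
SELECT * FROM customer_config
WHERE  company_code = $1
AND    warehouse IS NULL  -- company-level default only
AND    section IS NULL
LIMIT  1;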

Postgres will stop executing as soon as a row has been found. Test with EXPLAIN ANALYZE: you'll see "never executed" for the remaining sub-SELECTs.

Note that LIMIT 1 applies to the whole query, not the last SELECT. (You’d need parentheses to change that.)
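
One way to run that check is to wrap the query in a prepared statement and explain its execution. This is only a sketch: the statement name most_specific_config and the argument values are made up for illustration, and the IS NULL filters assume that less specific rows store NULL in warehouse and section, as the question describes.

```sql
-- Hypothetical statement name; parameter types follow the question's column types.
PREPARE most_specific_config (varchar, numeric, numeric) AS
SELECT * FROM customer_config
WHERE  (company_code, warehouse, section) = ($1, $2, $3)

UNION ALL
SELECT * FROM customer_config
WHERE  (company_code, warehouse) = ($1, $2)
AND    section IS NULL

UNION ALL
SELECT * FROM customer_config
WHERE  company_code = $1
AND    warehouse IS NULL
AND    section IS NULL
LIMIT  1;

-- Branches that were skipped are marked "(never executed)" in the plan output.
EXPLAIN ANALYZE
EXECUTE most_specific_config ('my_company_code', 123456, 123);

-- Clean up the session-local prepared statement.
DEALLOCATE most_specific_config;
```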

"All rows for the most specific match"

Use this if multiple rows can exist on each level.

This can be solved with pure SQL, of course, for instance with a chain of CTEs. But this custom PL/pgSQL function should be more efficient:

CREATE OR REPLACE FUNCTION trade_volume (_company_code varchar, _warehouse numeric, _section numeric)
  RETURNS SETOF customer_config
  LANGUAGE plpgsql STABLE PARALLEL SAFE AS
$func$
BEGIN
   -- Level 1: exact match on company, warehouse and section
   RETURN QUERY
   SELECT * FROM customer_config
   WHERE (company_code, warehouse, section) = ($1, $2, $3);

   IF FOUND THEN RETURN; END IF;

   -- Level 2: warehouse-level rows (section is NULL)
   RETURN QUERY
   SELECT * FROM customer_config
   WHERE (company_code, warehouse) = ($1, $2)
   AND    section IS NULL;

   IF FOUND THEN RETURN; END IF;

   -- Level 3: company-level defaults (warehouse and section are NULL)
   RETURN QUERY
   SELECT * FROM customer_config
   WHERE  company_code = $1
   AND    warehouse IS NULL
   AND    section IS NULL;
END
$func$;

Call:

SELECT * FROM trade_volume ('my_company_code', 123456, 123);

Be sure to have the index mentioned above.
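
For reference, such an index could be created like this. The index name is made up for illustration; the column order follows the (company_code, warehouse, section) order given earlier.

```sql
-- Hypothetical index name; supports all three lookup levels,
-- since each one filters on a leading prefix of these columns.
CREATE INDEX customer_config_company_warehouse_section_idx
ON customer_config (company_code, warehouse, section);
```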

If the first query returns any rows, the function is done. The remaining queries are not even planned.


I made the function PARALLEL SAFE to allow parallelism in Postgres 14 or later. (Only relevant for big tables.) Quoting the release notes:

Allow plpgsql’s RETURN QUERY to execute its query using parallelism (Tom Lane)
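
For comparison, the pure SQL alternative mentioned earlier (a chain of CTEs) might look roughly like the sketch below. It is not part of the original answer. The CTE names are made up, and the IS NULL filters assume that warehouse-level rows have section NULL and company-level rows have both warehouse and section NULL, as the question describes.

```sql
WITH exact_match AS (          -- company, warehouse and section all match
   SELECT * FROM customer_config
   WHERE  (company_code, warehouse, section) = ($1, $2, $3)
   )
, warehouse_level AS (         -- only consulted if there is no exact match
   SELECT * FROM customer_config
   WHERE  (company_code, warehouse) = ($1, $2)
   AND    section IS NULL
   AND    NOT EXISTS (SELECT 1 FROM exact_match)
   )
, company_level AS (           -- only consulted if both levels above are empty
   SELECT * FROM customer_config
   WHERE  company_code = $1
   AND    warehouse IS NULL
   AND    section IS NULL
   AND    NOT EXISTS (SELECT 1 FROM exact_match)
   AND    NOT EXISTS (SELECT 1 FROM warehouse_level)
   )
SELECT * FROM exact_match
UNION ALL
SELECT * FROM warehouse_level
UNION ALL
SELECT * FROM company_level;
```

At most one of the three CTEs returns rows, so the result is every row of the most specific level that has a match.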

Method 2

This should work:

SELECT DISTINCT ON (config_keyword)
       config_keyword, config_value
FROM customer_config
WHERE company_code = $1
  AND coalesce(warehouse, $2) = $2
  AND coalesce(section, $3) = $3
ORDER BY config_keyword,
         section IS NULL,
         warehouse IS NULL;

DISTINCT ON will return the first row for each config_keyword, and results where section is not NULL will sort before results where section is NULL, because FALSE < TRUE. The same for warehouse.
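
To see the ordering at work, here is a small self-contained sketch. The table name customer_config_demo and all sample values are invented for illustration, and the parameters $1, $2, $3 are replaced with the literals 'acme', 1 and 7.

```sql
-- Hypothetical demo table and data, not from the original question.
CREATE TEMP TABLE customer_config_demo (
   company_code   varchar,
   warehouse      numeric,
   section        numeric,
   config_keyword varchar,
   config_value   varchar
);

INSERT INTO customer_config_demo VALUES
   ('acme', NULL, NULL, 'label_size', 'A4')   -- company default
 , ('acme', 1,    NULL, 'label_size', 'A5')   -- override for warehouse 1
 , ('acme', 1,    7,    'label_size', 'A6')   -- override for warehouse 1, section 7
 , ('acme', NULL, NULL, 'currency',   'EUR'); -- company default only

SELECT DISTINCT ON (config_keyword)
       config_keyword, config_value
FROM   customer_config_demo
WHERE  company_code = 'acme'
AND    coalesce(warehouse, 1) = 1
AND    coalesce(section, 7) = 7
ORDER  BY config_keyword,
          section IS NULL,
          warehouse IS NULL;
```

For section 7 this returns 'EUR' for currency and 'A6' for label_size. Querying section 8 instead would fall back to the warehouse-level value 'A5', because the section 7 row no longer passes the filter.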

Method 3

I don't use PostgreSQL, but in Oracle this can be done with analytic functions, which also makes it possible to return multiple rows for a single keyword:

select * 
  from (
     select rank() over (order by section, warehouse, company_code ) rn, 
            c.* 
       from customer_config c
      where company_code = $1
        and coalesce(warehouse, $2) = $2
        and coalesce(section, $3) = $3
       )
 where rn = 1;
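
A possible PostgreSQL adaptation of the same idea is sketched below. It is a variation, not the answer's query: it ranks rows separately per config_keyword (an assumption about the desired result), and it orders by the IS NULL flags so that the outcome does not depend on default NULL sort order.

```sql
SELECT company_code, warehouse, section, config_keyword, config_value
FROM  (
   SELECT c.*
          -- rank 1 = most specific level available for this keyword;
          -- ties keep every row that exists on that level
        , rank() OVER (PARTITION BY config_keyword
                       ORDER BY section IS NULL, warehouse IS NULL) AS rn
   FROM   customer_config c
   WHERE  company_code = $1
   AND    coalesce(warehouse, $2) = $2
   AND    coalesce(section, $3) = $3
   ) sub
WHERE  rn = 1;
```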


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
