Which columns should I index for a complex PostgreSQL query?


A sample query is shown below.

Which columns should I consider indexing?
Which kind of index is best here: one index per column, or one index covering multiple columns?

My problem is that this query takes a very long time to finish when I execute it.

Is that because of the complexity of the query, or because of the indexing?

TIA!

SELECT
 a.id,
 a.name AS name,
 CASE
   WHEN a.status IS NULL THEN '1111'
   WHEN a.status = '2222' THEN '3333'
   WHEN a.status = '4444' THEN '5555'
   ELSE a.status
 END AS status,
 a.updated_at
FROM a
LEFT JOIN b ON a.request_id = b.request_id
LEFT JOIN (
   SELECT 
    DISTINCT ON (id) id,
    name
   FROM
    aa
   WHERE
    updated_at BETWEEN '2022-05-01 00:00:00' AND '2022-05-31 23:59:59'
    AND id IN (
      SELECT id 
      FROM a
      WHERE 
      updated_at BETWEEN '2022-05-01 00:00:00' AND '2022-05-31 23:59:59'
      AND status NOT IN ('6666', '7777', '8888')
      AND id LIKE '%%'
                 )
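) AS aa_latest ON aa_latest.id = a.id  -- alias and join condition assumed; the original omits both
WHERE 
a.updated_at BETWEEN '2022-05-01 00:00:00' AND '2022-05-31 23:59:59'
AND a.status NOT IN ('6666', '7777', '8888')
AND a.id LIKE '%%'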

UNION

SELECT
 z.id,
 z.name AS name,
 CASE
   WHEN z.status IS NULL THEN '1111'
   WHEN z.status = '2222' THEN '3333'
   WHEN z.status = '4444' THEN '5555'
   ELSE z.status
 END AS status,
 z.updated_at
FROM z
LEFT JOIN zb ON z.request_id = zb.request_id
LEFT JOIN (
   SELECT 
    DISTINCT ON (id) id,
    name
   FROM
    zz
   WHERE
    updated_at BETWEEN '2022-05-01 00:00:00' AND '2022-05-31 23:59:59'
    AND id IN (
      SELECT id 
      FROM z
      WHERE 
      updated_at BETWEEN '2022-05-01 00:00:00' AND '2022-05-31 23:59:59'
      AND status NOT IN ('6666', '7777', '8888')
      AND id LIKE '%%'
                 )
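) AS zz_latest ON zz_latest.id = z.id  -- alias and join condition assumed; the original omits both
WHERE 
z.updated_at BETWEEN '2022-05-01 00:00:00' AND '2022-05-31 23:59:59'
AND z.status NOT IN ('6666', '7777', '8888')
AND z.id LIKE '%%'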

How to solve:


Method 1

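One thing to note: in engines that maintain clustered indexes (SQL Server, for example), the clustered index should have a unique key (I would recommend an identity column) as its first column. That way new rows are inserted at the end of the index instead of causing a lot of disk I/O and page splits. PostgreSQL does not maintain clustered indexes; its tables are heaps, and the CLUSTER command only reorders a table once.

Secondly, if you construct your other indexes carefully, a single index can be reused by several different queries.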

For example, imagine you search a table on three columns:

state, county, zip.

You sometimes search by state only.
You sometimes search by state and county.
You frequently search by state, county, and zip.

Then a single index on (state, county, zip) can be used by all three of these searches.
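As a minimal sketch of this example, assuming a hypothetical table named addresses (the table, its columns' types, and the index name are illustrative and do not come from the question):

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS addresses (
    id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    state  text NOT NULL,
    county text NOT NULL,
    zip    text NOT NULL
);

-- One composite index; the column order matches how the searches narrow down.
CREATE INDEX addresses_state_county_zip_idx
    ON addresses (state, county, zip);

-- Each of these searches constrains a leading prefix of the index columns.
SELECT * FROM addresses WHERE state = 'TX';
SELECT * FROM addresses WHERE state = 'TX' AND county = 'Travis';
SELECT * FROM addresses WHERE state = 'TX' AND county = 'Travis' AND zip = '78701';
```

Whether the planner actually chooses the index depends on table size and statistics, so it is worth checking each query with EXPLAIN.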

If you search by zip alone quite a lot, the above index is unlikely to help, because zip is the third column of that index. SQL Server will not see the index as helpful for that search; PostgreSQL can still scan it, but generally far less efficiently than an index that leads with zip.

You could then create an index on zip alone, which would be used in this case.
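A sketch of that extra index, again assuming a hypothetical addresses table with a zip column (the names are illustrative only):

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS addresses (
    id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    state  text NOT NULL,
    county text NOT NULL,
    zip    text NOT NULL
);

-- Single-column index for searches that filter on zip alone.
CREATE INDEX addresses_zip_idx ON addresses (zip);

-- EXPLAIN shows which plan and index the planner picks for this search.
EXPLAIN SELECT * FROM addresses WHERE zip = '78701';
```

Every additional index adds write overhead, so this is only worth it if zip-only searches are common.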

By the way, with a multi-column index the first index column is always usable for searching. Searching only by ‘state’ through the (state, county, zip) index is efficient, though not quite as efficient as a single-column index on ‘state’, because the multi-column index is larger.

I guess the answer you are looking for is that it depends on the WHERE clauses of your most frequently used queries, and also on their GROUP BY clauses.
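Applied to the query in the question, that reasoning suggests starting points like the ones below. This is only a sketch: the index names are made up, the column choices are inferred solely from the WHERE and JOIN clauses shown, and the result depends on data volume and selectivity. The second half of the UNION would need matching indexes on z, zz, and zb.

```sql
-- The outer filter and the IN subquery both restrict a by an updated_at range.
CREATE INDEX a_updated_at_idx ON a (updated_at);

-- The derived table filters aa by the same updated_at range.
CREATE INDEX aa_updated_at_idx ON aa (updated_at);

-- The join from a to b matches on request_id.
CREATE INDEX b_request_id_idx ON b (request_id);

-- Compare the plan and timings before and after creating the indexes.
-- Note: EXPLAIN ANALYZE really executes the statement.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM a
WHERE updated_at BETWEEN '2022-05-01 00:00:00' AND '2022-05-31 23:59:59'
  AND status NOT IN ('6666', '7777', '8888');
```

If the plan still shows sequential scans on large tables, the date range or the NOT IN filter may simply not be selective enough for an index to pay off, in which case the query's structure is the more likely cause of the slowness.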

The article will help a lot. 🙂


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
