How to optimize a query with timestamp that using OR clause?

All we need is an easy explanation of the problem, so here it is.

I have the follow table:

create table myTable
(
    id        serial                                                               not null
    rowtype   integer                                                              not null
    starttime timestamp                                                            not null,
    endtime   timestamp,
    clientid  integer                                                              not null
    deletedat timestamp default '1980-01-01 00:00:00'::timestamp without time zone not null,

    constraint myTable_pkey primary key,
    constraint myTable_rowtype_fkey references rowtype (number),
    constraint myTable_clientid_fkey references client,
    constraint myTable_rowtype_starttime_endtime_clientid_deletedat_key unique (rowtype, starttime, endtime, clientid, deletedat)
);

And I need to run the following query:

select myTable.id as id, client.id as clientid, myTable.rowtype as rowtype, myTable.starttime as starttime, myTable.endtime as endtime
from myTable
    inner join client on myTable.clientid = client.id 
where myTable.rowtype in (1, 2, 3, 4)
  and ((myTable.starttime>'2021-03-04 19:33:26+00' and myTable.starttime<'2021-03-05 00:40:28+00')
        or (myTable.endtime>'2021-03-04 19:33:26+00' and myTable.endtime<'2021-03-05 00:40:28+00'))
  and myTable.deletedat<'2000-01-01 00:00:00'
  and client.deletedat<'2000-01-01 00:00:00';

Explain Analyse:

Nested Loop  (cost=0.84..15782.99 rows=1 width=28) (actual time=246.485..557.682 rows=2 loops=1)
    ->  Index Scan using client_unique_otherFk on client  (cost=0.28..8.30 rows=1 width=4) (actual time=0.047..0.052 rows=1 loops=1)
        Index Cond: ((otherFk = 10) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
    ->  Index Scan using myTable_rowtype_starttime_endtime_clientid_deletedat_key on myTable  (cost=0.56..15774.63 rows=6 width=28) (actual time=246.460..557.643 rows=2 loops=1)
        Index Cond: ((rowtype = ANY ('{1,2,3,4}'::integer[])) AND (clientid = client.id) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
        Filter: (((starttime > '2021-03-04 19:33:26'::timestamp without time zone) AND (starttime < '2021-03-05 00:40:28'::timestamp without time zone)) OR ((endtime > '2021-03-04 19:33:26'::timestamp without time zone) AND (endtime < '2021-03-05 00:40:28'::timestamp without time zone)))
        Rows Removed by Filter: 9423

Through explain analyze I saw that the clause for myTable.starttime and myTable.endtime do not use myTable_rowtype_starttime_endtime_clientid_deletedat_key index. I think it is OR clause, right?

  1. My first question is, have one method that I can query use index using OR clause in this case?

However, I tested query using only starttime or only endtime in Where clause and remove OR. In two cases both used index myTable_rowtype_starttime_endtime_clientid_deletedat_key, but when I execute only with endtime the execution time of query it’s much slower than I use only starttime, even the two using the same index.

Query only use endtime example:

select myTable.id as id, client.id as clientid, myTable.rowtype as rowtype, myTable.starttime as starttime, myTable.endtime as endtime
from myTable
    inner join client on myTable.clientid = client.id 
where myTable.rowtype in (1, 2, 3, 4)
  and (myTable.endtime>'2021-03-04 19:33:26+00' and myTable.endtime<'2021-03-05 00:40:28+00')
  and myTable.deletedat<'2000-01-01 00:00:00'
  and client.deletedat<'2000-01-01 00:00:00';

Explain Analyse:

Nested Loop  (cost=0.84..15232.04 rows=1 width=28) (actual time=276.628..568.299 rows=1 loops=1)
    ->  Index Scan using client_unique_otherFk on client  (cost=0.28..8.30 rows=1 width=4) (actual time=0.047..0.052 rows=1 loops=1)
        Index Cond: ((otherFk = 10) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
    ->  Index Scan using myTable_rowtype_starttime_endtime_clientid_deletedat_key on myTable  (cost=0.56..15223.71 rows=3 width=28) (actual time=276.573..568.233 rows=1 loops=1)
        Index Cond: ((rowtype = ANY ('{1,2,3,4}'::integer[])) AND (endtime > '2021-03-04 19:33:26'::timestamp without time zone) AND (endtime < '2021-03-05 00:40:28'::timestamp without time zone) AND (clientid = client.id) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))
Planning time: 0.453 ms
Execution time: 568.409 ms

Using only starttime instead endtime, the execution time is on average 0.449 ms.

  1. My second question is about this time difference for starttime and endtime fields, both using the same index. Does this time difference have any relation with the endtime isn’t not null but starttime is, or not? And if not, what could it be?

  2. What could I do, to optimization this query? (using starttime and endtime with OR like the first example query).

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

Usually it is most efficient to avoid the dreaded OR and replace it with a UNION:

   SELECT myTable.id, client.id, ...
   FROM myTable
       JOIN client ON myTable.clientid = client.id 
   WHERE myTable.rowtype in (1, 2, 3, 4)
     AND myTable.starttime > '2021-03-04 19:33:26+00'
     AND myTable.starttime < '2021-03-05 00:40:28+00'
     AND myTable.deletedat < '2000-01-01 00:00:00'
     AND client.deletedat < '2000-01-01 00:00:00';
UNION
   SELECT myTable.id, client.id, ...
   FROM myTable
       JOIN client ON myTable.clientid = client.id 
   WHERE myTable.rowtype in (1, 2, 3, 4)
     AND myTable.endtime > '2021-03-04 19:33:26+00'
     AND myTable.endtime < '2021-03-05 00:40:28+00'
     AND myTable.deletedat < '2000-01-01 00:00:00'
     AND client.deletedat < '2000-01-01 00:00:00';

This will produce the same result if you have both primary keys in the SELECT list. If you can live with duplicates, you could use the more efficient UNION ALL.

I am not sure which conditions are the selective ones, but you could start with two indexes ON mytable (starttime) and ON mytable (endtime).

Method 2

Index:

(rowtype, starttime, endtime, clientid, deletedat)

Index usage:

Index Cond: ((rowtype = ANY ('{1,2,3,4}'::integer[])) AND (clientid = client.id) AND (deletedat < '2000-01-01 00:00:00'::timestamp without time zone))

Notice how the index condition skips over starttime (as it not usable, being specified in the OR condition). So while it is comparing clientid and deletedat from the index against specified values, it just using them as an "in-index filter", it can’t jump to a specific part of the index where it knows the qualifying values will be. This is quite inefficient, at compared to being to jump to a specific part of the index.

So one thing you could try is simply re-writing the column order in the index.

(rowtype, clientid, deletedat, starttime, endtime)

Using starttime in the OR makes it not useful for use in the index. And having it early in the index makes the rest of the columns, occurring after it in the index, also not useful (well, not nearly as useful as they could be). So that is a double whammy.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply