The best approach for optimizing a query on a large database table


I have a PostgreSQL database table named worker_activity with the following structure:

   Column      |              Type              | Collation | Nullable | Default
---------------+--------------------------------+-----------+----------+---------
 id            | bigint                         |           | not null | 
 workerName    | character varying(255)         |           | not null |
 workerId      | integer                        |           | not null |
 action        | text                           |           | not null |
 created_at    | timestamp(0) without time zone |           |          |
 updated_at    | timestamp(0) without time zone |           |          |

This table has almost 30 million rows, and I have the following query that creates weekly reports:

select * from "worker_activity" 
  where "created_at" between $1 and $2 
  and ("action" = $3 or "action" = $4) order by "id" asc

This "plain" query takes nearly 3 minutes to execute, which is rather slow.
How can I speed it up using the features PostgreSQL provides?
Are window functions or some other aggregation method applicable in my case?

Additional information, as requested by other members:

  • A real row from the table:
id | algorithmName  | algorithmId | action | created_at          | updated_at
3  | LiquidityMaker | 1           | {"step_1":{"action":"openOrderSell","orderSum":0.032,"orderPrice":2.3049,"counter":1},"step_2":{"action":"workTime","time":0.03714489936828613}} | 2021-07-06 06:49:26 | 2021-07-06 06:49:26
  • PostgreSQL version: psql (PostgreSQL) 10.19 (Ubuntu 10.19-0ubuntu0.18.04.1)
  • Hardware: KVM VPS, 32 GB of RAM, 8-core CPU, SSD drive.

Here is the execution plan:

explain (analyze, buffers)
select * from "worker_activity"
where "created_at" between '2021-07-06 06:49:25' and '2021-07-07 00:00:00'
  and ("action" = 'Start Work' or "action" = 'End Work')
order by "id" asc;
 Sort  (cost=4138424.27..4138424.29 rows=9 width=903) (actual time=235525.506..235571.749 rows=0 loops=1)
   Sort Key: worker_activity_liquidity_maker_1.id
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=120287 read=3750619
   ->  Gather  (cost=1000.00..4138424.12 rows=9 width=903) (actual time=235525.494..235571.733 rows=0 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=120287 read=3750619
         ->  Append  (cost=0.00..4137423.22 rows=9 width=903) (actual time=235521.443..235521.462 rows=0 loops=3)
               Buffers: shared hit=120287 read=3750619
               ->  Parallel Seq Scan on worker_activity_liquidity_maker_1  (cost=0.00..577947.97 rows=1 width=883) (actual time=30871.374..30871.375 rows=0 loops=3)
                     Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at <= '2021-07-07 00:00:00'::timestamp without time zone) AND ((action = 'Start Work'::text) OR (action = 'End Work'::text)))
                     Rows Removed by Filter: 1532803
                     Buffers: shared hit=493 read=539153
               ->  Parallel Seq Scan on worker_activity_liquidity_maker_2  (cost=0.00..505967.24 rows=1 width=935) (actual time=29927.739..29927.740 rows=0 loops=3)
                     Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at <= '2021-07-07 00:00:00'::timestamp without time zone) AND ((action = 'Start Work'::text) OR (action = 'End Work'::text)))
                     Rows Removed by Filter: 1273284
                     Buffers: shared hit=290 read=473816
               ->  Parallel Seq Scan on worker_activity_liquidity_maker_3  (cost=0.00..409953.82 rows=1 width=978) (actual time=24560.669..24560.670 rows=0 loops=3)
                     Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at <= '2021-07-07 00:00:00'::timestamp without time zone) AND ((action = 'Start Work'::text) OR (action = 'End Work'::text)))
                     Rows Removed by Filter: 986666
                     Buffers: shared hit=464 read=384922
               ->  Parallel Seq Scan on worker_activity_liquidity_maker_4  (cost=0.00..517571.53 rows=1 width=912) (actual time=30845.683..30845.684 rows=0 loops=3)
                     Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at <= '2021-07-07 00:00:00'::timestamp without time zone) AND ((action = 'Start Work'::text) OR (action = 'End Work'::text)))
                     Rows Removed by Filter: 1328712
                     Buffers: shared hit=602 read=483822
               ->  Parallel Seq Scan on worker_activity_liquidity_maker_5  (cost=0.00..412771.36 rows=1 width=969) (actual time=24726.327..24726.328 rows=0 loops=3)
                     Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at <= '2021-07-07 00:00:00'::timestamp without time zone) AND ((action = 'Start Work'::text) OR (action = 'End Work'::text)))
                     Rows Removed by Filter: 1003528
                     Buffers: shared hit=397 read=387242
               ->  Parallel Seq Scan on worker_activity_liquidity_maker_6  (cost=0.00..505008.02 rows=1 width=950) (actual time=30233.641..30233.642 rows=0 loops=3)
                     Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at <= '2021-07-07 00:00:00'::timestamp without time zone) AND ((action = 'Start Work'::text) OR (action = 'End Work'::text)))
                     Rows Removed by Filter: 1236177
                     Buffers: shared hit=444 read=473533
               ->  Parallel Seq Scan on worker_activity_liquidity_maker_7  (cost=0.00..452042.03 rows=1 width=871) (actual time=26881.948..26881.949 rows=0 loops=3)
                     Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at <= '2021-07-07 00:00:00'::timestamp without time zone) AND ((action = 'Start Work'::text) OR (action = 'End Work'::text)))
                     Rows Removed by Filter: 1215541
                     Buffers: shared hit=516 read=421139
               ->  Parallel Seq Scan on worker_activity_liquidity_maker_8  (cost=0.00..126317.74 rows=1 width=757) (actual time=199.055..199.056 rows=0 loops=3)
                     Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at <= '2021-07-07 00:00:00'::timestamp without time zone) AND ((action = 'Start Work'::text) OR (action = 'End Work'::text)))
                     Rows Removed by Filter: 389744
                     Buffers: shared hit=116649
               ->  Parallel Seq Scan on worker_activity_liquidity_maker_9  (cost=0.00..629843.52 rows=1 width=873) (actual time=37274.984..37274.985 rows=0 loops=3)
                     Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at <= '2021-07-07 00:00:00'::timestamp without time zone) AND ((action = 'Start Work'::text) OR (action = 'End Work'::text)))
                     Rows Removed by Filter: 1696199
                     Buffers: shared hit=432 read=586992
 Planning time: 0.919 ms
 Execution time: 235571.830 ms

How to solve:


Method 1

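That query could be made faster with an index. The execution plan shows that every partition is read with a Parallel Seq Scan (about 3.75 million blocks read from disk, roughly 29 GiB at the default 8 kB block size), only for every row to be removed by the filter. An index on the filtered columns lets PostgreSQL read just the matching rows instead: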

CREATE INDEX ON worker_activity (created_at, action);
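One caveat for the setup in the question: the plan's Append node lists only the partitions worker_activity_liquidity_maker_1 through _9 and no scan of the parent, which suggests declarative partitioning. PostgreSQL 10 (the reported version is 10.19) cannot create an index on a partitioned parent table; that became possible in PostgreSQL 11. A minimal sketch, assuming a single level of partitions that are all direct children of worker_activity (pg_inherits records both declarative partitions and inheritance children), creates the same index on each partition and then re-runs the plan from the question to check that the Parallel Seq Scans are gone:

```sql
-- Sketch for PostgreSQL 10: build the suggested (created_at, action) index
-- on every direct child of worker_activity, because PostgreSQL 10 cannot
-- index a partitioned parent table. Assumes one level of partitioning.
DO $$
DECLARE
    part regclass;
BEGIN
    FOR part IN
        SELECT inhrelid::regclass
        FROM pg_inherits
        WHERE inhparent = 'worker_activity'::regclass
    LOOP
        -- format('%s', regclass) prints a correctly quoted table name
        EXECUTE format('CREATE INDEX ON %s (created_at, action)', part);
    END LOOP;
END
$$;

-- Re-run the plan from the question; the partitions should now be read
-- with index or bitmap index scans instead of Parallel Seq Scans.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM worker_activity
WHERE created_at BETWEEN '2021-07-06 06:49:25' AND '2021-07-07 00:00:00'
  AND (action = 'Start Work' OR action = 'End Work')
ORDER BY id;
```

Because the DO block runs in a single transaction, every partition stays locked against writes until all indexes have been built. Issuing a separate CREATE INDEX CONCURRENTLY per partition, outside a transaction block, avoids that at the cost of slower builds.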


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
