All we need is an easy explanation of the problem, so here it is.
I have a simple query that should return data about booking, but it misbehaves and sometimes does not return data within 1 minute.
So I have main booking table:
table booking_data - bid bigint primary key - user_id varchar(32) (other columns not relevant here and there is index on user_id)
and table with helper data, like if there are multiple flights on a booking, I wanna store the last flight datetime to helper table:
table itinerary_timestamps - bid references booking_data.bid - last_segment_arrival_at timestamp with time zone not null (some other columns and index on last_segment_arrival_at)
This is simplified query:
explain analyze select * from booking_data join itinerary_timestamps using (bid) where booking_data.user_id = 'dUZYLebTiZOBG1R3crKLhh' and itinerary_timestamps.last_segment_arrival_at < now() order by itinerary_timestamps.last_segment_arrival_at desc limit 11;
Limit (cost=1.00..1253.62 rows=11 width=515) (actual time=7171.379..10008.565 rows=11 loops=1) -> Nested Loop (cost=1.00..17263562.69 rows=151602 width=515) (actual time=7171.378..10008.551 rows=11 loops=1) -> Index Scan Backward using idx_itinerary_timestamps_last_segment_arrival_at on itinerary_timestamps (cost=0.44..781981.48 rows=25695264 width=60) (actual time=0.014..1918.143 rows=1760471 loops=1) Index Cond: (last_segment_arrival_at < now()) -> Index Scan using booking_data_pkey on booking_data (cost=0.56..0.64 rows=1 width=463) (actual time=0.004..0.004 rows=0 loops=1760471) Index Cond: (bid = itinerary_timestamps.bid) Filter: ((user_id)::text = 'dUZYLebTiZOBG1R3crKLhh'::text) Rows Removed by Filter: 1 Planning Time: 0.423 ms Execution Time: 10008.630 ms
There are 165867 bookings in booking_data for that user. There are 165862 records in itinerary_timestamps table with bid belonging to user_id and 26424052 total records.
Do you have any ideas how to optimize this simple query?
I was thinking of using bid for filtering that table, but it had similar results (sometimes better sometimes same):
explain analyze select * from booking_data join itinerary_timestamps using (bid) where booking_data.user_id = 'aQIDkXXEx3nWY5KvLSuEiK' and itinerary_timestamps.last_segment_arrival_at < now() and booking_data.bid in (select bid from booking_data where user_id='aQIDkXXEx3nWY5KvLSuEiK') order by itinerary_timestamps.last_segment_arrival_at desc limit 11;
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
This is maybe the most common planning issue we see here. The planner assumes the 165862 qualifying records are spread randomly throughout the 26424052 rows of the index (well, it thinks the numbers are 151602 and 25695264, as it uses estimates not the actual counts–that doesn’t change anything though.) Therefore it thinks it can stop early after scanning 11/165862 of the index. But in reality this customers records are deficient in later timestamps, so it ends up needing to walk 1760471/25695264 of the index before it finds 11 of them, which is over 1000 times more than it thought.
There is nothing you can do to get the planner to stop assuming the records are randomly scattered. What you could do is force it not to use the idx_itinerary_timestamps_last_segment_arrival_at index for ordering, by changing the ORDER BY to
order by itinerary_timestamps.last_segment_arrival_at+interval '0 seconds' desc. This means it will have to read all 165862 records and sort them. Which is not ideal, but probably better than the current plan.
If the performance of this were important enough, it might be worth it to denormalize by copying user_id into itinerary_timestamps, then you would just need an index on (user_id, last_segment_arrival_at).
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂