I was reviewing EXPLAIN ANALYZE output for a query and found that a couple of subqueries were scanning entire tables. Those subqueries exist to fetch only the most recent record in each of those tables related to an appointment.
After some research, I decided that it would be reasonable to use a lateral join on those subqueries to dramatically reduce the amount of data scanned.
EXPLAIN ANALYZE suggested the cost of the whole query with lateral joins would be about a quarter of the original. So we proceeded.
Within two hours of deploying the query change, our DB server was maxed out at 100% CPU and basically unresponsive. Reverting the query to the version with subqueries scanning the tables restored CPU usage to something sane. Our DB runs on AWS RDS for PostgreSQL on a t2.xlarge instance. Performance Insights showed a substantial increase in ClientWrite wait events. (A load chart screenshot accompanied the original question.)
The query using subqueries, along with the EXPLAIN output: https://explain.depesz.com/s/wES6
select appointments.*,
       reportSnapshots.created_at as latestSnapshotTime,
       responses.created_at as latestResponseTime
from appointments
left join (
    SELECT DISTINCT ON (appointmentId) created_at, appointmentId
    FROM reportSnapshots
    ORDER BY appointmentId, created_at DESC
) reportSnapshots on appointments.id = reportSnapshots.appointmentId
left join (
    SELECT DISTINCT ON (appointmentId) created_at, appointmentId
    FROM responses
    ORDER BY appointmentId, created_at DESC
) responses on appointments.id = responses.appointmentId
where appointments.organizationId = 16
  and appointments.locationId = '51'
  and appointments.cancelled = false
  and appointments.filteredIn = true
  and start between '2021-05-04T00:00:00-06:00' and '2021-05-04T23:59:59-06:00'
  and appointments.locationId in (61,60,140,53,138,130,133,131,55,51,100)
group by appointments.id, reportSnapshots.created_at, responses.created_at
order by start ASC, start ASC, id ASC
limit 100
The query using lateral joins, along with the EXPLAIN output: https://explain.depesz.com/s/B2vp
select appointments.*,
       reportSnapshots.created_at as latestSnapshotTime,
       responses.created_at as latestResponseTime
from appointments
left join lateral (
    SELECT DISTINCT ON (appointmentId) created_at, appointmentId
    FROM reportSnapshots
    WHERE reportSnapshots.appointmentId = appointments.id
    ORDER BY appointmentId, created_at DESC
) reportSnapshots on appointments.id = reportSnapshots.appointmentId
left join lateral (
    SELECT DISTINCT ON (appointmentId) created_at, appointmentId
    FROM responses
    WHERE responses.appointmentId = appointments.id
    ORDER BY appointmentId, created_at DESC
) responses on appointments.id = responses.appointmentId
where appointments.organizationId = 16
  and appointments.locationId = '51'
  and appointments.cancelled = false
  and appointments.filteredIn = true
  and start between '2021-05-04T00:00:00-06:00' and '2021-05-04T23:59:59-06:00'
  and appointments.locationId in (61,60,140,53,138,130,133,131,55,51,100)
group by appointments.id, reportSnapshots.created_at, responses.created_at
order by start ASC, start ASC, id ASC
limit 100
Obviously, I did not understand what the EXPLAIN output was telling me about the queries. What did I miss that could have told me the DB load would be higher with the lateral join query despite a lower cost?
How to solve:
"EXPLAIN ANALYZE suggested the cost of the whole query with lateral joins would be about a quarter of the original."
EXPLAIN (estimating the cost) suggests 189,883.92 unicorn points. EXPLAIN ANALYZE (measuring actual execution times) disagrees and shows 2,502.031 ms vs. 1,835.193 ms, so around 1/3 slower. There can be many reasons why the estimation is off target, most prominently cost settings and statistics.
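If stale statistics are part of the problem, refreshing them is cheap and sometimes brings the planner's row estimates back in line with reality. A minimal sketch, assuming the table and column names from the question (the statistics target of 1000 is an arbitrary illustrative value):

-- Refresh planner statistics for the tables involved in the query.
ANALYZE appointments;
ANALYZE reportSnapshots;
ANALYZE responses;

-- Optionally gather finer-grained statistics on the join column first
-- (1000 is an illustrative value, not a recommendation):
ALTER TABLE responses ALTER COLUMN appointmentId SET STATISTICS 1000;
ANALYZE responses;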
That said, the query can probably be much faster (by orders of magnitude). We only got limited information, but I suppose you want something like this:
SELECT a.*
     , rs.created_at AS latestsnapshottime  -- maybe use COALESCE?
     , rp.created_at AS latestresponsetime
FROM   appointments a
LEFT   JOIN LATERAL (
   SELECT rs.created_at
   FROM   reportsnapshots rs
   WHERE  rs.appointmentid = a.id
   ORDER  BY rs.created_at DESC  -- NULLS LAST ?
   LIMIT  1
   ) rs ON true
LEFT   JOIN LATERAL (
   SELECT rp.created_at
   FROM   responses rp
   WHERE  rp.appointmentid = a.id
   ORDER  BY rp.created_at DESC  -- NULLS LAST ?
   LIMIT  1
   ) rp ON true
WHERE  a.organizationid = 16
AND    a.locationid = '51'
AND    a.cancelled = FALSE
AND    a.filteredin = TRUE
AND    a.start BETWEEN '2021-05-04T00:00:00-06:00' AND '2021-05-04T23:59:59-06:00'
AND    a.locationid IN (61,60,140,53,138,130,133,131,55,51,100)
-- GROUP BY a.id, rs.created_at, rp.created_at  -- not needed, I guess
ORDER  BY a.start, a.id
LIMIT  100;
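Note the ON true: each LATERAL subquery already restricts itself to the current appointment in its WHERE clause, so there is nothing left for the join condition to check, and the LEFT JOIN still preserves appointments that have no snapshot or response.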
DISTINCT ON is a great tool, but for different situations. Retrieving a single row from each LATERAL subquery like I suggest can use an index and is super fast. Ideally, you have these indexes:
reportsnapshots (appointmentid, created_at DESC NULLS LAST)
responses (appointmentid, created_at DESC NULLS LAST)
If created_at is defined NOT NULL, a simpler index on (appointmentid, created_at) is just as good (and preferable). See:
- Sort by column ASC, but NULL values first?
- Why does ORDER BY NULLS LAST affect the query plan on a primary key?
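For concreteness, a minimal sketch of the DDL for the indexes described above, assuming the table and column names from the question (the index names are made up for illustration):

-- Variant with explicit sort order, relevant if created_at can be NULL:
CREATE INDEX reportsnapshots_appointmentid_created_at_idx
    ON reportsnapshots (appointmentid, created_at DESC NULLS LAST);
CREATE INDEX responses_appointmentid_created_at_idx
    ON responses (appointmentid, created_at DESC NULLS LAST);

-- Simpler variant, just as good (and preferable) if created_at is NOT NULL:
-- CREATE INDEX responses_appointmentid_created_at_idx
--     ON responses (appointmentid, created_at);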
Plus an index on the outer table. The one in use now (appointments_organizationId_status_start_idx) doesn't seem too bad. But I suspect more potential, depending on undisclosed information.
You probably don’t have to aggregate in the outer query, since both subqueries return a single row each (even your original).
Alternatively, use plain correlated subqueries with max() for your simple case. Probably even faster still:
SELECT a.*
     , (SELECT max(rs.created_at)
        FROM   reportsnapshots rs
        WHERE  rs.appointmentid = a.id) AS latestsnapshottime
     , (SELECT max(rp.created_at)
        FROM   responses rp
        WHERE  rp.appointmentid = a.id) AS latestresponsetime
FROM   appointments a
WHERE  ...
LIMIT  100;
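One reason max() is safe here: a scalar subquery that finds no rows yields NULL, just like max() over an empty set, so each subquery still returns exactly one value per appointment and mirrors the LEFT JOIN behavior of the lateral version.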
Aside 1: BETWEEN is typically no good for timestamps: the inclusive upper bound '23:59:59' silently misses any timestamps in the final second of the day. A half-open range (>= lower bound AND < upper bound) is the safe pattern. See:
- How to add a day/night indicator to a timestamp column?
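A minimal sketch of the half-open-range rewrite, reusing the bounds from the question:

SELECT a.*
FROM   appointments a
WHERE  a.start >= '2021-05-04T00:00:00-06:00'  -- inclusive lower bound
AND    a.start <  '2021-05-05T00:00:00-06:00'; -- exclusive upper bound catches 23:59:59.5 etc.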
Aside 2: In your first query plan I see Sort Method: external merge Disk: 3,856kB and Sort Method: external merge Disk: 17,752kB, which indicates a lack of work_mem. The same problem does not surface in the 2nd plan, nor will it for my queries. But look into your server configuration.
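As a hedged sketch of what checking and adjusting that setting can look like (64MB is an arbitrary illustrative value; size it to your workload and available RAM):

-- Check the current setting:
SHOW work_mem;

-- Raise it for the current session only, then re-test the query:
SET work_mem = '64MB';

-- Or persist it per role rather than server-wide
-- (reporting_user is a hypothetical role name):
-- ALTER ROLE reporting_user SET work_mem = '64MB';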
A second answer:

You looked at the estimated costs of the query and saw that they were lower, but you overlooked two important things. The actual measured time was longer (although not by much), not shorter. And the actual row count was off from the estimate by a factor of 32. Both of these are pretty important flags.
So while you apparently did the EXPLAIN with ANALYZE, you apparently ignored all the important data ANALYZE provided. (Or maybe you originally just did EXPLAIN, and only repeated it with ANALYZE as part of your post-mortem write-up. In that case, well, that was your mistake. About the only reasons to do EXPLAIN without ANALYZE are that the query would never finish, or that actually running it would modify data you don't want to modify.)
If it was faster, you should have seen it being faster, not slower. And even if you had somehow not noticed it being slower, the way-off row estimates should have made you suspicious that the estimates were not reliable, especially when you see that a seq scan of a large table was expected to run just once but actually ran 32 times. (The reason for that, in turn, seems to be that you are missing an index on responses (appointmentid), or better yet on responses (appointmentid, created_at). With one of those indexes, your proposed query would have actually been much faster than the old one, but making it faster doesn't seem to be what your actual question is about.)
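A sketch of the index this answer points to (the name is made up; it matches the composite index suggested in the first answer):

-- Lets each lateral subquery jump straight to the newest row per appointment
-- instead of seq-scanning responses once per appointment.
CREATE INDEX responses_appointmentid_created_at_idx
    ON responses (appointmentid, created_at);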
So there was ample evidence in the ANALYZE part of the EXPLAIN (ANALYZE) that it would not actually be an improvement. But as for crippling the server, I don’t see evidence in the EXPLAIN (ANALYZE) that it would do that. But I also don’t see evidence that it actually did do that. I don’t see indications of a crippled server anywhere in your load chart. The load chart seems to show one and a half cycles of, well, something. But what you describe is only one cycle (apply the new query, see problem, revert), so how does that line up with the load chart?
In any event, high ClientWrite wait events indicate a problem with the clients (or the network, or maybe indicate no problem at all), not a problem with the server. The server is operating so well that it is generating requested data faster than the clients (or network) can process it.
I suspect the main problem you ran into is that you exhausted a credit balance on your t-class instance, either the CPU credit balance or the IO credit balance. That would not show up in an EXPLAIN (ANALYZE) taken at a time when the credit balance was not exhausted. (And I don't know how it would show up on Amazon RDS's flavor of load chart, but I would not be surprised if it doesn't show up in an obvious manner.)