We are using Azure SQL Database and recently experienced severe performance degradation on our production database. While looking through Regressed Queries in Query Store for the likely culprit, I found a regressed query with a few hundred failed executions of its usual query plan.

I was not able to find any information on how to figure out what exactly went wrong, just that an exception aborted the execution. The problem went away on its own after a couple of minutes.

I queried the sys.query_store_runtime_stats catalog view, but saw nothing useful there.
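For context, a query along these lines is roughly how the failed executions can be broken out by plan. It is only a sketch: it reports the execution type (aborted or exception) and the counts, not the underlying error, which is why it does not answer the question on its own.

```sql
-- Sketch: list plans that had non-regular executions recorded in Query Store.
-- execution_type: 0 = Regular, 3 = Aborted, 4 = Exception.
SELECT q.query_id,
       p.plan_id,
       rs.execution_type_desc,
       SUM(rs.count_executions)     AS executions,
       MIN(rs.first_execution_time) AS first_seen,
       MAX(rs.last_execution_time)  AS last_seen
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan  AS p ON p.plan_id  = rs.plan_id
JOIN sys.query_store_query AS q ON q.query_id = p.query_id
WHERE rs.execution_type <> 0
GROUP BY q.query_id, p.plan_id, rs.execution_type_desc
ORDER BY last_seen DESC;
```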

Is there a way to get to the actual failures? Could I set up these errors to be sent out to e.g. Log Analytics automatically (as we’re in Azure)?

The best way I know to solve this is to set up Extended Events. Specifically, you can capture the error_reported event. Turn on Causality Tracking and add the batch and RPC starting and completed events (sql_batch_starting, rpc_starting, sql_batch_completed, rpc_completed) so that each error can be tied back to the batch or call that raised it. Be sure to filter well, because this can generate quite a lot of data.
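A minimal sketch of such a session for Azure SQL Database might look like the following. The session name `capture_query_errors`, the `MyApp` application-name filter, the severity threshold, and the ring buffer target are all assumptions chosen for illustration; adjust them to your workload, and consider an event_file target if you need the data to outlive the session.

```sql
-- Sketch only: database-scoped session (Azure SQL Database uses ON DATABASE).
-- Session name and the 'MyApp' filter value are hypothetical placeholders.
CREATE EVENT SESSION [capture_query_errors] ON DATABASE
ADD EVENT sqlserver.error_reported (
    ACTION (sqlserver.sql_text, sqlserver.session_id, sqlserver.client_app_name)
    WHERE ([severity] >= 11)   -- skip informational messages
),
ADD EVENT sqlserver.sql_batch_starting (
    WHERE ([sqlserver].[client_app_name] = N'MyApp')
),
ADD EVENT sqlserver.sql_batch_completed (
    WHERE ([sqlserver].[client_app_name] = N'MyApp')
),
ADD EVENT sqlserver.rpc_starting (
    WHERE ([sqlserver].[client_app_name] = N'MyApp')
),
ADD EVENT sqlserver.rpc_completed (
    WHERE ([sqlserver].[client_app_name] = N'MyApp')
)
ADD TARGET package0.ring_buffer
WITH (TRACK_CAUSALITY = ON);   -- adds an activity ID that links related events

-- Start collecting.
ALTER EVENT SESSION [capture_query_errors] ON DATABASE STATE = START;

-- Read back what the ring buffer currently holds, as XML.
SELECT CAST(t.target_data AS xml) AS session_data
FROM sys.dm_xe_database_sessions AS s
JOIN sys.dm_xe_database_session_targets AS t
  ON t.event_session_address = s.address
WHERE s.name = N'capture_query_errors'
  AND t.target_name = N'ring_buffer';
```

With causality tracking on, each event carries an activity ID, so an error_reported row can be matched to the batch or RPC events that share the same ID. The ring buffer is held in memory and is lost when the session stops, so treat it as a quick diagnostic target rather than long-term storage.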

