We are using Azure SQL Database, and we recently experienced severe performance degradation on our production database. While looking into Regressed Queries in Query Store to find the probable culprit, I found a regressed query with a few hundred failed executions of its usual query plan.
I was not able to find any information on how to figure out what exactly went wrong, just that an exception aborted the execution. The problem went away on its own after a couple of minutes.
I queried the sys.query_store_runtime_stats catalog view, but saw nothing useful there.
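For reference, a query along these lines will surface the failed-execution counts per plan (the execution_type column distinguishes regular, aborted, and exception executions), though it does not reveal the underlying error message:

```sql
-- execution_type: 0 = Regular, 3 = Aborted (e.g. client timeout), 4 = Exception
SELECT rs.plan_id,
       rs.execution_type_desc,
       SUM(rs.count_executions)     AS failed_executions,
       MAX(rs.last_execution_time)  AS last_execution_time
FROM sys.query_store_runtime_stats AS rs
WHERE rs.execution_type <> 0          -- failed executions only
GROUP BY rs.plan_id, rs.execution_type_desc;
```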
Is there a way to get to the actual failures? Could I set up these errors to be sent out to e.g. Log Analytics automatically (as we’re in Azure)?
How to solve:
The best way I know to solve this is to set up an Extended Events session. Specifically, capture the error_reported event. Enable causality tracking, along with the batch/RPC starting and completed events, so you can correlate each error with the statement that raised it. Be sure to apply good filtering, because this setup can generate quite a lot of data.
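A minimal sketch of such a session on Azure SQL Database (where sessions are database-scoped, hence ON DATABASE) might look like the following; the session name is made up, and you should tighten the filters to match your workload:

```sql
-- Hypothetical session name; ring_buffer is used for simplicity.
-- On Azure SQL Database, event_file targets require Azure Blob Storage.
CREATE EVENT SESSION [track_query_failures] ON DATABASE
ADD EVENT sqlserver.error_reported (
    ACTION (sqlserver.sql_text, sqlserver.client_app_name)
    WHERE severity > 10                 -- skip informational messages
),
ADD EVENT sqlserver.rpc_starting (ACTION (sqlserver.sql_text)),
ADD EVENT sqlserver.rpc_completed (ACTION (sqlserver.sql_text)),
ADD EVENT sqlserver.sql_batch_starting (ACTION (sqlserver.sql_text)),
ADD EVENT sqlserver.sql_batch_completed (ACTION (sqlserver.sql_text))
ADD TARGET package0.ring_buffer
WITH (TRACK_CAUSALITY = ON);            -- stamps events with an activity ID

ALTER EVENT SESSION [track_query_failures] ON DATABASE STATE = START;
```

With TRACK_CAUSALITY on, every event carries an activity GUID and sequence number, so an error_reported event can be matched to the rpc/batch events of the same request when you read the target data.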