How to estimate cpu based on peak rate of query requests?


By default, SQL Server allows a maximum of 32767 concurrent
connections, which is the maximum number of users that can
simultaneously log in to the SQL Server instance.

I am developing 10 web APIs that will query SQL Server using their own logins (so at most 10 logins, plus admins/devs). The expected number of concurrent logins will not exceed 20.

However, each login (API) can make several (maybe 500) concurrent query requests, so effectively 10 APIs * 500 requests = 5000 concurrent query requests. There will be times when there are fewer or no requests.

Assuming there is sufficient memory and disk io capability, I am planning the cpu requirements.

I understand that each SQL request is assigned to a worker thread, and that the default number of worker threads depends on the number of processors. Presently my dev machine has 24 processors, so the default max worker threads value is 832.

I am assuming the cost threshold for parallelism (40) may be exceeded by some queries, which means SQL Server may decide to run them in parallel (up to max DOP).

Assuming MAX DOP = 1 then 832 requests can be handled at a time.

Assuming MAX DOP = 4 then 208 requests can be handled at a time.

Any query request beyond this will have to wait until it is allocated a worker thread.

So to ensure a peak load of 5000 requests can be satisfied, is it correct to estimate that I will require roughly at least 145 cpus?

((145-4)*32)+512 = 5024
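For reference, the arithmetic in this question can be reproduced with a short Python sketch. It assumes the default max worker threads rules as I understand them from Microsoft's documentation for recent 64-bit SQL Server versions (512 up to 4 logical CPUs, 16 more per CPU up to 64, 32 more per CPU beyond that). The function names are made up for illustration, and dividing by MAXDOP is the simplification used in the question, not a description of how SQL Server really allocates workers to parallel queries.

```python
def default_max_worker_threads(logical_cpus: int) -> int:
    """Assumed default 'max worker threads' for recent 64-bit SQL Server."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    if logical_cpus <= 4:
        return 512
    # 16 extra workers per CPU up to 64 CPUs, 32 per CPU beyond that
    per_cpu = 16 if logical_cpus <= 64 else 32
    return 512 + (logical_cpus - 4) * per_cpu


def naive_request_ceiling(logical_cpus: int, maxdop: int) -> int:
    """The question's simplification: every request holds exactly `maxdop` workers."""
    if maxdop < 1:
        raise ValueError("maxdop must be at least 1")
    return default_max_worker_threads(logical_cpus) // maxdop


print(default_max_worker_threads(24))    # 832
print(naive_request_ceiling(24, 1))      # 832
print(naive_request_ceiling(24, 4))      # 208
print(default_max_worker_threads(145))   # 5024
```

This only counts worker threads. It says nothing about how much cpu time each request actually consumes.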


Method 1

There’s no way to estimate core count based on number of requests alone. 500 concurrent queries may need only a few milliseconds of CPU time if they are inexpensive, or several hours of CPU time if they are expensive.

Depending on your response time requirements, and the cost of those concurrent queries, it could require a 4-core machine or a massive machine.
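To illustrate how much the per-query cost matters, here is a rough back-of-the-envelope sketch in Python. It is not a sizing method from this answer: it assumes you have measured a request rate and an average CPU time per request, and it simply divides the resulting CPU demand by a target utilisation. The function name, the 70% target, and all of the numbers are hypothetical.

```python
import math


def cores_needed(requests_per_sec: float,
                 cpu_sec_per_request: float,
                 target_utilisation: float = 0.7) -> int:
    """Rough core count: CPU-seconds demanded per second / target utilisation."""
    if not 0 < target_utilisation <= 1:
        raise ValueError("target_utilisation must be in (0, 1]")
    if requests_per_sec < 0 or cpu_sec_per_request < 0:
        raise ValueError("rate and CPU time must be non-negative")
    busy_cores = requests_per_sec * cpu_sec_per_request
    return max(1, math.ceil(busy_cores / target_utilisation))


# Same hypothetical request rate, very different per-request CPU cost.
cheap = cores_needed(5000, 0.0005)   # 0.5 ms of CPU per request
costly = cores_needed(5000, 0.05)    # 50 ms of CPU per request
print(cheap, costly)
```

With these made-up inputs the same 5000 requests per second land on a handful of cores in one case and on hundreds in the other, which is why the request count by itself is not enough.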

A few years ago, Nick Craver blogged about the hardware used to run this site. We’ve made a few upgrades to Stack Overflow infrastructure since 2016, but consider the traffic volume of Stack Overflow and the Stack Exchange network (even back in 2016), compared to the hardware in use. At the time, it was 12 cores for Stack Overflow and 8 cores for the Stack Exchange network.

Method 2

@AMtwo is right to point out that number of concurrent requests – without additional information – doesn’t usually work well for system sizing.
For a given workload on a given system, cpu utilisation and number of active requests can be monitored, and that can lead to a scaling model. If all requests are DOP 1 that can work pretty well. If parallel queries are also part of the workload, then I often plot cpu utilisation on one Y axis of a time series graph and stack area for active parallel threads on top of active requests. If this is an enterprise edition SQL Server instance, Resource Governor provides those counters for each workload group, which makes it not very painful and almost fun.

But! After reading this question and some of your other recent questions, @variant, I wanted to point out:
- cpu utilisation for some workloads correlates well to concurrency of requests or (requests + px workers). This is often true of long-running ETL queries, Analysis Services cube processing, or other long-running analytics queries.
- cpu utilisation for some workloads correlates well to a database event rate: batches/s, transactions/s, query plan optimisations/s. (For the time being I caution anyone against using the workload group requests completed/s counter for capacity planning, because it’s a leaky counter and I’ll have to blog about that someday.) OLTP systems are usually better suited to measuring and planning for cpu utilisation based on a database rate of activity.

It’s very rare to find a workload where cpu correlates equally well to concurrency and rate.

In your questions you’ve tended to focus on concurrency more than rates – unless the requests in these workloads are expected to exceed one or even 5 seconds, you may be better off trying to correlate an activity rate to cpu utilisation.
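As a minimal sketch of what "correlating an activity rate to cpu utilisation" could look like, the Python below computes a plain Pearson correlation between cpu utilisation and two candidate predictors. The sample values are invented for illustration; in practice they would come from whatever counters you collect (for example batches/s and active requests sampled at the same intervals), and the variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


# Hypothetical samples taken at the same six points in time.
cpu_pct = [22, 35, 41, 58, 63, 77]
batches_per_sec = [1100, 1800, 2100, 3000, 3200, 4000]
active_requests = [14, 9, 21, 12, 25, 11]

r_rate = pearson(batches_per_sec, cpu_pct)
r_conc = pearson(active_requests, cpu_pct)
better = "rate" if abs(r_rate) > abs(r_conc) else "concurrency"
print(f"rate r={r_rate:.3f}, concurrency r={r_conc:.3f}, plan on: {better}")
```

Whichever predictor tracks cpu more closely on your own measurements is the more promising basis for a scaling model. A handful of samples like this is only enough to show the idea.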

And, whether your workload is rate-based or concurrency-based, system planning for cpu only without also modelling memory and disk resources could leave the system hurting in the end.


All methods are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
