Improving P95+ latency

## Improving P95+ latency

We see an increasing number of large internet-facing sites and services being hosted on .NET. While there is a lot of legitimate focus on the [requests per second (RPS) metric](https://twitter.com/ben_a_adams/status/1260792649625280513), we find that very few big site owners ask us about it, or even need to sustain throughput approaching 1000 RPS (1000 RPS is roughly 86M requests per day). We hear a lot more about latency, specifically about improving [P95 or P99 latency](https://docs.microsoft.com/en-us/azure/internet-analyzer/internet-analyzer-scorecard). Often, the number of machines or cores provisioned for a site (its biggest cost driver) is chosen to achieve a specific P95 metric (for example, 200ms), as opposed to a lower P50 metric (for example, 50ms). We think of latency as being the true "money metric".
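To make the arithmetic and the percentile terminology above concrete, here is a minimal Python sketch. The sample latencies, the helper names `percentile` and `requests_per_day`, and the choice of the nearest-rank method are all assumptions made for illustration; real monitoring systems may compute percentiles differently (for example, with interpolation or histogram buckets).

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def requests_per_day(rps):
    """Convert a sustained requests-per-second rate to requests per day."""
    return rps * 24 * 60 * 60


# Hypothetical latency samples in milliseconds: mostly fast, with a slow tail.
latencies_ms = [50] * 90 + [120] * 5 + [200] * 4 + [900]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"P50={p50}ms P95={p95}ms P99={p99}ms")
print(f"1000 RPS = {requests_per_day(1000):,} requests per day")
```

In this made-up data set the median is 50ms while P95 and P99 are several times higher, which is why capacity sized to a P95 target can look very different from capacity sized to the median.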

We want .NET to be a platform that makes it cheaper to host your applications with each new release. To achieve that, it's important that latency is both good (low) and predictable. That's the whole point of measuring P95+ latency. We have increased our focus on predictably consistent performance, reducing performance cliffs and outliers, with an emphasis on P95+ latency.

An equally important theme is predictable performance. Some of these epics relate more to that theme than to P95 latency specifically. We're going to be a bit lazy here and mix the two topics.

## GC

* [Card mark stealing for better work balance in Server GC](https://github.com/dotnet/coreclr/pull/25986)
* [Optimize decommitting GC heap memory pages](https://github.com/dotnet/runtime/pull/35896)
* [Pinned object heap](https://github.com/dotnet/runtime/pull/32283) to reduce heap fragmentation caused by pinning
* Reduce GC pause times in specific situations, like [Array.Copy](https://github.com/dotnet/coreclr/pull/27776), [Array.Sort](https://github.com/dotnet/runtime/pull/35297) or [object unboxing](https://github.com/dotnet/runtime/pull/32353#issuecomment-586642480)

## Runtime

* [Improve call counting mechanism](https://github.com/dotnet/runtime/pull/32250) used by tiered JIT compilation to smooth out performance during startup
* [Casting in a loop may cause long GC pause times](https://github.com/dotnet/runtime/issues/13821)
* [Dynamic expansion of internal generic dictionary](https://github.com/dotnet/runtime/pull/32270) that eliminates performance cliffs hit by generic code
* [Fix for solving lock contention issue in GC statics scanning](https://github.com/dotnet/runtime/pull/32795)
* [GC polling in unboxing JIT helpers](https://github.com/dotnet/runtime/pull/32353#issuecomment-586642480)
* [Buffer::BlockCopy may spend too long without GC polling](https://github.com/dotnet/runtime/issues/13554)
* [Calling System.Math floating point operations in a loop causes long GC pause times](https://github.com/dotnet/runtime/issues/13820)
* [Inlined GC poll for methods marked with SuppressGCTransitionAttribute](https://github.com/dotnet/runtime/issues/13582)

## Libraries

* [Very high latency for GC when using (lots of) ThreadLocal](https://github.com/dotnet/runtime/issues/2382)

Improving P95+ latency #37534