Description
There are a few places in ConcurrentDictionary where we call AcquireAllLocks but still access the volatile variables _tables._countPerLock, _tables._buckets and _tables._locks inside a loop. E.g. CopyTo, GetCountInternal, GetKeys and GetValues.
Is there a reason for doing this, or can we cache those variables in locals outside the loop and use them inside? Currently, on ARM64, the JIT generates expensive memory barrier instructions for accessing volatile variables. Because these volatile variables are accessed inside the loop, we execute those instructions on every iteration. Caching them would improve the performance of these APIs on ARM64.
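A minimal C# sketch of the pattern in question follows. `Tables`, `CountSketch` and their members are hypothetical, simplified stand-ins rather than the real ConcurrentDictionary internals, and it assumes (as the question does) that the caller holds all locks, so `_tables` cannot be replaced while the loop runs. Both methods compute the same sum; the first re-reads the volatile fields on every iteration, the second reads them once into a local.

```csharp
using System;

// Hypothetical, simplified stand-in for ConcurrentDictionary's internal Tables class.
internal sealed class Tables
{
    internal volatile int[] _countPerLock;

    internal Tables(int[] countPerLock) => _countPerLock = countPerLock;
}

internal sealed class CountSketch
{
    // The outer field is volatile too, so every read of _tables is a volatile read.
    private volatile Tables _tables;

    internal CountSketch(int[] countPerLock) => _tables = new Tables(countPerLock);

    // Before: the loop condition and the loop body each re-read
    // _tables and _tables._countPerLock (volatile loads) on every iteration.
    internal int CountRereadingFields()
    {
        int count = 0;
        for (int i = 0; i < _tables._countPerLock.Length; i++)
        {
            count += _tables._countPerLock[i];
        }
        return count;
    }

    // After: the volatile fields are read once; the loop only touches a local.
    internal int CountWithHoistedFields()
    {
        int count = 0;
        int[] countPerLock = _tables._countPerLock;
        for (int i = 0; i < countPerLock.Length; i++)
        {
            count += countPerLock[i]; // overflow is allowed (unchecked by default)
        }
        return count;
    }
}

public static class Program
{
    public static void Main()
    {
        var sketch = new CountSketch(new[] { 3, 0, 5, 2 });
        Console.WriteLine(sketch.CountRereadingFields());   // 10
        Console.WriteLine(sketch.CountWithHoistedFields()); // 10
    }
}
```

While all locks are held, hoisting does not change what is counted; it only moves the repeated volatile loads out of the loop body.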
Below is an example of the machine code we generate for the GetCountInternal method before and after caching the volatile variable outside the loop.
Before:
After: I made the following change and got this generated code. This gave an approximately 30% improvement in the Dictionary.Count benchmark.

    private int GetCountInternal()
    {
        int count = 0;

        // Read the volatile field once, outside the loop.
        int[] countPerLocks = _tables._countPerLock;

        // Compute the count, we allow overflow
        for (int i = 0; i < countPerLocks.Length; i++)
        {
            count += countPerLocks[i];
        }

        return count;
    }
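The same hoisting could apply to GetKeys and GetValues, which walk _tables._buckets. The sketch below is a minimal illustration under stated assumptions: `SketchNode`, `BucketTables` and `KeysSketch` are hypothetical simplified types, not the real ConcurrentDictionary internals, and the caller is assumed to hold all locks. The bucket array is read once; only the per-node `_next` links are still read inside the loop, since those cannot be hoisted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a bucket chain node.
internal sealed class SketchNode
{
    internal readonly string _key;
    internal volatile SketchNode _next;

    internal SketchNode(string key, SketchNode next)
    {
        _key = key;
        _next = next;
    }
}

// Hypothetical, simplified stand-in for the internal Tables class.
internal sealed class BucketTables
{
    internal volatile SketchNode[] _buckets;

    internal BucketTables(SketchNode[] buckets) => _buckets = buckets;
}

internal sealed class KeysSketch
{
    private volatile BucketTables _tables;

    internal KeysSketch(BucketTables tables) => _tables = tables;

    // Assumes the caller already holds all locks, so _tables cannot be swapped mid-loop.
    internal List<string> GetKeys()
    {
        // Read the volatile fields once, outside the loop.
        SketchNode[] buckets = _tables._buckets;

        var keys = new List<string>();
        for (int i = 0; i < buckets.Length; i++)
        {
            // Walk the chain in this bucket; each _next read is still a volatile load.
            for (SketchNode current = buckets[i]; current != null; current = current._next)
            {
                keys.Add(current._key);
            }
        }
        return keys;
    }
}

public static class Program
{
    public static void Main()
    {
        var buckets = new SketchNode[4];
        buckets[1] = new SketchNode("b", new SketchNode("a", null));
        buckets[3] = new SketchNode("c", null);

        var sketch = new KeysSketch(new BucketTables(buckets));
        Console.WriteLine(string.Join(",", sketch.GetKeys())); // b,a,c
    }
}
```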