Performance optimisations: Classes vs Structs vs ArrayPool of Structs
Intro
Performance is a big and hard topic, and plenty of books and articles have already been written about it. In this post I will not go into the basics or theory of performance; instead I will jump directly into examples of how to improve it.
NOTE: this post contains code examples with my own adaptations that are structurally derived from the theory and examples in the book “Pro .NET Memory Management” by Konrad Kokosa. Please read it for more theory and details.
Structs
One of the first (and easiest) optimisations that can be applied is to use structs instead of classes. Why is it so?
As we know, classes allocate objects on the GC managed heap based on a decision tree (different for the SOH and the LOH), which is a really complex operation. Some details here: https://docs.microsoft.com/en-us/archive/msdn-magazine/2005/may/net-framework-internals-how-the-clr-creates-runtime-objects
Value types (structs in our case) “may” be allocated on the stack, which is a much faster operation.
How fast is the allocation operation?
- Allocations are cheap as long as the fast path is used.
- More complex allocation paths will, from time to time, trigger the Garbage Collector.
- Allocations of big objects in the LOH are slower because the cost may be dominated mainly by zeroing the memory.
- Allocating a lot of objects may break the generational hypothesis about object lifetimes, producing many temporary objects and a lot of work for the Garbage Collector.
Premature optimisation is the root of all evil
We hear this mantra all the time, and it is basically true. But knowing the business context, we can start measuring GC behaviour at an early development stage, and this will help us avoid optimising in the wrong places.
We tend to use classes to pass small amounts of data between methods without even thinking about the alternatives.
The following code sample shows how memory management can be optimised with a few small code changes.
Code
Using the BenchmarkDotNet NuGet package (v0.12.1 at the time of writing) in a simple console project, we can do our measurements.
Starting with the simple benchmark runner:
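A minimal sketch of such a runner (the benchmark class name `ClassVsStructBenchmark` is an assumption for this post's examples):

```csharp
using BenchmarkDotNet.Running;

public static class Program
{
    public static void Main(string[] args)
        // Runs every method marked with [Benchmark] in the given class.
        => BenchmarkRunner.Run<ClassVsStructBenchmark>();
}
```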
Our domain objects:
and their struct alternatives:
Benchmark class with “some” business logic:
OrdersService for instantiating our objects:
And finally another service with a bit more business logic, just to keep aggressive compiler optimisations from skewing the measurement:
Here the ByStruct method takes its value-type argument by reference to explicitly avoid copying.
Benchmark results
The struct-based code allocates about half as much memory as the class-based code, which is a pretty good result if we call it very often!
The ref keyword allowed us to avoid copying objects, which reduced memory traffic a lot.
More optimisations
Use ValueTuple instead of the regular Tuple.
This can significantly reduce the overhead of returning multiple values from a method: Tuple is a class allocated on the heap, while ValueTuple is a struct. Since C# 7.0 we also have nice syntax sugar for it:
Use stackalloc
An array of structures is still allocated on the heap.
We can explicitly ask to allocate value types on the stack with the stackalloc keyword. It returns a pointer to the requested memory region located on the stack.
but in that case we need to add the unsafe keyword to the method that contains such code.
Fortunately there is a newer solution using Span&lt;T&gt; that lets us get rid of unsafe:
NOTE: stackalloc cannot be used with a managed type, so if your struct has any reference-type fields (even string) it won't work!
stackalloc should be used for small buffers that do not exceed 1 kB. If there is not enough stack space left, a StackOverflowException will be raised. Populating a big memory region on a thread's stack also brings a lot of its memory pages into the working set, which can be wasteful if the pages are not shared with other threads.
To make sure a StackOverflowException never happens, you can use RuntimeHelpers.EnsureSufficientExecutionStack or RuntimeHelpers.TryEnsureSufficientExecutionStack:
Each of these methods ensures that the remaining stack space is large enough to execute the average .NET Framework function. The current threshold is 128 kB on 64-bit and 64 kB on 32-bit environments. But this does not guarantee that the space will be enough for a large stackalloc, which is another reason to use it only for small buffers.
Use ArrayPool
For large buffers the stackalloc approach causes memory-traffic and performance issues. Instead we can reuse arrays from a pool of preallocated objects.
Each of the 17 buckets in the default ArrayPool contains arrays twice as large as those in the previous one: 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288 and 1048576 elements.
Summary
In this blog post I showed how memory usage can be optimised with small code changes. But you have to be careful and think twice before stepping onto the optimisation path. Such changes reduce code readability, break common code patterns, and with wrong usage they can actually decrease performance.