Intro

Performance is a big and hard topic, there are already plenty of books and articles written. In this post I will not go into the basics or teory of performance instead I will jump directly into examples on how to improve performance.

NOTE: this post contains code examples with my own adaptation that structurally have there origins derived from the theory and examples in the book “Pro .NET Memory Management” by Konrad Kokosa. Please read it for more theory and details.

Structs

One of the first (and easiest) optimisations that can be applied is to use structs instead of classes. Why is it so?

As we know classes allocate objects on the GC Managed Heap based on decision tree (different for SOH and LOH) which is a really complex operation. Some details here https://docs.microsoft.com/en-us/archive/msdn-magazine/2005/may/net-framework-internals-how-the-clr-creates-runtime-objects

Value types (structs in our case) “may” be allocated on the stack. Which is a much faster operation.

How fast is allocation operation:

  • Allocations are cheap as far as fast path is used.
  • More complex allocation paths from time to time will trigger Garbage Collector.
  • Allocations of big objects in LOH is slower because it may be mainly dominated by zeroing memory costs.
  • Allocating a lot of objects may break generational hypothesis about an object’s lifetime which will lead to allocation of temporary objects which will give a lot work for Garbage Collector.

Premature optimisation is the root of all evil

We hear this mantra all the time and this is basically true. But knowing the business context we can start measuring GC in an early development stage and this will help us to avoid mistakes with optimisation in the wrong places.

We tend to use classes between methods with small amount of data and do not even think about alternatives.

With the following code sample we will see how memory management can be optimised with some small code changes:

Code

Using BenchmarkDotNet nuget package (v.0.12.1 at the moment of writing this article) on a simple console project we can do our measurements.

Starting with the simple benchmark runner:

1
2
3
4
5
6
7
8
public class Program
{
    static void Main(string[] args)
    {
        var benchmark = BenchmarkRunner.Run<ClassVsStruct>();
        Console.ReadKey();
    }
}

Our domain objects:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
public class OrderClass
{
    public string Firstname;
    public string Lastname;
    public ProductClass Product;
}

public class ProductClass
{
    public string Name;
    public decimal Price;
    public ushort Amount;
}

and their struct alternatives:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
public struct OrderStruct
{
    public string Firstname;
    public string Lastname;
    public ProductStruct Product;
}

public struct ProductStruct
{
    public string Name;
    public decimal Price;
    public ushort Amount;        
}

Benchmark class with “some” business logic:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
[SimpleJob]
[MemoryDiagnoser]
public class ClassVsStruct
{
    private readonly OrdersService _ordersService = new OrdersService();
    private readonly CalculationService _calculationService = new CalculationService();
    
    // Benchmark helper to consume our data
    private readonly Consumer _consumer = new Consumer();
    
    [Params(1000)]
    public int Amount { get; set; }
    
    [Benchmark]
    public void Classes()
    {
        var result = new List<string>();
        var orders = _ordersService.GetOrderClasses(Amount);
        for (var i = 0; i < orders.Count; i++)
        {
            var order = orders[i];
            if (_calculationService.ByClass(order.Product) <= 100.0)
            {
                var name = $"{order.Firstname} {order.Lastname}";
                result.Add(name);
            }
        }

        result.Consume(_consumer);
    }

    [Benchmark]
    public void Structs()
    {
        var result = new List<string>();
        var orders = _ordersService.GetOrdersStructs(Amount);
        for (var i = 0; i < orders.Length; ++i)
        {
            ref var order = ref orders[i];
            if (_calculationService.ByStruct(ref order.Product) <= 100)
            {
                var name = $"{order.Firstname} {order.Lastname}";
                result.Add(name);                    
            }
        }
        result.Consume(_consumer);
    }

    [Benchmark]
    public void StructsWithArrayPool()
    {
        var result = new List<string>();
        var orders = _ordersService.GetStructsByArrayPool(Amount);
        for (var i = 0; i < orders.Length; ++i)
        {
            ref var order = ref orders[i];
            
            if (_calculationService.ByStruct(ref order.Product) <= 100.0)
            {
                var name = $"{order.Firstname} {order.Lastname}";
                result.Add(name);
            }
        }
        ArrayPool<OrderStruct>.Shared.Return(orders);
        result.Consume(_consumer);
    }
}

OrdersService for instantiating our objects:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
public class OrdersService
{
    internal List<OrderClass> GetOrderClasses(int amount)
    {
        var result = new List<OrderClass>(amount);
        for (var i = 0; i < amount; ++i)
        {
            result.Add(new OrderClass
            {
                Firstname = "x", Lastname = "x",
                Product = new ProductClass
                {
                    Name = "x", Amount = 1, Price = 100
                }
            });
        }
        return result;
    }

    internal OrderStruct[] GetOrdersStructs(int amount)
    {
        var result = new OrderStruct[amount];
        for (var i = 0; i < amount; ++i)
        {
            result[i] = new OrderStruct
            {
                Firstname = "x", Lastname = "x",
                Product = new ProductStruct
                {
                    Name = "x", Amount = 1, Price = 100
                }
            };
        }
        return result;
    }

    internal OrderStruct[] GetStructsByArrayPool(int amount)
    {
        var result = ArrayPool<OrderStruct>.Shared.Rent(amount);
        for (var i = 0; i < amount; ++i)
        {
            result[i] = new OrderStruct
            {
                Firstname = "x", Lastname = "x",
                Product = new ProductStruct
                {
                    Name = "x", Amount = 1, Price = 100
                }
            };
        }
        return result;
    }
}

And finally another service with more business logic just to avoid aggresive optimisations for measurement:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
public class CalculationService
{
    internal double ByClass(ProductClass product)
    {
        return 100.0;
    }

    internal double ByStruct(ref ProductStruct product)
    {
        return 100.0;
    }
}

Here ByStruct method takes value type argument by reference to explicitly avoid copying.

Benchmark results

The code based on structures allocates about half of the code based on objects. Which is a pretty good result if we call it very often!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
// * Summary *

BenchmarkDotNet=v0.12.1, OS=macOS 11.1 (20C69) [Darwin 20.2.0]
Intel Core i9-9880H CPU 2.30GHz, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.101
  [Host]     : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT
  DefaultJob : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT


|               Method | Amount |     Mean |    Error |   StdDev |   Gen 0 |  Gen 1 |  Gen 2 | Allocated |
|--------------------- |------- |---------:|---------:|---------:|--------:|-------:|-------:|----------:|
|              Classes |   1000 | 59.90 us | 1.166 us | 1.388 us | 17.2119 | 4.5166 |      - |  141.3 KB |
|              Structs |   1000 | 53.81 us | 0.550 us | 0.515 us | 11.4746 | 2.0752 |      - |   94.4 KB |
| StructsWithArrayPool |   1000 | 56.16 us | 0.729 us | 0.682 us |  5.7983 | 0.7935 | 0.0610 |   47.5 KB |

Key word ref allowed us to avoid copying objects which reduced a lot memory traffic.

More optimisations

Use ValueTuple instead of regular Tuple.

This may significantly reduce overhead of returning mutiple values from a method. Since C# 7.0 we have nice syntax sugar like:

1
2
3
4
5
6
7
8
var tuple = ("a", 1);
// or
var tuple = (Name: "a", Price: 1);
tuple.Name
// and deconstructing with situation when we only need the first value:
(string name, _) = FetchData(order);
// or
var (name, _) = FetchData(order);

Use stackalloc

An array of structures is still allocated on the heap.

We can explicitly ask to allocate value types on the stack with stackalloc command. It returns a pointer to a requested memory region that will be located on the stack.

1
MyStruct* array = stackalloc MyStruct[10];

but in that case we need to add the key word unsafe to the method that will contain such code. Although there is a new solution using Span<T> that lets us get rid of unsafe

1
Span<MyStruct> array = stackalloc MyStruct[10];

NOTE: stackalloc can not be used with a managed type so if you have any fields that are reference type (even string) it won’t work!

stackalloc should be used for small buffers that do not exceed 1kB. If there is not enough stack space left the StackOverflowException will be rised. Populating a big memory region on a thread’s stack will bring a lot of its memory pages into working set which might be a wasteful approach if pages are not shared between other threads.

To be sure that StackOverflowException never happens you can use

1
2
3
RuntimeHelpers.TryEnsureSufficientExecutionStack();
// or
RuntimeHelpers.EnsureSufficientExecutionStack();

Each of these methods ensure that the remaining stack space is large enough to execute the average .NET Framework function. The current value is 128 kB for 64 bit and 64 kB for 32 bit environments. But it does not guarantee that it will be enough for a large stackalloc so thats why it is better to use it for small buffers.

Use ArrayPool

For large objects the stackalloc approach provides memory traffic and performance issues. Instead we can re-use objects from pool of preloaded objects.

Each of 17 buckets in the default ArrayPool contains arrays twice as large as the previous ones: 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288 and 1048576.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
var pool = ArrayPool<MyStruct>.Shared;
MyStruct[] buffer = pool.Rent(amount);
try
{
    Consume(buffer);
}
finally
{
    pool.Return(buffer);
}

Summary

In this blog post I showed how memory usage can be optimised with small code changes. But you have to be careful and think twice before stepping on the optimisation path. Any optimisation changes reduce code readability and break common code patterns and with the wrong usage they can actually decrease the performance.