cat /blog/benchmark-testing-my-game-engine

Benchmarking My Game Engine

Category: Things I Wish I Did
Published:

It's something I wish I started doing earlier... because profilers drive me crazy.

We’ve all been there, right? You’re writing an app, it feels sluggish, something feels wrong. It’s not a bug, the app works fine, it’s just…there’s no reason typing in a text field should be this laggy on this hardware. You know what I’m talking about, right?

My game UIs feel that way a lot. So does my engine’s editor. I’ve always wanted to figure out why, and fucking fix it. I never could.

My Specs

I’m not just posting my PC specs to brag. Hardware is just as important to understanding what’s going on with slow code as it is understanding what the code is doing. A slow memory leak is going to take longer on a supercomputer with over 1TB of RAM than it would take to cause problems on your phone. CPU and GPU usage is also going to be less of an issue on a game developer’s workstation than it might be on your garden variety cash register.

With that in mind, this PC ain’t slow.

| Hardware | What I have | |:--------:|:----------------------------------------} | CPU | AMD Ryzen 7 3700X | | GPU | AMD Radeon RX 7600XT | | Memory | 64 GB available | | Disks | 2x 1TB NVMe SSDs, 2x 4TB spinning disks |

I’m not on the bleeding edge because I don’t need to be. But this isn’t a slow computer either. It handles any 1080p game I throw at it, and I’m too blind for 4K, HDR or raytracing to matter. I also don’t need to take advantage of AI upscaling because my eyes do a better job of screwing up text readability. Point being, if something is slow, I’m either compiling Plasma or something in the app is poorly optimized.

Having an extremely fast computer is a double-edged sword however. If something feels slow on my PC, it’s slow. But I may also do something that feels perfectly fast on my PC but kills your laptop in minutes.

And I can’t debug what I can’t see.

Profiling with dotTrace

This has been generally unsuccessful for me. I would typically build my engine’s editor for release, get it running on the profiler, and try to do things in the UI that feel slow. Things would either stop feeling slow under the profiler or… I just couldn’t figure out what the problem really was.

That’s not a dock against JetBrains’ tools. I just struggle to navigate them. They don’t display things in a way I know how to read.

Sometimes, I would get lucky and find out that something in my code genuinely is problematic. I’d rewrite it, and be worse off - or not feel a difference.

What I Care About

My engine, Inertia, is written in C#. I will die on that hill. I love C# as a language. I find it fun to write, easy to read, and I do not feel limited by it. I can make it do scary things.

The problem with C# and performance is how the language used to be used. C# started with a lot of inspiration from Java and object-oriented programming. This means that the runtime has a garbage collector because that’s what we did back then. Most objects are stored on the heap because that’s how we did it back then. Newer versions of the language, and of the runtime, allow you to do things differently.

GC Hits

The number one concern I have when tracking down slow code in my engine is GC hits. If you’ve played a Unity game and ever felt it hitch every now and then, every few seconds/minutes, GC hits are very likely what you are feeling.

The runtime knows you have a finite amount of RAM and the garbage collector makes sure things get cleaned up. Every now and then, the GC will decide to clean up any unused memory in the application, freeing space for new heap allocations. Your code cannot execute while the GC is collecting. No matter how fast your code is in other languages, if you trigger GC, it’s moot.

You can’t stop GC hits from happening outright. The only thing you can do is allocate less memory on the heap, and keep references active until you’re genuinely done with them. Keep the memory you need but don’t allocate more than you do.

Frame Time

I find most people who talk about C# are thinking about its use on the web with ASP.NET. If that’s all you’ve ever done, let me introduce you to something really annoying. Frame time is how much time the UI thread in an application spends working on a single frame in one window. We generally want it to be as little time as possible. In some cases, high (or unstable) frame time can make someone feel ill. As someone who cares about accessibility, I do not want that happening.

Knowing frame time is helpful for understanding that there’s a problem. But it’s not very helpful to measure it if your brain is already telling you it’s too high. What’s more important is understanding what is going on in that frame and how long it takes to process each thing going on. If you’re rendering lots of text and the engine feels sluggish when there’s lots of text, you probably want to know how long it takes to render text.

I have a display with 120hz refresh rate. I would prefer my games be able to build frames at that refresh rate. Most of the time I run the display at 60hz, which means I’m willing to accept 60 frames per second. Anything below that in my own game and we’re in trouble.

My engine, especially in editor mode, supports multiple windows. For best results, each window should be able to fully update and repaint in less than 16.7ms. That’s one second divided over 60 frames. This is how much frame time the engine has to work with.

Runtime Performance

This is why I was having trouble with dotTrace. It doesn’t matter how fast my code is, if I’m using the runtime APIs incorrectly or they’re just slow.

Maybe a dictionary isn’t the best way to track inventory? I should probably measure.

It Started With This Video

I’ve been listening to Nick Chapsas videos while trying to sleep.

In this video, he uses BenchmarkDotNet to compare different ways to iterate a collection in .NET. This is where I decided to start in my engine, because I have an entity component system. You will typically be iterating through an ECS every frame, making collection iteration operations a hot path. Prime real estate for performance gains.

Ways to Iterate Lists in C#

Benchmarking is about comparison. If I want to know how fast my code is, I need a frame of reference to measure it against. And there are quite a few ways to iterate collections in C#.

  • With foreach loops
  • With for and while loops
  • With LINQ
  • In parallel
  • With a Span<T>
  • With a ReadOnlySpan<T>
  • With a raw pointer
  • With fancy unsafe runtime methods

To make matters worse, there are different kinds of collections in C#. Arrays, lists, hash sets, dictionaries, custom collection types… even a Span can be thought of as a collection if it represents a stack-allocated buffer.

So I Wrote A Tool

I wrote a tool for my engine that benchmarks it using BenchmarkDotNet and a series of common tasks my code might perform. I started with collection iterations.

I decided to test with both a standard int[] array and a List<int>. This allows me to compare things like foreach on a list vs. foreach on an array. Each collection is filled with random data, and each test sums up the value of each element in the collection. I also tested with collections with varying sizes, starting with powers of two and then a torture test of 100 thousand items. Gradually, the test would grow the collections larger in size to see how bigger collections affect the measurements.

After my tool runs the benchmark, it prints it out to a fancy-ish Markdown table.

And here’s a report.

MethodElementCountMeanErrorStdDevMedianAllocated
ListForEachMethod4758.40 ns17.448 ns35.247 ns760.00 ns88 B
ListEnumeration4258.70 ns10.686 ns30.489 ns260.00 ns-
ListIndexing4144.28 ns7.098 ns20.252 ns140.00 ns-
ArrayEnumeration495.76 ns5.509 ns15.717 ns94.50 ns-
ArrayIndexing457.32 ns4.529 ns13.283 ns55.00 ns-
ArraySpanIndexing485.70 ns5.092 ns13.854 ns80.00 ns-
ListSpanIndexing4110.29 ns6.202 ns17.186 ns105.00 ns-
ArraySpanEnumeration4136.65 ns7.926 ns23.120 ns130.00 ns-
ListSpanEnumeration4157.24 ns6.188 ns18.050 ns160.00 ns-
UnsafeSpanIterationOverArray4148.07 ns9.543 ns27.226 ns149.00 ns-
UnsafeSpanIterationOverList4140.42 ns6.304 ns18.189 ns140.00 ns-
ListForEachMethod641,095.43 ns79.855 ns234.202 ns1,025.00 ns88 B
ListEnumeration64632.28 ns14.883 ns15.925 ns629.50 ns-
ListIndexing64504.56 ns11.184 ns10.985 ns501.00 ns-
ArrayEnumeration64381.32 ns9.158 ns15.798 ns380.00 ns-
ArrayIndexing64419.53 ns10.819 ns18.948 ns419.50 ns-
ArraySpanIndexing64469.56 ns11.462 ns17.845 ns470.50 ns-
ListSpanIndexing64460.67 ns11.385 ns21.661 ns460.00 ns-
ArraySpanEnumeration64447.27 ns10.258 ns9.595 ns440.00 ns-
ListSpanEnumeration64450.79 ns10.940 ns15.690 ns445.00 ns-
UnsafeSpanIterationOverArray64387.31 ns9.792 ns14.353 ns385.00 ns-
UnsafeSpanIterationOverList64422.63 ns9.875 ns10.976 ns420.00 ns-
ListForEachMethod2561,431.36 ns30.789 ns57.829 ns1,425.00 ns88 B
ListEnumeration2561,774.31 ns28.270 ns27.765 ns1,780.00 ns-
ListIndexing2561,616.08 ns15.308 ns12.783 ns1,620.00 ns-
ArrayEnumeration2561,383.40 ns22.728 ns21.260 ns1,380.00 ns-
ArrayIndexing2561,418.36 ns18.865 ns16.723 ns1,415.50 ns-
ArraySpanIndexing2561,426.33 ns16.600 ns15.527 ns1,425.00 ns-
ListSpanIndexing2561,454.45 ns31.634 ns36.429 ns1,445.00 ns-
ArraySpanEnumeration2561,430.43 ns13.062 ns11.579 ns1,430.50 ns-
ListSpanEnumeration2561,450.00 ns20.753 ns18.397 ns1,445.00 ns-
UnsafeSpanIterationOverArray2561,313.85 ns17.989 ns15.021 ns1,310.00 ns-
UnsafeSpanIterationOverList2561,355.33 ns25.191 ns23.563 ns1,360.00 ns-
ListForEachMethod40969,856.92 ns134.013 ns111.907 ns9,840.00 ns88 B
ListEnumeration409612,783.20 ns272.413 ns754.855 ns12,610.00 ns-
ListIndexing409612,878.00 ns191.451 ns169.716 ns12,885.00 ns-
ArrayEnumeration409611,104.36 ns223.734 ns198.334 ns11,070.50 ns-
ArrayIndexing40969,212.45 ns209.589 ns566.637 ns9,050.00 ns-
ArraySpanIndexing409610,224.40 ns277.808 ns792.602 ns9,910.00 ns-
ListSpanIndexing409610,089.07 ns339.951 ns991.652 ns9,711.00 ns-
ArraySpanEnumeration409612,085.38 ns614.500 ns1,792.526 ns12,255.00 ns-
ListSpanEnumeration409610,211.55 ns247.290 ns705.531 ns10,089.50 ns-
UnsafeSpanIterationOverArray40968,239.89 ns164.241 ns426.884 ns8,230.00 ns-
UnsafeSpanIterationOverList40968,472.12 ns256.992 ns733.214 ns8,285.50 ns-
ListForEachMethod1638435,095.62 ns376.907 ns314.734 ns35,091.00 ns88 B
ListEnumeration1638423,245.47 ns215.295 ns201.387 ns23,180.00 ns-
ListIndexing1638419,277.77 ns224.528 ns187.491 ns19,370.00 ns-
ArrayEnumeration1638415,680.08 ns183.495 ns153.227 ns15,650.00 ns-
ArrayIndexing1638414,256.98 ns276.218 ns538.742 ns14,180.00 ns-
ArraySpanIndexing1638415,741.64 ns232.688 ns206.272 ns15,730.00 ns-
ListSpanIndexing1638415,957.92 ns210.260 ns175.577 ns15,931.00 ns-
ArraySpanEnumeration1638414,579.69 ns465.352 ns1,327.675 ns14,145.00 ns-
ListSpanEnumeration1638414,329.85 ns248.622 ns513.447 ns14,310.50 ns-
UnsafeSpanIterationOverArray1638413,323.21 ns208.472 ns184.805 ns13,304.50 ns-
UnsafeSpanIterationOverList1638411,643.54 ns218.602 ns532.106 ns11,630.00 ns-
ListForEachMethod65536136,913.00 ns464.877 ns362.946 ns136,820.50 ns88 B
ListEnumeration6553659,566.67 ns638.983 ns597.705 ns59,650.00 ns-
ListIndexing6553643,622.77 ns775.087 ns647.233 ns43,790.00 ns-
ArrayEnumeration6553633,895.24 ns665.930 ns888.998 ns33,495.00 ns-
ArrayIndexing6553634,435.62 ns330.000 ns275.565 ns34,500.00 ns-
ArraySpanIndexing6553632,694.86 ns1,072.178 ns3,144.512 ns32,620.00 ns-
ListSpanIndexing6553634,444.92 ns602.428 ns503.055 ns34,560.00 ns-
ArraySpanEnumeration6553634,462.54 ns393.631 ns328.699 ns34,590.00 ns-
ListSpanEnumeration6553634,462.69 ns320.949 ns268.007 ns34,420.00 ns-
UnsafeSpanIterationOverArray6553624,682.77 ns492.278 ns751.760 ns24,160.00 ns-
UnsafeSpanIterationOverList6553625,561.85 ns439.303 ns366.838 ns25,410.00 ns-
ListForEachMethod100000213,432.96 ns4,237.520 ns5,940.411 ns211,737.00 ns88 B
ListEnumeration10000085,497.21 ns873.409 ns774.254 ns85,666.00 ns-
ListIndexing10000060,998.97 ns1,200.999 ns1,334.907 ns60,940.50 ns-
ArrayEnumeration10000046,505.93 ns958.795 ns2,766.343 ns47,311.00 ns-
ArrayIndexing10000047,451.87 ns575.375 ns538.206 ns47,291.00 ns-
ArraySpanIndexing10000045,763.64 ns909.068 ns2,212.798 ns45,870.00 ns-
ListSpanIndexing10000046,116.79 ns913.460 ns1,985.786 ns46,340.00 ns-
ArraySpanEnumeration10000046,233.10 ns924.958 ns2,593.682 ns46,381.00 ns-
ListSpanEnumeration10000047,106.38 ns699.318 ns583.962 ns47,286.00 ns-
UnsafeSpanIterationOverArray10000034,374.02 ns904.218 ns2,550.363 ns33,150.00 ns-
UnsafeSpanIterationOverList10000037,568.35 ns734.672 ns1,005.627 ns37,889.50 ns-

If you just scrolled past that and went “jesus fuck that’s a large table,” I agree. Jesus fuck, that’s a large table. But it’s giving me a lot of insight.

Allocations

The Allocations column tells you how much memory has been allocated on the heap during the test (not counting any allocations while preparing the test). None of the measured list iteration methods allocate heap memory, except for one: ListForEachMethod. That would be code that looks like this.

var list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(4);

var sum = 0;
list.ForEach(value => sum += value);

I never knew this function existed on List<T> until I measured it. Allegedly, people think it looks elegant to use. But you are paying an 88-byte heap tax when you use it. So…I wouldn’t use this method.

Execution time

The most important columns in my opinion to read are Mean compared with ElementCount. You statistician folks can read the other numbers, I don’t really understand them yet.

Each test in the report was executed several (likely thousands of) times by BenchmarkDotNet. The Mean column is telling me, on average, how long each test run took. It is in fact directly telling me how much frame time I would need to iterate a collection of that size in that way.

My Code’s The Fastest

I can tell you for certain that my code, out of all tests, is the fastest. Here’s how I know.

Enumeration and Iteration (Indexing)

In the chart, Enumeration is any time I use a foreach loop. There is an enumeration test for arrays, lists, and spans.

Iteration is any time I’m using a for loop and a loop counter. There is a test for lists, arrays, and spans.

The Unsafe Span Iteration tests are my code.

If we look at a 4-element array, iteration with a for loop is almost twice as fast on average as enumerating with foreach. If you compare enumeration vs. iteration of List, Array Span, and List Span, you will notice iteration is generally faster on average for a 4-element collection.

We generally don’t iterate over collections that small, so what matters more is if this pattern scales with larger element counts. Even in the torture test, iteration generally wins over enumeration. In most tests, the difference is negligible. Enumeration does win over iteration of an array, but iteration wins by tens of microseconds over enumeration on a list.

Unsafe Iteration

Across all tests, my Unsafe Span Iteration method is generally fastest. It is also faster on array spans than on list spans.

But it looks like this.

[Benchmark]
public int UnsafeSpanIterationOverArray()
{
    var span = array.AsSpan();
    var sum = 0;

    ref var reference = ref span[0];
    for (var i = 0; i < ElementCount; i++)
    {
        sum += reference;
        Unsafe.Add(ref reference, 1);
    }

    return sum;
}

[Benchmark]
public int UnsafeSpanIterationOverList()
{
    var span = CollectionsMarshal.AsSpan(list);
    var sum = 0;

    ref var reference = ref span[0];
    for (var i = 0; i < ElementCount; i++)
    {
        sum += reference;
        Unsafe.Add(ref reference, 1);
    }

    return sum;
}

I’m accessing the span of the collection, taking a reference to its first element, and using Unsafe.Add to add one element worth of offset to the reference every time I iterate. This is effectively like doing this in C.

int length = 100000;
int* array = malloc(sizeof(int) * length);
int* reference = array;
int sum = 0;

for (int i = 0; i < length; i++) {
}
    sum += *reference;
    reference++;
}

…And exactly as dangerous, too. If the array memory is ever moved or freed during iteration, this is going to be problematic. But it is the fastest, and if I can guarantee I won’t lose the memory mid-iteration, I’m inclined to use this in my engine’s ECS to squeexe as much performance out of it as possible.

Should I Be Writing C#?

If I care enough about performance to benchmark different ways of iterating over memory at a low level in the language, why am I not using C, Zig, Rust, or something more low-level for my engine?

The answer is because I love C#. It gives me the freedom to choose how I write my code. It doesn’t fail to compile because I mis-typed a space, nor does it yell at me about borrowing. And besides…

MethodElementCountMeanErrorStdDevMedianAllocated
UnsafeSpanIterationOverMatrix4x4Array41,696.88 ns32.863 ns32.28 ns1,694.50 ns-
UnsafeSpanIterationOverMatrix4x4Array6419,622.29 ns46.542 ns41.26 ns19,620.00 ns-
UnsafeSpanIterationOverMatrix4x4Array25677,102.17 ns103.041 ns80.45 ns77,080.50 ns-
UnsafeSpanIterationOverMatrix4x4Array4096323,963.70 ns3,547.935 ns3,318.74 ns322,388.50 ns-
UnsafeSpanIterationOverMatrix4x4Array16384111,209.92 ns321.903 ns251.32 ns111,230.00 ns-
UnsafeSpanIterationOverMatrix4x4Array65536423,498.53 ns5,247.143 ns4,908.18 ns425,014.00 ns-
UnsafeSpanIterationOverMatrix4x4Array100000661,515.67 ns12,987.558 ns18,206.74 ns652,446.00 ns-

The language ain’t gonna stop you from doing 100,000 matrix multiplications on the CPU every frame and completely neglecting that poor GPU over there just dying to do this instead. And you know what? It’s still decently fast for a one-off.

Inertia Benchmarking Tool: Coming Soon.

What this is missing is fancy graphs and a GUI. I plan on integrating BenchmarkDotNet into the Inertia editor, so you can benchmark your own game code within the editor and get a detailed report of how things are doing.