Hackweek 2020

Last week was Unity's annual "hackweek", where every engineer in the company meets in the same physical space to work on a project with a team of their choosing. We're not supposed to have any other responsibilities (e.g., daily stand-ups or other meetings). It's just about creating. Sometimes, projects that begin as hackweek projects end up shipping, but most don't, and that's okay.

Here's what I get out of it personally:

This year was a bit different: we did a stay-at-home virtual hackweek, which made extensive use of Zoom. Since part of my team was in Europe (and I am on the west coast of the US), I tried to shift my schedule to work late at night (midnight here is 9am in Copenhagen) and sleep in in the morning. It didn't really work, and we ended up with only a few hours of overlap each day. Still, it was good to catch up with folks during those small windows.

The project

I've been thinking about the right way to surface, in C#, an SDK written in some other language, like Objective-C, C, or even (shudder) JavaScript. Why not just write in those languages? If you have an Objective-C file in your project, Unity will happily add it to the generated Xcode project, so you can just write Objective-C. But that wouldn't feel very Unity-like, would it? It would feel foreign to our users. Plus, at Unity, we love C#; everything should be C# (well, syntactically, anyway)!

Unity has taken an interesting approach with the Burst compiler and "High Performance C#" or HPC#. We're basically making a new programming language for games, but using the existing C# syntax as a starting point. It reminds me a lot of what Jonathan Blow is doing with Jai, except we decided to stay with C#. But part of that philosophy is that you should do everything in the one programming language you know well, and for Unity, that means C#. Sure, the core engine is still C++, but as we move more code to packages, a lot of it gets rewritten in C#. Even our build system is written in C#.

So it makes sense that if I want to talk to, say, ARKit (an Objective-C API), there should be C# bindings for it. The API should be exactly the same, but I want to write C#. Microsoft have already done something like this with Xamarin, so maybe we could just use that?

The problem with Xamarin is that it's not really what you want for a real-time application. The natural mapping from an Objective-C object to a C# object would be to create a C# class, that is, a reference type, for each Objective-C object. But in Objective-C, everything is an object. If you want an array of integers, then you're looking at an NSArray<NSNumber*>*, where an NSNumber is an object. That's a lot of objects.

In a normal frame, I need to make several calls into ARKit to check on its current state, which, with a scheme where every Objective-C object is a C# object, means dozens of GC allocations. And normally, nothing has changed between successive frames. That's a lot of GC allocations just to determine that nothing happened.

Plus, I want to be able to write Burst-compilable functions that make ARKit SDK calls, so reference types are right out.

Okay, that's no problem. All we really want is a way to hold onto an Objective-C pointer in a type-safe way. So perhaps an ARFrame would look something like this:

public struct ARFrame
{
    IntPtr m_Self;
    public double Timestamp => GetTimestamp(m_Self);
    public ARCamera Camera => GetCamera(m_Self);
    ...
}

So, we've got an Objective-C pointer, and we've got some type safety to make sure we only call ARFrame related methods on this pointer. But what about object lifetime?

Object lifetime

Objective-C's memory model uses something called Automatic Reference Counting, or ARC. Every Objective-C object is reference counted, kind of like a std::shared_ptr in C++. In the old days, you had to explicitly send retain and release messages to objects to increment and decrement the reference count, respectively. This was, of course, error prone. So they changed the compiler to automatically insert retain and release calls for you, for example whenever you assign one pointer to another. Pretty clever, right?

Nowadays, you don't have to think about memory management because ARC takes care of it. As long as you stay in Objective-C, that is.

C# also manages memory for you, but it uses a mark-and-sweep garbage collection algorithm rather than reference counting (Unity, specifically, uses the Boehm-Demers-Weiser garbage collector). So again, you don't have to think about it. If you stay in C#, that is.

What happens when you start passing pointers from Objective-C to C#? We're back to manual memory management since ARC can't track pointers that we've passed to C#. That means we need to modify our struct like this:

public struct ARFrame
{
    IntPtr m_Self;
    public double Timestamp => GetTimestamp(m_Self);
    public ARCamera Camera => GetCamera(m_Self);

    public void Retain() => Retain(m_Self);
    public void Release() => Release(m_Self);
    ...
}

Maybe we could make it IDisposable and wrap uses of it in a using block to give us some extra safety. But that's still error prone, and certainly unexpected to an experienced Unity user.
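To make the IDisposable idea concrete, here is a minimal sketch. FakeNative is a made-up stand-in for the Objective-C runtime (it just keeps a retain count per pointer), and ARFrameHandle is a hypothetical name, so treat this as an illustration rather than real binding code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Objective-C runtime: one retain count per pointer.
static class FakeNative
{
    static readonly Dictionary<IntPtr, int> s_Counts = new Dictionary<IntPtr, int>();

    public static void Retain(IntPtr self) => s_Counts[self] = RetainCount(self) + 1;
    public static void Release(IntPtr self) => s_Counts[self] = RetainCount(self) - 1;

    public static int RetainCount(IntPtr self) =>
        s_Counts.TryGetValue(self, out var count) ? count : 0;
}

// Hypothetical disposable wrapper: retains on construction, releases on Dispose.
struct ARFrameHandle : IDisposable
{
    readonly IntPtr m_Self;

    public ARFrameHandle(IntPtr self)
    {
        m_Self = self;
        FakeNative.Retain(m_Self);
    }

    public void Dispose() => FakeNative.Release(m_Self);
}

static class DisposableDemo
{
    public static void Main()
    {
        var pointer = new IntPtr(0x1234); // pretend this came from native code

        using (var frame = new ARFrameHandle(pointer))
        {
            Console.WriteLine(FakeNative.RetainCount(pointer)); // 1 while in scope
        }

        Console.WriteLine(FakeNative.RetainCount(pointer)); // 0 after the using block
    }
}
```

Nothing here stops someone from copying the struct and disposing both copies, or from forgetting the using block entirely, which is exactly the error-prone part.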

This is probably why bindings like Xamarin use C# classes for each Objective-C object, since they can invoke a release call in the object's finalizer. But we don't have that luxury if we want to keep using structs.

What we can do is change the IL generated by the C# compiler. For example, if you write

ARFrame frame1 = ...;
ARFrame frame2 = ...;
frame1 = frame2;

What you really want is

ARFrame frame1 = ...;
ARFrame frame2 = ...;

frame1.Release(); // release the old ARFrame
frame1 = frame2;
frame1.Retain(); // add a ref count to the one we just assigned

Well, we can do that automatically!
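Spelled out as a helper, the rewritten assignment might look like the sketch below. RefCounts, NativeHandle, and HandleOps are invented for illustration and only simulate native reference counts. One deliberate difference from the snippet above: this version retains the incoming object before releasing the old one, which is the usual way to stay safe when both variables happen to refer to the same object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simulation of native retain counts, keyed by pointer.
static class RefCounts
{
    static readonly Dictionary<IntPtr, int> s_Counts = new Dictionary<IntPtr, int>();

    public static int Get(IntPtr p) => s_Counts.TryGetValue(p, out var c) ? c : 0;
    public static void Retain(IntPtr p) { if (p != IntPtr.Zero) s_Counts[p] = Get(p) + 1; }
    public static void Release(IntPtr p) { if (p != IntPtr.Zero) s_Counts[p] = Get(p) - 1; }
}

// Hypothetical struct wrapper around an opaque native pointer.
struct NativeHandle
{
    public IntPtr Ptr;
    public NativeHandle(IntPtr ptr) { Ptr = ptr; }
    public void Retain() => RefCounts.Retain(Ptr);
    public void Release() => RefCounts.Release(Ptr);
}

static class HandleOps
{
    // Conceptually what a post-processor would emit in place of "dst = src;".
    public static void Assign(ref NativeHandle dst, NativeHandle src)
    {
        src.Retain();   // take a reference to the incoming object first
        dst.Release();  // then drop the reference to the old object
        dst = src;
    }
}

static class AssignDemo
{
    public static void Main()
    {
        var a = new IntPtr(1);
        var b = new IntPtr(2);
        RefCounts.Retain(a); // frame1 owns one reference to a
        RefCounts.Retain(b); // frame2 owns one reference to b

        var frame1 = new NativeHandle(a);
        var frame2 = new NativeHandle(b);
        HandleOps.Assign(ref frame1, frame2);

        Console.WriteLine(RefCounts.Get(a)); // 0: the old object lost its only reference
        Console.WriteLine(RefCounts.Get(b)); // 2: both variables now reference b
    }
}
```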

If you know anything about Unity's ECS, you've probably seen code like this

Entities.ForEach((Entity entity, ref Position position, ref Rotation rotation) =>
{
    position.x++;
    rotation = calculateNewRotation();
});

Wait, aren't lambdas bad because they create GC allocations? Well, not if we just rewrite the generated IL. This is how ECS gets away with this -- in fact, they Burst-compile the function if they can and patch the callsite. It's pretty slick.

So my project was to see if I could do something similar to look for assignments between C# types that represent native objects and insert retain and release calls as appropriate. This is conceptually similar to what the Objective-C compiler does when you assign one Objective-C pointer variable to another.

There's a really great online tool called SharpLab -- it's basically Compiler Explorer for C#. It can show you the raw IL instructions from any C# snippet.

For example, this snippet

using System;
public class C {
    public struct A {}
    public void M() {
        var a = new A();
        var b = new A();
        a = b;
    }
}

generates IL instructions like these:

.method public hidebysig
    instance void M () cil managed
{
    // Method begins at RVA 0x2050
    // Code size 20 (0x14)
    .maxstack 1
    .locals init (
        [0] valuetype C/A a,
        [1] valuetype C/A b
    )

    IL_0000: nop
    IL_0001: ldloca.s 0
    IL_0003: initobj C/A
    IL_0009: ldloca.s 1
    IL_000b: initobj C/A
    IL_0011: ldloc.1
    IL_0012: stloc.0
    IL_0013: ret
} // end of method C::M

stloc.0 means "pop a value from the stack into local variable 0". All stores to locals look something like this. Storing to a member field uses a different instruction (stfld), but it's still a "store" instruction. Using Cecil, it's straightforward to find these instructions and the type of the thing being stored. If it's one of the types we care about, then we can insert Release and Retain calls as appropriate.
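As a rough sketch of that scan (not the actual hackweek code), something like the following walks every method body with Mono.Cecil and yields the store instructions whose stored type matches a given full name. It assumes the Mono.Cecil package is referenced. StoreScanner, StoredType, and FindStores are names I made up, and it only handles stloc and stfld, ignoring other stores such as starg, stsfld, and stobj.

```csharp
using System.Collections.Generic;
using Mono.Cecil;
using Mono.Cecil.Cil;

// Hypothetical helper: finds stores of a particular type in an assembly.
static class StoreScanner
{
    // Returns the type being stored, or null if this is not a store we handle.
    public static TypeReference StoredType(MethodDefinition method, Instruction ins)
    {
        switch (ins.OpCode.Code)
        {
            // Short forms encode the local index in the opcode itself.
            case Code.Stloc_0: return method.Body.Variables[0].VariableType;
            case Code.Stloc_1: return method.Body.Variables[1].VariableType;
            case Code.Stloc_2: return method.Body.Variables[2].VariableType;
            case Code.Stloc_3: return method.Body.Variables[3].VariableType;

            // Long forms carry the variable as the operand.
            case Code.Stloc_S:
            case Code.Stloc:
                return ((VariableDefinition)ins.Operand).VariableType;

            // Field stores carry the field as the operand.
            case Code.Stfld:
                return ((FieldReference)ins.Operand).FieldType;

            default:
                return null;
        }
    }

    // Yields every (method, instruction) pair that stores a value of the named type.
    public static IEnumerable<(MethodDefinition Method, Instruction Store)> FindStores(
        ModuleDefinition module, string handleTypeFullName)
    {
        foreach (var type in module.GetTypes()) // includes nested types
        {
            foreach (var method in type.Methods)
            {
                if (!method.HasBody)
                    continue;

                foreach (var ins in method.Body.Instructions)
                {
                    var stored = StoredType(method, ins);
                    if (stored != null && stored.FullName == handleTypeFullName)
                        yield return (method, ins);
                }
            }
        }
    }
}
```

Calling StoreScanner.FindStores(ModuleDefinition.ReadModule("Game.dll"), "ARFrame") (with a placeholder assembly name) would list the candidate sites. Inserting the Retain and Release calls is then a matter of adding instructions around each one using the method body's ILProcessor.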

This sort of thing is called "IL post processing" and we do it all over the place. It feels like magic when it works, but it's also a giant headache when you get it wrong.

The thing I'm most excited about is how this affects lambdas. Earlier I said C# lambdas were bad, but sometimes you gotta use 'em anyway. I have cases where I want to write something like

public void OnFrame(ARFrame newFrame)
{
    DoSomethingNextFrame(() =>
    {
        Debug.Log($"{newFrame.Timestamp}");
    });
}

In this example, the lambda has captured the variable newFrame, but we might not use it until later. newFrame is likely to have been destroyed by that time, so we have a use-after-free bug. That's because newFrame is really just an opaque pointer, and C# doesn't know how to extend its lifetime to match that of the lambda.

But we do.

How do lambdas work anyway?

Lambdas can work in different ways depending on context, but often the compiler generates a new class (usually called something like <>c__DisplayClass1_0), and any captured variables are just fields in that class. The lambda function itself is a method on the display class.

So this (sharplab)

public void M(int a, int b)
{
    Action foo = () => { a = b; };
}

becomes this

[CompilerGenerated]
private sealed class <>c__DisplayClass0_0
{
    public int a;

    public int b;

    internal void <M>b__0()
    {
        a = b;
    }
}

public void M(int a, int b)
{
    <>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
    <>c__DisplayClass0_.a = a;
    <>c__DisplayClass0_.b = b;
    Action action = new Action(<>c__DisplayClass0_.<M>b__0);
}

Note the assignments

<>c__DisplayClass0_.a = a;
<>c__DisplayClass0_.b = b;

Since our IL post processor inserts retain calls after an assignment, the native objects' lifetimes are extended and survive long enough to be used in the lambda. But when do we release them? As it stands, this is a memory leak, since we have an unmatched retain. Notice the DisplayClass has no finalizer but, since it is a reference type, we can just give it one. Finalizers are unpredictable -- you never know when the garbage collector will run -- but that's okay in this case. We just need the object to get cleaned up at some point after we are finished using it. Objective-C's retain and release are atomic, so it's also okay that finalizers run on a separate thread.
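Here is a hand-written approximation of what the display class and the capture site might look like after post-processing. Everything in it is hypothetical: FakeNativeObject stands in for an Objective-C object with an atomic retain count, and for brevity the captured value is that object itself rather than a struct wrapping an IntPtr.

```csharp
using System;
using System.Threading;

// Hypothetical native object with an atomic retain count (stand-in for an Objective-C object).
sealed class FakeNativeObject
{
    int m_RetainCount = 1; // the creator holds the first reference
    public double Timestamp = 42.0;

    public int RetainCount => Volatile.Read(ref m_RetainCount);
    public void Retain() => Interlocked.Increment(ref m_RetainCount);
    public void Release() => Interlocked.Decrement(ref m_RetainCount);
}

// Hand-written approximation of a display class after post-processing.
sealed class DisplayClassSketch
{
    public FakeNativeObject newFrame; // captured variable becomes a public field

    // Body of the lambda.
    public double Lambda() => newFrame.Timestamp;

    // Injected finalizer: balances the retain added when the field was assigned.
    ~DisplayClassSketch() => newFrame?.Release();
}

static class CaptureDemo
{
    // Roughly what the capturing method looks like after the compiler and post-processor run.
    public static Func<double> Capture(FakeNativeObject frame)
    {
        var closure = new DisplayClassSketch();
        closure.newFrame = frame;
        closure.newFrame.Retain(); // inserted after the field store
        return closure.Lambda;
    }

    public static void Main()
    {
        var native = new FakeNativeObject();
        Func<double> later = Capture(native);
        native.Release(); // the original owner lets go

        Console.WriteLine(native.RetainCount); // 1: the closure still holds a reference
        Console.WriteLine(later());            // 42
    }
}
```

The matching release only happens whenever the finalizer eventually runs, so the native object may outlive the lambda by an arbitrary amount of time.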

What's next

I was able to get a proof-of-concept going, but the devil's in the details. The generated IL can look very different in different contexts and it can be tough handling them all. It also doesn't account for a case like this

if (frame.Camera.TrackingState == TrackingState.Normal)
{
    // ...
}

In this case, Camera is an object, so the property getter frame.Camera actually retains a new object which is never released. That could be pretty surprising. IL post processing could help, but I think I have a better idea. More on that another time.