Reflections on Introspection (Part 2)

Following on from part 1, the idea is to create a run-time type system using clang to auto-generate the content.

But before we can get onto all the fun stuff, what does the run-time type data look like anyway? If you want to play along at home, I have thrown the code up on GitHub.

One of the things we want to do is automatic serialisation, because writing load and save code is just dull. For example, given some data we want to walk the structure and convert it to JSON. So given a class, we need a name, description of its fields, types and so forth. Something like this ...
    struct TypeInfo
    {
        // name
        // list of fields
        // list of base classes
        // any custom attributes
    };
For starters, we should probably be able to grab the type data for any class. C++ gives us the typeid() operator for static types. This is a nice feature so let's steal it for our own Type() function :)
    Thingamajig thingy;

    const TypeInfo* typeA = Type< Thingamajig >();
    const TypeInfo* typeB = Type( &thingy );
Which we can implement with templates by deriving a TypeInfo to hold the data for each specific type.
    template< typename T >
    struct TypeInfoImpl : public TypeInfo
    {
        static const TypeInfo* GetType()
        {
            static TypeInfo info;
            return &info;
        }
    };
And then our Type functions become ...
    template< typename T > inline const TypeInfo* Type()
    {
        return TypeInfoImpl<T>::GetType();
    }

    // the argument is only used to deduce T
    template< typename T > inline const TypeInfo* Type( const T* )
    {
        return TypeInfoImpl<T>::GetType();
    }
Which is quite neat: with just this we can start working with (static) types by getting the TypeInfo pointers and comparing them.
    float f;

    if( Type( &f ) == Type<float>() )
    {
         // do some things here
    }
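To see the pieces so far working together, here is a minimal self-contained sketch. The Name member and the StaticTypeDemo() function are my own placeholders for illustration, and the static is declared as a TypeInfoImpl<T> rather than a plain TypeInfo so that later downcasts are valid.

```cpp
#include <cassert>

// Stand-in for the real TypeInfo; the Name member is only here for illustration.
struct TypeInfo
{
    const char* Name = "";
};

template< typename T >
struct TypeInfoImpl : public TypeInfo
{
    // One function-local static per T, so every type gets its own address.
    static const TypeInfo* GetType()
    {
        static TypeInfoImpl<T> info;
        return &info;
    }
};

template< typename T > inline const TypeInfo* Type()
{
    return TypeInfoImpl<T>::GetType();
}

// The argument is only used to deduce T.
template< typename T > inline const TypeInfo* Type( const T* )
{
    return TypeInfoImpl<T>::GetType();
}

// Hypothetical check, not part of the original code.
inline void StaticTypeDemo()
{
    float f = 0.0f;
    int   i = 0;

    assert( Type( &f ) == Type<float>() );  // same type, same pointer
    assert( Type( &i ) != Type<float>() );  // different type, different pointer
}
```

Within a single module the comparison is just a pointer compare, which is about as cheap as a type check gets.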
Of course we want to fill out TypeInfo with all the details, which is where our automagical clang tool comes into play, but more on that later.

This we will implement with a “Create” function that we specialise for each type and call during instantiation.
    template<> void TypeInfoImpl< Vector3 >::Create()
    {
         // do the things
    }
Sounds easy enough, just a couple of niggly little details to make it work nicely for multithreaded code and Windows DLLs.

Even though we will only be calling Create() during instantiation, there is still a possibility that two threads could end up inside here at the same time causing “bad things” to happen.

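We could guard the code with mutexes but that seems a bit wasteful. It is much easier to ensure everything is set up during the global construction phase, which normally runs on a single thread before main() gets a chance to spawn any others. Scoped static initialisation, by contrast, is only guaranteed thread-safe from C++11 onwards, and older compilers (MSVC before Visual Studio 2015, for example) did not implement that guarantee.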

This also allows us to separate the acquisition of the TypeInfo pointer from its instantiation. This might not be an obvious benefit but may become a little clearer when we consider dynamically loading types from DLLs.

So ... mix in a little auto-registration
    template< typename T >
    struct Register
    {
        Register()
        {
            // registration is the one place allowed to mutate the TypeInfo
            TypeInfo* info = const_cast< TypeInfo* >( TypeInfoImpl<T>::GetType() );
            static_cast< TypeInfoImpl<T>* >( info )->Create();
        }
    };
And wrap it up in a macro for convenience.
    #define REGISTER_CONCAT(a,b) a##b
    #define REGISTER_CREATENAME(c) REGISTER_CONCAT( __rtti_, c )
    #define REGISTER(T) \
        static Register<T> REGISTER_CREATENAME( __COUNTER__ );
I’m using __COUNTER__ here to create a unique name for our registrant. Type names could contain funky characters (think templates and namespaces), so we can't just append the type name to the variable name[1].

Then so long as we register our types in advance it all works.
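Putting the registration pieces together, a minimal sketch might look like the following. The Name member, the Vector3 struct and the CheckRegistered() helper are assumptions of mine for illustration. I have also used a registrant prefix without leading underscores, because identifiers containing a double underscore are reserved for the implementation.

```cpp
#include <cassert>
#include <cstring>

struct TypeInfo
{
    const char* Name = "";   // placeholder for the real metadata
};

template< typename T >
struct TypeInfoImpl : public TypeInfo
{
    static const TypeInfo* GetType()
    {
        static TypeInfoImpl<T> info;
        return &info;
    }

    // Specialised per type; fills in the details exactly once.
    void Create();
};

template< typename T >
struct Register
{
    Register()
    {
        // Registration is the one place allowed to mutate the TypeInfo.
        TypeInfo* info = const_cast< TypeInfo* >( TypeInfoImpl<T>::GetType() );
        static_cast< TypeInfoImpl<T>* >( info )->Create();
    }
};

#define REGISTER_CONCAT(a,b) a##b
#define REGISTER_CREATENAME(c) REGISTER_CONCAT( rtti_registrant_, c )
#define REGISTER(T) static Register<T> REGISTER_CREATENAME( __COUNTER__ );

struct Vector3 { float x, y, z; };

// The specialisation has to be declared before anything instantiates it.
template<> void TypeInfoImpl< Vector3 >::Create()
{
    Name = "Vector3";
}

// Runs Create() during global construction, before main().
REGISTER( Vector3 )

// Hypothetical check, not part of the original code.
inline void CheckRegistered()
{
    assert( std::strcmp( TypeInfoImpl< Vector3 >::GetType()->Name, "Vector3" ) == 0 );
}
```

Note that the registrant is a plain object definition with no trailing parentheses. Writing `static Register<T> name();` would declare a function instead and nothing would ever get registered.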

Shared libraries are slightly more problematic.

Windows DLLs get their own copy of global and static variables. This means that unless we attach __declspec( dllexport ) / __declspec( dllimport ) as appropriate to our symbols, they are linked into each module individually, i.e. those static TypeInfos in the GetType() function are unique to each module.

Right now Type<float>() will return a different pointer depending on which module called it. The standard C++ RTTI has the same issue, which is why std::type_info objects should be compared with operator== (or via name() or hash_code()) rather than by address.

We could add __declspec( dllexport ) to the TypeInfoImpl template, which will work so long as we are happy linking to and declaring all our types in the same module. However, on a large code base it is probably preferable to be able to define types in multiple modules and load them dynamically without any linkage shenanigans.

This means I could write a serialisation function in one module that can read or write any type from any module, without actually linking to the implementation or including the headers. This is the power of reflection!

The solution here is to create all the TypeInfos in one place during registration and for GetType() to look up the type by name and hold the pointer, rather than the TypeInfo itself. For loading and script binding we will need to keep a registry of types we can look up by name anyway so this ties in nicely.

Something like this ...
    __declspec( dllexport ) const TypeInfo* FindOrCreate( const char* );

    template< typename T >
    struct TypeInfoImpl : public TypeInfo
    {
        // either find or create the metadata entry based on its type name
        static const TypeInfo* GetType()
        {
            static const TypeInfo* info = FindOrCreate( typeid(T).name() );
            return info;
        }

        // we specialise Create() per type and call it once from 
        // the module that registers it
        void Create();
    };
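FindOrCreate() itself is only declared above, so here is a rough sketch of one way it could be written. The Registry() function, the std::map storage and the std::string Name member are all assumptions of mine. The export decoration is left off so it compiles anywhere, and there is no locking because registration is assumed to happen during global construction.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct TypeInfo
{
    std::string Name;   // placeholder for the real metadata
};

// Hypothetical registry: one TypeInfo per unique name, owned in a single place.
// A function-local static avoids static initialisation order problems when
// FindOrCreate() is called from other global constructors.
inline std::map< std::string, std::unique_ptr< TypeInfo > >& Registry()
{
    static std::map< std::string, std::unique_ptr< TypeInfo > > registry;
    return registry;
}

// In a real build this would live in one module and be exported.
inline const TypeInfo* FindOrCreate( const char* name )
{
    // operator[] inserts an empty slot the first time a name is seen.
    std::unique_ptr< TypeInfo >& entry = Registry()[ name ];
    if( !entry )
    {
        entry.reset( new TypeInfo() );
        entry->Name = name;
    }
    return entry.get();
}

// Hypothetical check, not part of the original code.
inline void RegistryDemo()
{
    const TypeInfo* first  = FindOrCreate( "Vector3" );
    const TypeInfo* second = FindOrCreate( "Vector3" );
    assert( first == second );   // same name, same entry
}
```

Because the entries are heap allocated and owned by the map, the pointers handed out stay valid as more types are added.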
But hang on … doesn’t typeid(T).name() require the standard RTTI to be enabled? Yes it does. But didn’t you say most games turn off RTTI because games programmers are paranoid control freaks? Well that sounds like something I would say.

Every game I have worked on has had RTTI disabled. Maybe the reasoning for turning it off does not really hold any more but old habits die hard so let us assume we cannot rely on it being available. This means we need another way of identifying the type.

I have seen some implementations "declare" types in headers to solve this problem, which is OK I suppose but I am getting old and having more things to remember is too much for my little brain. I like the elegance of it "just working".

A neat little trick is to use the __PRETTY_FUNCTION__ identifier (or, if we are using MSVC, the __FUNCSIG__ macro). This gives us the full signature of the enclosing function as a string, including the template arguments. On MSVC it looks something like this ...
    "const struct TypeInfo *__cdecl TypeInfoImpl<struct Vector3>::GetType(void)"
So we can use this to create a unique identifier for the type.
    static const TypeInfo* GetType()
    {
        #ifdef  _MSC_VER
            static const TypeInfo* info = FindOrCreate( __FUNCSIG__ );
        #else
            static const TypeInfo* info = FindOrCreate( __PRETTY_FUNCTION__ );
        #endif

        return info;
    }
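If you want to convince yourself that the signature really is different per type, a small sketch like this does the job. TYPE_SIGNATURE, TypeSignature() and SignatureDemo() are names I have made up for the example.

```cpp
#include <cassert>
#include <cstring>

#ifdef _MSC_VER
    #define TYPE_SIGNATURE __FUNCSIG__
#else
    #define TYPE_SIGNATURE __PRETTY_FUNCTION__
#endif

struct Vector3 { float x, y, z; };

// Returns the decorated signature of this instantiation, which embeds T.
template< typename T >
const char* TypeSignature()
{
    return TYPE_SIGNATURE;
}

// Hypothetical check, not part of the original code.
inline void SignatureDemo()
{
    // Different types produce different strings ...
    assert( std::strcmp( TypeSignature<float>(), TypeSignature<Vector3>() ) != 0 );

    // ... and the type name appears somewhere inside the string.
    assert( std::strstr( TypeSignature<Vector3>(), "Vector3" ) != nullptr );
}
```

The exact text is compiler specific, so the strings are only useful as identifiers when every module is built with the same compiler.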
So that is the basic framework for our registry of types that supports dynamic loading and multithreading.

Working with run-time types (as opposed to the static compile time types so far) is just a case of adding a virtual function to each class that requires RTTI.
    virtual const TypeInfo* GetType() const { return Type( this ); }
Simples.
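As a quick illustration of the difference between the static and run-time type, here is a small sketch. The Shape and Circle classes and RuntimeTypeDemo() are hypothetical, and TypeInfo is left empty to keep it short.

```cpp
#include <cassert>

struct TypeInfo {};

template< typename T >
struct TypeInfoImpl : public TypeInfo
{
    static const TypeInfo* GetType()
    {
        static TypeInfoImpl<T> info;
        return &info;
    }
};

template< typename T > inline const TypeInfo* Type()
{
    return TypeInfoImpl<T>::GetType();
}

template< typename T > inline const TypeInfo* Type( const T* )
{
    return TypeInfoImpl<T>::GetType();
}

// Each class overrides GetType(), so 'this' has the right static type.
struct Shape
{
    virtual ~Shape() {}
    virtual const TypeInfo* GetType() const { return Type( this ); }
};

struct Circle : public Shape
{
    const TypeInfo* GetType() const override { return Type( this ); }
};

// Hypothetical check, not part of the original code.
inline void RuntimeTypeDemo()
{
    Circle circle;
    const Shape* shape = &circle;

    assert( Type( shape ) == Type<Shape>() );       // static type of the pointer
    assert( shape->GetType() == Type<Circle>() );   // run-time type of the object
}
```

The catch is that every derived class has to repeat the override, otherwise it quietly reports its parent's type.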

All that remains is to fill in the TypeInfo struct, which our snazzy clang tool will take care of for us, but you can take a sneak peek at the end result here.

One final thing worth some discussion is offsetof().

Part of type introspection is to look up member variables of a class by name and get pointers to them. A typical implementation is to use the offsetof() macro.

This works something like this ...
    #define offsetof( Type, Member ) ((size_t)(&((Type*)nullptr)->Member))

    struct MyThingy
    {
         float someField;
    };

    size_t offset = offsetof( MyThingy, someField );

    MyThingy thingy;
    float* pSomeField = (float*) ( (char*)(&thingy) + offset );
The idea is that we store the offset in bytes for a given member variable alongside the name, so at run-time we can work out the actual address given an instance of that class. This works for plain old data structures and probably most cases we care about.
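A name-to-offset table along those lines could be sketched like this, using the standard offsetof from <cstddef>. FieldOffset, kMyThingyFields, FieldAddress() and the extra otherField member are my own inventions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct MyThingy
{
    float someField;
    int   otherField;
};

// Hypothetical record pairing a member name with its byte offset.
struct FieldOffset
{
    const char* Name;
    std::size_t Offset;
};

static const FieldOffset kMyThingyFields[] =
{
    { "someField",  offsetof( MyThingy, someField )  },
    { "otherField", offsetof( MyThingy, otherField ) },
};

// Looks a member up by name and returns its address, or nullptr if unknown.
inline void* FieldAddress( void* object, const char* name )
{
    for( const FieldOffset& field : kMyThingyFields )
    {
        if( std::strcmp( field.Name, name ) == 0 )
            return static_cast< char* >( object ) + field.Offset;
    }
    return nullptr;
}

// Hypothetical check, not part of the original code.
inline void OffsetDemo()
{
    MyThingy thingy = { 1.0f, 2 };
    assert( FieldAddress( &thingy, "someField" ) == &thingy.someField );
}
```

This is only well defined for standard-layout types like MyThingy.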

However, this falls over for more complicated data layouts (virtual inheritance) and is dubious with multiple inheritance.

As we are going to generate the TypeInfo data anyway we can do better by adding a getter function that will work with virtual inheritance. I also have the vague notion that we could override this with some kind of per-type customisation in the future, which will probably never happen but humour me :)

So for fields, we add a getter function ...
    struct Field
    {
        const char*     Name;
        const TypeInfo* Type;
        void* (*Get )( void* ); // getter function (takes pointer of object)
    };
And we can generate a custom lambda to fill it in.
    fields[ 2 ].Get = []( void* o ) -> void* { return &static_cast< Vector3* >( o )->z; };
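To show the getter approach coping with virtual inheritance, here is a small sketch. The Base / Left / Right / Diamond hierarchy, kDiamondFields and GetFieldPtr() are hypothetical, and the Type member of Field is left out to keep things short.

```cpp
#include <cassert>
#include <cstring>

// A diamond with a virtual base, where a single fixed offset is not reliable.
struct Base            { float x = 0.0f; };
struct Left  : virtual Base {};
struct Right : virtual Base {};
struct Diamond : Left, Right { float w = 0.0f; };

struct Field
{
    const char* Name;
    void* (*Get)( void* );   // getter function (takes pointer to object)
};

// Captureless lambdas convert to plain function pointers; the compiler
// works out how to reach the virtual base for us.
static const Field kDiamondFields[] =
{
    { "x", []( void* o ) -> void* { return &static_cast< Diamond* >( o )->x; } },
    { "w", []( void* o ) -> void* { return &static_cast< Diamond* >( o )->w; } },
};

// Looks a member up by name and returns its address, or nullptr if unknown.
inline void* GetFieldPtr( void* object, const char* name )
{
    for( const Field& field : kDiamondFields )
    {
        if( std::strcmp( field.Name, name ) == 0 )
            return field.Get( object );
    }
    return nullptr;
}

// Hypothetical check, not part of the original code.
inline void GetterDemo()
{
    Diamond d;
    float* px = static_cast< float* >( GetFieldPtr( &d, "x" ) );
    assert( px == &d.x );
}
```

The pointer passed in must really point at the most-derived type the getter was generated for, since the void* carries no type information to check against.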
I have missed out a few details but hopefully this covers the important points.


[1] Just a little aside about the registrant pattern in C++. If the registrant classes are included directly in an exe or DLL then they will work fine. It is possible for them to be missed out if they are contained in a static library. Static libraries are really just a collection of object files, and linkers only pull in an object file if a symbol from that file is referenced, so a registrant that nothing refers to can be silently dropped. Working around that is outside the scope of this article ;)
