Clangerizing (Part 3)

After a long and tedious disquisition on the specifics of implementing an RTTI system, we get to the fun (!?) stuff ... generating the data, or clangerizing as I am calling it.

This is a two-phase process:
1. Using clang to parse the source code and extract symbols
2. Running a script over the output to generate the C++ code
As I often change my mind about the RTTI implementation, having the second part as a script makes it easier to pander to my whimsical nature.

For the clang bit, we are going to write a recursive AST visitor. There are plenty of examples of how to do this online already but one more won't hurt :) The sample code for this post is on github.

Clang is awesome: it is modular, and it is easy to build tools that plug into its underlying libraries. Getting it to compile, on the other hand, not so much.

Once you have compiled all the source, the easiest way to get started is to copy and modify an existing tool from the llvm/tools/clang/tools folder. You'll also need to modify the CMakeLists.txt in the folder above and regenerate the makefiles to add your tool to the list of things to be built.

Pro-tip! The LLVM/Clang makefiles have a "fast" option if you just want to build a single project. For example:
make reflector/fast
So to the code ...

Getting clang to do its thing is just a case of parsing the command line, creating a ClangTool instance and running it with a frontend action that gets invoked for every file to be processed.
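#include "clang/Tooling/CompilationDatabase.h"
#include "clang/Tooling/Tooling.h"

using namespace clang;
using namespace clang::tooling;

int main( int argc, const char* argv[] )
{
    // compilation options (everything after "--"); this also updates argc to
    // exclude them. Newer LLVM versions add a std::string& ErrorMsg argument.
    auto options = FixedCompilationDatabase::loadFromCommandLine( argc, argv );

    if( !options || argc < 2 )
        return 1;   // usage: reflector <file> -- <compiler options>

    // file to parse
    std::vector< std::string > files;
    files.push_back( argv[1] );

    // run tool, creating a ReflectFrontendAction for each file
    ClangTool Tool( *options, files );
    return Tool.run( newFrontendActionFactory< ReflectFrontendAction >().get() );
}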


The FixedCompilationDatabase is just a complicated way of saying "read my compilation options from the command line" (defines, include paths, etc.). Normally clang tools look for these in a compile_commands.json file (which CMake can generate for you), but if you are not using CMake I find this an easier way to integrate the tool into my build environment.

Note that compilation options come after a "--" on the command line, so invoking your tool would look something like this ...
reflector myfile.cpp -- -Wall -Isome/path -DDEBUG=1
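To make the "--" convention concrete, here is a minimal, hypothetical sketch of the kind of split that happens: everything before "--" is an argument for the tool itself and everything after it is a compiler flag. This is only an illustration, not clang's actual implementation of FixedCompilationDatabase; the CommandLine struct and SplitCommandLine function are made up for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical result of splitting a reflector-style command line.
struct CommandLine
{
    std::vector< std::string > toolArgs;      // e.g. the file(s) to parse
    std::vector< std::string > compilerArgs;  // e.g. -Wall -Isome/path -DDEBUG=1
};

// Illustrative sketch only: split argv at the first "--".
// argv[0] (the program name) is skipped.
CommandLine SplitCommandLine( int argc, const char* const argv[] )
{
    CommandLine result;
    bool afterDoubleDash = false;

    for( int i = 1; i < argc; ++i )
    {
        const std::string arg = argv[i];

        if( !afterDoubleDash && arg == "--" )
        {
            afterDoubleDash = true;   // everything from here on is for the compiler
            continue;
        }

        if( afterDoubleDash )
            result.compilerArgs.push_back( arg );
        else
            result.toolArgs.push_back( arg );
    }

    return result;
}
```

For the invocation above, toolArgs would hold just myfile.cpp and compilerArgs the three flags.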

Our frontend action creates an Abstract Syntax Tree (AST) consumer, which, unsurprisingly, is an interface onto the syntax tree of our source code. There are various overridable functions, but for our purposes we are going to find the top-level tags (structs, classes, unions or enums) and pass off the rest of the work to our AST visitor, which recursively walks the underlying symbols.

// the ASTContext, kept for getCommentForDecl (set in Initialize below)
ASTContext* gContext = nullptr;

class ReflectASTConsumer : public ASTConsumer
{
public:

    virtual bool HandleTopLevelDecl( DeclGroupRef group ) override
    {
        // for each declaration

        for( auto itr = group.begin(); itr != group.end(); ++itr )
        {
            // if it is a "Tag" (class, enum, etc)

            if( auto decl = dyn_cast<TagDecl>( *itr ) )
            {
                // traverse it!

                mVisitor.TraverseDecl( decl );
            }
        }

        return true;
    }

    virtual void Initialize( ASTContext& Context ) override
    {
        // keep hold of context as we need it for getCommentForDecl
        gContext = &Context;
    }

protected:

    ReflectVisitor  mVisitor;   // ReflectVisitor (below) must be defined before this class
};

class ReflectFrontendAction : public ASTFrontendAction
{
public:

    virtual std::unique_ptr< ASTConsumer >
    CreateASTConsumer( CompilerInstance& CI, StringRef file ) override
    {
        return std::unique_ptr< ASTConsumer >( new ReflectASTConsumer() );
    }
};


The visitor class is similar. We implement the Visit function for the declarations we care about and use the API to get the things we want to write out ...

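class ReflectVisitor : public RecursiveASTVisitor< ReflectVisitor >
{
public:

    // when we "visit" a record declaration (struct, class or union) ...

    bool VisitCXXRecordDecl( CXXRecordDecl* decl )
    {
        // skip forward declarations (bases() is only valid on a class with a definition)

        if( !decl->isThisDeclarationADefinition() )
        {
            return true;
        }

        // printing policy used to turn types into strings
        PrintingPolicy pp( decl->getASTContext().getLangOpts() );

        // get type and name

        std::string type = decl->getKindName().str();
        std::string name = decl->getQualifiedNameAsString();

        // get base classes

        for( auto& base : decl->bases() )
        {
            std::string baseType = base.getType().getAsString( pp );
        }

        // get fields

        for( const auto& field : decl->fields() )
        {
            std::string fieldName = field->getName().str();
            std::string fieldType = field->getType().getAsString( pp );
        }

        // ... write type, name, bases and fields out here

        return true;
    }
};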
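RecursiveASTVisitor uses the curiously recurring template pattern (CRTP): the base class is templated on our derived class, so it can call our Visit functions statically, and we only write the ones we care about. The toy below is a heavily simplified, hypothetical illustration of that idea using made-up Node, ToyRecursiveVisitor and FieldCollector types; it is not how clang actually implements its traversal.

```cpp
#include <string>
#include <vector>

// Toy AST: a node is either a record (with children) or a field.
struct Node
{
    enum class Kind { Record, Field } kind;
    std::string name;
    std::vector< Node > children;
};

// Simplified CRTP visitor: the base walks the tree and calls the derived
// class's Visit functions. The defaults do nothing and keep walking.
template< typename Derived >
class ToyRecursiveVisitor
{
public:
    bool TraverseNode( const Node& node )
    {
        const bool keepGoing = ( node.kind == Node::Kind::Record )
            ? getDerived().VisitRecord( node )
            : getDerived().VisitField( node );

        if( !keepGoing )
            return false;   // returning false from a Visit function aborts the walk

        for( const auto& child : node.children )
        {
            if( !TraverseNode( child ) )
                return false;
        }
        return true;
    }

    // default implementations, hidden by any same-named function in Derived
    bool VisitRecord( const Node& ) { return true; }
    bool VisitField( const Node& ) { return true; }

private:
    Derived& getDerived() { return *static_cast< Derived* >( this ); }
};

// Like ReflectVisitor, we only implement the Visit function we care about.
class FieldCollector : public ToyRecursiveVisitor< FieldCollector >
{
public:
    bool VisitField( const Node& node )
    {
        fields.push_back( node.name );
        return true;
    }

    std::vector< std::string > fields;
};
```

Because the dispatch is resolved at compile time, there is no virtual call per node, which matters when walking an AST with many thousands of declarations.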


That is pretty much all there is to walking our code and reading the declarations. However, there is one final step we need to add before we are done.

Our reflection library requires us to be able to add extra annotations to the source code: things that are not present in the AST but are required for validation or context, for example ranges of values ...

/// min=0, max=100
float someField;
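Once the text of such a comment has been extracted, turning it into usable values is plain string processing. A minimal sketch, assuming the annotation is a comma-separated list of key=value pairs as in the example above; ParseAnnotations and Trim are hypothetical helpers, not part of clang or the reflector source.

```cpp
#include <map>
#include <sstream>
#include <string>

// Trim leading and trailing whitespace.
static std::string Trim( const std::string& s )
{
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of( ws );
    if( first == std::string::npos )
        return "";
    const auto last = s.find_last_not_of( ws );
    return s.substr( first, last - first + 1 );
}

// Hypothetical helper: parse "min=0, max=100" into { min -> "0", max -> "100" }.
// Entries without an '=' are ignored; values are kept as strings.
std::map< std::string, std::string > ParseAnnotations( const std::string& text )
{
    std::map< std::string, std::string > result;
    std::stringstream stream( text );
    std::string entry;

    while( std::getline( stream, entry, ',' ) )
    {
        const auto equals = entry.find( '=' );
        if( equals == std::string::npos )
            continue;

        result[ Trim( entry.substr( 0, equals ) ) ] = Trim( entry.substr( equals + 1 ) );
    }
    return result;
}
```

Keeping the values as strings leaves the conversion (float, int, enum name, ...) to whichever code knows the field's type.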

Conveniently, clang provides a getCommentForDecl function for just this kind of purpose. It returns any "Doxygen-style" comment immediately above the declaration (or, for a class without one, the comment from its base class; inherited comments are very convenient![1]).

FYI, these are special comment blocks that start with three slashes or with a slash and two asterisks, i.e. they look like this ...

/// this is a special comment

or

/**
* so is this ...
*/
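As a rough illustration of that rule, the hypothetical IsDocComment check below classifies a raw comment string by its opening characters. It is deliberately simplified and is not clang's own logic; clang also treats "//!" and "/*!" comments as documentation, which this sketch ignores.

```cpp
#include <string>

// Simplified check mirroring the rule described above: a doc comment starts
// with "///" or "/**". (Clang itself also accepts "//!" and "/*!".)
bool IsDocComment( const std::string& comment )
{
    auto startsWith = [&comment]( const std::string& prefix )
    {
        return comment.compare( 0, prefix.size(), prefix ) == 0;
    };

    // "/**/" starts with "/**" but is really just an empty ordinary comment
    if( comment == "/**/" )
        return false;

    return startsWith( "///" ) || startsWith( "/**" );
}
```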


NB: in the case of my reflection code, I look for an additional %% to mean a special "reflection instruction".
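The exact syntax of these instructions isn't spelled out here, so the following is only a sketch under an assumed convention: everything after the first %% in the extracted comment text is the instruction, and a comment without %% is ordinary documentation. ExtractReflectionInstruction is a hypothetical name.

```cpp
#include <string>

// Hypothetical convention: everything after the first "%%" in a doc comment
// is a reflection instruction. Returns false for ordinary comments.
bool ExtractReflectionInstruction( const std::string& comment, std::string& instruction )
{
    const auto marker = comment.find( "%%" );
    if( marker == std::string::npos )
        return false;

    // take the rest of the comment and trim surrounding whitespace
    const std::string rest = comment.substr( marker + 2 );
    const auto first = rest.find_first_not_of( " \t\r\n" );
    const auto last = rest.find_last_not_of( " \t\r\n" );

    instruction = ( first == std::string::npos )
        ? std::string()
        : rest.substr( first, last - first + 1 );
    return true;
}
```

The instruction text can then be parsed however the reflection system requires.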

Comments take a little bit of work to unpack from clang ...

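std::string GetComment( TagDecl* decl )
{
    std::string str;

    auto comment = gContext->getCommentForDecl( decl, nullptr );

    if( comment == nullptr )
    {
        return str;
    }

    // the full comment is made up of blocks (paragraphs, commands, ...)
    for( auto commentItr = comment->child_begin();
         commentItr != comment->child_end();
         ++commentItr )
    {
        // we are only interested in paragraphs of plain text
        auto paragraph = dyn_cast<comments::ParagraphComment>( *commentItr );

        if( paragraph == nullptr )
        {
            continue;
        }

        for( auto textItr = paragraph->child_begin();
             textItr != paragraph->child_end();
             ++textItr )
        {
            if( auto textComment = dyn_cast<comments::TextComment>( *textItr ) )
            {
                str += textComment->getText().str();
            }
        }
    }

    return str;
}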


And ... huzzah! That is all there is to it :)

Clang handles all the C++ complexities for us and provides a function to get additional "metadata", so it is relatively straightforward to pull out the information we need. The next step is to connect the output from reflector to our RTTI data model, coming up in part 4.

The full source code is on github along with some sample output.

[1] Alternatively, you could use __attribute__ (clang provides an annotate attribute for exactly this). Whilst it is not supported by MSVC, it can easily be defined out with a macro so that only the parser sees it, for example if you wanted to use macros instead of comments.
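For the attribute route, a minimal sketch might wrap clang's annotate attribute in a macro that expands to nothing for other compilers. REFLECT_ANNOTATE and Widget are hypothetical names; in a clang-based tool the annotation would then be found on the declaration as an AnnotateAttr rather than via getCommentForDecl.

```cpp
// Hypothetical macro: attach an annotation that only clang (and therefore a
// clang-based tool such as reflector) will see; other compilers get nothing.
#if defined( __clang__ )
    #define REFLECT_ANNOTATE( text ) __attribute__(( annotate( text ) ))
#else
    #define REFLECT_ANNOTATE( text )
#endif

struct Widget
{
    REFLECT_ANNOTATE( "min=0, max=100" ) float someField;
};
```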