How Thoroughly Are You Testing Your C Extensions?

You probably know how to find Python code that isn't exercised by your tests. Install coverage and run:

$ coverage run --source=SOURCEDIR setup.py test

Then, for a beautiful coverage report:

$ coverage html

But what about your C extensions? They're harder to write than Python, so you'd better make sure they're thoroughly tested. On Linux, you can use gcov. First, recompile your extension with the coverage hooks:

$ export CFLAGS="--coverage"
$ python setup.py build_ext --inplace

In your build directory (named like build/temp.linux-x86_64-2.7) you'll now see some files with the ".gcno" extension. These are gcov's notes files, written at compile time, which describe the structure of your C code. Run your tests again and the directory will fill up with ".gcda" data files that record which parts of your C code were run.
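To confirm that a test run really produced counts for every compiled file, a small script can pair up the two kinds of files. This is only a sketch: coverage_file_status is a hypothetical helper name, and it assumes each ".gcda" file is written next to its ".gcno" file, which is gcc's default.

```python
import os


def coverage_file_status(build_dir):
    """Map each .gcno path under build_dir to True if a sibling .gcda exists.

    Assumption: gcc wrote the .gcda file into the same directory as the
    .gcno file (the default when no output-directory options are used).
    """
    status = {}
    for root, _dirs, files in os.walk(build_dir):
        names = set(files)
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext == ".gcno":
                # A missing .gcda means no code from this object file ran.
                status[os.path.join(root, name)] = (stem + ".gcda") in names
    return status
```

Calling coverage_file_status("build") after the test run returns a dictionary; any entry whose value is False points at a C file that your tests never touched at all.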

You have a number of ways to view this coverage information. I use Eclipse with the gcov plugin installed. (Eclipse CDT includes it by default.) Delightfully, Eclipse on my Mac understands coverage files generated on a Linux virtual machine, with no hassle at all.

lcov can make you some nice HTML reports. Run it like so:

$ lcov --capture --directory . --output-file coverage.info
$ genhtml coverage.info --output-directory out

Here's a portion of its report for PyMongo's BSON decoder:

[Image: lcov table]
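If you'd rather list the unexecuted lines yourself than browse the HTML, the coverage.info tracefile is plain text and easy to scan. The sketch below relies only on the "SF:" (source file), "DA:line,count" and "end_of_record" records of lcov's tracefile format; uncovered_lines is a hypothetical name, and everything else in the file is ignored.

```python
def uncovered_lines(tracefile_text):
    """Return {source_path: sorted line numbers whose hit count is zero}."""
    result = {}
    current = None
    for raw in tracefile_text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            # Start of a per-file record.
            current = line[3:]
            result.setdefault(current, [])
        elif line.startswith("DA:") and current is not None:
            # DA:<line number>,<execution count>[,<checksum>]
            fields = line[3:].split(",")
            lineno, hits = int(fields[0]), int(fields[1])
            if hits == 0:
                result[current].append(lineno)
        elif line == "end_of_record":
            current = None
    return {path: sorted(lines) for path, lines in result.items()}
```

Pass it the contents of the tracefile, for example uncovered_lines(open("coverage.info").read()), to get the line numbers worth writing tests for.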

Our C code coverage is significantly lower than our Python coverage. This is mainly because such a large portion of the C code is devoted to error handling: it checks for every possible error, but we only trigger a subset of all possible errors in our tests.

A trivial example is in _write_regex_to_buffer, when we ensure the buffer is large enough to hold 4 more bytes. We check that realloc, if it was called, succeeded:

[Image: lcov source: No Memory]

We don't run out of memory during our tests, so these two lines of error-handling are never run. A more realistic failure is in decode_all:

[Image: lcov source]

This is the error handler that runs when a message is shorter than five bytes. Evidently the size check runs 56,883 times during our tests, but this particular error never occurs, so the error handler isn't tested. This is the sort of insight that'd be onerous to attain without a tool like gcov.
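Closing a gap like this usually takes one small test that feeds the decoder a truncated message. Here is a sketch of the idea. To keep it self-contained it does not call PyMongo: split_documents is a hypothetical pure-Python stand-in that performs only a five-byte size check and a length-prefix check, and its error messages are invented for illustration.

```python
import struct
import unittest

# Assumption for this sketch: 4-byte length prefix plus one trailing byte.
MIN_DOCUMENT_SIZE = 5


def split_documents(data):
    """Split concatenated length-prefixed documents, checking sizes only."""
    documents = []
    position = 0
    while position < len(data):
        remaining = len(data) - position
        if remaining < MIN_DOCUMENT_SIZE:
            # The kind of branch that never ran in the report above.
            raise ValueError("message shorter than five bytes")
        (size,) = struct.unpack_from("<i", data, position)
        if size < MIN_DOCUMENT_SIZE or size > remaining:
            raise ValueError("invalid document size")
        documents.append(data[position:position + size])
        position += size
    return documents


class ShortMessageTest(unittest.TestCase):
    def test_valid_documents_are_split(self):
        empty = b"\x05\x00\x00\x00\x00"
        self.assertEqual(split_documents(empty * 2), [empty, empty])

    def test_truncated_message_is_rejected(self):
        # Each input leaves fewer than five bytes for a document.
        for message in (b"\x05", b"\x05\x00\x00\x00",
                        b"\x05\x00\x00\x00\x00\x01"):
            with self.assertRaises(ValueError):
                split_documents(message)
```

Run it with python -m unittest. Against a real extension, the same truncated inputs would go to the C function instead, and the next gcov run should show the handler's lines as executed.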

Try it for yourself and see: are you testing your C code as thoroughly as your Python?

You might also like my article on automatically detecting refcount errors in C extensions, or the one on making C extensions compatible with mod_wsgi.