(Val)grinding your code

One of the more esoteric tools for working on software projects, and one that developers should be using more often than they do, is certainly Valgrind. It’s a multi-faceted tool: timing profiler, memory profiler, race condition detector, and more. It can do all this because it’s mainly a simulated CPU, which executes the code while keeping track of what happens and replacing some basic functions such as the allocation routines.

This works pretty well, up to a point. Unfortunately there are a few issues: it needs to know about every instruction used in the program (otherwise the program is killed with SIGILL), and right now it does not know about SSE4 on x86 and x86-64 processors. It also has to know about the places in the C library where it cannot keep track correctly, which is handled through a series of suppression files. It turns out that it currently does not properly know about the changes in glibc 2.10, which makes it pretty difficult to find the real issues right now with the most bleeding-edge tools available.

But besides these problems with modern tools and hardware, there are other interesting edge cases; one of these, the one I’m going to write about today, relates to the way the memory checker tool identifies leaks in the software: all the memory areas that cannot be reached in some way at the end of the execution are considered leaked. Unfortunately this often enough includes memory areas that are allocated once and would only be freed at the end of the process, which arguably are not leaks at all.

Memory areas that are only to be freed at the end of a process don’t really need to be freed at all: the operating system will take care of that. Real leaks are those where memory is allocated and then neither freed nor used again during execution; especially bad are the leaks that happen inside loops or on every similar input, because they build up.

Actually freeing resources that do not need to be freed increases the size of the executable section of a binary (executable or library) for no good reason in production use, so it is often considered a bad idea. At the same time, freeing them makes it easier to find the actual bad leaks in the software. My choice is to write the source code that does the freeing, but only compile it when debug code is enabled, just like assertions and debug output.
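As a minimal sketch of that idea (this is not the feng code: the names are made up, and I’m assuming NDEBUG as the switch between debug and production builds, the same one assert() uses):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical long-lived table: allocated once, used until the process exits. */
static char *lookup_table;

/* Allocate the zero-filled table at startup.
 * Returns 0 on success, -1 if the allocation fails. */
int lookup_table_init(size_t size)
{
    assert(lookup_table == NULL);   /* debug-only check: initialize only once */

    lookup_table = calloc(size, 1);
    return lookup_table != NULL ? 0 : -1;
}

/* Meant to be called at the very end of main().
 * With NDEBUG defined (production build) the body is empty and the
 * operating system reclaims the memory when the process exits. */
void lookup_table_cleanup(void)
{
#ifndef NDEBUG
    free(lookup_table);
    lookup_table = NULL;
#endif
}
```

A debug build frees the table, so it no longer shows up in the leak report; a build with -DNDEBUG carries no deallocation code at all.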

This way, the production builds won’t have unneeded code, while the debug builds will give proper results with tools like Valgrind. But it doesn’t stop at freeing the resources at the end of the main() function, because sometimes you don’t keep a reference to the memory areas allocated during initialization (since they are not strictly needed), or you just keep them in static local variables; in both cases they end up in the leak report at the end of the process, and are not easy to free. To work around this, in feng, I’ve been using a slightly more sophisticated solution, making use of a feature of recent GCC and ICC versions: destructor functions.

Destructor functions are called within the fini execution path, after main() returns and before the process is unloaded and its resources are freed automatically. Freeing the memory at that point is the ideal way to make sure that all the resources are freed properly. Of course, this restricts the cases where the code is enabled to those with both the debug system enabled and a compiler that supports destructor functions. But this is broad enough to allow proper development. A simple check with autoconf and the thing is done.
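Roughly, this could look like the following sketch. The names are hypothetical, and HAVE_DESTRUCTOR_ATTRIBUTE is an assumed macro standing in for whatever the autoconf check defines; only __attribute__((destructor)) itself is the actual GCC feature.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical file-scope state: nothing outside this unit keeps a
 * reference to it, so main() has no way to free it directly. */
static int *contexts;
static size_t contexts_count;

/* Explicit initialization with a size chosen at runtime.
 * Returns 0 on success, -1 if the allocation fails. */
int contexts_init(size_t count)
{
    assert(contexts == NULL);   /* debug-only check: initialize only once */

    contexts = calloc(count, sizeof(*contexts));
    if (contexts == NULL)
        return -1;

    contexts_count = count;
    return 0;
}

/* Only built for debug builds on compilers known to support destructors.
 * HAVE_DESTRUCTOR_ATTRIBUTE is an assumption, not a standard macro. */
#if !defined(NDEBUG) && defined(HAVE_DESTRUCTOR_ATTRIBUTE)
static void __attribute__((destructor)) contexts_cleanup(void)
{
    /* Runs after main() returns (or exit() is called), before the
     * process goes away; free(NULL) is harmless if init never ran. */
    free(contexts);
    contexts = NULL;
    contexts_count = 0;
}
#endif
```

To try it outside of a configured build, compile with -DHAVE_DESTRUCTOR_ATTRIBUTE; nothing ever calls contexts_cleanup() by hand.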

By replacing the local static variables with unit-static variables (which, once compiled, are basically the same thing), and adding a cleanup function, almost all memory can be freed without having to call a deallocation function explicitly. There is still the matter of initializing these memory areas; here there are two options: either you initialize them explicitly, or you initialize them the first time they are needed. Both options allow the size of the structure to be chosen at runtime (from a configuration file), for example the number of configuration contexts to initialize. A third way, initializing them with constructor functions, also works well for fixed-size memory areas, but since I don’t see a good reason why they would be better than just calling a function during initialization, I don’t take them into consideration.

For dynamic initialization, the interface usually used is the “once” interface, which also lets multithreaded applications have run-once blocks of code without having to fiddle with mutexes and locks; it is implemented by pthread_once() and the pthread_once_t type from the POSIX threads interface, or by GOnce as provided by glib. This allows delayed allocation and initialization of structures, which is especially useful for parts of the code that are not used constantly but only in some specific cases.
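With the POSIX variant, delayed initialization could be sketched like this (the buffer, its size and the function names are all made up for illustration):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SCRATCH_SIZE 256   /* hypothetical size, fixed here for brevity */

static char *scratch_buffer;
static pthread_once_t scratch_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, no matter how many threads race to get here. */
static void scratch_init(void)
{
    scratch_buffer = calloc(SCRATCH_SIZE, 1);
}

/* Returns the shared buffer, allocating it on first use.
 * The result is NULL if the one-time allocation failed. */
char *scratch_get(void)
{
    int rc = pthread_once(&scratch_once, scratch_init);

    assert(rc == 0);   /* pthread_once() returns 0 on success */
    (void)rc;          /* avoids an unused-variable warning with NDEBUG */

    return scratch_buffer;
}
```

Code paths that never call scratch_get() never pay for the allocation, and a debug-only destructor function can still free scratch_buffer at exit. Depending on the C library, linking may need -pthread.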

All the code can be found in your local feng repository or the nearest git server!