Valgrind and the R memory manager

Valgrind is a set of tools for detecting memory management bugs. Previously it ran only on x86 Linux, but version 3.0 supports AMD64 Linux, and support for FreeBSD and for PowerPC Linux is under development. Typically Valgrind is used with unmodified binaries: it runs the binary in a CPU emulator and tracks memory allocations and initialisations. This approach is limited when a program does its own memory management. In R, memory becomes inaccessible to a correctly functioning program when it is garbage collected, and integer, logical and numeric vectors are uninitialized when allocated, but Valgrind does not know this.

Valgrind provides a `client request mechanism' for programs to provide information about their own memory management. This has been added to R-devel. There are four levels of instrumentation, governed by the macro VALGRIND_LEVEL.

There is a configure option to set VALGRIND_LEVEL,
configure --with-valgrind-instrumentation=## 
where ## can be 0, 1, 2, or 3. The default is 0. At the moment there is no configuration check that the platform is compatible with Valgrind when a level > 0 is specified, so any problems will appear at compile time. Problems are possible on x86 platforms other than Win32 and Linux, and on PowerPC platforms other than Linux.
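
As a rough illustration of the client request mechanism, the C sketch below shows how a program that manages its own pool could describe that pool to Valgrind's Memcheck tool. It is not R's memory manager: the pool, TOY_VALGRIND_LEVEL, toy_alloc_numeric(), toy_release_all() and toy_pool_used() are hypothetical names, and the single instrumented level used here is an assumption that does not correspond to R's VALGRIND_LEVEL values. VALGRIND_MAKE_MEM_UNDEFINED and VALGRIND_MAKE_MEM_NOACCESS are the request names in current <valgrind/memcheck.h>; older Valgrind releases used different names. With the level left at 0 the requests compile away, so the file builds without the Valgrind headers.

```c
/* Sketch only: a toy pool, not R's memory manager.
   All toy_* names and TOY_VALGRIND_LEVEL are hypothetical. */
#include <assert.h>
#include <stddef.h>

#ifndef TOY_VALGRIND_LEVEL
#define TOY_VALGRIND_LEVEL 0            /* 0 = no instrumentation */
#endif

#if TOY_VALGRIND_LEVEL > 0
#include <valgrind/memcheck.h>          /* needs the Valgrind headers */
#else
/* Without instrumentation the client requests expand to nothing useful. */
#define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) 0
#define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  0
#endif

#define POOL_LEN 256
static double pool[POOL_LEN];           /* storage the program manages itself */
static size_t pool_used = 0;            /* number of doubles handed out */

/* Hand out the next len doubles from the pool, or NULL if they do not fit. */
double *toy_alloc_numeric(size_t len)
{
    double *p;
    if (len > POOL_LEN - pool_used)
        return NULL;
    p = pool + pool_used;
    pool_used += len;
    /* Recycled pool memory: addressable, but holding no valid data yet. */
    (void) VALGRIND_MAKE_MEM_UNDEFINED(p, len * sizeof(double));
    return p;
}

/* Release the whole pool at once, as a stand-in for a garbage collection. */
void toy_release_all(void)
{
    assert(pool_used <= POOL_LEN);
    /* A correct program must not touch the pool until it allocates again. */
    (void) VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof pool);
    pool_used = 0;
}

/* Report how many doubles are currently allocated. */
size_t toy_pool_used(void)
{
    return pool_used;
}
```

If this were built with -DTOY_VALGRIND_LEVEL=1 and run under Valgrind, a use of an element that was never stored to, or any access to the pool after toy_release_all(), is the kind of error Memcheck could then report. Without the client requests the pool is an ordinary static array and neither mistake would be visible.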

Every nonzero level of instrumentation will catch more bugs when used in conjunction with gctorture(TRUE). I have added targets test-Valgrind and test-Vgct to tests/Makefile. These run the same code as test-Gct under Valgrind and under Valgrind + gctorture() respectively, and they report all messages from Valgrind to standard output.

It may be useful to add a further level of instrumentation to cover the header fields of the memory nodes.


Running test-Gct and no-segfault.R under Valgrind has found five bugs so far. One is purely theoretical (when using unary "!" with two arguments). Two are briefly unprotected pointers that might theoretically cause heisenbugs at some point. The final two are real, if not terribly major: parse(,n=0) used a status variable that was never set, and regexpr applied STRING_ELT to the pattern argument before coercing it to a string, so that a logical NA and a character NA are treated differently:
> regexpr(NA,"NANA")
[1] 1
attr(,"match.length")
[1] 2
> regexpr(as.character(NA),"NANA")
Error in regexpr(pattern, text, extended, fixed, useBytes) :
        invalid argument

Thomas Lumley. 2005-8-9