Valgrind

Valgrind Command-line Options

--tool=name
    Use the Valgrind tool named name. The default is memcheck.
-v
    Verbose mode.
-d
    Show debug info.
--trace-children=no|yes
    Valgrind-ise child processes (follow execve).
--track-fds=no|yes
    Track open file descriptors.
--trace-malloc=no|yes
    Trace client malloc.
--time-stamp=no|yes
    Add timestamps to log messages.
--log-fd=number
    Log messages to file descriptor number (default 2, i.e. stderr).
--log-file=file
    Log messages to the file named file.
--log-socket=ipaddr:port
    Log messages to the socket at ipaddr:port.
--demangle=no|yes
    Automatically demangle C++ names (for certain tools only).
--read-var-info=yes|no
    Use debug info on stack and global variables for better error messages (for certain tools only).
--trace-flags=XXXXXXXX
    Show a trace after the selected phases. Each X is 0 or 1 and stands for one phase:
      10000000: Show after the 1st phase
      01000000: Show after the 2nd phase
      ...
--trace-notbelow=number
    Do not show traces for blocks numbered below number.
--profile-flags=XXXXXXXX
    Similar to --trace-flags, but shows the generated IR code after the selected phases.
--debug-dump=level
    Dump debug info. level can be syms, line, or frames. Must be used with the -d option.
--trace-syscalls=no|yes
    Show system call details (like strace).
--trace-signals=no|yes
    Show signal handling details.
--trace-symtab=no|yes
    Show symbol table details.
--trace-cfi=no|yes
    Show call-frame-info details.
--trace-redir=no|yes
    Show redirection details.
--trace-sched=no|yes
    Show thread scheduler details.
--wait-for-gdb=yes|no
    Pause on startup to wait for gdb to attach.
--sym-offsets=yes|no
    Show symbols in the form 'name+offset'.
--vex-iropt-verbosity=n
    Show IR optimization details. n ranges from 0 to 9.
--vex-iropt-level=n
    Control the IR optimization level. n ranges from 0 to 2; n=0 means no optimization.
--vex-iropt-precise-memory-exns=no|yes
    Require precise exception handling.
--vex-iropt-unroll-thresh=n
    Threshold for loop unrolling; higher values make VEX unroll more aggressively. Default is 120.
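
The --trace-flags argument is a string of eight 0/1 digits, one per translation phase, with the leftmost digit standing for the 1st phase. The following is a minimal sketch of how such a string could be decoded into a bit mask. The helpers parse_phase_flags and phase_enabled are hypothetical names made up for this illustration; this is not Valgrind's own option parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode an 8-character string of '0'/'1' digits
   into a bit mask. The leftmost digit maps to the highest bit (0x80),
   so "10000000" selects the 1st phase. Returns -1 on malformed input. */
int parse_phase_flags(const char *s)
{
    int mask = 0;
    int i;

    if (s == NULL)
        return -1;
    for (i = 0; i < 8; i++) {
        if (s[i] == '1')
            mask |= 1 << (7 - i);
        else if (s[i] != '0')
            return -1;      /* invalid character, or string too short */
    }
    return s[8] == '\0' ? mask : -1;   /* reject strings longer than 8 */
}

/* Hypothetical helper: is the given phase (1..8) selected in the mask? */
int phase_enabled(int mask, int phase)
{
    assert(phase >= 1 && phase <= 8);
    return (mask >> (8 - phase)) & 1;
}
```

With this encoding, parse_phase_flags("01000000") yields a mask in which only phase 2 is enabled.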

Valgrind Core

(Disclaimer: Much of the following text is taken from the papers "Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation" by Nicholas Nethercote and Julian Seward, and "Optimizing Binary Code Produced by Valgrind" by Filipe Cabecinhas, Nuno Lopes, and Renato Crisostomo.)

Valgrind's core is split in two: CoreGrind and VEX. VEX is responsible for dynamic code translation and for calling the tools' hooks for IR (Intermediate Representation) instrumentation, while CoreGrind is responsible for the rest (dispatching, scheduling, block cache management, symbol name demangling, etc.).

VEX translates code in the following phases:

  1. Code Disassembly
    Conversion of the machine code to VEX's machine-independent IR. The IR is based on static single assignment (SSA) form and has some RISC-like features. Most machine instructions are disassembled into more than one IR opcode.

    For x86_64, this phase's source files are

  2. IR Optimization
    Some standard compiler optimizations are applied to the IR, including dead code elimination, constant folding, common subexpression elimination, etc.

    The main source file is ir_opt.c, and the main entry function is do_iropt_BB. The level of optimization can be controlled by Valgrind's command-line option --vex-iropt-level=n, with n=0 being no optimization at all.

  3. Instrumentation
    VEX calls the Valgrind tool's hooks to instrument the code.
  4. IR Optimization
    Similar to phase 2, but simpler: only dead code elimination is performed.

    This phase's code is inside the LibVEX_Translate function; search for the comment "Do a post-instrumentation cleanup pass".

  5. Tree Building
    Transform the flat IR to tree IR, to simplify the next phase.

    The main source file is ir_opt.c, and the main entry function is ado_treebuild_BB.

  6. Instruction Selection
    Conversion of the IR to machine code. This phase still uses virtual registers.

    For x86_64, this phase's source file is host_amd64_isel.c, and the main function involved is iselStmt. Each statement contains expressions, which in turn contain operators (enum type IROp, defined in libvex_ir.h). For example, the operator Iop_Add32Fx4 (produced in guest_amd64_toIR.c when the SSE instruction ADDPS is disassembled) makes host_amd64_isel.c generate the machine-specific instruction AMD64Instr_Sse32Fx4 with the operator Asse_ADDF (a member of the enum type AMD64SseOp, defined in host_amd64_defs.h). The instruction is also tagged with Ain_Sse32Fx4 to mark it as a vector instruction (here, adding two vectors of 4 single-precision floating-point numbers).

  7. Register Allocation
    Allocates real host registers to virtual registers, using a linear scan algorithm. This phase can create additional instructions for register spills and reloads (especially in register-constrained architectures like x86).

    The main source file is host_generic_reg_alloc2.c, and the main entry function is doRegisterAllocation.

  8. Code Generation
    Generates the final machine code, by simply encoding the previously generated instructions and storing them to a memory block.

    For x86_64, this phase's source file is host_amd64_defs.c, and the main function involved is emit_AMD64Instr. It uses the information passed on from phase 6, e.g. Asse_ADDF and Ain_Sse32Fx4, to generate the machine code for ADDPS.
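
As an illustration of the constant folding mentioned in phase 2, the following is a minimal sketch over a toy flat, single-assignment IR. The Stmt type and the functions fold_constants and demo_fold are hypothetical and far simpler than VEX's real IR and ir_opt.c; they only show the idea that an operation on known constants can be replaced by its result.

```c
#include <assert.h>

enum { MAX_TEMPS = 16 };

typedef enum { ST_CONST, ST_ADD } StmtKind;

/* One statement of the toy IR: either  t<dst> = value
   or  t<dst> = t<lhs> + t<rhs>.  Each temporary is assigned once. */
typedef struct {
    StmtKind kind;
    int dst;        /* temporary being defined */
    int value;      /* ST_CONST: the constant */
    int lhs, rhs;   /* ST_ADD: operand temporaries */
} Stmt;

/* Rewrite every addition whose operands are both known constants into
   a constant statement. Returns the number of statements rewritten. */
int fold_constants(Stmt *stmts, int n)
{
    int known[MAX_TEMPS] = {0};
    int value[MAX_TEMPS] = {0};
    int folded = 0;
    int i;

    for (i = 0; i < n; i++) {
        Stmt *s = &stmts[i];
        assert(s->dst >= 0 && s->dst < MAX_TEMPS);
        if (s->kind == ST_ADD) {
            assert(s->lhs >= 0 && s->lhs < MAX_TEMPS);
            assert(s->rhs >= 0 && s->rhs < MAX_TEMPS);
            if (known[s->lhs] && known[s->rhs]) {
                s->kind = ST_CONST;
                s->value = value[s->lhs] + value[s->rhs];
                folded++;
            }
        }
        if (s->kind == ST_CONST) {      /* includes freshly folded ones */
            known[s->dst] = 1;
            value[s->dst] = s->value;
        }
    }
    return folded;
}

/* Example program: t0 = 2; t1 = 3; t2 = t0 + t1; t3 = t2 + t2.
   Returns the folded value of t3, or -1 if t3 was not folded. */
int demo_fold(void)
{
    Stmt prog[] = {
        { ST_CONST, 0, 2, 0, 0 },
        { ST_CONST, 1, 3, 0, 0 },
        { ST_ADD,   2, 0, 0, 1 },
        { ST_ADD,   3, 0, 2, 2 },
    };
    fold_constants(prog, 4);
    return prog[3].kind == ST_CONST ? prog[3].value : -1;
}
```

Because the statements are processed in order and each temporary is defined before use, a single forward pass is enough in this toy setting; a real optimizer handles many more operators and statement kinds.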

Client & Valgrind

Valgrind has a trapdoor mechanism via which the client program can pass all manner of requests and queries to Valgrind and the current tool. For example, by examining the value of RUNNING_ON_VALGRIND (this is a macro defined in valgrind.h), the client program can tell if it is running on Valgrind or on a real CPU.

(See here for other trapdoors/client requests)
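
A minimal sketch of a client program using this check follows. It assumes that the header is installed as valgrind/valgrind.h and, purely so that the example still compiles on machines without the header, falls back to a constant 0. The helpers running_under_valgrind and pick_iterations are hypothetical names for illustration.

```c
/* Use the real header when it is available. The fallback definition
   below is an assumption made only for this example. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0
#endif

/* Hypothetical helper: 1 when running on Valgrind, 0 on a real CPU. */
int running_under_valgrind(void)
{
    return RUNNING_ON_VALGRIND ? 1 : 0;
}

/* Hypothetical helper: choose a smaller workload under Valgrind,
   where execution is much slower. */
long pick_iterations(void)
{
    return running_under_valgrind() ? 1000L : 1000000L;
}
```

A typical use of such a check is to shrink test workloads or skip timing-sensitive code paths when the program is being instrumented.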

This mechanism is implemented in Valgrind as follows (see Memory Debugging of MPI-Parallel Applications in Open MPI by Rainer Keller, Shiqing Fan, and Michael Resch). The client program calls VALGRIND_DO_CLIENT_REQUEST, which contains a special platform-dependent instruction preamble (__SPECIAL_INSTRUCTION_PREAMBLE in valgrind.h). Valgrind detects this preamble and steers the instrumentation accordingly. The preamble is usually a series of rotations that leaves the original value unchanged. On x86_64, the preamble is

   rolq $3,  %rdi
   rolq $13, %rdi
   rolq $61, %rdi
   rolq $51, %rdi
which rotates the 64-bit register %rdi left by 3, 13, 61, and 51 bits. The total is 128 bits, i.e. two full rotations of a 64-bit register, so %rdi ends up unchanged.
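
The claim that the four rotations cancel out can be checked in plain C. The sketch below assumes nothing about Valgrind itself; rotl64 and apply_preamble_rotations are hypothetical helper names that only model the arithmetic of the preamble.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n bits (n is taken modulo 64). */
uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* Apply the rotation counts of the x86_64 preamble in sequence.
   3 + 13 + 61 + 51 = 128, a multiple of 64, so the result equals x. */
uint64_t apply_preamble_rotations(uint64_t x)
{
    static const unsigned counts[] = { 3, 13, 61, 51 };
    int i;

    for (i = 0; i < 4; i++)
        x = rotl64(x, counts[i]);
    return x;
}
```

Any 64-bit input passed to apply_preamble_rotations comes back unchanged, which is why the preamble is harmless when the program runs on a real CPU.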

Function Wrapping

Valgrind allows calls to some specified functions to be intercepted and rerouted to a different, user-supplied function. For details, see here.

Valgrind source code annoyances

CoreGrind and many Valgrind tools wrap function and variable names in the VG_ and ML_ prefix macros, which prevents source code editors and browsers from recognizing the symbols. Both macros are defined in pub_tool_basics.h. Their expansion can be verified by running
 nm -n coregrind/libcoregrind-amd64-linux.a
which shows that VG_(str) expands to vgPlain_str and ML_(str) to vgModuleLocal_str.
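
The expansion can also be reproduced with the preprocessor alone. The macro definitions in the sketch below are an assumption modelled on pub_tool_basics.h and should be checked against the header of your Valgrind version; STRINGIFY, vg_expansion, and ml_expansion are hypothetical helpers added only to make the result visible as a string.

```c
/* Token-pasting macros, assumed to mirror pub_tool_basics.h. */
#define VGAPPEND(str1, str2) str1##str2
#define VG_(str) VGAPPEND(vgPlain_, str)
#define ML_(str) VGAPPEND(vgModuleLocal_, str)

/* Two-step stringification so that the argument is macro-expanded
   before it is turned into a string literal. */
#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

/* Hypothetical helpers returning the expanded symbol names. */
const char *vg_expansion(void)
{
    return STRINGIFY(VG_(str));     /* "vgPlain_str" */
}

const char *ml_expansion(void)
{
    return STRINGIFY(ML_(str));     /* "vgModuleLocal_str" */
}
```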

To fix this, in UltraEdit, open the Replace dialog, enable Unix-style Regular Expressions, and replace

VG_\(([a-zA-Z0-9_]+)\)

with

vgPlain_\1

and then replace

ML_\(([a-zA-Z0-9_]+)\)

with

vgModuleLocal_\1

Adding new tools or new source files to Valgrind

See this link.

The required autogen.sh can be downloaded here.

Change the default tool from memcheck to your own tool

Modify coregrind/launcher-linux.c. In main(), find the following block and replace the fallback tool name "memcheck" with the name of your own tool:
   if (toolname) {
      vgPlain_debugLog(1, "launcher", "tool '%s' requested\n", toolname);
   } else {
      vgPlain_debugLog(1, "launcher",
                          "no tool requested, defaulting to 'memcheck'\n");
      toolname = "memcheck";
   }
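
A minimal self-contained sketch of the logic after the change is shown below, assuming a hypothetical tool called mytool. The function select_tool and the macro DEFAULT_TOOL are illustrative stand-ins and do not exist in launcher-linux.c; the logging calls are omitted.

```c
#include <stddef.h>

/* Hypothetical tool name; replace it with the name of your own tool. */
#define DEFAULT_TOOL "mytool"

/* Return the requested tool, or the new default when none was given. */
const char *select_tool(const char *toolname)
{
    if (toolname == NULL)
        toolname = DEFAULT_TOOL;    /* was: toolname = "memcheck"; */
    return toolname;
}
```

An explicit --tool=name on the command line still takes precedence; only the fallback changes.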