A Technical Look at QEMU's Tracing Debugging Tool

This article was first published on the GTOC WeChat account. It is based on QEMU’s tracing documentation, located at: docs/devel/tracing.rst

QEMU has a very useful debugging tool, tracing, which can be used to track the execution of internal QEMU functions and for performance tuning.

For example, to trace a guest program’s memory accesses, you can print the read/write records of QEMU memory regions, as long as the corresponding trace-events are registered.

Let’s first discuss the simplest way to start using tracing.

Quick Start

In QEMU’s startup options, add a trace parameter to specify the events to trace. Here we use guest memory-access events as an example:

$ qemu-system-riscv64 -M virt --trace "memory_region_ops_*" # * matches the preceding characters
...
719585@1608130130.441188:memory_region_ops_read cpu 0 mr 0x562fdfbb3820 addr 0x3cc value 0x67 size 1
719585@1608130130.441190:memory_region_ops_write cpu 0 mr 0x562fdfbd2f00 addr 0x3d4 value 0x70e size 2

We can find the relevant mr trace-events in QEMU’s source tree in system/trace-events:

# memory.c
memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
...

If you want to enable multiple trace-events, just append --trace name to the startup options. To avoid long command lines, you can put the trace-events you want to follow into a configuration file and load it:

$ echo "memory_region_ops_*" >/tmp/events
$ echo "kvm_*" >>/tmp/events
$ qemu-system-riscv64 -M --trace events=/tmp/events ...

Tracing can also write output to a file. Modify the QEMU command above as follows:

$ qemu-system-riscv64 -M --trace events=/tmp/events,file=/tmp/event.log ...

If you do not want to enable tracing in QEMU’s startup options, you can also turn it on dynamically in the QEMU monitor, which is more flexible:

$ qemu-system-riscv64 -M virt -monitor stdio -S -display none
(qemu) trace-event memory_region_ops_read on
(qemu) c
...
memory_region_ops_write cpu 0 mr 0x55a289a24d80 addr 0x10000000 value 0x78 size 1 name 'serial'

Using tracing in the monitor has another advantage: you can use Tab completion for commands, instead of manually searching the source tree for the trace-events supported by each component.

You can also use the info trace-events command to query supported trace events.

You can also use the trace-file command to write trace logs to a file.

Tracing supports multiple backends. By default, QEMU’s log backend is used. For simple debugging scenarios, you do not need to build QEMU yourself; you can use the QEMU packages provided by Linux or Windows distributions directly.

Introduction to trace-events

At every directory level in the QEMU source tree, you can add a trace-events file. As long as its relative path is declared in the top-level meson.build, you can add custom trace-events to it:

if have_system
  trace_events_subdirs += [
    'accel/kvm',
    'backends/tpm',
    'ebpf',
    'hw/arm',
    ...
  ]
endif

During the QEMU build, each trace-events file is processed by the tracetool script, which automatically generates trace-related code under the [builddir]/trace/ path. The generated files mainly include:

trace-[subdir].c
trace-[subdir].h
trace-dtrace-[subdir].h
trace-dtrace-[subdir].dtrace
trace-dtrace-[subdir].o
trace-ust-[subdir].h

Here, [subdir] means replacing / in the subdirectory path with _.

For example, accel/kvm becomes accel_kvm, and the generated trace-[subdir].c file is trace-accel_kvm.c.

The trace-events files from all subdirectories are merged into a single trace-events-all file, which is also generated under [builddir]/trace/.

That file is also installed under /usr/share/qemu. The merged file is used by QEMU’s simpletrace.py script for later analysis of trace records in the simple trace data format.

Source files in the source tree do not include the trace source files generated under the build directory directly. Instead, they include the local trace.h file, without any subdirectory prefix, via #include.

For example, io/channel-buffer.c includes it like this:

#include "trace.h"

In addition, we must manually create io/trace.h,

and include the corresponding trace/trace-[subdir].h file in it. That file is generated in [builddir]:

$ echo '#include "trace/trace-io.h"' > io/trace.h

Note that although trace.h can be included from outside the subdirectory where the source file lives, that is usually not recommended.

It is strongly recommended that every trace-event be declared directly in the subdirectory that uses it.

The only exception is shared trace events defined in the trace-events file at the top level.

Trace files generated for the top-level directory use the trace/trace-root prefix, not just trace, to avoid ambiguity between trace.h in the current directory and files in the top-level directory.

Adding a New trace-event

Adding a new trace-event takes only two steps: first declare the trace-event in the trace-events file in the corresponding directory, then add the function call for that event in the source file you want to debug.

Using QEMU memory allocation and release tracing as an example, the trace-event format is as follows:

qemu_vmalloc(size_t size, void *ptr) "size %zu ptr %p"
qemu_vfree(void *ptr) "ptr %p"

Each event declaration starts with the event name, then its arguments, and finally a format string used for pretty printing.

The format string should reflect the types defined by the tracing event. Tracing only supports basic scalar types (char, int, long) and does not support floating-point types (float, double).

In particular, use PRId64 and PRIu64 for int64_t and uint64_t respectively; this ensures portability across 32-bit and 64-bit platforms.

The format string must not end with a newline. The backend is responsible for adjusting the line ending to produce correct log output.

After defining the trace-event, call it directly from the target source file, as shown below:

#include "trace.h"  /* needed for trace event prototype */
void *qemu_vmalloc(size_t size)
{
    void *ptr;
    size_t align = QEMU_VMALLOC_ALIGN;
    if (size < align) {
        align = getpagesize();
    }
    ptr = qemu_memalign(align, size);
    /* Insert the trace-event in the form: trace_[event-name] */
    trace_qemu_vmalloc(size, ptr);
    return ptr;
}

In addition, if a function contains multiple trace events, append a unique suffix to the name to distinguish them.

In some cases, you may need to perform relatively complex computations to generate values that are used only as arguments to the tracing function.

In such cases, you can use the following function to guard that computation:

 trace_event_get_state_backends()

When the event is disabled at compile time or at runtime, the associated computation is skipped. If the event is disabled at build time, this check incurs no performance overhead.

The sample code is as follows.

#include "trace.h"  /* needed for trace event prototype */
void *qemu_vmalloc(size_t size)
{
    void *ptr;
    size_t align = QEMU_VMALLOC_ALIGN;
    if (size < align) {
        align = getpagesize();
    }
    ptr = qemu_memalign(align, size);
    if (trace_event_get_state_backends(TRACE_QEMU_VMALLOC)) {
        void *complex;
        /* some complex computations to produce the 'complex' value */
        trace_qemu_vmalloc(size, ptr, complex);
    }
    return ptr;
}

The following cases are particularly well suited to tracing-based debugging:

Tracking state transitions in code. Key points in code typically involve state changes such as start, stop, allocation, and release. State transitions are ideal trace events because they help explain how a system executes.
Tracking guest activity. Guest I/O accesses, such as reading device registers, are good trace events for analyzing guest interactions.
Using correlation fields to understand the context of single-line trace output. For example, tracing the pointer returned by malloc and how it is later used as the argument to free makes it possible to match malloc and free operations. Trace events without context are much less useful.

Trace backend overview

QEMU’s tracing uses a front-end/back-end split and supports multiple backends. In addition to the log backend mentioned above, it also supports the lighter-weight simple backend, as well as ftrace and dtrace.

We can also add support for additional backends through the tracetool script.

To enable different backends, use the following QEMU configure command:

$ ./configure --enable-trace-backends=simple,dtrace

Run ./configure --help to view all supported backends. If no backend is explicitly selected, the configuration defaults to the log backend.

Analyzing Trace Files

Using the simple backend as an example, this backend writes binary trace logs to a file from a dedicated thread, which has lower overhead than the log backend.

The QEMU source tree also provides Python scripts for offline trace-file analysis. While they may not be as powerful as platform-specific or third-party tracing backends, they are portable and do not require special library dependencies.

Formatting with simpletrace.py requires the trace-events-all file and the binary trace file:

./scripts/simpletrace.py [trace-events-all] [trace-log]

You must ensure that the trace-events-all file you use is the same one generated when building QEMU; otherwise, the trace-event declarations may have changed, which can cause inconsistent output.

Follow the GTOC WeChat account to use the community knowledge base for free. Available topics include: QEMU/GEM5/compilers/Linux You can access it from the official account menu; if you use it often, star the account.

Quick Start#

Introduction to trace-events#

Adding a New trace-event#

Trace backend overview#

Analyzing Trace Files#

Quick Start

Introduction to trace-events

Adding a New trace-event

Trace backend overview

Analyzing Trace Files