Analyzing QEMU's GD32VF103 Boot Flow

After QEMU starts, it does not execute the guest program's first instruction right away. Instead, it first runs a small reset-vector stub that the machine (board) code installs during initialization, and only that stub then jumps to the guest's first instruction.

static const struct MemmapEntry
{
    hwaddr base;
    hwaddr size;
} gd32vf103_memmap[] = {
    [GD32VF103_MFOL] = {0x0, 0x20000},
    /* ... other entries, including GD32VF103_MAINFLASH, omitted ... */
};
static void nuclei_board_init(MachineState *machine)
{
    ...

    /* reset vector */
    uint32_t reset_vec[8] = {
        0x00000297, /* 1:  auipc  t0, %pcrel_hi(dtb) */
        0x02028593, /*     addi   a1, t0, %pcrel_lo(1b) */
        0xf1402573, /*     csrr   a0, mhartid  */
#if defined(TARGET_RISCV32)
        0x0182a283, /*     lw     t0, 24(t0) */
#elif defined(TARGET_RISCV64)
        0x0182b283, /*     ld     t0, 24(t0) */
#endif
        0x00028067, /*     jr     t0 */
        0x00000000,
        memmap[GD32VF103_MAINFLASH].base, /* start: .dword */
        0x00000000,
        /* dtb: */
    };

    /* copy in the reset vector in little_endian byte order */
    for (i = 0; i < sizeof(reset_vec) >> 2; i++)
    {
        reset_vec[i] = cpu_to_le32(reset_vec[i]);
    }
    rom_add_blob_fixed_as("mrom.reset", reset_vec, sizeof(reset_vec),
                          memmap[GD32VF103_MFOL].base + 0x1000, &address_space_memory);
    ...
}

The reset stub is therefore placed at memmap[GD32VF103_MFOL].base + 0x1000, and that is where the first instruction executes. Since memmap[GD32VF103_MFOL].base is 0, the initial PC is 0x1000.

The verification method is simple: use riscv-gdb to remotely debug QEMU, as follows.

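Open the first terminal window and start QEMU. The -gdb tcp::1234 option starts a GDB server on TCP port 1234, and -S keeps the CPU halted until the debugger lets it continue: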

qemu (nuclei_gd32vf103) $ ./build/qemu-system-riscv32 -M gd32vf103_rvstar -cpu nuclei-n205 -icount shift=0 -nodefaults -nographic -kernel ../nuclei-sdk/application/baremetal/helloworld/helloworld.elf -serial stdio -gdb tcp::1234 -S

Open a second terminal window and start GDB. After connecting, GDB shows the first instruction; compare the disassembly with the reset vector above:

qemu (nuclei_gd32vf103) $ riscv-nuclei-linux-gnu-gdb ../nuclei-sdk/application/baremetal/helloworld/helloworld.elf
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x00001000 in ?? ()
(gdb) x /10i $pc
=> 0x1000:      auipc   t0,0x0
   0x1004:      addi    a1,t0,32
   0x1008:      csrr    a0,mhartid
   0x100c:      lw      t0,24(t0)
   0x1010:      jr      t0
   0x1014:      unimp
   0x1016:      unimp
   0x1018:      unimp
   0x101a:      addi    s0,sp,16
   0x101c:      unimp

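This boot stub does two things. It prepares the boot arguments, placing the current hart's ID (read from the mhartid CSR) in a0 and the address of the dtb label in a1, and then it loads the guest program's start address from a word stored inside the stub and jumps to it.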

Let's walk through the stub instruction by instruction.

auipc t0, 0x0 adds its immediate (shifted left by 12) to the current PC, so it writes 0x1000 into t0. The immediate %pcrel_hi(dtb) is 0 because the dtb label is only 32 bytes ahead.
addi a1, t0, 32 writes 0x1020, the address of the dtb label right after the 32-byte stub, into a1. We are running a bare-metal program without a DTB, so a1 carries no useful information here.
csrr a0, mhartid reads the mhartid CSR into a0.
lw t0, 24(t0) reads the word at 0x1000 + 24 = 0x1018, which holds the guest program's start address, into t0.
jr t0 jumps to that start address.

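Using GDB (for example, x/wx 0x1018), we can confirm that the word at 0x1018 is 0x08000000, the value of memmap[GD32VF103_MAINFLASH].base stored as reset_vec[6] right after the stub's instructions. The words from 0x1014 onward are data rather than code, which is why the disassembly above shows unimp and a meaningless addi s0,sp,16 there.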

So somewhere during QEMU startup, before the guest program helloworld.elf (loaded with -kernel) runs, there must be a code path that writes 0x08000000 to 0x1018.

So how can we quickly locate that flow?

The method is simple. 0x1018 is a guest physical address (GPA). If we compute the corresponding virtual address inside the QEMU process, that is, the host virtual address (HVA), we can launch QEMU under GDB and set a watchpoint on that HVA; the watchpoint will stop at the code that performs the write.

GPA 0 is the base of the memory region described by memmap[GD32VF103_MFOL]. By stopping in the RAM initialization of that region, we can read its HVA base address; adding the offset 0x1018 then gives the HVA we want. The steps are as follows:

Use GDB to launch QEMU and set a breakpoint that stops when GD32VF103_MFOL is initialized:

qemu (nuclei_gd32vf103) $ gdb ./build/qemu-system-riscv32
(gdb) set args -M gd32vf103_rvstar -cpu nuclei-n205 -icount shift=0 -nodefaults -nographic -kernel ../nuclei-sdk/application/baremetal/helloworld/helloworld.elf
(gdb) b memory_region_init_rom
Breakpoint 1 at 0x6a7d60: file ../system/memory.c, line 3612.
(gdb) run
(gdb) finish

Inspect the RAM block of the memory region corresponding to GD32VF103_MFOL and read the HVA base address:

(gdb) p s->internal_rom.ram_block->host
$5 = (uint8_t *) 0x7ffff4400000 ""

Compute the HVA corresponding to GPA 0x1018 (0x7ffff4400000 + 0x1018 = 0x7ffff4401018) and set a watchpoint on it:

(gdb) watch *(0x7ffff4400000 + 0x1018)
Hardware watchpoint 3: *0x7ffff4401018

Continue execution until the watchpoint is hit. Then read the data at the HVA to see whether it is the guest program’s start address:

(gdb) c
Thread 1 "qemu-system-ris" hit Hardware watchpoint 3: *0x7ffff4401018

Old value = 0
New value = 134217728
0x00007ffff696d565 in ?? () from /usr/lib/libc.so.6
(gdb) x 0x7ffff4401018
0x7ffff4401018: 0x08000000

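The new value, 134217728, is 0x08000000 in decimal, which is exactly the guest program's start address. Now inspect the call stack: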

(gdb) bt
#0  0x00007ffff696d565 in ?? () from /usr/lib/libc.so.6
#1  0x0000555555c01c08 in memcpy (__dest=<optimized out>, __src=0x55555692f0e0, __len=<optimized out>)
    at /usr/include/bits/string_fortified.h:29
#2  address_space_write_rom_internal (as=0x55555648c460 <address_space_memory>, addr=4096, attrs=...,
    ptr=<optimized out>, len=32, type=type@entry=WRITE_DATA) at ../system/physmem.c:2967
#3  0x0000555555c02b1c in address_space_write_rom (as=<optimized out>, addr=<optimized out>, attrs=...,
    attrs@entry=..., buf=<optimized out>, len=<optimized out>) at ../system/physmem.c:2987
#4  0x00005555558e0f2e in rom_reset (unused=<optimized out>) at ../hw/core/loader.c:1282
#5  0x0000555555c6af0a in resettable_phase_hold (obj=0x555556930350, opaque=<optimized out>, type=<optimized out>)
    at ../hw/core/resettable.c:184
#6  0x0000555555c6a4c1 in resettable_container_child_foreach (obj=<optimized out>,
    cb=0x555555c6adc0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resetcontainer.c:54
#7  0x0000555555c6ae5a in resettable_child_foreach (rc=0x5555566fbaa0, obj=0x5555567393f0,
    cb=0x555555c6adc0 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:96
#8  resettable_phase_hold (obj=obj@entry=0x5555567393f0, opaque=opaque@entry=0x0, type=type@entry=RESET_TYPE_COLD)
    at ../hw/core/resettable.c:173
#9  0x0000555555c6b290 in resettable_assert_reset (obj=obj@entry=0x5555567393f0, type=type@entry=RESET_TYPE_COLD)
    at ../hw/core/resettable.c:60
#10 0x0000555555c6b651 in resettable_reset (obj=0x5555567393f0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:45
#11 0x0000555555a6a3f4 in qemu_system_reset (reason=reason@entry=SHUTDOWN_CAUSE_NONE) at ../system/runstate.c:494
#12 0x00005555558eaad3 in qdev_machine_creation_done () at ../hw/core/machine.c:1607
#13 0x0000555555a6e043 in qemu_machine_creation_done (errp=0x5555564a0298 <error_fatal>) at ../system/vl.c:2677
#14 qmp_x_exit_preconfig (errp=0x5555564a0298 <error_fatal>) at ../system/vl.c:2707
#15 0x0000555555a717bb in qemu_init (argc=<optimized out>, argv=<optimized out>) at ../system/vl.c:3739
#16 0x0000555555869ff9 in main (argc=<optimized out>, argv=<optimized out>) at ../system/main.c:47
(gdb)

Here we focus on frame #4, rom_reset():

rom_reset (unused=<optimized out>) at ../hw/core/loader.c:1282

Corresponding source code:

static void rom_reset(void *unused)
{
    Rom *rom;

    QTAILQ_FOREACH(rom, &roms, next) {
        if (rom->fw_file) {
            continue;
        }
        /*
         * We don't need to fill in the RAM with ROM data because we'll fill
         * the data in during the next incoming migration in all cases.  Note
         * that some of those RAMs can actually be modified by the guest.
         */
        if (runstate_check(RUN_STATE_INMIGRATE)) {
            if (rom->data && rom->isrom) {
                /*
                 * Free it so that a rom_reset after migration doesn't
                 * overwrite a potentially modified 'rom'.
                 */
                rom_free_data(rom);
            }
            continue;
        }

        if (rom->data == NULL) {
            continue;
        }
        if (rom->mr) {
            void *host = memory_region_get_ram_ptr(rom->mr);
            memcpy(host, rom->data, rom->datasize);
            memset(host + rom->datasize, 0, rom->romsize - rom->datasize);
        } else {
            address_space_write_rom(rom->as, rom->addr, MEMTXATTRS_UNSPECIFIED,
                                    rom->data, rom->datasize);
            address_space_set(rom->as, rom->addr + rom->datasize, 0,
                              rom->romsize - rom->datasize,
                              MEMTXATTRS_UNSPECIFIED);
        }
        if (rom->isrom) {
            /* rom needs to be written only once */
            rom_free_data(rom);
        }
        /*
         * The rom loader is really on the same level as firmware in the guest
         * shadowing a ROM into RAM. Such a shadowing mechanism needs to ensure
         * that the instruction cache for that new region is clear, so that the
         * CPU definitely fetches its instructions from the just written data.
         */
        cpu_flush_icache_range(rom->addr, rom->datasize);

        trace_loader_write_rom(rom->name, rom->addr, rom->datasize, rom->isrom);
    }
}

rom_reset() (re)loads ROM data into guest memory whenever the machine is reset, including the initial reset during startup shown in the backtrace, and handles ROMs specially during an incoming migration. Its logic is as follows:

Loop over the ROM list: the function iterates over the roms list, which holds every ROM registered with the loader, including the mrom.reset blob added by rom_add_blob_fixed_as.
Skip fw_cfg files: for each rom, the function first checks whether it has a firmware file name (rom->fw_file). If so, it skips that ROM, because such ROMs are exposed to the guest through QEMU's fw_cfg interface rather than copied into guest memory here.
Migration check: next, the function checks whether the VM is receiving an incoming migration (runstate_check(RUN_STATE_INMIGRATE)). If so, the memory contents will arrive with the migration stream, so nothing is copied. If the ROM still holds data and rom->isrom is true, that data is freed so that a later rom_reset after migration does not overwrite guest memory that may have been modified.
Writing the data: if the VM is not migrating and rom->data is not NULL, the data is written in one of two ways. If the ROM has its own memory region (rom->mr), the function gets the host pointer for that region with memory_region_get_ram_ptr, copies the data with memcpy, and zero-fills the rest with memset. Otherwise the ROM sits at a fixed address in an address space (rom->as): address_space_write_rom writes the data and address_space_set zero-fills the remainder up to romsize. This second path is the one taken by mrom.reset, as frame #3 of the backtrace shows.
Freeing the data: if rom->isrom is true (the data was loaded into a ROM region), it only needs to be written once, so the data buffer is freed to save memory.
Flush the instruction cache: to ensure that the CPU can correctly fetch and execute instructions from the new ROM data, the function calls cpu_flush_icache_range to clear the instruction cache for the corresponding memory range.

Tracing output: finally, the function calls trace_loader_write_rom to record the ROM-loading event, which may be useful for debugging or performance analysis.

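At this point we know how the guest program's start address reaches 0x1018: it is part of the reset_vec blob itself (memmap[GD32VF103_MAINFLASH].base), which rom_add_blob_fixed_as registers as the ROM mrom.reset, and on system reset rom_reset copies the whole 32-byte blob to GPA 0x1000 (addr=4096, len=32 in frame #2).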

What we still do not know is how the guest program passed with -kernel is parsed and loaded.

Because of space limits, I'll stop here for today. In the next article, we will continue down this path.