Original Text Information

Abstract

The article analyzes the implementation approach of the QEMU RISC-V Server Platform reference board, rvsp-ref, explaining how it is built around standard compliance and development/testing environments, while reusing the capabilities of QEMU’s virt machine model.

The content is suitable for archiving under topics such as QEMU RISC-V machine, RVA23, RVSP-REF, and server platform virtualization support.

Main Text

The QEMU RISC-V Server Platform reference board (rvsp-ref) is built around two core goals: standards compliance and serving as a development and testing environment. The implementation strictly follows the RISC-V Server Platform 1.0 specification, providing standardized hardware and software capabilities for portable system software such as operating systems (for example, OpenEuler RISC-V) and hypervisors.

By design, rvsp-ref extends QEMU's existing RISC-V virt machine type and reuses a large amount of virt code to reduce development complexity. It also defines a virtual CPU type, rvsp-ref-cpu, to ensure compliance with the server platform specification (support for the RVA23 ISA profile, Sv48, Svadu, the H extension, and so on).

rvsp-ref supports up to RVSP_CPUS_MAX cores (512 by default) and uses valid_cpu_types to restrict the machine to compliant CPU types only.
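As a rough illustration of how a valid_cpu_types allow-list works, the sketch below checks a requested CPU type against a NULL-terminated list of names. It is a standalone model, not the patch's code: the string "rvsp-ref-cpu" is simply the name used in this article (the real QOM type string may differ), and rvsp_cpu_type_is_valid is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder taken from the article text; the real QOM type name may differ. */
#define DEMO_RVSP_REF_CPU_TYPE "rvsp-ref-cpu"

/* NULL-terminated allow-list, in the style of a machine class's valid_cpu_types. */
static const char *const demo_valid_cpu_types[] = {
    DEMO_RVSP_REF_CPU_TYPE,
    NULL,
};

/* Hypothetical helper: true only when cpu_type appears on the allow-list. */
static bool rvsp_cpu_type_is_valid(const char *cpu_type)
{
    assert(cpu_type != NULL);
    for (size_t i = 0; demo_valid_cpu_types[i] != NULL; i++) {
        if (strcmp(cpu_type, demo_valid_cpu_types[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

With a single-entry list like this, any other CPU model requested on the command line is rejected before the board is built, which is how the machine stays compliant by construction.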

Therefore, any software stack that already runs on the QEMU virt machine can be migrated to rvsp-ref with little effort.

At present, the author has built an rvsp-ref branch capable of running OpenEuler RISC-V based on the latest upstream patches. Use the following command to obtain the source code:

git clone -b riscv-server-platform git@github.com:zevorn/qemu.git

For a detailed introduction to RVSP-REF, see this article: Running OpenEuler RISC-V 25.09 on the QEMU RISC-V Server Reference Platform (rvsp-ref) - openEuler - RISC-V Developer Community.

The following sections describe, in turn, the overall architecture of QEMU rvsp-ref, its key data structures, the initialization flow, the implementation of key components, and future plans.

1. Overall Architecture

Hardware Simulation Status

rvsp-ref currently includes the following main components:

| Component Category   | Specific Implementation               | Specification Requirements              |
|----------------------|---------------------------------------|-----------------------------------------|
| CPU                  | RVA23s64 + other necessary extensions | Missing sdext                           |
| Interrupt Controller | AIA architecture (IMSIC+APLIC)        | Complies with server platform standards |
| PCIe Interface       | Root complex + AHCI + physical NIC    | Retain virtio-pci                       |
| Storage System       | Dual PFlash flash devices             | Firmware storage support                |
| Basic Peripherals    | RTC, UART, etc.                       | Required hardware components            |

Interfaces for Guest Software

  • Device tree only: Generates the minimum necessary device tree nodes for bare-metal or simple firmware boot
  • No ACPI tables: Upstream recommends that EDK II generate ACPI tables for the kernel from the DTB provided by rvsp-ref
  • Simplified configuration: Removes the fw_cfg device to reduce coupling with QEMU-specific features

Comparison with virt

To make the differences between rvsp-ref and virt easier to see, we compare several important aspects:

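| Feature                 | RVSP-REF                                           | virt machine                   |
|-------------------------|----------------------------------------------------|--------------------------------|
| CPU type                | Strictly compliant with the RVSP 1.0 specification | Dynamically configurable       |
| Device type             | Physical PCIe devices                              | VirtIO devices                 |
| Firmware interface      | Device tree only                                   | Device tree + ACPI             |
| Configuration mechanism | No fw_cfg                                          | Supports fw_cfg                |
| Intended use            | Server compliance testing                          | General-purpose virtualization |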

In short, rvsp-ref provides a reliable virtual development platform for the RISC-V server ecosystem through standardized hardware component emulation, streamlined software interfaces, and strict compliance requirements. It emphasizes realism while remaining compatible with the existing virt machine.

2. Key Data Structures

The implementation of QEMU rvsp-ref revolves around several core data structures; these structs carry the machine state, hardware components, and configuration information.

Introduction to RVSPMachineState

RVSPMachineState is the QOM object of rvsp-ref, inheriting from QEMU’s standard MachineState:

// hw/riscv/server_platform_ref.c
struct RVSPMachineState {
    /*< private >*/
    MachineState parent;  // Inherits from the MachineState base class

    /*< public >*/
    Notifier machine_done;                     // Machine initialization completion notifier
    RISCVHartArrayState soc[RVSP_SOCKETS_MAX]; // RISC-V hart array for each socket
    DeviceState *irqchip[RVSP_SOCKETS_MAX];    // Interrupt controller for each socket
    PFlashCFI01 *flash[2];                     // 2 CFI flash devices

    int fdt_size;                              // Device tree size
    int aia_guests;                            // Number of AIA (Advanced Interrupt Architecture) guests
    const MemMapEntry *memmap;                 // Memory mapping table pointer
};

The following provides additional explanation for the key fields:

  • soc: Supports a multi-socket architecture, where each socket contains a set of RISC-V CPU cores, with support for up to 4 sockets (RVSP_SOCKETS_MAX = 4)
  • irqchip: Maintains an independent interrupt controller instance for each socket, supporting distributed interrupt handling in the AIA architecture
  • flash: Manages two parallel flash devices (pflash0/pflash1) for firmware storage
  • aia_guests: Configures the number of guests supported by the AIA architecture, affecting IMSIC MMIO space allocation; in later versions this will be changed to a constant and will strictly follow the specification

Memory Map: Hardware Address Space Layout

The memory map is defined by the MemMapEntry rvsp_ref_memmap[] array. Key regions include:

| Device Region       | Base Address          | Size                      | Purpose                  |
|---------------------|-----------------------|---------------------------|--------------------------|
| RVSP_DRAM           | 0x80000000            | 0xff80000000ull (≈1024GB) | Main memory region       |
| RVSP_PCIE_ECAM      | 0x30000000            | 0x10000000                | PCIe configuration space |
| RVSP_PCIE_MMIO      | 0x40000000            | 0x40000000                | PCIe device MMIO         |
| RVSP_PCIE_MMIO_HIGH | 0x10000000000ull      | 0x10000000000ull          | High PCIe MMIO           |
| RVSP_FLASH          | 0x20000000            | 0x4000000 (64MB)          | Dual PFlash devices      |
| RVSP_IMSIC_M/S      | 0x24000000/0x28000000 | RVSP_IMSIC_MAX_SIZE       | AIA interrupt controller |
| RVSP_APLIC_M/S      | 0xc000000/0xd000000   | APLIC_SIZE(RVSP_CPUS_MAX) | Advanced PLIC            |

This layout is largely consistent with virt. Because a DTB is provided, OpenSBI and the kernel can discover it automatically. A few points are worth noting:

  • PCIe Space Partitioning: Provides three parts—standard ECAM, conventional MMIO, and high MMIO—supporting large-scale PCIe device mapping
  • Interrupt Controller Separation: IMSIC and APLIC separately manage message-signaled interrupts and wired interrupts, conforming to the AIA specification
  • Flash Region: A 64MB space supports dual-bank flash, with the sector size configured as 256 KiB
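To make the layout concrete, the following minimal sketch encodes the regions from the table whose sizes are given numerically and checks that they do not overlap. The MemMapEntry shape matches the name used above, but the DEMO_* enum, the demo_memmap array, and both helper functions are hypothetical and are not taken from the patch. The IMSIC and APLIC regions are omitted because their sizes are symbolic in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct MemMapEntry {
    uint64_t base;
    uint64_t size;
} MemMapEntry;

/* Hypothetical indices for this sketch only. */
enum {
    DEMO_FLASH,
    DEMO_PCIE_ECAM,
    DEMO_PCIE_MMIO,
    DEMO_DRAM,
    DEMO_PCIE_MMIO_HIGH,
    DEMO_REGION_COUNT
};

/* Base/size values copied from the memory map table in the article. */
static const MemMapEntry demo_memmap[DEMO_REGION_COUNT] = {
    [DEMO_FLASH]          = { 0x20000000ull,    0x4000000ull },
    [DEMO_PCIE_ECAM]      = { 0x30000000ull,    0x10000000ull },
    [DEMO_PCIE_MMIO]      = { 0x40000000ull,    0x40000000ull },
    [DEMO_DRAM]           = { 0x80000000ull,    0xff80000000ull },
    [DEMO_PCIE_MMIO_HIGH] = { 0x10000000000ull, 0x10000000000ull },
};

/* Two half-open ranges [base, base + size) overlap if each starts before the other ends. */
static bool demo_regions_overlap(const MemMapEntry *a, const MemMapEntry *b)
{
    return a->base < b->base + b->size && b->base < a->base + a->size;
}

/* True when no pair of regions in the table overlaps. */
static bool demo_memmap_is_disjoint(void)
{
    for (int i = 0; i < DEMO_REGION_COUNT; i++) {
        for (int j = i + 1; j < DEMO_REGION_COUNT; j++) {
            if (demo_regions_overlap(&demo_memmap[i], &demo_memmap[j])) {
                return false;
            }
        }
    }
    return true;
}

/* Returns the index of the region containing addr, or -1 if it is unmapped here. */
static int demo_memmap_find(uint64_t addr)
{
    for (int i = 0; i < DEMO_REGION_COUNT; i++) {
        const MemMapEntry *e = &demo_memmap[i];
        if (addr >= e->base && addr - e->base < e->size) {
            return i;
        }
    }
    return -1;
}
```

Note that the regions are packed back to back: ECAM ends where the 32-bit MMIO window begins, that window ends at the DRAM base, and DRAM ends exactly at the base of the high MMIO window.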

Constant Configuration

The rvsp-ref code defines a series of key configuration constants. They mainly set upper limits; the actual values of most of them can be chosen through startup parameters:

CPU and Topology Configuration:

#define RVSP_CPUS_MAX_BITS     9     // Supports up to 512 cores (2^9)
#define RVSP_SOCKETS_MAX_BITS  2     // Supports up to 4 sockets (2^2)
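The bit-width constants above imply the limits RVSP_CPUS_MAX and RVSP_SOCKETS_MAX mentioned elsewhere in this article. The sketch below derives them the same way the virt machine derives its own limits; the two derived macros are an assumption (they are not quoted from the patch), and rvsp_topology_is_valid is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define RVSP_CPUS_MAX_BITS     9
#define RVSP_SOCKETS_MAX_BITS  2

/* Assumed derivations, following the pattern used by the virt machine. */
#define RVSP_CPUS_MAX     (1 << RVSP_CPUS_MAX_BITS)
#define RVSP_SOCKETS_MAX  (1 << RVSP_SOCKETS_MAX_BITS)

_Static_assert(RVSP_CPUS_MAX == 512, "2^9 cores");
_Static_assert(RVSP_SOCKETS_MAX == 4, "2^2 sockets");

/* Hypothetical check: both counts are within limits and every socket has a hart. */
static bool rvsp_topology_is_valid(int cpus, int sockets)
{
    return cpus >= 1 && cpus <= RVSP_CPUS_MAX &&
           sockets >= 1 && sockets <= RVSP_SOCKETS_MAX &&
           sockets <= cpus;
}
```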

Interrupt System Configuration:

#define RVSP_IRQCHIP_NUM_MSIS   255    // Maximum number of MSI interrupts
#define RVSP_IRQCHIP_NUM_SOURCES 96   // number of interrupt sources
#define RVSP_IRQCHIP_MAX_GUESTS  7    // Maximum number of Guests (2^3-1)
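These limits determine how much MMIO space the IMSIC needs. In the AIA, every interrupt file occupies one 4 KiB page, and a hart's supervisor-level file is followed by its guest files, so the per-hart footprint is rounded up to a power-of-two number of pages. The sketch below works through that arithmetic; the helper names are hypothetical, and it does not claim to reproduce the patch's RVSP_IMSIC_MAX_SIZE definition.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IMSIC_PAGE_SIZE 4096u  /* one interrupt file per 4 KiB page */

/* Smallest b with (1 << b) >= guests + 1: room for the S-level file plus guest files. */
static unsigned imsic_guest_bits(unsigned guests)
{
    unsigned bits = 0;
    while ((1u << bits) < guests + 1u) {
        bits++;
    }
    return bits;
}

/* Supervisor-level MMIO footprint of one hart, in bytes. */
static uint64_t imsic_s_hart_size(unsigned guests)
{
    return (uint64_t)DEMO_IMSIC_PAGE_SIZE << imsic_guest_bits(guests);
}

/* Supervisor-level MMIO footprint of all harts, in bytes. */
static uint64_t imsic_s_total_size(unsigned harts, unsigned guests)
{
    return (uint64_t)harts * imsic_s_hart_size(guests);
}
```

With 7 guests each hart needs 8 pages (32 KiB), so 512 harts need 16 MiB of supervisor-level IMSIC space under these assumptions.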

Device Tree Generation Configuration:

#define FDT_PCI_ADDR_CELLS    3       // PCI address 3-cell format
#define FDT_APLIC_INT_CELLS   2       // APLIC interrupt 2-cell format

3. Initialization Process

The initialization flow of QEMU rvsp-ref is implemented in the rvsp_ref_machine_init() function. As the entry point for the machine type, this function is called during the instantiation phase. By this point, all machine properties have already been parsed and set, so every component can be created and configured in hardware dependency order.

Initialization Process Overview

The entire initialization process follows these key steps:

  1. Parameter validation and basic setup
  2. Multi-socket CPU architecture initialization
  3. Memory region mapping setup
  4. Interrupt controller instantiation
  5. PCIe subsystem initialization
  6. Peripheral device creation
  7. Device tree generation and loading
  8. Completion notification registration

Detailed Initialization Steps

Below, we provide additional explanation for each step:

1. Parameter Validation Stage

This stage closely mirrors virt, and the checks are standard:

/* Check socket count limit */
if (RVSP_SOCKETS_MAX < socket_count) {
    error_report("number of sockets/nodes should be less than %d", RVSP_SOCKETS_MAX);
    exit(1);
}

/* Check ACLINT only supports TCG mode */
if (!rvsp_aclint_allowed()) {
    error_report("'aclint' is only available with TCG acceleration");
    exit(1);
}

2. Multi-Socket Architecture Initialization

Each socket (socket_count in total) is initialized in a loop, which is also consistent with virt:

  • Hart ID continuity check: riscv_socket_check_hartids()
  • Get the base Hart ID and count: riscv_socket_first_hartid(), riscv_socket_hart_count()
  • Initialize the RISCV Hart array:
    object_initialize_child(OBJECT(machine), soc_name, &s->soc[i], TYPE_RISCV_HART_ARRAY);
    object_property_set_str(OBJECT(&s->soc[i]), "cpu-type", machine->cpu_type, &error_abort);
    object_property_set_int(OBJECT(&s->soc[i]), "hartid-base", base_hartid, &error_abort);
    object_property_set_int(OBJECT(&s->soc[i]), "num-harts", hart_count, &error_abort);
    sysbus_realize_and_unref(SYS_BUS_DEVICE(&s->soc[i]), &error_fatal);
  • Per-Socket Interrupt Controller Creation
    s->irqchip[i] = rvsp_ref_create_aia(s->aia_guests, memmap, i, base_hartid, hart_count);
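The hart ID checks in this step can be pictured with a simplified model in which each CPU index is tagged with the socket it belongs to. The functions below are hypothetical stand-ins written for this article; they do not share signatures with QEMU's riscv_socket_* helpers and only illustrate the idea that a socket's harts must form one contiguous ID range.

```c
#include <assert.h>
#include <stdbool.h>

/* Lowest CPU index assigned to `socket`, or -1 if the socket has no harts. */
static int demo_first_hartid(const int *socket_of_cpu, int n, int socket)
{
    for (int i = 0; i < n; i++) {
        if (socket_of_cpu[i] == socket) {
            return i;
        }
    }
    return -1;
}

/* Highest CPU index assigned to `socket`, or -1 if the socket has no harts. */
static int demo_last_hartid(const int *socket_of_cpu, int n, int socket)
{
    for (int i = n - 1; i >= 0; i--) {
        if (socket_of_cpu[i] == socket) {
            return i;
        }
    }
    return -1;
}

/* Size of the ID range spanned by the socket's harts (0 for an empty socket). */
static int demo_hart_count(const int *socket_of_cpu, int n, int socket)
{
    int first = demo_first_hartid(socket_of_cpu, n, socket);
    int last = demo_last_hartid(socket_of_cpu, n, socket);
    return first < 0 ? 0 : last - first + 1;
}

/* True when every ID between the first and last hart belongs to the same socket. */
static bool demo_check_hartids(const int *socket_of_cpu, int n, int socket)
{
    int first = demo_first_hartid(socket_of_cpu, n, socket);
    int last = demo_last_hartid(socket_of_cpu, n, socket);
    if (first < 0) {
        return false;
    }
    for (int i = first; i <= last; i++) {
        if (socket_of_cpu[i] != socket) {
            return false;
        }
    }
    return true;
}
```

Contiguity matters because the hart array is created from just a base ID and a count, so an interleaved assignment could not be described by those two properties.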

3. Establishing Memory Region Mapping

  • System main memory (DRAM): memory_region_add_subregion(system_memory, memmap[RVSP_DRAM].base, machine->ram)
  • Mask ROM (MROM): initialize and map to the RVSP_MROM address
  • Reset system controller: initialize the IO region and map to RVSP_RESET_SYSCON

4. PCIe Host Controller Initialization

gpex_pcie_init(system_memory, pcie_irqchip, s)

This call establishes the complete PCIe address space mapping, including the ECAM, MMIO, and PIO regions.

5. Peripheral Device Instantiation

Create each type of peripheral according to the hardware dependency order:

Basic peripherals:

  • Serial port (UART0): serial_mm_init(), bound to the RVSP_UART0 address and interrupt
  • Real-time clock (RTC): sysbus_create_simple("goldfish_rtc", ...)
  • IOMMU system device: initialized, then realized with sysbus_realize_and_unref()

Storage device:

  • Flash (PFlash): Initialize two flash devices in a loop:
for (i = 0; i < ARRAY_SIZE(s->flash); i++) {
    pflash_cfi01_legacy_drive(s->flash[i], drive_get(IF_PFLASH, 0, i));
}
rvsp_flash_maps(s, system_memory);

6. Device Tree Processing

  • Conditional loading: if the user provides a DTB (machine->dtb), call load_device_tree() directly
  • Dynamic generation: otherwise call create_fdt(s, memmap) to generate the default device tree

7. Completion Notification Registration

s->machine_done.notify = rvsp_ref_machine_done;
qemu_add_machine_init_done_notifier(&s->machine_done);

The rvsp_ref_machine_done callback function is responsible for subsequent firmware/kernel loading and reset vector configuration.
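Because the notifier is embedded in the machine state, the callback can recover the enclosing structure from the notifier pointer alone. The standalone sketch below models that pattern with a simplified notifier and a container_of-style macro; all Demo* types and demo_* functions are hypothetical and only mimic the shape of the QEMU code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    void (*notify)(DemoNotifier *notifier, void *data);
};

/* Simplified stand-in for the machine state; the notifier is an embedded member. */
typedef struct DemoMachineState {
    int fdt_size;
    int firmware_loaded;
    DemoNotifier machine_done;
} DemoMachineState;

/* Recover a pointer to the enclosing struct from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback: runs once machine creation is finished and performs the late setup. */
static void demo_machine_done(DemoNotifier *notifier, void *data)
{
    DemoMachineState *s =
        demo_container_of(notifier, DemoMachineState, machine_done);
    (void)data;
    s->firmware_loaded = 1;  /* placeholder for firmware/kernel loading */
}

/* Registration: only stores the callback; nothing runs yet. */
static void demo_register_done_notifier(DemoMachineState *s)
{
    s->machine_done.notify = demo_machine_done;
}

/* Stand-in for the point at which the framework fires the notifier. */
static void demo_fire(DemoNotifier *n)
{
    assert(n != NULL && n->notify != NULL);
    n->notify(n, NULL);
}
```

Deferring the work to a notifier means firmware and kernel loading happen only after every device, including those added on the command line, has been created.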

Summary of Key Design Features

Strict dependency order: CPU → memory → interrupts → PCIe → peripherals, ensuring hardware components are initialized in the correct sequence.

Error Handling Mechanism: Each critical step includes strict parameter validation and error handling, ensuring graceful exit when initialization fails.

Modular Design: Each component’s initialization function has a single responsibility, making it easy to maintain and extend.

Device Tree Flexibility: Supports both external DTB loading and internal dynamic generation modes, adapting to different usage scenarios.

The entire initialization process ensures that the RVSP-REF machine can boot in compliance with the RISC-V Server Platform specification, providing a stable and reliable hardware emulation environment for the operating system and firmware.

4. Key Component Implementation

PCIe Subsystem Implementation

The PCIe subsystem of RVSP-REF uses the standard GPEX (Generic PCI Express) host controller architecture, with gpex_pcie_init() completing the initialization and configuration of the root complex. This implementation strictly follows the PCIe specification while also being specifically optimized for server platform characteristics.

4.1 GPEX PCIe Host Controller Configuration

The register map configuration sets key address space properties via the object_property_set_* functions:

  • ECAM space: base address 0x3000_0000, size 0x1000_0000 (256MB)
  • MMIO space below 4G: base address 0x4000_0000, size 0x4000_0000 (1GB)
  • MMIO space above 4G: base address 0x100_0000_0000, size 0x100_0000_0000 (1TB)
  • PIO space: base address 0x0300_0000, size 0x1_0000 (64KB)

Interrupt configuration uses a hardware swizzling mechanism to converge INTx interrupts from endpoint devices onto 4 standard interrupt lines:

for (i = 0; i < PCI_NUM_PINS; i++) {
    sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, 
                      qdev_get_gpio_in(irqchip, RVSP_PCIE_IRQ + i));
    gpex_set_irq_num(GPEX_HOST(dev), i, RVSP_PCIE_IRQ + i);
}
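The swizzling mentioned above can be sketched with the conventional PCI rule, in which the interrupt pin is rotated by the device's slot number so that devices spread across the four lines. The code is a standalone illustration: DEMO_PCIE_IRQ is a placeholder value, not the patch's RVSP_PCIE_IRQ, and the function names are hypothetical.

```c
#include <assert.h>

#define PCI_NUM_PINS   4     /* INTA..INTD */
#define DEMO_PCIE_IRQ  0x20  /* placeholder base IRQ number, chosen for illustration */

/* Conventional swizzle: rotate the pin (0 = INTA) by the device's slot number. */
static int demo_pci_swizzle(int slot, int pin)
{
    assert(slot >= 0 && pin >= 0 && pin < PCI_NUM_PINS);
    return (slot + pin) % PCI_NUM_PINS;
}

/* Interrupt controller input that a given slot/pin pair ends up driving. */
static int demo_intx_to_irq(int slot, int pin)
{
    return DEMO_PCIE_IRQ + demo_pci_swizzle(slot, pin);
}
```

For example, INTD (pin 3) of the device in slot 1 wraps around to host line 0, while INTA of the same device lands on line 1.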

4.2 AHCI SATA Controller Implementation

The AHCI controller is implemented as a standard PCIe endpoint device, with key features including:

Device Identification and Configuration

  • PCI device ID: 8086:2922 (Intel ICH9 AHCI controller)
  • PCI subsystem ID: 1af4:1100
  • BAR configuration:
    • BAR4: I/O space reserved
    • BAR5: 32-bit memory space, mapping AHCI’s ABAR (AHCI Base Address Register)

DMA Engine Support

  • Enable memory access and bus mastering by setting PCI_COMMAND_MEMORY and PCI_COMMAND_MASTER
  • Define the ahci_dma_ops structure to manage the DMA lifecycle of SATA data transfers
  • Support address mapping for the command list, FIS receive area, and data buffers

4.3 e1000e Network Adapter Device Emulation

The e1000e network card serves as a representative physical NIC, implementing full PCIe endpoint functionality:

MAC Address Assignment

  • Default address: 52:54:00:12:34:56
  • Supports automatic assignment by qemu_macaddr_default_if_unset()
  • The MAC address is stored in the E1000EState.conf.macaddr structure

Interrupt Mapping Mechanism

  • MSI-X support: Initialize the multi-vector interrupt configuration via e1000e_init_msix()
  • Interrupt registers: E1000_IVAR0 is used to configure interrupt vector allocation
  • Supports configuring interrupt vectors separately for RX/TX queues

Register Layout

  • MMIO region: 128 KB (index 0)
  • Flash region: 128 KB (index 1)
  • IO space: 32 bytes (index 2)
  • MSI-X region: 16 KB (index 3)

4.4 PCIe Interrupt Mapping Mechanism

RVSP-REF adopts a modern interrupt architecture centered on MSI/MSI-X:

MSI/MSI-X Mandatory Requirements

  • The system must support Message Signaled Interrupts (rule MSI_010)
  • INTx must be disabled: The SoC must not support interrupt signaling based on INTx virtual wires (rule MSI_020)
  • INTx emulation is allowed only in legacy device scenarios

AIA Interrupt Controller Connection

  • IMSIC role: Each CPU configures an S-mode interrupt file, supporting at least 255 interrupt identifiers
  • APLIC role: Converts wired interrupts to MSI, ensuring all external interrupts are delivered in MSI form
  • Virtualization support: In KVM mode, ensures interrupt isolation and performance

Specific implementation approach

  • Allocate an independent interrupt controller instance for the PCIe device (s->irqchip[1])
  • Link the PCIe node to the IMSIC via the msi-parent property in the device tree
  • INTx interrupts are converted to MSI by the APLIC (rule IIC_080)

4.5 Address Space Layout and BAR Allocation

The PCIe address space is managed uniformly through the rvsp_ref_memmap array:

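| Space Type    | Base Address    | Size  | Purpose                         |
|---------------|-----------------|-------|---------------------------------|
| ECAM          | 0x3000_0000     | 256MB | PCIe configuration space access |
| MMIO (32-bit) | 0x4000_0000     | 1GB   | 32-bit MMIO device mapping      |
| MMIO (64-bit) | 0x100_0000_0000 | 1TB   | 64-bit MMIO device mapping      |
| PIO           | 0x0300_0000     | 64KB  | Port I/O operations             |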

BAR Allocation Strategy

  • Use the memory region alias mechanism to implement address translation
  • Create ECAM and MMIO alias regions via memory_region_init_alias
  • Directly map the PIO space through sysbus_mmio_map
  • Support pci_allow_0_address = true, allowing PCIe devices to use address 0
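To illustrate how the 256MB ECAM window listed in this section is addressed, the sketch below applies the standard PCIe ECAM layout, in which each function owns 4 KiB of configuration space and the bus, device, and function numbers select the page. The base and size come from the table; the DEMO_* macros and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ECAM_BASE 0x30000000ull  /* from the address space table */
#define DEMO_ECAM_SIZE 0x10000000ull  /* 256MB */

/* Standard ECAM layout: bus[27:20], device[19:15], function[14:12], register[11:0]. */
static uint64_t demo_ecam_addr(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return DEMO_ECAM_BASE +
           (((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
            ((uint64_t)fn << 12) | reg);
}

/* Each bus consumes 1 MiB of ECAM space, so the window size fixes the bus count. */
static unsigned demo_ecam_bus_count(void)
{
    return (unsigned)(DEMO_ECAM_SIZE >> 20);
}
```

A 256MB window therefore covers all 256 possible bus numbers, and the last byte of function 7 on device 31 of bus 255 sits at the very end of the window.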

This PCIe subsystem design ensures that the RVSP-REF platform provides a realistic and reliable hardware emulation environment for server-class operating systems and hypervisors while remaining standards-compatible.

Interrupt System Implementation

The interrupt system of QEMU rvsp-ref strictly follows the RISC-V Server Platform specification and is built on the AIA (Advanced Interrupt Architecture), fully replacing the traditional PLIC scheme. It consists of two levels of controllers, the IMSIC (Incoming MSI Controller) and the APLIC (Advanced Platform-Level Interrupt Controller), which work together to provide efficient interrupt handling for multicore server environments.

AIA Controller Instantiation and Connection

Each processor socket has its own independent interrupt controller instance, initialized by the rvsp_ref_create_aia() function:

s->irqchip[i] = rvsp_ref_create_aia(s->aia_guests, memmap, i, base_hartid, hart_count);

The core responsibilities of this function include:

  • IMSIC configuration: Calculate the number of interrupt files per CPU based on the aia_guests parameter, and map them to the corresponding MMIO regions (M-mode: 0x2400_0000, S-mode: 0x2800_0000)
  • APLIC instantiation: Create the APLIC device and configure its input sources (96 in total, starting from RVSP_PCIE_IRQ)
  • Controller interconnection: Connect the APLIC’s MSI output to the corresponding CPU’s IMSIC interrupt files

Interrupt Routing Mechanism

MSI/MSI-X Priority Principle

QEMU rvsp-ref strictly follows the MSI_010 rule, enforcing the use of MSI/MSI-X interrupts for all PCIe devices:

  • Device tree reference: The PCIe node directly points to the IMSIC controller via the msi-parent = <&imsic> property
  • Interrupt identifier: Each CPU supports up to 255 MSI interrupt IDs (RVSP_IRQCHIP_NUM_MSIS)
  • Virtualization extension: Supports up to 7 Guests (RVSP_IRQCHIP_MAX_GUESTS), providing an independent VS-mode interrupt file for each virtual CPU

INTx Legacy Compatibility Handling

For legacy scenarios that must support INTx, the system follows the IIC_080 rule:

  • PCIe INTx Wiring: The GPEX controller’s 4 INTx lines are connected to APLIC inputs:
for (i = 0; i < PCI_NUM_PINS; i++) {
    irq = qdev_get_gpio_in(irqchip, RVSP_PCIE_IRQ + i);
    sysbus_connect_irq(SYS_BUS_DEVICE(gpex), i, irq);
}
  • Conversion mechanism: APLIC converts received INTx interrupts into MSI writes to the target CPU’s IMSIC
  • Isolation design: The PCIe domain uses an independent irqchip[1] instance, isolated from the interrupt controller for MMIO devices

Interrupt Handling Process

1. MSI Delivery Path

PCIe device --[MSI write]--> IMSIC --[interrupt signal]--> RISC-V CPU

2. INTx-to-MSI Conversion Path

PCIe device --[INTx signal]--> APLIC --[MSI conversion]--> IMSIC --[interrupt signal]--> RISC-V CPU

3. Synchronization Assurance Mechanism

APLIC and IMSIC ensure reliable MSI delivery through a strict synchronization process:

  1. Clear the pending bit: Clear the interrupt pending bit used for synchronization in IMSIC
  2. Acquire the mutex: Lock the APLIC genmsi register
  3. Generate the MSI: Write the genmsi register to send an interrupt to the target CPU
  4. Wait for completion: Poll the Busy bit until the operation is complete
  5. Confirm delivery: Check the IMSIC interrupt pending bit to confirm the interrupt has been received
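The two delivery paths described in this section can be modeled with a toy interrupt file: an MSI is just a write of an interrupt identity, which sets a pending bit, and the APLIC reaches the same file by issuing that write on behalf of a wired source. The sketch below is a deliberately simplified model (no enable bits, thresholds, or real register offsets), and every Demo* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NUM_MSIS 255  /* same value as RVSP_IRQCHIP_NUM_MSIS */

/* One interrupt file: bit n set means identity n is pending (identity 0 is unused). */
typedef struct DemoImsicFile {
    uint64_t pending[4];
} DemoImsicFile;

/* Path 1: a device's MSI write marks one identity pending; invalid ones are ignored. */
static bool demo_imsic_msi_write(DemoImsicFile *f, uint32_t identity)
{
    if (identity == 0 || identity > DEMO_NUM_MSIS) {
        return false;
    }
    f->pending[identity / 64] |= 1ull << (identity % 64);
    return true;
}

/* Path 2: the APLIC turns a wired interrupt into the same kind of MSI write. */
static bool demo_aplic_forward_wired(DemoImsicFile *f, uint32_t configured_identity)
{
    return demo_imsic_msi_write(f, configured_identity);
}

/* Claim the pending identity with the lowest number (highest priority); 0 if none. */
static uint32_t demo_imsic_claim(DemoImsicFile *f)
{
    for (uint32_t id = 1; id <= DEMO_NUM_MSIS; id++) {
        uint64_t mask = 1ull << (id % 64);
        if (f->pending[id / 64] & mask) {
            f->pending[id / 64] &= ~mask;
            return id;
        }
    }
    return 0;
}
```

In this model both paths converge on demo_imsic_msi_write, which mirrors the point made above: from the hart's perspective, a converted INTx interrupt is indistinguishable from a native MSI.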

Virtualization Support

QEMU’s rvsp-ref AIA implementation fully takes virtualization requirements into account:

  • Guest interrupt files: Each virtual CPU has an independent VS-mode interrupt file, supporting direct virtual interrupt injection
  • KVM optimization: IMSIC can be emulated directly by the kernel, while APLIC is handled in user space, balancing performance and functionality
  • Priority management: Interrupt priorities are dynamically configured through CSR registers (miselect/mireg, siselect/sireg), supporting flexible scheduling across multiple privilege modes

Device Tree Node Description

During the device tree generation phase, the system creates a complete description for the interrupt controller:

IMSIC Node Properties
interrupt-controller;
#interrupt-cells = <0>;
msi-controller;
reg = <base address size>;
interrupts-extended = <interrupt line references for each CPU>;
APLIC Node Attributes
interrupt-controller;
#interrupt-cells = <2>;
reg = <base address, size>;
interrupts-extended = <references connected to each CPU>;

5. Follow-up Plan

Current Issues

While bringing up QEMU rvsp-ref, several issues were found in the patch series, for example:

  1. Some components have incorrect address space mappings (see Re: [PATCH v3 4/4] hw/riscv/server_platform_ref.c: add riscv-iommu-sys - Chao Liu)
  2. The RVSP-REF CPU cannot boot the kernel correctly when inheriting from the vendor CPU type (see Re: [PATCH v3 2/4] target/riscv: Add server platform reference cpu - Chao Liu)

Upstream will therefore post a v4 of the patch series soon (see Re: [PATCH v3 0/4] hw/riscv: Add Server Platform Reference Board - Daniel Henrique Barboza).

Features to Support

  1. Add sdext extension
  2. Support RAS
  3. Support QoS

According to the latest upstream discussion, merging rvsp-ref into mainline will be considered once sdext support is complete.