SE_01_005 - QEMU PCIe passthru on x86 hosts
About
QEMU, when running on Linux, can use the VFIO subsystem to take control of a host device and assign it to a virtual machine. This is known as pass-through. The virtual machine then drives the device natively with its own driver, which gives better performance than an emulated device.
QEMU itself places no restrictions on the host and virtual machine architectures. However, implementation specifics of the host do affect QEMU operation:
- On x86, the MSI doorbell (APIC/IOAPIC) range must be identity-mapped, and it lives right below 4 GiB (the MSI address window is 0xFEE00000-0xFEEFFFFF).
- The current QEMU RISC-V virt machine places RAM at 0x8000_0000 (2 GiB).
...meaning that a RISC-V VM with a bit more than 2000 MiB of RAM (roughly 2030 MiB, the gap between 0x8000_0000 and 0xFEE0_0000) and a PCIe pass-thru device will fail to launch, because the VM RAM range overlaps the host's MSI region.
The solution is to split the RISC-V VM RAM range into two regions. For example:
- 1 GiB from 0x80000000 to 0xc0000000 ("low RAM")
- the remainder at 0x100000000
...leaving an appropriately sized hole between the ranges. This makes PCIe pass-thru devices usable on x86 hosts, allowing driver developers to work on RISC-V drivers for devices without a real RISC-V system (e.g. EDK2_00_01 - MultiArchUefiPkg).
A more invasive approach could provide more "low RAM", e.g. to work around the existing GRUB2 "relocation overflow" bug (TBD link to non-existing project under Distro and Integration).
Project Scope and Timelines
Changes to hw/riscv/virt.c covering:
- VM physical memory layout
- Updates to the generated DT
- Miscellaneous cleanup of memmap handling (with respect to floating values).
Testing with a 4 GiB RAM configuration and a PCIe pass-thru device on an Intel Architecture host, for example a NIC or a graphics adapter used with EDK2_00_01 - MultiArchUefiPkg.
Changes are slated for the end of 2H23.
Components and Repos
Upstream QEMU; the actual branch used will be consistent with other RISE QEMU projects (TBD).
There is an existing prototype patch against https://github.com/ventanamicro/qemu.git (dev-upstream, 4974a22c0332f2f677e90dd629b7f9354136c250)
See 0001-riscv-virt-split-RAM-into-low-and-high-memory.patch
Stakeholders and Partners
Other hw/riscv/virt.c contributors, including:
- RISE
  - Ventana: Sunil V L, Atish Patra, Anup Patel, Daniel Henrique Barboza
- External
  - Microchip: Conor Dooley
  - Alistair Francis
Dependencies
Because the single memory range is being split into multiple ranges, the software stack needs to adapt accordingly:
- UBOOT_00_01 - PCIe Passthru
- DI_01_01: GRUB - Relocation overflow on RISC-V with multi-range memory layout
Measure of Success
A tested design and implementation, accepted upstream (slated for merging), by the end of 2H23.
RISE Requirements
None (not counting any existing engineering investment against RISE resources).
Status
The table below is rolled up to the firmware WG status page - 2023-2H - Firmware Priorities.