Exploring a New RISC-V Proposal: BF16 and Minimal OFP8 Vector Compute (Zvfbfa and Zvfofp8min)

Source information

Source: RISC-V Developer Community
Author / ID: zevorn
Original: https://ruyisdk.cn/t/topic/964
Original publication date: 2025-08-07

Summary

This article discusses new extension proposals related to RISC-V vector computing, focusing on Zvfbfa and Zvfofp8min, and covers how BF16 and the OCP OFP8 floating-point formats are supported in vector computation.

This post is closely related to the blog article “Adding floating-point precision for neural-network workloads to QEMU softfloat” and can serve as a follow-up index for RISC-V AI floating-point formats, QEMU softfloat, and RVV extension support.

Archive note

This is an index for a published off-site article; the full text has been imported below.

Body

A while ago I shared on the forum how to add floating-point precisions suited to neural-network workloads (fp8/tf32) to QEMU softfloat. Around the same time, I noticed that upstream RISC-V already had proposals for two related ISA extensions, Zvfbfa and Zvfofp8min. This article briefly discusses them.

The main additions are two extensions:

Zvfbfa: adds more complete BF16 vector-compute support;
Zvfofp8min: provides basic support for the two 8-bit floating-point formats defined in the Open Compute Project OFP8 specification (OFP8 E4M3 and OFP8 E5M2).

Key milestones

Milestone	Date
Planned approval	2025-07-12
Internal review begins	2025-08-11
ARC review freeze request	2025-10-08
Freeze	2025-11-12
Public review begins	2025-12-13
TSC approval	2026-02-10
Board approval	2026-02-25

The following sections introduce the two extensions in turn. The material is a machine translation of the official documentation that has been manually polished.

Zvfbfa: more complete BF16 vector-compute support

The Zvfbfa extension depends on the Zve32f and Zfbfmin extensions (it is compatible with Zvfbfwma and Zvfbfmin, but does not depend on them).

Zvfbfa adds a one-bit altfmt field at bit 8 of the vtype CSR. Attempting to set altfmt = 1 with SEW ≥ 32 is reserved.

Note:
The recommended assembly syntax for setting altfmt = 1 is to append the alt suffix after the SEW specifier, for example vsetvli a0, a1, e16alt, m1, ta, ma.
When a reserved combination of altfmt and SEW is selected, the implementation should set vill in vtype.
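
The altfmt rule above can be sketched in a few lines of Python. This is only an illustrative model, not code from the specification: the function name encode_vtype and its parameters are hypothetical, and the positions of vlmul, vsew, vta, vma and vill are taken from the base vector extension, with altfmt at bit 8 as stated above.

```python
def encode_vtype(sew, lmul_code=0, ta=True, ma=True, altfmt=False, xlen=64):
    """Build a vtype value; hypothetical helper for illustration only.

    Field positions follow the base vector extension (vlmul in bits 2:0,
    vsew in bits 5:3, vta at bit 6, vma at bit 7, vill at bit XLEN-1),
    plus the altfmt field at bit 8 described in the text.
    """
    vill = 1 << (xlen - 1)
    vsew = {8: 0, 16: 1, 32: 2, 64: 3}.get(sew)
    if vsew is None:
        return vill  # unsupported SEW
    if altfmt and sew >= 32:
        return vill  # reserved combination: altfmt = 1 with SEW >= 32
    return (int(altfmt) << 8) | (int(ma) << 7) | (int(ta) << 6) | (vsew << 3) | lmul_code


# Roughly what "vsetvli a0, a1, e16alt, m1, ta, ma" would request.
print(hex(encode_vtype(16, altfmt=True)))
```

With these assumptions, e16alt differs from plain e16 only in bit 8, and e32alt collapses to a vtype with only vill set.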

When altfmt = 0, the hardware behaves as if Zvfbfa were not implemented.

When altfmt = 1 and SEW = 8, all vector floating-point instructions become reserved except for the following, which are redefined to use BF16 for any operand that would otherwise use FP16 format:

vfwcvt.f.x[u].v
vfncvt.x[u].f.w
vfncvt.rtz.x[u].f.w

When altfmt = 1 and SEW = 16, all vector floating-point instructions are reserved except for the following, which are redefined to use BF16 for any operand that would otherwise use FP16 format:

vfadd.v[vf]
vfsub.v[vf]
vfmin.v[vf]
vfmax.v[vf]
vfsgnj.v[vf] ††
vfsgnjn.v[vf] ††
vfsgnjx.v[vf] ††
vfslide1up.vf ††
vfslide1down.vf ††
vfmv.v.f ††
vfmerge.vfm ††
vmfeq.v[vf]
vmfle.v[vf]
vmflt.v[vf]
vmfne.v[vf]
vmfgt.vf
vmfge.vf
vfmul.v[vf]
vfrsub.vf
vfmadd.v[vf]
vfnmadd.v[vf]
vfmsub.v[vf]
vfnmsub.v[vf]
vfmacc.v[vf]
vfnmacc.v[vf]
vfmsac.v[vf]
vfnmsac.v[vf]
vfwadd.v[vf]
vfwsub.v[vf]
vfwadd.w[vf]
vfwsub.w[vf]
vfwmul.v[vf]
vfwmacc.v[vf] (same semantics as vfwmaccbf16.v[vf])
vfwnmacc.v[vf]
vfwmsac.v[vf]
vfwnmsac.v[vf]
vfmv.s.f ††
vfmv.f.s †
vfwcvt.f.f.v (same semantics as vfwcvtbf16.f.f.v)
vfncvt.f.f.w (same semantics as vfncvtbf16.f.f.w)
vfncvt.rod.f.f.w
vfrsqrt7.v
vfrec7.v
vfclass.v
vfwmaccbf16.v[vf] †
vfwcvtbf16.f.f.v †
vfncvtbf16.f.f.w †

Instructions marked with † have identical semantics regardless of altfmt. Instructions marked with †† differ only in one respect: for f register operands that are not properly NaN-boxed, BF16 canonical NaN must be substituted instead of FP16 canonical NaN.
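
To make the †† case concrete, here is a minimal sketch of reading a 16-bit scalar operand from an f register. The helper name read_16bit_scalar and the flen parameter are made up for this example; the canonical NaN bit patterns (0x7E00 for FP16, 0x7FC0 for BF16) are the usual quiet-NaN encodings of those formats.

```python
FP16_CANONICAL_NAN = 0x7E00
BF16_CANONICAL_NAN = 0x7FC0


def read_16bit_scalar(freg_bits, flen=64, altfmt=0):
    """Return the 16-bit operand held in an f register (illustrative model).

    A 16-bit value is properly NaN-boxed when every bit above bit 15 is 1.
    Otherwise the operand is replaced by the canonical NaN of the format
    selected by altfmt (FP16 when 0, BF16 when 1).
    """
    upper_mask = ((1 << flen) - 1) & ~0xFFFF
    if (freg_bits & upper_mask) == upper_mask:
        return freg_bits & 0xFFFF
    return BF16_CANONICAL_NAN if altfmt else FP16_CANONICAL_NAN


boxed_one = 0xFFFF_FFFF_FFFF_3F80  # BF16 1.0, properly NaN-boxed in 64 bits
print(hex(read_16bit_scalar(boxed_one, altfmt=1)))
print(hex(read_16bit_scalar(0x3F80, altfmt=1)))  # not boxed
```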

Note: The excluded operations include division, square root, reductions, and conversions to or from integers wider than 8 bits. These can be implemented through conversions with FP32.
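
As a rough illustration of that note, the sketch below models a BF16 division as widen to FP32, divide, then narrow back to BF16. It is a scalar Python model rather than vector code, all function names are hypothetical, the narrowing step assumes round-to-nearest-even, and it assumes a nonzero divisor and a quotient within FP32 range (otherwise Python raises an exception).

```python
import struct


def bf16_to_f32(bits):
    """Widen a BF16 bit pattern to a float: BF16 is the top half of FP32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def f32_round(x):
    """Round a Python float (double) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_to_bf16(x):
    """Narrow an FP32 value to BF16 bits, round-to-nearest-even (assumed)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return 0x7FC0  # any NaN becomes the canonical BF16 NaN
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_div(a_bits, b_bits):
    """Divide two BF16 values by going through FP32 (illustrative only)."""
    quotient = f32_round(bf16_to_f32(a_bits) / bf16_to_f32(b_bits))
    return f32_to_bf16(quotient)


print(hex(bf16_div(0x3F80, 0x4040)))  # 1.0 / 3.0
```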

For vfrec7.v, some inputs greater than 2^126 produce subnormal results that cannot be represented exactly with BF16’s limited precision. Such results are rounded toward zero.

Zvfofp8min: basic support for FP8

The Zvfofp8min extension depends on Zve32f.

In some applications, the OFP8 formats are used directly to represent values. In others, they are used as components of block floating-point formats, as described in the OCP microscaling specification.
The conversion instructions defined by this extension support both use cases. Software can convert OFP8 values to BF16 or FP32, and then apply scaling factors in a higher-precision format, for example with the vfmul.vf instruction. If future quantitative evidence shows demand, vector or matrix extensions may directly provide microscaling support.
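
The block-scaling use case can be sketched as follows. This is a simplified model under stated assumptions: the elements are taken to be already converted from OFP8 to a wider format, the shared scale is assumed to be a power-of-two exponent code with bias 127 and 0xFF reserved for NaN (the E8M0 scale of the OCP microscaling formats), and the names e8m0_to_float and apply_block_scale are hypothetical. The list comprehension plays the role of a scalar-times-vector multiply such as vfmul.vf.

```python
def e8m0_to_float(scale_code):
    """Decode an assumed E8M0 shared scale: 2**(code - 127), 0xFF is NaN."""
    if scale_code == 0xFF:
        return float("nan")
    return 2.0 ** (scale_code - 127)


def apply_block_scale(elements, scale_code):
    """Multiply every widened element of a block by the shared scale."""
    scale = e8m0_to_float(scale_code)
    return [scale * x for x in elements]


# A block whose elements were widened from OFP8, scaled by 2**3.
print(apply_block_scale([1.0, -0.5, 448.0], 130))
```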

Only vector support is currently proposed for the OFP8 formats, because these formats are used almost exclusively in highly data-parallel computing workloads.

The canonical NaN for both E4M3 and E5M2 is 0x7f.

OFP8 to BF16 conversion instructions

The existing instruction vfwcvtbf16.f.f.v is used to convert OFP8 to BF16. When SEW = 8 and altfmt = 0, it converts the OFP8 E4M3 vector in vs2 to BF16 and writes the result to vd. No rounding is performed, and no floating-point exception flags are set. When SEW = 8 and altfmt = 1, the instruction interprets vs2 as an OFP8 E5M2 vector; the rest of the behavior is identical.
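
A scalar model of this widening conversion might look like the sketch below. The bit layouts are those of the OFP8 formats (E4M3: 4 exponent bits, bias 7, no infinities, NaN only when all exponent and mantissa bits are set; E5M2: 5 exponent bits, bias 15, IEEE-style infinities and NaNs). The function names are hypothetical and NaN payload handling is not modeled. Every OFP8 value fits exactly in BF16, which is why no rounding is needed.

```python
import math
import struct


def decode_ofp8(code, fmt="e4m3"):
    """Decode one OFP8 byte into a Python float (illustrative model)."""
    exp_bits, man_bits, bias = (4, 3, 7) if fmt == "e4m3" else (5, 2, 15)
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> man_bits) & ((1 << exp_bits) - 1)
    man = code & ((1 << man_bits) - 1)
    if fmt == "e4m3":
        if exp == 15 and man == 7:
            return math.nan  # E4M3 has NaN but no infinity
    elif exp == 31:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:  # zero or subnormal
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def to_bf16_bits(x):
    """Exact for decoded OFP8 values: BF16 is the top 16 bits of FP32."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


print(decode_ofp8(0x7E, "e4m3"), hex(to_bf16_bits(decode_ofp8(0x7E, "e4m3"))))
print(decode_ofp8(0x7B, "e5m2"))
```

The two printed inputs are the largest finite codes of each format.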

Note:
Conversion to FP32, FP16, and integer formats can be implemented by first converting to BF16 and then using existing instructions from the Zvfbfmin, Zvfbfa, Zvfhmin, and Zve32f extensions. Direct conversion from OFP8 to FP32 is uncommon because OFP8 values are usually used as multiplicands, and the multiplication itself can widen the result when needed.
Conversion between the two OFP8 formats is uncommon, but it can be implemented by first converting to BF16 and then using one of the instructions defined in the next section.

BF16 to OFP8 conversion instructions

The existing vfncvtbf16.f.f.w instruction converts BF16 to OFP8. When SEW = 8 and altfmt = 0, it converts the BF16 vector in vs2 to OFP8 E4M3 and writes the result to vd. Since E4M3 cannot represent infinity, infinite results are converted to canonical NaN.

When SEW = 8 and altfmt = 1, the instruction converts to OFP8 E5M2. In that case, infinity is representable. In both cases, results are rounded using the dynamic rounding mode in the frm register, and floating-point exceptions are reported in fflags. Register handling is the same as for other floating-point conversion operations.

The OFP8 specification also defines saturating conversions, where infinite results are converted to the largest finite value with the same sign. The new instruction vfncvtbf16.sat.f.f.w implements this behavior.

This instruction is defined for SEW = 8 with either altfmt = 0 or altfmt = 1. It behaves like vfncvtbf16.f.f.w, but with saturation. Its encoding matches vfncvtbf16.f.f.w, except that the vs1 field is set to 11111.
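
The difference between the plain and saturating narrowing conversions can be modeled as in the sketch below. Several things here are assumptions for illustration: the function name encode_ofp8 is hypothetical, only round-to-nearest-even is modeled (the real instructions use the dynamic rounding mode in frm), overflow is detected after rounding with an unbounded exponent, NaN inputs always produce 0x7f, and fflags are not modeled.

```python
import math

# fmt: (mantissa bits, minimum normal exponent, largest finite code)
FORMATS = {"e4m3": (3, -6, 0x7E), "e5m2": (2, -14, 0x7B)}
CANONICAL_NAN = 0x7F


def encode_ofp8(x, fmt="e4m3", saturate=False):
    """Narrow a float to an OFP8 byte, round-to-nearest-even (assumed)."""
    man_bits, emin, max_code = FORMATS[fmt]
    if math.isnan(x):
        return CANONICAL_NAN
    sign = 0x80 if math.copysign(1.0, x) < 0 else 0
    mag = abs(x)

    def overflow():
        if saturate:
            return sign | max_code   # largest finite value, same sign
        if fmt == "e4m3":
            return CANONICAL_NAN     # E4M3 cannot represent infinity
        return sign | 0x7C           # E5M2 infinity

    if math.isinf(mag):
        return overflow()
    if mag == 0.0:
        return sign
    exp = max(math.frexp(mag)[1] - 1, emin)   # subnormals share emin
    quantum = 2.0 ** (exp - man_bits)         # spacing at this exponent
    q = round(mag / quantum)                  # Python rounds ties to even
    if mag < 2.0 ** emin:
        code = q                              # subnormal (or carries to min normal)
    else:
        code = ((exp - emin + 1) << man_bits) + (q - (1 << man_bits))
    return overflow() if code > max_code else sign | code


for value in (448.0, 480.0, float("inf")):
    print(value, hex(encode_ofp8(value)), hex(encode_ofp8(value, saturate=True)))
```

The loop prints the non-saturating and saturating E4M3 codes side by side for a value that fits, a value that overflows after rounding, and infinity.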

Note: Conversions from 8-bit integers to OFP8 first go through BF16 via the instructions defined by the Zvfbfa extension, and then use the instructions defined in this section.

FP32 to OFP8 conversion instructions

A new instruction, vfncvt.f.f.q, has been added to convert FP32 to OFP8. When SEW = 8 and altfmt = 0, it converts the FP32 vector in vs2 (EMUL = 4×LMUL) to OFP8 E4M3 and writes the result to vd (EMUL = LMUL).

Because E4M3 cannot represent infinity, infinite results are converted to canonical NaN. When SEW = 8 and altfmt = 1, the instruction converts to OFP8 E5M2. In that case, infinity is representable. In both cases, results are rounded using the dynamic rounding mode in the frm register, and floating-point exceptions are reported in fflags. The encoding of vfncvt.f.f.q is the same as vfncvt.f.f.w, except that vs1 is set to 11001.

Another new instruction, vfncvt.sat.f.f.q, is defined for SEW = 8 with either altfmt = 0 or altfmt = 1. It performs the same operation as vfncvt.f.f.q, but with saturation, meaning infinite results are converted to the largest finite value with the same sign. The encoding of vfncvt.sat.f.f.q is the same as vfncvt.f.f.w, except that the vs1 field is set to 11011.

Note:
Another design option would be to first convert FP32 to BF16 using round-to-nearest-even, and then convert to OFP8 with the instructions defined earlier. However, because FP32-to-OFP8 conversions are common enough, the direct-conversion design is more reasonable.
Conversion from FP16 and 16-bit integer formats first goes through FP32 using instructions from the Zvfhmin and Zve32f extensions, and then uses the instructions defined in this section.
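
The note above justifies the direct conversion by how common it is. Separately, and only as an illustration that the source does not itself give as a reason, the two-step route through BF16 rounds twice and can land on a different OFP8 value than a single direct rounding. The helper round_to_precision is hypothetical and ignores exponent range, so it only applies to normal, finite, nonzero inputs.

```python
import math


def round_to_precision(x, man_bits):
    """Round x to man_bits fraction bits, ties to even (exponent range ignored)."""
    exp = math.frexp(abs(x))[1] - 1
    quantum = 2.0 ** (exp - man_bits)
    return round(x / quantum) * quantum


x = 1.0 + 2.0 ** -4 + 2.0 ** -9  # exactly representable in FP32

direct = round_to_precision(x, 3)                             # FP32 -> E4M3
two_step = round_to_precision(round_to_precision(x, 7), 3)    # FP32 -> BF16 -> E4M3

print(direct, two_step)
```

Here the first rounding to BF16 discards the small term that would have pushed the value above the E4M3 midpoint, so the second rounding sees an exact tie and rounds to even.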
