Simulating AArch64 FCMP Instructions on x86

The most efficient way to emulate an architecture is to get as close as possible to a 1:1 instruction mapping, where one source-architecture instruction corresponds to one target-architecture instruction. Even for an interpreter, the same principle should be followed as closely as possible: emulate each source instruction with as few target instructions as possible.

Approach

The AArch64 FCMP instruction is defined as follows:

Manual: C7.2.59 <Arm Architecture Reference Manual (Armv8-A) 2019>
Floating-point quiet Compare (scalar). This instruction compares the two SIMD&FP source register values, or the first SIMD&FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.
It raises an Invalid Operation exception only if either operand is a signaling NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point exceptions and exception traps on page D1-2313.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

Instruction encoding diagram:
│31 30 29 28│27 26 25 24│23 22 21 20│19 18 17 16│15 14 13 12│11 10 09 08│07 06 05 04│03 02 01 00│
├───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ 0  0  0  1│ 1  1  1  0│ .  .  1  .│ .  .  .  .│ 0  0  1  0│ 0  0  .  .│ .  .  .  0│ .  0  0  0│
└───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┘
(A dot marks a variable bit: ftype is bits 23:22, Rm is bits 20:16, Rn is bits 9:5, opc is bits 4:3.)

Half-precision variant
Applies when ftype == 11 && opc == 00.
FCMP Hn, Hm

Half-precision, zero variant
Applies when ftype == 11 && Rm == (00000) && opc == 01.
FCMP Hn, #0.0

Single-precision variant
Applies when ftype == 00 && opc == 00.
FCMP Sn, Sm

Single-precision, zero variant
Applies when ftype == 00 && Rm == (00000) && opc == 01.
FCMP Sn, #0.0

Double-precision variant
Applies when ftype == 01 && opc == 00.
FCMP Dn, Dm

Double-precision, zero variant
Applies when ftype == 01 && Rm == (00000) && opc == 01.
FCMP Dn, #0.0
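
The variant conditions above can be turned into a small decoder. The following is only a minimal sketch, assuming the usual field positions for this encoding (ftype in bits 23:22, Rm in bits 20:16, Rn in bits 9:5, opc in bits 4:3, with the upper opc bit being 0 for FCMP). The names fcmp_precision, fcmp_variant and decode_fcmp are hypothetical and not taken from any existing emulator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types and names, for illustration only. */
typedef enum { FCMP_HALF, FCMP_SINGLE, FCMP_DOUBLE } fcmp_precision;

typedef struct {
    fcmp_precision precision;
    bool cmp_with_zero; /* true for the "#0.0" variants (opc == 01) */
    unsigned rn;        /* first source register number */
    unsigned rm;        /* second source register number (register variants) */
} fcmp_variant;

/* Returns true and fills *out if insn looks like an FCMP encoding. */
static bool decode_fcmp(uint32_t insn, fcmp_variant *out)
{
    /* Fixed bits: 31:24 = 00011110, 21 = 1, 15:10 = 001000, 4 = 0, 2:0 = 000 */
    if ((insn & 0xFF20FC17u) != 0x1E202000u)
        return false;

    switch ((insn >> 22) & 0x3u) {          /* ftype */
    case 0x0: out->precision = FCMP_SINGLE; break;
    case 0x1: out->precision = FCMP_DOUBLE; break;
    case 0x3: out->precision = FCMP_HALF;   break;
    default:  return false;                 /* ftype == 10 is not a listed variant */
    }

    /* Low opc bit selects the compare-with-zero form. Rm is expected to be
     * 00000 in that case; this sketch does not enforce it. */
    out->cmp_with_zero = ((insn >> 3) & 0x1u) != 0;
    out->rn = (insn >> 5) & 0x1Fu;
    out->rm = (insn >> 16) & 0x1Fu;
    return true;
}
```

For example, the word 0x1E212000 (FCMP S0, S1) decodes as single-precision with rn = 0 and rm = 1, while 0x1E202008 (FCMP S0, #0.0) decodes as the single-precision zero variant.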

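Broadly speaking, and setting the half-precision variants aside, there are two kinds of comparison here: single-precision floating-point comparison and double-precision floating-point comparison. Each of them comes in a quiet form (FCMP, which raises Invalid Operation only for a signaling NaN) and a signaling form (the companion FCMPE instruction, which raises it for any NaN), so the problem can be reduced to four cases.

Conveniently, each of those four cases has a corresponding x86 instruction. Note that "ordered" and "unordered" in the x86 instruction names describe this exception behavior, not the flag results:

x86 single-precision floating-point ordered compare instruction: comiss (Invalid Operation on any NaN), corresponding to AArch64 FCMPE Sn, Sm
x86 single-precision floating-point unordered compare instruction: ucomiss (Invalid Operation only on a signaling NaN), corresponding to AArch64 FCMP Sn, Sm
x86 double-precision floating-point ordered compare instruction: comisd (Invalid Operation on any NaN), corresponding to AArch64 FCMPE Dn, Dm
x86 double-precision floating-point unordered compare instruction: ucomisd (Invalid Operation only on a signaling NaN), corresponding to AArch64 FCMP Dn, Dm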

AArch64’s FCMP instruction updates the NZCV bits in PSTATE, while x86 floating-point compare instructions update EFLAGS. That means we need an EFLAGS-to-NZCV mapping.

The following diagram shows how x86 floating-point compare flags map to AArch64:

               x86              ARM
            ZF PF CF        N  Z  C  V
equal       1  0  0         0  1  1  0
less        0  0  1   =>    1  0  0  0
greater     0  0  0         0  0  1  0
unordered   1  1  1         0  0  1  1
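
The table can also be read as a plain C function, which is handy when the flags have already been captured (for example with lahf or pushf). This is a minimal sketch, assuming NZCV is packed into the low four bits as N = 8, Z = 4, C = 2, V = 1; the helper name eflags_to_nzcv is hypothetical.

```c
#include <stdint.h>

/* Bit positions of the relevant flags in the x86 EFLAGS register. */
#define X86_CF (1u << 0)
#define X86_PF (1u << 2)
#define X86_ZF (1u << 6)

/* Maps the flags left by comiss/ucomiss/comisd/ucomisd to a packed NZCV value. */
static uint32_t eflags_to_nzcv(uint32_t eflags)
{
    /* PF must be tested first: an unordered result also sets ZF and CF. */
    if (eflags & X86_PF)
        return 0x3;     /* unordered: C and V set */
    if (eflags & X86_ZF)
        return 0x6;     /* equal: Z and C set */
    if (eflags & X86_CF)
        return 0x8;     /* less: N set */
    return 0x2;         /* greater: C set */
}
```

The order of the tests matters: checking ZF or CF before PF would misclassify the unordered row, because all three flags are set in that case.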

Implementation

Here the mapping is implemented with conditional move instructions (cmovcc), which avoid the misprediction penalty that conditional branches can incur. The C implementation is as follows:

#define asm_fcmp32(nzcv, op1, op2)                                                         \
{                                                                                          \
    /* NZCV packed as a 4-bit value: N=0x8, Z=0x4, C=0x2, V=0x1 */                         \
    int less = 0x8, greater = 0x2;                                                         \
    int equal = 0x6, unordered = 0x3, eq_0 = 0;                                            \
    asm volatile (                                                                         \
        "movd    %[op1], %%xmm0;"                                                          \
        "movd    %[op2], %%xmm1;"                                                          \
        /* Unordered (quiet) scalar single-FP compare, sets EFLAGS; quiet like FCMP */     \
        "ucomiss %%xmm1, %%xmm0;"                                                          \
        /* mov and cmov leave EFLAGS untouched, so each cmov tests the compare result */   \
        "mov     %[eq_0], %%eax;"                                                          \
        "cmovb   %[less], %%eax;"  /* Move if Below (CF=1) */                              \
        "cmovz   %[eq_0], %%eax;"  /* Move if Zero  (ZF=1) */                              \
        "mov     %%eax,   %%ecx;"                                                          \
        "cmovz   %[equal], %%eax;" /* Move if Zero  (ZF=1) */                              \
        "cmovb   %%ecx, %%eax;"    /* Move if Below (CF=1) */                              \
        "cmova   %[greater], %%eax;"   /* Move if Above (CF=0 and ZF=0) */                 \
        "cmovp   %[unordered], %%eax;" /* Move if Parity (PF=1), i.e. unordered */         \
        "mov     %%eax, %[nzcv];"                                                          \
        : [nzcv]"=g"(nzcv)                                                                 \
        : [op1]"m"(op1), [op2]"m"(op2), [eq_0]"r"(eq_0), [greater]"r"(greater),            \
          [unordered]"r"(unordered), [equal]"r"(equal), [less]"r"(less)                    \
        : "cc", "xmm0", "xmm1", "eax", "ecx"                                               \
    );                                                                                     \
}

op1 and op2 are the two source operands (32-bit single-precision values read from memory), and the computed NZCV value is stored in nzcv.
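
To cross-check the flag values the macro produces, a portable reference model can be useful. The sketch below is not part of the original implementation: fcmp32_ref is a hypothetical helper that uses the same 4-bit NZCV packing and models only the flag result, not the floating-point exception state.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical reference model of the NZCV result of a single-precision FCMP. */
static uint32_t fcmp32_ref(float a, float b)
{
    if (isunordered(a, b))
        return 0x3;     /* unordered: at least one operand is NaN */
    if (a == b)
        return 0x6;     /* equal (+0.0 and -0.0 compare equal) */
    if (a < b)
        return 0x8;     /* less */
    return 0x2;         /* greater */
}
```

A test harness could feed the same inputs to asm_fcmp32 and fcmp32_ref, including NaNs, infinities and signed zeros, and compare the two results.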

PS: Taking comisd and ucomisd as examples, the difference between ordered and unordered comparisons is as follows:
comisd: if either operand is a NaN (quiet or signaling), the result is unordered, so ZF, PF, and CF are all set to 1, and the instruction signals a floating-point Invalid Operation exception.
ucomisd: it produces exactly the same EFLAGS results, including ZF = PF = CF = 1 for an unordered result. The key difference is how it handles NaN values with respect to exceptions: it signals Invalid Operation only when an operand is a signaling NaN, and a quiet NaN raises nothing. This is the behavior of AArch64 FCMP, while comisd behaves like FCMPE.
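
For an interpreter that tracks exception flags in software, the quiet versus signaling distinction can be modeled directly on the operand bit patterns. This is a rough sketch, assuming IEEE 754 binary64 operands where the most significant fraction bit marks a quiet NaN (the convention on both x86 and Arm); all names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* IEEE 754 binary64 field masks. */
#define F64_EXP_MASK  0x7FF0000000000000ull
#define F64_FRAC_MASK 0x000FFFFFFFFFFFFFull
#define F64_QUIET_BIT 0x0008000000000000ull

static bool f64_is_nan(uint64_t bits)
{
    return (bits & F64_EXP_MASK) == F64_EXP_MASK && (bits & F64_FRAC_MASK) != 0;
}

static bool f64_is_snan(uint64_t bits)
{
    return f64_is_nan(bits) && (bits & F64_QUIET_BIT) == 0;
}

/* signaling_compare == true  models comisd / FCMPE (invalid on any NaN).
 * signaling_compare == false models ucomisd / FCMP (invalid only on a signaling NaN). */
static bool fcmp64_raises_invalid(uint64_t a, uint64_t b, bool signaling_compare)
{
    if (signaling_compare)
        return f64_is_nan(a) || f64_is_nan(b);
    return f64_is_snan(a) || f64_is_snan(b);
}
```

The operands are passed as raw 64-bit patterns rather than as double values so that a signaling NaN is not silently quieted on the way into the function.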
