Emulating the AArch64 FCMP Instruction on x86

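The most efficient way to emulate an architecture is to map instructions 1:1 wherever possible, so that one source-architecture instruction corresponds to one target-architecture instruction. An interpreter should follow the same principle as closely as it can and emulate each source instruction with as few target instructions as possible.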

1. Approach

The AArch64 FCMP instruction is defined as follows:

Manual: C7.2.59 <Arm Architecture Reference Manual (Armv8-A) 2019>

Floating-point quiet Compare (scalar). This instruction compares the two SIMD&FP source register values, or the first SIMD&FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.

It raises an Invalid Operation exception only if either operand is a signaling NaN.

A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point exceptions and exception traps on page D1-2313.

Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.

Instruction encoding diagram:

│31 30 29 28│27 26 25 24│23 22 21 20│19 18 17 16│15 14 13 12│11 10 09 08│07 06 05 04│03 02 01 00│
├───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ 0  0  0  1│ 1  1  1  0│ 0  1  1  .│ .  .  .  .│ 0  0  1  0│ 0  0  .  .│ .  .  .  0│ 0  0  0  0│
└───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┘

Half-precision variant
Applies when ftype == 11 && opc == 00.
FCMP Hn, Hm

Half-precision, zero variant
Applies when ftype == 11 && Rm == (00000) && opc == 01.
FCMP Hn, #0.0

Single-precision variant
Applies when ftype == 00 && opc == 00.
FCMP Sn, Sm

Single-precision, zero variant
Applies when ftype == 00 && Rm == (00000) && opc == 01.
FCMP Sn, #0.0

Double-precision variant
Applies when ftype == 01 && opc == 00.
FCMP Dn, Dm

Double-precision, zero variant
Applies when ftype == 01 && Rm == (00000) && opc == 01.
FCMP Dn, #0.0
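As a rough illustration of how an emulator might recognize this encoding, the sketch below extracts the ftype, Rm, Rn, and opc fields named in the excerpt. The helper names (fcmp_fields, decode_fcmp) are made up for this example, and the bit positions (ftype in bits 23:22, Rm in 20:16, Rn in 9:5, opc in 4:3) reflect my reading of the A64 scalar floating-point compare encoding, so they should be checked against the manual before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields of a scalar FP compare. */
typedef struct {
    unsigned ftype; /* 00 = single, 01 = double, 11 = half precision       */
    unsigned rm;    /* second source register (unused by the #0.0 forms)   */
    unsigned rn;    /* first source register                               */
    unsigned opc;   /* bit 0: compare against #0.0; bit 1: FCMPE, not FCMP */
} fcmp_fields;

/* Returns true and fills *out when insn is a scalar FCMP/FCMPE encoding.
 * Assumed fixed bits: [31:24]=00011110, [21]=1, [15:10]=001000, [2:0]=000. */
static bool decode_fcmp(uint32_t insn, fcmp_fields *out)
{
    const uint32_t mask  = 0xFF20FC07u;
    const uint32_t value = 0x1E202000u;

    if ((insn & mask) != value)
        return false;

    out->ftype = (insn >> 22) & 0x3u;
    out->rm    = (insn >> 16) & 0x1Fu;
    out->rn    = (insn >> 5)  & 0x1Fu;
    out->opc   = (insn >> 3)  & 0x3u;
    return true;
}
```

For instance, the word 0x1E612000 (which should be fcmp d0, d1) decodes to ftype = 01, Rm = 1, Rn = 0, opc = 00.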

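Leaving the half-precision variants aside, the comparisons fall into two groups: single precision and double precision. Each group in turn has a quiet form and a signaling form. The quiet form is FCMP itself, which raises Invalid Operation only for a signaling NaN; the signaling form is the sibling instruction FCMPE, which raises it for any NaN. That gives four cases in total.

Conveniently, x86 has an instruction for each of the four cases (x86 calls the quiet compare "unordered" and the signaling compare "ordered"):

x86 single-precision unordered (quiet) compare: ucomiss, corresponds to AArch64 FCMP Sn, Sm
x86 single-precision ordered (signaling) compare: comiss, corresponds to AArch64 FCMPE Sn, Sm
x86 double-precision unordered (quiet) compare: ucomisd, corresponds to AArch64 FCMP Dn, Dm
x86 double-precision ordered (signaling) compare: comisd, corresponds to AArch64 FCMPE Dn, Dm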

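The AArch64 FCMP instruction writes its result to the NZCV bits of PSTATE, whereas the x86 floating-point compare instructions write theirs to ZF, PF, and CF in the EFLAGS register, so a mapping from EFLAGS to NZCV is needed.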

The table below shows how the flags set by the x86 floating-point compare instructions map to the AArch64 flags:

               x86              ARM
            ZF PF CF        N  Z  C  V
equal       1  0  0         0  1  1  0
less        0  0  1   =>    1  0  0  0
greater     0  0  0         0  0  1  0
unordered   1  1  1         0  0  1  1
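The mapping in the table can also be written as a small lookup, which is handy when the x86 flags are already available as a saved EFLAGS value (for example from lahf or pushf). This is only a sketch: the name eflags_to_nzcv and the packing of NZCV into a 4-bit value (N in bit 3, V in bit 0) are assumptions made for illustration. The EFLAGS bit positions used are CF = bit 0, PF = bit 2, ZF = bit 6.

```c
#include <stdint.h>

/* x86 EFLAGS bit positions for the three flags written by (u)comiss/(u)comisd. */
#define X86_CF_BIT 0
#define X86_PF_BIT 2
#define X86_ZF_BIT 6

/* Map x86 compare flags to AArch64 NZCV, packed as N:Z:C:V in bits 3..0.
 * The table index is ZF:PF:CF. Only four of the eight combinations can be
 * produced by a floating-point compare; the others are left as 0. */
static uint8_t eflags_to_nzcv(uint32_t eflags)
{
    static const uint8_t table[8] = {
        [0] = 0x2, /* ZF=0 PF=0 CF=0: greater   -> 0010 */
        [1] = 0x8, /* ZF=0 PF=0 CF=1: less      -> 1000 */
        [4] = 0x6, /* ZF=1 PF=0 CF=0: equal     -> 0110 */
        [7] = 0x3, /* ZF=1 PF=1 CF=1: unordered -> 0011 */
    };
    unsigned idx = (((eflags >> X86_ZF_BIT) & 1u) << 2)
                 | (((eflags >> X86_PF_BIT) & 1u) << 1)
                 |  ((eflags >> X86_CF_BIT) & 1u);
    return table[idx];
}
```

Like a conditional-move sequence, the lookup involves no branches; which of the two is faster in practice depends on how the flags reach the emulator.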

2. Implementation

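The flag mapping is implemented with conditional moves (cmov) rather than branches, which avoids branch mispredictions on the emulator's hot path. The C implementation, written as GCC-style extended inline assembly, is as follows: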

#define asm_fcmp32(nzcv, op1, op2)                                                  \
{                                                                                   \
    int less = 0x8, greater = 0x2;                                                  \
    int equal = 0x6, unordered = 0x3, eq_0 = 0;                                     \
    asm volatile (                                                                  \
        "movd    %[op1], %%xmm0;"                                                   \
        "movd    %[op2], %%xmm1;"                                                   \
        /* Quiet scalar single-precision compare (FCMP is quiet); sets ZF/PF/CF */  \
        "ucomiss %%xmm1, %%xmm0;"                                                   \
        "mov     %[eq_0], %%eax;"      /* start from 0 (mov leaves flags alone) */  \
        "cmovb   %[less], %%eax;"      /* CF=1: less or unordered */                \
        "cmovz   %[eq_0], %%eax;"      /* ZF=1: equal or unordered, back to 0 */    \
        "mov     %%eax,   %%ecx;"      /* save: less, or 0 otherwise */             \
        "cmovz   %[equal], %%eax;"     /* ZF=1: equal */                            \
        "cmovb   %%ecx, %%eax;"        /* CF=1: restore the saved value */          \
        "cmova   %[greater], %%eax;"   /* CF=0 and ZF=0: greater */                 \
        "cmovp   %[unordered], %%eax;" /* PF=1: unordered overrides the rest */     \
        "mov     %%eax, %[nzcv];"                                                   \
        : [nzcv]"=g"(nzcv)                                                          \
        : [op1]"m"(op1), [op2]"m"(op2), [eq_0]"r"(eq_0), [greater]"r"(greater),     \
          [unordered]"r"(unordered), [equal]"r"(equal), [less]"r"(less)             \
        : "cc", "xmm0", "xmm1", "eax", "ecx"                                        \
    );                                                                              \
}

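op1 and op2 are the two source operands (Sn and Sm); FCMP has no destination register. Because they are passed to the assembly as memory operands, both must be lvalues holding the 32-bit single-precision values. The result is stored in nzcv as a 4-bit value, with N in bit 3, Z in bit 2, C in bit 1, and V in bit 0.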
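When testing a macro like this, it can help to have a portable reference to compare against. The sketch below computes the same 4-bit NZCV value in plain C99; ref_fcmp32 is a name invented for this example, not something from the original code. It relies on isunordered from <math.h>, which is a quiet comparison, and assumes the compiler is not run with options such as -ffast-math that discard NaN semantics.

```c
#include <math.h>
#include <stdint.h>

/* Reference result of FCMP Sn, Sm as a 4-bit NZCV value (N = bit 3, V = bit 0).
 * Hypothetical helper intended only as a test oracle for the inline assembly. */
static uint8_t ref_fcmp32(float a, float b)
{
    if (isunordered(a, b)) /* at least one operand is a NaN */
        return 0x3;        /* 0011: unordered */
    if (a == b)            /* note that +0.0 and -0.0 compare equal */
        return 0x6;        /* 0110: equal */
    if (a < b)
        return 0x8;        /* 1000: less */
    return 0x2;            /* 0010: greater */
}
```

A test harness could feed the same pair of values to asm_fcmp32 and ref_fcmp32 and check that the two results agree, paying particular attention to NaNs, infinities, and signed zeros.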

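PS: On the difference between ordered and unordered compares, taking the x86 comisd and ucomisd instructions as an example. Both instructions set ZF, PF, and CF in exactly the same way: if either operand is a NaN (quiet or signaling), the result is "unordered" and all three flags are set to 1. Two NaNs are never reported as equal, which reflects the rule that a NaN is not equal to any value, itself included. ZF = 1 on its own therefore does not mean "equal"; PF has to be checked as well, which is why the macro applies cmovp last.

The difference lies only in exception signaling. comisd signals an invalid-operation exception when either operand is a QNaN or an SNaN, whereas ucomisd signals it only when an operand is an SNaN. ucomisd and ucomiss therefore match the quiet behavior of FCMP, while comisd and comiss match FCMPE.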