The ARM64 platform supports ARM-NEON using the same intrinsics as the ARM (32-bit) platform. The Windows on ARM (32-bit) platform assumes support for ARMv7, ARM-NEON, and VFPv3. The Windows on ARM (64-bit) platform assumes support for ARMv8, ARM-NEON, and VFPv4.
The ARMv8 instruction set implies support for several useful intrinsics for DirectXMath data types:
- vector divide:
- vector rounding:
- half-precision conversion:
- fused-multiply and accumulate:
In ARM (32-bit), vector division had to be implemented as multiply-by-reciprocal with 2 or 3 iterations of Newton-Raphson refinement, which is less precise than a true divide. For the ARM64 platform, I was able to replace all uses of divide in non-Est functions with a ‘true divide’ in XMVectorDivide, XMVectorReciprocal, and the implementations of a number of other functions.
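To make the precision difference concrete, here is a minimal sketch of the two strategies. It is not the DirectXMath source: `Float4`, `ReciprocalEstimate`, `DivideViaReciprocal`, and `TrueDivide` are hypothetical names, and the scalar fallback (including its linear starting estimate) is an assumption added so the sketch also builds on non-ARM64 targets. On ARM64 it uses the `vrecpeq_f32`/`vrecpsq_f32` and `vdivq_f32` intrinsics.

```cpp
#include <cmath>

#if defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#define SKETCH_HAS_NEON64 1
#endif

// Hypothetical 4-lane container used only for this sketch.
struct Float4 { float v[4]; };

// Scalar stand-in for a hardware reciprocal estimate (assumption):
// write |b| = m * 2^e with m in [0.5, 1) and approximate 1/m linearly.
inline float ReciprocalEstimate(float b)
{
    int e = 0;
    const float m = std::frexp(std::fabs(b), &e);
    const float est = 48.0f / 17.0f - (32.0f / 17.0f) * m;
    return std::copysign(std::ldexp(est, -e), b);
}

// Multiply-by-reciprocal with Newton-Raphson refinement steps.
inline Float4 DivideViaReciprocal(const Float4& a, const Float4& b, int iterations = 2)
{
    Float4 r;
#if defined(SKETCH_HAS_NEON64)
    const float32x4_t va = vld1q_f32(a.v);
    const float32x4_t vb = vld1q_f32(b.v);
    float32x4_t x = vrecpeq_f32(vb);           // coarse estimate of 1/b
    for (int i = 0; i < iterations; ++i)
        x = vmulq_f32(x, vrecpsq_f32(vb, x));  // x *= (2 - b*x)
    vst1q_f32(r.v, vmulq_f32(va, x));
#else
    for (int lane = 0; lane < 4; ++lane)
    {
        float x = ReciprocalEstimate(b.v[lane]);
        for (int i = 0; i < iterations; ++i)
            x = x * (2.0f - b.v[lane] * x);    // same refinement step
        r.v[lane] = a.v[lane] * x;
    }
#endif
    return r;
}

// A true divide: one instruction on ARM64, plain division elsewhere.
inline Float4 TrueDivide(const Float4& a, const Float4& b)
{
    Float4 r;
#if defined(SKETCH_HAS_NEON64)
    vst1q_f32(r.v, vdivq_f32(vld1q_f32(a.v), vld1q_f32(b.v)));
#else
    for (int lane = 0; lane < 4; ++lane)
        r.v[lane] = a.v[lane] / b.v[lane];
#endif
    return r;
}
```

Each refinement step roughly squares the relative error of the estimate, so the reciprocal path gets close to the exact quotient but is not guaranteed to be correctly rounded, whereas the divide is.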
For ARM (32-bit), I used a number of tricks to perform the rounding operations. With the ARM64 platform, I can use the new intrinsics to implement XMVectorRound, XMVectorTruncate, XMVectorFloor, and XMVectorCeiling.
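As a rough illustration of how the four rounding operations line up with ARMv8 intrinsics, the sketch below uses `vrndnq_f32`, `vrndq_f32`, `vrndmq_f32`, and `vrndpq_f32`. The `Float4` type and the function names are hypothetical, and the `<cmath>` fallback is an assumption added so the sketch builds on other targets; it is not the library's implementation.

```cpp
#include <cmath>

#if defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#define SKETCH_HAS_NEON64 1
#endif

// Hypothetical 4-lane container used only for this sketch.
struct Float4 { float v[4]; };

// Round to nearest, ties to even.
inline Float4 RoundNearest(const Float4& a)
{
    Float4 r;
#if defined(SKETCH_HAS_NEON64)
    vst1q_f32(r.v, vrndnq_f32(vld1q_f32(a.v)));
#else
    // Assumes the default floating-point rounding mode (to nearest even).
    for (int i = 0; i < 4; ++i) r.v[i] = std::nearbyint(a.v[i]);
#endif
    return r;
}

// Round toward zero.
inline Float4 Truncate(const Float4& a)
{
    Float4 r;
#if defined(SKETCH_HAS_NEON64)
    vst1q_f32(r.v, vrndq_f32(vld1q_f32(a.v)));
#else
    for (int i = 0; i < 4; ++i) r.v[i] = std::trunc(a.v[i]);
#endif
    return r;
}

// Round toward negative infinity.
inline Float4 Floor(const Float4& a)
{
    Float4 r;
#if defined(SKETCH_HAS_NEON64)
    vst1q_f32(r.v, vrndmq_f32(vld1q_f32(a.v)));
#else
    for (int i = 0; i < 4; ++i) r.v[i] = std::floor(a.v[i]);
#endif
    return r;
}

// Round toward positive infinity.
inline Float4 Ceiling(const Float4& a)
{
    Float4 r;
#if defined(SKETCH_HAS_NEON64)
    vst1q_f32(r.v, vrndpq_f32(vld1q_f32(a.v)));
#else
    for (int i = 0; i < 4; ++i) r.v[i] = std::ceil(a.v[i]);
#endif
    return r;
}
```

With the input `{2.5, -2.5, 1.7, -1.2}`, the nearest-even variant gives `{2, -2, 2, -1}`, which differs from the round-half-away-from-zero behavior of `std::round`.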
The half-precision conversion intrinsics are used when building for the ARM64 platform to implement XMConvertHalfToFloat, XMConvertFloatToHalf, XMConvertHalfToFloatStream, and XMConvertFloatToHalfStream.
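For a sense of what the half-to-float direction looks like, here is a small sketch built around the `vcvt_f32_f16` intrinsic. `HalfToFloat` is a hypothetical helper name, not the library function, and the scalar decode in the fallback branch is an assumption included only so the sketch builds on non-ARM64 targets.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

#if defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#define SKETCH_HAS_NEON64 1
#endif

// Converts one IEEE 754 binary16 value (passed as raw bits) to float.
inline float HalfToFloat(std::uint16_t h)
{
#if defined(SKETCH_HAS_NEON64)
    // Broadcast the bits, reinterpret as half floats, widen, take lane 0.
    const uint16x4_t vHalf = vdup_n_u16(h);
    const float32x4_t vFloat = vcvt_f32_f16(vreinterpret_f16_u16(vHalf));
    return vgetq_lane_f32(vFloat, 0);
#else
    const int exponent = (h >> 10) & 0x1F;   // 5-bit biased exponent
    const int mantissa = h & 0x3FF;          // 10-bit fraction
    float magnitude;
    if (exponent == 0)
    {
        // Zero or subnormal: fraction * 2^-24
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    }
    else if (exponent == 31)
    {
        // Infinity or NaN
        magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                             : std::numeric_limits<float>::infinity();
    }
    else
    {
        // Normal: (1024 + fraction) * 2^(exponent - 25)
        magnitude = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
    }
    return (h & 0x8000) ? -magnitude : magnitude;
#endif
}
```

The opposite direction pairs with the `vcvt_f16_f32` intrinsic; a scalar fallback for it needs explicit rounding logic, so it is left out of this sketch.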
- DirectXMath 3.07 was the first version to include basic ARM64 support using the same ARM-NEON implementation as used for ARM (32-bit).
- DirectXMath 3.10 uses ARMv8 intrinsics when building for the ``_M_ARM64`` architecture to optimize specific functions, including the new XMVectorSum horizontal add function.
- DirectXMath 3.12 uses ARM64 fused-multiply and accumulate to implement XMVectorMultiplyAdd and XMVectorNegativeMultiplySubtract on the ARM64 platform.
- DirectXMath 3.13 through 3.16 cleaned up the ARM-NEON implementation for better compiler portability with clang/LLVM and GNUC.
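The fused-multiply and accumulate change noted for 3.12 can be sketched as follows. This is an illustration rather than the library source: `Float4`, `MultiplyAdd`, and `NegativeMultiplySubtract` are hypothetical names chosen to mirror the semantics `V1 * V2 + V3` and `V3 - V1 * V2`, and the `std::fma` fallback is an assumption for non-ARM64 builds. On ARM64 it uses `vfmaq_f32` and `vfmsq_f32`, which take the accumulator as their first argument.

```cpp
#include <cmath>

#if defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#define SKETCH_HAS_NEON64 1
#endif

// Hypothetical 4-lane container used only for this sketch.
struct Float4 { float v[4]; };

// Computes a * b + c per lane with a single rounding step.
inline Float4 MultiplyAdd(const Float4& a, const Float4& b, const Float4& c)
{
    Float4 r;
#if defined(SKETCH_HAS_NEON64)
    // vfmaq_f32(acc, x, y) returns acc + x * y
    vst1q_f32(r.v, vfmaq_f32(vld1q_f32(c.v), vld1q_f32(a.v), vld1q_f32(b.v)));
#else
    for (int i = 0; i < 4; ++i) r.v[i] = std::fma(a.v[i], b.v[i], c.v[i]);
#endif
    return r;
}

// Computes c - a * b per lane with a single rounding step.
inline Float4 NegativeMultiplySubtract(const Float4& a, const Float4& b, const Float4& c)
{
    Float4 r;
#if defined(SKETCH_HAS_NEON64)
    // vfmsq_f32(acc, x, y) returns acc - x * y
    vst1q_f32(r.v, vfmsq_f32(vld1q_f32(c.v), vld1q_f32(a.v), vld1q_f32(b.v)));
#else
    for (int i = 0; i < 4; ++i) r.v[i] = std::fma(-a.v[i], b.v[i], c.v[i]);
#endif
    return r;
}
```

Because the product is not rounded before the add, a fused result can differ in the last bit from a separate multiply followed by an add.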