DirectXMath - ARM64

Originally posted to Chuck Walbourn's Blog on MSDN, Dec 05, 2018

Visual Studio 2017 (15.9 update) now supports the ARM64 architecture for Universal Windows Platform (UWP) apps.

The ARM64 platform supports ARM-NEON using the same intrinsics as the ARM (32-bit) platform. The Windows on ARM (32-bit) platform assumes support for ARMv7, ARM-NEON, and VFPv3. The Windows on ARM (64-bit) platform assumes support for ARMv8, ARM-NEON, and VFPv4.

ARMv8

The ARMv8 instruction set implies support for several useful intrinsics for DirectXMath data types:

vector divide: vdivq_f32
vector rounding: vrndq_f32, vrndnq_f32, vrndmq_f32, vrndpq_f32
half-precision conversion: vcvt_f32_f16, vcvt_f16_f32
fused-multiply and accumulate: vfmaq_f32, vfmsq_f32

On ARM (32-bit), vector division had to be implemented using multiply-by-reciprocal with 2 or 3 iterations of Newton-Raphson refinement, which is less precise than a true divide. For the ARM64 platform, I was able to replace all uses of divide in non-Est functions with a ‘true divide’ in XMVectorDivide, XMVectorReciprocal, and in the implementation of a number of other functions.
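
As a rough illustration of the two approaches, the sketch below divides four floats lane by lane. The helper name ``Divide4`` and the ``SKETCH_`` macros are hypothetical and not part of DirectXMath. The two-iteration refinement is only one plausible form of the 32-bit path, and the scalar branch exists purely so the snippet builds on non-ARM hosts.

```cpp
#include <cassert>

// Pick the NEON header the way a portable library might (assumed layout).
#if defined(_M_ARM64) || defined(__aarch64__)
#define SKETCH_ARM64_NEON
#if defined(_MSC_VER) && !defined(__clang__)
#include <arm64_neon.h>
#else
#include <arm_neon.h>
#endif
#elif defined(_M_ARM) || (defined(__arm__) && defined(__ARM_NEON))
#define SKETCH_ARM32_NEON
#include <arm_neon.h>
#endif

// Hypothetical helper: out[i] = a[i] / b[i] for i in 0..3.
inline void Divide4(const float* a, const float* b, float* out)
{
    assert(a != nullptr && b != nullptr && out != nullptr);
#if defined(SKETCH_ARM64_NEON)
    // ARMv8: one true vector divide.
    vst1q_f32(out, vdivq_f32(vld1q_f32(a), vld1q_f32(b)));
#elif defined(SKETCH_ARM32_NEON)
    // ARMv7: reciprocal estimate plus two Newton-Raphson steps,
    // r' = r * (2 - b * r), then multiply by the numerator.
    const float32x4_t vb = vld1q_f32(b);
    float32x4_t r = vrecpeq_f32(vb);
    r = vmulq_f32(vrecpsq_f32(vb, r), r);
    r = vmulq_f32(vrecpsq_f32(vb, r), r);
    vst1q_f32(out, vmulq_f32(vld1q_f32(a), r));
#else
    // Scalar fallback for hosts without NEON.
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] / b[i];
#endif
}
```

The refined-reciprocal branch can differ from the true divide in the last bits of the result, which is the precision gap the ARM64 path avoids.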

For ARM (32-bit) I used a number of tricks to perform the rounding operations. With the ARM64 platform, I can use the new intrinsics to implement XMVectorRound, XMVectorTruncate, XMVectorFloor, and XMVectorCeiling.
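
A minimal sketch of how the four rounding intrinsics could line up with the four rounding functions is shown here. The ``Round4`` helper and ``RoundMode`` enum are hypothetical names, and the pairing of each mode with an intrinsic is my assumption based on the intrinsic semantics rather than a copy of the library source. The scalar branch is only there so the snippet builds on non-ARM hosts.

```cpp
#include <cassert>
#include <cmath>

#if defined(_M_ARM64) || defined(__aarch64__)
#define SKETCH_ARM64_NEON
#if defined(_MSC_VER) && !defined(__clang__)
#include <arm64_neon.h>
#else
#include <arm_neon.h>
#endif
#endif

// Hypothetical rounding modes, named after their direction.
enum class RoundMode { Nearest, TowardZero, Down, Up };

// Hypothetical helper: rounds four floats to integral values.
inline void Round4(const float* in, float* out, RoundMode mode)
{
    assert(in != nullptr && out != nullptr);
#if defined(SKETCH_ARM64_NEON)
    float32x4_t v = vld1q_f32(in);
    switch (mode) {
    case RoundMode::Nearest:    v = vrndnq_f32(v); break; // ties to even
    case RoundMode::TowardZero: v = vrndq_f32(v);  break; // truncate
    case RoundMode::Down:       v = vrndmq_f32(v); break; // floor
    case RoundMode::Up:         v = vrndpq_f32(v); break; // ceiling
    }
    vst1q_f32(out, v);
#else
    // Scalar fallback; nearbyint uses the current rounding mode,
    // which is round-to-nearest-even by default.
    for (int i = 0; i < 4; ++i) {
        switch (mode) {
        case RoundMode::Nearest:    out[i] = std::nearbyint(in[i]); break;
        case RoundMode::TowardZero: out[i] = std::trunc(in[i]);     break;
        case RoundMode::Down:       out[i] = std::floor(in[i]);     break;
        case RoundMode::Up:         out[i] = std::ceil(in[i]);      break;
        }
    }
#endif
}
```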

The half-precision conversion intrinsics are used when building for the ARM64 platform to implement XMConvertHalfToFloat, XMConvertFloatToHalf, XMConvertHalfToFloatStream, and XMConvertFloatToHalfStream.
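
To make the half-to-float direction concrete, here is a small sketch that converts four 16-bit half values stored as ``uint16_t``. The helper names ``HalfToFloat4`` and ``HalfToFloatScalar`` are hypothetical, the storage type is an assumption, and the bit-level fallback is only a reference path for hosts without the ARMv8 intrinsic, not the library's own code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

#if defined(_M_ARM64) || defined(__aarch64__)
#define SKETCH_ARM64_NEON
#if defined(_MSC_VER) && !defined(__clang__)
#include <arm64_neon.h>
#else
#include <arm_neon.h>
#endif
#endif

// Reference decode of one IEEE 754 binary16 value (1 sign, 5 exponent, 10 mantissa bits).
inline float HalfToFloatScalar(std::uint16_t h)
{
    const std::uint32_t sign = (h >> 15) & 0x1u;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;

    float value;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24.
        value = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        // Infinity or NaN.
        value = mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 25).
        value = std::ldexp(static_cast<float>(mantissa | 0x400u),
                           static_cast<int>(exponent) - 25);
    }
    return sign ? -value : value;
}

// Hypothetical helper: converts four halves to four floats.
inline void HalfToFloat4(const std::uint16_t* in, float* out)
{
    assert(in != nullptr && out != nullptr);
#if defined(SKETCH_ARM64_NEON)
    // Load as raw 16-bit lanes, reinterpret as half, widen in one instruction.
    const float16x4_t vh = vreinterpret_f16_u16(vld1_u16(in));
    vst1q_f32(out, vcvt_f32_f16(vh));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = HalfToFloatScalar(in[i]);
#endif
}
```

The reverse direction would follow the same shape with ``vcvt_f16_f32``, but a scalar fallback for it needs careful rounding and is left out of this sketch.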

History

DirectXMath 3.07 was the first version to include basic ARM64 support using the same ARM-NEON implementation as used for ARM (32-bit).
DirectXMath 3.10 uses ARMv8 intrinsics when building for the ``_M_ARM64`` architecture to optimize specific functions, including the new XMVectorSum horizontal add function.
DirectXMath 3.12 uses ARM64 fused-multiply and accumulate to implement XMVectorMultiplyAdd and XMVectorNegativeMultiplySubtract on the ARM64 platform.
DirectXMath 3.13 - 3.16 cleaned up the ARM-NEON implementation for better compiler portability for clang/LLVM and GNUC.
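
The fused multiply-accumulate usage mentioned for DirectXMath 3.12 can be sketched roughly as follows. The helper names ``MultiplyAdd4`` and ``NegativeMultiplySubtract4`` are hypothetical, and the semantics assumed here are a * b + c and c - a * b respectively. The ``std::fma`` branch is only a stand-in so the snippet builds on non-ARM hosts.

```cpp
#include <cassert>
#include <cmath>

#if defined(_M_ARM64) || defined(__aarch64__)
#define SKETCH_ARM64_NEON
#if defined(_MSC_VER) && !defined(__clang__)
#include <arm64_neon.h>
#else
#include <arm_neon.h>
#endif
#endif

// Hypothetical helper: out[i] = a[i] * b[i] + c[i], with a single rounding.
inline void MultiplyAdd4(const float* a, const float* b, const float* c, float* out)
{
    assert(a != nullptr && b != nullptr && c != nullptr && out != nullptr);
#if defined(SKETCH_ARM64_NEON)
    // vfmaq_f32(acc, x, y) computes acc + x * y, so the addend goes first.
    vst1q_f32(out, vfmaq_f32(vld1q_f32(c), vld1q_f32(a), vld1q_f32(b)));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = std::fma(a[i], b[i], c[i]);
#endif
}

// Hypothetical helper: out[i] = c[i] - a[i] * b[i], with a single rounding.
inline void NegativeMultiplySubtract4(const float* a, const float* b, const float* c, float* out)
{
    assert(a != nullptr && b != nullptr && c != nullptr && out != nullptr);
#if defined(SKETCH_ARM64_NEON)
    // vfmsq_f32(acc, x, y) computes acc - x * y.
    vst1q_f32(out, vfmsq_f32(vld1q_f32(c), vld1q_f32(a), vld1q_f32(b)));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = std::fma(-a[i], b[i], c[i]);
#endif
}
```

Note the argument order: the NEON intrinsics take the accumulator as the first operand, which is easy to get backwards when porting from an a * b + c style signature.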

Games for Windows and the DirectX SDK blog
