DirectXMath 3.11

Originally posted to Chuck Walbourn's Blog on MSDN, Jun 28, 2017

DirectXMath version 3.11 is now available on NuGet and GitHub. It will be included in the Windows 10 Fall Creators Update SDK, the Windows 10 April 2018 Update SDK, and the Xbox One XDK (June 2017 or later).

AVX optimization of XMMatrixMultiply and XMMatrixMultiplyTranspose
AVX2 optimization for XMVectorSplatX
FMA3 optimization of XMVectorMultiplyAdd and XMVectorNegativeMultiplySubtract
Conformance fixes to support compilation with Clang 3.7

The main additions in this version are the control defines _XM_AVX2_INTRINSICS_ and _XM_FMA3_INTRINSICS_. Both are enabled when building with /arch:AVX2, along with the already existing _XM_F16C_INTRINSICS_. For details on the few AVX2 optimizations applicable to DirectXMath see this blog post, and for FMA3 see this post. This means that when you build using /arch:AVX2, the XMVerifyCPUSupport function will explicitly check for AVX2, FMA3, and F16C processor support.
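
As a rough sketch of how control defines like these can cascade from a single compiler switch, the snippet below derives a set of hypothetical DEMO_* macros from the compiler's predefined __AVX2__ macro (set by Visual C++ under /arch:AVX2, and by GCC and Clang under -mavx2). The DEMO_* names and the EnabledPaths helper are assumptions made for illustration; they are not the contents of the DirectXMath headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the library's control defines.
#if defined(__AVX2__) && !defined(DEMO_AVX2_INTRINSICS)
#define DEMO_AVX2_INTRINSICS
#endif

// In this sketch, enabling AVX2 also turns on FMA3 and F16C.
#if defined(DEMO_AVX2_INTRINSICS) && !defined(DEMO_FMA3_INTRINSICS)
#define DEMO_FMA3_INTRINSICS
#endif
#if defined(DEMO_AVX2_INTRINSICS) && !defined(DEMO_F16C_INTRINSICS)
#define DEMO_F16C_INTRINSICS
#endif

// Reports which code paths this translation unit was compiled with.
std::string EnabledPaths()
{
    std::string s;
#ifdef DEMO_AVX2_INTRINSICS
    s += "AVX2 ";
#endif
#ifdef DEMO_FMA3_INTRINSICS
    s += "FMA3 ";
#endif
#ifdef DEMO_F16C_INTRINSICS
    s += "F16C ";
#endif
    return s.empty() ? "baseline" : s;
}
```

Since code paths selected at compile time use those instructions unconditionally, an application built this way should call XMVerifyCPUSupport at startup and exit gracefully if it returns false.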

Down the Conformance Rabbit Hole

For this release I did a fair amount of syntax cleanup for better C++11/C++14 conformance by getting the headers to build without warnings when using the Clang 3.7 compiler with Microsoft codegen. I can’t speak to the quality or correctness of the generated code, but I wanted to make sure the source code was as conforming as I could make it. VS 2017’s /permissive- standard enforcement switch helps, but there’s no substitute for trying to build with a different compiler toolset.

A basic issue is that intrinsics themselves are implementation-dependent, and in particular the way that the type __m128 is defined is not consistent between Visual C++ and Clang. Visual C++ treats it as a union, while Clang considers it a special opaque type. Therefore, I had to modify all places where the members of the __m128 union were being manipulated. This is pretty easy because I already have portable unions that work: XMVECTORF32, XMVECTORU32, and XMVECTORI32.

A knock-on effect of the way the __m128 type is defined is that you can overload free functions based on it with Visual C++, but you cannot do so with Clang. In other words, this is legal C++ with Visual C++ but not when using Clang:

__m128 operator+(__m128 V);

Rather than break existing users of these overloads on Visual C++, I guard their definition with a new control define, _XM_NO_XMVECTOR_OVERLOADS_, which I automatically enable when building with Clang. This also meant updating all the places in the other DirectXMath implementation headers where I relied on the overloads to use explicit functions instead. Note that there’s no equivalent issue with XMMATRIX overloads because XMMATRIX is itself a struct.
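
The guard pattern can be sketched roughly as follows. DemoVector, DemoVectorAdd, DemoSum3, and DEMO_NO_VECTOR_OVERLOADS are hypothetical names invented for this illustration. Because DemoVector is an ordinary struct, overloading on it is legal on every compiler, so the snippet only shows the shape of the guard and not the underlying __m128 problem.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector type; not the real XMVECTOR.
struct DemoVector { float x, y, z, w; };

// Explicit function: always available, on every compiler.
inline DemoVector DemoVectorAdd(DemoVector a, DemoVector b)
{
    return { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
}

// Operator overloads are optional sugar, compiled out when the guard is set.
#ifndef DEMO_NO_VECTOR_OVERLOADS
inline DemoVector operator+(DemoVector a, DemoVector b)
{
    return DemoVectorAdd(a, b);
}
#endif

// Library-internal code sticks to the explicit function, so it builds
// whether or not the overloads are present.
inline DemoVector DemoSum3(DemoVector a, DemoVector b, DemoVector c)
{
    return DemoVectorAdd(DemoVectorAdd(a, b), c);
}
```

For client code that needs to build with both compilers, the equivalent move is to call functions such as XMVectorAdd rather than writing V1 + V2.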

The bulk of the remaining conformance changes were fully bracing the initialization of XMVECTORF32 and related types:

static const XMVECTORF32 c_value = { 1.f, 2.f, 3.f, 4.f };

had to be changed to:

static const XMVECTORF32 c_value = { { { 1.f, 2.f, 3.f, 4.f } } };
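
Why three levels of braces? A minimal sketch, assuming the type is shaped as a struct wrapping an anonymous union whose first member is a float array (DemoVector and DemoVectorF32 are hypothetical stand-ins, not the real definitions):

```cpp
#include <cassert>

// Hypothetical stand-in for the SIMD vector type.
struct DemoVector { float lanes[4]; };

// Mimics the assumed shape: a struct wrapping an anonymous union.
struct alignas(16) DemoVectorF32
{
    union
    {
        float f[4];
        DemoVector v;
    };
};

// Outer braces: the struct. Middle braces: the anonymous union (its first
// member, f, is the one initialized). Inner braces: the float array.
static const DemoVectorF32 c_value = { { { 1.f, 2.f, 3.f, 4.f } } };
```

The single-brace form still compiles through brace elision, but Clang flags it with a missing-braces warning, which matters when the goal is a warning-free build.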

It also turns out that the Clang compiler doesn’t like the trick used by the UNREFERENCED_PARAMETER macro. Instead of having:

XMVECTOR Permute(FXMVECTOR v1, FXMVECTOR v2) { (v2); return XM_PERMUTE_PS(v1, Shuffle); }

The name of the unreferenced formal parameter has to be removed to make both compilers happy:

XMVECTOR Permute(FXMVECTOR v1, FXMVECTOR) { return XM_PERMUTE_PS(v1, Shuffle); }

I also added guards to #pragma prefast statements, which Clang complains about (although it ignores other common #pragma statements such as #pragma warning).
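
One way such a guard can look is sketched below. It relies on _PREFAST_, which Visual C++ defines when building with /analyze, so other compilers never see the pragma. The warning number, the reason string, and the DemoAdd function are illustrative assumptions rather than lines taken from the headers.

```cpp
#include <cassert>

// Only the static analyzer sees these pragmas; _PREFAST_ is defined when
// building under Visual C++ /analyze.
#ifdef _PREFAST_
#pragma prefast(push)
#pragma prefast(disable : 25000, "illustrative warning number and reason")
#endif

// Placeholder for the code the suppression applies to.
inline int DemoAdd(int a, int b) { return a + b; }

#ifdef _PREFAST_
#pragma prefast(pop)
#endif
```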
