DirectXMath 3.11
Originally posted to Chuck Walbourn's Blog on MSDN.
DirectXMath version 3.11 is now available on NuGet and GitHub. It will be included in the Windows 10 Fall Creators Update SDK, the Windows 10 April 2018 Update SDK, and the Xbox One XDK (June 2017 or later).
- AVX optimization of XMMatrixMultiply and XMMatrixMultiplyTranspose
- AVX2 optimization for XMVectorSplatX
- FMA3 optimization of XMVectorMultiplyAdd and XMVectorNegativeMultiplySubtract
- Conformance fixes to support compilation with Clang 3.7
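As a scalar reference for what the two FMA3-optimized functions compute in each lane, here is a minimal sketch. DemoVec4, DemoMultiplyAdd, and DemoNegativeMultiplySubtract are hypothetical stand-ins, not DirectXMath types or functions, and std::fma is used only to mimic the single rounding of a hardware fused multiply-add.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a four-lane float vector; not a DirectXMath type.
using DemoVec4 = std::array<float, 4>;

// Per lane: a * b + c, the operation XMVectorMultiplyAdd performs.
inline DemoVec4 DemoMultiplyAdd(const DemoVec4& a, const DemoVec4& b, const DemoVec4& c)
{
    DemoVec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = std::fma(a[i], b[i], c[i]);   // fused: one rounding step
    return r;
}

// Per lane: c - a * b, the operation XMVectorNegativeMultiplySubtract performs.
inline DemoVec4 DemoNegativeMultiplySubtract(const DemoVec4& a, const DemoVec4& b, const DemoVec4& c)
{
    DemoVec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = std::fma(-a[i], b[i], c[i]);  // negate the product, then add c
    return r;
}
```

A fused operation rounds once, while a separate multiply followed by an add rounds twice, so results from an FMA3 build may differ in the last bit from a non-FMA3 build.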
The main additions in this version are the control defines _XM_AVX2_INTRINSICS_ and _XM_FMA3_INTRINSICS_, both of which are enabled when using /arch:AVX2, along with the already existing _XM_F16C_INTRINSICS_. For details on the few AVX2 optimizations applicable to DirectXMath see this blog post, and for FMA3 see this post. This means that when you build using /arch:AVX2, the XMVerifyCPUSupport function will explicitly check for AVX2, FMA3, and F16C processor support.
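The relationship between the compiler switch and the control defines can be sketched roughly as follows. The DEMO_* macros and the DemoRequiredCpuFeatures function are hypothetical names chosen for illustration, and the sketch assumes that the AVX2 switch is what pulls in the FMA3 and F16C paths; it is not a copy of the DirectXMath headers.

```cpp
// Hypothetical DEMO_* names; DirectXMath's own control defines are
// _XM_AVX2_INTRINSICS_, _XM_FMA3_INTRINSICS_, and _XM_F16C_INTRINSICS_.

// __AVX2__ is predefined by Visual C++ under /arch:AVX2 (and by Clang/GCC under -mavx2).
#if defined(__AVX2__) && !defined(DEMO_AVX2_INTRINSICS)
#define DEMO_AVX2_INTRINSICS
#endif

// Assumption: opting into AVX2 also opts into the FMA3 and F16C code paths.
#if defined(DEMO_AVX2_INTRINSICS) && !defined(DEMO_FMA3_INTRINSICS)
#define DEMO_FMA3_INTRINSICS
#endif
#if defined(DEMO_AVX2_INTRINSICS) && !defined(DEMO_F16C_INTRINSICS)
#define DEMO_F16C_INTRINSICS
#endif

#ifdef DEMO_AVX2_INTRINSICS
constexpr bool kDemoUsesAvx2 = true;
#else
constexpr bool kDemoUsesAvx2 = false;
#endif

#ifdef DEMO_FMA3_INTRINSICS
constexpr bool kDemoUsesFma3 = true;
#else
constexpr bool kDemoUsesFma3 = false;
#endif

#ifdef DEMO_F16C_INTRINSICS
constexpr bool kDemoUsesF16c = true;
#else
constexpr bool kDemoUsesF16c = false;
#endif

// The instruction sets this build was compiled to use, which a runtime
// support check would need to confirm before the optimized paths run.
struct DemoCpuRequirements { bool avx2; bool fma3; bool f16c; };

constexpr DemoCpuRequirements DemoRequiredCpuFeatures()
{
    return { kDemoUsesAvx2, kDemoUsesFma3, kDemoUsesF16c };
}
```

In an application, the practical consequence is to call XMVerifyCPUSupport at startup and fail gracefully if it returns false, since a binary built with /arch:AVX2 will otherwise fault on older processors.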
Down the Conformance Rabbit Hole
For this release I did a fair amount of syntax cleanup for better C++11/C++14 conformance by getting the headers to build without warnings when using the Clang 3.7 compiler with Microsoft codegen. I can’t speak to the quality or correctness of the generated code, but I wanted to make sure the source code was as conforming as I could make it. VS 2017’s /permissive- standards enforcement switch helps, but there’s no substitute for trying to build with a different compiler toolset.
A basic issue is that intrinsics themselves are implementation-dependent, and in particular the way that the type __m128 is defined is not consistent between Visual C++ and Clang. Visual C++ treats it as a union, while Clang considers it a special opaque type. Therefore, I had to modify all places where the members of the __m128 union were being manipulated. This is pretty easy because I already have portable unions that work: XMVECTORF32, XMVECTORU32, and XMVECTORI32.
A knock-on effect of the way the __m128 type is defined is that you can overload free functions based on it with Visual C++, but you cannot do so with Clang. In other words, this is legal C++ with Visual C++ but not when using Clang:
__m128 operator+(__m128 V);
Rather than break existing users of these overloads on Visual C++, I guard their definition with a new control define, _XM_NO_XMVECTOR_OVERLOADS_, which I automatically enable when building with Clang. This also meant updating all the places in the other DirectXMath implementation headers where I relied on the overloads to use explicit functions instead. Note that there’s no equivalent issue with XMMATRIX overloads because XMMATRIX is itself a struct.
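The guard pattern described here can be sketched as follows. DemoVector, DemoVectorAdd, and DEMO_NO_VECTOR_OVERLOADS are hypothetical names, and DemoVector is a plain struct so that the sketch builds with any compiler; with a real __m128-based type the overload would be the part Clang rejects.

```cpp
// Hypothetical stand-in for a SIMD vector type; a plain struct so this builds anywhere.
struct DemoVector
{
    float f[4];
};

// Explicit function: always available, and what internal code should call.
inline DemoVector DemoVectorAdd(const DemoVector& a, const DemoVector& b)
{
    return DemoVector{ { a.f[0] + b.f[0], a.f[1] + b.f[1],
                         a.f[2] + b.f[2], a.f[3] + b.f[3] } };
}

// Turn the overloads off automatically when building with Clang.
#if defined(__clang__) && !defined(DEMO_NO_VECTOR_OVERLOADS)
#define DEMO_NO_VECTOR_OVERLOADS
#endif

#ifndef DEMO_NO_VECTOR_OVERLOADS
// Convenience overload, kept for existing callers on compilers that allow it.
inline DemoVector operator+(const DemoVector& a, const DemoVector& b)
{
    return DemoVectorAdd(a, b);
}
#endif
```

Code that needs to build with both toolsets would call the explicit function (in DirectXMath terms, XMVectorAdd rather than operator+), which is the same change the post describes making inside the implementation headers.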
The bulk of the remaining conformance changes involved fully bracing the initialization of XMVECTORF32 and related types:
static const XMVECTORF32 c_value = { 1.f, 2.f, 3.f, 4.f };
had to be changed to:
static const XMVECTORF32 c_value = { { { 1.f, 2.f, 3.f, 4.f } } };
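The three levels of braces map onto three levels of nesting in the type. The sketch below uses a hypothetical DemoVectorF32 that assumes a layout similar to XMVECTORF32 (a struct wrapping an anonymous union whose first member is a float array); the raw byte array is only a placeholder for the SIMD member.

```cpp
// Hypothetical stand-in with an assumed XMVECTORF32-like shape:
// struct -> anonymous union -> float array.
struct DemoVectorF32
{
    union
    {
        float         f[4];     // first union member, the one brace-init targets
        unsigned char raw[16];  // placeholder for the SIMD register member
    };
};

// Outer braces: the struct. Middle braces: the anonymous union.
// Inner braces: the float array.
static const DemoVectorF32 c_demoValue = { { { 1.f, 2.f, 3.f, 4.f } } };

// Returns one lane of the constant so the initialization can be checked.
inline float DemoLane(unsigned index)
{
    return c_demoValue.f[index];
}
```

The single-brace form relies on brace elision, which is legal C++ but is what Clang flags with its missing-braces warning, so fully bracing is the way to get a warning-free build.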
It also turns out that the Clang compiler doesn’t like the trick used by the UNREFERENCED_PARAMETER macro. Instead of having:
XMVECTOR Permute(FXMVECTOR v1, FXMVECTOR v2) { (v2); return XM_PERMUTE_PS(v1, Shuffle); }
The name of the unreferenced formal parameter has to be removed to make both compilers happy:
XMVECTOR Permute(FXMVECTOR v1, FXMVECTOR) { return XM_PERMUTE_PS(v1, Shuffle); }
I also added guards to #pragma prefast statements, which Clang complains about (although it ignores other common #pragma statements such as #pragma warning).
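One plausible shape for such a guard is shown below. It assumes the _PREFAST_ macro, which Visual C++ defines when static analysis (/analyze) is running; the warning number and the DemoSum function are purely illustrative and are not taken from the DirectXMath source.

```cpp
// Only emit prefast pragmas when the Visual C++ analyzer is running.
#ifdef _PREFAST_
#pragma prefast(push)
#pragma prefast(disable : 25000, "illustrative warning number for this sketch")
#endif

// Hypothetical function standing in for the code the pragmas would cover.
inline float DemoSum(const float (&values)[4])
{
    return values[0] + values[1] + values[2] + values[3];
}

#ifdef _PREFAST_
#pragma prefast(pop)
#endif
```

Because the pragmas sit inside the #ifdef, a compiler that does not define _PREFAST_ never sees them and has nothing to warn about.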
Related: Known Issues: DirectXMath 3.03, DirectXMath 3.06, DirectXMath 3.07, DirectXMath 3.08, DirectXMath 3.09, DirectXMath 3.10, DirectXMath 3.13