DirectXMath - AVX2

directxmath, xbox

Originally posted to Chuck Walbourn's Blog on MSDN, Jun 03, 2015

Advanced Vector Extensions 2 (AVX2) rounds out the instruction set introduced with AVX. The majority of the new instructions operate on 256-bit registers, so they aren’t directly applicable to DirectXMath, which is focused on 128-bit float4 vectors. AVX2 would be very useful for building a fully equivalent double4 version of the DirectXMath functionality, but that is beyond the scope of this article and of the library generally.

The immediate value of targeting AVX2 is that you can make use of the AVX, FMA3, and F16C optimizations already covered on the blog as all of those are included.

There is one more simple substitution for DirectXMath when using AVX2, which is also the special case XMVectorSwizzle<0,0,0,0>:

inline XMVECTOR XM_CALLCONV XMVectorSplatX( FXMVECTOR V )
{
    return _mm_broadcastss_ps( V );
}
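
For comparison, here is a minimal sketch of the same splat written against raw __m128 values rather than XMVECTOR, so it builds without the DirectXMath headers. The name SplatX is hypothetical, and the non-AVX2 branch using _mm_shuffle_ps is an assumption about how the operation would otherwise be written, not a copy of the library's code. The __AVX2__ macro is defined by /arch:AVX2 in Visual C++ and by -mavx2 in GCC and clang.

```cpp
#include <immintrin.h>

// Hypothetical helper: replicate lane 0 (x) of v into all four lanes.
inline __m128 SplatX(__m128 v)
{
#if defined(__AVX2__)
    // AVX2 build: a single register-to-register broadcast.
    return _mm_broadcastss_ps(v);
#else
    // SSE fallback: shuffle that selects element 0 for every lane.
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
#endif
}
```

Both branches produce the same result, so callers do not need to know which one was compiled in.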

Processor Support

AVX2 is supported by Intel “Haswell”, AMD Excavator, and later processors.

In addition to the hardware supporting the new instruction set, the OS must support saving the new YMM register file or the AVX instructions will remain invalid. This support is included in Windows 7 Service Pack 1, Windows Server 2008 R2 Service Pack 1, Windows 8, and Windows Server 2012. This support is indicated by the OSXSAVE bit in CPUID being set along with the AVX2 support bit.

#if defined(__clang__) || defined(__GNUC__)
#include <cpuid.h>
#else
#include <intrin.h>
#endif

// Leaf 0: EAX returns the highest standard CPUID leaf supported
int CPUInfo[4] = {-1};
#if defined(__clang__) || defined(__GNUC__)
__cpuid(0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
__cpuid(CPUInfo, 0);
#endif
bool bAVX2 = false;
if ( CPUInfo[0] >= 7 )
{
    // Leaf 1: ECX bit 27 is OSXSAVE
#if defined(__clang__) || defined(__GNUC__)
    __cpuid(1, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuid(CPUInfo, 1);
#endif
    bool bOSXSAVE = (CPUInfo[2] & 0x8000000) != 0;
    // Leaf 7, sub-leaf 0: EBX bit 5 is AVX2
#if defined(__clang__) || defined(__GNUC__)
    __cpuid_count(7, 0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuidex(CPUInfo, 7, 0);
#endif
    bAVX2 = bOSXSAVE && (CPUInfo[1] & 0x20) != 0;
}
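
The OSXSAVE bit indicates that the XGETBV instruction is available. A stricter variant of the check can go one step further and read XCR0 to confirm that the OS has actually enabled saving of both the XMM state (bit 1) and the YMM state (bit 2). The following is a sketch of that idea, not part of the original sample; the names Cpuid, ReadXCR0, and IsAVX2Usable are hypothetical.

```cpp
#include <cstdint>
#if defined(__clang__) || defined(__GNUC__)
#include <cpuid.h>
#else
#include <intrin.h>
#include <immintrin.h>
#endif

// Hypothetical wrapper so both compiler families share one call shape.
static void Cpuid(int leaf, int subleaf, int regs[4])
{
#if defined(__clang__) || defined(__GNUC__)
    __cpuid_count(leaf, subleaf, regs[0], regs[1], regs[2], regs[3]);
#else
    __cpuidex(regs, leaf, subleaf);
#endif
}

// Reads extended control register 0. Only call this when OSXSAVE is set,
// otherwise XGETBV raises an invalid-opcode exception.
static std::uint64_t ReadXCR0()
{
#if defined(__clang__) || defined(__GNUC__)
    std::uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return _xgetbv(0);
#endif
}

bool IsAVX2Usable()
{
    int regs[4] = {0};
    Cpuid(0, 0, regs);
    if (regs[0] < 7) return false;                  // leaf 7 not available

    Cpuid(1, 0, regs);
    if ((regs[2] & 0x8000000) == 0) return false;   // OSXSAVE (ECX bit 27)

    if ((ReadXCR0() & 0x6) != 0x6) return false;    // XMM and YMM state enabled

    Cpuid(7, 0, regs);
    return (regs[1] & 0x20) != 0;                   // AVX2 (EBX bit 5)
}
```

A caller would typically evaluate IsAVX2Usable() once at startup and cache the result before selecting an AVX2 code path.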

Compiler Support

Support for AVX2 intrinsics was added to Visual Studio 2012. The /arch:AVX2 switch is supported by VS 2013 Update 2, although IDE support wasn’t added until VS 2015.

Note that with this switch, the compiler will optimize code to make use of FMA3 automatically where applicable.
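
Where the use of FMA3 should be explicit rather than left to the optimizer, the same build setting can select the intrinsic at compile time. This is a minimal sketch under stated assumptions: the helper name MultiplyAdd is hypothetical, and the preprocessor test assumes that GCC and clang define __FMA__ when FMA is enabled, while Visual C++ defines __AVX2__ under /arch:AVX2.

```cpp
#include <immintrin.h>

// Hypothetical helper: returns a * b + c for each of the four lanes.
inline __m128 MultiplyAdd(__m128 a, __m128 b, __m128 c)
{
#if defined(__FMA__) || (defined(_MSC_VER) && defined(__AVX2__))
    // Fused multiply-add: one instruction, one rounding step.
    return _mm_fmadd_ps(a, b, c);
#else
    // Separate multiply and add: two instructions, two rounding steps.
    return _mm_add_ps(_mm_mul_ps(a, b), c);
#endif
}
```

Because the fused form rounds only once, the two branches can differ in the last bit for some inputs, which matters if results are compared exactly across builds.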

Utility Code

The source for this project and the rest of the blog series is now available on GitHub under the MIT license.

Xbox: Xbox One does not support AVX2. Xbox Series X|S does support AVX2.

Windows on ARM64: Windows 11 on ARM64 emulation of x64 now supports AVX instructions and others per this blog post. Initially it only supported up to SSE 4.2.
