DirectXMath - AVX2
directxmath, xbox
Originally posted to Chuck Walbourn's Blog on MSDN.
The Advanced Vector Extensions 2 (AVX2) rounds out the instruction set introduced with AVX. The majority of the new instructions are for 256-bit registers, so they aren’t directly applicable to DirectXMath. AVX2 is very useful if trying to make a fully equivalent double4 version of all the DirectXMath functionality, which is otherwise focused on float4 vectors, but that is beyond the scope of this article or the library generally.
The immediate value of targeting AVX2 is that you can make use of the AVX, FMA3, and F16C optimizations already covered on the blog as all of those are included.
There is one more simple substitution for DirectXMath when using AVX2, which is also a special case for XMVectorSwizzle<0,0,0,0>:
inline XMVECTOR XM_CALLCONV XMVectorSplatX( FXMVECTOR V )
{
    return _mm_broadcastss_ps( V );
}
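To make the substitution concrete, here is a minimal standalone sketch that works on a raw __m128 rather than on the DirectXMath types. SplatX is a hypothetical helper name, not a library function, and the use of the __AVX2__ predefined macro to select the path is an assumption about the build setup. Without AVX2 it falls back to the equivalent SSE shuffle.

```cpp
#include <immintrin.h>

// Hypothetical helper (not a DirectXMath function): replicate lane 0 (x)
// of v into all four lanes.
inline __m128 SplatX(__m128 v)
{
#if defined(__AVX2__)
    // AVX2 build: a single broadcast instruction.
    return _mm_broadcastss_ps(v);
#else
    // SSE fallback: shuffle lane 0 into every position.
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
#endif
}
```

Both paths return the same value: for an input of (1, 2, 3, 4), the result is (1, 1, 1, 1).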
Processor Support
AVX2 is supported by Intel “Haswell”, AMD Excavator, and later processors.
In addition to the hardware supporting the new instruction set, the OS must support saving the new YMM register file or the AVX instructions will remain invalid. This support is included in Windows 7 Service Pack 1, Windows Server 2008 R2 Service Pack 1, Windows 8, and Windows Server 2012. This support is indicated by the OSXSAVE bit in CPUID being set along with the AVX2 support bit.
#if defined(__clang__) || defined(__GNUC__)
#include <cpuid.h>
#else
#include <intrin.h>
#endif

    int CPUInfo[4] = {-1};

    // Leaf 0: EAX returns the highest supported standard CPUID leaf.
#if defined(__clang__) || defined(__GNUC__)
    __cpuid(0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuid(CPUInfo, 0);
#endif

    bool bAVX2 = false;
    if ( CPUInfo[0] >= 7 )
    {
        // Leaf 1: ECX bit 27 (0x8000000) is OSXSAVE.
#if defined(__clang__) || defined(__GNUC__)
        __cpuid(1, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
        __cpuid(CPUInfo, 1);
#endif
        bool bOSXSAVE = (CPUInfo[2] & 0x8000000) != 0;

        // Leaf 7, subleaf 0: EBX bit 5 (0x20) is AVX2.
#if defined(__clang__) || defined(__GNUC__)
        __cpuid_count(7, 0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
        __cpuidex(CPUInfo, 7, 0);
#endif
        bAVX2 = bOSXSAVE && (CPUInfo[1] & 0x20) != 0;
    }
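A stricter variant of this check, which goes beyond what the article's code does, also reads the XCR0 register with XGETBV. This confirms that the OS has actually enabled saving of the XMM and YMM state (XCR0 bits 1 and 2). The sketch below wraps that into one function. IsAVX2Usable, Cpuid, and ReadXCR0 are hypothetical names chosen for illustration, and the GCC/Clang path assumes an assembler that recognizes the xgetbv mnemonic.

```cpp
#include <cstdint>
#if defined(__clang__) || defined(__GNUC__)
#include <cpuid.h>
#else
#include <intrin.h>
#include <immintrin.h>
#endif

// Hypothetical wrapper over the compiler-specific CPUID intrinsics.
static void Cpuid(int leaf, int subleaf, int regs[4])
{
#if defined(__clang__) || defined(__GNUC__)
    __cpuid_count(leaf, subleaf, regs[0], regs[1], regs[2], regs[3]);
#else
    __cpuidex(regs, leaf, subleaf);
#endif
}

// Hypothetical helper: read XCR0. Only call this after CPUID reports
// OSXSAVE, otherwise XGETBV raises an invalid-opcode exception.
static uint64_t ReadXCR0()
{
#if defined(__clang__) || defined(__GNUC__)
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<uint64_t>(hi) << 32) | lo;
#else
    return _xgetbv(0);
#endif
}

// Assumption: "usable" means the CPU reports AVX2 and the OS saves YMM state.
bool IsAVX2Usable()
{
    int regs[4] = {0};
    Cpuid(0, 0, regs);
    if (regs[0] < 7)
        return false;                        // leaf 7 not available

    Cpuid(1, 0, regs);
    if ((regs[2] & 0x08000000) == 0)
        return false;                        // OSXSAVE (ECX bit 27) not set

    if ((ReadXCR0() & 0x6) != 0x6)
        return false;                        // OS did not enable XMM + YMM state

    Cpuid(7, 0, regs);
    return (regs[1] & 0x20) != 0;            // AVX2 (EBX bit 5)
}
```

A typical use would be to call IsAVX2Usable() once at startup and cache the result before selecting an AVX2 code path.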
Compiler Support
Support for AVX2 intrinsics was added to Visual Studio 2012. The /arch:AVX2 switch is supported by VS 2012 Update 2, although IDE support wasn’t added until VS 2013.
Note that with this switch, the compiler will optimize code to make use of FMA3 automatically where applicable.
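When building with /arch:AVX2 (or -mavx2 on GCC and Clang), the compiler defines the __AVX2__ macro, which source code can test at compile time. The following is a small sketch of such a guard, alongside the kind of multiply-add expression a compiler may turn into a single FMA3 instruction. BuildTargetsAVX2 and MulAdd are hypothetical names used only for illustration.

```cpp
// Hypothetical helper: reports whether this translation unit was compiled
// with AVX2 code generation enabled (/arch:AVX2 on MSVC, -mavx2 on GCC/Clang).
constexpr bool BuildTargetsAVX2()
{
#if defined(__AVX2__)
    return true;
#else
    return false;
#endif
}

// Hypothetical example: with AVX2 code generation enabled, the compiler is
// free to emit one fused multiply-add here instead of a multiply and an add.
inline float MulAdd(float a, float b, float c)
{
    return a * b + c;
}
```

Because a fused multiply-add rounds once rather than twice, results can differ in the last bit between builds with and without the switch.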
Utility Code
The source for this project and the rest of the blog series is now available on GitHub under the MIT license.
Xbox: Xbox One does not support AVX2. Xbox Series X|S does support AVX2.
See also: SSE, SSE2, and ARM-NEON; SSE3 and SSSE3; SSE4.1 and SSE4.2; AVX; F16C and FMA; ARM64