Games for Windows and the DirectX SDK blog

Technical tips, tricks, and news about game development for Microsoft platforms including desktop, Xbox, and UWP


DirectXMath AVX and AVX2 - A Coda

directxmath, xbox

Chuck Walbourn -

Over the years, I’ve done a number of optimizations for DirectXMath using advanced instruction sets available on x86/x64 CPUs. For Xbox developers, the choice to use these is easy since you can count on them along with AVX. For PC developers, modern x64 development means you can rely on SSE, SSE2–and at this point, SSE3–without sacrificing any target market. I’ve recently done some work for another project unrelated to DirectXMath per se, but I wanted to add some notes about using other advanced instruction sets.

The original blog posts that summed up the advanced instructions applicable to DirectXMath are:


ABM (Advanced Bit Manipulation) was an instruction set originally introduced by AMD. It includes LZCNT (leading-zero count) and POPCNT (population count). After some back and forth with Intel over this and other instruction set extensions at the time, both instructions are now supported by AMD and Intel, but you need to check more than one bit in CPUID: the ABM bit indicates that LZCNT is supported, and the POPCNT bit indicates that the population count instruction is supported.

For more on the convoluted history here, see Wikipedia.

Generally, if the PC you are using supports AVX2, it will support both of these instructions. The Visual C++ implementation of the C++20 Standard Library header <bit> will therefore use LZCNT and POPCNT when building with /arch:AVX2 if you use std::popcount and/or std::countl_zero.


BMI (Bit Manipulation Instructions) adds some interesting new instructions like ANDN (logical AND NOT) and BEXTR (bit field extract) that can be useful for compiler code-generation when using /arch:AVX2. The TZCNT instruction is also used to implement C++20 std::countr_zero.


The BMI2 instruction set adds a few more instructions, like variants of the basic x86 MUL, ROR, SAR, SHR, and SHL instructions that don’t affect EFLAGS. Again, these are mostly useful for compilers building with /arch:AVX2.

It’s generally advised to avoid using PEXT and PDEP on AMD CPUs prior to Zen 3, where they are implemented in microcode and are much slower than on other processors.
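For readers who have not used them, here is a portable software reference for what PEXT and PDEP compute. This is only an illustration of the semantics with hypothetical function names (PextReference, PdepReference). Real code would normally use the _pext_u32 and _pdep_u32 intrinsics from <immintrin.h> on hardware known to support BMI2.

```cpp
#include <cstdint>

// Software reference for PEXT: gathers the bits of 'value' selected by 'mask'
// and packs them into the low bits of the result.
constexpr std::uint32_t PextReference(std::uint32_t value, std::uint32_t mask) noexcept
{
    std::uint32_t result = 0;
    for (std::uint32_t out = 1; mask != 0; mask &= mask - 1, out <<= 1)
    {
        const std::uint32_t lowest = mask & (~mask + 1); // isolate lowest set bit of mask
        if (value & lowest)
            result |= out;
    }
    return result;
}

// Software reference for PDEP: scatters the low bits of 'value' to the
// positions of the set bits in 'mask'.
constexpr std::uint32_t PdepReference(std::uint32_t value, std::uint32_t mask) noexcept
{
    std::uint32_t result = 0;
    for (std::uint32_t in = 1; mask != 0; mask &= mask - 1, in <<= 1)
    {
        const std::uint32_t lowest = mask & (~mask + 1); // isolate lowest set bit of mask
        if (value & in)
            result |= lowest;
    }
    return result;
}
```

With a mask of 0x0F0F, PextReference(0xABCD, 0x0F0F) packs the two selected nibbles into 0xBD, and PdepReference(0xBD, 0x0F0F) spreads them back out to 0x0B0D.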


The AES (Advanced Encryption Standard) instructions provide hardware acceleration support for the AES cipher. Any PC that supports AVX or AVX2 is likely to support AES.

For more details, see Wikipedia.


The MOVBE instruction (officially called “Move Data After Swapping Bytes” but the mnemonic means “Move Big-Endian”) is an instruction for swapping Big-Endian/Little-Endian 16-bit, 32-bit, and 64-bit data. Much like SSSE3’s PSHUFB which can be used to implement BE swapping for SIMD data vectors, it’s pretty specialized, but useful when you need it.
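The kind of operation MOVBE performs can be sketched in portable C++. The names ByteSwap32 and LoadBigEndian32 are hypothetical, and whether an optimizing compiler turns either of them into a single MOVBE or BSWAP instruction is an assumption that depends on the compiler and target options.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the byte order of a 32-bit value.
constexpr std::uint32_t ByteSwap32(std::uint32_t v) noexcept
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

// Hypothetical helper: reads a big-endian 32-bit value from memory.
// Assembling the value byte by byte makes this independent of host endianness.
inline std::uint32_t LoadBigEndian32(const void* p) noexcept
{
    const auto* b = static_cast<const unsigned char*>(p);
    return (std::uint32_t(b[0]) << 24)
         | (std::uint32_t(b[1]) << 16)
         | (std::uint32_t(b[2]) << 8)
         |  std::uint32_t(b[3]);
}
```

Loading the bytes 12 34 56 78 with LoadBigEndian32 gives 0x12345678 on any host.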

CPUID Example

This code example shows how to check each of the CPUID bits mentioned in this blog post.

#include <cstdint>

#if defined(__clang__) || defined(__GNUC__)
#include <cpuid.h>
#else
#include <intrin.h>
#endif

// The statements below belong inside a function body.

// Leaf 0: EAX returns the highest supported standard leaf.
int CPUInfo[4] = { -1 };
#if defined(__clang__) || defined(__GNUC__)
__cpuid(0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
__cpuid(CPUInfo, 0);
#endif

bool bABM = false;
bool bAES = false;
bool bBMI1 = false;
bool bBMI2 = false;
bool bMOVBE = false;
bool bPOPCNT = false;

const bool checkextfeature = (CPUInfo[0] >= 7);

// Leaf 1: AES, POPCNT, and MOVBE are reported in ECX.
if (CPUInfo[0] > 0)
{
#if defined(__clang__) || defined(__GNUC__)
    __cpuid(1, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuid(CPUInfo, 1);
#endif

    bAES    = (CPUInfo[2] & 0x2000000) != 0;
    bPOPCNT = (CPUInfo[2] & 0x800000) != 0;
    bMOVBE  = (CPUInfo[2] & 0x400000) != 0;
}

// Leaf 7, sub-leaf 0: BMI1 and BMI2 are reported in EBX.
if (checkextfeature)
{
#if defined(__clang__) || defined(__GNUC__)
    __cpuid_count(7, 0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuidex(CPUInfo, 7, 0);
#endif

    bBMI2 = (CPUInfo[1] & 0x100) != 0;
    bBMI1 = (CPUInfo[1] & 0x8) != 0;
}

// Extended leaf 0x80000000: EAX returns the highest supported extended leaf.
#if defined(__clang__) || defined(__GNUC__)
__cpuid(0x80000000, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
__cpuid(CPUInfo, 0x80000000);
#endif

// Extended leaf 0x80000001: ABM (LZCNT) is reported in ECX.
if (uint32_t(CPUInfo[0]) > 0x80000000)
{
#if defined(__clang__) || defined(__GNUC__)
    __cpuid(0x80000001, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuid(CPUInfo, 0x80000001);
#endif

    bABM = (CPUInfo[2] & 0x20) != 0;
}
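The hexadecimal masks in the example are easier to cross-check against CPU vendor documentation when written as bit positions. The sketch below uses a hypothetical TestBit helper and hypothetical constant names; the bit numbers are simply the positions of the masks used above.

```cpp
#include <cstdint>

// Hypothetical helper: tests a single bit of a CPUID output register.
constexpr bool TestBit(int reg, unsigned bit) noexcept
{
    return ((static_cast<std::uint32_t>(reg) >> bit) & 1u) != 0;
}

// Leaf 1, ECX
constexpr unsigned c_bitMOVBE  = 22; // mask 0x400000
constexpr unsigned c_bitPOPCNT = 23; // mask 0x800000
constexpr unsigned c_bitAES    = 25; // mask 0x2000000

// Leaf 7 (sub-leaf 0), EBX
constexpr unsigned c_bitBMI1 = 3;    // mask 0x8
constexpr unsigned c_bitBMI2 = 8;    // mask 0x100

// Leaf 0x80000001, ECX
constexpr unsigned c_bitABM = 5;     // mask 0x20

static_assert((1u << c_bitAES) == 0x2000000u);
static_assert((1u << c_bitPOPCNT) == 0x800000u);
static_assert((1u << c_bitMOVBE) == 0x400000u);
static_assert((1u << c_bitBMI2) == 0x100u);
static_assert((1u << c_bitBMI1) == 0x8u);
static_assert((1u << c_bitABM) == 0x20u);
```

With this helper, a check such as (CPUInfo[2] & 0x2000000) != 0 could be written as TestBit(CPUInfo[2], c_bitAES).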

Xbox: Xbox One supports ABM, AES, BMI1, and MOVBE. Xbox Series X|S supports those as well plus BMI2.

Related: See Visual C++ Team Blog