The AVX-512 instruction set has had a bizarre history. Originally introduced with Intel's Xeon Phi processors based on the “Knights Landing” design, it later found its way into the company's server processors starting with Skylake-SP in 2017. The first consumer processors to include AVX-512 were the laptop forms of Ice Lake, which slotted into the 10th-generation Core series, yet the desktop 10th-gen chips lacked the feature entirely.
A lot of people have a lot of strong feelings about AVX-512. Probably too strong, if we're honest. Linus Torvalds famously wished the instruction set a “painful death,” and comments around the web (including on our own AVX-512 stories) seem to indicate that many consumers see the feature as pointless excess. Torvalds himself lamented the die area and research time that AVX-512 units occupy, saying he would rather see faster general-purpose performance than a focus on 512-bit-wide vectors with limited application to general-use computing.
Exactly what AVX-512 *is*, however, is a more difficult question to answer, because there are no fewer than eighteen different categories of “AVX-512” instructions. There are so many new instructions that we can't list them all here, and to make matters worse, none of the CPUs with “AVX-512 support” actually supports every type of AVX-512 instruction. Indeed, while AMD's upcoming Zen 4 CPUs will support AVX-512 in some capacity, we don't know yet exactly which instructions they will support beyond the VNNI block.
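If you're curious which of those categories your own CPU reports, here is a minimal sketch for Linux systems. It assumes the kernel exposes feature names on the `flags` line of `/proc/cpuinfo`, where the AVX-512 subsets appear as entries such as `avx512f` or `avx512_vnni`. The function name `avx512_subsets` and the `SAMPLE` text are our own illustrative inventions, not part of any tool.

```python
from pathlib import Path


def avx512_subsets(cpuinfo_text: str) -> list[str]:
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" line lists instruction set extensions.
        if sep and key.strip() == "flags":
            found.update(flag for flag in value.split() if flag.startswith("avx512"))
    return sorted(found)


# Hypothetical, shortened sample; a real flags line is much longer.
SAMPLE = (
    "model name\t: Example CPU\n"
    "flags\t\t: fpu sse2 avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512_vnni\n"
)

if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # Fall back to the sample text on systems without /proc/cpuinfo.
    text = cpuinfo.read_text() if cpuinfo.exists() else SAMPLE
    print(avx512_subsets(text) or "no AVX-512 flags reported")
```

On a machine without AVX-512 (or a non-x86 machine), the script simply reports that no flags were found.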
Still, even with all those instructions, you may wonder what they're good for. Well, quite a bit, as it turns out, regardless of whether you're working with 512-bit data types. One specific case that we've talked about in the past is video game emulation. The “Dynarmic” core that translates ARM CPU instructions into x86 code is used in several popular emulators, including Nintendo Switch emulator Yuzu and PlayStation Vita emulator Vita3K. It makes extensive use of AVX-512 when it's available for various significant speed-ups.
The emulator RPCS3 goes even further with AVX-512, and processors using it can see performance improvements of 30% or more in difficult-to-run PlayStation 3 games like God of War III and Red Dead Redemption. The reason for this is a collection of factors that programmer WhatCookie detailed in a post over at his blog. It's all pretty low-level programming stuff, and if you're not a coder, it might go over your head entirely. Don't worry; we'll briefly summarize for you.
Essentially, the benefits of AVX-512 in RPCS3 come down to five things: the larger register file, new instructions, new forms of old instructions, mask register support, and a greater ability to accommodate the PlayStation 3's idiosyncrasies. The last point is specific to RPCS3 as an application, but the first four are qualities of CPUs equipped with AVX-512 support that can benefit almost all types of applications.
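To give a feel for what mask register support means, here is a small pure-Python model of per-lane masking. It is only a sketch of the semantics, not RPCS3's code and not real SIMD: the function names loosely mirror Intel's `_mm512_mask_add_epi32`, `_mm512_maskz_add_epi32`, and `_mm512_cmpgt_epi32_mask` intrinsics, and we assume small non-negative lane values so signedness doesn't matter.

```python
LANES = 16              # a 512-bit register holds sixteen 32-bit lanes
LANE_MASK = 0xFFFFFFFF  # results wrap around at 32 bits per lane


def mask_add_epi32(src, k, a, b):
    """Merge-masking: lane i becomes a[i] + b[i] if bit i of k is set, else src[i]."""
    return [
        (a[i] + b[i]) & LANE_MASK if (k >> i) & 1 else src[i]
        for i in range(LANES)
    ]


def maskz_add_epi32(k, a, b):
    """Zero-masking: lanes whose mask bit is clear become 0."""
    return mask_add_epi32([0] * LANES, k, a, b)


def cmpgt_epi32_mask(a, b):
    """Build a 16-bit mask with bit i set wherever a[i] > b[i]."""
    mask = 0
    for i in range(LANES):
        if a[i] > b[i]:
            mask |= 1 << i
    return mask


if __name__ == "__main__":
    values = list(range(LANES))
    # Select only the lanes holding values greater than 10.
    k = cmpgt_epi32_mask(values, [10] * LANES)
    # Add 100 to the selected lanes; all other lanes keep their old value.
    print(hex(k), mask_add_epi32(values, k, values, [100] * LANES))
```

The point of the model is that a comparison produces a compact bitmask, and a later operation can use that mask to update only some lanes. Without mask registers, the same effect typically takes an extra blend step that ties up a full vector register.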
Given that AMD's Zen 4 CPUs will come with some measure of AVX-512 support, and given AMD's big drive for market share in the last couple of years, we expect that Intel will have to figure out some way to support the ISA in its hybrid architecture processors—even if that means poking Microsoft and the Linux folks for further and further scheduler modifications.
Obviously, to make use of any instruction set extension (such as AVX, SSE, or the older MMX), the program has to be compiled with support for it. Developers of consumer software like PC games are loath to move to new technologies that may lock out a portion of their customer base, but given the performance gains unlocked by these instruction set extensions, it's only a matter of time before games start to make greater use of wide SIMD.
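One common way around the lock-out problem is runtime dispatch: the program ships several versions of a hot routine and picks the best one the CPU supports at startup. The sketch below shows only the selection logic. The three `sum_*` functions are hypothetical placeholders standing in for separately compiled builds, and the flag names assume Linux-style feature strings.

```python
from typing import Callable, Iterable, List, Tuple


def sum_generic(values: Iterable[int]) -> int:
    """Baseline path that runs on any CPU."""
    return sum(values)


def sum_avx2(values: Iterable[int]) -> int:
    """Placeholder for a build compiled with AVX2 enabled."""
    return sum(values)


def sum_avx512(values: Iterable[int]) -> int:
    """Placeholder for a build compiled with AVX-512 enabled."""
    return sum(values)


# Ordered from most to least demanding, so the first match is the widest path.
CANDIDATES: List[Tuple[str, Callable]] = [
    ("avx512f", sum_avx512),
    ("avx2", sum_avx2),
]


def pick_sum(cpu_flags) -> Callable:
    """Return the most capable implementation the given CPU flags allow."""
    for flag, kernel in CANDIDATES:
        if flag in cpu_flags:
            return kernel
    return sum_generic


if __name__ == "__main__":
    kernel = pick_sum({"sse2", "avx2"})
    print(kernel.__name__, kernel([1, 2, 3]))
```

With this pattern, a CPU lacking AVX-512 quietly falls back to a slower path instead of crashing on an unsupported instruction.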