Intel has finally defended its AVX-512 instruction set against critics who have gone so far as to wish it to die “a painful death.”
Intel Chief Architect Raja Koduri said the community loves it because it yields huge performance boosts, and Intel has an obligation to offer it across its portfolio.
“AVX-512 is a great feature. Our HPC community, AI community, love it,” Koduri said, responding to a question from PCWorld about the AVX-512 kerfuffle during Intel’s Architecture Day on August 11. “Our customers on the data center side really, really, really love it.”
Koduri said Intel has been able to help customers achieve a 285X increase in performance in “our good old CPU socket” just by taking advantage of the extension.
One person who doesn’t love AVX-512 is Linus Torvalds, the creator of Linux. In a forum post at Real World Technologies (where he often chimes in), Torvalds spoke plainly about the instruction set that’s included in Intel’s Xeon CPUs and its 10th-gen “Ice Lake” laptop CPUs such as the Core i7-1065G7.
“I hope AVX-512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on,” Torvalds wrote. “I hope Intel gets back to basics: Gets their process working again, and concentrate more on regular code that isn’t HPC or some other pointless special case.”
Torvalds said what galled him about AVX-512 on desktops was the performance hit. Intel’s original Skylake-X series, for example, would be forced to lower the CPU clock speed during anything that touched AVX-512.
“I want my power limits to be reached with regular integer code, not with some AVX-512 power virus that takes away top frequency (because people ended up using it for memcpy!) and takes away cores (because those useless garbage units take up space),” Torvalds wrote.
Torvalds wasn’t the only person to kick AVX-512 in the shins either. Former Intel engineer Francois Piednoel also said the special instructions simply didn’t belong in laptops, as the power and die-area trade-offs just aren’t worth it.
Intel’s AVX-512 enables a broad ecosystem
Koduri said he understood the hate, but Intel has obligations to the community, too.
“Our CPU cores are our crown jewels,” Koduri said. “So when we do a CPU core and add an instruction to it, historically the power of x86 and our instruction set extensions have been that we made them available everywhere. Because of that, when we have an IP like Sunny Cove and it appears both in a server like an Ice Lake server and on a client, like an Ice Lake client, you get the commonality of the instruction set.”
Koduri acknowledged some validity to Torvalds’ heat, too. “Linus’ criticism from one angle that ‘hey, are there client applications that leverage this vector bit yet?’ may be valid,” he said. Koduri explained further that Intel has to maintain a hardware-software contract all the way from servers to laptops, because that’s been the magic of the ecosystem.
“(That’s) the great thing about the x86 ecosystem, you could write a piece of software for your notebook and it could also run on the cloud,” Koduri said. “That’s been the power of the x86 ecosystem.”
Koduri’s remarks echo similar comments by D. Wei Li, Intel’s general manager of machine learning performance, who said using CPUs for AI and deep learning just made sense.
“Why CPU? The CPU is everywhere and general-purpose,” Li said. “When you have a data center you have many Xeons. When you have a laptop, you have a CPU. If you can make CPU work for AI, then everyone can benefit from it.”
And no, hate on AVX-512 and special instructions all you want, but Intel isn’t going to change direction. Koduri said it will continue to lean on AVX-512 as well as other instructions.
“We understand Linus’ concerns, we understand some of the issues with first generation AVX-512 that had impact on the frequencies etc, etc,” he said, “and we are making it much much better with every generation.”
In fact, performance-minded software blogger Travis Downs has said his testing of a Core i5-1035G4 indicates AVX-512 doesn’t appear to impose much of a penalty at all on laptops. Downs’ testing found the clock speed dropped only 100MHz when using one active core under AVX-512.
“At least, it means we need to adjust our mental model of the frequency related cost of AVX-512 instructions,” Downs concluded. “Rather than ‘generally causing significant downclocking,’ on this Ice Lake chip we can say that AVX-512 causes insignificant or zero licence-based downclocking and I expect this to be true on other Ice Lake client chips as well.” There’s more nuance to his findings, and his full post is worth a read.