AMD dropped a bombshell, accusing its rival Intel and BAPCo, the benchmarking consortium, of cheating.
In a video posted Thursday on YouTube, John Hampton, director of AMD’s client computing products, went so far as to refer obliquely to the recent Volkswagen scandal, where the German car manufacturer was accused of cheating on diesel emissions tests. “The recent debacle over a major auto maker provides the perfect illustration as to why the information provided by even the most established organizations can be misleading,” Hampton said.
Intel declined to comment on AMD’s accusation, but when asked, BAPCo officials said the consortium's customers trust it.
"The reason thousands of customers trust BAPCo benchmarks is because we are an industry consortium that focuses on the performance of applications that people use on a daily basis," a spokesman for the consortium said.
Why this matters: Performance still matters to consumers and organizations. Third-party benchmarks hold heavy sway over purchasing decisions even if few understand what they measure. AMD asks reasonable questions, but the answers remain murky—even from AMD.
AMD makes its case
Hampton laid out AMD's case in the video. “So truth or myth: is Sysmark a reliable, objective, unbiased benchmark to use in evaluating system performance?” Hampton asked. Hampton and AMD engineering manager Tony Salinas then ran Sysmark 2014 on two “similar” laptops. The Core i5 laptop scored about 987, while the AMD FX laptop scored 659.
Salinas then ran the same laptops through Futuremark’s PCMark 8 Work Accelerated workload. The AMD FX laptop was still slower, but only by about 7 percent.
One final test Salinas ran was an unidentified benchmark using Microsoft Office. The Core i5 finished in 61 seconds, while the FX chip finished in 64 seconds.
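To put those three results on the same footing, here is a rough sketch in Python of the arithmetic behind the gaps. It uses only the figures quoted above (the Sysmark score is "about" 987, so treat the output as approximate), and the function names are my own shorthand, not anything from AMD or BAPCo.

```python
def percent_lower(reference_score: float, other_score: float) -> float:
    """How far other_score falls below reference_score, in percent (higher score is better)."""
    return (reference_score - other_score) / reference_score * 100.0


def percent_longer(reference_seconds: float, other_seconds: float) -> float:
    """How much longer other_seconds is than reference_seconds, in percent (lower time is better)."""
    return (other_seconds - reference_seconds) / reference_seconds * 100.0


# Figures as quoted from AMD's video: Core i5 first, AMD FX second.
sysmark_gap = percent_lower(987, 659)
office_gap = percent_longer(61, 64)

print(f"Sysmark 2014: FX scores {sysmark_gap:.1f}% lower than the Core i5")
print(f"Office test:  FX takes {office_gap:.1f}% longer than the Core i5")
```

By that math the FX trails by roughly a third in Sysmark, versus about 7 percent in PCMark 8 Work Accelerated and about 5 percent in the Office run, which is the spread AMD is pointing at.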
“What we concluded is that Sysmark does not use realistic everyday workloads,” Hampton said. He encouraged viewers to read the FTC’s fine print, which dictated what Intel had to disclose on benchmarks.
The FTC ruling in 2010 bound Intel to say: “Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as Sysmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchase, including the performance of that product when combined with other products.”
A longstanding feud
AMD’s problems actually go all the way back to 2000, when the company’s Athlon XP CPU was kicking Pentium 4 butt in Sysmark 2001. When Sysmark 2002 was released, however, the Pentium 4 was suddenly the leader. After that AMD decided to join BAPCo in an attempt to have more influence over what it tested.
The company stayed in BAPCo through 2011 when, in a much-publicized blowup, it quit and walked away, accusing the test of being cooked for Intel’s CPUs. Although they didn’t say why, Nvidia and VIA left BAPCo at the same time.
BAPCo has primarily been made up of PC OEMs, along with Intel and other companies. At one point, even Apple joined BAPCo, as well as media organizations.
Sysmark uses off-the-shelf applications such as Photoshop, Premiere, Word, and Excel. It tasks the apps with a workload and then measures only the response time to the task.
AMD’s complaint hasn’t always been with the apps, but with the workloads. When it quit in 2011, the company told me at the time that it just didn’t think Sysmark exploited the “future” of computing and didn’t test the GPU.
Unsurprisingly, five years later, AMD’s complaints are the same. In the company’s video, Hampton says: “There is an excessive amount of high CPU tasking being done (in SYSMark). That is, the benchmark is really only evaluating the CPU side of the system.”
Benchmarking vs. benchmarketing
Part of the problem is the politics behind benchmarking: the not-so-fine line where it turns into "benchmarketing," when numbers and tests are cherry-picked to make one product look better than the other. In this case, AMD is likely telling the truth that Sysmark 2014 1.5 focuses mostly on pure CPU performance. But isn’t that what it’s supposed to do? Measure the CPU performance?
From AMD’s perspective, no. The company has long insisted the future is about GPU computing. And, well, no surprise, AMD has also long enjoyed an advantage over Intel’s CPUs in graphics performance.
In fact, one of the tests AMD uses to show it’s behind Intel, but not that far behind, is PCMark 8 Work Accelerated. The test has two options: One uses OpenCL, which taps the GPU, while the other relies on just the CPU.
This raises the question: What would the score on that same laptop be if the GPU weren’t factored into it? Is there a little benchmarketing going on there from AMD?
You’d also have to ask yourself, how many common work or office apps today heavily rely on OpenCL? Few to none, I’d guess.
What we run and why
As someone who has burned too many hours coaxing Sysmark to run on systems, I was glad to leave it behind. I didn’t have any proof it was cooked, but it took forever to install and forever to run. In those days, it would often bomb out, meaning you wasted yet another day.
The methodology seemed very solid, though. For example, rather than “type” a document at 1,000 wpm (which many Office suite tests did and still do), Sysmark found a way to “type” at realistic human speeds while measuring only the response time.
But in 2016, who the hell cares? In 1997 we cared about typing in Word or viewing a PowerPoint, but today any PC with an SSD, enough RAM and a reasonably fast CPU does the job for 90 percent of work tasks. Most of us could not tell the difference between a dual-core Core i3 and an 8-core Core i7 chip (with proper RAM and SSD) for standard Office drone tasks.
That’s why I often use PCMark 8 Conventional, which runs on just the CPU, to illustrate that it really doesn’t matter that much. Here’s the result from a stack of laptops. My real-world use of all of these laptops, from Haswell to Skylake and from Core M to Core i7, confirms that I can’t tell the difference in Google Chrome, Outlook, and Word, whether on a Surface Book or a ZenBook. Atom though, that’s another story.
What you should pay attention to
The takeaway from this latest kerfuffle isn’t that benchmarks don’t matter. It’s that people, and testers, should apply and interpret them correctly. In Office, who cares if you have a Core i3 or FX CPU? In a video encode or a game though? Hell yes it matters.