Nvidia reveals PC-like performance for 'Denver' Tegra K1
Nvidia hopes that the 64-bit “Denver” version of its Tegra K1 processor will offer PC-like performance in a tablet form factor. On Monday, the company released its first benchmarks backing that up.
At the Hot Chips conference in San Jose, Nvidia revealed some of the differences distinguishing its 32-bit, quad-core Tegra K1 chip, which debuted at CES in January, from the 64-bit, dual-core version of the same chip. While Nvidia has shipped its 32-bit Tegra K1, most recently in the Acer Chromebook 13, the “Denver” version of the Tegra K1 has not yet been released. Denver will run slightly faster than the 32-bit K1: up to 2.5GHz, versus 2.3GHz for the latter.
Nvidia’s Denver chip is also one of the first to use the 64-bit ARMv8 architecture, a relative rarity in the space. Although ARM is moving to the second generation of its 64-bit architecture, its licensing partners have been slower to develop and manufacture their own chips based on the design. Applied Micro and AMD have also scheduled presentations at Hot Chips to discuss 64-bit ARM chips, specifically for servers.
According to Darrell Boggs, a chip architect for Nvidia, the “Denver” chip and the 32-bit version of the Tegra K1 share the same 192-core “Kepler” graphics core that helps give the K1 its performance. But the 64-bit Denver includes chip optimizations that can push the number of instructions it can process per clock cycle to 7, versus just 3 for the 32-bit version.
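To put those figures in perspective, the short Python sketch below multiplies the quoted instructions per clock by the quoted top clock speeds. It is back-of-the-envelope arithmetic only: the function name is made up for illustration, and the results are theoretical ceilings that assume every cycle issues the maximum number of instructions, which real software rarely achieves. They are not Nvidia's benchmark results.

```python
def peak_minstr_per_sec(instr_per_clock, clock_mhz, cores=1):
    """Theoretical peak, in millions of instructions per second.

    Hypothetical helper: assumes every core issues its maximum
    number of instructions on every single clock cycle.
    """
    return instr_per_clock * clock_mhz * cores


# Figures quoted in the article: 7 instructions per clock at 2.5GHz
# for Denver, 3 instructions per clock at 2.3GHz for the 32-bit K1.
denver_core = peak_minstr_per_sec(7, 2500)
k1_32_core = peak_minstr_per_sec(3, 2300)

# Whole-chip ceilings: Denver is dual-core, the 32-bit K1 is quad-core.
denver_chip = peak_minstr_per_sec(7, 2500, cores=2)
k1_32_chip = peak_minstr_per_sec(3, 2300, cores=4)

per_core_ratio = denver_core / k1_32_core
per_chip_ratio = denver_chip / k1_32_chip

if __name__ == "__main__":
    print(f"Per-core peak ratio: {per_core_ratio:.2f}x")
    print(f"Per-chip peak ratio: {per_chip_ratio:.2f}x")
```

On paper, a single Denver core's ceiling is roughly two and a half times that of a 32-bit K1 core, while the gap at the whole-chip level narrows because Denver has half as many cores.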
Denver, Boggs said, was designed for content creation and gaming.
One of Denver’s architectural quirks involves dynamic code optimization, a computing technique that hearkens back to the days of Transmeta, a high-profile startup that also used code interpretation and optimization in the early 2000s. But Transmeta failed in part because it simply couldn’t deliver the performance that customers expected, which Boggs ascribed to the “cliff” between natively executed code and what was interpreted. (Intel and AMD, among others, also quickly ramped up the performance of their own mobile chips to compensate.)
Denver solves that problem, Boggs said, by having a much more powerful hardware platform to run non-native code, however inefficiently. It also waits until a core sits idle to begin the code translation process, and it includes new low-power states beyond what the 32-bit K1 itself offers. “The issues you saw with Transmeta devices you won’t see with Tegra,” he said.
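The general idea behind dynamic code optimization can be sketched in a few lines of Python. This is a toy model, not Nvidia's implementation: the class name, the hot-block threshold of three runs, and the `on_idle` hook are all assumptions made for illustration. It shows a block of code running on a slower, unoptimized path until it proves to be frequently used, at which point it is queued for translation; the translation itself happens only when the core reports that it is idle.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: runs before a block counts as "hot"


class DynamicOptimizer:
    """Toy model of run-first, optimize-later execution."""

    def __init__(self, hot_threshold=HOT_THRESHOLD):
        self.hot_threshold = hot_threshold
        self.exec_counts = Counter()   # how often each block ran unoptimized
        self.translation_cache = {}    # block id -> optimized callable
        self.pending = []              # hot blocks waiting for idle time
        self.stats = Counter()         # slow-path vs. fast-path runs

    def run_block(self, block_id, unoptimized, translate):
        optimized = self.translation_cache.get(block_id)
        if optimized is not None:
            self.stats["fast"] += 1
            return optimized()
        # No translation yet: take the slow path and keep counting.
        self.stats["slow"] += 1
        self.exec_counts[block_id] += 1
        if self.exec_counts[block_id] == self.hot_threshold:
            # Queue the block; translation is deferred until idle time.
            self.pending.append((block_id, translate))
        return unoptimized()

    def on_idle(self):
        # Called when the (simulated) core has nothing else to do.
        while self.pending:
            block_id, translate = self.pending.pop()
            self.translation_cache[block_id] = translate()


def demo():
    opt = DynamicOptimizer()
    unoptimized = lambda: sum(range(10))   # stands in for slow execution
    translate = lambda: (lambda: 45)       # returns a faster equivalent
    results = [opt.run_block("loop", unoptimized, translate) for _ in range(5)]
    opt.on_idle()                          # translation happens here
    results += [opt.run_block("loop", unoptimized, translate) for _ in range(5)]
    return results, dict(opt.stats)


if __name__ == "__main__":
    print(demo())
```

In this model, the "cliff" Boggs described corresponds to the gap between the slow and fast paths: the faster the unoptimized path runs, the less a user notices code that has not yet been translated.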
The Denver chip includes a 128KB, 4-way level 1 instruction cache, a 64KB, 4-way level 1 data cache, and a 2MB, 16-way level 2 cache, all of which can service both cores. Denver also sets aside 128MB of main memory as an interpretation cache, which the main operating system won’t be able to see or access.
Nvidia also released its benchmarks, which compare the 64-bit Denver processor to the 32-bit version (normalized to 100%) and then to other, competing processors.
“In effect we are bringing PC-class performance into the ARM ecosystem,” Boggs said.