On Tuesday, Nvidia announced its next GPU architecture, called Pascal, which will dramatically rearchitect the traditional graphics card, replacing the PCI Express bus with a new interconnect and pairing the GPU with a new 3D memory technology.
And for those who need a supercomputer now, Nvidia announced its next-generation Titan card, the GeForce GTX Titan Z, starting at a whopping $2,999. It also announced what it calls the Iray VCA, a $50,000 visual computing appliance that renders images by modeling photons, producing photorealistic images at up to 60 times the speed of today’s Nvidia-powered workstations. Finally, Nvidia announced the $192 Jetson TK1, an embedded development board that takes its Tegra K1 into robotics and computer vision, and disclosed the name of its next-generation Tegra chip, dubbed Erista.
Nvidia will ship the Pascal processor in 2016, Nvidia chief executive Jen-Hsun Huang said during the keynote address of its GPU Technology Conference, which focuses on the application of GPUs in computational problems. The conference originally was predicated on Nvidia’s CUDA programming language, which allowed scientists to program a GPU much like a computer.
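The idea behind CUDA, that scientists can program a GPU much like an ordinary computer, can be sketched in miniature: you write a “kernel” as if it handles a single data element, and the GPU launches it across thousands of elements in parallel. The Python below is an illustrative emulation under our own assumptions, not real CUDA code, and the function names are ours.

```python
# Illustrative emulation of the CUDA programming model -- not real CUDA code.
# In CUDA, a "kernel" is written as if it processes one data element; the GPU
# supplies each thread's index and runs thousands of copies in parallel.

def saxpy_kernel(i, a, x, y, out):
    # In CUDA C, `i` would be derived from blockIdx and threadIdx.
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    # A real GPU would run these n "threads" in parallel; we emulate serially.
    for i in range(n):
        kernel(i, *args)

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0]
```

The same element-at-a-time kernel style carries over whether the data has three elements or three billion, which is what makes the model attractive for scientific computing.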
It will replace Maxwell, the graphics architecture Nvidia announced last month.
It's unclear how much Nvidia's work, as showcased at the GPU Technology Conference, contributes to Nvidia's bottom line. But Nvidia has consistently seen growth in its high-end, high-margin GPUs for desktops and workstations, which has been partially offset by declines in the PC and notebook markets. And problems that Nvidia can solve with its supercomputer GPUs can trickle down into the PC.
And Nvidia faces a thorny problem of its own. The problem, as Huang explained it, is simple to state: GPUs are among the biggest chips ever created, but the demand for more GPU performance is unceasing. Access to the memory used as a scratchpad for GPU computations can be sped up with a wider bus interface, but a wider bus means more pins, and there is a physical limit to how many pins can surround a chip. Simply forcing more data through those pins at ever higher clock speeds pushes the power consumed, and the heat produced, to unmanageable levels.
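The bind Huang described can be made concrete with a little arithmetic: peak memory bandwidth is roughly the bus width (pins) times the per-pin data rate. The 384-bit and 6 Gbit/s figures below are our assumptions, typical of Kepler-era cards, chosen because they reproduce the 288 GB/s Kepler number Huang cited.

```python
# Peak memory bandwidth is roughly bus width (pins) times per-pin data rate.
def mem_bandwidth_gbytes_per_s(bus_width_bits, per_pin_gbits_per_s):
    return bus_width_bits * per_pin_gbits_per_s / 8  # 8 bits per byte

# A 384-bit GDDR5 bus at 6 Gbit/s per pin (assumed figures) yields 288 GB/s.
print(mem_bandwidth_gbytes_per_s(384, 6))  # 288.0

# Doubling bandwidth means either doubling the pins (a wider bus) or doubling
# the per-pin rate (more power and heat) -- the trade-off Huang described.
```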
The answer, Huang said, is twofold: NVLink, a replacement for PCI Express that will offer between five and 12 times its performance; and a 3D memory interface, which will stack memory and other chips on top of one another, connected by wires running through the chips and the substrate to save space. The latter will help push memory bandwidth to 1,000 times where it is now, Huang said.
The module that Nvidia built to house the Pascal chip architecture is one-third the size of a typical PCI Express module, Huang said. A new connector will allow it to be plugged into the motherboard, Nvidia said.
NVLink was codeveloped with IBM and will be incorporated into the OpenPower architecture that IBM is spearheading, Nvidia said. NVLink will be used not only to connect a GPU to the motherboard, but also to connect GPUs to each other, with up to a 5X improvement in GPU-to-GPU scaling, Huang said.
Likewise, scaling is one of the problems that the new 3D architecture will help solve. Nvidia’s existing Kepler architecture already provides 288 Gbytes/s of memory bandwidth, according to Huang, but the demand for bandwidth will inevitably grow over time. By stacking memory and other chips on top of one another, “in a couple of years we’re going to take bandwidth to a whole new level,” Huang said.
The idea is to use these GPUs not only to solve big-data simulations of weather, economics, and other computationally intensive problems, but also to render images photorealistically. And the next step, Huang said, is to combine both: for example, Nvidia sent engineers to take high-dynamic-range photos of the stage itself, then added a realistic car model in the center and moved the “camera” about to explore it.
Combining realistic graphics with a dynamic scene has typically been the province of CGI movies, but that, too, has moved from the big screen to the computer monitor. Huang showed off a demonstration of the next-generation Unreal Engine running on top of the new GeForce GTX Titan Z that looked, in places, completely real. But there’s obviously a price: the Titan Z will cost $2,999, although it packs two Kepler GPUs with a combined 5,760 CUDA cores, 12GB of memory, and 8 teraflops of computing power. Oh, and it will consume 2000 watts by itself.
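Those Titan Z numbers roughly hang together: peak single-precision throughput is conventionally computed as cores times clock times two, since each CUDA core can retire one fused multiply-add (two floating-point operations) per cycle. The clock figure below is our assumption, chosen to land near the article's 8-teraflop figure, not a spec from the story.

```python
# Peak single-precision throughput: cores x clock x 2, where the 2 counts one
# fused multiply-add (two floating-point ops) per core per cycle.
def peak_tflops(cuda_cores, clock_ghz, flops_per_core_per_cycle=2):
    return cuda_cores * clock_ghz * flops_per_core_per_cycle / 1000.0

# 5,760 CUDA cores at an assumed ~0.7 GHz comes out near 8 teraflops.
print(round(peak_tflops(5760, 0.7), 2))  # 8.06
```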
Pairing three of those GPUs together, Huang said, would provide the computational power of the “Google Brain,” Google’s effort to model the human brain, which originally used a cluster of computers with 16,000 CPU cores. Just three GTX Titan Zs could do the same work instead, Huang said.
And if that’s not enough, Nvidia has an Iray VCA to offer you. Essentially, the VCA is a remote server designed as a “render farm” for companies, taking a scene and rendering it as quickly as possible. The Iray technology models photons that fly through the air, bouncing off objects or being absorbed by them. Each VCA contains eight GPUs with a total of 23,000 CUDA cores, which can access 12 Gbytes of memory per VCA. The VCA works with professional graphics packages like Maya and 3ds Max. “What would take an hour to render, now takes a minute,” Huang said.
Each VCA can be tied to others, using Nvidia software to connect them and run them in parallel. Nvidia combined 19 VCAs to produce the equivalent of a petaflop—the performance of the fastest supercomputer in the world six years ago, Huang said.
Finally, Nvidia showed off the Jetson TK1, an embedded development board that takes the Tegra K1 Nvidia announced at CES and brings it to the embedded space. Eventually, chips like the Tegra K1 and its successors, such as Erista, will be the computational engine of Audi’s self-driving cars and more.
According to Huang, Nvidia's chips are all predicated on the CUDA technology Nvidia unveiled several years ago. Can the company successfully expand beyond the PC? If you ask Huang, it's already happening.