Back in September, Nvidia unveiled some details of its new graphics architecture, code-named Fermi. The focus at that time was on GPU compute features. Today, Nvidia has revealed how the new chip handles traditional graphics. A complete picture of the chip, code-named GF100 (it stands for “Graphics Fermi 100”, though the actual product names have not been revealed yet), is starting to come together, but we’re still left wondering about some important details.
First, here’s what was not revealed today: the size of the chip in mm², clock speeds, board-specific specs like the amount of RAM, costs, or release dates.
The GF100 architecture is a fairly big departure from past Nvidia chips. Of course, all DirectX 11 features are present. The chip is divided into four main “graphics processing clusters” (GPCs), each with four “streaming multiprocessors” (SMs). Each of these 16 SMs has its own geometry setup and processing unit, which Nvidia calls a Polymorph Engine. This handles vertex fetch, tessellation, viewport transform, setup, and geometry stream output. Each SM also has 32 shader cores, or “CUDA processors.” The four SMs in each GPC share a single raster engine unit. This is a major organizational change for Nvidia’s chips: past GPUs from the company had one geometry engine shared by the entire chip, while this chip effectively has one for each group of 32 shader cores. Each SM has 64KB of L1 cache.
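To make the layout above concrete, here is a minimal Python sketch that tallies the units on a full GF100 from the per-cluster figures in this article. The function and key names are our own, chosen for illustration; they are not Nvidia terminology or part of any Nvidia tool, and the totals are simple multiplication rather than figures Nvidia has published here.

```python
# Per-unit counts as described in the article for the full GF100 chip.
GPCS = 4           # graphics processing clusters on the chip
SMS_PER_GPC = 4    # streaming multiprocessors in each GPC
CORES_PER_SM = 32  # shader cores ("CUDA processors") in each SM


def gf100_totals(gpcs=GPCS, sms_per_gpc=SMS_PER_GPC, cores_per_sm=CORES_PER_SM):
    """Return chip-wide unit counts (hypothetical helper for illustration)."""
    sms = gpcs * sms_per_gpc
    return {
        "sms": sms,
        "polymorph_engines": sms,  # one Polymorph Engine per SM
        "raster_engines": gpcs,    # one raster engine shared per GPC
        "shader_cores": sms * cores_per_sm,
    }


if __name__ == "__main__":
    for name, count in gf100_totals().items():
        print(f"{name}: {count}")
```

Passing smaller values, such as `gf100_totals(gpcs=2)`, shows how the geometry hardware would shrink alongside the shader cores in a hypothetical cut-down part, which is the point of moving geometry processing into each SM.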
All these cores communicate with each other through 48 ROPs that are attached to 768KB of L2 cache. This cache is not only larger than what was found in the previous GT200 chip (whose L2 cache was 256KB), it’s more capable, with full read/write access. The chip uses this big pool of L2 cache to share data between the 16 SM units, instead of a wide bus or massive crossbar architecture. Note that the size of this cache and the number of ROPs are both tied to the width of the main memory bus. The full chip has a 384-bit GDDR5 memory interface. There are always 8 ROPs and 128KB of L2 cache for each 64 bits of memory interface. So a scaled-back version of the chip, with perhaps a 256-bit memory interface, would have 32 ROPs and 512KB of L2 cache.
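The scaling rule above is easy to check with a few lines of Python. This is only a sketch of the arithmetic described in this article (8 ROPs and 128KB of L2 per 64 bits of memory interface); the function name is hypothetical, and the 256-bit configuration is our speculation, not an announced product.

```python
ROPS_PER_CHANNEL = 8     # ROPs per 64-bit slice of the memory interface
L2_KB_PER_CHANNEL = 128  # KB of L2 cache per 64-bit slice
CHANNEL_WIDTH_BITS = 64


def memory_partition_config(bus_width_bits):
    """Derive ROP count and L2 size from the memory bus width (illustrative)."""
    if bus_width_bits <= 0 or bus_width_bits % CHANNEL_WIDTH_BITS != 0:
        raise ValueError("bus width must be a positive multiple of 64 bits")
    channels = bus_width_bits // CHANNEL_WIDTH_BITS
    return {
        "channels": channels,
        "rops": channels * ROPS_PER_CHANNEL,
        "l2_kb": channels * L2_KB_PER_CHANNEL,
    }


if __name__ == "__main__":
    for width in (384, 256):
        print(width, memory_partition_config(width))
```

For the full 384-bit chip this gives six 64-bit slices, 48 ROPs, and 768KB of L2; for a 256-bit part it gives four slices, 32 ROPs, and 512KB, matching the figures in the text.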
Enough with the technical gobbledygook. How will cards based on the GF100 perform in games? Unfortunately, we don’t really know. Nvidia’s early benchmarks have it performing up to twice as fast as the Radeon HD 5870 (ATI’s fastest single-chip DX11 graphics card) in some tests. Those are usually targeted, geometry-heavy benchmarks, though. In real games, it looks like the GF100 will be anywhere from 20% to 50% faster, depending on the game and settings. Nvidia promises dramatically better performance with 8x anti-aliasing modes this time around, with a much smaller performance hit relative to the 4x MSAA modes (which would put it nicely in line with ATI’s latest GPUs). A new 32x coverage sampling anti-aliasing mode could be the new high-quality mark, and might be fast enough to be truly usable, but we’ll have to wait for our testing to bear that out.
Of course, better performance than ATI’s new cards alone won’t be enough to win the hearts and minds of gamers. Cards based on the GF100 will have to prove themselves a good value. The vast majority of consumers buy cards costing $200 or less, so the absolute performance of a $400-500 card is, in some ways, academic if the chip can’t be efficiently scaled back to a great-performing mid-range or value card. Our estimate is that this GF100 chip is likely to be 60-70% larger than the “Cypress” chip in the Radeon HD 5870, making it far more expensive to manufacture. A 384-bit memory interface means more expensive boards, too. ATI has half a year of manufacturing the 5870 under its belt, so it probably has a lot more price flexibility. Nvidia will have to sell GF100-based cards at high prices (at least at first), while ATI can easily drop the price of the Radeon HD 5870 substantially to remain competitive, and still turn a profit.
We also don’t know what the clock speeds are going to be for the first GF100 cards, which products Nvidia will release based on the chip, or how much power they’ll use. What’s more, ATI could be cherry-picking the Cypress chips that run best for a hot-rod refresh just in time to compete with the first GF100 cards. We simply don’t know how the competitive landscape will shake out, except to say that Nvidia is only now talking about the top-end version of its new DX11 GPU, while ATI has released DX11 cards across the entire range from $100 on up, including notebook graphics chips. It’s great to hear Nvidia talking about this stuff, and the GF100 is a very interesting chip, but in some ways the company is still playing catch-up in its core graphics market.