How audio wizards are shaping the sound of the next generation of PC games
Get ready, PC gamers—change is coming. With major upgrades looming for both the Microsoft Xbox and Sony PlayStation platforms, developers will no longer have to throttle game performance to stay in step with aging console hardware. That means better games for everyone, including the PC gamers who have always been free to upgrade their hardware at will.
The hidden power of sound
One area where you’ll hear—not see—significant progress is game audio. And that matters more than you might realize.
The Xbox 360 and PlayStation 3 took surround sound mainstream. Aural fidelity in games is at an all-time high, and now that games have matured, audio teams are finally getting the attention and respect they deserve from audiences and from game developers. “We’ve seen through scientific research that audio can actually improve somebody’s perception of the video quality,” says Spencer Hooks, senior manager for games at Dolby.
“Try playing a game with the sound turned off. The first thing you notice is, the screen is only displaying 30 to 40 degrees in front of you,” says DICE’s Ben Minto, audio director for Battlefield 4. “If someone sneaks up behind you, you’re not going to figure that out.” Without audio, you lose half of the game experience straight away.
The hidden power of sound cuts both ways, though. Much as with acting, you often notice audio in games only when it’s poor. “If you do a bad job, everyone will notice, and you’ll be blamed for it,” says Simon Ashby, cofounder of Audiokinetic, the company responsible for the audio middleware platform Wwise. “If you do an awesome job, no one will ever talk about it. That’s the way it should be.”
But where do we go from here? Has audio plateaued in the same way some people are saying graphics have? What’s the logical progression to the next generation of audio? The answers to these questions are more dynamic than you may expect.
You’ve come a long way, baby
Video games have come a long way from the assorted chirps and bleeps of Pitfall! and other Atari 2600-era classics. Full orchestral scores, 5.1 channels of surround sound, and professionally voiced characters are as common as pocket lint now, but it wasn’t until recently that PC-game sound could rival that of a Blu-ray movie.
“Audio design ten years ago was really a slave to the hardware,” says Scott Haraldsen, audio lead at Irrational Games. “We were limited by the number of sounds that could be playing simultaneously, and most of those sounds were playing back at a quality less than half of what you hear in games today.”
Developers working on games for the PlayStation 2 had access to 2MB of RAM to store all the sounds that could play at a given moment, a huge jump from the original PlayStation’s 500KB. For comparison, 1 minute of uncompressed CD audio takes up about 10MB. It wasn’t until the jump to the PlayStation 3 that the amount of RAM available to sound hit triple digits: Of the PS3’s total 500MB of RAM, sound has access to just 256MB. Even if you’re playing a current multiplatform game on an Xbox 360 or your PC, the fact is that the game was designed with the PS3’s limitations in mind.
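To see where the figure of roughly 10MB per minute of CD audio comes from, the arithmetic can be sketched in a few lines of Python. The constants are the standard CD audio parameters (44.1kHz, 16-bit, stereo). The function name is made up for illustration, and 1MB is treated as 1,000,000 bytes to keep the numbers simple.

```python
# Standard CD audio parameters.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo


def cd_audio_bytes(seconds: float) -> int:
    """Size in bytes of uncompressed CD-quality audio of the given length."""
    return int(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * seconds)


# One minute of CD audio: 10,584,000 bytes, or about 10MB.
print(cd_audio_bytes(60))

# Seconds of CD-quality audio that would fit in 2MB of sound RAM
# (assuming 1MB = 1,000,000 bytes): a little over 11 seconds.
print(2_000_000 / cd_audio_bytes(1))
```

By that measure, 2MB holds only about 11 seconds of uncompressed CD-quality sound, which helps explain why so many sounds of that era played back at reduced quality.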
“New consoles usually give you more power in sound, although there are exceptions,” says Halo 3 audio producer Matthew Burns.
The first Xbox had a dedicated sound processor that handled all of a game’s audio tasks on its own, freeing up the CPU for general use. The Xbox 360 does not. Working within common memory on Halo 3 “was a pain,” according to Burns, who thinks Microsoft skimped on hardware.
“They lost a lot of money on the Xbox. They didn’t know what they were doing hardware-wise, so they got rid of the dedicated sound processor and were just like, ‘We have six cores on the main CPU, just run your audio there.’”
That means audio teams on modern games, even cross-platform games, now have to share machine resources with every other department.
“The graphics guys want to use all the CPU. The AI wants to use all the CPU. [As a sound designer] it becomes trying to fight for your place at the table,” Burns says.
Sound takes a backseat in development
Sound and story have always taken a backseat to graphics and gameplay in game design. Until recently, both were implemented very late in the development cycle. Such snubbing makes some sense: After all, audio designers need to have something on screen before they can start curating sounds. Jack Grillo, musical director on Tomb Raider, says that the best time for audio specialists to do their work is after all the other departments are done.
“Those departments often don’t hit deadlines, so we scramble at the end because those milestones don’t change. Games are hard to make, and audio is usually last in line because we can’t do anything ahead of time,” says Grillo.
In its work with the game industry, Dolby is promoting the idea that audio shouldn’t be a last-minute decision and shouldn’t be given a minimal amount of space on the disc.
“Sound is less tangible because you can’t point at the screen and say, ‘Oh yeah, see? The edge isn’t as jagged and the texture looks better,’” says Hooks.
That audio gets short shrift is sad, but not surprising, considering the primarily visual nature of video games. “The screenshot sells the game, [whereas] a piece of music won’t,” says Grillo. And like clockwork, once a game’s announcement trailer hits the Internet or screenshots are released, forums light up with threads analyzing the graphics. No one starts flame wars over the audio in a game.
Why the middleware matters
The middleware that developers choose for game audio plays another key role. Since 2006, over 150 games have shipped using Unreal Engine 3. Unfortunately, Unreal on its own doesn’t offer features that foster a robust audio system. Take BioShock versus BioShock Infinite, for example: The latter incorporated Wwise middleware for audio processing, while the former did not. Listen to both games in quick succession, and the difference is jarring.
“Unreal definitely wasn’t the greatest for sound on its own,” says Burns. “Now, if [games] use Unreal, they use Wwise at the same time.”
Wwise includes tools that take the guesswork out of creating and implementing sound for a game. Imagine if a mechanic had to forge a wrench before working on your car, instead of just reaching into the toolbox. Wwise is that toolbox.
“Since a lot of games have started integrating that technology, they have a lot more control. That’s a big part of why the audio has gotten so much better over the generation,” says Burns. “Wwise is probably the best middleware out there.”
The engineers at Epic are well aware of this. That’s why the forthcoming Unreal Engine 4 adds more features to address the audio deficiencies in previous versions. Epic’s dedication to making jaw-dropping visuals easily accessible extends to audio as well. The new engine is combining the typically manpower-intensive, high-dynamic-range audio system pioneered by DICE—and used by Audiokinetic—with traditional methods of sound mixing. Now that HDR audio is going mainstream, you can expect to hear it everywhere—it’ll be next-gen audio’s lens flare.
How DICE developed high-dynamic-range audio
The developers at DICE don’t view their audio department as an afterthought. The sound team is involved with animators, character riggers, and even artists.
“We all know we’re contributing to a singular experience,” says Minto. “We’ve got the respect from inside DICE that [means] we can go in and comment on all these different areas, not just the sound.”
It’s this type of thinking that led to the development of high-dynamic-range audio. DICE’s HDR audio works similarly to how the human brain perceives sound: It gives the most important sound (an exploding propane tank or a collapsing building, for example) precedence over everything else in the area. This prioritization cuts down on the number of sounds playing at once, spurring designers to use higher-quality sounds as a result.
“If you make everything awesome, then nothing’s awesome,” says Minto. “With HDR, we don’t need to play ten thousand sounds. We play 12 to 15 sounds at the same time, and they’re the most important ones. They can be more expensive.”
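The prioritization Minto describes can be illustrated with a small Python sketch. This is not DICE’s implementation. It is a simplified model that assumes every sound carries a designer-assigned loudness value, that the loudest active sound sets the top of a fixed “window,” and that anything falling below the bottom of the window is culled. The `Sound` class, the `select_audible` function, the 40dB window, and the sample loudness values are all hypothetical. The cap of 15 voices echoes the “12 to 15 sounds” in the quote.

```python
from dataclasses import dataclass


@dataclass
class Sound:
    name: str
    loudness_db: float  # hypothetical designer-assigned loudness; higher is louder


def select_audible(sounds, window_db=40.0, max_voices=15):
    """Keep only sounds inside a loudness window below the loudest active sound."""
    if not sounds:
        return []
    # The loudest active sound sets the top of the window.
    top = max(s.loudness_db for s in sounds)
    floor = top - window_db
    # Anything quieter than the bottom of the window is culled.
    in_window = [s for s in sounds if s.loudness_db >= floor]
    # Of what remains, play only the loudest few.
    in_window.sort(key=lambda s: s.loudness_db, reverse=True)
    return in_window[:max_voices]


scene = [
    Sound("footsteps", 50.0),
    Sound("rifle shot", 120.0),
    Sound("collapsing building", 140.0),
    Sound("wind", 45.0),
]

# While the building collapses, footsteps and wind drop out of the mix.
print([s.name for s in select_audible(scene)])

# In a quiet scene, the window slides down and the same footsteps are heard.
print([s.name for s in select_audible([scene[0], scene[3]])])
```

In this model the same footsteps are either audible or culled depending on what else is happening, which is the effect the article attributes to HDR audio: fewer simultaneous sounds, chosen by importance.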
The future is in volume control
Sound quality is also affected by volume, and although it seems counterintuitive, video games are actually too loud. The loudness standards that regulate movies don’t exist for video games. There are no set parameters for dynamic range, the difference between the loudest and the quietest sounds. That’s why Far Cry 3 sounds so much quieter than Metro: Last Light, which sounds louder than Journey, which sounds louder than The Walking Dead does, all at the same volume on your TV or home theater system.
“We have some ignorant producers saying they want to be louder than another game—the loudness war is still there,” says Ashby. “Video games have suffered from that. We’re maturing there; it won’t be a problem forever.”
Ashby suggested to Microsoft, Nintendo, and Sony that they add dynamic-range requirements to their technical guidelines, stating that if a game went above a certain threshold it would be rejected.
“If they enforce rules like that, it will solve the problem in a year,” Ashby says.
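A certification check of the kind Ashby proposes might, in its simplest form, measure the level of a game’s audio output and compare it against a fixed limit. The Python sketch below makes several assumptions: it uses plain RMS level in dB relative to full scale (dBFS) rather than a perceptual loudness measure such as the one defined in ITU-R BS.1770, the function names are invented, and the limit of -23 is a placeholder, not a figure from any platform holder’s guidelines.

```python
import math


def rms_dbfs(samples):
    """RMS level of samples in the range [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)


def passes_loudness_check(samples, max_level_dbfs=-23.0):
    """Return True if the audio stays at or below the (hypothetical) limit."""
    return rms_dbfs(samples) <= max_level_dbfs


# One second of a full-scale 440Hz sine wave at 48kHz, and a much quieter copy.
loud = [math.sin(2 * math.pi * 440 * n / 48_000) for n in range(48_000)]
quiet = [0.05 * x for x in loud]

print(passes_loudness_check(loud))   # False: about -3 dBFS, well over the limit
print(passes_loudness_check(quiet))  # True: about -29 dBFS
```

A real requirement would have to specify how long to measure, which speaker layout to assume, and how to weight different frequencies, but the pass-or-fail structure would be the same.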
Movies use the perceived volume of dialogue as the anchor point to set the levels of every other sound in the scene; nothing plays louder than the dialogue does. Even if it doesn’t make sense to be able to hear a scientist tell you that the blue key opens the blue door while a tornado rages around you, for the sake of clarity the dialogue must never be overshadowed. But because dialogue isn’t always present in games, they have no consistent anchor point. A racing game could use engine sounds, sure, but what about a rally racer with a copilot? Which sound is more important then, the engine or the copilot?
Crystal Dynamics took a creative approach to Tomb Raider’s audio anchor point: Lara Croft herself.
“We’d always come back to trying to make Lara as relatable as possible so the player could get a sense of that character arc,” Grillo says. “Everything always had to come back to her.”
This approach came down to focusing on minor sounds such as her breathing, footsteps, or any emotional moments that needed to be the most important part of a scene.
The far-out future: procedurally generated sound
Perhaps the most promising high-tech aspect of future audio is procedurally generated sound. The most familiar example of a procedural approach in game design is the weapon system in the Borderlands series. Even with an army of designers, creating each gun by hand would have been a logistical nightmare. Instead, the game designers used procedural algorithms to randomly generate and assemble hundreds of thousands of guns. The same technology could revolutionize the sound design of future games.
But the industry is nowhere near ready for it.
“There’s not a lot of research and development for audio, but it’s still real,” says Ashby. “We’ll probably need 10 to 20 years of serious R&D until we reach the level of procedurally generated visuals. That will bite us soon. We’re still turning the wheel, recording sounds and packing more and more sounds [onto a disc].”
That’s a major reason why audio requires as much RAM as it does, too. A game engine can randomly generate 200 different doors of varying materials without taxing the hardware, but those doors are still tied to just a few sound effects for opening each one. This limitation breaks your immersion as a player, but stuffing more recorded sounds into the game eats into the same limited memory that has to hold every sound the game might play at a given moment.
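As a toy illustration of what procedurally generated sound could mean for those doors, the Python sketch below synthesizes a door impact from a handful of parameters instead of loading a recording. It is far cruder than anything a shipping game would use: each impact is a single exponentially decaying sine wave, and the material table, the frequencies, the decay rates, and the `door_thud` function are all invented for this example.

```python
import math

# Hypothetical material parameters: (resonant frequency in Hz, decay rate per second).
MATERIALS = {
    "wood": (180.0, 18.0),
    "metal": (900.0, 4.0),
    "stone": (120.0, 30.0),
}


def door_thud(material, size=1.0, sample_rate=22_050, duration=0.5):
    """Synthesize a door impact as one exponentially decaying sine wave."""
    freq, decay = MATERIALS[material]
    freq /= size  # crude assumption: bigger doors sound lower pitched
    count = int(sample_rate * duration)
    return [
        math.exp(-decay * n / sample_rate)
        * math.sin(2 * math.pi * freq * n / sample_rate)
        for n in range(count)
    ]


# Every combination of material and size yields a different sound,
# generated on demand from a few numbers.
small_wood = door_thud("wood", size=0.8)
big_metal = door_thud("metal", size=1.5)
print(len(small_wood), len(big_metal))
```

The appeal is the trade the article points to: a few stored parameters per material in place of a stored audio file per variation, at the cost of CPU time and, as Ashby notes, years of research before the results sound convincing.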
Game audio can’t evolve unless games do, too
The Xbox One and PlayStation 4 present new opportunities for the field of game audio to evolve, and PC gamers are standing ready to reap the benefits. A rising tide lifts all boats, after all, and as consoles evolve and grow more akin to gaming PCs, the games will evolve along with them. All we need now is bodies to fill out the ballooning development teams necessary to make these next-gen games—the next generation of audio engineers, if you will.
And no matter how powerful the hardware gets or how seasoned the developers become, the best way to advance the field of game audio design is to make more games. Grillo says that if games continue to take two to four years to develop, that affords game makers fewer opportunities to learn from their mistakes and to build better game audio from the ground up.
Tomb Raider was Grillo’s only project for the last generation of gaming hardware, and it took three years to make. If PC games continue to be built at that scale, the glacial pace of development will stifle innovation. Making games shorter or splitting them up into bite-size episodes is probably the best thing developers can do to push the field of game audio, and game development in general, forward.