How audio wizards are shaping the sound of the next generation of PC games

Why the middleware matters

The middleware that developers choose for game audio plays another key role. Since 2006, over 150 games have shipped on Unreal Engine 3. Unfortunately, Unreal on its own doesn’t offer the features needed to build a robust audio system. Take BioShock versus BioShock Infinite, for example: The latter incorporated Wwise middleware for audio processing, while the former did not. Listen to both games in quick succession—the difference is jarring.

“Unreal definitely wasn’t the greatest for sound on its own,” says Burns. “Now, if [games] use Unreal, they use Wwise at the same time.”

Wwise includes tools that take the guesswork out of creating and implementing sound for a game. Imagine if a mechanic had to forge a wrench before working on your car, instead of just reaching into the toolbox. Wwise is that toolbox.

The new Unreal Engine 4 is built to deliver amazing graphics, but it also has some surprising audio features under the hood.

“Since a lot of games have started integrating that technology, they have a lot more control. That’s a big part of why the audio has gotten so much better over the generation,” says Burns. “Wwise is probably the best middleware out there.”

The engineers at Epic are well aware of this. That’s why the forthcoming Unreal Engine 4 adds features that address the audio deficiencies of previous versions. Epic’s dedication to making jaw-dropping visuals easily accessible extends to audio as well. The new engine combines the typically manpower-intensive, high-dynamic-range audio system pioneered by DICE and adopted by Audiokinetic with traditional methods of sound mixing. Now that HDR audio is going mainstream, you can expect to hear it everywhere—it’ll be next-gen audio’s lens flare.

How DICE developed high-dynamic-range audio

The developers at DICE don't view their audio department as an afterthought. The sound team is involved with animators, character riggers, and even artists.

“We all know we’re contributing to a singular experience,” says Minto. “We’ve got the respect from inside DICE that [means] we can go in and comment on all these different areas, not just the sound.”

DICE is famous for building top-tier games such as Battlefield 3 with best-in-class audio effects.

It’s this type of thinking that led to the development of high-dynamic-range audio. DICE's HDR audio works similarly to how the human brain perceives sound: It gives the most important sound—an exploding propane tank or a collapsing building, for example—precedence over everything else in the area. This prioritization cuts down on the number of sounds playing at once, spurring designers to use higher-quality sounds as a result.

“If you make everything awesome, then nothing’s awesome,” says Minto. “With HDR, we don’t need to play ten thousand sounds. We play 12 to 15 sounds at the same time, and they’re the most important ones. They can be more expensive.”
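
DICE’s actual implementation isn’t public, but the idea is easy to sketch: every sound that wants to play reports how loud it would be, the loudest sound in the scene sets a window, anything too quiet to matter against it gets culled, and the survivors are capped at a small voice budget. A minimal, hypothetical Python sketch; the class, the 60 dB window, and the voice cap are illustrative assumptions, not DICE’s code:

```python
from dataclasses import dataclass

@dataclass
class SoundRequest:
    name: str
    loudness_db: float   # designer-authored loudness of the sound
    priority: int        # tie-breaker for equally loud sounds

MAX_VOICES = 15          # "12 to 15 sounds at the same time"
WINDOW_DB = 60.0         # anything this far below the loudest sound is culled

def mix_hdr(requests: list[SoundRequest]) -> list[SoundRequest]:
    """Keep only the sounds that matter: the loudest sound sets the top of
    the window, everything too quiet to register against it is dropped,
    and the survivors are capped at a small voice budget."""
    if not requests:
        return []
    loudest = max(r.loudness_db for r in requests)
    audible = [r for r in requests if r.loudness_db >= loudest - WINDOW_DB]
    audible.sort(key=lambda r: (r.loudness_db, r.priority), reverse=True)
    return audible[:MAX_VOICES]

# Example: a tank shell drowns out distant footsteps, so they never play at all.
scene = [
    SoundRequest("tank_shell_impact", 150, 10),
    SoundRequest("squadmate_shout", 110, 8),
    SoundRequest("distant_footsteps", 60, 1),
]
print([s.name for s in mix_hdr(scene)])  # footsteps fall outside the window
```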

The future is in volume control

Sound quality is also affected by volume, and although it seems counterintuitive, video games are actually too loud. The regulated volume standards that movies must follow don’t exist for video games. There are no set parameters for dynamic range—the difference between the loudest and the quietest sounds in the mix. That’s why Far Cry 3 sounds so much quieter than Metro: Last Light, which sounds louder than Journey, which sounds louder than The Walking Dead does, all at the same volume on your TV or home theater system.

“We have some ignorant producers saying they want to be louder than another game—the loudness war is still there,” says Ashby. “Video games have suffered from that. We’re maturing there; it won’t be a problem forever.”

Ashby suggested to Microsoft, Nintendo, and Sony that they add dynamic-range requirements to their technical certification guidelines, so that a game exceeding a certain loudness threshold would be rejected.

“If they enforce rules like that, it will solve the problem in a year,” Ashby says.

Simon Ashby works on Wwise, audio middleware built for games.

Movies use the perceived volume of dialogue as the anchor point to set the levels of every other sound in the scene; nothing plays louder than the dialogue does. Even if it doesn’t make sense to be able to hear a scientist tell you that the blue key opens the blue door while a tornado rages around you, for the sake of clarity the dialogue must never be overshadowed. But because dialogue isn’t always present in games, they have no consistent anchor point. A racing game could use engine sounds, sure, but what about a rally racer with a copilot? Which sound is more important then, the engine or the copilot?

Crystal Dynamics took a creative approach to Tomb Raider’s audio anchor point: Lara Croft herself.

“We’d always come back to trying to make Lara as relatable as possible so the player could get a sense of that character arc,” Grillo says. “Everything always had to come back to her.”

In practice, that meant elevating small sounds such as her breathing and footsteps, or whatever emotional moment needed to be the most important part of a scene.
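
Neither studio has published its mixer, but anchor-based mixing is simple to illustrate: pick one bus as the reference (dialogue in a film, Lara’s breathing and footsteps in Tomb Raider) and pull every other bus down so nothing rises above it. A hypothetical Python sketch; the bus names and decibel values are invented for illustration:

```python
# Hypothetical anchor-based mixing: bus names and values are illustrative only.
ANCHOR_BUS = "lara_foley"          # breathing, footsteps, vocal effort
HEADROOM_DB = 3.0                  # other buses stay at least this far below the anchor

bus_levels_db = {
    "lara_foley": -12.0,
    "weapons": -6.0,
    "ambience": -18.0,
    "music": -8.0,
}

def apply_anchor(levels: dict, anchor: str, headroom: float) -> dict:
    """Pull any bus louder than (anchor - headroom) down to that ceiling,
    so the anchor is never overshadowed, even by a raging tornado."""
    ceiling = levels[anchor] - headroom
    return {
        bus: (min(level, ceiling) if bus != anchor else level)
        for bus, level in levels.items()
    }

print(apply_anchor(bus_levels_db, ANCHOR_BUS, HEADROOM_DB))
# weapons and music are ducked to -15 dB; ambience is already quiet enough.
```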

The far-out future: procedurally generated sound

Perhaps the most promising high-tech aspect of future audio is procedurally generated sound. The easiest example of any sort of procedural approach in game design is the weapon system in the Borderlands series—even with an army of designers, creating each individual gun would have been a logistical nightmare. Instead, the game designers used procedural algorithms to randomly generate and assemble hundreds of thousands of guns. The same technology could revolutionize the sound design of future games.
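
In audio terms, “procedural” means synthesizing a sound from parameters at runtime rather than playing back a recording, just as Borderlands assembles a gun from parts rather than storing every finished weapon. Here is a toy sketch of the idea in Python with NumPy; the synthesis model and every parameter are invented for illustration, not taken from any shipping engine:

```python
import numpy as np

SAMPLE_RATE = 48_000

def impact_sound(material_stiffness: float, size: float, seconds: float = 0.5):
    """Synthesize a crude impact: a noisy attack plus a resonant tone whose
    pitch and decay come from the object's material and size, so every
    procedurally generated door could get a door-sized, material-correct thud
    without a unique recording sitting in memory."""
    t = np.linspace(0, seconds, int(SAMPLE_RATE * seconds), endpoint=False)
    pitch_hz = 80 + material_stiffness * 400 / size       # stiffer/smaller rings higher
    decay = np.exp(-t * (4 + 10 / size))                  # bigger objects ring longer
    tone = np.sin(2 * np.pi * pitch_hz * t) * decay
    noise = np.random.default_rng(0).normal(0, 0.3, t.size) * np.exp(-t * 40)
    return (tone + noise) * 0.5                           # samples roughly in [-1, 1]

wooden_door = impact_sound(material_stiffness=0.4, size=2.0)
metal_hatch = impact_sound(material_stiffness=0.9, size=0.8)
```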

But the industry is nowhere near ready for it.

“There’s not a lot of research and development for audio, but it’s still real,” says Ashby. “We’ll probably need 10 to 20 years of serious R&D until we reach the level of procedurally generated visuals. That will bite us soon. We’re still turning the wheel, recording sounds and packing more and more sounds [onto a disc]."

That reliance on recorded sound is also a major reason audio requires as much RAM as it does. A game engine can randomly generate 200 doors of varying materials without taxing the hardware, but those doors are still tied to just a few recorded sound effects for opening them. The repetition breaks your immersion as a player, but stuffing more recorded variations into the game eats into the same memory budget needed to keep those sounds loaded and ready to play.
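
The arithmetic behind that tradeoff is easy to check. Assuming uncompressed 48 kHz, 16-bit stereo audio (a common working format, chosen here purely for illustration), unique recorded variants get expensive fast:

```python
# Rough memory cost of recorded sound variants (assumed format: 48 kHz, 16-bit, stereo).
SAMPLE_RATE = 48_000
BYTES_PER_SAMPLE = 2
CHANNELS = 2

def clip_bytes(seconds: float) -> int:
    return int(seconds * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)

one_door = clip_bytes(1.5)  # a single 1.5-second door creak, about 280 KB
print(f"one door sound:   {one_door / 1024:,.0f} KB")
print(f"200 unique doors: {200 * one_door / 1024 ** 2:,.0f} MB")  # roughly 55 MB for doors alone
```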

Game audio can’t evolve unless games do, too

The Xbox One and PlayStation 4 present new opportunities for the field of game audio to evolve, and PC gamers are standing ready to reap the benefits. A rising tide lifts all boats, after all, and as consoles evolve and grow more akin to gaming PCs, the games will evolve along with them. All we need now is bodies to fill out the ballooning development teams necessary to make these next-gen games—the next generation of audio engineers, if you will.

And no matter how powerful the hardware gets or how seasoned the developers become, the best way to advance the field of game audio design is to make more games. Grillo points out that as long as games take two to four years to develop, game makers get fewer opportunities to learn from their mistakes and to build better game audio from the ground up.

Tomb Raider was Grillo’s only project for the last generation of gaming hardware, yet it took three years to make. If PC games continue to be built at that scale, the glacial pace of development will stifle innovation. Making games shorter or splitting them up into bite-size episodes is probably the best thing developers can do to push the field of game audio—and game development in general—forward.
