Microsoft Uses Karaoke Feature on China’s Bing Dictionary
By Michael Kan
Microsoft has added a karaoke-like feature to its Bing Dictionary in China, giving English-language learners a new way to practice their pronunciation online.
The so-called “KTV function” uses videos for selected sample sentences from the dictionary. As in karaoke, the videos display the sentence on screen while a model speaks it aloud, showing users how to pronounce the words correctly.
The feature is one of Microsoft's latest efforts to improve Bing Dictionary for China, which is still in beta and operates as a Chinese-to-English dictionary. Karaoke, also known as KTV, is a major pastime in China, with KTV clubs on many of the country's street corners.
“It brings a level of user engagement that makes it more personal,” said Matt Scott, the project lead for Bing Dictionary. “They feel like language learning can be fun.”
But the KTV function also relies on some cutting-edge technology to make the videos for the sample sentences. Rather than filming a person reading each of the sample sentences, Microsoft has found a way to synthesize the audio and artificially generate the mouth movements of the model in the videos.
It works by capturing shots of all the different pronunciations the model can produce, in a five-hour process that maps the person’s lips, Scott said. The technology then finds the best match between the lip shapes and the words the programmers want the model to say, creating an accurate lip sync. The end result is a video of the model mouthing the words of the sample sentence while a computerized voice reads it aloud.
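The article does not say how Microsoft's "best match" step works. The following is only a minimal nearest-neighbor sketch of the general idea, assuming each captured lip frame is summarized by two hypothetical numbers (lip opening and lip width) and that the desired mouth shapes for a sentence are already known. `MouthFrame`, `match_frames`, and the features are illustrative inventions, not Microsoft's method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MouthFrame:
    """One captured frame of the model's lips (hypothetical features)."""
    frame_id: int
    opening: float  # assumed: vertical lip opening, normalized 0..1
    width: float    # assumed: horizontal lip width, normalized 0..1


def distance(frame: MouthFrame, target: tuple[float, float]) -> float:
    """Euclidean distance between a captured frame and a desired mouth shape."""
    return math.hypot(frame.opening - target[0], frame.width - target[1])


def match_frames(library: list[MouthFrame],
                 targets: list[tuple[float, float]]) -> list[int]:
    """For each desired mouth shape, pick the id of the closest captured frame."""
    return [min(library, key=lambda f: distance(f, t)).frame_id for t in targets]


# Usage: a tiny hypothetical library of captured frames and a target sequence.
library = [MouthFrame(0, 0.0, 0.5), MouthFrame(1, 0.8, 0.4), MouthFrame(2, 0.3, 0.9)]
targets = [(0.1, 0.5), (0.7, 0.45), (0.35, 0.85)]
matched = match_frames(library, targets)
print(matched)
```

A real system would also need to smooth transitions between consecutive frames and align them with the synthesized audio; this sketch only shows the per-shape matching.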
The videos currently online show Scott serving as the model and mouthing the words of the sample sentences. “I never said any of those phrases,” Scott said, as he played one of the videos. “It’s a complex kind of matchmaking,” he added.
Currently, the dictionary has 10 million sample sentences with videos. “About every few weeks, several thousand new videos will go up,” Scott said.
The KTV function was added in September. But last week Microsoft announced the new official model who will be featured in the videos, chosen through an American Idol-style competition held online. Microsoft hoped to find a person with clear pronunciation, a good smile, and a personality that could spark user interest.
Cissy Wong was declared the winner. The video she submitted, in which she taught users how to cook a meal in English, attracted more than 1.8 million views on the Chinese video site Ku6.com. For the next year, she will be featured as Bing Dictionary’s “mouth model.”
“The Bing Dictionary KTV function is really amazing,” said Wong, who decided to enter the contest after a friend recommended it. She originally thought the process of capturing her facial expressions would be more complex. “But actually I only have to sit in front of the camera and read some sentences,” she said in an e-mail. “The computer can capture my mouth and facial expression changes automatically.”
The Bing Dictionary for China is a data-mining project that taps the Web to find accurate Chinese-to-English translations. Many of the sample sentences are drawn from other online sources, giving users more variety and an up-to-date understanding of how to use the words. The dictionary receives more than 1 million page views a day. Microsoft researchers plan to develop a Japanese-to-English translation dictionary for Bing at some point in the future.