Google's DeepMind Teaches Computers How to Speak Human

...

Apple's Siri personal assistant is getting a lot smarter in the upcoming iOS 10, but odds are she'll still sound like a computer. Meanwhile, a subsidiary of Google (her creator's rival) is working on an entirely new model for teaching computers to convert text to speech.

It's called WaveNet, and Google says it can mimic any human voice while sounding more natural than text-to-speech algorithms available today.

WaveNet is based on research from DeepMind, which this week offered an in-depth look at its efforts to synthesize audio signals for more natural-sounding artificial voices. It all starts with convolutional neural networks, the same technology that powers everything from self-driving cars to disease detection.

Neural networks also now power some current text-to-speech products, including Siri, which two years ago was rebuilt to take advantage of this form of machine learning. But Siri and her colleagues, like Google Voice Search or Amazon's Alexa, still use a database of short speech fragments that are strung together to form complete words and sentences. The result is a halting, emotionless voice, even if it is understandable.

What if, instead of stitching together speech fragments, there were a way to efficiently generate raw audio waveforms directly? Not only would that allow for more natural-sounding speech, but it would also let the computer mimic virtually any sound, including the ability to faithfully reproduce music. DeepMind engineers set to work.

At first, they fought an uphill battle against the inherent density of raw audio, which typically runs to 16,000 samples a second or more for a computer to process. But the engineers were at last able to build a neural network that uses real waveforms from human speakers. The network was trained on those recordings to predict a probability distribution for each audio sample given the samples that came before it, in essence teaching the computer how to speak like a human.

"Building up samples one step at a time like this is computationally expensive," according to DeepMind, "but we have found it essential for generating complex, realistic-sounding audio."
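To see why that step-by-step approach is costly, consider the rough Python sketch below. It is not DeepMind's code: the function `toy_next_sample_weights` is a hypothetical stand-in for a trained network, and the 256 discrete amplitude levels are an assumption made for illustration. What it shows is only the shape of the loop, in which every new sample is drawn from a probability distribution that depends on the samples already generated, so the work cannot be skipped ahead or done out of order.

```python
import math
import random

SAMPLE_RATE = 16_000  # samples per second, the figure cited in the article
NUM_LEVELS = 256      # assumption: amplitudes quantized to 256 discrete levels


def toy_next_sample_weights(history):
    """Hypothetical stand-in for a trained network.

    Returns unnormalized weights over the next quantized sample,
    favoring values close to the most recent sample.
    """
    center = history[-1] if history else NUM_LEVELS // 2
    return [
        math.exp(-0.5 * ((level - center) / 4.0) ** 2)
        for level in range(NUM_LEVELS)
    ]


def generate(num_samples, seed=0):
    """Build audio one sample at a time, each conditioned on the past."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        weights = toy_next_sample_weights(samples)
        # Draw the next sample from the predicted distribution.
        next_sample = rng.choices(range(NUM_LEVELS), weights=weights, k=1)[0]
        samples.append(next_sample)
    return samples


# 10 milliseconds of audio at 16,000 samples a second is 160 sequential steps.
audio = generate(SAMPLE_RATE // 100)
print(len(audio))  # 160
```

Even in this toy version, one second of sound would take 16,000 passes through the loop, each of which has to wait for the one before it.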

The result is remarkable. DeepMind provided samples of its speech capabilities alongside those typically used today, and the difference in inflection, tone, and emotion is immediately apparent. Have a listen for yourself.

It's only natural that computers' speech synthesis will become more, well, natural: Google and its competitors have invested significant resources in developing personal assistants. In order for them to catch on, humans need to think of them less as a gimmick and more as articulate, pleasant robots.
