Entity explores the future of robot technology and computer voices.

In 1998, The New York Times reported on one of the first virtual admin assistants to arrive in the tech world. Her name was Wildfire. Wildfire, with her witty and sarcastic comments, was the start of many simulated voices to have … well, somewhat of a personality.

With comments like ‘Great. Now I’m a therapist’ or ‘I’m sorry, I cannot perform that function’ when users cursed at her, you could say she was the comedic and sarcastic inspiration for today’s most famous voices, like Siri and Alexa.

Yet, despite significant advancements in computer identities over the years, the tech industry is still having a hard time finding a computer personality that will not only listen to our commands and aid us in our times of need, but will also be “likable.”

According to another article by The Times, “Creating a Computer Voice That People Like,” tuning synthetic voices to sound more “human-like” and appealing continues to be a growing challenge for software designers. This is especially so because so many machines these days are interactive and communicate with humans.

Since everyday objects like appliances and toys are now capable of listening to our requests and communicating back, consumers are insistent on having a perfect identity to speak with. They want something that not only sounds more natural and less formulaically creepy, but also has a personality that speaks correctly.

This is ultimately creating a huge hurdle for developers. As “Creating a Computer Voice That People Like” goes on to explain, “Most software designers acknowledge that they are still faced with crossing the ‘uncanny valley,’ in which voices that are almost human-sounding are actually disturbing or jarring.” Not only do developers face the challenge of inflection and sentiment, but they are also confronted with the demands of the grammar and punctuation of everyday speech.

Alan Black, a professor at Carnegie Mellon University’s Language Technologies Institute, recognizes the battle and agrees that human-like precision has yet to be achieved. He states that despite innovations, “The problem is we don’t have good controls over how we say to these synthesizers, ‘Say this with feeling.’”

Because of this challenge, developers are shifting roles and taking a more artistic approach to mastering the accuracy of these voices. They are stepping away from operating like scientists and stepping into the role of artists: artists trying to achieve a Mona Lisa-like perfection.

A voice that is joyful, but not too happy. Something that is smiling, but also a bit serious. A work that is feminine, but also masculine. A personality that is perfectly enigmatic.

Michael Picheny, senior manager of the Watson Multimodal Lab for IBM Research, said it best: “A good computer-machine interface is a piece of art and should be treated as such.”
