We have generated an audio news summary by blending human and synthetic voices
The smart speaker is reinvigorating the news summary format common in broadcast, with quick headlines and context delivered in a compact package. We at the Lab are experimenting with synthesising such a bulletin, designed for Google Assistant and based on existing Guardian journalism and curation.
You can check it out on Google Home devices or through the Google Assistant on your smartphone by saying, “Hey Google, talk to the Guardian Briefing.”
Although we make in-depth podcasts, the Guardian does not produce anything appropriate for this format in audio. We do, however, create visual, predominantly text-based packages in the form of newsletters and morning briefings. The Lab is attempting to create this well-understood audio summary content by blending human and synthetic voices, harnessing the power of text-to-speech technology on the Assistant platform.
Smart speakers create new habits around old formats
The news bulletin is almost as old as radio itself. However, research shows smart speakers are rejuvenating this format. According to a 2018 Voicebot.ai study, 63% of smart speaker owners use the device daily and 34% have multiple interactions per day, creating new habits.
When users incorporate news into their daily lives, they are often looking for specific lengths of content. More than half of smart speaker owners want the latest news on a regular basis, according to the Smart audio report by NPR and Edison Research, but many would prefer shorter formats.
Demand for short, up-to-date bulletins is highest at the start and end of the day. The Future of voice report by Reuters suggests the majority of news usage is in the mornings, when fresh routines are emerging, and last thing at night. Regular listeners of news updates say they like the brevity, the control and the focus.
Generating an audio bulletin
While traditional broadcasters have a range of products ready to plug into these slots, we had to think about how we might create a suitable package of content. There are services available to help adapt content for smart speakers, and of course it’s also possible to have someone record a regular update. While we think these are viable options, we were excited by the possibility of creating a new audio product for the Guardian through automation by combining rich audio and text-to-speech technology.
We began by looking at Morning Briefing content to leverage our journalism and curation, rather than simply grabbing the headlines. Through daily iteration, the team crafted a set of rules to structure that content by combining headlines with supporting text. This newly structured data was then inserted into an SSML template alongside rich audio and blended with music.
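To make the templating step concrete, here is a minimal sketch of how headline and supporting-text pairs might be rendered into SSML. It is not the Lab's actual template: the function name, the audio URL, the speaking rate and the pause length are all hypothetical, and the sketch plays one pre-recorded clip before the synthetic voice rather than blending music underneath it. It uses only standard SSML elements (speak, audio, p, s, prosody, break).

```python
from xml.sax.saxutils import escape, quoteattr


def build_briefing_ssml(items, intro_audio_url, rate="medium"):
    """Render (headline, supporting_text) pairs into one SSML string.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    parts = ["<speak>"]
    # Pre-recorded human intro; the inner text is spoken if the clip cannot be played.
    parts.append(
        f"<audio src={quoteattr(intro_audio_url)}>Guardian Briefing</audio>"
    )
    for headline, supporting_text in items:
        # One paragraph per story: the headline sentence, then its context.
        # escape() guards against characters such as & and < breaking the markup.
        parts.append(
            "<p>"
            f"<s><prosody rate={quoteattr(rate)}>{escape(headline)}</prosody></s>"
            f"<s><prosody rate={quoteattr(rate)}>{escape(supporting_text)}</prosody></s>"
            "</p>"
        )
        # Short pause between stories (the duration is an assumed value).
        parts.append('<break time="500ms"/>')
    parts.append("</speak>")
    return "".join(parts)


if __name__ == "__main__":
    stories = [
        ("Example headline", "One sentence of supporting context."),
        ("Second headline", "More context for the second story."),
    ]
    print(build_briefing_ssml(stories, "https://example.com/intro.mp3"))
```

Keeping the rules that pair headlines with supporting text separate from the rendering step means the voice settings can be tuned without touching the content logic.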
Tuning the template was an exercise in sound design, as our editorial lead tweaked the speed and prosody and experimented with variations of the Assistant’s voice, including the new WaveNet-based options.
While our research shows that prolonged interaction with a synthetic voice is taxing and less pleasant than listening to a human voice, harmonising the two creates a more congenial aesthetic. The Guardian Briefing attempts to utilise the best of rich audio and text-to-speech. Relying on automation and the synthetic voice has drawbacks in terms of quality control and aesthetics. Yet we were impressed by the speed and flexibility of the text-to-speech approach.
As this product is about filling slots created by new habits, we will be using retention as our key metric. How often do users come back? Do they continue to follow the Briefing over time?
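One common way to put a number on "how often do users come back" is day-N retention: the share of users who return exactly N days after their first session. The sketch below assumes that definition and a hypothetical log of (user_id, date) pairs; it is an illustration, not the measurement the Lab uses.

```python
from datetime import date, timedelta


def day_n_retention(sessions, n):
    """Fraction of users with a session exactly n days after their first one.

    `sessions` is an iterable of (user_id, date) pairs (an assumed log format).
    """
    days_by_user = {}
    for user_id, day in sessions:
        days_by_user.setdefault(user_id, set()).add(day)
    if not days_by_user:
        return 0.0
    returned = sum(
        1
        for days in days_by_user.values()
        # min(days) is the user's first session date.
        if min(days) + timedelta(days=n) in days
    )
    return returned / len(days_by_user)


if __name__ == "__main__":
    log = [
        ("a", date(2019, 1, 1)),
        ("a", date(2019, 1, 2)),
        ("b", date(2019, 1, 1)),
    ]
    print(day_n_retention(log, 1))
```

Comparing the figure at several values of N (for example 1, 7 and 28 days) gives a rough picture of whether listeners keep following the Briefing over time.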
In the coming weeks, we will be examining this data. We will also try to improve the content of the Briefing by adding localisation options for the US and Australia as well as exploring visual expressions through multimodal design.
Give it a try and let us know what you think.
Find out more about the Voice Lab’s mission or get in touch at firstname.lastname@example.org.