OpenAI has published an update on its Voice Engine, a text-to-speech (TTS) model capable of generating human-like audio from text using a 15-second sample of speech. The Voice Engine employs a diffusion process to match the speaker's voice, starting with random noise. This technology, which was demonstrated to global policymakers last summer, has raised concerns about safety and potential misuse, particularly related to the Sky voice. OpenAI's existing collection of TTS voices, built using professional voice actors, also uses a 15-second sample to define each voice. The company has been addressing these concerns through ongoing safety research. OpenAI had a voice engine in 2022 and has been transparent about its development.
OpenAI seems awfully defensive about its AI voice engine https://t.co/qsempjVxQ4
OpenAI: "starting last summer we showed global policymakers at the highest levels [Voice Engine's] potential and discussed the associated risks with them." Voice Engine is not GPT-4o Voice. It's a TTS model for generating human-like audio using text and 15 sec of sample speech.
New OpenAI Voice Engine Blog Some highlights: - "The voice capability is powered by a text-to-speech (TTS) model, capable of generating human-like audio from just text and 15-seconds of sample speech." - "it employs a diffusion process, starting with random noise and… https://t.co/jnDJ8B8iaf
OpenAI had voice engine in 2022, here's their statement on it https://t.co/uXcZlW7qDd https://t.co/SPanwANvAW
Surprising detail in this new article about OpenAI's Voice Engine: apparently their existing collection of TTS voices, built using professional voice actors, still only use a 15 second sample from each of those professionals to define each voice! https://t.co/xXpykjVFRB
🔥 Unleash the power of #LargeLanguageModels to transform text into spoken magic! 🎙️ #NaturalLanguageProcessing #SpeechRecognition #AI🧵👇 https://t.co/73aDgIN0r1
OpenAI published an update on how Voice Engine works and their safety research, likely to address concerns related to the Sky voice - The Voice Engine generates human-like audio from text using a 15-second sample of speech and employs a diffusion process to match the speaker's… https://t.co/wm1SvOSTYK https://t.co/FiBaZa4NfJ
The, Um, Psychology of, Like, AI-Generated Voices; Effective Altruists Strike Back at OpenAI — The Information https://t.co/vo4tWP57PC