Meta engineers have presented Voicebox, a neural network model for generating and editing spoken audio. The model could be applied in a range of fields, from voice synthesis to cleaning up recordings.
Speech generation and editing:
Voicebox can read a given text aloud in a natural-sounding, high-quality voice. It can also edit an existing voice recording, removing extraneous sounds such as car horns or barking dogs while preserving the content and style of the speech.
Removing extraneous sounds and preserving style:
Voicebox can isolate a voice from ambient noise, which improves the intelligibility of recordings. The speaker's original manner and style are retained, so the cleaned recording still sounds like the same person.
Spot corrections in recordings:
Where needed, Voicebox can regenerate a single fragment of a recording to correct a mispronounced word or phrase, so the whole recording does not have to be made again.
Simultaneous translation with voice preservation:
Voicebox could be used for simultaneous interpretation, rendering translated speech in the voice and manner of the original speaker. This is especially useful in metaverse applications, where natural-sounding voices for virtual assistants and NPCs play an important role.
Model training and potential applications:
The Voicebox model was trained on more than 50,000 hours of audiobook recordings. It can build a voice and speech profile from a sample just two seconds long and then reproduce that voice reading any text. This could be useful in several scenarios, including natural-sounding voices for virtual assistants and non-player characters, and reading written messages to visually impaired people in the voices of their authors.
Restrictions and privacy protection:
Meta has not released the Voicebox model or its code, and has not offered public testing of the technology. The company cites concerns about possible misuse and privacy violations.
Meta's Voicebox model is a significant step in speech processing, with potential uses ranging from virtual assistants to aids for the visually impaired, concludes NIX Solutions. It opens up new prospects for artificial intelligence in speech technology.