NIX Solutions: OpenAI Launches ChatGPT Voice Interface

OpenAI has begun rolling out Advanced Voice Mode, the voice interface for ChatGPT: a small number of ChatGPT Plus subscribers have been granted access to hyper-realistic conversations with GPT-4o. The company has promised that by autumn all paid subscribers will be able to use the new feature. We’ll keep you updated on the progress of this rollout.


OpenAI first demonstrated GPT-4o’s voice format in May. At the time, the feature impressed audiences not only with the speed of its answers but also with how closely one of its voices resembled that of Scarlett Johansson. The actress said she had refused CEO Sam Altman’s request to use her voice for this purpose and later turned to lawyers to protect her interests; to avoid a conflict, OpenAI abandoned its plans. In June, the company announced that it would postpone the release of the voice interface to complete work on safety measures.

Features and Limitations of the Alpha Testing Phase

Previously announced AI assistant features, such as video support and screen sharing, will not be available during the alpha testing stage; they will appear “later”. For now, users will have to limit themselves to voice interaction. Previously, OpenAI chained three AI models to deliver this feature: one converted speech to text, the second (GPT-4) processed the request itself, and the third converted ChatGPT’s text response back to speech. The updated GPT-4o is multimodal and handles all of these tasks on its own, keeping latency to a minimum. The model can also recognize emotional intonation in the user’s voice, detecting sadness or excitement, for example, and it can tell when a person is singing, adds NIX Solutions.
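To make the latency point concrete, here is a minimal sketch of what the earlier three-stage pipeline looks like when assembled from the public openai Python SDK: each stage is a separate model call and network round trip, which is exactly the overhead a single multimodal model avoids. The model names, the “alloy” voice, and the helper function are illustrative assumptions, not OpenAI’s actual Advanced Voice Mode implementation.

```python
# A rough sketch of a three-model voice turn (speech-to-text -> text LLM
# -> text-to-speech), assuming the openai Python SDK v1.x and an
# OPENAI_API_KEY in the environment. Illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def legacy_voice_turn(audio_path: str) -> bytes:
    # 1) Transcribe the user's speech to text.
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # 2) Generate a text reply with a text-only chat model.
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )

    # 3) Convert the text reply back to speech.
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=reply.choices[0].message.content,
    )
    return speech.read()  # raw audio bytes (MP3 by default)
```

In a setup like this, every reply waits on three sequential requests, and any emotional or musical cues in the original audio are lost at the transcription step, which is the gap a natively multimodal model such as GPT-4o is meant to close.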

OpenAI will roll out the ChatGPT voice interface gradually so it can closely monitor real-world use. Users included in the alpha testing group will receive a notification in the ChatGPT app, followed by an email with instructions on how to use the new features. GPT-4o’s voice capabilities have already been tested by more than a hundred external red-team members speaking 45 languages.

ChatGPT’s voice mode will be limited to four voices created with the help of voice actors: Juniper, Breeze, Cove, and Ember. The company has excluded the Sky voice, which had been compared to Scarlett Johansson’s. OpenAI also said it has added filters to block requests to generate music and other material that may be protected by copyright; similar output has already led to lawsuits from major music publishers against the startups Suno and Udio.