OpenAI has expanded the capabilities of ChatGPT, introducing support for both voice and image inputs alongside traditional text prompts. Commercial users can expect these new features to roll out in the next two weeks, while others will have access at a later date.
Voice Interaction with ChatGPT
ChatGPT now accommodates voice interactions, akin to conversations with standard voice assistants. OpenAI says response quality is substantially better thanks to advances in the underlying technology. The flow is simple: the user presses a button and asks a question aloud; ChatGPT converts the speech into text, processes it with a large language model to produce an answer, converts that answer back into speech, and delivers it audibly.
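The voice flow described above can be pictured as three stages chained together. The following Python sketch is purely illustrative: the names `run_voice_turn`, `VoiceTurn`, and the three `fake_*` helpers are hypothetical stand-ins invented for this example, not OpenAI's API or implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """Everything produced during one spoken question-and-answer exchange."""
    transcript: str     # what the speech-to-text stage heard
    reply_text: str     # what the language model answered
    reply_audio: bytes  # the answer rendered back to audio


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Chain the three stages: speech -> text -> answer -> speech."""
    transcript = transcribe(audio)
    reply_text = generate_reply(transcript)
    reply_audio = synthesize(reply_text)
    return VoiceTurn(transcript, reply_text, reply_audio)


# Hypothetical stand-ins so the sketch runs without any external service.
# A real system would call a speech recognizer, a language model, and a
# text-to-speech model at these three points.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate_reply(prompt: str) -> str:
    return f"You asked: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    turn = run_voice_turn(
        b"What is the weather?",
        fake_transcribe,
        fake_generate_reply,
        fake_synthesize,
    )
    print(turn.transcript)
    print(turn.reply_text)
```

Passing the stages in as functions keeps the sketch independent of any particular speech or language model; each stand-in could be swapped for a real service without changing the chain itself.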
The voice features rely on two models. Whisper, OpenAI’s speech recognition model, transcribes the user’s speech into text, while a separate text-to-speech model turns the reply back into audio. OpenAI asserts that this text-to-speech model can generate a human-like voice from text input and a speech sample only a few seconds long. ChatGPT currently offers five voice options, and OpenAI foresees considerable potential for this technology.
However, OpenAI acknowledges the associated risks, such as the potential for cybercriminals to impersonate public figures or commit fraud using synthesized voices. As a result, OpenAI plans to limit the model to specific use cases and partnerships to mitigate these concerns.
Image-Based Requests in ChatGPT
To start a conversation with ChatGPT using images, users can take a photo or create a visual representation of their query and send it to the chatbot. During the interaction, they can use text or voice prompts to elaborate on the request or narrow its scope.
Using images as prompts raises its own concerns, particularly around requests to identify people in photos. OpenAI addresses this by limiting ChatGPT’s ability to analyze and make definitive statements about people, prioritizing accuracy and privacy.
OpenAI’s Ongoing Evolution
Nearly a year after ChatGPT’s initial launch, OpenAI continues to expand the chatbot’s capabilities while striving to address the associated challenges and limitations, notes NIX Solutions. The deliberate restraint in the capabilities of new AI models aligns with OpenAI’s commitment to responsible AI development. However, as voice and image interactions gain popularity and ChatGPT evolves into a versatile virtual assistant, the task of maintaining ethical AI usage becomes increasingly complex.