OpenAI has started rolling out the vision feature it announced more than six months ago, enabling ChatGPT to analyze video from a camera or screen in real time. The new capability lets users interact with the chatbot in Advanced Voice Mode by showing it their surroundings through their smartphone camera or by sharing their computer screen.
Currently, the vision feature is available only to ChatGPT Plus, Team, and Pro subscribers, but it’s not yet accessible in all countries. According to OpenAI, “All Team users and most Plus and Pro users will get access over the next week in a new version of the ChatGPT mobile app. We will make this feature available to Plus and Pro users in the EU, Switzerland, Iceland, Norway, and Liechtenstein as soon as possible. Enterprise and Edu users will get access in early 2025.”
How the Vision Feature Works
To activate the vision feature, users open ChatGPT’s voice mode by tapping the voice icon at the bottom of the app interface, then tap the camera button to start streaming video. The AI analyzes the live visual input and answers user queries based on what it sees.
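OpenAI has not exposed this live-video feature through a public API, but the underlying pattern, capturing a camera frame and asking a vision-capable model about it, can be approximated with tools that do exist. Below is a minimal, illustrative Python sketch assuming the official openai SDK (v1.x), opencv-python, and an OPENAI_API_KEY environment variable; the describe_frame helper is a hypothetical name for this example, not part of OpenAI’s feature.

```python
import base64

import cv2                 # pip install opencv-python
from openai import OpenAI  # pip install openai (v1.x SDK)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_frame(jpeg_bytes: bytes, question: str) -> str:
    """Send a single JPEG frame to a vision-capable model and return its answer.

    Hypothetical helper for illustration; approximates per-frame analysis,
    not OpenAI's in-app live-video feature.
    """
    b64 = base64.b64encode(jpeg_bytes).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model could be substituted
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Grab one frame from the default camera and ask about it.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if ok:
    encoded, buf = cv2.imencode(".jpg", frame)
    if encoded:
        print(describe_frame(buf.tobytes(), "What do you see in this image?"))
```

A real-time integration would repeat this capture-and-ask step on a loop or timer, which is roughly how an app could approximate continuous visual analysis frame by frame.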
In a demonstration shared on X (formerly Twitter), the development team showcased the feature with a simple task. A developer used the camera to show ChatGPT his colleagues, who introduced themselves one by one. The chatbot was then asked to identify the colleague wearing reindeer antlers and the one in a Santa hat, and it completed the task successfully.
Competitors and Future Updates
OpenAI isn’t alone in exploring real-time video analysis for AI. Google recently introduced Project Astra, a similar feature, to a group of testers, while Meta is reportedly working on comparable functionality for its AI tools.
As OpenAI continues to expand the availability of this feature, we’ll keep you updated on new integrations and regional rollouts. Stay tuned for further updates on this groundbreaking technology.