OpenAI extends ChatGPT’s capabilities to include voice and images
OpenAI unveils new voice and visual features for ChatGPT, offering users a more intuitive interface. These improvements make it possible to start a voice conversation or show ChatGPT directly what you’re talking about.
Voice and image, two new dimensions for ChatGPT
These features open up new ways to use ChatGPT day to day. When traveling, you can take a photo of a monument and have a real-time conversation about its history or significance. At home, a photo of your fridge can help you decide what to make for dinner and get recipe suggestions. To help a child with a math problem, you can take a photo, highlight the problem, and have ChatGPT provide hints.
Voice and image integration for ChatGPT Plus and Enterprise users
OpenAI plans to roll out these voice and image capabilities to Plus and Enterprise users in the coming weeks. Voice will be available on iOS and Android (it must be activated in settings), while image input will be available on all platforms.
Start voice conversations with ChatGPT
ChatGPT’s new voice capability, powered by a new text-to-speech model, enables back-and-forth spoken dialogue with the assistant. To activate it, users go to the mobile application settings and enable voice conversations. The text-to-speech model can generate realistic audio from text alone and a few seconds of sample speech. Whisper, OpenAI’s open-source speech recognition system, transcribes the user’s speech into text.
Discover the world with ChatGPT’s visual aid
Users can now show ChatGPT one or more images for a variety of purposes, such as troubleshooting a device, exploring the contents of their fridge, or analyzing a complex graph of business data. To get started, press the photo button to capture or select an image. Image understanding is made possible by OpenAI’s multimodal GPT-3.5 and GPT-4 models, which apply their language skills to a wide range of images, such as photographs, screenshots, and documents containing both text and images.
Gradual deployment with a focus on safety
OpenAI aims to build artificial general intelligence (AGI) that is both safe and beneficial. The company believes in rolling out its tools gradually, which lets it improve and refine its risk mitigation measures over time. This strategy is all the more important for advanced models that integrate voice and vision.
The challenges of new voice technology
While voice technology offers many creative and accessibility-oriented opportunities, it also presents new risks, such as the possibility for malicious actors to impersonate public figures or commit fraud. To limit this risk, the technology is restricted to a specific use case: voice chat. Partnerships are also being established to use the technology in other areas; Spotify, for example, is using it for its voice translation feature.
ChatGPT vision: useful and secure
Like other ChatGPT features, vision is designed to assist users in their daily lives. Technical measures have also been taken to limit ChatGPT’s ability to analyze and make direct statements about individuals, as ChatGPT is not always accurate and such systems must respect individual privacy.
Transparency regarding the limits of the model
Users may rely on ChatGPT for specialized topics, such as research. OpenAI is transparent about the limits of the model and discourages high-risk use cases without proper verification. The model transcribes English text competently but performs poorly with some other languages, especially those written in non-Roman scripts. OpenAI therefore advises its non-English-speaking users not to use ChatGPT for this purpose.
GPT Génie adapts its system
GPT Génie plans to adapt its system so that subscribers can benefit from the latest innovations as soon as the technology is open to developers.