New version of ChatGPT with voice and image

ChatGPT now includes voice and visual recognition - find out how this changes everything!

Home » News » Open AI - ChatGPT » New version of ChatGPT with voice and image

OpenAI extends ChatGPT’s capabilities to include voice and video

OpenAI unveils new voice and visual features for ChatGPT, offering users a more intuitive interface. These improvements make it possible to start a voice conversation or show ChatGPT directly what you’re talking about.

Voice and image, two new dimensions for ChatGPT

These new features open up a host of new ways to use ChatGPT on a daily basis. For example, when traveling, you can take a photo of a monument and have a real-time conversation about its history or significance. Back at home, a photograph of your fridge can help you define your dinner menu, while getting recipe suggestions. Even to help a child with a math problem, just take a photo, highlight the problem, and ChatGPT will provide tips.

Voice and image integration for ChatGPT Plus and Enterprise users

OpenAI plans to roll out these new voice and visual capabilities for Plus and Enterprise users in the coming weeks. Voice functionality will be available on iOS and Android (activation required in settings), while images will be accessible on all platforms.

Start voice conversations with ChatGPT

ChatGPT’s new voice capability, powered by an innovative text-to-speech model, enables smooth dialogues with the assistant. To activate this feature, users can go to the mobile application settings and enable voice conversations. The text-to-speech model is capable of generating realistic audio from simple texts and a few seconds of voice samples. Whisper, OpenAI’s open-source speech recognition system, is used to transcribe speech into text.

Discover the world with ChatGPT’s visual aid

Users can now show one or more images to ChatGPT for a variety of uses, whether to troubleshoot a device, explore the contents of their fridge or analyze a complex graph for business data. To get started, simply press the photo button to capture or select an image. This understanding of images is made possible by OpenAI’s GPT-3.5 and GPT-4 multimodal models, which apply their linguistic skills to a wide range of images, such as photographs, screenshots and documents containing both text and images.

Gradual deployment with a focus on safety

OpenAI aims to build General Artificial Intelligence (GAI) that is both safe and beneficial. They believe in a progressive roll-out of their tools, enabling them to perfect and refine risk mitigation measures over time. This strategy is all the more crucial with advanced models integrating voice and vision.

The challenges of new voice technology

While voice technology offers many creative and accessibility-oriented opportunities, it also presents new challenges, such as the possibility for malicious actors to impersonate public figures or commit fraud. To limit this risk, the technology is used for a specific use case: voice chat. Partnerships are also being established to use this technology in other areas, such as with Spotify for their voice translation function.

ChatGPT vision: useful and secure

Like other ChatGPT features, vision is designed to assist users in their daily lives. Technical measures have also been taken to limit ChatGPT’s ability to analyze and make direct statements about individuals, as ChatGPT is not always accurate and such systems must respect individual privacy.

Transparency regarding the limits of the model

Users may rely on ChatGPT for specialized topics, such as research. OpenAI is transparent about the limits of the model and discourages high-risk use cases without proper verification. The model is competent at transcribing English text, but has shortcomings with some other languages, especially those with a non-Roman script. OpenAI therefore advises its non-English-speaking users not to use ChatGPT for this purpose.

GPT Génie adapts its system

GPT Génie plans to adapt its system so that subscribers can benefit from the latest innovations as soon as the technology is open to developers.

0 Commentaires

Newest

Oldest Most Voted

Inline Feedbacks

View all comments

New version of ChatGPT with voice and image

OpenAI extends ChatGPT’s capabilities to include voice and video

Voice and image, two new dimensions for ChatGPT

Voice and image integration for ChatGPT Plus and Enterprise users

Start voice conversations with ChatGPT

Discover the world with ChatGPT’s visual aid

Gradual deployment with a focus on safety

The challenges of new voice technology

ChatGPT vision: useful and secure

Transparency regarding the limits of the model

GPT Génie adapts its system

Dans cet article

Stay informed!

About us

Affiliation

AI tool categories

Latest AI tools

Dall-E Prompts Generator

Optimizing your CV with AI

Writing a PAS Landing Page

Writing an AIDA Landing Page

Create a Facebook ad

Anti-waste recipe

Feature/benefit converter

Latest news

GPT-4o free or paid?

How is Chatgpt powered?

What can you do with ChatGPT?

6 Tips for Getting Started with Dall-E

How does Dall-E work?

What are the differences between AI and machine learning?

New version of ChatGPT with voice and image

How do I access ChatGPT-4?

What are the ethical issues raised by AI?