OpenAI’s ChatGPT: Now Listens, Speaks and Responds to Images

Sep 25, 2023

OpenAI’s ChatGPT, a popular AI chatbot, has now learned to converse using spoken language, much like Siri and Alexa, marking a significant leap in AI communication.

With the update, users can speak to ChatGPT and hear it reply aloud, making the chatbot more accessible and versatile. OpenAI, the San Francisco-based AI start-up behind the chatbot, announced the new version recently.

In another first, ChatGPT can now respond to images. For instance, users can upload a photo of the interior of their refrigerator, and the chatbot will suggest dishes that could be made with the ingredients on hand. The feature is intended to make ChatGPT more practical for everyday tasks.
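The consumer app handles the photo upload behind the scenes, but a rough sense of that kind of image-plus-text request can be sketched with OpenAI's developer API. The snippet below is a minimal illustration only, not the app's actual implementation; the model name ("gpt-4-vision-preview") and the image URL are placeholders, and it assumes the caller has access to a vision-capable model.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask a vision-capable model what to cook from a photo of a fridge's contents.
# The model name and image URL below are illustrative assumptions.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Here is the inside of my fridge. What could I make for dinner?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/fridge.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)  # e.g. a few suggested dishes
```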

OpenAI has been rapidly expanding its AI tools. It recently unveiled a new version of its DALL-E image generator and is incorporating it into ChatGPT. Since its launch in November 2022, ChatGPT has drawn hundreds of millions of users and inspired similar services from other companies.

The new release pushes ChatGPT ahead of rivals such as Google Bard in features while also challenging long-standing voice assistants like Alexa and Siri. Those assistants have traditionally let people interact with devices by voice, but newer chatbots like ChatGPT and Bard have far stronger language skills, enabling them to draft emails, write poetry, and discuss almost any topic on the fly.

OpenAI’s latest offering effectively merges the two approaches: the voice interface of a digital assistant and the broad language skills of a chatbot. The company sees speaking as a more intuitive way to interact with its chatbot and claims that ChatGPT’s synthetic voices, available in five options, outshine those used with popular digital assistants.

The new features will be available to all subscribers of ChatGPT Plus, a $20-a-month service, within the next two weeks. However, the bot can respond with voice only in the mobile apps for iPhones, iPads, and Android devices.

While ChatGPT’s voice interface may remind users of earlier assistants, the technology powering it is fundamentally different. It is driven by a large language model (LLM), which learned to generate humanlike text by analyzing vast amounts of text from across the internet.
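One way to picture how a voice interface can sit on top of an LLM is the rough pipeline below: transcribe the spoken question, send the text to a chat model, then synthesize the reply as audio. This is a hedged sketch using OpenAI's public developer API, not a description of the ChatGPT app's internals; the model names ("whisper-1", "gpt-4", "tts-1"), the voice ("alloy"), and the file names are all assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech to text: transcribe the user's spoken question.
with open("question.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# 2. Text to text: let the language model compose an answer.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = chat.choices[0].message.content

# 3. Text to speech: read the answer back in a synthetic voice.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
speech.stream_to_file("answer.mp3")
```

The point of the sketch is the division of labor: speech recognition and speech synthesis wrap around the language model, which does the open-ended reasoning that scripted assistants cannot.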

ChatGPT can respond to virtually any question in seconds, in contrast with older digital assistants like Alexa and Siri, which could perform only a limited set of tasks or answer a finite list of programmed questions.

As OpenAI evolves ChatGPT into something akin to Alexa or Siri, companies like Amazon and Apple are transforming their digital assistants to resemble ChatGPT.

Amazon recently previewed an updated Alexa system aiming for more fluid conversation on “any topic,” partly driven by a new LLM. Meanwhile, Apple has been testing a prototype of its LLM for future products, according to insiders.

The new ChatGPT can also respond to images when used on the web as well as in the iPhone, iPad, and Android apps. The feature could be particularly useful for visually impaired users.

OpenAI first demonstrated the image tool in the spring but delayed its public release until it better understood the potential for misuse. There were concerns, for instance, that it could be used as a face recognition service to quickly identify people in photos.

Despite these strides, the bot still has room for improvement. Its voice recognition can stumble over homonyms, for instance, though it is able to correct itself when a user points out the mistake.

In short, the latest version of ChatGPT is a notable step toward more natural interaction with AI, adding voice and image capabilities to an already versatile chatbot. As the technology evolves, it will be worth watching how giants like Amazon and Apple respond.