
OpenAI announces DALL-E 3 API, Audio API, and Whisper large-v3

Last Updated : 08 Nov, 2023

OpenAI recently hosted its first developer day and unveiled a range of new APIs and models. This article covers three of them: the DALL-E 3 API for text-to-image generation, the text-to-speech Audio API, and the improved Whisper large-v3 speech recognition model.


What is DALL-E 3 API?

Bridging the Gap Between Text and Images

OpenAI’s DALL-E 3, a remarkable text-to-image model, is now available through an API. This significant development follows its initial integration with ChatGPT and Bing Chat. The DALL-E 3 API is designed to streamline the process of generating images from text input. It offers various format and quality options, with resolutions ranging from 1024×1024 to 1792×1024. The pricing starts at just $0.04 per generated image.
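As a rough illustration of what a generation request might look like, the sketch below builds a JSON payload for the images endpoint using only the Python standard library. The endpoint URL, the parameter names (`model`, `prompt`, `n`, `size`, `quality`), the portrait size `1024x1792`, and the `OPENAI_API_KEY` environment variable are assumptions taken from OpenAI's public API reference rather than from this article, so check them against the current documentation before relying on them.

```python
import json
import os
import urllib.request

# Assumed endpoint and option values; verify against OpenAI's API reference.
IMAGES_URL = "https://api.openai.com/v1/images/generations"
ALLOWED_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
ALLOWED_QUALITIES = {"standard", "hd"}


def build_image_request(prompt, size="1024x1024", quality="standard"):
    """Return the JSON payload for a single DALL-E 3 image request."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    return {
        "model": "dall-e-3",
        "prompt": prompt,
        "n": 1,  # one image per request in this sketch
        "size": size,
        "quality": quality,
    }


def generate_image(prompt, **options):
    """Send the request. Needs network access and an API key in OPENAI_API_KEY."""
    payload = build_image_request(prompt, **options)
    request = urllib.request.Request(
        IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Validating the size and quality locally means a typo fails before a billable request is sent. A call such as `generate_image("a red bicycle", size="1792x1024")` would then return the parsed JSON response.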

Comparing DALL-E 3 and DALL-E 2

DALL-E 3 has certain limitations compared to its predecessor, DALL-E 2. It cannot create edited versions of an image by replacing specific areas of an existing image, and it cannot generate variations of an image. In addition, OpenAI notes that it may automatically rewrite a generation request sent to DALL-E 3 “for safety reasons” and “to add more detail.” Depending on the input prompt, this rewriting could make the result less precise.
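Because the prompt may be rewritten before the image is generated, it can help to log what the model actually received. The sketch below assumes the response JSON contains a `data` list whose items carry a `revised_prompt` field, as described in OpenAI's API reference. That field name and response shape are assumptions here, not something this article confirms, and `revised_prompts` is a hypothetical helper name.

```python
def revised_prompts(original_prompt, response):
    """Return the rewritten prompts that differ from the original.

    `response` is the parsed JSON body of an image generation call.
    The `data` / `revised_prompt` layout is an assumption.
    """
    changes = []
    for item in response.get("data", []):
        revised = item.get("revised_prompt")
        if revised and revised.strip() != original_prompt.strip():
            changes.append(revised)
    return changes


# Example with a hand-written response in the assumed shape:
sample = {
    "data": [
        {
            "url": "https://example.invalid/image.png",
            "revised_prompt": "A red bicycle leaning against a brick wall",
        }
    ]
}
for rewritten in revised_prompts("a red bicycle", sample):
    print("Prompt was rewritten to:", rewritten)
```

Comparing the two strings makes it easier to spot cases where the added detail changed the intent of the request.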

Features of the DALL-E 3 API

The Power of Moderation

One of the standout features of the DALL-E 3 API is its built-in moderation system, a critical step in preventing misuse. OpenAI has taken lessons from its previous version, DALL-E 2, to ensure that the technology is used responsibly and ethically.

Audio API: Transforming Text into Natural Speech

OpenAI’s Audio API is a text-to-speech API that offers six preset voices: Alloy, Echo, Fable, Onyx, Nova, and Shimmer. It also provides two model variants, tts-1 and tts-1-hd. With a starting price of $0.015 per 1,000 characters, it is a cost-effective option for developers.
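The per-character pricing makes cost easy to estimate before sending any text. The following sketch computes an estimate from the $0.015 per 1,000 characters starting rate quoted above and builds a request payload for one of the six voices. The payload field names (`model`, `input`, `voice`), the lowercase voice identifiers, and the model name `tts-1` are assumptions based on OpenAI's API reference, and actual billing may round differently.

```python
# Lowercase voice identifiers are an assumption about the API's expected values.
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
PRICE_PER_1K_CHARS = 0.015  # USD, starting rate quoted in the article


def estimate_tts_cost(text, price_per_1k=PRICE_PER_1K_CHARS):
    """Estimate the cost in USD of converting `text` to speech."""
    return len(text) / 1000 * price_per_1k


def build_speech_request(text, voice="alloy", model="tts-1"):
    """Return a JSON payload for a text-to-speech request (assumed field names)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    return {"model": model, "input": text, "voice": voice}


script = "Welcome back! Today we will practice ordering food in Spanish."
print(f"{len(script)} characters, about ${estimate_tts_cost(script):.5f}")
print(build_speech_request(script, voice="nova"))
```

Passing the rate as a parameter keeps the estimate usable if a higher-priced model variant is chosen.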

A Leap Toward Natural Interactions

OpenAI’s Sam Altman highlighted the naturalness of the generated audio, which can greatly enhance user interactions with applications. This API unlocks various use cases, such as language learning and voice assistance, by making interactions more natural and accessible.

Emotional Affect Limitations

While the Audio API brings substantial benefits, it’s important to note that OpenAI does not offer explicit control over the emotional affect of the generated audio. The company acknowledges that “certain factors” may influence how the voices sound, such as capitalization or grammar in the text being read aloud. OpenAI’s internal tests have yielded “mixed results” in this area.

Responsible Usage

OpenAI places great importance on responsible AI usage. Developers using the Audio API are required to inform users that the audio is generated by AI. This transparency is a crucial step toward ethical and informed use of the technology.

Whisper large-v3: Improved Speech Recognition

In a related announcement, OpenAI released the latest version of its open-source automatic speech recognition model, Whisper large-v3. The new version is reported to deliver improved performance across languages and is available on GitHub under a permissive license, which makes it a practical option for applications that rely on accurate speech recognition.
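For readers who want to try the model locally, here is a minimal sketch that assumes the `openai-whisper` Python package from the GitHub repository is installed, along with its `ffmpeg` dependency. The calls `whisper.load_model` and `model.transcribe`, the model name `"large-v3"`, and the `segments` layout of the result are taken from that package's README. The helper `format_segments` and the file name `interview.mp3` are hypothetical.

```python
def format_segments(result):
    """Render a transcription result as one timestamped line per segment."""
    lines = []
    for segment in result.get("segments", []):
        start, end = segment["start"], segment["end"]
        lines.append(f"[{start:.2f}s - {end:.2f}s] {segment['text'].strip()}")
    return "\n".join(lines)


def transcribe(path, model_name="large-v3"):
    """Transcribe an audio file with a locally loaded Whisper model."""
    import whisper  # imported lazily so the helper above works without the package

    model = whisper.load_model(model_name)  # downloads the weights on first use
    return model.transcribe(path)


# Usage (requires the package, ffmpeg, and enough memory for the large model):
# result = transcribe("interview.mp3")
# print(format_segments(result))
```

The large model is heavy, so a smaller model name can be passed as `model_name` for quick experiments on modest hardware.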

Conclusion

OpenAI’s developer day brought forth a host of exciting advancements in the world of AI and machine learning. The DALL-E 3 API, the Audio API, and the Whisper large-v3 model each offer unique capabilities and possibilities, shaping the future of AI-driven applications. As developers and users, it’s essential to embrace these innovations responsibly while exploring their potential for enhancing user experiences and interactions.

