OpenAI Launches an API for ChatGPT and Its Whisper Speech-to-Text Tech



OpenAI has announced that third-party developers can now build ChatGPT into their apps and services through a newly available ChatGPT API, which the company says is substantially less expensive than using its existing language models. Alongside it, OpenAI is launching a hosted API for Whisper, its speech-to-text technology.
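As a rough illustration of how a developer might call the new ChatGPT API, the sketch below builds a request body only, without sending it. The `gpt-3.5-turbo` model name and the `/v1/chat/completions` endpoint come from OpenAI's announcement; the helper function itself is hypothetical.

```python
import json

# Endpoint and model name as announced by OpenAI; the request is only
# constructed here, not sent, so no API key is needed.
CHAT_ENDPOINT = "https://api.openai.com/v1/chat/completions"

def build_chat_request(user_message: str) -> str:
    """Build the JSON body for a hypothetical ChatGPT API call."""
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return json.dumps(payload)

body = build_chat_request("Summarize this article in one sentence.")
print(json.loads(body)["model"])  # -> gpt-3.5-turbo
```

An actual integration would POST this body to the endpoint with an `Authorization: Bearer <API key>` header.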

The Whisper API is a hosted version of the company’s open-source Whisper speech-to-text model, which launched in September 2022. Whisper is an automatic speech recognition system that OpenAI says can transcribe audio in several languages at scale, priced at $0.006 per minute. It accepts a variety of file types, including M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM.
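The pricing and format details above can be sketched as a simple pre-flight check. Only the $0.006-per-minute rate and the format list come from OpenAI's announcement; the helper functions are illustrative only.

```python
# Supported upload formats and per-minute price from OpenAI's Whisper API
# announcement; the helpers themselves are hypothetical.
SUPPORTED_FORMATS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}
PRICE_PER_MINUTE = 0.006  # USD

def is_supported(filename: str) -> bool:
    """Check whether a file's extension is one the Whisper API accepts."""
    return filename.rsplit(".", 1)[-1].lower() in SUPPORTED_FORMATS

def estimate_cost(duration_seconds: float) -> float:
    """Estimate transcription cost at $0.006 per minute of audio."""
    return round(duration_seconds / 60 * PRICE_PER_MINUTE, 4)

print(is_supported("meeting.mp3"))  # -> True
print(estimate_cost(3600))          # one hour of audio -> 0.36
```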

Although competitors such as Google, Amazon, and Meta have built high-quality speech recognition systems, OpenAI argues Whisper outperforms them because it was trained on 680,000 hours of multilingual and “multitask” data collected from the web. According to Greg Brockman, the president and chairman of OpenAI, this training improves its recognition of unique accents, background noise, and technical jargon.

“We released a model, but that actually was not enough to cause the whole developer ecosystem to build around it,” Brockman said in a video call with TechCrunch yesterday afternoon. “The Whisper API is the same large model that you can get open source, but we’ve optimized to the extreme. It’s much, much faster and extremely convenient.”

Brockman’s pitch notwithstanding, there are real barriers to businesses adopting voice transcription technology. A 2020 Statista survey cites accuracy, accent- or dialect-related recognition challenges, and cost as the primary obstacles to adopting technology such as speech-to-text.

One of Whisper’s limitations stems from its “next-word” prediction, a byproduct of the massive amount of data the model was trained on. OpenAI warns that Whisper’s transcriptions may include words that were never spoken, possibly because it is trying both to predict the next word in the audio and to transcribe the recording itself.

Furthermore, Whisper’s performance varies by language, with speakers of languages less well represented in the training data suffering a higher error rate.

Bias in speech recognition is an industry-wide problem: a 2020 Stanford study found that systems from Amazon, Apple, Google, IBM, and Microsoft made considerably fewer errors, roughly 19 percent fewer, with white users than with Black users.
