OpenAI’s transcription tool Whisper is under scrutiny for producing hallucinations and inaccuracies, raising concerns among industry experts, particularly in medical settings.
OpenAI’s artificial intelligence transcription tool, Whisper, has drawn growing criticism over its inaccuracies. While OpenAI has promoted Whisper as possessing nearly “human-level robustness and accuracy”, experts in the field have reported significant problems with the tool.
Whisper has been found to produce hallucinations: fabricated text with no basis in the underlying audio, ranging from racial commentary to erroneous medical information. This is particularly concerning because Whisper is used extensively across sectors worldwide to translate interviews, transcribe audio to text, and generate subtitles for video content.
Particularly alarming is the adoption of Whisper-based systems in medical settings, where some health centres are using the tool to transcribe doctor-patient consultations. OpenAI itself has cautioned against deploying Whisper in “high-risk domains”, including medicine, because of its potential to generate inaccurate output.
Despite these issues, Whisper is applied widely. It has been integrated into OpenAI’s popular ChatGPT bot and into major cloud computing platforms from Oracle and Microsoft, reaching myriad companies worldwide. In the past month alone, Whisper was downloaded more than 4.2 million times from HuggingFace, a well-known open-source AI platform.
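That accessibility is part of the story: only a few lines of code are needed to pull Whisper from HuggingFace and start transcribing. The sketch below is illustrative only, assuming the openai/whisper-base checkpoint and a hypothetical audio file; it is not the configuration used by any company named in this article.

```python
# Illustrative sketch: transcribing audio with an open-source Whisper
# checkpoint from HuggingFace. The file name "consultation.wav" and the
# "openai/whisper-base" model choice are hypothetical examples.
from transformers import pipeline

# Build an automatic-speech-recognition pipeline; the model weights are
# downloaded from HuggingFace on first use.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")

# Transcribe a local audio file; the pipeline returns a dict whose
# "text" field holds the transcript, which may include hallucinated
# passages on noisy or silent stretches of audio.
result = asr("consultation.wav")
print(result["text"])
```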
A study led by a University of Michigan researcher underscored the extent of the hallucination issue, finding fabrications in eight of every ten transcriptions examined. A machine learning engineer reported similar results, noting hallucinations in about half of more than 100 hours of transcriptions he analyzed. Another developer found hallucinated content in nearly all of the 26,000 transcripts he had created.
Additional research by Professors Allison Koenecke and Mona Sloane examined thousands of audio snippets and found nearly 40% of the hallucinations potentially harmful, given the risk of misinterpreting or misrepresenting the speaker. Examples included transcriptions that turned benign statements into violent narratives or added racial descriptors that were never spoken.
Whisper’s integration into critical services has sparked calls from experts and advocates for more stringent AI regulation, with suggestions for OpenAI to prioritize the resolution of this flaw. William Saunders, a former OpenAI engineer, expressed concern over the company’s direction and stressed the urgency of addressing the overconfidence in Whisper’s capabilities.
In the healthcare sector, the use of Whisper-based tools has raised particular concern. Providers such as the Mankato Clinic and Children’s Hospital Los Angeles in the USA have implemented systems derived from Whisper to transcribe medical consultations. Nabla, the company behind these tools, has adapted Whisper for medical terminology but acknowledges the technology’s limitations and says it is working to mitigate hallucinations.
Nabla’s practice of deleting audio recordings after transcription, done for data-safety reasons, has also been questioned: without the original recordings, medical professionals cannot cross-check the generated transcripts against what was actually said.
Privacy concerns extend beyond transcription accuracy to the sharing of sensitive patient data with external vendors such as Microsoft Azure, an issue raised by California lawmaker Rebecca Bauer-Kahan, who warned of the risks of handing personal medical information to for-profit entities without adequate oversight.
While OpenAI continues working to improve Whisper, incorporating such findings into model updates, the debate over AI’s place in sensitive fields like healthcare, and over the ethics and regulation of its deployment, continues to intensify.
Source: Noah Wire Services