Running Python on a serverless GPU instance for machine learning inference

I was experimenting with some speech-to-text work using OpenAI’s Whisper models today, and transcribing a 15-minute audio file with the Whisper tiny model on AWS Lambda (3 vCPUs) took 120 seconds. I was curious how much faster this could be if I ran the same transcription on a GPU instance, and with a quick search, modal.com seemed like a nice option: spin up a GPU machine, run the code, and shut down the machine, similar to how AWS Lambda works. ...
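For context, a minimal sketch of that spin-up/run/shut-down pattern with Modal’s Python SDK might look like the following. The app name, GPU type, and file paths are placeholders, and this is an assumption about the setup rather than the post’s actual code:

```python
import modal

# Hypothetical app name; Modal tears the container down after the call returns.
app = modal.App("whisper-gpu-demo")

image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg")           # whisper needs ffmpeg to decode audio
    .pip_install("openai-whisper")
)

@app.function(gpu="T4", image=image)  # GPU type is illustrative
def transcribe(audio_bytes: bytes) -> str:
    import tempfile
    import whisper

    model = whisper.load_model("tiny")  # runs on the GPU when one is available
    with tempfile.NamedTemporaryFile(suffix=".mp3") as f:
        f.write(audio_bytes)
        f.flush()
        result = model.transcribe(f.name)
    return result["text"]

@app.local_entrypoint()
def main():
    # `modal run this_file.py` uploads the audio, runs remotely, prints the text
    with open("audio.mp3", "rb") as f:  # placeholder input file
        print(transcribe.remote(f.read()))
```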

2024-04-22 · 5 min · Saeed Esmaili

Exploring OpenAI's Whisper with Non-English Voices

TL;DR: Whisper.cpp is the fastest option when you’re running the large Whisper model on a Mac. For top-quality results with languages other than English, I recommend asking the model to translate into English. About Whisper: Whisper is OpenAI’s speech-to-text model, and it’s well known for its impressive results. Although I had known about it for a while, I didn’t get to test its real-world performance until recently. So I spent a weekend seeing how it handles converting speech, in both English and other languages, into text. ...
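As a rough illustration of that recommendation, the reference openai-whisper Python package (rather than Whisper.cpp) exposes translation via `task="translate"`; the file name and model size below are placeholders:

```python
import whisper

# The large model gives the best quality; smaller checkpoints trade accuracy for speed.
model = whisper.load_model("large")

# task="translate" makes Whisper emit English text regardless of the
# language spoken in the input audio.
result = model.transcribe("non_english_speech.mp3", task="translate")
print(result["text"])
```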

2023-06-21 · 6 min · Saeed Esmaili