Vibe Coding a Transcription Tool in One Evening
I create courses, and every course needs written material. Lecture notes, chapter text, explanations — it all has to be written down. But here's the thing: I already recorded videos covering everything I want to say. Why should I write it all again from scratch when the content already exists in spoken form?
On top of that, I'm dyslexic. Writing long-form text takes me significantly more time and effort than it does for most people. Speaking, on the other hand, comes naturally. So for me, a tool that turns spoken words into written text isn't just a nice shortcut — it's a game changer.
So I decided to build a tool that transcribes my course videos into text files. I then combine those transcriptions with my lecture notes to create the final written course material. What used to take hours of writing now takes minutes of transcription — and I built the tool itself in a single evening using vibe coding.
The problem
Creating course material is a two-sided job. First you prepare and record the video lectures. Then you need to produce the written version: the chapters, the explanations, the reference material students can read back later. Writing all of that from scratch is a huge amount of work — especially when you've already said everything in the video.
Online transcription services exist, but they cost money per minute of audio, have file size limits, and require uploading your unreleased course content to a third party. For a full course with dozens of videos, that adds up fast. And it's yet another external service in your workflow — upload, wait, download, repeat.
What I wanted was simple: a local tool that lives right next to my project files. I drag in a lecture video, get a text file back in minutes, and that file is already on my machine — ready to combine with my notes. No context switching, no browser tabs, no uploads. Just part of the workflow.
The ingredients: knowing what exists
I didn't write a single line of code myself. But I did bring something crucial to the table: knowledge of what exists.
I knew about OpenAI's Whisper, a state-of-the-art speech recognition model that runs locally. I had worked with Whisper on a previous project, where I wrote the code myself. So I knew it was available on Hugging Face in several sizes, from tiny models that run on any laptop to large models that deliver near-human accuracy. And I knew that the transformers library from Hugging Face makes it straightforward to load and run these models.
That prior experience meant I didn't need to research what was possible — I already knew the building blocks. I just didn't feel like writing all the code again and building a nice GUI around it by hand. I wanted the tool, not the process of making it. That's where vibe coding came in.
That said, this whole project would probably have worked even without that prior knowledge. This isn't a technically hard problem; it's just time-consuming to code up. If you sat down with Claude or another AI assistant and simply described what you need ("I want to transcribe videos locally"), it would guide you towards Whisper, Hugging Face, and the right libraries. The project is straightforward enough that a conversation about your wishes is all it takes to get started.
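To give a sense of how small those building blocks are: here is a minimal sketch of loading Whisper through the transformers pipeline. It assumes the transformers package (plus a backend like torch) is installed; the model names are the public openai/whisper-* checkpoints on Hugging Face, and the exact set of sizes you expose is up to you.

```python
# Public Whisper checkpoints on Hugging Face, from fastest to most accurate.
WHISPER_MODELS = {
    "tiny": "openai/whisper-tiny",
    "base": "openai/whisper-base",
    "small": "openai/whisper-small",
    "medium": "openai/whisper-medium",
    "large-v2": "openai/whisper-large-v2",
    "large-v3": "openai/whisper-large-v3",
    "large-v3-turbo": "openai/whisper-large-v3-turbo",
}

def make_transcriber(size: str = "tiny", chunk_length_s: int = 30):
    """Build a speech-recognition pipeline for the chosen model size."""
    from transformers import pipeline  # imported lazily; it's a heavy dependency
    return pipeline(
        "automatic-speech-recognition",
        model=WHISPER_MODELS[size],
        chunk_length_s=chunk_length_s,  # how the audio is split for processing
        return_timestamps=True,         # timestamps per chunk, for chapter markers
    )

# Usage (downloads the chosen model on first run):
#   asr = make_transcriber("tiny")
#   result = asr("lecture.wav")  # path to an audio file
#   print(result["text"])
```

That is roughly the entire "hard" part of the tool; everything else is plumbing and UI.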
The vibe coding process
I opened my AI coding tool and described what I wanted. My initial idea was to build a simple HTML frontend — something I could open in a browser with a drag-and-drop area. But the AI pointed out a problem: browsers don't expose the full file path of dropped files for security reasons. Since the transcription needs to run locally and access the actual files on disk, a web-based UI wouldn't work well.
Instead, it suggested using PyQt6 — a Python framework for building native desktop applications. I'd never used PyQt6 before, but that didn't matter. The AI knew how to use it, and I just needed to describe what I wanted:
"Build me a desktop application with PyQt6. It should let me drag and drop video or audio files, queue them up, and transcribe them using Whisper from Hugging Face. I want to pick different model sizes and chunk lengths. Oh, and dark theme — I'm a coder after all."
That was essentially it. The AI generated the entire application: a Transcriber class that loads Whisper models and processes audio through ffmpeg, a DropZone widget for drag-and-drop, a MainWindow with a file queue, progress indicators, and a polished dark theme.
Did I review every line? No. Did I understand the broad strokes? Yes. I knew what Whisper was, I knew what PyQt6 could do, and I could tell from running the app that it was doing the right thing.
A few rounds of "this doesn't work" and "can you also add X" later, I had a fully working tool.
The result

The tool has everything I need:
- Model selection — seven Whisper variants from Tiny (fastest) to Large V3 Turbo (best quality)
- Drag and drop — just drop video or audio files onto the window
- Batch processing — queue up multiple files and transcribe them one after another
- Chunk length control — configure how the audio is split for processing. I added this so I could tweak when timestamps are generated, which helps me create chapter markers for my videos
- Timestamped output — every transcription includes timestamps, saved as a clean text file
- Fully local — nothing leaves your machine
It supports all common media formats: MP4, MKV, AVI, MOV, WebM, WAV, MP3, FLAC, OGG, and M4A. The tool uses ffmpeg under the hood to extract audio, then feeds it to Whisper in configurable chunks.
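That extraction step boils down to one ffmpeg invocation per file. A sketch of how such a command might be assembled, assuming Whisper's expected input format of 16 kHz mono audio; the helper only builds the argument list, and the caller runs it via subprocess.

```python
import subprocess

def ffmpeg_extract_cmd(src: str, dst: str) -> list[str]:
    """Build the ffmpeg command that pulls audio out of any media container."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file without asking
        "-i", src,       # input: MP4, MKV, WebM, plain audio, ...
        "-vn",           # drop the video stream
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # 16 kHz sample rate, which Whisper expects
        dst,             # e.g. a temporary .wav file
    ]

def extract_audio(src: str, dst: str) -> None:
    """Run ffmpeg and raise if the extraction fails."""
    subprocess.run(ffmpeg_extract_cmd(src, dst), check=True)
```

Keeping the command construction separate from the subprocess call makes it trivial to log or test what the tool is about to run.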
A 30-minute lecture? Transcribed in a few minutes on a decent GPU. Even on CPU it works — just takes a bit longer. I drop in a batch of lecture recordings, hit Transcribe, and come back to a folder full of text files ready to be shaped into course chapters.
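The timestamped text files can be produced from the model output with a small formatter. A sketch, assuming each chunk arrives as a dict with a "timestamp" pair in seconds and a "text" field, which is the shape the transformers pipeline returns when return_timestamps is enabled; the example chunks below are made up.

```python
def format_chunks(chunks: list[dict]) -> str:
    """Turn ASR chunks into '[MM:SS] text' lines for the output file."""
    lines = []
    for chunk in chunks:
        start, _end = chunk["timestamp"]  # (start, end) in seconds
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {chunk['text'].strip()}")
    return "\n".join(lines)

# Hypothetical chunks for illustration:
demo = [
    {"timestamp": (0.0, 29.5), "text": " Welcome to the course."},
    {"timestamp": (29.5, 61.0), "text": " In this lecture we cover variables."},
]
print(format_chunks(demo))
# [00:00] Welcome to the course.
# [00:29] In this lecture we cover variables.
```

Because the chunk boundaries follow the configured chunk length, tweaking that setting directly changes where these timestamps land, which is exactly the chapter-marker knob mentioned above.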
What made this work
This is exactly the kind of project where vibe coding shines. Let me explain why.
It's a personal tool. I'm the only user. If something breaks, I fix it. There's no team that needs to maintain this code, no customers who depend on uptime, no scaling requirements. The "scar tissue" problem that makes vibe coding dangerous for production software simply doesn't apply here. If you're building production code, you need to be more careful — that's where agentic engineering comes in. But for a personal tool like this, vibe coding does the job perfectly.
The scope is small and well-defined. I need exactly one thing: take a media file in, get a transcription out. There's no authentication, no database, no API, no deployment to the cloud. The entire app does one job.
It slots into a larger workflow. The tool doesn't need to be perfect — it needs to be useful. The transcriptions aren't the final product. They're a raw ingredient that I combine with my lecture notes to produce polished course material. Even if the transcription has a few rough edges, it's infinitely faster than writing everything from scratch.
Domain knowledge matters more than code. The hard part wasn't writing Python. The hard part was knowing that Whisper exists, that it's available in different sizes on Hugging Face, that transformers makes it easy to use, and that ffmpeg can extract audio from any video format. That knowledge came from me — the AI just turned it into working code. With vibe coding, you're essentially the project manager and consultant of your own project, and the AI agent is your dev team. The better you know what you want, the better the result will be.
Vibe coding for the right reasons
On this site, we've written about the limitations of vibe coding and why agentic engineering is the better approach for production software. That still holds. If you're building a product for customers, you need structure, architecture, and understanding of your codebase.
But for personal tools? Vibe coding is a superpower.
Think about all the small annoyances in your workflow. The repetitive tasks. The things you do manually because building a tool "would take too long." With vibe coding, "too long" often means one evening. You don't need to be a programmer — you need to know what's possible and be able to describe what you want.
Some ideas to get you started:
- A tool that renames and organizes files based on rules you define
- A batch image resizer with a simple GUI
- A local search tool for your notes or documents
- A script that monitors a folder and processes new files automatically
All of these are small, personal, well-defined — exactly where vibe coding thrives.
Conclusion
Vibe coding gets a bad reputation, and often for good reason. But the criticism applies to using it for the wrong things. For personal tools with a clear scope, it's genuinely transformative. I went from "writing course material takes forever" to having an automated pipeline — record the lecture, transcribe it, combine it with notes, done.
The key ingredient wasn't programming skill. It was knowing what exists: which models are out there, which libraries make them accessible, and what's possible. That kind of knowledge is something anyone can build up, programmer or not.
And coding with AI doesn't have to stop at personal tools. You can absolutely create production-grade software this way — but that requires a more thorough process and more knowledge. Understanding architecture, making deliberate technology choices, validating each step. That approach has a name: agentic engineering. If you're interested in learning how to go from vibe coding to building real, deployable products with AI, check out our Agentic Engineering classroom course.
So next time you find yourself doing something tedious and repetitive, ask yourself: could an AI build me a tool for this? The answer might surprise you.