This is JT.

JT is a writer and standup comedian. He also has cerebral palsy.
I met JT in Presuming Ed’s, via our mutual friend Scott, the sweariest man in Brighton. Like many able-bodied people, I’d not spent much time actually hanging out with significantly disabled people. My attitude, I suspect, is similar to most: an abstract empathy for their condition, and a quick sense of relief – “there but for the grace of God”.
JT is a very funny man, trapped inside a malfunctioning body. One of his creative activities is making hilariously dark social media videos (he has 99k followers on TikTok).
Aside from the physical disability which keeps him mainly confined to a wheelchair, his condition makes speaking very arduous, and, for the listener, quite difficult to understand. As a result, video subtitles are essential.
Creating subtitles with cerebral palsy is, as you would expect, not a simple task. For the normally voiced, adding subs is a relatively simple process using AI speech-to-text systems, perhaps with a little minor editing of the text when it gets things wrong. JT, however, does not have a ‘normal’ voice, and so editing the text resulting from transcription is non-trivial.
This is a classic example of where AI falls down – it’s excellent at working with the middle of the bell-curve, but rubbish at the edges.
So as ‘someone who knows about this stuff’, JT asked me if I could help.
On the face of it, the problem seems tractable – JT speaks English, it’s just tricky to understand – surely an existing speech-to-text model could be persuaded to understand him, given the right kind of prodding?

Geek Stuff
There are many open-weight speech-to-text AI models around, and one of the highest-rated is Whisper from OpenAI. It comes in various sizes and capabilities and seemed like a good starting point for fine-tuning.
While the default model performs very poorly on JT’s speech, the hope was that it would provide enough grounding in English speech that it could be retrained on his voice.
I decided the best approach would be to create a fine-tuning dataset of JT’s speech which, thanks to his prolific social media activities, was available in video and .SRT subtitle form (at least on YouTube).
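To give a flavour of how such a dataset comes together: each .SRT file pairs timestamps with caption text, so parsing it gives you exactly the (start, end, transcript) triples needed to cut matching audio clips. Here’s a minimal, stdlib-only sketch of the parsing step (the details of my actual pipeline differed, and this ignores SRT edge cases like formatting tags):

```python
import re

def srt_time_to_ms(ts: str) -> int:
    """Convert an SRT timestamp like '00:01:02,500' to milliseconds."""
    h, m, s_ms = ts.split(":")
    s, ms = s_ms.split(",")
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_srt(text: str):
    """Yield (start_ms, end_ms, caption) tuples from the contents of an .SRT file."""
    blocks = re.split(r"\n\s*\n", text.strip())
    for block in blocks:
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # need an index line, a timing line, and at least one caption line
        m = re.match(r"(\S+)\s*-->\s*(\S+)", lines[1])
        if not m:
            continue
        yield srt_time_to_ms(m.group(1)), srt_time_to_ms(m.group(2)), " ".join(lines[2:])
```

Each triple can then drive a tool such as ffmpeg to slice out the corresponding audio span, giving (audio clip, transcript) pairs in the shape fine-tuning frameworks expect.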
Although there is not a lot of data, the hope was that it would be enough.
Sadly, after a few attempts at training various sizes of Whisper model, it was still pretty rubbish.
I was discussing this problem with Matt Franglen, and he pointed me to an article about transcription errors in the medical field, where Whisper was found to hallucinate speech when there were pauses in the audio. He suggested that perhaps the cadence of JT’s speech was part of the problem – the spaces between words are larger than in normal speech, a big problem for a time-based model such as Whisper. His suggestion was to remove these pauses, and fine-tune Whisper on the compressed audio.
This was the breakthrough. By removing the inter-word silences, Whisper performed much more accurately. I then ran a fine-tune of the medium-sized version of Whisper using silence-stripped versions of JT’s speech.
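The silence-stripping idea itself is simple enough to sketch in a few lines. This toy version works on raw amplitude samples and collapses any near-silent run down to a short pause (a real pipeline would work on frames of 16 kHz audio via something like pydub or librosa, and the threshold and gap values here are illustrative, not the ones I used):

```python
def strip_long_silences(samples, threshold=0.02, max_gap=10):
    """Collapse runs of near-silent samples longer than max_gap.

    samples: amplitudes in [-1, 1]; threshold: below this counts as silence;
    max_gap: the longest silent run (in samples) to keep.
    """
    out, run = [], []
    for s in samples:
        if abs(s) < threshold:
            run.append(s)  # accumulate the current silent run
        else:
            out.extend(run[:max_gap])  # keep a short pause, drop the rest
            run = []
            out.append(s)
    out.extend(run[:max_gap])  # trailing silence is trimmed too
    return out
```

The effect is to bring JT’s words closer together in time, so the audio looks much more like the ‘normal’ cadence Whisper was trained on.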
Testing the models was tricky, given the relatively small amount of audio I had to work with, so the obvious solution was to build a web interface and let JT use the various models himself.

Building a web interface that can handle multiple audio formats and output both plain text and correctly timestamped subtitle files would normally be a relatively straightforward, if time-consuming, coding problem. With the assistance of another AI (in the form of Claude Code), though, it was a breeze.
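The subtitle-output half of that interface boils down to the reverse of the earlier parsing: taking the timestamped segments a Whisper-style transcriber returns and rendering them back out as an .SRT file. A minimal sketch (the (start_ms, end_ms, text) segment shape is my assumption for illustration):

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, e.g. 62500 -> '00:01:02,500'."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render (start_ms, end_ms, text) segments as the contents of an .SRT file."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Getting these timestamps right matters more than usual here: if the subs drift out of sync with speech that is already hard to follow, they stop helping.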
After some testing, we discovered that the model trained on JT’s speech was OK, but the default ‘large’ version of Whisper actually performed much better.
It seems that the larger model has a greater tolerance for accented speech, and so produced more accurate transcriptions.
From this finding I was left with the challenge of fine-tuning a Whisper-large model, which it turns out is beyond the capabilities of my GPU. So I tried creating a LoRA (in simple terms, a smaller fine-tuned network that can be ‘bolted on’ to a larger network) which learned the nuances of JT’s voice, without the overhead of changing billions of weights.
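A quick back-of-envelope shows why a LoRA fits where a full fine-tune doesn’t: instead of updating a full d×k weight matrix, you train two small low-rank factors B (d×r) and A (r×k) whose product is added to the frozen weights, so the trainable parameter count drops from d·k to r·(d+k). The dimensions below are illustrative (and in practice you’d use a library such as Hugging Face PEFT rather than doing this by hand):

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a LoRA update W' = W + B @ A,
    with B of shape (d, r) and A of shape (r, k)."""
    return r * (d + k)

# Illustrative numbers: a 1280x1280 attention projection with LoRA rank 8.
full = 1280 * 1280                            # 1,638,400 weights fine-tuned directly
lora = lora_trainable_params(1280, 1280, 8)   # 20,480 trainable weights
```

Repeated across every attention layer in a large model, that ~80x reduction is the difference between ‘needs a data-centre GPU’ and ‘trains on my desk’.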
The resulting model is by no means perfect, it still gets some words wrong, but it’s 90% there. Practically, it now means JT can create and edit a subtitle file, on his own, in about 15 minutes – a task that used to take 3-4 hours, with the help of ‘an able’ (as JT likes to refer to us).

(This project remains a ‘work in progress’, as we gather more data I will continue training the model – let’s see if we can make it to 99%…)
After reading this article, JT emailed me:
“Like most disabled people I want to be ‘able’ but when most disabled people say that they actually mean ‘able-bodied’. I’ve always been on a quest to be as able as I can through problem solving and using tools to aid me. For example: can anyone fly? No. So we made airplanes and now can, but not literally fly, we use a tool to do that. Same with walking and my wheelchair. I’m able to get from A to B on my own because of the technology that’s available. Less than 100 years ago I wouldn’t have been able to. Speech and being understood is probably my biggest problem at this point. AI is already giving me the ability to be understood more easily. This means for me, my disability is effectively becoming less severe. Let’s see how able this can make me before I get cancelled!”
Coda
As I write this in early 2026, much of culture is ‘against AI’ (a topic I have discussed at length elsewhere), so I felt compelled to write about this project as a demonstrably non-evil use of AI.
I used multiple kinds of AI technology to create this app – an app with a single purpose for a single user. Historically, the time and costs involved in developing this kind of thing would be considerable. Now a single person can create and deploy a bespoke AI model and associated processing in a matter of days.
That feels like something genuinely ‘disruptive’.
Just remember: Not all AI is evil
(but, unfortunately, some of its wealthiest proponents are…)
You can find JT on TikTok, Instagram, YouTube, Medium and causing trouble around town in Brighton.
