On January 1st, 2009, I founded Cloud Genius with a goal to help people like you realize their dream to gain hands-on skills.
Over the past 14+ years, 1760 people of 15 nationalities from 4 continents have successfully completed our programs to upskill themselves. People like you have designed, built and deployed production-grade cloud services and accomplished their career goals. This is what makes me super-proud.
This year, I plan to use OpenAI Whisper to transcribe all my presentations recorded over the last 14+ years since starting Cloud Genius.
I plan to post my transcription results from OpenAI Whisper on this website. You will receive my updates as I run the machine learning models.
Whisper is an open-source tool from OpenAI. As an easy test, I am going to feed Whisper a test video of a native speaker who teaches how to enunciate. This should be an easy one for Whisper to transcribe.
So I install yt-dlp to make it easy to fetch videos locally as needed, and FFmpeg for A/V processing.
sudo apt update && sudo apt install -y yt-dlp ffmpeg
Then, I use my handy script to set up conda.
echo Intel CPU assumed
echo Using my preferred $HOME/miniconda install location
rm -rf $HOME/miniconda ~/miniconda.sh
# Note: this URL is the macOS installer. On a Debian/Ubuntu machine (as the apt command above implies), the Linux build is Miniconda3-latest-Linux-x86_64.sh.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
rm -rf ~/miniconda.sh
source $HOME/miniconda/bin/activate
conda init bash
conda config --set auto_activate_base false
conda update -n base -c defaults conda -y
conda update --all -y
To get a clean Python environment for Whisper-related work, I set up a new conda env named w with Python 3.9 and activate it for use with Whisper. Finally, I install Whisper using pip in that clean conda environment.
conda activate base
conda create --name w python=3.9 -y
conda activate w
pip install git+https://github.com/openai/whisper.git
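Before moving on, it can help to confirm that the three tools installed so far are actually on the PATH. The helper below is only a sketch; `check_tools` is a name made up for this illustration and is not part of conda, yt-dlp, FFmpeg, or Whisper.

```shell
# Hypothetical helper: report any command-line tool that is not on the PATH.
# Returns 0 when every named tool is found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage, run inside the activated w environment:
# check_tools yt-dlp ffmpeg whisper && echo "ready"
```

If any name is printed as missing, the matching install step above likely needs to be repeated in the current shell or conda environment.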
Now my machine is ready to run Whisper and transcribe whatever I need. I downloaded that example video and asked Whisper to transcribe it. I chose to run Whisper with the medium model and told it to assume English as the spoken language. Otherwise, Whisper spends a few seconds detecting the language being spoken. Yes, Whisper supports many languages. Look at the related GitHub page for its complete capabilities.
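The download and transcription step described here could look roughly like the sketch below. The function names `fetch_audio` and `transcribe`, the URL, and the file name are placeholders chosen for illustration; the exact commands used for this post are not shown, so treat this as one plausible way to do it rather than the original invocation.

```shell
# Sketch only: the URL and file names used with these helpers are placeholders.

# Download just the audio track of a video as an mp3 file.
# $1 = video URL, $2 = output name without extension
fetch_audio() {
  yt-dlp -x --audio-format mp3 -o "$2.%(ext)s" "$1"
}

# Transcribe with the medium model and skip language detection
# by stating the spoken language up front.
# $1 = audio file
transcribe() {
  whisper "$1" --model medium --language English
}

# Example usage (substitute a real video URL):
# fetch_audio "https://example.com/VIDEO_URL" enunciate
# transcribe enunciate.mp3
```

Extracting only the audio keeps the download small, since Whisper works on the audio track and hands the decoding to FFmpeg.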
[00:00.000 --> 00:06.320] How to Enunciate.
[00:06.320 --> 00:10.000] Want to get the attention, the respect, and even the dates you've been missing out on?
[00:10.000 --> 00:11.760] You can start by speaking clearly.
[00:11.760 --> 00:20.160] You will need Mirror Voice recorder Cork or pencil and sense of humor.
[00:20.160 --> 00:21.160] Step 1.
[00:21.160 --> 00:24.960] Stand in front of the mirror and pretend you're having a conversation with a friend.
[00:24.960 --> 00:29.280] It's much easier to identify the places where you slur if you watch yourself speak.
[00:29.280 --> 00:30.280] Step 2.
[00:30.280 --> 00:35.080] Stretch your face as wide as it will go, and then scrunch it up as small as you can.
[00:35.080 --> 00:38.000] Move your jaw from side to side and back and forth.
[00:38.000 --> 00:41.440] Stick your tongue out as far as it will go, and retract it.
[00:41.440 --> 00:43.480] Repeat these steps several times.
[00:43.480 --> 00:47.600] Stretching your face, jaw, and tongue makes it easier to form words clearly.
[00:47.600 --> 00:48.760] Step 3.
[00:48.760 --> 00:53.200] Stand in front of the mirror and repeat vocal exercises that'll help you loosen your tongue,
[00:53.200 --> 00:54.200] lips, and jaw.
[00:54.200 --> 00:59.520] Try to make every sound distinct, emphasizing both consonants and vowels.
[00:59.520 --> 01:01.680] Say and repeat these clearly.
[01:01.680 --> 01:12.280] B-b-b, w-w-w, b-b-b, w-w-w, p-p-p, f-f-f, p-p-p, f-f-f, gutta-butta, gutta-butta.
[01:12.280 --> 01:14.560] Red leather, yellow leather.
[01:14.560 --> 01:15.840] Step 4.
[01:15.840 --> 01:20.180] Repeat tongue twisters slowly and deliberately to yourself in the mirror, and make sure that
[01:20.180 --> 01:22.960] you can hear each separate consonant and syllable.
[01:22.960 --> 01:27.480] Over time, say the phrases faster and faster, making sure you can still hear each part of
[01:27.480 --> 01:29.320] the word clearly.
[01:29.320 --> 01:31.080] Say and repeat these clearly.
[01:31.080 --> 01:34.160] A noisy noise annoys an oyster.
[01:34.160 --> 01:36.080] Lovely lemon liniment.
[01:36.080 --> 01:39.080] Twelve twins twirled twelve twigs.
[01:39.080 --> 01:40.080] Step 5.
[01:40.080 --> 01:43.000] Record yourself reading a paragraph from your favorite book.
[01:43.000 --> 01:48.380] Now gently hold a pencil or the small end of a cork just behind your front teeth.
[01:48.380 --> 01:53.680] Carefully read the paragraph aloud several times, making every letter as clear as possible.
[01:53.680 --> 01:57.680] Remove the cork or pencil and record yourself reading the paragraph again.
[01:57.680 --> 02:00.720] The second recording will be much clearer.
[02:00.720 --> 02:01.840] Step 6.
[02:01.840 --> 02:06.120] Focus on making consonants and syllables as clear as possible, since they provide the
[02:06.120 --> 02:09.080] most structure in words and sentences.
[02:09.080 --> 02:10.400] Step 7.
[02:10.400 --> 02:13.880] Take ten minutes each day to repeat your speech exercises.
[02:13.880 --> 02:17.260] You may look silly as you talk to the mirror, but you'll sound great when you're speaking
[02:17.260 --> 02:18.260] in public.
[02:18.260 --> 02:23.640] Did you know As an aspiring actress, Kathleen Turner perfected her diction by biting down
[02:23.640 --> 02:51.280] on pencil erasers while practicing speech exercises.
SRT and VTT formats are suitable for adding time-synchronized closed captioning.
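To make the difference between the console output and a caption file concrete, here is a minimal sketch that turns lines in the bracketed timestamp format shown in the transcript into numbered SRT cues. The function name `console_to_srt` and the file names in the usage line are made up for this illustration. It assumes each cue sits on one line and that timestamps are either MM:SS.mmm or HH:MM:SS.mmm.

```shell
# Hypothetical helper: convert Whisper console lines such as
#   [00:00.000 --> 00:06.320] How to Enunciate.
# read from stdin into numbered SRT cues on stdout.
console_to_srt() {
  awk '
    # SRT wants HH:MM:SS,mmm; add the hour field when the input omits it.
    function srt_time(t,   parts) {
      if (split(t, parts, ":") == 2) t = "00:" t
      sub(/\./, ",", t)
      return t
    }
    /^\[[0-9:.]+ --> [0-9:.]+\]/ {
      t_start = substr($1, 2)                 # drop the leading "["
      t_end = substr($3, 1, length($3) - 1)   # drop the trailing "]"
      text = substr($0, index($0, "]") + 1)   # everything after the stamp
      sub(/^ +/, "", text)
      printf "%d\n%s --> %s\n%s\n\n", ++n, srt_time(t_start), srt_time(t_end), text
    }
  '
}

# Example usage (file names are placeholders):
# console_to_srt < transcript.txt > transcript.srt
```

In practice this conversion is rarely needed, because the whisper command can write caption files itself, for example with `--output_format srt`. The sketch is mainly useful when only the console output was saved.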