aboutsummaryrefslogtreecommitdiffstats
path: root/subtitle-podcast/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'subtitle-podcast/README.md')
-rw-r--r--subtitle-podcast/README.md143
1 files changed, 143 insertions, 0 deletions
diff --git a/subtitle-podcast/README.md b/subtitle-podcast/README.md
new file mode 100644
index 0000000..16efb17
--- /dev/null
+++ b/subtitle-podcast/README.md
@@ -0,0 +1,143 @@
+# subtitle-podcast.ps1
+
+This script takes an audio file as input and generates a subtitle track,
+a static image, and combines them to create a subtitled video of the
+audio. I originally made this to transcribe recorded tabletop games for
+a friend to allow for easy re-reading.
+
+Transcription relies entirely on the WhisperX project which uses an
+Automatic Speech Regonition (ASR) model original developed by OpenAI. If
+you like this script, please consider supporting the WhisperX project
+with a donation!
+
+https://github.com/m-bain/whisperX
+
+## Requirements
+- Powershell
+- Python 3.9 to 3.13
+- ffmpeg
+- About 5-10 GB of free space for AI models
+
+## Setup
+
+### Install Powershell
+
+Highly suggest downloading the latest Powershell version from github:
+
+https://github.com/PowerShell/PowerShell/releases
+
+The latest version as of now:
+
+https://github.com/PowerShell/PowerShell/releases/download/v7.5.4/PowerShell-7.5.4-win-x64.msi
+
+### Install Python and pip
+
+At the time of writing (Jan 2026) WhisperX seems to only work with
+python 3.13 and lower. The latest version of Python may or may not work
+
+https://www.python.org/ftp/python/3.13.11/python-3.13.11-amd64.exe
+
+Be sure to check "Add to Path" during installation
+
+![Python install window](python-path.png)
+
+### Install Whisperx
+
+
+If you used the link above, you should be able to run this on Powershell
+
+```pwsh
+pip.exe install whisperx
+```
+
+### Install ffmpeg
+
+1. Download the latest build [here](https://github.com/BtbN/FFmpeg-Builds/releases/download/latest/ffmpeg-master-latest-win64-gpl-shared.zip)
+
+2. Unzip the folder and rename it to "ffmpeg" to keep it simple.
+
+3. Move the unzipped ffmpeg folder to a place you'll remember. `C:\Program Files\`
+ is usually a good bet
+
+4. Add the path to the ffmpeg "bin" folder to your Path. If you are not familiar with environment variables, you may want to read [this guide](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) first
+ * Search "Path" in the search bar and find the "Environment
+ variables..." button
+ * In the new window, double click the item in the list that's
+ titled "PATH"
+ * Click "New" and copy the location of the "bin" folder
+
+If you have it right, your path should look something like this with a different username of course:
+
+![Path window](ffmpeg-path.png)
+
+Once confirmed, click "OK" on all the windows.
+
+### Allow execution
+
+If your Powershell disables execution by default
+
+```pwsh
+Unblock-File subtitle-podcast.ps1
+```
+
+## Usage
+
+Move the script into the directory you have your audio files. Open up
+Powershell in that location (right click > Open in Powershell)
+
+```pwsh
+.\subtitle-podcast.ps1 episode01.mp3
+```
+
+Place cover files in the same directory as your mp3, with a name like "episode01.jpg"
+
+
+For example, if your files are in a folder called "pods"
+
+```
+pods\
+---- episode01.mp3
+---- episode01.jpg
+---- subtitle-podcast.ps1
+```
+
+Only ".png" and ".jpg" extentions are allowed since I'm lazy.
+
+### Sample output
+
+https://github.com/user-attachments/assets/c9a8b408-b48b-464a-b386-1eed8a5184c2
+
+## Hacking
+
+### Video format
+
+This script produces and mp4 file by default since it is decent for web embedding. If you want a different video output, just change line 45
+
+```pwsh
+$OutputVideo = "$Title.mp4"
+```
+
+### Accuracy
+
+WhisperX is still fairly new and actively developed, so do not expect
+perfect transcription results. For example, you may find a number of
+proper names don't get transcribed well. Grunts, laughter, and other
+non-distinct may also produce unexpected results.
+
+To make this as simple as possible without using any custom models, this
+script uses the large English model provided by WhisperX.
+
+But there are many other models the tool supports, most notably,
+other languages. For the most details on this, see the WhipserX github
+and make edits accordingly
+
+https://github.com/m-bain/whisperX?tab=readme-ov-file#other-languages
+
+A potential hack for missing names would be to bias the prompt of
+whisperx with known names that appear in the audio. If you want to
+experiment with this you can add the following line after line 38 (be
+sure to include the backtick ` at the end)
+
+```pwsh
+--initial_prompt "Names in this broadcast include: Mike Fernèz, Mr. McGuire" `
+```