DeepSpeech based forced alignment tool
It is recommended to use this tool from within a virtual environment. After cloning the project and changing to its root directory, run the provided script to create one with all requirements in the git-ignored directory venv:

    $ bin/createenv.sh
    $ ls venv
    bin  include  lib  lib64  pyvenv.cfg  share

bin/align.sh will automatically use it.
Internally, DSAlign uses the DeepSpeech STT engine. To function, it requires a couple of files that are specific to the language of the speech data you want to align. If you want to align English, there is already a helper script that will download and prepare all required data:

    $ bin/getmodel.sh
    [...]
    $ ls models/en/
    alphabet.txt  lm.binary  output_graph.pb  output_graph.pbmm  output_graph.tflite  trie
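
Before aligning, it can be useful to confirm that the model directory is complete. The following is a minimal sketch, not part of DSAlign: the function name `missing_model_files` is hypothetical, and the file list is simply the `ls models/en/` listing above. Which of these files the aligner actually needs at runtime is not stated here, so treat the check as conservative.

```shell
#!/usr/bin/env bash
# Sketch: report which of the files from the listing above are absent.
# missing_model_files is a hypothetical helper, not a DSAlign script.
missing_model_files() {
    local dir="${1:-models/en}" name status=0
    for name in alphabet.txt lm.binary output_graph.pb \
                output_graph.pbmm output_graph.tflite trie; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            status=1
        fi
    done
    # 0 if every file is present, 1 otherwise
    return "$status"
}
```

Calling `missing_model_files models/en` prints one line per absent file to stderr and returns non-zero if any file is missing.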
A typical application of the aligner is done in three phases:
There is a script for downloading and preparing some public domain speech and transcript data. It requires ffmpeg for some sample conversion.

    $ bin/gettestdata.sh
    $ ls data
    test1  test2
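
Since the test data script depends on ffmpeg, a small guard can fail early with a clear message when it is not installed. This is a sketch under the assumption that ffmpeg only needs to be on the PATH; `require_tool` is a hypothetical helper name, not something DSAlign provides.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the named executable can be found on the PATH.
# require_tool is a hypothetical helper, not a DSAlign script.
require_tool() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "error: $1 not found on PATH" >&2
        return 1
    }
}
```

A possible use is `require_tool ffmpeg && bin/gettestdata.sh`, which skips the download when ffmpeg is unavailable.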
Now the aligner can be called either "manually" (specifying all involved files directly):

    $ bin/align.sh \
        --audio data/test1/audio.wav \
        --script data/test1/transcript.txt \
        --aligned data/test1/aligned.json \
        --tlog data/test1/transcript.log

Or "automatically" by specifying a so-called catalog file that bundles all involved paths:

    $ bin/align.sh --catalog data/test1.catalog
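
The manual invocation can also be scripted over several sample directories. The sketch below uses only the `--audio`, `--script`, `--aligned` and `--tlog` options shown above. It assumes every directory follows the layout of data/test1 (audio.wav and transcript.txt). The `ALIGN` variable and the function names `align_dir` and `align_all` are hypothetical additions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run the aligner once per sample directory.
# Assumes each directory holds audio.wav and transcript.txt, like data/test1.
set -euo pipefail

# Hypothetical override so a stub can replace the aligner for dry runs.
ALIGN="${ALIGN:-bin/align.sh}"

align_dir() {
    local dir="${1%/}"   # drop a trailing slash, if any
    "$ALIGN" \
        --audio   "$dir/audio.wav" \
        --script  "$dir/transcript.txt" \
        --aligned "$dir/aligned.json" \
        --tlog    "$dir/transcript.log"
}

align_all() {
    local root="${1:-data}" dir
    for dir in "$root"/*/; do
        # Skip directories that lack either input file.
        if [ -f "${dir}audio.wav" ] && [ -f "${dir}transcript.txt" ]; then
            align_dir "$dir"
        fi
    done
}
```

Running `align_all data` from the project root would then align test1 and test2 in turn, writing aligned.json and transcript.log next to the inputs.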