About the developer

noahchalifour
190 Stars 68 Forks MIT License 59 Commits 15 Opened issues

Description

End-to-end speech recognition using RNN Transducers in TensorFlow 2.0


RNN-Transducer Speech Recognition

End-to-end speech recognition using RNN-Transducer in TensorFlow 2.0

Overview

This speech recognition model is based on Google's "Streaming End-to-end Speech Recognition For Mobile Devices" research paper and is implemented in Python 3 using TensorFlow 2.0.

Setup Your Environment

To set up your environment, run the following commands:

git clone --recurse-submodules https://github.com/noahchalifour/rnnt-speech-recognition.git
cd rnnt-speech-recognition
pip install tensorflow==2.2.0 # or tensorflow-gpu==2.2.0 for GPU support
pip install -r requirements.txt
./scripts/build_rnnt.sh # to set up the RNN-T loss
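
To confirm that the pinned TensorFlow version actually ended up in the active environment, a small check like the one below can help. This is only a sketch: the helper names `installed_version` and `check_tensorflow` are hypothetical and not part of this repository, and the check assumes the package was installed under the distribution name `tensorflow` or `tensorflow-gpu`, as in the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_tensorflow(expected="2.2.0"):
    """Return True if tensorflow (or tensorflow-gpu) is installed at `expected`."""
    # Either distribution name satisfies the setup step above.
    found = installed_version("tensorflow") or installed_version("tensorflow-gpu")
    return found == expected


if __name__ == "__main__":
    if check_tensorflow():
        print("TensorFlow 2.2.0 is installed")
    else:
        print("TensorFlow 2.2.0 was not found in this environment")
```

Reading the version from package metadata avoids importing TensorFlow itself, so the check stays fast and also works when the install is broken.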

Common Voice

You can find and download the Common Voice dataset on the Mozilla Common Voice website.

Convert all MP3s to WAVs

Before you can train a model on the Common Voice dataset, you must first convert all of the MP3 audio files to WAV. Do so by running the following commands:

NOTE: Make sure you have ffmpeg installed on your computer, as the conversion script uses it to convert MP3 to WAV.
./scripts/common_voice_convert.sh  
python scripts/remove_missing_samples.py \
    --data_dir <data_dir> \
    --replace_old

Preprocessing dataset

After converting all the MP3s to WAVs, you need to preprocess the dataset. You can do so by running the following command:

python preprocess_common_voice.py \
    --data_dir <data_dir> \
    --output_dir <output_dir>

Training a model

To train a simple model, run the following command:

python run_rnnt.py \
    --mode train \
    --data_dir <data_dir>
