190 Stars 68 Forks MIT License 59 Commits 15 Opened issues


End-to-end speech recognition using RNN Transducers in TensorFlow 2.0


RNN-Transducer Speech Recognition



This speech recognition model is based on Google's "Streaming End-to-end Speech Recognition For Mobile Devices" research paper and is implemented in Python 3 using TensorFlow 2.0.

Set Up Your Environment

To set up your environment, run the following commands:

git clone --recurse
cd rnnt-speech-recognition
pip install tensorflow==2.2.0 # or tensorflow-gpu==2.2.0 for GPU support
pip install -r requirements.txt
./scripts/ # to set up the RNN-T loss

Common Voice

You can download the Common Voice dataset from Mozilla's Common Voice website.

Convert all MP3s to WAVs

Before you can train a model on the Common Voice dataset, you must first convert all of its MP3 audio files to WAV.

NOTE: The conversion script relies on an external audio conversion tool to turn MP3 into WAV, so make sure that tool is installed on your computer first.

Run the following command:

python scripts/ \
    --data_dir  \
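
For readers who want to see roughly what such a conversion step involves, here is a minimal sketch. It is not the repository's script. It assumes `ffmpeg` is the converter (the text above does not name the tool), and the 16 kHz mono output format and the function names `build_convert_command` and `convert_directory` are illustrative choices, not taken from the project.

```python
from pathlib import Path
import shutil
import subprocess


def build_convert_command(mp3_path, sample_rate=16000):
    """Return the ffmpeg argument list that converts one MP3 to a mono WAV.

    Assumption: ffmpeg is the converter and 16 kHz mono is the target format.
    """
    mp3_path = Path(mp3_path)
    wav_path = mp3_path.with_suffix(".wav")  # same location, .wav extension
    return [
        "ffmpeg", "-y",            # overwrite an existing output file
        "-i", str(mp3_path),       # input MP3
        "-ar", str(sample_rate),   # output sample rate in Hz (assumed value)
        "-ac", "1",                # downmix to a single channel
        str(wav_path),
    ]


def convert_directory(data_dir, sample_rate=16000):
    """Convert every MP3 found under data_dir and return how many were converted."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    count = 0
    for mp3_path in sorted(Path(data_dir).rglob("*.mp3")):
        subprocess.run(build_convert_command(mp3_path, sample_rate), check=True)
        count += 1
    return count
```

Calling `convert_directory("path/to/dataset")` would write a `.wav` file next to each `.mp3`. Keeping the command construction in its own function makes it easy to inspect the exact arguments before any audio is touched.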

Preprocessing dataset

After converting all the MP3s to WAVs, you need to preprocess the dataset. You can do so by running the following command:

python \
    --data_dir  \
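
The README does not describe what preprocessing does internally. As one plausible ingredient, the sketch below pairs each converted WAV file with its transcript by reading a Common Voice style TSV file, which has `path` and `sentence` columns. The function name `read_manifest`, the lowercasing and whitespace normalization, and the `clips` directory name are assumptions made for illustration, not the project's actual behavior.

```python
import csv
import io
from pathlib import Path


def read_manifest(tsv_file, clips_dir):
    """Yield (wav_path, transcript) pairs from a Common Voice style TSV."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumed normalization: lowercase and collapse repeated whitespace.
        transcript = " ".join(row["sentence"].lower().split())
        if not transcript:
            continue  # skip clips without a usable transcript
        # The TSV lists the original .mp3 name; point at the converted .wav.
        wav_name = Path(row["path"]).with_suffix(".wav").name
        yield Path(clips_dir) / wav_name, transcript


# Small in-memory example standing in for a real TSV file.
sample = io.StringIO(
    "client_id\tpath\tsentence\n"
    "abc\tclip_1.mp3\tHello   World\n"
    "def\tclip_2.mp3\t\n"
)
pairs = list(read_manifest(sample, "clips"))
print(pairs)
```

With a real dataset, the `io.StringIO` object would be replaced by an open file handle such as `open(tsv_path, newline="", encoding="utf-8")`.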

Training a model

To train a simple model, run the following command:

python \
    --mode train \
