License: Apache License 2.0

Streaming Transformer

This repo contains the streaming Transformer from our work "On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition", which is based on ESPnet 0.6.0. The streaming Transformer consists of a streaming encoder, either chunk-based or look-ahead-based, and a trigger-attention (TA) based decoder.
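To make the difference between the two encoder types concrete, here is a minimal sketch of the self-attention masks each one implies. It assumes boolean frame-by-frame masks where `mask[i][j]` means frame i may attend to frame j. The function names and the optional `left_chunks` limit are illustrative assumptions, not this repository's API; ESPnet itself builds such masks as PyTorch tensors.

```python
from typing import List, Optional

Mask = List[List[bool]]


def chunk_mask(num_frames: int, chunk_size: int,
               left_chunks: Optional[int] = None) -> Mask:
    """Chunk-based mask: mask[i][j] is True if frame i may attend to frame j.

    Each frame sees every frame of its own chunk (including later frames in
    that chunk) plus earlier chunks; `left_chunks` optionally limits how many
    earlier chunks are visible (illustrative parameter).
    """
    mask = []
    for i in range(num_frames):
        ci = i // chunk_size
        row = []
        for j in range(num_frames):
            cj = j // chunk_size
            visible = cj <= ci  # never look into a later chunk
            if left_chunks is not None:
                visible = visible and cj >= ci - left_chunks
            row.append(visible)
        mask.append(row)
    return mask


def lookahead_mask(num_frames: int, left: int, right: int) -> Mask:
    """Look-ahead mask: frame i may attend to frames i-left .. i+right."""
    return [[i - left <= j <= i + right for j in range(num_frames)]
            for i in range(num_frames)]


if __name__ == "__main__":
    # Visualize a chunk mask: "x" = visible, "." = masked
    for row in chunk_mask(6, chunk_size=2):
        print("".join("x" if v else "." for v in row))
```

With `chunk_mask`, a frame's look-ahead never extends past the end of its own chunk, however many encoder layers are stacked. With `lookahead_mask`, each layer adds `right` future frames on top of what the previous layer already saw, so N stacked layers see N × `right` future frames in total; this is why look-ahead latency grows with encoder depth.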

We will release the following models and show reproducible results on LibriSpeech:

  • Streaming_transformer-chunk32 with ESPnet Conv2d Encoder.

  • Streaming_transformer-chunk32 with VGG Encoder.

  • Streaming_transformer-lookahead with ESPnet Conv2d Encoder.

  • Streaming_transformer-lookahead with VGG Encoder.

Results on LibriSpeech (beam=10)

| Model | test-clean WER (%) | test-other WER (%) | Latency | Size |
| -------- | -----: | -----: | :----: | :----: |
| streaming_transformer-chunk32-conv2d | 2.8 | 7.5 | 640ms | 78M |
| streaming_transformer-chunk32-vgg | 2.8 | 7.0 | 640ms | 78M |
| streaming_transformer-lookahead2-conv2d | 3.0 | 8.6 | 1230ms | 78M |
| streaming_transformer-lookahead2-vgg | 2.8 | 7.5 | 1230ms | 78M |


Installation

Our installation follows the installation process of ESPnet.

Step 1. Set up the environment



Step 2. Install the tools, including Kaldi

cd tools
make -j 10

Build a streaming Transformer model

Step 1. Data Preparation

cd egs/librispeech/asr1

By default, the processed data is stored in the current directory. You can change the path by editing the scripts.

Step 2. Viterbi decoding

To train a TA-based streaming Transformer, alignments between CTC paths and transcriptions are required. In our work, we obtain them by Viterbi decoding with the offline Transformer model.

cd egs/librispeech/asr1
./ /path/to/model
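For readers unfamiliar with CTC forced alignment, the following is a minimal pure-Python sketch of Viterbi decoding over the CTC trellis. It assumes per-frame log-probabilities from the offline model's CTC output (`log_probs[t][v]`), a transcription already mapped to token ids, and blank id 0. `ctc_viterbi_align` is a hypothetical helper, not the script used in this repository, which may differ in details such as batching or output format.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def ctc_viterbi_align(log_probs: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      blank: int = 0) -> List[int]:
    """Most likely frame-level CTC path (blanks included) that collapses to `labels`.

    log_probs[t][v] is the log-probability of token v at frame t.
    """
    num_frames = len(log_probs)
    # Blank-extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    num_states = len(ext)

    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    # A path may start with the leading blank or with the first label
    score[0][0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        prev = score[t - 1]
        for s in range(num_states):
            # Predecessors: stay in s, advance from s-1, or skip a blank
            # from s-2 (only between two different non-blank labels)
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: prev[p])
            if prev[best] == NEG_INF:
                continue  # state s is unreachable at frame t
            score[t][s] = prev[best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the final label or in the trailing blank
    finals = [num_states - 1] + ([num_states - 2] if num_states > 1 else [])
    s = max(finals, key=lambda f: score[-1][f])
    if score[-1][s] == NEG_INF:
        raise ValueError("too few frames to align these labels")

    # Trace back the best state sequence and map states to tokens
    path = [blank] * num_frames
    for t in range(num_frames - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path
```

The returned list has one entry per encoder frame; collapsing repeats and removing blanks recovers `labels`. One common choice is to treat the frame where each token first appears as that token's trigger for the TA decoder.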

Step 3. Train a streaming Transformer

Here, we train a chunk-based streaming Transformer that is initialized with an offline Transformer model provided by ESPnet. Set

to the path of your offline model.
cd egs/librispeech/asr1

If you want to train a look-ahead-based streaming Transformer, set

to False and change the left-window, right-window, dec-left-window, and dec-right-window arguments. You can monitor the training log with:
tail -f exp/streaming_transformer/train.log
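The dec-left-window and dec-right-window arguments suggest that the TA decoder attends only to a window of encoder frames around each token's trigger frame. The sketch below illustrates that idea under stated assumptions: a trigger is the first frame at which a token is emitted on a CTC alignment path, and decoder step u attends to frames from `trigger - left` to `trigger + right` (with `left=None` meaning all past frames are visible). Both helpers are hypothetical, and the actual window semantics in this repository may differ.

```python
from typing import List, Optional, Sequence


def trigger_frames(path: Sequence[int], blank: int = 0) -> List[int]:
    """Frame at which each output token is first emitted on a CTC path."""
    triggers = []
    for t, lab in enumerate(path):
        # A new token starts on a non-blank frame that does not repeat the
        # previous frame (repeats without a blank in between collapse)
        if lab != blank and (t == 0 or lab != path[t - 1]):
            triggers.append(t)
    return triggers


def trigger_attention_mask(triggers: Sequence[int], num_frames: int,
                           left: Optional[int], right: int) -> List[List[bool]]:
    """mask[u][t] is True if decoder step u may attend to encoder frame t.

    Hypothetical window: frames trigger-left .. trigger+right, clipped to the
    utterance; left=None means every past frame is visible.
    """
    mask = []
    for trig in triggers:
        lo = 0 if left is None else max(0, trig - left)
        hi = min(num_frames - 1, trig + right)
        mask.append([lo <= t <= hi for t in range(num_frames)])
    return mask


if __name__ == "__main__":
    path = [1, 0, 2, 2, 0, 3]      # frame-level CTC path, blank = 0
    trig = trigger_frames(path)    # [0, 2, 5]
    for row in trigger_attention_mask(trig, len(path), left=None, right=1):
        print("".join("x" if v else "." for v in row))
```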

Step 4. Decoding

Execute the following script to decode the test_clean and test_other sets:

./ num_of_gpu job_per_gpu

Offline Transformer Reference

Regarding the offline Transformer model, please visit here.
