Streaming Transformer

This repo, which is based on ESPnet 0.6.0, contains the streaming Transformer from our work "On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition". The streaming Transformer includes a streaming encoder, either chunk-based or look-ahead based, and a trigger-attention based decoder.
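To make the chunk-based encoder concrete, here is a minimal sketch of a self-attention mask for it. It assumes that each frame may attend to every frame in its own chunk and in all earlier chunks; the function name and this exact convention are illustrative assumptions, not taken from the repo's code.

```python
from typing import List


def chunk_attention_mask(num_frames: int, chunk_size: int) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True when frame i may attend to frame j.

    Assumed convention: a frame sees its whole chunk plus all earlier chunks.
    """
    mask = []
    for i in range(num_frames):
        # Exclusive end index of the chunk that contains frame i.
        chunk_end = min((i // chunk_size + 1) * chunk_size, num_frames)
        mask.append([j < chunk_end for j in range(num_frames)])
    return mask


if __name__ == "__main__":
    # Print the mask for 6 frames and chunks of 2 frames ("x" = visible).
    for row in chunk_attention_mask(num_frames=6, chunk_size=2):
        print("".join("x" if allowed else "." for allowed in row))
```

With this convention the encoder never needs frames beyond the end of the current chunk, which is what bounds the latency.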

We will release the following models and show reproducible results on LibriSpeech:

  • Streaming_transformer-chunk32 with ESPnet Conv2d Encoder.

  • Streaming_transformer-chunk32 with VGG Encoder.

  • Streaming_transformer-lookahead with ESPnet Conv2d Encoder.

  • Streaming_transformer-lookahead with VGG Encoder.

Results on Librispeech (beam=10)

| Model | test-clean | test-other | latency | size |
| -------- | -----: | :----: | :----: | :----: |
| streaming_transformer-chunk32-conv2d | 2.8 | 7.5 | 640ms | 78M |
| streaming_transformer-chunk32-vgg | 2.8 | 7.0 | 640ms | 78M |
| streaming_transformer-lookahead2-conv2d | 3.0 | 8.6 | 1230ms | 78M |
| streaming_transformer-lookahead2-vgg | 2.8 | 7.5 | 1230ms | 78M |
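The table does not say how the latency figures were derived. As a back-of-the-envelope reading of the chunk32 rows only, the sketch below assumes a 10 ms feature hop, 4x subsampling in the front-end (so 40 ms per encoder frame), and latency taken as half the chunk duration. These are assumptions for illustration, and the same arithmetic is not claimed to explain the look-ahead rows.

```python
def average_chunk_latency_ms(chunk_size: int, frame_ms: float) -> float:
    """Hypothetical helper: half the chunk duration, a rough estimate of the
    average wait until the chunk containing a frame is complete."""
    return chunk_size * frame_ms / 2


# Assumed: 10 ms hop x 4 subsampling = 40 ms per encoder frame.
print(average_chunk_latency_ms(chunk_size=32, frame_ms=40.0))  # 640.0
```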


Our installation follows the installation process of ESPnet.

Step 1. Set up the environment



Step 2. Install the tools, including Kaldi

cd tools
make -j 10

Build a streaming Transformer model

Step 1. Data preparation

cd egs/librispeech/asr1

By default, the processed data will be stored in the current directory. You can change the path by editing the scripts.

Step 2. Viterbi decoding

To train a trigger-attention (TA) based streaming Transformer, the alignments between CTC paths and transcriptions are required. In our work, we obtain them by Viterbi decoding with the offline Transformer model.

cd egs/librispeech/asr1
./ /path/to/model
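For readers unfamiliar with this step, the following is a minimal, pure-Python sketch of CTC Viterbi (forced) alignment and of reading trigger frames off the resulting path. It is the textbook algorithm, not the repo's script; the function names, the blank index of 0, and the choice of a token's first frame as its trigger point are assumptions for illustration.

```python
import math
from typing import List


def ctc_viterbi_align(log_probs: List[List[float]], tokens: List[int],
                      blank: int = 0) -> List[int]:
    """Return the most likely CTC path (one label per frame) for `tokens`.

    log_probs[t][v] is the CTC log-posterior of label v at frame t.
    """
    # Interleave blanks: [blank, y1, blank, y2, ..., blank].
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    T, S = len(log_probs), len(ext)
    NEG = -math.inf
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay, advance by one, or skip the blank
            # between two different tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] > NEG:
                score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
                back[t][s] = best
    # A valid path ends in the last token or the final blank.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == NEG:
        raise ValueError("not enough frames to align the token sequence")
    path = [blank] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


def trigger_frames(path: List[int], blank: int = 0) -> List[int]:
    """Frame index at which each token first appears on the path
    (assumed definition of the trigger point)."""
    triggers, prev = [], blank
    for t, label in enumerate(path):
        if label != blank and label != prev:
            triggers.append(t)
        prev = label
    return triggers
```

Given per-frame CTC log-posteriors from an offline model and a token sequence, `ctc_viterbi_align` yields the frame-level path, and `trigger_frames` gives one frame index per token that a trigger-attention decoder could use to limit which encoder frames each output token sees.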

Step 3. Train a streaming Transformer

Here, we train a chunk-based streaming Transformer, initialized with an offline Transformer provided by ESPnet. Set

to the path of your offline model.
cd egs/librispeech/asr1

If you want to train a look-ahead based streaming Transformer, set

to False and change the left-window, right-window, dec-left-window, and dec-right-window arguments.

The training log is written to a file that you can monitor with
tail -f exp/streaming_transformer/train.log
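As an illustration of what the left-window and right-window arguments control in a look-ahead based encoder, the sketch below builds a per-layer attention mask and computes how far the look-ahead reaches when layers are stacked. It assumes the windows are counted in encoder frames per layer; the function names and that interpretation are assumptions, not the repo's implementation.

```python
from typing import List


def window_attention_mask(num_frames: int, left_window: int,
                          right_window: int) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True when frame i may attend to
    frame j, i.e. j lies within [i - left_window, i + right_window]."""
    return [
        [i - left_window <= j <= i + right_window for j in range(num_frames)]
        for i in range(num_frames)
    ]


def total_lookahead_frames(right_window: int, num_layers: int) -> int:
    """Each stacked layer extends the future context by `right_window` frames,
    so the overall look-ahead grows linearly with depth."""
    return right_window * num_layers


if __name__ == "__main__":
    for row in window_attention_mask(num_frames=5, left_window=1, right_window=2):
        print("".join("x" if allowed else "." for allowed in row))
    print(total_lookahead_frames(right_window=2, num_layers=12))
```

Because the future context accumulates across layers, even a small per-layer right window can add up to a noticeable overall latency in a deep encoder.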

Step 4. Decoding

Execute the following script to decode the test_clean and test_other sets:

./ num_of_gpu job_per_gpu

Offline Transformer Reference

Regarding the offline Transformer model, please visit here

