ZhengkunTian

RNN-Transducer

A PyTorch implementation of the Transducer model for end-to-end speech recognition.

If you have any questions, please email me. Email: [email protected]cn

Environment

  • pytorch >= 0.4
  • warp-transducer

Preparation

We use Kaldi for data preparation. The training, development, and test sets must each include at least the files text and feats.scp. If you apply CMVN, utt2spk and cmvn.scp are also required. The format of these files is consistent with Kaldi. The format of the vocab file is as follows.

 0
 1
我 2
你 3
...
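
As a rough illustration of how these files fit together, the sketch below parses a vocab file of `token id` lines and a Kaldi-style `text` file of `<utt-id> <transcript>` lines, then maps each transcript to a sequence of ids. It is not taken from this repository: the function names, the special tokens `<blk>` and `<unk>`, the utterance ids, and the choice to drop spaces and encode character by character are all assumptions made for the example.

```python
from typing import Dict, Iterable, List


def load_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'token id' lines into a token -> id mapping."""
    vocab: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        token, idx = parts
        vocab[token] = int(idx)
    return vocab


def read_kaldi_text(lines: Iterable[str]) -> Dict[str, str]:
    """Parse Kaldi 'text' lines of the form '<utt-id> <transcript>'."""
    utts: Dict[str, str] = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip empty lines
        utts[parts[0]] = parts[1] if len(parts) > 1 else ""
    return utts


def encode(transcript: str, vocab: Dict[str, int], unk_id: int) -> List[int]:
    """Map each non-space character to its id (assumed character-level units)."""
    return [vocab.get(ch, unk_id) for ch in transcript if not ch.isspace()]


# Hypothetical data: the special token names and utterance ids are placeholders.
vocab = load_vocab(["<blk> 0", "<unk> 1", "我 2", "你 3"])
utts = read_kaldi_text(["utt001 我 你", "utt002 你 好"])
encoded = {utt: encode(txt, vocab, vocab["<unk>"]) for utt, txt in utts.items()}
```

Characters missing from the vocab fall back to the unknown id here; whether the actual training code does the same is not stated in this README.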

Train

python train.py -config config/aishell.yaml

Eval

python eval.py -config config/aishell.yaml

Experiments

The details of our RNN-Transducer are as follows.

```yaml
model:
    enc:
        type: lstm
        hidden_size: 320
        n_layers: 4
        bidirectional: True
    dec:
        type: lstm
        hidden_size: 512
        n_layers: 1
    embedding_dim: 512
    vocab_size: 4232
    dropout: 0.2
```

All experiments are conducted on AISHELL-1. During decoding, we use beam search with a beam width of 5 in all experiments. A character-level 5-gram language model trained on the training text is integrated into the beam search by shallow fusion.
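
Shallow fusion combines the transducer's log-probability with a weighted language model log-probability when ranking hypotheses. The sketch below shows the idea on complete hypotheses; it is a simplified illustration rather than this repository's decoder. The function names, the toy unigram scorer standing in for the 5-gram model, and the `lm_weight` value are assumptions (the README does not state the weight used), and a real beam search would normally apply the LM score incrementally at each expansion step.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, transducer log-probability)


def fused_score(am_logp: float, lm_logp: float, lm_weight: float) -> float:
    """Shallow fusion: transducer score plus weighted LM score."""
    return am_logp + lm_weight * lm_logp


def rescore_beam(
    candidates: Sequence[Hypothesis],
    lm_logprob: Callable[[Tuple[str, ...]], float],
    lm_weight: float,
    beam_width: int = 5,
) -> List[Hypothesis]:
    """Rank candidates by fused score and keep the best `beam_width`."""
    scored = [
        (tokens, fused_score(am_logp, lm_logprob(tokens), lm_weight))
        for tokens, am_logp in candidates
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:beam_width]


# Toy stand-in for a language model: unigram log-probabilities (hypothetical values).
UNIGRAM: Dict[str, float] = {"你": -1.0, "好": -1.0, "号": -4.0}


def toy_lm(tokens: Tuple[str, ...]) -> float:
    return sum(UNIGRAM.get(tok, -5.0) for tok in tokens)


candidates = [(("你", "好"), -1.0), (("你", "号"), -0.9)]
best = rescore_beam(candidates, toy_lm, lm_weight=0.5, beam_width=5)
```

In this toy case the second candidate has the higher transducer score on its own, but the LM term moves the first candidate to the top of the beam.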

| MODEL | DEV(CER) | TEST(CER) |
|:---:|:---:|:---:|
| RNNT+pretrain+LM | 10.13 | 11.82 |
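
For reference, character error rate (CER) is usually computed as the total character-level edit distance between hypotheses and references, divided by the total number of reference characters. The following is a minimal sketch of that standard definition, reported as a percentage; it is not the scoring script used for the numbers above, and the function names are chosen for this example only.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) over characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_chars


# One deletion over six reference characters.
score = cer(["你好吗", "我很好"], ["你好", "我很好"])
```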

Acknowledgements

Thanks to warp-transducer.
