A No-Recurrence Sequence-to-Sequence Model for Speech Recognition
This is a Speech-Transformer model for end-to-end speech recognition. If you have any questions, please email me ([email protected]).
- PyTorch >= 1.2.0 (<= 1.6.0)
- torchaudio >= 0.3.0
- Speech-Transformer / Conformer
- Tied weights between the embedding and the output softmax layer
- Online Fbank feature extraction
- Reading features in Kaldi or ESPnet format
- TensorBoard-based visualization [will be updated soon]
- Batch beam search with length penalty
- Multiple optimizers and schedulers
- Multiple activation functions in the FFN
- LM shallow fusion
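As a rough illustration of how two of the listed features interact during decoding, a beam-search hypothesis can be scored by adding a weighted LM log-probability (shallow fusion) and normalizing by a length penalty. This is a minimal sketch, not the repository's actual API; `lm_weight` and `alpha` are hypothetical tuning knobs, and the GNMT-style penalty is one common choice:

```python
# Sketch of shallow fusion + GNMT-style length penalty for beam-search scoring.
# lm_weight and alpha are hypothetical hyperparameters, not this repo's API.

def length_penalty(length: int, alpha: float = 0.6) -> float:
    # GNMT length penalty: ((5 + |Y|) / 6) ** alpha
    return ((5.0 + length) / 6.0) ** alpha

def fused_score(am_logprob: float, lm_logprob: float, length: int,
                lm_weight: float = 0.3, alpha: float = 0.6) -> float:
    # Shallow fusion adds a weighted LM log-probability to the
    # acoustic-model score; dividing by the length penalty keeps
    # longer hypotheses competitive with shorter ones.
    return (am_logprob + lm_weight * lm_logprob) / length_penalty(length, alpha)

# Example: score an 8-token hypothesis.
score = fused_score(-10.0, -12.0, length=8)
```

With `alpha = 0` the penalty is 1 and the score reduces to plain shallow fusion.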
If you want to compute features online, make sure you have a wav.scp file, e.g.:

```
BAC009S0764W0139 /data/aishell/wav/BAC009S0764W0139.wav
```
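For reference, each wav.scp line maps an utterance ID to a wav path. Parsing it is a one-liner per line; the helper below (`load_wav_scp` is a hypothetical name, not part of this repository) sketches the idea:

```python
# Minimal sketch: parse a Kaldi-style wav.scp into {utt_id: wav_path}.
# load_wav_scp is a hypothetical helper, not part of this repository.

def load_wav_scp(text: str) -> dict:
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first field is the utterance ID; the rest is the wav path.
        utt_id, path = line.split(maxsplit=1)
        table[utt_id] = path
    return table

scp = "BAC009S0764W0139 /data/aishell/wav/BAC009S0764W0139.wav\n"
print(load_wav_scp(scp))
# → {'BAC009S0764W0139': '/data/aishell/wav/BAC009S0764W0139.wav'}
```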
python run.py -c egs/aishell/conf/transformer.yaml
python run.py -c egs/aishell/transformer.yaml -n 2 -g 0,1  # multi-GPU training on GPUs 0 and 1
python tools/average.py your_model_expdir 50 59  # average the checkpoints from the 50th to the 59th epoch
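Checkpoint averaging takes an element-wise mean of the parameters across several saved epochs. Stripped of the torch.load/torch.save plumbing, the core is just a running mean over matching keys; the sketch below uses plain float lists instead of tensors for clarity and assumes all checkpoints share identical keys:

```python
# Sketch of checkpoint averaging: element-wise mean over state dicts.
# Real checkpoints hold torch tensors; plain floats are used here for clarity.

def average_checkpoints(states: list) -> dict:
    n = len(states)
    avg = {}
    for key in states[0]:
        # Sum each parameter across all checkpoints, then divide by the count.
        total = [sum(vals) for vals in zip(*(s[key] for s in states))]
        avg[key] = [v / n for v in total]
    return avg

ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
print(average_checkpoints(ckpts))
# → {'w': [2.0, 3.0]}
```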
python eval.py -m model.pt
On AISHELL-1, our model achieves a CER of 6.7% without CMVN, an external LM, or joint CTC training, which is better than the 7.4% of the Kaldi chain model.
OpenTransformer refers to ESPnet.