Listen, Attend and Spell

A PyTorch implementation of Listen, Attend and Spell (LAS) [1], an end-to-end automatic speech recognition framework that directly converts acoustic features into a character sequence using a single neural network.
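
For orientation, here is a minimal PyTorch sketch of the two halves of LAS: a pyramidal BiLSTM listener that downsamples the acoustic frames, and an attention-based speller that emits one character per step. The layer sizes, names, and the dot-product attention are illustrative assumptions, not the exact code in this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Listener(nn.Module):
    """Pyramidal BiLSTM encoder: halves the time resolution at each layer above the first."""
    def __init__(self, input_dim=40, hidden_dim=256, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # After the first layer, consecutive frame pairs are concatenated,
            # so the input width is 2 (bidirectional) * 2 (pair) * hidden_dim.
            in_dim = input_dim if i == 0 else hidden_dim * 4
            self.layers.append(
                nn.LSTM(in_dim, hidden_dim, batch_first=True, bidirectional=True))

    def forward(self, x):                      # x: (batch, time, feat_dim)
        for i, lstm in enumerate(self.layers):
            if i > 0:                          # pyramid step: merge frame pairs
                b, t, d = x.shape
                t -= t % 2
                x = x[:, :t].reshape(b, t // 2, d * 2)
            x, _ = lstm(x)
        return x                               # (batch, time // 4, 2 * hidden_dim)

class Speller(nn.Module):
    """Attention decoder that emits one character per step."""
    def __init__(self, vocab_size, enc_dim=512, hidden_dim=512, emb_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTMCell(emb_dim + enc_dim, hidden_dim)
        self.query = nn.Linear(hidden_dim, enc_dim)   # dot-product attention query
        self.out = nn.Linear(hidden_dim + enc_dim, vocab_size)

    def forward(self, enc, tokens):            # enc: (B, T, enc_dim); tokens: (B, L)
        B = enc.size(0)
        h = enc.new_zeros(B, self.rnn.hidden_size)
        c = torch.zeros_like(h)
        context = enc.new_zeros(B, enc.size(2))
        logits = []
        for t in range(tokens.size(1)):        # teacher forcing over the transcript
            emb = self.embed(tokens[:, t])
            h, c = self.rnn(torch.cat([emb, context], dim=1), (h, c))
            scores = torch.bmm(enc, self.query(h).unsqueeze(2)).squeeze(2)  # (B, T)
            attn = F.softmax(scores, dim=1)    # attention weights over encoder frames
            context = torch.bmm(attn.unsqueeze(1), enc).squeeze(1)
            logits.append(self.out(torch.cat([h, context], dim=1)))
        return torch.stack(logits, dim=1)      # (B, L, vocab_size)

# Toy shapes: 2 utterances, 100 frames of 40-dim features, 30-char vocabulary.
enc = Listener()(torch.randn(2, 100, 40))                               # (2, 25, 512)
out = Speller(vocab_size=30)(enc, torch.zeros(2, 5, dtype=torch.long))  # (2, 5, 30)
```

Training minimizes cross-entropy between these logits and the reference characters, which is what makes the model end-to-end: one network, no separate acoustic, pronunciation, or language models.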

Install

  • Python3 (Anaconda recommended)
  • PyTorch 0.4.1+
  • Kaldi (used only for feature extraction)
  • pip install -r requirements.txt
  • cd tools; make KALDI=/path/to/kaldi
  • If you want to run egs/aishell/run.sh, download the aishell dataset (it is freely available).

Usage

  1. $ cd egs/aishell and modify the aishell data path in run.sh to your own path.
  2. $ bash run.sh, that's all!

You can change a hyper-parameter with

```bash
$ bash run.sh --parameter_name parameter_value
```

e.g., $ bash run.sh --stage 3. The available parameter names are the variables defined in egs/aishell/run.sh before the line . utils/parse_options.sh.

More detail

```bash
$ cd egs/aishell/
$ . ./path.sh
```

Train

```bash
$ train.py -h
```

Decode

```bash
$ recognize.py -h
```
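
As a rough illustration of the beam search commonly used to decode attention models like LAS (see recognize.py -h for the actual options, which are not reproduced here), below is a self-contained Python sketch. The step function is a hypothetical stand-in for one decoder step, mapping a token prefix to next-token log-probabilities; it is not this repo's API.

```python
import math

def beam_search(step, sos, eos, beam=5, max_len=20):
    """step(prefix) -> {next_token: log_prob}; returns the best (tokens, score)."""
    beams = [([sos], 0.0)]                    # live hypotheses
    finished = []                             # hypotheses that emitted eos
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, lp in step(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:                         # every surviving hypothesis ended
            break
    return max(finished + beams, key=lambda c: c[1])

# Toy scorer: always scores token 1 at 0.6 and the end token 2 at 0.4.
print(beam_search(lambda prefix: {1: math.log(0.6), 2: math.log(0.4)},
                  sos=0, eos=2, beam=2, max_len=3))   # -> ([0, 2], log 0.4)
```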

Workflow

Workflow of egs/aishell/run.sh:

  • Stage 0: Data Preparation
  • Stage 1: Feature Generation
  • Stage 2: Dictionary and Json Data Preparation
  • Stage 3: Network Training
  • Stage 4: Decoding

Visualize loss

If you want to visualize your loss, you can use visdom to do that:

  • Open a new terminal on your remote server (tmux recommended) and run $ visdom.
  • Open a new terminal and run $ bash run.sh --visdom 1 --visdom_id "<visdom_id>" or $ train.py ... --visdom 1 --visdom_id "<visdom_id>".
  • Open your browser and go to <server ip>:8097, e.g., 127.0.0.1:8097.
  • On the visdom page, choose <visdom_id> in Environment to see your loss.
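
For reference, this is roughly how a training loop pushes loss values to a running visdom server; the environment and window names below are illustrative assumptions, not the ones run.sh uses.

```python
import numpy as np
import visdom

vis = visdom.Visdom(env="las_demo")   # talks to the visdom server on port 8097

for epoch, loss in enumerate([2.3, 1.7, 1.2, 0.9], start=1):   # dummy loss values
    vis.line(X=np.array([epoch]), Y=np.array([loss]),
             win="train_loss",
             update="append" if epoch > 1 else None,   # first call creates the window
             opts={"title": "training loss", "xlabel": "epoch", "ylabel": "loss"})
```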

Results

| Model | CER (%) | Config |
| :---: | :-----: | :----: |
| LSTMP | 9.85 | 4x(1024-512) |
| Listen, Attend and Spell | 13.2 | See egs/aishell/run.sh |
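
CER (character error rate) is the character-level edit distance between hypothesis and reference, normalized by the reference length. A minimal sketch of the computation, not the scoring script this repo actually uses:

```python
def cer(ref: str, hyp: str) -> float:
    """Levenshtein distance from hyp to ref, divided by the reference length."""
    prev = list(range(len(hyp) + 1))          # row for the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]                            # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution (or match)
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(cer("listen attend", "lisen attend"))  # 1 error / 13 chars ~= 0.077
```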

Reference

[1] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in ICASSP 2016. (https://arxiv.org/abs/1508.01211v2)
