ID-CNN-CWS

by hankcs


Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation".

License: GNU General Public License v3.0. Latest release: v1.0.



It implements the following 4 models for Chinese word segmentation (CWS):
  • Bi-LSTM
  • Bi-LSTM-CRF
  • ID-CNN
  • ID-CNN-CRF
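
The ID-CNN models are built on dilated convolutions. As a rough illustration of the general idea, and not the repository's implementation, the sketch below applies a zero-padded 1D dilated convolution to a plain list and computes the receptive field of a stack of such layers. The kernel width of 3 and the dilation rates 1, 2, 4 are assumptions chosen for the example; the function names are hypothetical.

```python
def receptive_field(kernel_width, dilations):
    """Input positions visible to one output of a stack of dilated convolutions."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


def dilated_conv1d(xs, weights, dilation):
    """Zero-padded 1D dilated convolution (cross-correlation) over a list of numbers.

    The output has the same length as the input; kernel taps are spaced
    `dilation` positions apart and centered on the current position.
    """
    center = len(weights) // 2
    out = []
    for i in range(len(xs)):
        total = 0.0
        for j, w in enumerate(weights):
            pos = i + (j - center) * dilation
            if 0 <= pos < len(xs):  # positions outside the sequence count as zero
                total += w * xs[pos]
        out.append(total)
    return out


# Assumed example configuration: kernel width 3, dilation rates 1, 2, 4.
print(receptive_field(3, [1, 2, 4]))  # 15
print(receptive_field(3, [1, 1, 1]))  # 7 (same depth without dilation)
```

With the same number of layers, the dilated stack sees 15 characters of context instead of 7, which is why dilation is attractive for sequence labeling.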

Dependencies

  • Python >= 3.6
  • TensorFlow >= 1.2

Both CPU and GPU are supported. GPU training is 10 times faster.
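
The version requirements above can be checked before running anything. The following is a minimal sketch; the helper names (`python_ok`, `parse_version`, `tensorflow_ok`) are hypothetical and not part of this repository. It only relies on `sys.version_info` and the `__version__` attribute that TensorFlow exposes.

```python
import importlib
import sys


def python_ok(version_info=sys.version_info, minimum=(3, 6)):
    """True if the interpreter version meets the stated Python minimum."""
    return tuple(version_info[:2]) >= minimum


def parse_version(text):
    """Turn a string such as '1.4.0-rc1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def tensorflow_ok(minimum=(1, 2)):
    """True if TensorFlow is importable and meets the stated minimum version."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return False
    return parse_version(tf.__version__)[:2] >= minimum


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("TensorFlow OK:", tensorflow_ok())
```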

Preparation

Run the following script to convert the corpora to TensorFlow datasets.

$ ./scripts/make.sh

Train and Test

Quick Start

$ ./scripts/run.sh $dataset $model
  • $dataset can be pku, msr, asSC or cityuSC.
  • $model can be cnn or bilstm.

For example:

$ ./scripts/run.sh pku cnn

It will train a cnn model on the pku dataset, then evaluate its performance on the test set.
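
To run every dataset and model combination listed above, the commands can be generated instead of typed by hand. This is a small sketch under the assumption that it is started from the repository root, where ./scripts/run.sh lives; `build_commands` and `run_all` are hypothetical helpers, not part of the repository. By default it only prints the commands (dry run).

```python
import itertools
import subprocess

DATASETS = ["pku", "msr", "asSC", "cityuSC"]
MODELS = ["cnn", "bilstm"]


def build_commands(datasets=DATASETS, models=MODELS, extra_args=()):
    """Build one ./scripts/run.sh invocation per (dataset, model) pair."""
    return [
        ["./scripts/run.sh", dataset, model, *extra_args]
        for dataset, model in itertools.product(datasets, models)
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands())
```

Passing `dry_run=False` to `run_all` actually launches the training runs one after another, which can take a long time on CPU.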

CRF Layer

To enable the CRF layer, simply append --viterbi to your command, e.g.
$ ./scripts/run.sh pku cnn --viterbi
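
The flag name suggests that the CRF layer decodes tag sequences with the Viterbi algorithm. How the repository implements this is not shown here; the following is a generic, self-contained sketch of Viterbi decoding over per-position tag scores and a tag-to-tag transition matrix. The function name, the two-tag example, and all scores are made up for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for a linear-chain model.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return [], 0.0
    num_tags = len(emissions[0])
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(num_tags):
            # best previous tag for arriving at tag j
            best_prev = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emissions[t][j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # follow the backpointers from the best final tag
    last = max(range(num_tags), key=lambda j: scores[j])
    best_score = scores[last]
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best_score


# Made-up scores for a 3-character input and 2 tags.
emissions = [[1, 0], [0, 1], [1, 0]]
free = [[0, 0], [0, 0]]        # no cost for switching tags
sticky = [[0, -5], [-5, 0]]    # switching tags is heavily penalized
print(viterbi_decode(emissions, free))    # ([0, 1, 0], 3)
print(viterbi_decode(emissions, sticky))  # ([0, 0, 0], 2)
```

The second call shows what transition scores add over per-position decisions: the locally best tag at the middle position is overridden because switching tags costs more than it gains.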

Accuracy

[Image: 2017-10-20_13-25-11]

Speed

[Image: 2017-10-20_11-44-42]

Acknowledgments

  • Corpora are from SIGHAN05, converted to Simplified Chinese via HanLP. Note that the SIGHAN datasets should only be used for research purposes.
  • Model implementations adopted from https://github.com/iesl/dilated-cnn-ner by Emma Strubell.
