hankcs

Description

Source codes and corpora of paper "Iterated Dilated Convolutions for Chinese Word Segmentation"

129 Stars 41 Forks GNU General Public License v3.0 9 Commits 8 Opened issues


ID-CNN-CWS

Source codes and corpora of paper "Iterated Dilated Convolutions for Chinese Word Segmentation".

[Image: 2017-10-20_13-23-31]

It implements the following four models for Chinese word segmentation (CWS):
  • Bi-LSTM
  • Bi-LSTM-CRF
  • ID-CNN
  • ID-CNN-CRF
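
As a rough illustration of the idea behind the ID-CNN models, the sketch below applies a stack of 1-D dilated convolutions and computes how much context each output position sees when the same stack is applied repeatedly. It is pure Python, not the repository's TensorFlow code; the kernel width of 3, the dilations 1, 2, 4, and the function names are assumptions chosen for illustration.

```python
def dilated_conv1d(xs, kernel, dilation):
    """1-D dilated convolution with zero padding (output length == input length)."""
    center = len(kernel) // 2
    out = []
    for t in range(len(xs)):
        total = 0.0
        for j, weight in enumerate(kernel):
            # Neighbouring taps are `dilation` positions apart.
            idx = t + (j - center) * dilation
            if 0 <= idx < len(xs):
                total += weight * xs[idx]
        out.append(total)
    return out


def receptive_field(kernel_width, dilations, iterations=1):
    """Input positions seen by one output after `iterations` passes of the stack."""
    return 1 + iterations * sum((kernel_width - 1) * d for d in dilations)


if __name__ == "__main__":
    dilations = [1, 2, 4]  # illustrative values, not taken from the paper
    # Push a single impulse through the stack and count how far it spreads.
    signal = [0.0] * 31
    signal[15] = 1.0
    for d in dilations:
        signal = dilated_conv1d(signal, [1.0, 1.0, 1.0], d)
    print(sum(1 for v in signal if v != 0))           # 15
    print(receptive_field(3, dilations))               # 15
    print(receptive_field(3, dilations, iterations=4)) # 57
```

The context width grows with the sum of the dilations, so a few layers can cover a wide span of characters, and reapplying the same stack widens it further without adding new layers.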

Dependencies

  • Python >= 3.6
  • TensorFlow >= 1.2

Both CPU and GPU are supported. GPU training is 10 times faster than CPU training.

Preparation

Run the following script to convert the corpora to TensorFlow datasets.

$ ./scripts/make.sh
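
The conversion script itself is not reproduced here. For orientation, word segmentation is commonly cast as per-character tagging, and the sketch below shows one such encoding. It assumes whitespace-separated words and a BMES tag set (Begin, Middle, End, Single); the repository's actual label set and file format may differ, and both function names are hypothetical.

```python
def words_to_tags(words):
    """Turn a segmented sentence (list of words) into characters and BMES tags."""
    chars, tags = [], []
    for word in words:
        if not word:
            continue
        chars.extend(word)
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return chars, tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and BMES tags (inverse of words_to_tags)."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    sentence = "我 爱 北京 天安门".split()
    chars, tags = words_to_tags(sentence)
    print("".join(chars), " ".join(tags))
    print(tags_to_words(chars, tags) == sentence)
```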

Train and Test

Quick Start

$ ./scripts/run.sh $dataset $model
  • $dataset can be pku, msr, asSC or cityuSC.
  • $model can be cnn or bilstm.

For example:

$ ./scripts/run.sh pku cnn

This trains a cnn model on the pku dataset, then evaluates its performance on the test set.
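
To run every dataset and model combination, the command lines can be generated rather than typed by hand. The sketch below only builds and prints the argument lists and does not launch training; the helper name build_commands is hypothetical, while the script path and the accepted values are the ones listed above.

```python
from itertools import product

DATASETS = ("pku", "msr", "asSC", "cityuSC")
MODELS = ("cnn", "bilstm")


def build_commands(datasets=DATASETS, models=MODELS, script="./scripts/run.sh"):
    """Return one argv list per (dataset, model) pair, rejecting unknown values."""
    for dataset in datasets:
        if dataset not in DATASETS:
            raise ValueError("unknown dataset: %s" % dataset)
    for model in models:
        if model not in MODELS:
            raise ValueError("unknown model: %s" % model)
    return [[script, dataset, model] for dataset, model in product(datasets, models)]


if __name__ == "__main__":
    for argv in build_commands():
        print(" ".join(argv))
```

Each returned list can be passed to subprocess.run(argv, check=True) from the repository root to start the corresponding run.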

CRF Layer

To enable the CRF layer, append --viterbi to your command, e.g.

$ ./scripts/run.sh pku cnn --viterbi
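
As the flag name suggests, a CRF layer is usually decoded with the Viterbi algorithm, which picks the highest-scoring tag sequence as a whole instead of the best tag per character. The following is a minimal pure-Python sketch of that decoding step, not the repository's TensorFlow implementation. The BMES tag set, the hand-written emission scores, and the hard-coded transition table are assumptions for illustration; in a trained CRF the transition scores are learned.

```python
TAGS = ["B", "M", "E", "S"]
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
FORBIDDEN = -1e9  # large negative score standing in for "not allowed"


def make_transitions():
    """transitions[i][j] is the score of moving from tag i to tag j."""
    return [[0.0 if (a, b) in ALLOWED else FORBIDDEN for b in TAGS] for a in TAGS]


def viterbi_decode(emissions, transitions):
    """Return (best tag index path, its score) for per-position emission scores."""
    num_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for ending in tag j at this position.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backpointers.append(pointers)
        score = new_score
    last = max(range(num_tags), key=lambda i: score[i])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    # Per-character argmax would give B, S, S, which is not a valid sequence.
    emissions = [[2.0, 0.0, 0.0, 1.0],
                 [0.1, 0.2, 1.0, 1.5],
                 [0.5, 0.0, 0.3, 1.0]]
    path, score = viterbi_decode(emissions, make_transitions())
    print([TAGS[i] for i in path], score)  # ['B', 'E', 'S'] 4.0
```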

Accuracy

[Image: 2017-10-20_13-25-11]

Speed

[Image: 2017-10-20_11-44-42]

Acknowledgments

  • Corpora are from SIGHAN05, converted to Simplified Chinese via HanLP. Note that the SIGHAN datasets should only be used for research purposes.
  • Model implementations are adapted from https://github.com/iesl/dilated-cnn-ner by Emma Strubell.
