Structured-Self-Attentive-Sentence-Embedding

An open-source implementation of the paper "A Structured Self-Attentive Sentence Embedding" (Lin et al., ICLR 2017), published by IBM and MILA: https://arxiv.org/abs/1703.03130

Requirements

PyTorch: http://pytorch.org/

spaCy: https://spacy.io/

Please refer to https://github.com/pytorch/examples/tree/master/snli for information on obtaining a GloVe model in PyTorch format (.pt). The model should be a tuple (dict, torch.FloatTensor, int): the first element maps each word to its index, the second is a word_count x dim tensor containing the word embeddings, and the third is the dimension of the embeddings.
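If you start from a plain-text GloVe file, a minimal conversion sketch under the format described above could look like the following. The file names are assumptions, not part of this repository:

```python
# Minimal sketch (assumed file names): convert a plain-text GloVe file into
# the (dict, torch.FloatTensor, int) tuple described above.
import torch

def convert_glove(txt_path="glove.42B.300d.txt", out_path="glove.42B.300d.pt"):
    word2idx, vectors = {}, []
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word2idx[parts[0]] = len(word2idx)          # word -> index
            vectors.append([float(v) for v in parts[1:]])
    embeddings = torch.FloatTensor(vectors)             # word_count x dim
    torch.save((word2idx, embeddings, embeddings.size(1)), out_path)

if __name__ == "__main__":
    convert_glove()
```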

Usage

Tokenization

```bash
python tokenizer-yelp.py --input [Yelp dataset] --output [output path, will be a json file] --dict [output dictionary path, will be a json file]
```
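For reference, the tokenization itself relies on spaCy. A hypothetical sketch of that step is shown below; the field names ("text", "stars") and the JSON-lines input format are assumptions about the Yelp dump, not the actual schema of tokenizer-yelp.py:

```python
# Hypothetical sketch of spaCy-based tokenization of Yelp reviews.
# tokenizer-yelp.py defines its own flags and output schema; the field
# names below ("text", "stars") are assumptions for illustration only.
import json
import spacy

nlp = spacy.blank("en")  # a tokenizer-only pipeline is sufficient here

def tokenize_reviews(input_path, output_path):
    tokenized = []
    with open(input_path, encoding="utf-8") as fin:
        for line in fin:                          # Yelp dumps are JSON lines
            review = json.loads(line)
            doc = nlp(review["text"])
            tokenized.append({"label": review["stars"],
                              "text": [tok.text.lower() for tok in doc]})
    with open(output_path, "w", encoding="utf-8") as fout:
        json.dump(tokenized, fout)
```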

Training Model

```bash
python train.py \
--emsize [word embedding size, default 300] \
--nhid [hidden layer size, default 300] \
--nlayers [number of hidden layers in the Bi-LSTM, default 2] \
--attention-unit [number of attention units, d_a in the paper, default 350] \
--attention-hops [number of attention hops, r in the paper, default 1] \
--dropout [dropout ratio, default 0.5] \
--nfc [hidden layer size of the MLP in the classifier, default 512] \
--lr [learning rate, default 0.001] \
--epochs [number of training epochs, default 40] \
--seed [random seed for reproducibility, default 1111] \
--log-interval [interval for reporting training loss, default 200] \
--batch-size [batch size during training, default 32] \
--optimizer [type of optimizer, default Adam] \
--penalization-coeff [coefficient of the Frobenius norm penalization term, default 1.0] \
--class-number [number of classes in the final classification step] \
--save [path to save the model] \
--dictionary [location of the dictionary generated by the tokenizer] \
--word-vector [location of the initial word vectors, e.g. GloVe, should be a PyTorch .pt model] \
--train-data [location of the training data, in the same format as the tokenizer output] \
--val-data [location of the development set] \
--test-data [location of the test set] \
--cuda [train on GPU; omit this flag to train on CPU]
```
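For orientation, the quantities behind --attention-unit (d_a), --attention-hops (r), and --penalization-coeff can be sketched as follows. This is an illustrative rendering of the attention mechanism from Lin et al. (2017), not the repository's actual model code:

```python
# Illustrative sketch of the structured self-attention from Lin et al. (2017).
# Names follow the paper (d_a = attention units, r = attention hops);
# this is not this repository's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuredSelfAttention(nn.Module):
    def __init__(self, nhid, attention_unit=350, attention_hops=1):
        super().__init__()
        self.ws1 = nn.Linear(2 * nhid, attention_unit, bias=False)    # W_s1
        self.ws2 = nn.Linear(attention_unit, attention_hops, bias=False)  # W_s2

    def forward(self, H):
        # H: (batch, seq_len, 2 * nhid) -- Bi-LSTM hidden states
        A = F.softmax(self.ws2(torch.tanh(self.ws1(H))), dim=1)  # (batch, seq_len, r)
        A = A.transpose(1, 2)                                    # (batch, r, seq_len)
        M = torch.bmm(A, H)                                      # sentence embedding matrix M = AH
        return M, A

def frobenius_penalty(A):
    # ||A A^T - I||_F^2 averaged over the batch; scaled by --penalization-coeff
    # and added to the cross-entropy loss during training.
    r = A.size(1)
    identity = torch.eye(r, device=A.device).unsqueeze(0)
    diff = torch.bmm(A, A.transpose(1, 2)) - identity
    return (diff ** 2).sum(dim=(1, 2)).mean()
```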

Differences between the paper and our implementation

  1. For faster, Python-based tokenization, we use spaCy instead of the Stanford Tokenizer (https://nlp.stanford.edu/software/tokenizer.shtml).

  2. To speed up training, we manually crop Yelp reviews to a maximum length of 500.

Example Experimental Command and Result

We followed Lin et al. (2017) to generate the dataset and obtained the following result:

```bash
python train.py --train-data "data/train.json" --val-data "data/dev.json" --test-data "data/test.json" --cuda --emsize 300 --nhid 300 --nfc 300 --dropout 0.5 --attention-unit 350 --epochs 10 --lr 0.001 --clip 0.5 --dictionary "data/Yelp/data/dict.json" --word-vector "data/GloVe/glove.42B.300d.pt" --save "models/model-medium.pt" --batch-size 50 --class-number 5 --optimizer Adam --attention-hops 4 --penalization-coeff 1.0 --log-interval 100
# test loss (cross entropy loss, without the Frobenius norm penalization): 0.7544
# test accuracy: 0.6690
```
