TextFooler

A Model for Natural Language Attack on Text Classification and Inference

This is the source code for the paper: Jin, Di, et al. "Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment." arXiv preprint arXiv:1907.11932 (2019). If you use the code, please cite the paper:

@article{jin2019bert,
  title={Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment},
  author={Jin, Di and Jin, Zhijing and Zhou, Joey Tianyi and Szolovits, Peter},
  journal={arXiv preprint arXiv:1907.11932},
  year={2019}
}

Data

Our 7 datasets are here.

Prerequisites

Required packages are listed in the requirements.txt file:

pip install -r requirements.txt

How to use

  • Run the following code to install the esim package:
cd ESIM
python setup.py install
cd ..
  • (Optional) Pre-compute the cosine similarity scores between word pairs based on the counter-fitting word embeddings; a sketch of this computation is shown after the attack commands below:
python comp_cos_sim_mat.py [PATH_TO_COUNTER_FITTING_WORD_EMBEDDINGS]
  • Run the following code to generate the adversaries for text classification:
python attack_classification.py

For natural language inference:

python attack_nli.py
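
For intuition, here is a minimal sketch of what the optional pre-computation step produces: a pairwise cosine similarity matrix over the counter-fitting embedding vocabulary, saved so the attack scripts can load it instead of recomputing similarities. The file names and format assumptions here are illustrative only; the actual logic lives in comp_cos_sim_mat.py.

# Illustrative sketch only; mirrors the idea behind comp_cos_sim_mat.py.
# Assumes the embeddings file has one "word v1 v2 ..." entry per line
# (the usual counter-fitted-vectors.txt format).
import numpy as np

words, vectors = [], []
with open("counter-fitted-vectors.txt") as f:   # path is an assumption
    for line in f:
        parts = line.split()
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])

emb = np.array(vectors)                          # shape: (vocab_size, dim)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows
cos_sim = emb @ emb.T                            # pairwise cosine similarities
# Note: this matrix is (vocab_size x vocab_size), so it can be several GB.
np.save("cos_sim_counter_fitting.npy", cos_sim)  # hypothetical output name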

Example run commands for these two files are in run_attack_classification.py and run_attack_nli.py. Here we explain each required argument in detail (a representative invocation follows this list):

  • --dataset_path: The path to the dataset. We put the 1000 examples for each dataset we used in the paper in the folder data.
  • --target_model: Name of the target model, such as bert.
  • --target_model_path: The path to the trained parameters of the target model. For ease of replication, we have shared the trained BERT model parameters, the trained LSTM model parameters, and the trained CNN model parameters for each dataset we used.
  • --counter_fitting_embeddings_path: The path to the counter-fitting word embeddings.
  • --counter_fitting_cos_sim_path: Optional. If given, the pre-computed cosine similarity scores based on the counter-fitting word embeddings will be loaded to save time; otherwise they will be computed on the fly.
  • --USE_cache_path: The path at which to cache the Universal Sentence Encoder (USE) model file (it is downloaded automatically if this path is empty).
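
For concreteness, here is an example classification-attack invocation assembled from the flags above. All path values and the dataset name are placeholders, not the actual values from the paper; see run_attack_classification.py for the real settings.

python attack_classification.py \
    --dataset_path data/yelp \
    --target_model bert \
    --target_model_path models/bert/yelp \
    --counter_fitting_embeddings_path counter-fitted-vectors.txt \
    --counter_fitting_cos_sim_path cos_sim_counter_fitting.npy \
    --USE_cache_path tf_cache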

Two more things to share with you:

  1. If you want to replicate our experiments for training the target models, we have shared the seven processed datasets we used!

  2. If you want to use our generated adversarial examples against the benchmark data directly, they are available here.
