Semantics-aware BERT for Language Understanding (AAAI 2020)


SemBERT: Semantics-aware BERT for Language Understanding

(2020/10/07) Update: Tips for possible issues

1) SRL prediction mismatches the provided samples

POS tags differ slightly between spaCy versions, which changes the verbs that are detected. SemBERT used spacy==2.0.18 to obtain the verbs.

Refer to allenai/allennlp#3418, cooelf/SemBERT#12 (CHN).

2) SRL is not a registered name for Model.

Please try pip install --pre allennlp-models

3) Issues about AllenNLP

If you encounter errors about classes or variables in AllenNLP, please try an older version, e.g., 0.8.1.

Our experiment environment for reference:

  • Python 3.6+
  • PyTorch (1.0.0)
  • AllenNLP (0.8.1)
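A quick way to compare a local installation against the versions mentioned above is sketched below. This is an illustration rather than part of the repo: the helper name `version_mismatches` is hypothetical, the distribution names (`spacy`, `torch`, `allennlp`) are assumed to be the usual PyPI names, and `importlib.metadata` requires Python 3.8 or newer. It uses exact string comparison, so a build suffix such as `+cu118` is reported as a mismatch.

```python
from importlib import metadata

# Versions taken from the reference environment and the spaCy tip above.
PINNED = {"spacy": "2.0.18", "torch": "1.0.0", "allennlp": "0.8.1"}


def version_mismatches(pinned, lookup=metadata.version):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[package] = (installed, expected)
    return mismatches


if __name__ == "__main__":
    for package, (installed, expected) in version_mismatches(PINNED).items():
        print(f"{package}: installed {installed}, reference environment used {expected}")
```

A mismatch does not necessarily mean the code will fail; it only flags the packages worth checking first when SRL output or AllenNLP imports behave unexpectedly.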


Code for the paper Semantics-aware BERT for Language Understanding (AAAI 2020)





GLUE data can be downloaded by running this script and unpacked into the directory glue_data. We provide an example data sample in glue_data/MNLI to show how SemBERT works.


This repo shows an example implementation of SemBERT for NLU tasks. We used the pre-trained uncased BERT models, so do not forget to pass the parameter --do_lower_case.


The example scripts are as follows:

Train a model

Note: please replace the sample data with labeled data (use our labeled data or annotate your data following the instructions below).

python \
--data_dir glue_data/SNLI/ \
--task_name snli \
--train_batch_size 32 \
--max_seq_length 128 \
--bert_model bert-wwm-uncased \
--learning_rate 2e-5 \
--num_train_epochs 2 \
--do_train \
--do_eval \
--do_lower_case \
--max_num_aspect 3 \
--output_dir glue/snli_model_dir


Two scripts can be used for evaluation; the second is simplified for ease of use.

The major difference is that the first takes labeled data as input, while the second integrates real-time semantic role labeling, so it works on the original raw data.

Evaluation using labeled data

python \
--data_dir glue_data/SNLI/ \
--task_name snli \
--eval_batch_size 128 \
--max_seq_length 128 \
--bert_model bert-wwm-uncased \
--do_eval \
--do_lower_case \
--max_num_aspect 3 \
--output_dir glue/snli_model_dir

Evaluation using raw data (with real-time semantic role labeling)

Our trained SNLI model (reaching 91.9% test accuracy) can be accessed here.

To use our trained SNLI model, please place the SNLI model and the SRL model in snli_model_dir and srl_model_dir, respectively.

As shown in our example SNLI model, the snli_model_dir folder should contain three files:

vocab.txt and bert_config.json from the BERT model folder that are used for training your model;

pytorch_model.bin that is the trained SNLI model.
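Before running evaluation, it can help to confirm that the folder really holds the three files listed above. The following is a minimal sketch; the helper name `missing_model_files` is hypothetical and not part of the repo.

```python
from pathlib import Path

# File names listed above for the trained SNLI model folder.
REQUIRED_FILES = ("vocab.txt", "bert_config.json", "pytorch_model.bin")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    folder = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

For example, `missing_model_files("snli_model_dir")` returns an empty list when the folder is complete, and otherwise the names that still need to be copied in.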

python \
--data_dir /share03/zhangzs/glue_data/SNLI \
--task_name snli \
--eval_batch_size 128 \
--max_seq_length 128 \
--max_num_aspect 3 \
--do_eval \
--do_lower_case \
--bert_model snli_model_dir \
--output_dir snli_model_dir \
--tagger_path srl_model_dir

For prediction, enable the prediction flag in either script. The output pred file can be directly used for GLUE online submission and evaluation.

Data annotation (Semantic role labeling)

We provide two semantic labeling methods:

  • online: each word sequence is passed to the labeling module to obtain the tags, which can be used for online prediction. This is time-consuming for a large corpus. See tag_model/

    To use the online method, specify the tagger path parameter in the script (the raw-data evaluation command above passes it as --tagger_path).
  • offline: the current method, which pre-processes the datasets and saves the annotations for later loading during training and evaluation. See tag_model/
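The offline idea (tag once, save, reload later) can be sketched as follows. Everything here is an assumption made for illustration: `toy_tagger` is a hypothetical stand-in for a real SRL model, and the JSON-lines cache format is not the format used by the scripts in tag_model/.

```python
import json


def toy_tagger(tokens):
    """Hypothetical stand-in for an SRL model: tags every token as 'O'."""
    return ["O"] * len(tokens)


def annotate_offline(sentences, tagger, cache_path):
    """Tag each sentence once and store the results as JSON lines."""
    with open(cache_path, "w", encoding="utf-8") as handle:
        for sentence in sentences:
            tokens = sentence.split()
            record = {"tokens": tokens, "tags": tagger(tokens)}
            handle.write(json.dumps(record) + "\n")


def load_annotations(cache_path):
    """Read the cached records back for training or evaluation."""
    with open(cache_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

The online method would instead call the tagger on each input at prediction time, which avoids the cache file but repeats the labeling cost on every run.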

Our labeled data can be downloaded here for quick start.

Google Drive:

Baidu Cloud:

Link Password:sl7l

Note that this repo is based on the offline version, so the column indices in the data processor differ slightly from the original. They look like this:

text_a = line[-3]
text_b = line[-2]
label = line[-1]

If you use the original data instead of the data we preprocessed with tag_model/, please modify the indices according to the dataset structure.
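To make the indexing concrete, here is a small sketch that reads tab-separated rows and takes the two texts and the label from the last three columns, as in the snippet above. The function name `read_examples` and the sample row are made up for illustration; real dataset files may have a header row or a different column order, in which case the indices need adjusting.

```python
import csv
import io


def read_examples(tsv_text):
    """Yield (text_a, text_b, label) from the last three columns of each row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for line in reader:
        if len(line) < 3:
            continue  # skip rows too short to hold both texts and a label
        yield line[-3], line[-2], line[-1]


# Hypothetical row: an id column followed by premise, hypothesis, and label.
sample = "0\tA man sleeps.\tA person rests.\tentailment\n"
examples = list(read_examples(sample))
```

Indexing from the end of the row keeps the code unchanged when extra columns are added at the front, but it breaks if columns are appended after the label.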

SRL model

The SRL model used in this implementation is the ELMo-based SRL model from AllenNLP.

A newer BERT-based SRL model is also available and is a good alternative.


Please kindly cite this paper in your publications if it helps your research:

    title={Semantics-aware {BERT} for language understanding},
    author={Zhang, Zhuosheng and Wu, Yuwei and Zhao, Hai and Li, Zuchao and Zhang, Shuailiang and Zhou, Xi and Zhou, Xiang},
    booktitle={the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-2020)},
