


MPNet: Masked and Permuted Pre-training for Language Understanding


MPNet: Masked and Permuted Pre-training for Language Understanding, by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu, is a novel pre-training method for language understanding tasks. It addresses the limitations of MLM (masked language modeling) in BERT and PLM (permuted language modeling) in XLNet, and achieves better accuracy.
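To make the contrast concrete, here is a toy sketch (a hypothetical helper, not the paper's actual implementation) of the input layout MPNet trains on: positions are permuted, the model conditions on the non-predicted part as in PLM, and `[MASK]` placeholders for the predicted positions preserve full-length information as in MLM:

```python
import random

def mpnet_split(tokens, pred_ratio=0.15, seed=0):
    """Toy sketch (not the real implementation): permute positions, keep the
    first part as visible context, and predict the rest; [MASK] placeholders
    let the model see the full sentence length, unlike plain PLM."""
    rng = random.Random(seed)
    positions = list(range(len(tokens)))
    rng.shuffle(positions)                       # random permutation of positions
    n_pred = max(1, int(len(tokens) * pred_ratio))
    non_pred, pred = positions[:-n_pred], positions[-n_pred:]
    visible = [tokens[i] for i in non_pred]      # context the model conditions on
    targets = [tokens[i] for i in pred]          # tokens to be predicted
    placeholders = ["[MASK]"] * n_pred           # keep position/length information
    return visible + placeholders, targets

inp, targets = mpnet_split("the quick brown fox jumps over the lazy dog".split())
print(len(inp), len(targets))  # 9 1 -> the input keeps the original length
```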

News: We have updated the pre-trained models.

Supported Features

  • A unified view and implementation of several pre-training models including BERT, XLNet, MPNet, etc.
  • Code for pre-training and fine-tuning on a variety of language understanding tasks (GLUE, SQuAD, RACE, etc.).


We implement MPNet and this pre-training toolkit on top of the fairseq codebase. Install as follows:

pip install --editable pretraining/
pip install pytorch_transformers==1.0.0 transformers scipy scikit-learn

Pre-training MPNet

Our model is pre-trained with the BERT dictionary, so you first need to

pip install transformers

to use the BERT tokenizer. We provide a tokenization script and a dictionary file (`MPNet/dict.txt`) to tokenize your corpus; you can modify the script if you want to use another tokenizer (such as RoBERTa's).
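To make the expected output concrete, here is a minimal pure-Python sketch (with a hypothetical toy vocabulary) of the greedy WordPiece algorithm the BERT tokenizer uses; each raw line becomes one line of space-separated wordpieces, which is the format the preprocessing step below consumes:

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first WordPiece, simplified for illustration;
    the real script relies on the Hugging Face BERT tokenizer."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        cur = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece   # continuation pieces get a ## prefix
            if piece in vocab:
                cur = piece
                break
            end -= 1                   # shrink until a vocab entry matches
        if cur is None:
            return ["[UNK]"]           # no piece matched: unknown token
        pieces.append(cur)
        start = end
    return pieces

vocab = {"the", "quick", "brown", "fox", "jump", "##s"}
line = "the quick brown fox jumps"
encoded = " ".join(p for w in line.split() for p in wordpiece(w, vocab))
print(encoded)  # the quick brown fox jump ##s
```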

1) Preprocess data

We choose WikiText-103 as a demo. The running script is as follows:


```
for SPLIT in train valid test; do \
    python MPNet/ \
        --inputs wikitext-103-raw/wiki.${SPLIT}.raw \
        --outputs wikitext-103-raw/wiki.${SPLIT}.bpe \
        --workers 60; \
done
```

Then, we need to binarize the data. The command is as follows:

```
fairseq-preprocess \
    --only-source \
    --srcdict MPNet/dict.txt \
    --trainpref wikitext-103-raw/wiki.train.bpe \
    --validpref wikitext-103-raw/wiki.valid.bpe \
    --testpref wikitext-103-raw/wiki.test.bpe \
    --destdir data-bin/wikitext-103 \
    --workers 60
```
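Conceptually (a toy sketch, not fairseq's actual on-disk format), binarization maps each wordpiece to its integer index in the dictionary and packs the ids into a compact binary blob:

```python
import struct

def binarize(lines, dictionary):
    """Toy sketch of what fairseq-preprocess does: look up each token's
    integer id and pack the ids into raw bytes. fairseq's real output is
    more elaborate (paired .bin/.idx files with per-sentence offsets)."""
    ids = []
    for line in lines:
        for tok in line.split():
            ids.append(dictionary.get(tok, dictionary["<unk>"]))
    return struct.pack(f"{len(ids)}i", *ids)

# Hypothetical tiny dictionary just for the demo.
dictionary = {"<unk>": 3, "the": 4, "quick": 5, "fox": 6}
blob = binarize(["the quick fox", "the fox"], dictionary)
print(len(blob))  # 5 ids * 4 bytes each = 20 bytes
```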

2) Pre-train MPNet

Run the command below to pre-train an MPNet model:

```
TOTAL_UPDATES=125000      # Total number of training steps
WARMUP_UPDATES=10000      # Warmup the learning rate over this many updates
PEAK_LR=0.0005            # Peak learning rate, adjust as needed
TOKENS_PER_SAMPLE=512     # Max sequence length
MAX_POSITIONS=512         # Num. positional embeddings (usually same as above)
MAX_SENTENCES=16          # Number of sequences per batch (batch size)
UPDATE_FREQ=16            # Increase the batch size 16x
DATA_DIR=data-bin/wikitext-103

fairseq-train --fp16 $DATA_DIR \
    --task masked_permutation_lm --criterion masked_permutation_cross_entropy \
    --arch mpnet_base --sample-break-mode complete --tokens-per-sample $TOKENS_PER_SAMPLE \
    --optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --max-sentences $MAX_SENTENCES --update-freq $UPDATE_FREQ \
    --max-update $TOTAL_UPDATES --log-format simple --log-interval 1 --input-mode 'mpnet'
```
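As a quick sanity check on these settings (assuming a single GPU; scale by the number of GPUs in a distributed run), the effective batch size per optimizer step is MAX_SENTENCES × UPDATE_FREQ × number of GPUs:

```python
MAX_SENTENCES = 16   # --max-sentences
UPDATE_FREQ = 16     # --update-freq
NUM_GPUS = 1         # assumption for this example; adjust for your setup

effective_batch = MAX_SENTENCES * UPDATE_FREQ * NUM_GPUS
print(effective_batch)  # 256 sequences per update
```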

**Notes**: You can replace the `--arch` value with its relative-position variant and add `--mask-whole-words --bpe bert` to use relative position embeddings and whole word masking.

**Notes**: You can also change `--input-mode` to train a masked language model or a permuted language model instead.

Pre-trained models

We have updated the final pre-trained MPNet model for fine-tuning.

You can load the pre-trained MPNet model like this:

```python
import torch
from fairseq.models.masked_permutation_net import MPNet

# Pass your checkpoint filename as the second argument.
mpnet = MPNet.from_pretrained('checkpoints', '', 'path/to/data', bpe='bert')
assert isinstance(mpnet.model, torch.nn.Module)
```

Fine-tuning MPNet on downstream tasks


Our code is based on fairseq-0.8.0. Thanks for their contribution to the open-source community.


If you find this toolkit useful in your work, you can cite the corresponding paper:

```
@article{song2020mpnet,
    title={MPNet: Masked and Permuted Pre-training for Language Understanding},
    author={Song, Kaitao and Tan, Xu and Qin, Tao and Lu, Jianfeng and Liu, Tie-Yan},
    journal={arXiv preprint arXiv:2004.09297},
    year={2020}
}
```
Related Works
