Funnel-Transformer

Introduction

Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, Funnel-Transformer usually has a higher capacity given the same FLOPs. In addition, with a decoder, Funnel-Transformer is able to recover the token-level deep representation for each token from the reduced hidden sequence, which enables standard pretraining.
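
Concretely, the encoder pools the hidden-state sequence between blocks so that later blocks attend over fewer positions, and the decoder up-samples the compressed sequence back to token-level length for pretraining. Below is a minimal PyTorch sketch of these two steps; the function names, the plain mean pooling, and the nearest-neighbor up-sampling are simplifying assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn.functional as F

def pool_hidden_states(hidden, stride=2):
    """Illustrative compression step: mean-pool along the length dimension,
    reducing the number of positions by `stride`.
    hidden: [batch, seq_len, d_model] -> [batch, seq_len // stride, d_model]
    """
    # avg_pool1d pools over the last dimension, so move seq_len there first.
    pooled = F.avg_pool1d(hidden.transpose(1, 2), kernel_size=stride, stride=stride)
    return pooled.transpose(1, 2)

def upsample_hidden_states(hidden, target_len):
    """Illustrative decoder-side step: repeat each compressed position so the
    sequence is restored to token-level length before the decoder layers."""
    return F.interpolate(hidden.transpose(1, 2), size=target_len, mode="nearest").transpose(1, 2)

x = torch.randn(2, 128, 768)                                    # [batch, seq_len, d_model]
compressed = pool_hidden_states(x)                              # [2, 64, 768]: fewer positions to attend over
restored = upsample_hidden_states(compressed, target_len=128)   # back to 128 token-level positions
```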

For a detailed description of technical details and experimental results, please refer to our paper:

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Zihang Dai*, Guokun Lai*, Yiming Yang, Quoc V. Le

(*: equal contribution)

Preprint 2020

Source Code

Data Download

  • The corresponding source code and instructions are in the data-scrips folder, which specifies how to access the raw data used in this work.

TensorFlow

  • The corresponding source code is in the tensorflow folder, which was developed for and used for the TPU pretraining & finetuning presented in the paper.
  • The TensorFlow finetuning code mainly supports TPU finetuning on the GLUE benchmark, text classification, SQuAD and RACE.
  • Please refer to tensorflow/README.md for details.

PyTorch

  • The source code is in the pytorch folder, which only serves as an example PyTorch implementation of Funnel-Transformer.
  • Hence, the PyTorch code only supports GPU finetuning for the GLUE benchmark & text classification.
  • Please refer to pytorch/README.md for details.

Pretrained models

| Model Size     | PyTorch | TensorFlow | TensorFlow-Full |
| -------------- | ------- | ---------- | --------------- |
| B10-10-10H1024 | Link    | Link       | Link            |
| B8-8-8H1024    | Link    | Link       | Link            |
| B6-6-6H768     | Link    | Link       | Link            |
| B6-3x2-3x2H768 | Link    | Link       | Link            |
| B4-4-4H768     | Link    | Link       | Link            |

Each .tar.gz file contains three items (a short loading sketch follows the list):
  • A TensorFlow or PyTorch checkpoint (model.ckpt-* or model.ckpt.pt) containing the pre-trained weights (Note: the TensorFlow checkpoint actually corresponds to 3 files).
  • A Word Piece model (vocab.uncased.txt) used for (de)tokenization.
  • A config file (net_config.json or net_config.pytorch.json) which specifies the hyperparameters of the model.
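
After extracting a PyTorch archive, the config and weights can be inspected with standard library calls. A minimal sketch follows; the local directory name is an assumption, and it assumes model.ckpt.pt stores a plain state dict of tensors.

```python
import json
import torch

# Assumed local directory for an extracted PyTorch archive; adjust to the
# checkpoint you actually downloaded and where you extracted it.
ckpt_dir = "funnel_ckpts/B6-6-6H768"

# Model hyperparameters (block layout, hidden size, ...) live in the config file.
with open(f"{ckpt_dir}/net_config.pytorch.json") as f:
    net_config = json.load(f)
print(net_config)

# Load the pre-trained weights on CPU; assuming the file holds a flat state
# dict, print a few parameter names and shapes to sanity-check the download.
state_dict = torch.load(f"{ckpt_dir}/model.ckpt.pt", map_location="cpu")
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```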

You can also use download_all_ckpts.sh to download all the checkpoints mentioned above.

For how to use the pretrained models, please refer to tensorflow/README.md or pytorch/README.md, respectively.

Results

(Result figures: GLUE dev-set performance and question-answering performance, as reported in the paper.)
