

by tensorflow

tensorflow /tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and acce...

10.2K Stars 2.6K Forks Last release: 16 days ago (v1.15.7) Apache License 2.0 4.3K Commits 76 Releases


Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

T2T was developed by researchers and engineers in the Google Brain team and a community of users. It is now deprecated: we keep it running and welcome bug-fixes, but encourage users to use the successor library Trax.

Quick Start

This IPython notebook explains T2T and runs in your browser using a free VM from Google, no installation needed. Alternatively, here is a one-command version that installs T2T, downloads MNIST, trains a model and evaluates it:

    pip install tensor2tensor && t2t-trainer \
      --generate_data \
      --data_dir=~/t2t_data \
      --output_dir=~/t2t_train/mnist \
      --problem=image_mnist \
      --model=shake_shake \
      --hparams_set=shake_shake_quick \
      --train_steps=1000 \
      --eval_steps=100
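
When sweeping over several problems or models, it can be convenient to assemble the same invocation programmatically. The following is a minimal sketch that only builds and prints the command. The helper name `build_trainer_command` is hypothetical and not part of T2T; the flags and values are copied from the quick-start command.

```python
def build_trainer_command(problem, model, hparams_set,
                          data_dir="~/t2t_data",
                          output_dir="~/t2t_train/mnist",
                          train_steps=1000, eval_steps=100):
    """Return the t2t-trainer argument list for a quick-start style run.

    Hypothetical helper: it only formats flags that appear in the
    quick-start command and does not import or call tensor2tensor.
    """
    return [
        "t2t-trainer",
        "--generate_data",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
        f"--problem={problem}",
        f"--model={model}",
        f"--hparams_set={hparams_set}",
        f"--train_steps={train_steps}",
        f"--eval_steps={eval_steps}",
    ]


cmd = build_trainer_command("image_mnist", "shake_shake", "shake_shake_quick")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where T2T is installed.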


Suggested Datasets and Models

Below we list a number of tasks that can be solved with T2T when you train the appropriate model on the appropriate problem. We give the problem and model below and we suggest a setting of hyperparameters that we know works well in our setup. We usually run either on Cloud TPUs or on 8-GPU machines; you might need to modify the hyperparameters if you run on a different setup.

Mathematical Language Understanding

For evaluating mathematical expressions at the character level involving addition, subtraction and multiplication of both positive and negative decimal numbers with variable digits assigned to symbolic variables, use

  • the MLU data-set: `--problem=algorithmic_math_two_variables`

You can try solving the problem with different transformer models and hyperparameters as described in the paper:

  • Standard transformer
  • Universal transformer
  • Adaptive universal transformer


Story, Question and Answer

For answering questions based on a story, use

  • the bAbi data-set: `--problem=babi_qa_concat_task1_1k`

You can choose the bAbi task from the range [1,20] and the subset from 1k or 10k. To combine test data from all tasks into a single test set, use


Image Classification

For image classification, we have a number of standard data-sets:

  • ImageNet (a large data-set): `--problem=image_imagenet`, or one of the re-scaled versions (`image_imagenet32`)
  • CIFAR-10: `--problem=image_cifar10` (or `--problem=image_cifar10_plain` to turn off data augmentation)
  • CIFAR-100: `--problem=image_cifar100`
  • MNIST: `--problem=image_mnist`

For ImageNet, we suggest using ResNet or Xception, i.e., use `--model=resnet --hparams_set=resnet_50` or `--model=xception --hparams_set=xception_base`. ResNet should get to above 76% top-1 accuracy on ImageNet.

For CIFAR and MNIST, we suggest trying the shake-shake model: `--model=shake_shake --hparams_set=shakeshake_big`. This setting trained for


should yield close to 97% accuracy on CIFAR-10.

Image Generation

For (un)conditional image generation, we have a number of standard data-sets:

  • CelebA: `--problem=img2img_celeba` for image-to-image translation, namely, superresolution from 8x8 to 32x32.
  • CelebA-HQ: `--problem=image_celeba256_rev` for a downsampled 256x256.
  • CIFAR-10: `--problem=image_cifar10_plain_gen_rev` for class-conditional 32x32 generation.
  • LSUN Bedrooms: `--problem=image_lsun_bedrooms_rev`
  • MS-COCO: `--problem=image_text_ms_coco_rev` for text-to-image generation.
  • Small ImageNet (a large data-set): `--problem=image_imagenet32_gen_rev` for 32x32 or `--problem=image_imagenet64_gen_rev` for 64x64.

We suggest using the Image Transformer, the Image Transformer Plus (which uses a discretized mixture of logistics), or a variational auto-encoder. For CIFAR-10, using




yields 2.90 bits per dimension. For Imagenet-32, using


yields 3.77 bits per dimension.
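
Bits per dimension is the model's negative log-likelihood expressed in base 2 and averaged over every pixel channel of the image. As a rough sketch of the conversion (the function name is illustrative and not a T2T API, and a 32x32 RGB image is assumed to have 32 * 32 * 3 = 3072 dimensions):

```python
import math


def bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


# A 32x32 RGB image has 32 * 32 * 3 = 3072 dimensions.
cifar_dims = 32 * 32 * 3
# Per-image NLL (in nats) that corresponds to 2.90 bits per dimension.
nll_nats = 2.90 * cifar_dims * math.log(2)
print(round(bits_per_dim(nll_nats, cifar_dims), 2))
```

Lower values are better: a model at 2.90 bits per dimension needs fewer bits on average to encode each pixel channel than one at 3.77.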

Language Modeling

For language modeling, we have these data-sets in T2T:

  • PTB (a small data-set): `--problem=languagemodel_ptb10k` for word-level modeling and `--problem=languagemodel_ptb_characters` for character-level modeling.
  • LM1B (a billion-word corpus): `--problem=languagemodel_lm1b32k` for subword-level modeling and `--problem=languagemodel_lm1b_characters` for character-level modeling.

We suggest to start with


on this task and use


for PTB and


for LM1B.

Sentiment Analysis

For the task of recognizing the sentiment of a sentence, use

  • the IMDB data-set: `--problem=sentiment_imdb`

We suggest to use


here and since it is a small data-set, try


and train for few steps (e.g.,



Speech Recognition

For speech-to-text, we have these data-sets in T2T:

  • Librispeech (US English):


for the whole set and


for a smaller but nicely filtered part.

  • Mozilla Common Voice (US English):


for the whole set


for a quality-checked subset.


Summarization

For summarizing longer text into a shorter one, we have these data-sets:

  • CNN/DailyMail articles summarized into a few sentences: `--problem=summarize_cnn_dailymail32k`

We suggest to use




for this task. This yields good ROUGE scores.


Translation

There are a number of translation data-sets in T2T:

  • English-German: `--problem=translate_ende_wmt32k`
  • English-French: `--problem=translate_enfr_wmt32k`
  • English-Czech: `--problem=translate_encs_wmt32k`
  • English-Chinese: `--problem=translate_enzh_wmt32k`
  • English-Vietnamese: `--problem=translate_envi_iwslt32k`
  • English-Spanish: `--problem=translate_enes_wmt32k`

You can get translations in the other direction by appending


to the problem name, e.g., for German-English use


(note that you still need to download the original data with `t2t-datagen`).



For all translation problems, we suggest trying the Transformer model: `--model=transformer`. At first it is best to try the base setting. When trained on 8 GPUs for 300K steps this should reach a BLEU score of about 28 on the English-German data-set, which is close to state of the art. If training on a single GPU, try the `--hparams_set=transformer_base_single_gpu` setting. For very good results or larger data-sets (e.g., for English-French), try the big model.

See this example to learn how the translation works.



Walkthrough

Here's a walkthrough of training a good English-to-German translation model using the Transformer model from Attention Is All You Need on WMT data.

    pip install tensor2tensor

    # See what problems, models, and hyperparameter sets are available.
    # You can easily swap between them (and add new ones).
    t2t-trainer --registry_help

    PROBLEM=translate_ende_wmt32k
    MODEL=transformer
    HPARAMS=transformer_base_single_gpu

    DATA_DIR=$HOME/t2t_data
    TMP_DIR=/tmp/t2t_datagen
    TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

    mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

    # Generate data
    t2t-datagen \
      --data_dir=$DATA_DIR \
      --tmp_dir=$TMP_DIR \
      --problem=$PROBLEM

    # Train
    # * If you run out of memory, add --hparams='batch_size=1024'.
    t2t-trainer \
      --data_dir=$DATA_DIR \
      --problem=$PROBLEM \
      --model=$MODEL \
      --hparams_set=$HPARAMS \
      --output_dir=$TRAIN_DIR

    # Decode
    DECODE_FILE=$DATA_DIR/decode_this.txt
    echo "Hello world" >> $DECODE_FILE
    echo "Goodbye world" >> $DECODE_FILE

    # Reference translations (file name chosen for illustration)
    REF_FILE=ref-translation.de
    echo -e 'Hallo Welt\nAuf Wiedersehen Welt' > $REF_FILE

    BEAM_SIZE=4
    ALPHA=0.6

    t2t-decoder \
      --data_dir=$DATA_DIR \
      --problem=$PROBLEM \
      --model=$MODEL \
      --hparams_set=$HPARAMS \
      --output_dir=$TRAIN_DIR \
      --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
      --decode_from_file=$DECODE_FILE \
      --decode_to_file=translation.en

    # See the translations
    cat translation.en

    # Evaluate the BLEU score
    # Note: Report this BLEU score in papers, not the internal approx_bleu metric.
    t2t-bleu --translation=translation.en --reference=$REF_FILE


Installation

    # Assumes tensorflow or tensorflow-gpu installed
    pip install tensor2tensor

    # Installs with tensorflow-gpu requirement
    pip install tensor2tensor[tensorflow_gpu]

    # Installs with tensorflow (cpu) requirement
    pip install tensor2tensor[tensorflow]


    # Data generator
    t2t-datagen

    # Trainer
    t2t-trainer --registry_help

Library usage:

python -c "from tensor2tensor.models.transformer import Transformer"


  • Many state of the art and baseline models are built-in and new models can be added easily (open an issue or pull request!).
  • Many datasets across modalities - text, audio, image - available for generation and use, and new ones can be added easily (open an issue or pull request for public datasets!).
  • Models can be used with any dataset and input mode (or even multiple); all modality-specific processing (e.g. embedding lookups for text tokens) is done with transformations, which are specified per-feature in the model.
  • Support for multi-GPU machines and synchronous (1 master, many workers) and asynchronous (independent workers synchronizing through a parameter server) distributed training.
  • Easily swap amongst datasets and models by command-line flag with the data generation script `t2t-datagen` and the training script `t2t-trainer`.
  • Train on Google Cloud ML and Cloud TPUs.

T2T overview


### Problems

Problems consist of features such as inputs and targets, and metadata such as each feature's modality (e.g. symbol, image, audio) and vocabularies. Problem features are given by a dataset, which is stored as a


file with


protocol buffers. All problems are registered with the registry. Run `t2t-datagen` to see the list of available problems and download them.


### Models

**`T2TModel`s** define the core tensor-to-tensor computation. They apply a default transformation to each input and output so that models may deal with modality-independent tensors (e.g. embeddings at the input; and a linear transform at the output to produce logits for a softmax over classes). All models are imported in the `models` subpackage, inherit from `T2TModel`, and are registered with the registry.



### Hyperparameter Sets

**Hyperparameter sets** are registered with the registry. Every model and problem has a hyperparameter set. A basic set of hyperparameters is defined in the codebase, and hyperparameter set functions can compose other hyperparameter set functions.

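
Composition of hyperparameter set functions can be pictured as one function calling another and overriding only what differs. The sketch below uses plain dictionaries as a stand-in for T2T's hyperparameter objects; the function names and values are made up for illustration and are not taken from the library.

```python
def basic_params():
    """Toy base hyperparameter set (values are illustrative only)."""
    return {"batch_size": 4096, "hidden_size": 512, "learning_rate": 0.1}


def basic_params_small():
    """Compose the base set and override only what differs."""
    hparams = basic_params()
    hparams["hidden_size"] = 256
    return hparams


def basic_params_small_low_lr():
    """Compose a second level on top of the small set."""
    hparams = basic_params_small()
    hparams["learning_rate"] = 0.05
    return hparams


print(basic_params_small_low_lr())
```

Because each function builds a fresh dictionary, a derived set never mutates the set it was composed from.
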
### Trainer

The **trainer** binary is the entrypoint for training, evaluation, and inference. Users can easily switch between problems, models, and hyperparameter sets by using the `--model`, `--problem`, and `--hparams_set` flags. Specific hyperparameters can be overridden with the `--hparams` flag. Related flags control local and distributed training/evaluation (see the distributed training documentation).

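
Overriding individual hyperparameters amounts to applying `name=value` pairs on top of a chosen set, as in the `--hparams='batch_size=1024'` hint from the walkthrough. The following is a minimal sketch of that idea and not T2T's actual parser: `parse_overrides` is a hypothetical helper, plain dictionaries stand in for hyperparameter objects, and a comma-separated format is assumed by analogy with the `--decode_hparams` value shown earlier.

```python
def parse_overrides(defaults, overrides):
    """Apply 'name=value,name=value' overrides to a dict of defaults.

    Each value is converted to the type of the default it replaces.
    Unknown names raise KeyError so typos are caught early.
    Booleans are not handled in this sketch.
    """
    result = dict(defaults)
    for item in filter(None, overrides.split(",")):
        name, _, raw = item.partition("=")
        name = name.strip()
        if name not in result:
            raise KeyError(f"unknown hyperparameter: {name}")
        result[name] = type(result[name])(raw.strip())
    return result


defaults = {"batch_size": 4096, "learning_rate": 0.1}
print(parse_overrides(defaults, "batch_size=1024"))
```
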
## Adding your own components

T2T's components are registered using a central registration mechanism that makes it easy to add new ones and to swap amongst them by command-line flag. You can add your own components without editing the T2T codebase by specifying a user directory with a command-line flag.

You can do so for models, hyperparameter sets, modalities, and problems. Please do submit a pull request if your component might be useful to others.
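
To illustrate the general idea of a central registration mechanism, here is a toy registry built from a decorator and a lookup table. It is a sketch of the pattern only: `register_component`, `get_component`, and `MyToyModel` are hypothetical names and do not reflect T2T's actual registry API.

```python
_COMPONENTS = {}


def register_component(name):
    """Decorator that records a class under `name` in a module-level table."""
    def decorator(cls):
        if name in _COMPONENTS:
            raise ValueError(f"component {name!r} is already registered")
        _COMPONENTS[name] = cls
        return cls
    return decorator


def get_component(name):
    """Look up a registered class by the name given on the command line."""
    return _COMPONENTS[name]


@register_component("my_toy_model")
class MyToyModel:
    def describe(self):
        return "toy model"


model_cls = get_component("my_toy_model")
print(model_cls().describe())
```

With this pattern, importing a module that contains decorated classes is enough to make them selectable by name, which is why a user directory can add components without touching the library's code.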

See the [


]( an example user directory.

## Adding a dataset

To add a new dataset, subclass the base problem class and register it with the registry. Also see the data generators README.

## Run on FloydHub

Click the Run on FloydHub button to open a Workspace on FloydHub. You can use the workspace to develop and test your code on a fully configured cloud GPU machine.

Tensor2Tensor comes preinstalled in the environment, so you can simply open a Terminal and run your code.

Test the quick-start on a Workspace's Terminal with this command:

    t2t-trainer \
      --generate_data \
      --data_dir=./t2t_data \
      --output_dir=./t2t_train/mnist \
      --problem=image_mnist \
      --model=shake_shake \
      --hparams_set=shake_shake_quick \
      --train_steps=1000 \
      --eval_steps=100

Note: Ensure compliance with the FloydHub Terms of Service.

## Papers

When referencing Tensor2Tensor, please cite this paper:

    @article{tensor2tensor,
      author    = {Ashish Vaswani and Samy Bengio and Eugene Brevdo and
        Francois Chollet and Aidan N. Gomez and Stephan Gouws and Llion Jones and
        \L{}ukasz Kaiser and Nal Kalchbrenner and Niki Parmar and Ryan Sepassi and
        Noam Shazeer and Jakob Uszkoreit},
      title     = {Tensor2Tensor for Neural Machine Translation},
      journal   = {CoRR},
      volume    = {abs/1803.07416},
      year      = {2018},
      url       = {},
    }


Tensor2Tensor was used to develop a number of state-of-the-art models and deep learning methods. Here we list some papers that were based on T2T from the start and benefited from its features and architecture in ways described in the Google Research Blog post introducing T2T.

NOTE: This is not an official Google product.
