Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
T2T was developed by researchers and engineers in the Google Brain team and a community of users. It is now deprecated: we keep it running and welcome bug fixes, but encourage users to switch to the successor library, Trax.
This iPython notebook explains T2T and runs in your browser using a free VM from Google, no installation needed. Alternatively, here is a one-command version that installs T2T, downloads MNIST, trains a model and evaluates it:
pip install tensor2tensor && t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/mnist \
  --problem=image_mnist \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --train_steps=1000 \
  --eval_steps=100
Below we list a number of tasks that can be solved with T2T when you train the appropriate model on the appropriate problem. We give the problem and model below and we suggest a setting of hyperparameters that we know works well in our setup. We usually run either on Cloud TPUs or on 8-GPU machines; you might need to modify the hyperparameters if you run on a different setup.
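Every problem, model, and hyperparameter combination listed in this README plugs into the same t2t-trainer invocation. As a minimal sketch, the helper below assembles such a command line from Python. The function name build_trainer_command and its default directories are hypothetical and not part of T2T; only flags that appear in this README are used.

```python
def build_trainer_command(problem, model, hparams_set,
                          data_dir="~/t2t_data", output_dir="~/t2t_train",
                          train_steps=None, generate_data=False):
    """Return the argument list for a t2t-trainer run (hypothetical helper)."""
    cmd = ["t2t-trainer"]
    if generate_data:
        cmd.append("--generate_data")
    cmd += [
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
        f"--problem={problem}",
        f"--model={model}",
        f"--hparams_set={hparams_set}",
    ]
    if train_steps is not None:
        cmd.append(f"--train_steps={train_steps}")
    return cmd


# Rebuild the MNIST quick-start settings from the one-command example.
mnist_cmd = build_trainer_command(
    "image_mnist", "shake_shake", "shake_shake_quick",
    output_dir="~/t2t_train/mnist", train_steps=1000, generate_data=True)
print(" ".join(mnist_cmd))
```

Swapping in another task then only means changing the three names passed to the helper.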
For evaluating mathematical expressions at the character level involving addition, subtraction and multiplication of both positive and negative decimal numbers with variable digits assigned to symbolic variables, use
--problem=algorithmic_math_two_variables
You can try solving the problem with different transformer models and hyperparameters as described in the paper:
* Standard transformer: --model=transformer --hparams_set=transformer_tiny
* Universal transformer: --model=universal_transformer --hparams_set=universal_transformer_tiny
* Adaptive universal transformer: --model=universal_transformer --hparams_set=adaptive_universal_transformer_tiny
For answering questions based on a story, use
--problem=babi_qa_concat_task1_1k
You can choose any bAbI task from the range [1,20] and either the 1k or the 10k subset. To combine test data from all tasks into a single test set, use
--problem=babi_qa_concat_all_tasks_10k
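The two names above suggest a naming pattern for the bAbI problems. The sketch below spells that pattern out; the helper babi_problem_name is hypothetical, and the assumption that every task and subset follows the same pattern is extrapolated from the two examples, so check the output against t2t-trainer --registry_help before relying on it.

```python
def babi_problem_name(task, subset="1k"):
    """Build a bAbI problem name; task is an int in [1, 20] or the string "all"."""
    if subset not in ("1k", "10k"):
        raise ValueError("subset must be '1k' or '10k'")
    if task == "all":
        return f"babi_qa_concat_all_tasks_{subset}"
    task = int(task)
    if not 1 <= task <= 20:
        raise ValueError("task must be in the range [1, 20]")
    return f"babi_qa_concat_task{task}_{subset}"


print(babi_problem_name(1))
print(babi_problem_name("all", "10k"))
```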
For image classification, we have a number of standard data-sets:
--problem=image_imagenet, or one of the re-scaled versions (image_imagenet224, image_imagenet64, image_imagenet32)
--problem=image_cifar10 (or --problem=image_cifar10_plain to turn off data augmentation)
--problem=image_cifar100
--problem=image_mnist
For ImageNet, we suggest using ResNet or Xception, i.e., --model=resnet --hparams_set=resnet_50 or --model=xception --hparams_set=xception_base. ResNet should reach above 76% top-1 accuracy on ImageNet.
For CIFAR and MNIST, we suggest trying the shake-shake model: --model=shake_shake --hparams_set=shakeshake_big. This setting, trained for --train_steps=700000, should yield close to 97% accuracy on CIFAR-10.
For (un)conditional image generation, we have a number of standard data-sets:
--problem=img2img_celeba for image-to-image translation, namely, superresolution from 8x8 to 32x32.
--problem=image_celeba256_rev for a downsampled 256x256.
--problem=image_cifar10_plain_gen_rev for class-conditional 32x32 generation.
--problem=image_lsun_bedrooms_rev
--problem=image_text_ms_coco_rev for text-to-image generation.
--problem=image_imagenet32_gen_rev for 32x32 or --problem=image_imagenet64_gen_rev for 64x64.
We suggest using the Image Transformer, i.e., --model=imagetransformer, or the Image Transformer Plus, i.e., --model=imagetransformerpp, which uses a discretized mixture of logistics, or a variational auto-encoder, i.e., --model=transformer_ae. For CIFAR-10, using --hparams_set=imagetransformer_cifar10_base or --hparams_set=imagetransformer_cifar10_base_dmol yields 2.90 bits per dimension. For Imagenet-32, using --hparams_set=imagetransformer_imagenet32_base yields 3.77 bits per dimension.
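For readers unfamiliar with the metric, bits per dimension is conventionally the negative log-likelihood of an image, converted from nats to bits and divided by the number of subpixels. The sketch below shows that conversion; the function name is hypothetical, the input value is back-computed for illustration only, and whether T2T normalizes in exactly this way is an assumption.

```python
import math


def bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


# A 32x32 RGB image has 32 * 32 * 3 = 3072 dimensions (subpixels).
cifar_dims = 32 * 32 * 3

# Illustrative value only: the per-image NLL that corresponds to 2.90 bits/dim.
nll = 2.90 * cifar_dims * math.log(2)
print(bits_per_dim(nll, cifar_dims))
```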
For language modeling, we have these data-sets in T2T:
--problem=languagemodel_ptb10k for word-level modeling and --problem=languagemodel_ptb_characters for character-level modeling.
--problem=languagemodel_lm1b32k for subword-level modeling and --problem=languagemodel_lm1b_characters for character-level modeling.
We suggest starting with --model=transformer on this task, using --hparams_set=transformer_small for PTB and --hparams_set=transformer_base for LM1B.
For the task of recognizing the sentiment of a sentence, use
--problem=sentiment_imdb
We suggest using --model=transformer_encoder here and, since it is a small data-set, trying --hparams_set=transformer_tiny and training for only a few steps (e.g., --train_steps=2000).
For speech-to-text, we have these data-sets in T2T:
Librispeech (US English): --problem=librispeech for the whole set and --problem=librispeech_clean for a smaller but nicely filtered part.
Mozilla Common Voice (US English): --problem=common_voice for the whole set and --problem=common_voice_clean for a quality-checked subset.
For summarizing a longer text into a shorter one, we have this data-set:
--problem=summarize_cnn_dailymail32k
We suggest using --model=transformer and --hparams_set=transformer_prepend for this task. This yields good ROUGE scores.
There are a number of translation data-sets in T2T:
--problem=translate_ende_wmt32k
--problem=translate_enfr_wmt32k
--problem=translate_encs_wmt32k
--problem=translate_enzh_wmt32k
--problem=translate_envi_iwslt32k
--problem=translate_enes_wmt32k
You can get translations in the other direction by appending _rev to the problem name, e.g., for German-English use --problem=translate_ende_wmt32k_rev (note that you still need to download the original data with t2t-datagen --problem=translate_ende_wmt32k).
For all translation problems, we suggest trying the Transformer model: --model=transformer. At first it is best to try the base setting, --hparams_set=transformer_base. When trained on 8 GPUs for 300K steps, this should reach a BLEU score of about 28 on the English-German data-set, which is close to the state of the art. If training on a single GPU, try the --hparams_set=transformer_base_single_gpu setting. For very good results or larger data-sets (e.g., for English-French), try the big model with --hparams_set=transformer_big.
See this example to know how the translation works.
Here's a walkthrough training a good English-to-German translation model using the Transformer model from Attention Is All You Need on WMT data.
pip install tensor2tensor

# See what problems, models, and hyperparameter sets are available.
# You can easily swap between them (and add new ones).
t2t-trainer --registry_help

PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_base_single_gpu

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# Train
# * If you run out of memory, add --hparams='batch_size=1024'.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

# Decode
DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
echo -e 'Hallo Welt\nAuf Wiedersehen Welt' > ref-translation.de

BEAM_SIZE=4
ALPHA=0.6
t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
  --decode_from_file=$DECODE_FILE \
  --decode_to_file=translation.en

# See the translations
cat translation.en
# Evaluate the BLEU score
# Note: Report this BLEU score in papers, not the internal approx_bleu metric.
t2t-bleu --translation=translation.en --reference=ref-translation.de
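To make the final metric concrete, the following is a simplified sketch of corpus-level BLEU: clipped n-gram precisions up to order 4, combined by a geometric mean and multiplied by a brevity penalty. It is not the t2t-bleu implementation. It assumes whitespace tokenization, a single reference per sentence, and no smoothing, so its numbers are not comparable to what t2t-bleu reports, and short toy files with no matching 4-grams score 0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_order=4):
    """Simplified corpus BLEU: uniform weights, one reference, no smoothing."""
    matches = [0] * max_order
    totals = [0] * max_order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_order + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipping: an n-gram is credited at most as often as the reference has it.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    # Penalize hypotheses that are shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


score = corpus_bleu(["the cat sat on the mat today"], ["the cat sat on the mat"])
print(f"BLEU = {100 * score:.1f}")
```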
# Assumes tensorflow or tensorflow-gpu installed
pip install tensor2tensor

# Installs with tensorflow-gpu requirement
pip install tensor2tensor[tensorflow_gpu]

# Installs with tensorflow (cpu) requirement
pip install tensor2tensor[tensorflow]

Binaries:

# Data generator
t2t-datagen

# Trainer
t2t-trainer --registry_help
Library usage:
python -c "from tensor2tensor.models.transformer import Transformer"
bottom and top transformations, which are specified per-feature in the model.
t2t-datagen and the training script t2t-trainer.
Problems consist of features such as inputs and targets, and metadata such as each feature's modality (e.g. symbol, image, audio) and vocabularies. Problem features are given by a dataset, which is stored as a TFRecord file with tensorflow.Example protocol buffers. All problems are imported in all_problems.py or are registered with @registry.register_problem. Run t2t-datagen to see the list of available problems and download them.
T2TModels define the core tensor-to-tensor computation. They apply a default transformation to each input and output so that models may deal with modality-independent tensors (e.g. embeddings at the input, and a linear transform at the output to produce logits for a softmax over classes). All models are imported in the models subpackage, inherit from T2TModel, and are registered with @registry.register_model.
Hyperparameter sets are encoded in HParams objects, and are registered with @registry.register_hparams. Every model and problem has an HParams. A basic set of hyperparameters is defined in common_hparams.py, and hyperparameter set functions can compose other hyperparameter set functions.
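The composition idea can be sketched in plain Python: one function returns a base set, and another calls it and overrides a few fields. This is only an illustration of the pattern. It uses types.SimpleNamespace as a stand-in for HParams, and the function names and values are hypothetical rather than taken from common_hparams.py.

```python
from types import SimpleNamespace


def basic_params():
    """Stand-in for a basic hyperparameter set (values are illustrative)."""
    return SimpleNamespace(batch_size=4096, hidden_size=512, num_hidden_layers=6)


def tiny_params():
    """A smaller set built by calling another set function and overriding fields."""
    hparams = basic_params()
    hparams.hidden_size = 128
    hparams.num_hidden_layers = 2
    return hparams


tiny = tiny_params()
print(tiny.batch_size, tiny.hidden_size, tiny.num_hidden_layers)
```

Because each set is a function that returns a fresh object, a derived set never mutates the set it builds on.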
The trainer binary is the entrypoint for training, evaluation, and inference. Users can easily switch between problems, models, and hyperparameter sets by using the --model, --problem, and --hparams_set flags. Specific hyperparameters can be overridden with the --hparams flag. --schedule and related flags control local and distributed training/evaluation (see the distributed training documentation).
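An override string such as batch_size=1024 names existing hyperparameters and gives them new values. The sketch below imitates that behavior in plain Python under simplifying assumptions: apply_overrides is a hypothetical helper, SimpleNamespace stands in for HParams, and values that themselves contain commas are not handled. T2T's own parsing is not reproduced here.

```python
import ast
from types import SimpleNamespace


def apply_overrides(hparams, overrides):
    """Apply a comma-separated 'name=value' string to an existing hparams object."""
    for item in filter(None, overrides.split(",")):
        name, _, raw_value = item.partition("=")
        name = name.strip()
        if not hasattr(hparams, name):
            raise ValueError(f"Unknown hyperparameter: {name}")
        try:
            # Interpret numbers, booleans, and quoted strings as Python literals.
            value = ast.literal_eval(raw_value.strip())
        except (ValueError, SyntaxError):
            value = raw_value.strip()  # keep anything else as a plain string
        setattr(hparams, name, value)
    return hparams


hparams = SimpleNamespace(batch_size=4096, learning_rate=0.1, optimizer="adam")
apply_overrides(hparams, "batch_size=1024,learning_rate=0.05")
print(hparams.batch_size, hparams.learning_rate, hparams.optimizer)
```

Rejecting unknown names catches typos in an override string early, before a long training run starts with the wrong settings.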
T2T's components are registered using a central registration mechanism that makes it easy to add new ones and to swap among them by command-line flag. You can add your own components without editing the T2T codebase by specifying the --t2t_usr_dir flag in t2t-trainer.
You can do so for models, hyperparameter sets, modalities, and problems. Please do submit a pull request if your component might be useful to others.
See example_usr_dir for an example user directory.
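A decorator-based registry of this kind can be sketched in a few lines of plain Python. The code below is a stand-in that shows the general pattern, not the tensor2tensor registry module: the names register_model, get_model, and MyModel are defined locally for illustration, and details such as how T2T derives default names are not reproduced.

```python
_MODELS = {}


def register_model(name=None):
    """Decorator that records a class in a name -> class table."""
    def decorator(cls):
        _MODELS[name or cls.__name__.lower()] = cls
        return cls
    return decorator


def get_model(name):
    """Look up a registered class by name, as a --model flag value would."""
    if name not in _MODELS:
        raise KeyError(f"Unknown model {name!r}; available: {sorted(_MODELS)}")
    return _MODELS[name]


@register_model("my_model")
class MyModel:
    def body(self, features):
        # Placeholder computation: return the features unchanged.
        return features


model_cls = get_model("my_model")
print(model_cls.__name__)
```

Because registration happens when the module defining the class is imported, pointing the trainer at a user directory is enough to make new components selectable by name.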
To add a new dataset, subclass Problem and register it with @registry.register_problem. See TranslateEndeWmt8k for an example. Also see the data generators README.
Click this button to open a Workspace on FloydHub. You can use the workspace to develop and test your code on a fully configured cloud GPU machine.
Tensor2Tensor comes preinstalled in the environment, so you can simply open a Terminal and run your code.
# Test the quick-start on a Workspace's Terminal with this command
t2t-trainer \
  --generate_data \
  --data_dir=./t2t_data \
  --output_dir=./t2t_train/mnist \
  --problem=image_mnist \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --train_steps=1000 \
  --eval_steps=100
Note: Ensure compliance with the FloydHub Terms of Service.
When referencing Tensor2Tensor, please cite this paper.
@article{tensor2tensor,
  author  = {Ashish Vaswani and Samy Bengio and Eugene Brevdo and
             Francois Chollet and Aidan N. Gomez and Stephan Gouws and
             Llion Jones and \L{}ukasz Kaiser and Nal Kalchbrenner and
             Niki Parmar and Ryan Sepassi and Noam Shazeer and
             Jakob Uszkoreit},
  title   = {Tensor2Tensor for Neural Machine Translation},
  journal = {CoRR},
  volume  = {abs/1803.07416},
  year    = {2018},
  url     = {http://arxiv.org/abs/1803.07416},
}
Tensor2Tensor was used to develop a number of state-of-the-art models and deep learning methods. Here we list some papers that were based on T2T from the start and benefited from its features and architecture in ways described in the Google Research Blog post introducing T2T.
NOTE: This is not an official Google product.