nmtpy

by lium-lst

License: MIT

UNMAINTAINED

This codebase is no longer maintained as we moved towards nmtpytorch.


If you use nmtpy, you may want to cite the following paper:

@article{nmtpy2017,
  author    = {Ozan Caglayan and
               Mercedes Garc\'{i}a-Mart\'{i}nez and
               Adrien Bardet and
               Walid Aransa and
               Fethi Bougares and
               Lo\"{i}c Barrault},
  title     = {NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems},
  journal   = {Prague Bull. Math. Linguistics},
  volume    = {109},
  pages     = {15--28},
  year      = {2017},
  url       = {https://ufal.mff.cuni.cz/pbml/109/art-caglayan-et-al.pdf},
  doi       = {10.1515/pralin-2017-0035},
  timestamp = {Tue, 12 Sep 2017 10:01:08 +0100}
}

List of Important Recent Changes

  • Model checkpoints were about 30% larger than necessary because of a storage format issue. This is now fixed by https://github.com/lium-lst/nmtpy/commit/0721f34924d23b02caca52e8c3fcbcaafbb4ef41.

Factored NMT

  • attention_factors_seplogits.py is removed and its functionality is added to the attention_factors model as a configuration switch (sep_h2olayer: True).

NMT

  • tied_trg_emb: True/False is replaced with tied_emb: False/2way/3way to also support the sharing of "all" embeddings throughout the network.
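The difference between the three tied_emb values can be illustrated with a small NumPy sketch. This is not nmtpy's Theano code: the function name build_embeddings, the initialization scale, and the reading of 2way as "target embedding shared with the output projection" and 3way as "source embedding shared as well" are assumptions made for illustration.

```python
import numpy as np

def build_embeddings(n_src, n_trg, dim, tied_emb=False, seed=0):
    """Return (src_emb, trg_emb, out_proj) under a tying scheme.

    Hypothetical helper, assuming:
      tied_emb=False  -> three independent matrices
      tied_emb='2way' -> target embedding and output projection share weights
      tied_emb='3way' -> the source embedding shares them too
    """
    rng = np.random.RandomState(seed)
    trg_emb = 0.01 * rng.randn(n_trg, dim).astype('float32')

    if tied_emb == '3way':
        # Sharing with the source side only makes sense for a joint vocabulary.
        if n_src != n_trg:
            raise ValueError("3way tying assumes a shared source/target vocabulary")
        src_emb = trg_emb
    else:
        src_emb = 0.01 * rng.randn(n_src, dim).astype('float32')

    if tied_emb in ('2way', '3way'):
        # Same array object: output logits would be computed as h @ out_proj.T
        out_proj = trg_emb
    else:
        out_proj = 0.01 * rng.randn(n_trg, dim).astype('float32')

    return src_emb, trg_emb, out_proj
```

Because the tied matrices are the same array, an update to one is an update to all of them, which is what reduces the parameter count.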

Introduction

nmtpy is a suite of Python tools, primarily based on the starter code provided in dl4mt-tutorial for training neural machine translation networks using Theano. The basic motivation behind forking dl4mt-tutorial was to create a framework where it would be easy to implement a new model by just copying and modifying an existing model class (or even inheriting from it and overriding some of its methods).

To achieve this purpose, nmtpy tries to completely isolate the training loop, beam search, iteration and model definition:

  • nmt-train script to start a training experiment
  • nmt-translate to produce model-agnostic translations. You just pass a trained model's checkpoint file and it does its job.
  • nmt-rescore to rescore translation hypotheses using an nmtpy model.
  • An abstract BaseModel class to derive from to define your NMT architecture.
  • An abstract Iterator to derive from for your custom iterators.
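The "derive from an abstract class, or inherit and override" idea can be sketched in plain Python as follows. The class and method names here (SketchBaseModel, init_params, loss, train) are hypothetical and do not reflect the real BaseModel interface; please check the actual source for that.

```python
from abc import ABC, abstractmethod

class SketchBaseModel(ABC):
    """Hypothetical stand-in for an abstract model class."""

    def __init__(self, **options):
        self.options = options

    @abstractmethod
    def init_params(self):
        """Create and return the model parameters."""

    @abstractmethod
    def loss(self, batch):
        """Return a scalar loss for one minibatch."""

class SketchAttention(SketchBaseModel):
    """A concrete model: implements every abstract method."""

    def init_params(self):
        return {'W': [0.0] * self.options.get('dim', 4)}

    def loss(self, batch):
        # Toy loss: just the batch size.
        return float(len(batch))

class SketchAttentionVariant(SketchAttention):
    """A new model obtained by inheriting and overriding one method."""

    def loss(self, batch):
        return 0.5 * super().loss(batch)

def train(model, batches):
    """Model-agnostic loop: relies only on the abstract interface."""
    model.init_params()
    return [model.loss(batch) for batch in batches]
```

The training loop never needs to know which concrete model it received, which is the kind of isolation described above.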

A non-exhaustive list of differences between nmtpy and dl4mt-tutorial is as follows:

  • No shell script, everything is in Python
  • Object-oriented overhaul of the code: clear separation of the API and the scripts that interface with it
  • INI style configuration files to define everything regarding a training experiment
  • Transparent cleanup mechanism to kill stale processes, remove temporary files
  • Simultaneous logging of training details to stdout and log file
  • Supports out-of-the-box BLEU, METEOR and COCO eval metrics
  • Includes subword-nmt utilities for training and applying BPE model (NOTE: This may change as the upstream subword-nmt moves forward as well.)
  • Plugin-like text filters for hypothesis post-processing (Example: BPE, Compound, Char2Words for Char-NMT)
  • Early-stopping and checkpointing based on perplexity, BLEU or METEOR (Ability to add new metrics easily)
  • Single .npz file to store everything about a training experiment
  • Automatic free GPU selection and reservation using nvidia-smi
  • Shuffling support between epochs:
  • Improved parallel translation decoding on CPU
  • Forced decoding i.e. rescoring using NMT
  • Export decoding information into json for further visualization of attention coefficients
  • Improved numerical stability and reproducibility
  • Glorot/Xavier, He, Orthogonal weight initializations
  • Efficient SGD, Adadelta, RMSProp and ADAM: Single forward/backward theano function without intermediate variables
  • Ability to stop updating a set of weights by recompiling optimizer
  • Several recurrent blocks:
    • GRU, Conditional GRU (CGRU) and LSTM
    • Multimodal attentive CGRU variants
  • Layer Normalization support for GRU
  • 2-way or 3-way tied target embeddings
  • Simple/Non-recurrent Dropout, L2 weight decay
  • Training and validation loss normalization for comparable perplexities
  • Initialization of a model with a pretrained NMT for further finetuning

Models

It is advised to check the actual model implementations for the most up-to-date information, as what is written here may become outdated.

Attentional NMT: attention.py

This is the basic attention-based NMT from dl4mt-tutorial, improved in different ways:

  • 3 forward dropout layers after source embeddings, source context and before softmax, managed by the configuration parameters emb_dropout, ctx_dropout, out_dropout
  • Layer normalization for source encoder (layer_norm:True|False)
  • Tied embeddings (tied_emb:False|2way|3way)
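As an illustration of how such options might look in an INI-style configuration, the sketch below parses them with Python's standard configparser. The option names come from the list above, but the [model] section name, the dropout values and the parse_model_options helper are assumptions; this is not nmtpy's own configuration parser.

```python
import configparser

# Hypothetical configuration fragment; section name and values are illustrative.
EXAMPLE_CONFIG = """
[model]
emb_dropout: 0.2
ctx_dropout: 0.4
out_dropout: 0.4
layer_norm: True
tied_emb: 2way
"""

def parse_model_options(text, section="model"):
    """Read the dropout, layer_norm and tied_emb options from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    sec = parser[section]

    tied = sec.get("tied_emb", fallback="False")
    if tied not in ("False", "2way", "3way"):
        raise ValueError("tied_emb must be False, 2way or 3way, got %r" % tied)

    return {
        "emb_dropout": sec.getfloat("emb_dropout", fallback=0.0),
        "ctx_dropout": sec.getfloat("ctx_dropout", fallback=0.0),
        "out_dropout": sec.getfloat("out_dropout", fallback=0.0),
        "layer_norm": sec.getboolean("layer_norm", fallback=False),
        # Keep False as a real boolean, the tying modes as strings.
        "tied_emb": False if tied == "False" else tied,
    }
```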

This model uses the simple BitextIterator, i.e. it directly reads plain parallel text files as defined in the experiment configuration file. Please see this monomodal example for usage.
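Reading plain parallel text boils down to walking two line-aligned files in lockstep. The following sketch shows that idea only; iter_bitext is a hypothetical function, not the BitextIterator class, and it omits vocabulary lookup, padding and shuffling.

```python
from itertools import islice

def iter_bitext(src_lines, trg_lines, batch_size):
    """Yield minibatches of (source_tokens, target_tokens) pairs.

    src_lines and trg_lines are any iterables of text lines that are
    aligned line by line, e.g. two open file objects.
    """
    pairs = ((s.split(), t.split()) for s, t in zip(src_lines, trg_lines))
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch
```

With real data, the two arguments would be file objects opened on the source and target sides of the corpus.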

Multimodal NMT / Image Captioning: fusion*py

These fusion models derived from attention.py and basefusion.py implement several multimodal NMT / Image Captioning architectures detailed in the following papers:

Caglayan, Ozan, et al. "Does Multimodality Help Human and Machine for Translation and Image Captioning?." arXiv preprint arXiv:1605.09186 (2016).

Caglayan, Ozan, Loïc Barrault, and Fethi Bougares. "Multimodal Attention for Neural Machine Translation." arXiv preprint arXiv:1609.03976 (2016).

The models are separated into 8 files implementing their own multimodal CGRU differing in the way the attention is formulated in the decoder (4 ways) x the way the multimodal contexts are fused (2 ways: SUM/CONCAT). These models also use a different data iterator, namely WMTIterator, which requires converting the textual data into .pkl as in the multimodal example.

The WMTIterator only knows how to handle the ResNet-50 convolutional features that we provide in the examples page. If you would like to use FC-style fixed-length vectors or other types of multimodal features, you need to write your own iterator.
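The SUM and CONCAT fusion strategies mentioned above can be sketched generically in NumPy. This is a simplified illustration, not the code in the fusion model files: fuse_contexts, its arguments and the optional projection matrix W are hypothetical, and the real models may apply additional projections or non-linearities.

```python
import numpy as np

def fuse_contexts(txt_ctx, img_ctx, mode="sum", W=None):
    """Combine a textual and a visual context vector per batch element.

    txt_ctx, img_ctx: arrays of shape (batch, dim)
    mode='sum'    -> element-wise sum, output shape (batch, dim)
    mode='concat' -> concatenation, output shape (batch, 2 * dim),
                     optionally projected by W of shape (2 * dim, out_dim)
    """
    if mode == "sum":
        return txt_ctx + img_ctx
    if mode == "concat":
        fused = np.concatenate([txt_ctx, img_ctx], axis=-1)
        return fused if W is None else fused @ W
    raise ValueError("mode must be 'sum' or 'concat', got %r" % mode)
```

SUM requires both contexts to have the same dimensionality, whereas CONCAT does not but doubles the size of the fused vector unless it is projected back.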

Factored NMT: attention_factors.py

The model file attention_factors.py corresponds to the following paper:

García-Martínez, Mercedes, Loïc Barrault, and Fethi Bougares. "Factored Neural Machine Translation." arXiv preprint arXiv:1609.04621 (2016).

In the examples folder of this repository, you can find data and a configuration file to run this model.

RNNLM: rnnlm.py

This is a basic recurrent language model to be used with the nmt-test-lm utility.

Requirements

You need the following Python libraries installed in order to use nmtpy:

  • numpy
  • Theano >= 0.9

  • We recommend using the Anaconda Python distribution, which ships with Intel MKL (Math Kernel Library) and greatly improves CPU decoding speed during beam search. With a correct compilation and installation, you should achieve similar performance with OpenBLAS as well, but the setup procedure may be difficult for inexperienced users.
  • nmtpy only supports Python 3.5+, please see pythonclock.org
  • Please note that METEOR requires a Java runtime, so java should be in your $PATH.
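The requirements above can be checked before installation with a few lines of standard-library Python. The check_requirements helper below is a hypothetical convenience, not part of nmtpy, and it only tests for the presence of the modules, not for the Theano version.

```python
import importlib.util
import shutil
import sys

def check_requirements(min_python=(3, 5),
                       modules=("numpy", "theano"),
                       executables=("java",)):
    """Return a list of human-readable problems (empty if everything is found)."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("Python module not found: %s" % name)
    for exe in executables:
        # shutil.which searches the directories listed in $PATH.
        if shutil.which(exe) is None:
            problems.append("Executable not in $PATH: %s" % exe)
    return problems

if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```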

Additional data for METEOR

Before installing nmtpy, you need to run scripts/get-meteor-data.sh to download METEOR paraphrase files.

Installation

$ python setup.py install

Note: When you add a new model under models/, it will not be directly available at runtime as it needs to be installed as well. To avoid re-installing each time, you can use development mode with python setup.py develop, which makes Python see the git folder directly as the library content.

Ensuring Reproducibility in Theano

(Update: Theano 1.0 includes a configuration option deterministic = more that obsoletes the below patch.)

When we started to work on dl4mt-tutorial, we noticed an annoying reproducibility problem where multiple runs of the same experiment (same seed, same machine, same GPU) were not producing exactly the same training and validation losses after a few iterations.

The solution that was discussed in Theano issues was to replace a non-deterministic GPU operation with its deterministic equivalent. To achieve this, you should patch your local Theano v0.9.0 installation using this patch unless upstream developers add a configuration option to .theanorc.

Configuring Theano

Here is a basic .theanorc file (Note that the way you install CUDA, CuDNN may require some modifications):

[global]
# Not so important as nmtpy will pick an available GPU
device = gpu0
# We use float32 everywhere
floatX = float32
# Keep theano compilation in RAM if you have a 7/24 available server
base_compiledir=/tmp/theano-%(user)s
# For Theano >= 0.10, if you want exact same results for each run
# with same seed
deterministic=more

[cuda]
root = /opt/cuda-8.0

[dnn]
# Make sure you use CuDNN as well
enabled = auto
library_path = /opt/CUDNN/cudnn-v5.1/lib64
include_path = /opt/CUDNN/cudnn-v5.1/include

[lib]
# Allocate 95% of GPU memory once
cnmem = 0.95

You may also want to try the new GPU backend after installing libgpuarray. In order to do so, pass GPUARRAY=1 into the environment when running nmt-train:

$ GPUARRAY=1 nmt-train -c  ...

Checking BLAS configuration

Recent Theano versions can automatically detect correct MKL flags. You should obtain a similar output after running the following command:

$ python -c 'import theano; print(theano.config.blas.ldflags)'
-L/home/ozancag/miniconda/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -lm -Wl,-rpath,/home/ozancag/miniconda/lib
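If you prefer to check the flags programmatically rather than by eye, a small helper can extract the linked libraries from that string. The functions below are hypothetical and only parse text; they assume libraries are given as -l tokens, as in the output above.

```python
def linked_libraries(ldflags):
    """Return the library names passed as -l<name> in a linker flag string."""
    return [tok[2:] for tok in ldflags.split() if tok.startswith("-l")]

def uses_mkl(ldflags):
    """True if at least one linked library name starts with 'mkl'."""
    return any(lib.startswith("mkl") for lib in linked_libraries(ldflags))
```

In practice the argument would be the value of theano.config.blas.ldflags.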

Acknowledgements

nmtpy includes code from the following projects:

See LICENSE file for license information.
