nmtpy is a Python framework, based on dl4mt-tutorial, for experimenting with Neural Machine Translation pipelines.
This codebase is no longer maintained as we moved towards nmtpytorch.
If you use nmtpy, you may want to cite the following paper:
```bibtex
@article{nmtpy2017,
  author    = {Ozan Caglayan and Mercedes Garc\'{i}a-Mart\'{i}nez and Adrien Bardet and Walid Aransa and Fethi Bougares and Lo\"{i}c Barrault},
  title     = {NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems},
  journal   = {Prague Bull. Math. Linguistics},
  volume    = {109},
  pages     = {15--28},
  year      = {2017},
  url       = {https://ufal.mff.cuni.cz/pbml/109/art-caglayan-et-al.pdf},
  doi       = {10.1515/pralin-2017-0035},
  timestamp = {Tue, 12 Sep 2017 10:01:08 +0100}
}
```
- `attention_factors_seplogits.py` is removed and its functionality is added to the `attention_factors` model as a configuration switch: `sep_h2olayer: True`.
- `tied_trg_emb: True/False` is replaced with `tied_emb: False/2way/3way` to also support the sharing of "all" embeddings throughout the network.
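After this change, a model configuration would contain lines like the following (a hypothetical excerpt; only the option names and values come from the notes above):

```ini
# Hypothetical configuration excerpt: only the option names and
# values below are documented in the release notes above.
sep_h2olayer: True
tied_emb: 2way
```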
nmtpy is a suite of Python tools, primarily based on the starter code provided in dl4mt-tutorial for training neural machine translation networks using Theano. The basic motivation behind forking dl4mt-tutorial was to create a framework where it would be easy to implement a new model by just copying and modifying an existing model class (or even inheriting from it and overriding some of its methods).
To achieve this purpose, nmtpy tries to completely isolate the training loop, beam search, iteration, and model definition:

- `nmt-train` script to start a training experiment
- `nmt-translate` to produce model-agnostic translations. You just pass a trained model's checkpoint file and it does its job.
- `nmt-rescore` to rescore translation hypotheses using an nmtpy model.
- An abstract `BaseModel` class to derive from to define your NMT architecture (see the sketch after this list).
- An abstract `Iterator` to derive from for your custom iterators.
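A minimal sketch of what deriving a new architecture looks like, assuming a hypothetical import path and method set (check the existing classes under `models/` for the actual interface expected by `nmt-train`):

```python
# Sketch only: the import path and the method names are illustrative
# assumptions, not the exact nmtpy BaseModel interface.
from nmtpy.models.basemodel import BaseModel

class Model(BaseModel):
    def init_params(self):
        # Allocate the Theano shared variables holding the weights.
        ...

    def build(self):
        # Define the computation graph used during training.
        ...

    def build_sampler(self):
        # Define the graph used by beam search in nmt-translate.
        ...
```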
A non-exhaustive list of differences between nmtpy and dl4mt-tutorial is as follows:
- A single `.npz` file to store everything about a training experiment
- Automatic free GPU selection based on `nvidia-smi`
- Attention coefficients dumped to `json` for further visualization
It is advisable to check the actual model implementations for the most up-to-date information, as what is written here may become outdated.
### attention.py
This is the basic attention-based NMT from `dl4mt-tutorial`, improved in different ways:

- 3 forward dropout layers after the source embeddings, after the source context, and before the softmax, managed by the configuration parameters `emb_dropout`, `ctx_dropout`, `out_dropout`
- Layer normalization for the source encoder (`layer_norm: True|False`)
- Tied embeddings (`tied_emb: False|2way|3way`)
This model uses the simple `BitextIterator`, i.e. it directly reads plain parallel text files as defined in the experiment configuration file. Please see the monomodal example for usage.
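In a configuration file, the switches above would appear roughly as follows (a hypothetical excerpt: the `[model]` section name and the values are assumptions; the parameter names are the ones listed above):

```ini
# Hypothetical excerpt: the section name and values are assumptions;
# the parameter names are the documented switches.
[model]
emb_dropout: 0.2
ctx_dropout: 0.4
out_dropout: 0.4
layer_norm: True
tied_emb: 2way
```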
### fusion*.py
These `fusion` models, derived from `attention.py` and `basefusion.py`, implement several multimodal NMT / image captioning architectures detailed in the following papers:
The models are separated into 8 files, each implementing its own multimodal CGRU, differing in the way the attention is formulated in the decoder (4 ways) x the way the multimodal contexts are fused (2 ways: SUM/CONCAT). These models also use a different data iterator, namely `WMTIterator`, which requires converting the textual data into `.pkl` format as in the multimodal example.
The `WMTIterator` only knows how to handle the ResNet-50 convolutional features that we provide in the examples page. If you would like to use FC-style fixed-length vectors or other types of multimodal features, you need to write your own iterator.
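A bare-bones sketch of such an iterator for fixed-length FC features is shown below; all names are illustrative assumptions, and a real implementation should derive from nmtpy's abstract `Iterator` and follow its interface:

```python
# Sketch of a custom iterator for FC-style fixed-length visual
# features. All names here are illustrative assumptions; a real
# iterator should derive from nmtpy's abstract Iterator class.
import numpy as np

class FCFeatureIterator:
    """Yields minibatches of fixed-length feature vectors."""

    def __init__(self, npy_file, batch_size=32):
        self.feats = np.load(npy_file)  # shape: (n_samples, feat_dim)
        self.batch_size = batch_size

    def __iter__(self):
        # Walk over the samples in contiguous minibatches.
        for i in range(0, len(self.feats), self.batch_size):
            yield self.feats[i:i + self.batch_size]
```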
### attention_factors.py
The model file `attention_factors.py` corresponds to the following paper:
In the examples folder of this repository, you can find data and a configuration file to run this model.
### rnnlm.py
This is a basic recurrent language model to be used with the `nmt-test-lm` utility.
You need the following Python libraries installed in order to use nmtpy:

- numpy
- Theano >= 0.9

In addition, `java` should be in your `$PATH` (it is needed by METEOR).
Before installing nmtpy, you need to run `scripts/get-meteor-data.sh` to download the METEOR paraphrase files:
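```bash
$ scripts/get-meteor-data.sh
```

You can then install the library: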
```bash
$ python setup.py install
```
**Note:** When you add a new model under `models/`, it will not be directly available at runtime, as it needs to be installed as well. To avoid re-installing each time, you can use development mode with `python setup.py develop`, which will make Python directly see the `git` folder as the library content.
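For example:

```bash
$ python setup.py develop
```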
(Update: Theano 1.0 includes a configuration option `deterministic = more` that obsoletes the patch below.)
When we started to work on dl4mt-tutorial, we noticed an annoying reproducibility problem where multiple runs of the same experiment (same seed, same machine, same GPU) were not producing exactly the same training and validation losses after a few iterations.
The solution that was discussed in the Theano issues was to replace a non-deterministic GPU operation with its deterministic equivalent. To achieve this, you should patch your local Theano v0.9.0 installation using this patch, unless upstream developers add a configuration option to `.theanorc`.
Here is a basic `.theanorc` file (note that depending on how you installed CUDA and CuDNN, some paths may require modification):

```ini
[global]
# Not so important as nmtpy will pick an available GPU
device = gpu0
# We use float32 everywhere
floatX = float32
# Keep theano compilation in RAM if you have a 24/7 available server
base_compiledir = /tmp/theano-%(user)s
# For Theano >= 0.10, if you want exactly the same results for each run
# with the same seed
deterministic = more

[cuda]
root = /opt/cuda-8.0

[dnn]
# Make sure you use CuDNN as well
enabled = auto
library_path = /opt/CUDNN/cudnn-v5.1/lib64
include_path = /opt/CUDNN/cudnn-v5.1/include

[lib]
# Allocate 95% of GPU memory once
cnmem = 0.95
```
You may also want to try the new GPU backend after installing libgpuarray. In order to do so, pass `GPUARRAY=1` into the environment when running `nmt-train`:

```bash
$ GPUARRAY=1 nmt-train -c ...
```
Recent Theano versions can automatically detect the correct MKL flags. You should obtain a similar output after running the following command:

```bash
$ python -c 'import theano; print(theano.config.blas.ldflags)'
-L/home/ozancag/miniconda/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -lm -Wl,-rpath,/home/ozancag/miniconda/lib
```
nmtpy includes code from the following projects:
- `multi-bleu.perl` from mosesdecoder
- `pycocoevalcap` from coco-caption
See LICENSE file for license information.