Im2Text extension to OpenNMT (MIT License)

Im2Text

Im2Text is a deep learning-based approach to image-to-text conversion, built on top of the OpenNMT system. It is completely data-driven, so it can be applied to a variety of image-to-text problems, such as image captioning, optical character recognition and LaTeX decompilation.

Take LaTeX decompilation as an example: given an image of a formula, the goal is to infer the LaTeX source that compiles to that image:

 d s _ { 1 1 } ^ { 2 } = d x ^ { + } d x ^ { - } + l _ { p } ^ { 9 } \frac { p _ { - } } { r ^ { 7 } } \delta ( x ^ { - } ) d x ^ { - } d x ^ { - } + d x _ { 1 } ^ { 2 } + \; \cdots \; + d x _ { 9 } ^ { 2 } 

The paper (http://arxiv.org/pdf/1609.04938v1.pdf) provides more technical details of this model.
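The LaTeX source in the example above is already tokenized: every symbol is separated by a space. As a minimal sketch, the snippet below splits such a line into tokens and compares the count against a limit like the `-max_num_tokens` option used in the Quick Start. The helper names `tokenize` and `within_limit` are hypothetical, and the assumption that the limit counts whitespace-separated tokens inclusively is ours, not something the source confirms.

```python
from typing import List


def tokenize(label: str) -> List[str]:
    """Split a whitespace-tokenized label line into its tokens."""
    return label.split()


def within_limit(label: str, max_num_tokens: int = 150) -> bool:
    """Return True if the label has at most max_num_tokens tokens.

    Assumption: the limit counts whitespace-separated tokens, inclusive.
    """
    return len(tokenize(label)) <= max_num_tokens


# The tokenized LaTeX source from the example above.
example = (
    r"d s _ { 1 1 } ^ { 2 } = d x ^ { + } d x ^ { - } + l _ { p } ^ { 9 } "
    r"\frac { p _ { - } } { r ^ { 7 } } \delta ( x ^ { - } ) d x ^ { - } "
    r"d x ^ { - } + d x _ { 1 } ^ { 2 } + \; \cdots \; + d x _ { 9 } ^ { 2 }"
)

tokens = tokenize(example)
print(len(tokens), "tokens; first few:", tokens[:6])
print("fits in 150 tokens:", within_limit(example, 150))
```

Because each token is a separate vocabulary entry, commands such as `\frac` count as one token while multi-digit numbers such as `11` are split into individual digits.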

Installation

Im2Text is built on top of OpenNMT, which is bundled with this project. It also depends on tds, class, cudnn, cutorch and paths. Currently only GPUs are supported.

Quick Start

To get started, we provide a toy Math-to-LaTeX example. We assume that the working directory is Im2Text throughout this document.

Im2Text consists of two commands:

1) Train the model.

th src/train.lua -phase train -gpu_id 1 -input_feed -model_dir model \
-image_dir data/images -data_path data/train.txt -val_data_path data/validate.txt -label_path data/labels.txt -vocab_file data/vocab.txt \
-batch_size 20 -beam_size 1 \
-max_num_tokens 150 -max_image_width 500 -max_image_height 160 \
-max_grad_norm 20.0 -learning_rate 0.1 -decay perplexity_only

2) Translate the images.

th src/train.lua -phase test -gpu_id 1 -load_model -model_dir model \
-image_dir data/images -data_path data/test.txt \
-output_dir results \
-batch_size 2 -beam_size 5 \
-max_num_tokens 500 -max_image_width 800 -max_image_height 800

The above dataset is sampled from the processed-im2latex-100k-dataset. We provide a trained model [link] on this dataset. To use it, download it and place it under model_dir before translating the images.

Data Format

  • -image_dir: The directory containing the images. Since images of the same size can be batched together, we suggest padding images of similar sizes to the same size in order to facilitate training.
  • -label_path: The file storing the tokenized labels, one label per line. It should look like:
      ... 
      ... 
      ... 
    ...
    
  • -data_path: The file storing the image-label pairs. Each line starts with the path of the image (relative to image_dir), followed by the index of the label in label_path (indices start at 0). At test time, the label indices can be omitted.
     
     
     
    ...
    
  • -vocab_file: The vocabulary file. Each line corresponds to a token. Tokens not in vocab_file are treated as unknown (UNK).
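The relationship between the three text files can be sketched as follows. This is an illustrative helper, not part of Im2Text. It assumes that the image path and label index on each data line are separated by whitespace, that image paths contain no spaces, and that the order of tokens in the vocabulary file does not matter. The function names and the sample file names (`formula_0.png` and so on) are hypothetical.

```python
from typing import List, Optional, Tuple


def build_vocab(labels: List[str]) -> List[str]:
    """Collect the unique tokens of the given label lines (sorted for stability)."""
    return sorted({tok for line in labels for tok in line.split()})


def parse_data_line(line: str) -> Tuple[str, Optional[int]]:
    """Parse '<image path> [<label index>]'; the index may be omitted at test time."""
    parts = line.split()
    index = int(parts[1]) if len(parts) > 1 else None
    return parts[0], index


def check_dataset(data_lines: List[str], labels: List[str], vocab: List[str]) -> List[str]:
    """Report label indices that are out of range and labels containing unknown tokens."""
    problems = []
    vocab_set = set(vocab)
    for n, line in enumerate(data_lines):
        if not line.strip():
            continue  # skip empty lines
        _, idx = parse_data_line(line)
        if idx is None:
            continue  # test-time line without a label index
        if not 0 <= idx < len(labels):
            problems.append(f"data line {n}: label index {idx} out of range")
            continue
        unknown = [t for t in labels[idx].split() if t not in vocab_set]
        if unknown:
            problems.append(f"data line {n}: {len(unknown)} token(s) would map to UNK")
    return problems


# Tiny in-memory example; in practice these lines would be read from the
# files passed as -label_path, -data_path and -vocab_file.
labels = ["a + b", r"\frac { a } { c }"]
data_lines = ["formula_0.png 0", "formula_1.png 1", "formula_2.png 5", "formula_3.png"]
vocab = build_vocab(labels[:1])  # vocabulary built from the first label only

for problem in check_dataset(data_lines, labels, vocab):
    print(problem)
```

Running a check like this before training catches off-by-one errors in the 0-based label indices and shows how much of the data would fall back to UNK.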

Options

For a complete set of options, run:

th src/train.lua -h
