ToTTo Dataset

ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.

During the dataset creation process, tables from English Wikipedia are matched with (noisy) descriptions. Each table cell mentioned in the description is highlighted and the descriptions are iteratively cleaned and corrected to faithfully reflect the content of the highlighted cells.

We hope this dataset can serve as a useful research benchmark for high-precision conditional text generation.

You can find more details, analyses, and baseline results in our paper. You can cite it as follows:

@inproceedings{parikh2020totto,
  title={{ToTTo}: A Controlled Table-To-Text Generation Dataset},
  author={Parikh, Ankur P and Wang, Xuezhi and Gehrmann, Sebastian and Faruqui, Manaal and Dhingra, Bhuwan and Yang, Diyi and Das, Dipanjan},
  booktitle={Proceedings of EMNLP},
  year={2020}
 }

Getting Started

Download the ToTTo data

The ToTTo dataset is released under the Creative Commons Share-Alike 3.0 license.

To download the data from the command line:

 wget https://storage.googleapis.com/totto/totto_data.zip
 unzip totto_data.zip
(or alternatively copy the above url into your browser address bar.)

Inside the totto_data directory you should see three files: totto_train_data.jsonl, totto_dev_data.jsonl, and unlabeled_totto_test_data.jsonl for the training, development, and unlabeled test sets respectively.
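As a minimal sketch of reading these files, the helper below parses a .jsonl file one line at a time. The function name read_jsonl is hypothetical (it is not part of the released scripts), and the path assumes the archive was unzipped into ./totto_data as described above.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Assumed location: ./totto_data, created by unzipping totto_data.zip.
    dev_path = Path("totto_data") / "totto_dev_data.jsonl"
    if dev_path.exists():
        first_example = next(read_jsonl(str(dev_path)))
        print(sorted(first_example.keys()))
```

Reading lazily avoids holding the whole training file in memory; wrap the call in list(...) if random access is needed.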

Download the evaluation scripts

You can find evaluation scripts and some exploratory processing scripts at this repository. It also includes a separate README file with instructions on how to run the evaluation.

Dataset Description

The ToTTo dataset consists of three .jsonl files, where each line is a JSON dictionary with the following format:
{
  "table_page_title": "'Weird Al' Yankovic",
  "table_webpage_url": "https://en.wikipedia.org/wiki/%22Weird_Al%22_Yankovic",
  "table_section_title": "Television",
  "table_section_text": "",
  "table": "[Described below]",
  "highlighted_cells": [[22, 2], [22, 3], [22, 0], [22, 1], [23, 3], [23, 1], [23, 0]],
  "example_id": 12345678912345678912,
  "sentence_annotations": [{"original_sentence": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Mr. Peanutbutter's brother, Captain Peanutbutter, and was hired to voice the lead role in the 2016 Disney XD series Milo Murphy's Law.",
                  "sentence_after_deletion": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Captain Peanutbutter, and was hired to the lead role in the 2016 series Milo Murphy's Law.",
                  "sentence_after_ambiguity": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Captain Peanutbutter, and was hired for the lead role in the 2016 series Milo Murphy's 'Law.",
                  "final_sentence": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Captain Peanutbutter and was hired for the lead role in the 2016 series Milo Murphy's Law."}]
}

The table field is a List[List[Dict]]. The outer list represents rows and the inner lists represent columns. Each Dict has the fields column_span: int, is_header: bool, row_span: int, and value: str. The first two rows for the example above look as follows:
[
  [
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Year"},
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Title"},
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Role"},
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Notes"}
  ],
  [
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "1997"},
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "Eek! The Cat"},
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "Himself"},
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "Episode: 'The FugEektive'"}
  ], ...
]
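As a rough illustration of this structure, the sketch below pulls the cell text out of a table and lists the values of the cells marked as headers. The helper names (cell, table_values, header_values) are hypothetical, and the sample reuses the two rows shown above. Cells whose row_span or column_span exceeds 1 are not expanded here.

```python
from typing import Dict, List

Table = List[List[Dict]]


def cell(value: str, is_header: bool) -> Dict:
    """Build a cell dict with the four fields described above (spans of 1)."""
    return {"column_span": 1, "is_header": is_header, "row_span": 1, "value": value}


def table_values(table: Table) -> List[List[str]]:
    """Return the text of every cell, row by row."""
    return [[c["value"] for c in row] for row in table]


def header_values(table: Table) -> List[str]:
    """Return the text of every cell marked as a header, in reading order."""
    return [c["value"] for row in table for c in row if c["is_header"]]


sample_table: Table = [
    [cell(v, True) for v in ("Year", "Title", "Role", "Notes")],
    [cell(v, False) for v in ("1997", "Eek! The Cat", "Himself", "Episode: 'The FugEektive'")],
]

print(header_values(sample_table))
print(table_values(sample_table)[1][1])
```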

- The table metadata consists of the table_page_title, table_section_title, and table_section_text strings to help give the model more context about the table.

- The highlighted_cells field is a List[[row_index, column_index]] where each [row_index, column_index] pair indicates that table[row_index][column_index] is highlighted.

- The example_id is simply a unique id for this example.

- The sentence_annotations field consists of the original_sentence and the sequence of revisions performed to produce the final_sentence. See our paper for more details.
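The indexing described for highlighted_cells can be sketched as follows. The function name highlighted_values is hypothetical, and toy_example is a made-up record in the layout above, not an entry from the dataset.

```python
from typing import Dict, List


def highlighted_values(example: Dict) -> List[str]:
    """Look up table[row_index][column_index]["value"] for each highlighted pair."""
    table = example["table"]
    return [table[r][c]["value"] for r, c in example["highlighted_cells"]]


# Tiny made-up example in the ToTTo layout (not taken from the dataset).
toy_example = {
    "table_page_title": "Toy page",
    "table_section_title": "Toy section",
    "table_section_text": "",
    "table": [
        [{"column_span": 1, "is_header": True, "row_span": 1, "value": "Year"},
         {"column_span": 1, "is_header": True, "row_span": 1, "value": "Title"}],
        [{"column_span": 1, "is_header": False, "row_span": 1, "value": "1997"},
         {"column_span": 1, "is_header": False, "row_span": 1, "value": "Eek! The Cat"}],
    ],
    "highlighted_cells": [[1, 1], [1, 0]],
}

print(highlighted_values(toy_example))
```

The values come back in the order the pairs are listed, which need not be left-to-right reading order.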

To help understand the dataset, you can find a sample of the train and dev sets in the sample/ folder of our supplementary repository. It additionally provides the create_table_to_text_html.py script that visualizes examples, the output of which you can also find in the sample/ folder.

Official Task

The official task described in our paper is: given the table, highlighted_cells, and table metadata (table_page_title, table_section_title, and table_section_text) as input, generate the final_sentence.
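One possible way to turn an example into a (source, target) pair for a sequence-to-sequence model is sketched below. The linearization format (the "page_title:", "section_title:", and "cell:" prefixes joined by " | ") is an assumption made for illustration and is not the representation used in the paper. The function name and the toy record are also hypothetical.

```python
from typing import Dict, Tuple


def build_source_and_target(example: Dict) -> Tuple[str, str]:
    """Linearize metadata and highlighted cells; return (source, final_sentence)."""
    parts = [
        "page_title: " + example["table_page_title"],
        "section_title: " + example["table_section_title"],
    ]
    # The section text is often empty, so include it only when present.
    if example["table_section_text"]:
        parts.append("section_text: " + example["table_section_text"])
    table = example["table"]
    for r, c in example["highlighted_cells"]:
        parts.append("cell: " + table[r][c]["value"])
    source = " | ".join(parts)
    # Uses the first annotation; training examples in the sample above carry one.
    target = example["sentence_annotations"][0]["final_sentence"]
    return source, target


# Made-up record in the ToTTo layout (not taken from the dataset).
toy_record = {
    "table_page_title": "Toy page",
    "table_section_title": "Toy section",
    "table_section_text": "",
    "table": [
        [{"column_span": 1, "is_header": True, "row_span": 1, "value": "Year"},
         {"column_span": 1, "is_header": True, "row_span": 1, "value": "Title"}],
        [{"column_span": 1, "is_header": False, "row_span": 1, "value": "1997"},
         {"column_span": 1, "is_header": False, "row_span": 1, "value": "Eek! The Cat"}],
    ],
    "highlighted_cells": [[1, 0], [1, 1]],
    "sentence_annotations": [{"final_sentence": "A toy sentence."}],
}

source, target = build_source_and_target(toy_record)
print(source)
print(target)
```

This sketch ignores the unhighlighted part of the table and the header cells; a real system may want to include them.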

Dev and Test Set

The dev and test sets have two or three references for each example, which are added to the list under the sentence_annotations key. The test set annotations are private and thus not included in the data.
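Collecting the references for a dev example might look like the sketch below. The function name is hypothetical. It assumes each entry of sentence_annotations carries a final_sentence field as in the format shown earlier, and it returns an empty list when the key is absent, as is expected for the unlabeled test file.

```python
from typing import Dict, List


def reference_sentences(example: Dict) -> List[str]:
    """Return every final_sentence listed under sentence_annotations."""
    return [a["final_sentence"] for a in example.get("sentence_annotations", [])]


# Made-up records for illustration only.
dev_like = {"sentence_annotations": [
    {"final_sentence": "First reference."},
    {"final_sentence": "Second reference."},
]}
test_like = {"example_id": 1}

print(reference_sentences(dev_like))
print(reference_sentences(test_like))
```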

If you want us to evaluate your model on the development or the private test set, please submit your files here. You can find more submission information below. By emailing us or by submitting prediction files, you consent to being contacted by Google about your submission, this dataset or any related competitions.

We provide two splits within the dev and test sets: one uses previously seen combinations of table headers and one uses unseen combinations. The splits are marked using the overlap_subset: bool flag that is added to the JSON representation. By evaluating separately on examples with the flag set to true and to false, you will be able to test the generalization ability of your model.
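A small sketch of partitioning examples by this flag so that each group can be scored separately is given below. The function name is hypothetical, and treating a missing flag as false is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_overlap_flag(examples: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (examples with overlap_subset true, examples with it false or missing)."""
    flag_true: List[Dict] = []
    flag_false: List[Dict] = []
    for ex in examples:
        if ex.get("overlap_subset", False):
            flag_true.append(ex)
        else:
            flag_false.append(ex)
    return flag_true, flag_false


# Made-up records for illustration only.
toy_examples = [
    {"example_id": 1, "overlap_subset": True},
    {"example_id": 2, "overlap_subset": False},
    {"example_id": 3, "overlap_subset": True},
]
with_flag, without_flag = split_by_overlap_flag(toy_examples)
print(len(with_flag), len(without_flag))
```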

Leaderboard

We are maintaining a leaderboard with official results on our test set.

The leaderboard indicates whether or not a model was trained on any auxiliary Wikipedia data. This is because our tables and (unrevised) test targets are from Wikipedia and thus we would like to study the effect of using additional Wikipedia data to train models.

We ask that you not incorporate any part of the ToTTo development set into the training data, and use it only for validation and hyperparameter tuning, as development sets are typically used.

In addition to BLEU and PARENT, we also report a learnt metric BLEURT. The checkpoint used was BLEURT-base-128 which can be found here. To handle multiple references, we take the average of the scores as suggested by Sellam et al. 2020.
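The multi-reference averaging mentioned above can be sketched generically. The scorer is passed in as a function, so nothing here calls the BLEURT library; toy_overlap_score is a made-up stand-in (token-set Jaccard overlap) used only to make the example runnable, and both function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


def multi_reference_score(
    candidate: str,
    references: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> float:
    """Average score_fn(reference, candidate) over all references."""
    return mean(score_fn(ref, candidate) for ref in references)


def toy_overlap_score(reference: str, candidate: str) -> float:
    """Stand-in scorer (NOT BLEURT): Jaccard overlap of whitespace tokens."""
    ref_tokens, cand_tokens = set(reference.split()), set(candidate.split())
    union = ref_tokens | cand_tokens
    return len(ref_tokens & cand_tokens) / len(union) if union else 0.0


score = multi_reference_score("a b c", ["a b c", "a b d"], toy_overlap_score)
print(score)
```

To reproduce the leaderboard setting, score_fn would be replaced by a function that scores one reference and candidate pair with the BLEURT checkpoint named above.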

| Model | Link | Uses Wiki | Overall BLEU | Overall PARENT | Overall BLEURT | Overlap Subset BLEU | Overlap Subset PARENT | Overlap Subset BLEURT | Non-Overlap Subset BLEU | Non-Overlap Subset PARENT | Non-Overlap Subset BLEURT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Anonymous | Paper in preparation | yes | 49.2 | 58.7 | 0.249 | 56.9 | 62.8 | 0.371 | 41.5 | 54.6 | 0.126 |
| T5-based (Google) | [Kale, 2020] | yes | 49.5 | 58.4 | 0.230 | 57.5 | 62.6 | 0.351 | 41.4 | 54.2 | 0.1079 |
| BERT-to-BERT (Wiki+Books) | [Rothe et al., 2019] | yes | 44.0 | 52.6 | 0.121 | 52.7 | 58.4 | 0.259 | 35.1 | 46.8 | -0.017 |
| BERT-to-BERT (Books) | [Rothe et al., 2019] | no | 43.9 | 52.6 | 0.104 | 52.7 | 58.4 | 0.255 | 34.8 | 46.7 | -0.046 |
| Pointer Generator | [See et al., 2017] | no | 41.6 | 51.6 | 0.076 | 50.6 | 58.0 | 0.244 | 32.2 | 45.2 | -0.0922 |
| Content Planner | [Puduppully et al., 2019] | no | 19.2 | 29.2 | -0.576 | 24.5 | 32.5 | -0.491 | 13.9 | 25.8 | -0.662 |

Leaderboard Submission

If you want to submit dev and test outputs, please format your predictions as a single .txt file with line-separated predictions. The predictions should be in the same order as the examples in the test.jsonl file. You can upload your prediction files here. If you run into any issues, you can contact us at [email protected]. By emailing us or by submitting prediction files, you consent to being contacted by Google about your submission, this dataset or any related competitions.
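A minimal sketch of writing such a file follows. The function name and the output path are hypothetical. Collapsing internal whitespace is a precaution added here so that a stray newline inside a prediction cannot shift later lines out of order.

```python
from typing import Iterable


def write_predictions(predictions: Iterable[str], path: str) -> None:
    """Write one prediction per line, in the order given."""
    with open(path, "w", encoding="utf-8") as f:
        for prediction in predictions:
            # Collapse whitespace (including newlines) to keep one line per example.
            f.write(" ".join(prediction.split()) + "\n")


if __name__ == "__main__":
    # Placeholder predictions; the order must match the examples in the input file.
    write_predictions(["First prediction.", "Second\nprediction."], "predictions.txt")
```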
