rasa-nlu-benchmark

by nghuyong

Collection of dataset and corresponding benchmark for Rasa NLU

205 Stars 38 Forks Last release: Not found MIT License 101 Commits 0 Releases


Introduction

Rasa NLU is a powerful and open-source natural language processing tool for intent classification and entity extraction in chatbots.

However, we found no publicly available datasets with corresponding benchmarks for it. This makes it difficult to evaluate the performance of an NLU system built with Rasa.

Therefore, this project aims to collect and organize datasets and baselines for task-oriented dialogue. The datasets are provided in the data format required by Rasa NLU, so you can use them directly in your Rasa NLU system.

Datasets

All the datasets have been organized and archived in the `data` directory.
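As a rough illustration of working with data in a Rasa NLU format, the sketch below counts intents and entity types in a dataset. It assumes the legacy Rasa NLU JSON layout (`rasa_nlu_data` / `common_examples`); the files in `data` may use a different Rasa format such as Markdown. The `SAMPLE` content and the `summarize` helper are hypothetical and not taken from this repository.

```python
import json
from collections import Counter

# Hypothetical sample in the legacy Rasa NLU JSON layout (an assumption;
# not copied from this repository's data directory).
SAMPLE = """
{
  "rasa_nlu_data": {
    "common_examples": [
      {"text": "show flights from boston to denver", "intent": "flight",
       "entities": [
         {"start": 18, "end": 24, "value": "boston", "entity": "fromloc"},
         {"start": 28, "end": 34, "value": "denver", "entity": "toloc"}
       ]},
      {"text": "what is the cheapest fare", "intent": "airfare", "entities": []}
    ]
  }
}
"""


def summarize(raw):
    """Return (example count, intent counts, entity-type counts)."""
    examples = json.loads(raw)["rasa_nlu_data"]["common_examples"]
    intents = Counter(ex["intent"] for ex in examples)
    entities = Counter(
        ent["entity"] for ex in examples for ent in ex.get("entities", [])
    )
    return len(examples), intents, entities


count, intents, entities = summarize(SAMPLE)
print(count, dict(intents), dict(entities))
```

For a real file, the same function could be applied to the text read from disk, provided the file follows the assumed layout.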

The following information is included for each dataset:

- Name
- Language
- Task
- Size (train/test)
- Intent/Entity Nums
- Link (website or paper)

|Name|Language|Task|Size(Train/Test)|Intent/Entity Nums|Link|
|:----:|:----:|:----:|:----:|:----:|:----:|
|ATIS|en|Airline Travel Information|4978/893|26/129|more detail|
|Snips|en|7 intents, including: AddToPlaylist, BookRestaurant...|13802/699|7/72|more detail|
|AskUbuntuCorpus|en|5 intents, questions about Ubuntu|127/35|5/3|more detail|
|Facebook Multilingual Task Oriented Dataset|en|3 domains, including: alarm, weather, reminder|30521/8621|12/25|more detail|
|SMP2019|zh|29 domains, including: app, email...|2063/480|24/62|more detail|
|Check flow dataset|zh|13 intents, some request and inform|809/210|13/6|more detail|
|MSRA_NER|zh|1 intent, including various kinds of news and 3 kinds of entities|20864/4636|1/3|more detail|
|ToutiaoNews|zh|7 intents, including 7 kinds of news|325279/57409|7/0|more detail|

Note:

- For the SMP2019 and CheckFlow datasets, no official train/test split is provided, so we split them ourselves at a ratio of 8:2.
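An 8:2 split of this kind can be sketched as follows. The exact procedure used for SMP2019 and CheckFlow is not stated here, so this is only one plausible approach (a seeded shuffle followed by a cut); the `split_examples` helper, the seed, and the rounding are assumptions.

```python
import random


def split_examples(examples, train_ratio=0.8, seed=42):
    """Shuffle a copy of the examples and cut it into train and test parts."""
    shuffled = list(examples)
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical examples standing in for a dataset's utterances.
train, test = split_examples([f"utterance {i}" for i in range(10)])
print(len(train), len(test))
```

A split stratified by intent would keep the intent distribution similar in both parts, which may matter for datasets with rare intents.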

Benchmark

Baseline Pipeline

Result

|Dataset|NLU Pipeline|Intent auc|Intent p|Intent r|Intent f1|Entity auc|Entity p|Entity r|Entity f1|
|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
|ATIS(en)|pretrained_embeddings_spacy|0.91|0.91|0.91|0.91|0.98|0.98|0.98|0.98|
|ATIS(en)|supervised_embeddings|1.00|1.00|1.00|1.00|0.98|0.98|0.98|0.98|
|Snips(en)|pretrained_embeddings_spacy|0.99|0.99|0.99|0.99|1.00|1.00|1.00|1.00|
|Snips(en)|supervised_embeddings|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|
|AskUbuntuCorpus(en)|pretrained_embeddings_spacy|0.89|0.89|0.89|0.89|0.95|0.95|0.95|0.95|
|AskUbuntuCorpus(en)|supervised_embeddings|0.86|0.86|0.86|0.86|0.95|0.95|0.95|0.95|
|Facebook Multilingual Task Oriented Dataset(en)|pretrained_embeddings_spacy|0.96|0.96|0.96|0.96|0.98|0.98|0.98|0.98|
|Facebook Multilingual Task Oriented Dataset(en)|supervised_embeddings|0.99|0.99|0.99|0.99|0.98|0.98|0.98|0.98|
|SMP2019(zh)|rasa_nlu_chi|0.76|0.83|0.76|0.78|0.79|0.80|0.79|0.77|
|CheckFlow(zh)|rasa_nlu_chi|0.95|0.95|0.95|0.94|1.00|1.00|1.00|1.00|
|MSRA_NER(zh)|rasa_nlu_chi|N/A|N/A|N/A|N/A|0.98|0.98|0.98|0.98|
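For readers who want to reproduce scores of this kind from their own predictions, the sketch below computes accuracy together with precision, recall, and F1 for intent labels. How the scores above were averaged is not stated, so the support-weighted averaging used here is an assumption, and `weighted_scores` is a hypothetical helper rather than part of Rasa.

```python
from collections import Counter


def weighted_scores(gold, pred):
    """Return (accuracy, precision, recall, f1), averaged by label support."""
    total = len(gold)
    support = Counter(gold)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / total
    precision = recall = f1 = 0.0
    for label in set(gold) | set(pred):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        predicted = sum(p == label for p in pred)
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / support[label] if support[label] else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        # Each label contributes in proportion to its share of gold examples.
        weight = support[label] / total
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return accuracy, precision, recall, f1


# Hypothetical gold and predicted intents.
gold = ["greet", "greet", "bye", "bye"]
pred = ["greet", "bye", "bye", "bye"]
print(weighted_scores(gold, pred))
```

Under support-weighted averaging, recall always equals accuracy, because each label's recall is weighted by exactly the number of gold examples it covers.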

We further use Rasa's official Comparing NLU Pipelines tool to compare `pretrained_embeddings_spacy` and `supervised_embeddings` on the AskUbuntuCorpus (small) and Snips (large) datasets.

[Figure: comparison of the two pipelines on AskUbuntuCorpus and Snips]

We can see that when the training data is relatively small, `pretrained_embeddings_spacy` performs better, and when the amount of data is sufficient, `supervised_embeddings` performs better.
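One way to study the effect of training set size is to train each pipeline on progressively smaller portions of the training data while keeping the test set fixed. The sketch below only builds such subsets; the `training_subsets` helper and the chosen percentages are hypothetical, it is not how Rasa's tool is implemented, and the actual training and evaluation would still be done with Rasa.

```python
import random


def training_subsets(examples, exclude_percentages=(0, 25, 50, 75), seed=0):
    """Map each exclusion percentage to the training subset that remains."""
    shuffled = list(examples)
    # Shuffle once so that smaller subsets are nested inside larger ones.
    random.Random(seed).shuffle(shuffled)
    subsets = {}
    for pct in exclude_percentages:
        keep = len(shuffled) * (100 - pct) // 100
        subsets[pct] = shuffled[:keep]
    return subsets


# Hypothetical training examples.
subsets = training_subsets([f"utterance {i}" for i in range(100)])
for pct, subset in subsets.items():
    print(f"{pct}% excluded -> {len(subset)} training examples")
```

Plotting a score such as F1 against the remaining training size for each pipeline then shows where one pipeline overtakes the other.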
