Collection of dataset and corresponding benchmark for Rasa NLU
Rasa NLU is a powerful, open-source natural language processing tool for intent classification and entity extraction in chatbots.
However, we could not find a published collection of public datasets with corresponding benchmarks for it, which makes it difficult to evaluate the performance of an NLU system built with Rasa.
This project therefore collects and organizes datasets and baselines for task-oriented dialogue. The datasets are provided in the data format required by Rasa NLU, so you can use them directly in your own Rasa NLU system.
All the datasets have been organized and archived in the `data` directory.
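For readers unfamiliar with the target format, the following is a minimal sketch of what one training example could look like in Rasa NLU's JSON training data layout (`rasa_nlu_data` / `common_examples`). The helper `make_example` and the sample utterance are hypothetical and only illustrative; they are not taken from the archived files, and the files in this repository may use a different Rasa-supported format.

```python
import json


def make_example(text, intent, entities=()):
    """Build one training example (hypothetical helper).

    `entities` is a sequence of (value, entity_type) pairs; each value is
    located in `text` to derive the character offsets Rasa NLU expects.
    """
    spans = []
    for value, entity_type in entities:
        start = text.find(value)
        if start == -1:
            raise ValueError(f"entity value {value!r} not found in text")
        spans.append({
            "start": start,
            "end": start + len(value),  # end offset is exclusive
            "value": value,
            "entity": entity_type,
        })
    return {"text": text, "intent": intent, "entities": spans}


# Made-up utterance in the style of an airline travel query.
examples = [
    make_example(
        "show me flights from boston to denver",
        "flight",
        [("boston", "fromloc.city_name"), ("denver", "toloc.city_name")],
    ),
]

nlu_data = {"rasa_nlu_data": {"common_examples": examples}}
print(json.dumps(nlu_data, indent=2))
```

Deriving offsets with `str.find` keeps the sketch short, but it only locates the first occurrence of a value, so real conversion scripts usually carry the offsets over from the source annotations instead.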
The following information is included for each dataset:

- Name
- Language
- Task
- Size (train/test)
- Intent/Entity Nums
- Link (Website Or Paper)
|Name|Language|Task|Size(Train/Test)|Intent/Entity Nums|Link|
|:----:|:----:|:----:|:----:|:----:|:----:|
|ATIS|en|Airline Travel Information|4978/893|26/129|more detail|
|Snips|en|7 intents, including: AddToPlaylist, BookRestaurant...|13802/699|7/72|more detail|
|AskUbuntuCorpus|en|5 intents, questions about Ubuntu|127/35|5/3|more detail|
|Facebook Multilingual Task Oriented Dataset|en|3 domains, including: alarm, weather, reminder|30521/8621|12/25|more detail|
|SMP2019|zh|29 domains, including: app, email...|2063/480|24/62|more detail|
|Check flow dataset|zh|13 intents, some request and inform|809/210|13/6|more detail|
|MSRA_NER|zh|1 intent, including various kinds of news and 3 kinds of entities|20864/4636|1/3|more detail|
|ToutiaoNews|zh|7 intents, including 7 kinds of news|325279/57409|7/0|more detail|
Note:

- The SMP2019 and CheckFlow datasets do not come with an official train/test split, so we split them ourselves at a ratio of 8:2.
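An 8:2 split of this kind can be sketched as follows. This is not the script used for this project: the function name `split_train_test`, the fixed seed, and the plain shuffle (rather than, say, a split stratified by intent) are all assumptions, so it will not necessarily reproduce the exact train/test sizes listed in the table.

```python
import random


def split_train_test(examples, train_ratio=0.8, seed=42):
    """Shuffle a copy of `examples` and cut it at `train_ratio` (hypothetical helper)."""
    shuffled = list(examples)              # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Placeholder examples standing in for a dataset without an official split.
examples = [{"text": f"utterance {i}", "intent": "placeholder"} for i in range(10)]
train, test = split_train_test(examples)
print(len(train), len(test))  # 8 2
```

Fixing the seed matters for a benchmark: if the split changes between runs, scores on the test set are no longer comparable.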
For the English datasets, we use `pretrained_embeddings_spacy` and `supervised_embeddings` as the baseline NLU pipelines.
For the Chinese datasets, we use `rasa_nlu_chi` as the baseline NLU pipeline.
|Dataset|NLU Pipeline|Intent auc|Intent p|Intent r|Intent f1|Entity auc|Entity p|Entity r|Entity f1|
|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
|ATIS(en)|pretrained_embeddings_spacy|0.91|0.91|0.91|0.91|0.98|0.98|0.98|0.98|
|ATIS(en)|supervised_embeddings|1.00|1.00|1.00|1.00|0.98|0.98|0.98|0.98|
|Snips(en)|pretrained_embeddings_spacy|0.99|0.99|0.99|0.99|1.00|1.00|1.00|1.00|
|Snips(en)|supervised_embeddings|1.00|1.00|1.00|1.00|1.00|1.00|1.00|1.00|
|AskUbuntuCorpus(en)|pretrained_embeddings_spacy|0.89|0.89|0.89|0.89|0.95|0.95|0.95|0.95|
|AskUbuntuCorpus(en)|supervised_embeddings|0.86|0.86|0.86|0.86|0.95|0.95|0.95|0.95|
|Facebook Multilingual Task Oriented Dataset(en)|pretrained_embeddings_spacy|0.96|0.96|0.96|0.96|0.98|0.98|0.98|0.98|
|Facebook Multilingual Task Oriented Dataset(en)|supervised_embeddings|0.99|0.99|0.99|0.99|0.98|0.98|0.98|0.98|
|SMP2019(zh)|rasa_nlu_chi|0.76|0.83|0.76|0.78|0.79|0.80|0.79|0.77|
|CheckFlow(zh)|rasa_nlu_chi|0.95|0.95|0.95|0.94|1.00|1.00|1.00|1.00|
|MSRA_NER(zh)|rasa_nlu_chi|N/A|N/A|N/A|N/A|0.98|0.98|0.98|0.98|
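The benchmark does not state how the p, r and f1 columns were averaged across classes. As one plausible reading, the sketch below computes support-weighted precision, recall and F1 from gold and predicted labels in plain Python. The function name `weighted_prf` and the toy labels are hypothetical, and weighted averaging itself is an assumption rather than the documented method.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the labels in `y_true`."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        # True positives: gold and predicted label both equal `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / n
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = n / total  # each class counts in proportion to its support
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return precision, recall, f1


# Toy labels, not taken from any dataset in this collection.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
p, r, f = weighted_prf(y_true, y_pred)
print(round(p, 2), round(r, 2), round(f, 2))
```

With support weighting, recall always equals plain accuracy (the fraction of correct predictions). That would be consistent with the first and r columns agreeing in every row of the table, although this remains an inference.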
We used the Comparing NLU Pipelines tool to compare `pretrained_embeddings_spacy` and `supervised_embeddings` on the `AskUbuntuCorpus` (small) and `Snips` (large) datasets.
We can see that when the training data is relatively small, `pretrained_embeddings_spacy` performs better, and when enough data is available, `supervised_embeddings` performs better.