KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding
This is the dataset repository for our paper, *KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding*.
We introduce KorNLI and KorSTS, Korean datasets for natural language inference (NLI) and semantic textual similarity (STS).
| KorNLI                    | Total   | Train      | Dev.  | Test  |
| ------------------------- | ------- | ---------- | ----- | ----- |
| Source                    | -       | SNLI, MNLI | XNLI  | XNLI  |
| Translated by             | -       | Machine    | Human | Human |
| # Examples                | 950,354 | 942,854    | 2,490 | 5,010 |
| Avg. # words (premise)    | 13.6    | 13.6       | 13.0  | 13.1  |
| Avg. # words (hypothesis) | 7.1     | 7.2        | 6.8   | 6.8   |
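The "Avg. # words" rows do not say how words are counted. The sketch below assumes that a word is a whitespace-delimited token (Python's `str.split()`), which is an assumption, not the documented method. It shows how such an average could be computed, applied only to the three KorNLI example hypotheses listed in this README rather than to the full dataset.

```python
# Sketch: average number of whitespace-delimited words per sentence.
# Assumption: "words" = tokens produced by str.split(); the README does not
# specify the tokenization behind the published statistics.


def avg_words(sentences: list[str]) -> float:
    """Return the mean whitespace-token count over a list of sentences."""
    if not sentences:
        raise ValueError("need at least one sentence")
    total = sum(len(s.split()) for s in sentences)
    return total / len(sentences)


# The three KorNLI example hypotheses from this README.
hypotheses = [
    "이해하려고 노력하고 있었어요.",
    "나는 처음부터 그것을 잘 이해했다.",
    "나는 돈이 어디로 갔는지 이해하려고 했어요.",
]

print(round(avg_words(hypotheses), 2))  # (3 + 5 + 6) / 3 words
```

Korean separates words with spaces (eojeol units), so whitespace splitting is a natural first approximation, but a morphological analyzer would give different counts.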
| Example | English Translation | Label |
| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------- |
| P: 저는, 그냥 알아내려고 거기 있었어요.<br>H: 이해하려고 노력하고 있었어요. | I was just there just trying to figure it out.<br>I was trying to understand. | Entailment |
| P: 저는, 그냥 알아내려고 거기 있었어요.<br>H: 나는 처음부터 그것을 잘 이해했다. | I was just there just trying to figure it out.<br>I understood it well from the beginning. | Contradiction |
| P: 저는, 그냥 알아내려고 거기 있었어요.<br>H: 나는 돈이 어디로 갔는지 이해하려고 했어요. | I was just there just trying to figure it out.<br>I was trying to understand where the money went. | Neutral |
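To work with these pairs in code, a reader needs to parse premise, hypothesis, and label. The sketch below assumes tab-separated data with a header row containing `sentence1`, `sentence2`, and `gold_label`; this README does not state the file format, so those column names and the integer ids in `LABEL2ID` are hypothetical and should be checked against the actual files. The inline sample reuses the three examples above.

```python
import csv
import io

# Hypothetical label-to-id mapping: the README lists the three labels
# but does not prescribe integer ids.
LABEL2ID = {"entailment": 0, "neutral": 1, "contradiction": 2}

# Assumed layout: tab-separated, header row with sentence1/sentence2/gold_label.
SAMPLE_TSV = (
    "sentence1\tsentence2\tgold_label\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t이해하려고 노력하고 있었어요.\tentailment\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t나는 처음부터 그것을 잘 이해했다.\tcontradiction\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t나는 돈이 어디로 갔는지 이해하려고 했어요.\tneutral\n"
)


def read_nli(fp) -> list[tuple[str, str, int]]:
    """Parse (premise, hypothesis, label_id) triples from a TSV stream."""
    # QUOTE_NONE stops quotation marks inside sentences from being
    # interpreted as CSV quoting.
    reader = csv.DictReader(fp, delimiter="\t", quoting=csv.QUOTE_NONE)
    examples = []
    for row in reader:
        label = row["gold_label"].strip().lower()
        if label not in LABEL2ID:
            continue  # skip rows without a usable gold label
        examples.append((row["sentence1"], row["sentence2"], LABEL2ID[label]))
    return examples


examples = read_nli(io.StringIO(SAMPLE_TSV))
for premise, hypothesis, label_id in examples:
    print(label_id, hypothesis)
```

For a real file, pass `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` sample.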
| KorSTS        | Total | Train   | Dev.  | Test  |
| ------------- | ----- | ------- | ----- | ----- |
| Source        | -     | STS-B   | STS-B | STS-B |
| Translated by | -     | Machine | Human | Human |
| # Examples    | 8,628 | 5,749   | 1,500 | 1,379 |
| Avg. # words  | 7.7   | 7.5     | 8.7   | 7.6   |
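As a quick consistency check on the two statistics tables, the train, dev, and test sizes should add up to each reported total. The sketch below verifies this and computes the share of each dataset that falls in the human-translated splits (dev and test, per the "Translated by" rows). The numbers are copied from the tables; the function name is illustrative.

```python
# Split sizes copied from the KorNLI and KorSTS statistics tables.
SPLITS = {
    "KorNLI": {"total": 950_354, "train": 942_854, "dev": 2_490, "test": 5_010},
    "KorSTS": {"total": 8_628, "train": 5_749, "dev": 1_500, "test": 1_379},
}


def check_splits(splits: dict[str, dict[str, int]]) -> dict[str, float]:
    """Verify train + dev + test == total; return the human-translated share."""
    shares = {}
    for name, s in splits.items():
        parts = s["train"] + s["dev"] + s["test"]
        if parts != s["total"]:
            raise ValueError(f"{name}: splits sum to {parts}, expected {s['total']}")
        # Dev and test are the human-translated splits according to the tables.
        shares[name] = (s["dev"] + s["test"]) / s["total"]
    return shares


for name, share in check_splits(SPLITS).items():
    print(f"{name}: {share:.1%} human-translated")
```

Both totals check out, and the output (0.8% for KorNLI, 33.4% for KorSTS) shows how much smaller the human-translated portion of KorNLI is relative to its machine-translated training set.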
| Example | English Translation | Label |
| ------------------------------------------------------------ | -------------------------------------------------------- | ----- |
| 한 남자가 음식을 먹고 있다.<br>한 남자가 뭔가를 먹고 있다. | A man is eating food.<br>A man is eating something. | 4.2 |
| 한 비행기가 착륙하고 있다.<br>애니메이션화된 비행기 하나가 착륙하고 있다. | A plane is landing.<br>A animated airplane is landing. | 2.8 |
| 한 여성이 고기를 요리하고 있다.<br>한 남자가 말하고 있다. | A woman is cooking meat.<br>A man is speaking. | 0.0 |
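STS-style data are commonly evaluated by correlating a model's predicted similarity scores with the gold scores. The following is a minimal pure-Python sketch of Spearman rank correlation (Pearson correlation computed on ranks, with tied values given their average rank). It is an illustration, not the paper's evaluation script, and the values in `pred` are made-up predictions; `scipy.stats.spearmanr` computes the same quantity.

```python
def rank(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


gold = [4.2, 2.8, 0.0]  # KorSTS labels from the examples above
pred = [3.9, 3.1, 0.5]  # hypothetical model predictions
print(spearman(gold, pred))  # same ordering as gold, so 1.0
```

Because Spearman correlation depends only on ordering, a model is rewarded for ranking sentence pairs correctly even if its raw scores are on a different scale from the gold labels.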
KorNLI and KorSTS are released under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0).
If you use KorNLI or KorSTS for research, please cite our paper:
```bibtex
@article{ham2020kornli,
  title={KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding},
  author={Ham, Jiyeon and Choe, Yo Joong and Park, Kyubyong and Choi, Ilji and Soh, Hyungjoon},
  journal={arXiv preprint arXiv:2004.03289},
  year={2020}
}
```