Repository to track the progress in Vietnamese Natural Language Processing, including the datasets and the current state-of-the-art for the most common Vietnamese NLP tasks.
This document aims to track the progress in Vietnamese Natural Language Processing and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.
It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.
If you would like to add a new result, you can do so with a pull request (PR). In order to minimize noise and to make maintenance somewhat manageable, results reported in published papers will be preferred (indicate the venue of publication in your PR); an exception may be made for influential preprints. The result should include the name of the method, the citation, the score, and a link to the paper and should be added so that the table is sorted (with the best result on top).
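The sorting rule can be applied mechanically before opening a PR. Below is a minimal Python sketch. It assumes the following:

- The table uses the Model | Score | Paper/Source | Code layout.
- The score cell begins with a number.
- Higher scores are better by default. Pass `higher_is_better=False` for error-rate style metrics.

The function name `sort_results_table` and the example rows are hypothetical. They are not part of this repository.

```python
def sort_results_table(table: str, score_col: int = 1, higher_is_better: bool = True) -> str:
    """Sort the body rows of a Markdown results table so the best result is on top.

    Hypothetical helper: assumes the header is the first line, the separator
    row is the second line, and the score cell starts with a number.
    """
    lines = table.strip().splitlines()
    header, separator, rows = lines[0], lines[1], lines[2:]

    def score(row: str) -> float:
        # "| a | b | c |" -> ["a", "b", "c"]; outer pipes are stripped first.
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        # Take the leading number, e.g. "95.1 (F1)" -> 95.1.
        return float(cells[score_col].split()[0])

    rows.sort(key=score, reverse=higher_is_better)
    return "\n".join([header, separator, *rows])


# Hypothetical example table with placeholder models.
example = """| Model | Score | Paper/Source | Code |
| --- | :---: | --- | --- |
| Model A | 91.2 | Paper A | |
| Model B | 94.5 | Paper B | Official |"""

print(sort_results_table(example))
```

Sorting only the body rows leaves the header and the alignment row untouched, so the output is still a valid Markdown table.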
If your pull request contains a new result, please make sure that "new result" appears somewhere in the title of the PR. This way, we can track which tasks are the most active and receive the most attention.
In order to make reproduction easier, we recommend adding a link to an implementation for each method if available. You can add a Code column (see below) to the table if it does not exist. In the Code column, indicate an official implementation with "Official". If an unofficial implementation is available, use "Link" (see below). If no implementation is available, you can leave the cell empty.
To add a new dataset or task, follow the steps below. Any new dataset should have been used for evaluation in at least one published paper besides the one that introduced it.
| Model | Score | Paper/Source | Code |
| ------------- | :-----:| --- | --- |
| | | | |
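A new row for the template can also be generated programmatically. The Python sketch below is minimal and rests on the following assumptions:

- The Code column convention is rendered as Markdown links, `[Official](url)` for an official implementation and `[Link](url)` for an unofficial one. An empty cell means no implementation is available.
- The paper is given as a title plus URL.

The function name `format_result_row` and the example.org URLs are hypothetical placeholders.

```python
from typing import Optional


def format_result_row(model: str, score: float, paper_title: str, paper_url: str,
                      code_url: Optional[str] = None, official: bool = False) -> str:
    """Render one row of the results table: Model | Score | Paper/Source | Code.

    Hypothetical helper; the Markdown-link rendering of the Code cell is an assumption.
    """
    if code_url is None:
        code_cell = ""  # no implementation available: leave the cell empty
    elif official:
        code_cell = f"[Official]({code_url})"  # official implementation
    else:
        code_cell = f"[Link]({code_url})"  # unofficial implementation
    paper_cell = f"[{paper_title}]({paper_url})"
    return f"| {model} | {score} | {paper_cell} | {code_cell} |"


# Placeholder values for illustration only.
print(format_result_row("Model A", 91.2, "Paper A", "https://example.org/paper-a",
                        code_url="https://example.org/code-a", official=True))
```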