PyThaiNLP: Thai Natural Language Processing in Python


PyThaiNLP is a Python package for text processing and linguistic analysis, similar to NLTK, with a focus on the Thai language.

For details in Thai, see README_TH.MD.


Starting with PyThaiNLP 3.0, Python 3.6 is no longer supported. Python 3.6 users can use PyThaiNLP 2.3.1.

| Version | Description | Status |
|:------:|:--:|:------:|
| 2.3.2 | Stable | Change Log |
| | Release Candidate for 3.0 | Change Log |

Getting Started


PyThaiNLP provides standard NLP functions for Thai, such as part-of-speech tagging and linguistic unit segmentation (syllable, word, or sentence). Some of these functions are also available via a command-line interface.

List of Features
  • Convenient character and word classes, like Thai consonants (pythainlp.thai_consonants), vowels (pythainlp.thai_vowels), digits (pythainlp.thai_digits), and stop words (pythainlp.corpus.thai_stopwords) -- comparable to constants like string.ascii_letters, string.digits, and string.punctuation
  • Thai linguistic unit segmentation/tokenization, including sentence (sent_tokenize), word (word_tokenize), and subword segmentations based on Thai Character Cluster (subword_tokenize)
  • Thai part-of-speech tagging (pos_tag)
  • Thai spelling suggestion and correction (spell and correct)
  • Thai transliteration (transliterate)
  • Thai soundex (soundex) with three engines (lk82, udom83, metasound)
  • Thai collation (sort by dictionary order) (collate)
  • Read out number to Thai words (bahttext, num_to_thaiword)
  • Thai datetime formatting (thai_strftime)
  • Fix for text typed with the wrong Thai/English keyboard layout (eng_to_thai, thai_to_eng)
  • Command-line interface for basic functions, like tokenization and POS tagging (run thainlp in your shell)
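
The tokenization and tagging features listed above can be chained in a few lines. The following is a minimal sketch, not an official recipe. It assumes PyThaiNLP is installed, that word_tokenize is importable from pythainlp.tokenize and pos_tag from pythainlp.tag (this README does not state the module paths), and that "newmm" is an available tokenizer engine. The helper name segment_and_tag is hypothetical.

```python
def segment_and_tag(text, engine="newmm"):
    """Split Thai text into words, then tag each word with its part of speech.

    `segment_and_tag` is a hypothetical helper for illustration and is not
    part of PyThaiNLP. The module paths and the "newmm" engine name are
    assumptions; check them against the version you have installed.
    """
    # Imported inside the function so this file can be loaded even before
    # PyThaiNLP is installed; the import only runs when the helper is called.
    from pythainlp.tag import pos_tag
    from pythainlp.tokenize import word_tokenize

    words = word_tokenize(text, engine=engine)  # list of word strings
    return pos_tag(words)  # list of (word, tag) pairs
```

Calling segment_and_tag("ฉันรักภาษาไทย") should return a list of (word, tag) pairs. The exact segmentation and tags depend on the tokenizer engine and the tagger's default corpus.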


Installation

pip install --upgrade pythainlp

This will install the latest stable release of PyThaiNLP.

Install different releases:

  • Stable release:
    pip install --upgrade pythainlp
  • Pre-release (near ready):
    pip install --upgrade --pre pythainlp
  • Development (likely to break things):
    pip install

Installation Options

Some functionalities, like Thai WordNet, may require extra packages. To install those requirements, specify a set of extras in square brackets immediately after the package name:

pip install pythainlp[extra1,extra2,...]
List of possible `extras`
  • full (install everything)
  • attacut (to support attacut, a fast and accurate tokenizer)
  • benchmarks (for word tokenization benchmarking)
  • icu (for ICU, International Components for Unicode, support in transliteration and tokenization)
  • ipa (for IPA, International Phonetic Alphabet, support in transliteration)
  • ml (to support ULMFiT models for classification)
  • thai2fit (for Thai word vector)
  • thai2rom (for machine-learnt romanization)
  • wordnet (for Thai WordNet API)
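
The bracket syntax above can also be assembled programmatically, for instance in a provisioning script. The following is a minimal sketch using only the Python standard library. KNOWN_EXTRAS simply mirrors the list above, and build_install_command is a hypothetical helper, not part of PyThaiNLP or pip. The requirement is quoted because some shells (zsh, for example) treat square brackets as glob patterns.

```python
import shlex

# Extras copied from the list above; this set is illustrative and may not
# match the extras defined by the PyThaiNLP version you install.
KNOWN_EXTRAS = {
    "full", "attacut", "benchmarks", "icu", "ipa",
    "ml", "thai2fit", "thai2rom", "wordnet",
}


def build_install_command(extras):
    """Return a shell-safe `pip install` command for the given extras."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    requirement = "pythainlp"
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    # shlex.quote wraps the requirement in single quotes when it contains
    # characters, such as square brackets, that a shell might interpret.
    return "pip install " + shlex.quote(requirement)


print(build_install_command(["attacut", "icu"]))
# pip install 'pythainlp[attacut,icu]'
```

Rejecting unknown names early keeps a typo from surfacing only as a pip warning later.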

For dependency details, look at

variable in

Data directory

  • Some additional data, like word lists and language models, may be downloaded automatically at runtime.
  • PyThaiNLP caches these data under the directory
    by default.
  • Data directory can be changed by specifying the environment variable
  • See the data catalog (
    ) at

Command-Line Interface

Some of PyThaiNLP's functionalities can be used at the command line, using the thainlp command.

For example, displaying a catalog of datasets:

thainlp data catalog

Showing usage information:

thainlp help


| | License |
|:---|:----|
| PyThaiNLP Source Code and Notebooks | Apache Software License 2.0 |
| Corpora, datasets, and documentation created by PyThaiNLP | Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0) |
| Language models created by PyThaiNLP | Creative Commons Attribution 4.0 International Public License (CC-by) |
| Other corpora and models that may be included with PyThaiNLP | See Corpus License |

Contribute to PyThaiNLP

  • Please do fork and create a pull request :)
  • For style guide and other information, including references to algorithms we use, please refer to our contributing page.


If you use PyThaiNLP in your project or publication, please cite the library as follows:
Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, & Pattarawat Chormai. (2016, Jun 27). PyThaiNLP: Thai Natural Language Processing in Python. Zenodo.

or BibTeX entry:

@misc{pythainlp,
    author       = {Wannaphong Phatthiyaphaibun and Korakot Chaovavanich and Charin Polpanumas and Arthit Suriyawongkul and Lalita Lowphansirikul and Pattarawat Chormai},
    title        = {{PyThaiNLP: Thai Natural Language Processing in Python}},
    month        = Jun,
    year         = 2016,
    doi          = {10.5281/zenodo.3519354},
    publisher    = {Zenodo},
    url          = {}
}


VISTEC-depa Thailand Artificial Intelligence Research Institute

Since 2019, our contributors Korakot Chaovavanich and Lalita Lowphansirikul have been supported by VISTEC-depa Thailand Artificial Intelligence Research Institute.

Made with ❤️ | PyThaiNLP Team 💻 | "We build Thai NLP" 🇹🇭

We have only one official repository at and another mirror at
Beware of malware if you use code from mirrors other than the official two at GitHub and GitLab.
