by tsroten

tsroten / zhon

Constants used in Chinese text processing

227 Stars 32 Forks Last release: Not found MIT License 147 Commits 11 Releases

Available items

No Items, yet!

The developer of this repository has not created any items for sale yet. Need a bug fixed? Help with integration? A different license? Create a request here:



.. image:: https://badge.fury.io/py/zhon.png :target: http://badge.fury.io/py/zhon

.. image:: https://travis-ci.org/tsroten/zhon.png?branch=develop :target: https://travis-ci.org/tsroten/zhon

Zhon is a Python library that provides constants commonly used in Chinese text processing.

  • Documentation: http://zhon.rtfd.org
  • GitHub: https://github.com/tsroten/zhon
  • Support: https://github.com/tsroten/zhon/issues
  • Free software:
    MIT license 


Zhon's constants can be used in Chinese text processing, for example:

  • Find CJK characters in a string:

.. code:: python

>>> re.findall('[{}]'.format(zhon.hanzi.characters), 'I broke a plate: 我打破了一个盘子.')
['我', '打', '破', '了', '一', '个', '盘', '子']
  • Validate Pinyin syllables, words, or sentences:

.. code:: python

>>> re.findall(zhon.pinyin.syllable, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
['Yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']

>>> re.findall(zhon.pinyin.word, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I) ['Yuànzi', 'lǐ', 'tíngzhe', 'yí', 'liàng', 'chē']

>>> re.findall(zhon.pinyin.sentence, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I) ['Yuànzi lǐ tíngzhe yí liàng chē.']


  • Includes commonly-used constants:
    • CJK characters and radicals
    • Chinese punctuation marks
    • Chinese sentence regular expression pattern
    • Pinyin vowels, consonants, lowercase, uppercase, and punctuation
    • Pinyin syllable, word, and sentence regular expression patterns
    • Zhuyin characters and marks
    • Zhuyin syllable regular expression pattern
    • CC-CEDICT characters
  • Runs on Python 2.7 and 3

Getting Started

  • Install Zhon 
  • Read
    Zhon's introduction 
  • Learn from the
    API documentation 
  • Contribute 
    _ documentation, code, or feedback

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.