
About the developer

proycon
445 Stars 64 Forks GNU General Public License v3.0 2.2K Commits 3 Opened issues


PyNLPl - Python Natural Language Processing Library

.. image:: https://travis-ci.org/proycon/pynlpl.svg?branch=master
    :target: https://travis-ci.org/proycon/pynlpl

.. image:: http://readthedocs.org/projects/pynlpl/badge/?version=latest
    :target: http://pynlpl.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: http://applejack.science.ru.nl/lamabadge.php/pynlpl
    :target: http://applejack.science.ru.nl/languagemachines/

.. image:: https://zenodo.org/badge/759484.svg
    :target: https://zenodo.org/badge/latestdoi/759484

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
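
As a rough illustration of the basic tasks mentioned above, the following sketch extracts n-grams and builds a frequency list using only the Python standard library. It deliberately does not use PyNLPl's own API; the names ``extract_ngrams`` and ``frequency_list`` are hypothetical and exist only for this example.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples.

    Hypothetical helper for illustration, not part of PyNLPl.
    """
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(items):
    """Count how often each item occurs (hypothetical helper, not PyNLPl API)."""
    return Counter(items)


# Naive whitespace tokenisation, sufficient for this example
tokens = "to be or not to be".split()
bigrams = extract_ngrams(tokens, 2)
freq = frequency_list(bigrams)

# Print the bigrams from most to least frequent
for ngram, count in freq.most_common():
    print(" ".join(ngram), count)
```

The modules listed below provide the library's own implementations of these tasks; consult the API documentation for their actual class and function names.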

The library is divided into several packages and modules. It works on Python 2.7 as well as Python 3.

The following modules are available:

  • pynlpl.datatypes
    - Extra datatypes (priority queues, patterns, tries)
  • pynlpl.evaluation
    - Evaluation & experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/f-score/auc), sampler, confusion matrix, multithreaded experiment pool)
  • pynlpl.formats.cgn
    - Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags
  • pynlpl.formats.folia
    - Extensive library for reading and manipulating documents in FoLiA format (Format for Linguistic Annotation).
  • pynlpl.formats.fql
    - Extensive library for the FoLiA Query Language (FQL), built on top of pynlpl.formats.folia. FQL is currently documented separately.
  • pynlpl.formats.cql
    - Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a convertor to FQL.
  • pynlpl.formats.giza
    - Module for reading GIZA++ word alignment data
  • pynlpl.formats.moses
    - Module for reading Moses phrase-translation tables.
  • pynlpl.formats.sonar
    - Largely obsolete module for pre-releases of the SoNaR corpus; use pynlpl.formats.folia instead.
  • pynlpl.formats.timbl
    - Module for reading Timbl output (consider using python-timbl instead, though)
  • pynlpl.lm.lm
    - Simple language model, plus a reader for ARPA language model data (the format used by SRILM).
  • pynlpl.search
    - Various search algorithms (Breadth-first, depth-first, beam-search, hill climbing, A star, various variants of each)
  • pynlpl.statistics
    - Frequency lists, Levenshtein, common statistics and information theory functions
  • pynlpl.textprocessors
    - Simple tokeniser, n-gram extraction
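
To make one of the listed functions concrete, here is a minimal sketch of Levenshtein (edit) distance, one of the measures mentioned under pynlpl.statistics. This is a plain standard-library implementation written for this example; the name ``edit_distance`` is hypothetical and is not claimed to match PyNLPl's own function name or signature.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences.

    Insertions, deletions and substitutions each cost 1.
    Hypothetical helper for illustration, not PyNLPl's API.
    """
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))
```

Because the function only indexes and compares elements, it works on token lists as well as strings, which is convenient when comparing tokenised sentences.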

Installation

Download and install the latest stable version directly from the Python Package Index with ``pip install pynlpl`` (or ``pip3`` for Python 3 on most systems). For global installations, prepend ``sudo``.

Alternatively, clone this repository and run ``python setup.py install`` (or ``python3 setup.py install`` for Python 3 on most systems). Prepend ``sudo`` for global installations.

This software may also be found in certain Linux distributions, such as the latest versions of Debian/Ubuntu, as ``python-pynlpl`` and ``python3-pynlpl``. PyNLPl is also included in our LaMachine distribution.
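
After installing by any of these routes, a quick way to confirm that the package is visible to your interpreter is to look it up on the import path. The sketch below uses only the standard library; ``is_installed`` is a hypothetical helper name introduced for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("pynlpl"):
    print("pynlpl is available")
else:
    print("pynlpl not found; try: pip install pynlpl")
```

Run it with the same interpreter (``python`` or ``python3``) that you used for the installation, since each interpreter has its own package path.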

Documentation

API documentation can be found at http://pynlpl.readthedocs.io/en/latest/.
