pySBD

by nipunsadvilkar

nipunsadvilkar / pySBD

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that...

167 Stars 19 Forks Last release: 23 days ago (v0.3.3) MIT License 260 Commits 14 Releases

Available items

No Items, yet!

The developer of this repository has not created any items for sale yet. Need a bug fixed? Help with integration? A different license? Create a request here:

pySBD: Python Sentence Boundary Disambiguation (SBD)

Python package codecov License PyPi GitHub

pySBD - python Sentence Boundary Disambiguation (SBD) - is a rule-based sentence boundary detection module that works out-of-the-box.

This project is a direct port of ruby gem - Pragmatic Segmenter which provides rule-based sentence boundary detection.

pysbd_code

Install

Python

pip install pysbd

Usage

  • Currently pySBD supports only English language. Support for more languages will be released soon.
import pysbd
text = "My name is Jonas E. Smith. Please turn to p. 55."
seg = pysbd.Segmenter(language="en", clean=False)
print(seg.segment(text))
# ['My name is Jonas E. Smith.', 'Please turn to p. 55.']
import spacy
from pysbd.utils import PySBDFactory

nlp = spacy.blank('en')

explicitly adding component to pipeline

(recommended - makes it more readable to tell what's going on)

nlp.add_pipe(PySBDFactory(nlp))

or you can use it implicitly with keyword

pysbd = nlp.create_pipe('pysbd')

nlp.add_pipe(pysbd)

doc = nlp('My name is Jonas E. Smith. Please turn to p. 55.') print(list(doc.sents))

[My name is Jonas E. Smith., Please turn to p. 55.]

Contributing

If you want to contribute new feature/language support or found a text that is incorrectly segmented using pySBD, then please head to CONTRIBUTING.md to know more and follow these steps.

  1. Fork it ( https://github.com/nipunsadvilkar/pySBD/fork )
  2. Create your feature branch (
    git checkout -b my-new-feature
    )
  3. Commit your changes (
    git commit -am 'Add some feature'
    )
  4. Push to the branch (
    git push origin my-new-feature
    )
  5. Create a new Pull Request

Credit

This project wouldn't be possible without the great work done by Pragmatic Segmenter team.

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.