pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out of the box.
This project is a direct port of the Ruby gem Pragmatic Segmenter, which provides rule-based sentence boundary detection.
pip install pysbd
import pysbd

text = "My name is Jonas E. Smith. Please turn to p. 55."
seg = pysbd.Segmenter(language="en", clean=False)
print(seg.segment(text))
# ['My name is Jonas E. Smith.', 'Please turn to p. 55.']
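For contrast, the sketch below shows what a naive splitter does with the same text. It is not part of pySBD: `naive_split` is a hypothetical helper that breaks after any `.`, `!` or `?` followed by whitespace, which is roughly the behaviour a rule-based segmenter is meant to improve on.

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' followed by whitespace.

    Hypothetical helper for illustration only; not part of pySBD.
    """
    return re.split(r"(?<=[.!?])\s+", text.strip())


text = "My name is Jonas E. Smith. Please turn to p. 55."
print(naive_split(text))
# ['My name is Jonas E.', 'Smith.', 'Please turn to p.', '55.']
```

The naive approach treats the initial "E." and the abbreviation "p." as sentence ends and returns four fragments, whereas the pySBD output above keeps both sentences intact.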
Use pysbd as a spaCy pipeline component (recommended). Please refer to the example pysbd_as_spacy_component.py.
import spacy
from pysbd.utils import PySBDFactory

nlp = spacy.blank('en')

# explicitly adding component to pipeline
# (recommended - makes it more readable to tell what's going on)
nlp.add_pipe(PySBDFactory(nlp))

# or you can use it implicitly with keyword
# pysbd = nlp.create_pipe('pysbd')
# nlp.add_pipe(pysbd)

doc = nlp('My name is Jonas E. Smith. Please turn to p. 55.')
print(list(doc.sents))
# [My name is Jonas E. Smith., Please turn to p. 55.]
If you want to contribute a new feature or language support, or you have found text that pySBD segments incorrectly, please read CONTRIBUTING.md and follow these steps.
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
This project wouldn't be possible without the great work done by the Pragmatic Segmenter team.