spaCy 3.0 support is coming soon.
spaCy pipeline object for negating concepts in text. Based on the NegEx algorithm.
NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Chapman, Bridewell, Hanbury, Cooper, Buchanan. https://doi.org/10.1006/jbin.2001.1029
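The core NegEx idea is to look for negation trigger phrases near a finding and to end a trigger's scope at termination terms such as "but". The sketch below is a heavily simplified, pure-Python illustration of that idea, not negspacy's implementation. It assumes lowercase whitespace-split tokens, single-token triggers, and preceding negations only. The trigger lists, the `is_negated` helper, and the five-token look-back window are hypothetical choices made for this example.

```python
# Simplified NegEx-style check (illustrative only, not negspacy's implementation).
# Assumptions: lowercase whitespace-split tokens, single-token triggers,
# preceding negations only, and a fixed look-back window.

PRECEDING = {"no", "not", "denies", "without"}  # hypothetical trigger list
TERMINATION = {"but", "however"}                # hypothetical termination list


def is_negated(tokens, ent_start, window=5):
    """Return True if a preceding trigger appears within `window` tokens
    before the entity starting at `ent_start`, with no termination term
    between the trigger and the entity."""
    # Walk backwards from the token just before the entity.
    for i in range(ent_start - 1, max(ent_start - 1 - window, -1), -1):
        token = tokens[i]
        if token in TERMINATION:
            return False  # a termination term closes any earlier trigger's scope
        if token in PRECEDING:
            return True
    return False


tokens = "she does not like steve jobs but likes apple products".split()
print(is_negated(tokens, 4), is_negated(tokens, 8))  # True False
```

Here `steve` (index 4) is negated because "not" precedes it, while `apple` (index 8) is not, because "but" ends the scope of "not" before the walk reaches it.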
Install the library.

```bash
pip install negspacy
```
Import the library and spaCy.

```python
import spacy
from negspacy.negation import Negex
```
Load the spaCy language model and add the negspacy pipeline object. Filtering on entity types is optional.

```python
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON", "ORG"])
nlp.add_pipe(negex, last=True)
```
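Filtering by entity type can be pictured as choosing which entities receive a negation check at all. The snippet below models entities as plain `(text, label)` tuples; `entities_to_check` is a hypothetical helper, and treating an empty filter as "check every entity" is an assumption made for illustration rather than documented negspacy behavior.

```python
# Illustrative entity-type filter (hypothetical helper, not negspacy's code).
from typing import List, Optional, Tuple

Entity = Tuple[str, str]  # (entity text, entity label)


def entities_to_check(ents: List[Entity], ent_types: Optional[List[str]] = None) -> List[Entity]:
    """Keep entities whose label is in `ent_types`.

    Assumption for this sketch: an empty or missing filter keeps every entity.
    """
    if not ent_types:
        return list(ents)
    allowed = set(ent_types)
    return [ent for ent in ents if ent[1] in allowed]


ents = [("Steve Jobs", "PERSON"), ("Apple", "ORG"), ("Monday", "DATE")]
print(entities_to_check(ents, ["PERSON", "ORG"]))
```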
View negations.

```python
doc = nlp("She does not like Steve Jobs but likes Apple products.")

for e in doc.ents:
    print(e.text, e._.negex)
```

```
Steve Jobs True
Apple False
```
Consider pairing with scispacy to find UMLS concepts in text and process negations.
Designate the termset to use; `en_clinical` is used by default.

```python
negex = Negex(nlp, language="en_clinical")
```
Available termsets:

- `en`: phrases for general English language text
- `en_clinical` **DEFAULT**: adds phrases specific to the clinical domain to general English
- `en_clinical_sensitive`: adds additional phrases to help rule out historical and possibly irrelevant entities
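The termset descriptions suggest a layering: `en_clinical` builds on `en`, and `en_clinical_sensitive` appears to add further phrases on top of that. The sketch below models this layering with plain dictionaries. The phrase lists and the `extend` helper are hypothetical stand-ins, not the termsets shipped with negspacy, and stacking `en_clinical_sensitive` on `en_clinical` is an assumption.

```python
# Toy model of layered termsets (hypothetical phrases, not negspacy's real lists).
EN = {
    "preceding_negations": ["no", "not"],
    "termination": ["but", "however"],
}
CLINICAL_EXTRA = {"preceding_negations": ["denies", "negative for"]}
# Assumption: the sensitive set stacks on top of the clinical set.
SENSITIVE_EXTRA = {"preceding_negations": ["history of"]}


def extend(base, extra):
    """Return a new termset: copies of every category in `base`,
    with the phrases from `extra` appended per category."""
    merged = {category: list(phrases) for category, phrases in base.items()}
    for category, phrases in extra.items():
        merged.setdefault(category, []).extend(phrases)
    return merged


EN_CLINICAL = extend(EN, CLINICAL_EXTRA)
EN_CLINICAL_SENSITIVE = extend(EN_CLINICAL, SENSITIVE_EXTRA)
print(EN_CLINICAL_SENSITIVE["preceding_negations"])
```

Copying the lists in `extend` keeps each layer independent, so building the sensitive set never mutates the general or clinical sets.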
Replace all patterns with your own set:

```python
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, termination=["but", "however", "nevertheless", "except"])
```
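The example passes only `termination`. A reasonable reading, and the assumption behind this sketch, is that each pattern list you pass replaces the default for its own category while the other categories keep their defaults. `DEFAULTS` and `resolve_patterns` are hypothetical names with made-up phrases, used only to model that behavior.

```python
# Model of per-category overrides (hypothetical defaults, not negspacy's termsets).
# Assumption: a non-empty list replaces that category's default; an omitted
# or empty list keeps the default.
DEFAULTS = {
    "pseudo_negations": ["no further"],
    "preceding_negations": ["no", "not"],
    "following_negations": ["unlikely"],
    "termination": ["but"],
}


def resolve_patterns(defaults, **overrides):
    """Return a fresh dict of pattern lists, preferring caller-supplied lists."""
    return {
        category: list(overrides.get(category) or phrases)
        for category, phrases in defaults.items()
    }


patterns = resolve_patterns(
    DEFAULTS, termination=["but", "however", "nevertheless", "except"]
)
print(patterns["termination"])
print(patterns["preceding_negations"])
```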
Add and remove individual patterns on the fly:

```python
negex.add_patterns(
    pseudo_negations=["my favorite pattern"],
    termination=["these are", "great patterns"],
    preceding_negations=["more patterns"],
    following_negations=["even more patterns"],
)
negex.remove_patterns(
    pseudo_negations=["my favorite pattern"],
    termination=["these are", "great patterns"],
    preceding_negations="denied",
    following_negations=["unlikely"],
)
```

Note: a list is always required when adding patterns, however many there are. When removing, a list is only required for multiple patterns; a single pattern can be passed as a string.
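The list-versus-string rule in the note can be modeled with a small registry. `PatternStore` is a hypothetical stand-in written for this example, not negspacy's `Negex` class; it only mirrors the call shapes shown above, including a `get_patterns()` method that returns a dictionary of pattern lists keyed by category. Its starting patterns (`"denied"`, `"unlikely"`) are assumptions chosen so the removal calls above succeed.

```python
# Hypothetical pattern registry mirroring the add/remove call shapes above.
from typing import Dict, List, Union

Phrases = Union[str, List[str]]


class PatternStore:
    """Illustrative stand-in for a negation pattern registry (not negspacy's class)."""

    def __init__(self) -> None:
        # Assumed starting patterns, so that the removals below have targets.
        self.patterns: Dict[str, List[str]] = {
            "pseudo_negations": [],
            "termination": [],
            "preceding_negations": ["denied"],
            "following_negations": ["unlikely"],
        }

    def add_patterns(self, **kwargs: List[str]) -> None:
        for category, phrases in kwargs.items():
            # Adding always requires a list, even for a single pattern.
            if not isinstance(phrases, list):
                raise TypeError(f"{category}: a list is required when adding patterns")
            self.patterns[category].extend(phrases)

    def remove_patterns(self, **kwargs: Phrases) -> None:
        for category, phrases in kwargs.items():
            # A bare string is treated as one pattern; a list as several.
            items = [phrases] if isinstance(phrases, str) else phrases
            for phrase in items:
                self.patterns[category].remove(phrase)

    def get_patterns(self) -> Dict[str, List[str]]:
        # Return copies so callers cannot mutate the store by accident.
        return {category: list(p) for category, p in self.patterns.items()}


store = PatternStore()
store.add_patterns(
    pseudo_negations=["my favorite pattern"],
    termination=["these are", "great patterns"],
    preceding_negations=["more patterns"],
    following_negations=["even more patterns"],
)
store.remove_patterns(
    pseudo_negations=["my favorite pattern"],
    termination=["these are", "great patterns"],
    preceding_negations="denied",
    following_negations=["unlikely"],
)
print(store.get_patterns())
```

After the same add and remove calls as above, only `"more patterns"` and `"even more patterns"` remain in the store.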
View patterns in use:

```python
patterns_dict = negex.get_patterns()
```
Depending on the Named Entity Recognition model you are using, you may have negations "chunked together" with nouns. For example, when using scispacy:

```python
nlp = spacy.load("en_core_sci_sm")
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text)
```

```
no headache
```
This would cause the Negex algorithm to miss the preceding negation. To account for this, you can add a `chunk_prefix`:

```python
nlp = spacy.load("en_core_sci_sm")
negex = Negex(nlp, language="en_clinical", chunk_prefix=["no"])
nlp.add_pipe(negex)
doc = nlp("There is no headache.")
for e in doc.ents:
    print(e.text, e._.negex)
```

```
no headache True
```
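Conceptually, a chunk prefix lets an entity whose own text begins with a negation word count as negated, even though no trigger precedes it. The helper below is a hypothetical, string-based sketch of that check; matching the prefix as a whole word and ignoring case are assumptions for this example, not a description of how negspacy implements `chunk_prefix`.

```python
# Hypothetical chunk-prefix check (illustrative, not negspacy's implementation).
from typing import Iterable


def starts_with_chunk_prefix(ent_text: str, chunk_prefix: Iterable[str]) -> bool:
    """Return True if `ent_text` begins with one of the prefixes as a whole word.

    Assumptions: comparison is case-insensitive, and the prefix must be
    followed by a space or end the entity text.
    """
    lowered = ent_text.lower()
    for prefix in chunk_prefix:
        p = prefix.lower()
        if lowered == p or lowered.startswith(p + " "):
            return True
    return False


print(starts_with_chunk_prefix("no headache", ["no"]))
print(starts_with_chunk_prefix("nodule", ["no"]))
```

The whole-word rule is what keeps an entity like "nodule" from being treated as negated just because it starts with the letters "no".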
This library is featured in the spaCy Universe. Check it out for other useful libraries and inspiration.
If you're looking for a spaCy pipeline object to extract values that correspond to a named entity (e.g., birth dates, account numbers, or laboratory results), take a look at extractacy.