parse-english

by wooorm

English (natural language) parser

English language parser for retext producing nlcst nodes.

Install

npm:

npm install parse-english

Use

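var inspect = require('unist-util-inspect')
var English = require('parse-english')

// Parse a string into an nlcst tree (RootNode > ParagraphNode > SentenceNode > ...).
var tree = new English().parse('Mr. Henry Brown: A hapless but friendly City of London worker.')

// Print the tree with node types, values, and positions.
console.log(inspect(tree))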

Yields:

RootNode[1] (1:1-1:63, 0-62)
└─ ParagraphNode[1] (1:1-1:63, 0-62)
   └─ SentenceNode[23] (1:1-1:63, 0-62)
      ├─ WordNode[2] (1:1-1:4, 0-3)
      │  ├─ TextNode: "Mr" (1:1-1:3, 0-2)
      │  └─ PunctuationNode: "." (1:3-1:4, 2-3)
      ├─ WhiteSpaceNode: " " (1:4-1:5, 3-4)
      ├─ WordNode[1] (1:5-1:10, 4-9)
      │  └─ TextNode: "Henry" (1:5-1:10, 4-9)
      ├─ WhiteSpaceNode: " " (1:10-1:11, 9-10)
      ├─ WordNode[1] (1:11-1:16, 10-15)
      │  └─ TextNode: "Brown" (1:11-1:16, 10-15)
      ├─ PunctuationNode: ":" (1:16-1:17, 15-16)
      ├─ WhiteSpaceNode: " " (1:17-1:18, 16-17)
      ├─ WordNode[1] (1:18-1:19, 17-18)
      │  └─ TextNode: "A" (1:18-1:19, 17-18)
      ├─ WhiteSpaceNode: " " (1:19-1:20, 18-19)
      ├─ WordNode[1] (1:20-1:27, 19-26)
      │  └─ TextNode: "hapless" (1:20-1:27, 19-26)
      ├─ WhiteSpaceNode: " " (1:27-1:28, 26-27)
      ├─ WordNode[1] (1:28-1:31, 27-30)
      │  └─ TextNode: "but" (1:28-1:31, 27-30)
      ├─ WhiteSpaceNode: " " (1:31-1:32, 30-31)
      ├─ WordNode[1] (1:32-1:40, 31-39)
      │  └─ TextNode: "friendly" (1:32-1:40, 31-39)
      ├─ WhiteSpaceNode: " " (1:40-1:41, 39-40)
      ├─ WordNode[1] (1:41-1:45, 40-44)
      │  └─ TextNode: "City" (1:41-1:45, 40-44)
      ├─ WhiteSpaceNode: " " (1:45-1:46, 44-45)
      ├─ WordNode[1] (1:46-1:48, 45-47)
      │  └─ TextNode: "of" (1:46-1:48, 45-47)
      ├─ WhiteSpaceNode: " " (1:48-1:49, 47-48)
      ├─ WordNode[1] (1:49-1:55, 48-54)
      │  └─ TextNode: "London" (1:49-1:55, 48-54)
      ├─ WhiteSpaceNode: " " (1:55-1:56, 54-55)
      ├─ WordNode[1] (1:56-1:62, 55-61)
      │  └─ TextNode: "worker" (1:56-1:62, 55-61)
      └─ PunctuationNode: "." (1:62-1:63, 61-62)
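The tree printed above can be consumed with ordinary recursion. The sketch below assumes only the unist/nlcst shape visible in that output: parent nodes have `children`, leaf nodes have `value`, and nodes may carry a `position` with `line`, `column`, and `offset` for `start` and `end`. The helper names `toText`, `collectWords`, and `formatPosition` are hypothetical and are not part of parse-english; the input is a hand-built fragment mirroring the first words of the output.

```javascript
// Sketch: reading an nlcst tree like the one printed above.
// Helper names are hypothetical, not part of parse-english.

// Concatenate the values of all leaf nodes, in document order.
function toText(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(toText).join('')
}

// Collect the text of every WordNode in the tree.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(toText(node))
  } else if (node.children) {
    for (const child of node.children) collectWords(child, words)
  }
  return words
}

// Format a position as the inspected output shows it:
// "startLine:startColumn-endLine:endColumn, startOffset-endOffset".
function formatPosition(position) {
  const {start, end} = position
  return `${start.line}:${start.column}-${end.line}:${end.column}, ${start.offset}-${end.offset}`
}

// Hand-built fragment mirroring "Mr. Henry" from the output above.
const mr = {
  type: 'WordNode',
  children: [
    {type: 'TextNode', value: 'Mr'},
    {type: 'PunctuationNode', value: '.'}
  ],
  position: {
    start: {line: 1, column: 1, offset: 0},
    end: {line: 1, column: 4, offset: 3}
  }
}

const sentence = {
  type: 'SentenceNode',
  children: [
    mr,
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Henry'}]}
  ]
}

console.log(collectWords(sentence)) // [ 'Mr.', 'Henry' ]
console.log(toText(sentence)) // Mr. Henry
console.log(formatPosition(mr.position)) // 1:1-1:4, 0-3
```

Passed the `tree` from the Use example instead, `collectWords` should list each word once, with "Mr." kept whole because its period is a PunctuationNode inside the WordNode rather than a sentence boundary.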

API

parse-english has the same API as parse-latin.

Algorithm

All of parse-latin is included, plus support for the following features of the English natural language:

  • Unit abbreviations (tsp., tbsp., oz., ft., and more)
  • Time references (sec., min., tues., thu., feb., and more)
  • Business abbreviations (Inc. and Ltd.)
  • Social titles (Mr., Mmes., Sr., and more)
  • Rank and academic titles (Dr., Rep., Gen., Prof., Pres., and more)
  • Geographical abbreviations (Ave., Blvd., Ft., Hwy., and more)
  • American state abbreviations (Ala., Minn., La., Tex., and more)
  • Canadian province abbreviations (Alta., Qué., Yuk., and more)
  • English county abbreviations (Beds., Leics., Shrops., and more)
  • Common elision (omission of letters) (’n’, ’o, ’em, ’twas, ’80s, and more)
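Why abbreviation support matters can be illustrated with a much simpler splitter: a period normally ends a sentence, unless it belongs to a known abbreviation such as "Mr." or "Inc.". This is a hypothetical sketch, not parse-english's implementation. It splits on whitespace, uses a small hand-picked sample from the categories above (`ABBREVIATIONS` is an assumed name), and ignores elisions and the nlcst output format entirely.

```javascript
// Simplified illustration only: NOT parse-english's actual algorithm.
// Hypothetical sample of abbreviations, stored lowercase without the period.
const ABBREVIATIONS = new Set([
  'mr', 'dr', 'prof', 'inc', 'ltd', 'ave', 'blvd', 'tsp', 'oz', 'feb', 'tex'
])

// Split text into sentences, but do not break after a known abbreviation.
function splitSentences(text) {
  const sentences = []
  let current = ''
  // Keep the whitespace tokens so sentences are rebuilt with original spacing.
  const tokens = text.split(/(\s+)/)
  for (const token of tokens) {
    current += token
    // Only tokens ending in terminal punctuation can close a sentence.
    if (!/[.!?]$/.test(token)) continue
    // Strip the terminal punctuation and compare case-insensitively.
    const bare = token.replace(/[.!?]+$/, '').toLowerCase()
    if (token.endsWith('.') && ABBREVIATIONS.has(bare)) continue
    sentences.push(current.trim())
    current = ''
  }
  // Keep any trailing text that lacks terminal punctuation.
  if (current.trim()) sentences.push(current.trim())
  return sentences
}

const sentences = splitSentences('Mr. Brown works at Acme Inc. on Main Ave. today. He is friendly.')
console.log(sentences.length) // 2
console.log(sentences[0]) // Mr. Brown works at Acme Inc. on Main Ave. today.
```

A flat lookup like this cannot tell when an abbreviation also ends the sentence (for example, a sentence whose last word is "Inc."), so this sketch would miss that boundary; handling such cases needs more context than a word list.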

License

MIT © Titus Wormer
