natto

A Tasty Ruby Binding with MeCab

What is natto?

A gem leveraging FFI (foreign function interface), natto combines the Ruby programming language with MeCab, the part-of-speech and morphological analyzer for the Japanese language.

  • natto provides a naturally Ruby-esque interface to MeCab.
  • It runs on both CRuby (mri/yarv) and JRuby (jvm).
  • It works with MeCab installations on Windows, Unix/Linux and OS X.
  • No compiler is necessary, as natto is not a C extension.

You can learn more about natto at GitHub.


natto requires the following:

Installation on *nix and OS X

Install natto with the following gem command:

gem install natto

This will automatically install the ffi rubygem, which natto uses to bind to the MeCab library.

Installation on Windows

If you are using CRuby on Windows, however, you will first need to install the RubyInstaller Development Kit (DevKit), an MSYS/MinGW-based toolkit that enables your Windows Ruby installation to build many of the native C/C++ extensions available, including ffi.

  1. Download the latest release of RubyInstaller for Windows and the corresponding DevKit from the RubyInstaller for Windows downloads page.
  2. After installing RubyInstaller for Windows, double-click the DevKit-tdm installer and expand the contents to an appropriate location.
  3. Open a command window in that location, and execute:

    ruby dk.rb init

     This will locate all known Ruby installations and record them in the DevKit configuration file.
  4. Next, execute:

    ruby dk.rb install

     This will add the DevKit to all of the installed rubies listed in that configuration file. Now you should be able to install and build the ffi rubygem correctly on your Windows-installed Ruby.
  5. Install natto with:

    gem install natto
  6. If you are on 64-bit Windows and use a 64-bit Ruby or JRuby, then you might want to build a 64-bit version of libmecab.dll.

Automatic Configuration

No explicit configuration should be necessary, as natto will try to locate the MeCab library based upon its runtime environment.

  • On OS X and *nix, it will query mecab-config --libs
  • On Windows, it will query the Windows Registry to determine where libmecab.dll is installed

Explicit configuration via MECAB_PATH

If natto cannot find the MeCab library, an error will be raised. Please set the MECAB_PATH environment variable to the exact name/path of your MeCab library.
  • e.g., for OS X

    export MECAB_PATH=/usr/local/Cellar/mecab/0.996/lib/libmecab.dylib 
  • e.g., for bash on UNIX/Linux

    export MECAB_PATH=/usr/local/lib/
  • e.g., on Windows

    set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
  • e.g., from within a Ruby program
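
A minimal sketch of that last case, setting the variable from Ruby itself. The library path below is a hypothetical placeholder, and the sketch assumes the variable should be in place before natto is loaded, since that is when the MeCab library is looked up.

```ruby
# Hypothetical location of the MeCab shared library; replace it with the
# actual path on your system.
mecab_lib = '/usr/local/lib/libmecab.so'

# Assumption: MECAB_PATH should be set before natto is required.
ENV['MECAB_PATH'] = mecab_lib

begin
  require 'natto'
  puts "natto loaded, MECAB_PATH=#{ENV['MECAB_PATH']}"
rescue LoadError => e
  # Raised when the natto gem or the MeCab library cannot be loaded.
  warn "Could not load natto: #{e.message}"
end
```

Setting the variable this way only affects the current Ruby process and any child processes it starts.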



Here's a very quick guide to using natto.

Instantiate a reference to the MeCab library, and display some details:

require 'natto'

nm = Natto::MeCab.new
=> #<Natto::MeCab:0x00000803633ae8
     @model=#<FFI::Pointer address=0x000008035d4640>,
     @tagger=#<FFI::Pointer address=0x00000802b07c90>,
     @lattice=#<FFI::Pointer address=0x00000803602f80>,
     @dicts=[#<Natto::DictionaryInfo:0x000008036337c8 charset=utf8, type=0>]>

puts nm.version
=> 0.996

Display details about the system dictionary used by MeCab:

puts nm.libpath
=> /usr/local/lib/

sysdic = nm.dicts.first
puts sysdic.filepath
=> /usr/local/lib/mecab/dic/ipadic/sys.dic

puts sysdic.charset
=> utf8

Parse Japanese text and send the MeCab result as a single string to stdout:

puts nm.parse('俺の名前は星野豊だ!!そこんとこヨロシク!')
俺      名詞,代名詞,一般,*,*,*,俺,オレ,オレ
の      助詞,連体化,*,*,*,*,の,ノ,ノ
名前    名詞,一般,*,*,*,*,名前,ナマエ,ナマエ
は      助詞,係助詞,*,*,*,*,は,ハ,ワ
星野    名詞,固有名詞,人名,姓,*,*,星野,ホシノ,ホシノ
豊      名詞,固有名詞,人名,名,*,*,豊,ユタカ,ユタカ
だ      助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
!      記号,一般,*,*,*,*,!,!,!
!      記号,一般,*,*,*,*,!,!,!
そこ    名詞,代名詞,一般,*,*,*,そこ,ソコ,ソコ
ん      助詞,特殊,*,*,*,*,ん,ン,ン
とこ    名詞,一般,*,*,*,*,とこ,トコ,トコ
ヨロシク        感動詞,*,*,*,*,*,ヨロシク,ヨロシク,ヨロシク
!      記号,一般,*,*,*,*,!,!,!

If a block is passed to parse, you can iterate over the list of resulting MeCabNode instances to access more detailed information about each morpheme.

In this example, the following attributes and methods for MeCabNode are used:
  • surface - the morpheme surface
  • posid - node part-of-speech ID (dictionary-dependent)
  • is_eos? - is this MeCabNode an end-of-sentence node?

This iterates over the morpheme nodes in the given text, and outputs a formatted, tab-delimited line with the morpheme surface and part-of-speech ID, ignoring any end-of-sentence nodes:

nm.parse('世界チャンプ目指してんだなこれがっ!!夢なの、俺のっ!!') do |n|
  puts "#{n.surface}\tpart-of-speech id: #{n.posid}" if !n.is_eos?
end
世界    part-of-speech id: 38
チャンプ        part-of-speech id: 38
目指し  part-of-speech id: 31
て      part-of-speech id: 18
ん      part-of-speech id: 63
だ      part-of-speech id: 25
な      part-of-speech id: 17
これ    part-of-speech id: 59
がっ    part-of-speech id: 32
!!      part-of-speech id: 36
夢      part-of-speech id: 38
な      part-of-speech id: 25
の      part-of-speech id: 17
、      part-of-speech id: 9
俺      part-of-speech id: 59
のっ    part-of-speech id: 31
!!      part-of-speech id: 36
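
The same block form can also collect values instead of printing them. The sketch below gathers morpheme surfaces into an array. The helper name `surfaces` and the `FakeParser`/`FakeNode` stand-ins are hypothetical and exist only so the sketch runs without MeCab installed. They are not part of natto, and the stand-in simply splits on whitespace.

```ruby
# Collects the surface of every morpheme, skipping end-of-sentence nodes.
# `parser` is expected to behave like the nm object above: parse(text)
# yields nodes that respond to surface and is_eos?.
def surfaces(parser, text)
  result = []
  parser.parse(text) { |n| result << n.surface unless n.is_eos? }
  result
end

# Stand-in node and parser (NOT natto): splits on whitespace and then
# yields a fake end-of-sentence node, mimicking the block-style interface.
FakeNode = Struct.new(:surface, :eos) do
  def is_eos?
    eos
  end
end

class FakeParser
  def parse(text)
    text.split.each { |s| yield FakeNode.new(s, false) }
    yield FakeNode.new('', true)
  end
end

p surfaces(FakeParser.new, '俺 の 名前')
```

With natto available, the nm instance from the earlier examples would be passed in place of the stand-in parser.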

For more complex parsing, such as natural language processing tasks, it is far more efficient to use enum_parse to obtain an Enumerator over the resulting MeCabNode instances. An Enumerator yields each MeCabNode instance one at a time, without first materializing all instances at once.
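
To illustrate the one-at-a-time behavior, here is a plain-Ruby sketch that needs no MeCab. It uses a hand-built Enumerator over hypothetical, pre-split sample data and records when each item is actually generated. It demonstrates Ruby's Enumerator in general and is not natto's implementation.

```ruby
produced = []                      # records when each item is generated
tokens   = %w[この 星 の 一等 賞]  # hypothetical, pre-split sample data

enum = Enumerator.new do |yielder|
  tokens.each do |t|
    produced << t
    yielder << t
  end
end

first  = enum.next   # generates only the first item
second = enum.peek   # looks ahead to the next item without advancing
generated_so_far = produced.dup

enum.rewind          # back to the beginning
all = enum.to_a      # internal iteration visits every item

puts first
puts second
puts generated_so_far.size
puts all.size
```

Only the items actually requested are generated before the rewind, which is why an Enumerator suits long inputs where you may stop early.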

This example uses the -F node-format option to customize the resulting MeCabNode feature attribute to extract:
  • %m - morpheme surface
  • %f[0] - node part-of-speech
  • %f[7] - reading
Note that we can advance the Enumerator with next, look ahead with peek, rewind it back to the beginning, and then iterate over it.

nm = Natto::MeCab.new('-F%m\t%f[0]\t%f[7]')

enum = nm.enum_parse('この星の一等賞になりたいの卓球で俺は、そんだけ!')
=> #<Enumerator: ...>

enum.next
=> #<Natto::MeCabNode:0x000000032eed68
     @pointer=#<FFI::Pointer address=0x000000005ffb48>,
     @feature="この 連体詞 コノ">

enum.peek
=> #<Natto::MeCabNode:0x00000002fe2110a
     @pointer=#<FFI::Pointer address=0x000000005ffdb8>,
     @feature="星 名詞 ホシ">

enum.rewind

# again, ignore any end-of-sentence nodes
enum.each { |n| puts n.feature if !n.is_eos? }
この 連体詞 コノ
星 名詞 ホシ
の 助詞 ノ
一等 名詞 イットウ
賞 名詞 ショウ
に 助詞 ニ
なり 動詞 ナリ
たい 助動詞 タイ
の 助詞 ノ
卓球 名詞 タッキュウ
で 助詞 デ
俺 名詞 オレ
は 助詞 ハ
、 記号 、
そん 名詞 ソン
だけ 助詞 ダケ
! 記号 !

Partial parsing allows you to pass hints to MeCab on how to tokenize morphemes when parsing. Most useful are boundary constraint parsing and feature constraint parsing.

With boundary constraint parsing, you can specify either a Regexp or String to tell MeCab where the boundaries of a morpheme should be. Use the boundary_constraints keyword. For hints on tokenization, please see String#scan.
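
Since the text points to String#scan for hints, here is a small plain-Ruby sketch (no MeCab required) showing which substrings a pattern picks out of a sample sentence. It illustrates String#scan only. How natto applies the pattern internally is not shown here.

```ruby
text    = '心の中で3回唱え、 ヒーロー見参!ヒーロー見参!ヒーロー見参!'
pattern = /ヒーロー見参/

# Every non-overlapping match of the pattern, in order of appearance.
matches = text.scan(pattern)

puts matches.size
puts matches.uniq

# A String works as the pattern too.
same = text.scan('ヒーロー見参')
puts same == matches
```

Trying a pattern with String#scan first is a quick way to check that it captures exactly the spans you intend to treat as single morphemes.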

This example uses the -F node-format option to customize the resulting MeCabNode feature attribute to extract:
  • %m - morpheme surface
  • %f[0] - node part-of-speech
  • %s - node status value, 1 is unknown

Note that any such morphemes captured will have a node status of unknown. Also note that MeCab will tag such nodes as a noun.

nm = Natto::MeCab.new('-F%m,\s%f[0],\s%s')

text = '心の中で3回唱え、 ヒーロー見参!ヒーロー見参!ヒーロー見参!'
pattern = /ヒーロー見参/

nm.enum_parse(text, boundary_constraints: pattern).each do |n|
  puts n.feature if !(n.is_bos? || n.is_eos?)
end

# desired morpheme boundary specified with Regexp /ヒーロー見参/
心, 名詞, 0
の, 助詞, 0
中, 名詞, 0
で, 助詞, 0
3, 名詞, 1
回, 名詞, 0
唱え, 動詞, 0
、, 記号, 0
ヒーロー見参, 名詞, 1
!, 記号, 0
ヒーロー見参, 名詞, 1
!, 記号, 0
ヒーロー見参, 名詞, 1
!, 記号, 0

With feature constraint parsing, you can provide instructions to MeCab on what feature to use for a matching morpheme. Use the feature_constraints keyword to pass in a hash mapping a specific morpheme key (String) to a corresponding feature (String).

# we re-use nm and text from above

nm.options
=> {:node_format=>"%m,\s%f[0],\s%s"}

mapping = {"ヒーロー見参"=>"その他"}

nm.enum_parse(text, feature_constraints: mapping).each do |n|
  puts n.feature if !(n.is_bos? || n.is_eos?)
end

# ヒーロー見参 will be treated as a single morpheme mapping to その他
心, 名詞, 0
の, 助詞, 0
中, 名詞, 0
で, 助詞, 0
3, 名詞, 1
回, 名詞, 0
唱え, 動詞, 0
、, 記号, 0
ヒーロー見参, その他, 1
!, 記号, 0
ヒーロー見参, その他, 1
!, 記号, 0
ヒーロー見参, その他, 1
!, 記号, 0

Learn more

Contributing to natto

  • Use git and check out the latest code at GitHub to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
  • Browse the issue tracker to make sure someone hasn't already requested it and/or contributed it.
  • Fork the project.
  • Start a feature/bugfix branch.
  • Commit and push until you are happy with your contribution.
  • Make sure to add tests for it. This is important so I don't break it in a future version unintentionally. I use MiniTest::Unit as it is very natural and easy-to-use.
  • Please try not to mess with the Rakefile, CHANGELOG, or version. If you must have your own version, that is fine, but please isolate to its own commit so I can cherry-pick around it.


Please see the CHANGELOG for this gem's release history.


Copyright © 2020, Brooke M. Fujita. All rights reserved. Please see the LICENSE file for further details.
