tantivy

by fulmicoton

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

3.8K Stars 223 Forks Last release: 5 months ago (0.12) MIT License 1.6K Commits 29 Releases

Tantivy

Tantivy is a full text search engine library written in Rust.

It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense that it is not an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine.

Tantivy is, in fact, strongly inspired by Lucene's design.

Benchmark

The following benchmark breaks down performance for different types of queries and collections.

In general, Tantivy tends to be

  • slower than Lucene on unions with a Top-K collector, due to Lucene's Block-WAND optimization.
  • faster than Lucene on intersection and phrase queries.

Your mileage WILL vary depending on the nature of queries and their load.

Features

  • Full-text search
  • Configurable tokenizer (stemming available for 17 Latin languages, with third-party support for Chinese (tantivy-jieba and cang-jie), Japanese (lindera and tantivy-tokenizer-tiny-segmente) and Korean (lindera + lindera-ko-dic-builder))
  • Fast (check out the :racehorse: :sparkles: benchmark :sparkles: :racehorse:)
  • Tiny startup time (<10ms), perfect for command line tools
  • BM25 scoring (the same as Lucene)
  • Natural query language (e.g. (michael AND jackson) OR "king of pop")
  • Phrase queries search (e.g. "michael jackson")
  • Incremental indexing
  • Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
  • Mmap directory
  • SIMD integer compression when the platform/CPU includes the SSE2 instruction set
  • Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
  • &[u8] fast fields
  • Text, i64, u64, f64, dates, and hierarchical facet fields
  • LZ4 compressed document store
  • Range queries
  • Faceted search
  • Configurable indexing (optional term frequency and position indexing)
  • Cheesy logo with a horse
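
To make the BM25 item above more concrete, here is a minimal sketch of the textbook BM25 formula for a single query term, using only the standard library. It is not Tantivy's or Lucene's actual implementation, and real engines may differ in details such as how field lengths are stored. The names Bm25Params, idf and bm25_term_score are hypothetical and chosen for illustration only; k1 = 1.2 and b = 0.75 are the commonly used defaults.

```rust
/// Tuning parameters of BM25 (hypothetical struct, not part of Tantivy's API).
/// k1 controls term-frequency saturation, b controls length normalization.
pub struct Bm25Params {
    pub k1: f32,
    pub b: f32,
}

/// Inverse document frequency of a term.
/// `doc_count` is the number of documents in the index,
/// `doc_freq` is the number of documents containing the term.
pub fn idf(doc_count: u64, doc_freq: u64) -> f32 {
    let n = doc_count as f32;
    let df = doc_freq as f32;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one term to the score of one document.
/// `term_freq` is how often the term occurs in the field,
/// `field_len` is the length of the field in tokens,
/// `avg_field_len` is the average field length over the index.
pub fn bm25_term_score(
    params: &Bm25Params,
    term_freq: u32,
    field_len: u32,
    avg_field_len: f32,
    idf: f32,
) -> f32 {
    let tf = term_freq as f32;
    // Longer-than-average fields get a larger normalization term,
    // which lowers their score for the same term frequency.
    let norm = params.k1 * (1.0 - params.b + params.b * field_len as f32 / avg_field_len);
    idf * tf * (params.k1 + 1.0) / (tf + norm)
}
```

The score of a multi-term query is the sum of these per-term contributions. Rare terms get a higher idf, repeated occurrences help with diminishing returns, and long documents are penalized relative to the average length.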

Non-features

  • Distributed search is out of the scope of Tantivy. That being said, Tantivy is a library upon which one could build a distributed search. Serializable/mergeable collector state, for instance, is within the scope of Tantivy.
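
As an illustration of why mergeable collector state matters for a distributed setup, the sketch below merges per-shard top-K results into a global top-K. It uses only the standard library and does not use Tantivy's collector API. The Hit struct and the merge_top_k function are hypothetical names, and the sketch assumes that scores from different shards are directly comparable.

```rust
use std::cmp::Ordering;

/// One search hit as reported by a shard (hypothetical type, not Tantivy's).
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub score: f32,
    pub shard: u32,
    pub doc_id: u32,
}

/// Merges the top hits of every shard and keeps the `k` best overall.
/// Hits are ordered by descending score; ties are broken by
/// (shard, doc_id) so that the result is deterministic.
pub fn merge_top_k(per_shard: Vec<Vec<Hit>>, k: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = per_shard.into_iter().flatten().collect();
    all.sort_by(|a, b| {
        b.score
            .partial_cmp(&a.score)
            .unwrap_or(Ordering::Equal)
            .then_with(|| (a.shard, a.doc_id).cmp(&(b.shard, b.doc_id)))
    });
    all.truncate(k);
    all
}
```

Each shard only needs to send its own top K hits, since no hit outside a shard's local top K can appear in the global top K.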

Getting started

Tantivy works on stable Rust (>= 1.27) and supports Linux, macOS, and Windows.

How can I support this project?

There are many ways to support this project.

  • Use Tantivy and tell us about your experience on Gitter or by email
  • Report bugs
  • Write a blog post
  • Help with documentation by asking questions or submitting PRs
  • Contribute code (you can join our Gitter)
  • Talk about Tantivy around you
  • Drop a word on Say Thanks! or even Become a patron

Contributing code

We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.

Clone and build locally

Tantivy compiles on stable Rust but requires Rust >= 1.27. To check out and build the project, you can simply run:

git clone https://github.com/tantivy-search/tantivy.git
cd tantivy
cargo build

Run tests

Some tests will not run with just cargo test because of fail-rs. To run the tests exhaustively, run ./run-tests.sh.

Debug

You might find it useful to step through the programme with a debugger.

A failing test

Make sure you haven't run cargo clean after the most recent cargo test or cargo build to guarantee that the target/ directory exists. Use this bash script to find the name of the most recent debug build of Tantivy and run it under rust-gdb:

find target/debug/ -maxdepth 1 -executable -type f -name "tantivy*" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r | cut -d " " -f 3 | xargs -I RECENT_DBG_TANTIVY rust-gdb RECENT_DBG_TANTIVY

Now that you are in rust-gdb, you can set breakpoints on lines and methods that match your source code and run the debug executable with flags that you normally pass to cargo test like this:

$gdb run --test-threads 1 --test $NAME_OF_TEST

An example

By default, rustc compiles everything in the examples/ directory in debug mode. This makes it easy for you to make examples to reproduce bugs:

rust-gdb target/debug/examples/$EXAMPLE_NAME
$ gdb run
