🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro for Elasticsearch
Open Distro for Elasticsearch enables you to run nearest neighbor search on billions of documents across thousands of dimensions with the same ease as running any regular Elasticsearch query. You can use aggregations and filter clauses to further refine your similarity search operations. K-NN similarity search powers use cases such as product recommendations, fraud detection, image and video search, related document search, and more.
The README provides information for development of the k-NN plugin. To learn more about plugin usage, please see our documentation. Do not hesitate to create an issue if something is missing from the documentation!
settings.gradle file in the root of this package.
JAVA_HOME to point to a JDK 14 before running
The package uses the Gradle build system.
JAVA_HOME to point to a JDK >=14
The plugin relies on a JNI library to perform approximate k-NN search. For plugin installations from an archive (.zip), make sure that the .so file for Linux or the .jnilib file for Mac OS is present in the Java library path. You can do this either by copying the .so/.jnilib file to $ESHOME or by manually adding `-Djava.library.path=<pathtolibfiles>`.
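As a quick sanity check before starting Elasticsearch, the sketch below looks for a .so or .jnilib file in a directory and prints the matching JVM option. The function name `knn_library_option` is hypothetical and not part of the plugin; only the `-Djava.library.path` flag comes from the text above. It matches any file with those extensions, because the exact library file name is not assumed here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): prints the JVM option for a
# directory if it contains at least one .so or .jnilib file.
knn_library_option() {
    dir="$1"
    for f in "$dir"/*.so "$dir"/*.jnilib; do
        # An unmatched glob stays literal, so check that the file really exists.
        if [ -f "$f" ]; then
            printf '%s\n' "-Djava.library.path=$dir"
            return 0
        fi
    done
    echo "no .so or .jnilib file found in $dir" >&2
    return 1
}
```

Calling `knn_library_option /path/to/libs` prints the option on success and returns a non-zero status when no library file is found, so it can guard a startup script.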
To build the JNI Library, follow these steps:
cd jni
cmake .
make
The library will be placed in the
To build an RPM or DEB of the JNI library, follow these steps:
cd jni
cmake .
make package
The artifacts will be placed in the
We build and distribute binary library artifacts with Open Distro for Elasticsearch. We build the library binary, RPM and DEB in this GitHub action. We use CentOS 7 with g++ 4.8.5 to build the DEB, RPM and ZIP. Additionally, to provide as much general compatibility as possible, we compile the library without optimized instruction sets enabled. Users who want to get the most out of the library should follow this section and build the library from source in their production environment, so that the build can take advantage of any optimized instruction sets the environment supports.
It can be useful to test and debug on a multi-node cluster. To launch a 3-node cluster with the k-NN plugin installed, run the following command:
./gradlew run -PnumNodes=3
To run the integration tests with a 3-node cluster, run this command:
./gradlew :integTest -PnumNodes=3
Sometimes it is useful to attach a debugger to either the Elasticsearch cluster or the integration test runner to see what's going on. To debug unit tests, click Debug in the IDE's gutter. For the Elasticsearch cluster, first make sure that the debugger is listening on port
5005. Then, to debug the cluster code, run:
./gradlew :integTest -Dcluster.debug=1 # to start a cluster with debugger and run integ tests
./gradlew run --debug-jvm # to just start a cluster that can be debugged
The Elasticsearch server JVM will connect to a debugger attached to
localhost:5005 before starting. If there are multiple nodes, the servers will connect to debuggers listening on ports
5005, 5006, ...
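To see which debugger ports a multi-node run expects, the following sketch prints one port per node. It assumes, based on the sequence above, that ports are assigned consecutively starting at 5005. The function name `debug_ports` is hypothetical and not part of the build.

```shell
#!/bin/sh
# Hypothetical helper: prints the debugger port expected for each node,
# assuming ports are assigned consecutively starting at 5005.
debug_ports() {
    num_nodes="$1"
    base_port=5005
    i=0
    while [ "$i" -lt "$num_nodes" ]; do
        echo $((base_port + i))
        i=$((i + 1))
    done
}
```

For example, `debug_ports 3` lists the ports on which debuggers would need to be listening before a 3-node cluster is started in debug mode.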
To debug code running in an integration test (which exercises the server from a separate JVM), first set up a remote debugger listening on port
8000, and then run:
./gradlew :integTest -Dtest.debug=1
The test runner JVM will connect to a debugger attached to
localhost:8000 before running the tests.
Additionally, it is possible to attach one debugger to the cluster JVM and another debugger to the test runner. First, make sure one debugger is listening on port
5005 and the other is listening on port
8000. Then, run:
./gradlew :integTest -Dtest.debug=1 -Dcluster.debug=1
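The three debugging variants above differ only in which flags are passed to the same Gradle task. A minimal sketch of a wrapper that assembles the command is shown below. The function name `integ_test_command` and its yes/no arguments are hypothetical; the flags themselves are the ones listed above. The function prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical wrapper: assembles the integTest command from two yes/no
# arguments and prints it instead of running it.
integ_test_command() {
    debug_cluster="$1"   # "yes" to debug the cluster JVM (debugger on port 5005)
    debug_tests="$2"     # "yes" to debug the test runner (debugger on port 8000)
    cmd="./gradlew :integTest"
    if [ "$debug_tests" = "yes" ]; then
        cmd="$cmd -Dtest.debug=1"
    fi
    if [ "$debug_cluster" = "yes" ]; then
        cmd="$cmd -Dcluster.debug=1"
    fi
    echo "$cmd"
}
```

Printing the command keeps the sketch side-effect free; the output can be reviewed and then run once the required debuggers are listening.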
We appreciate and encourage contributions from the community. If you experience a bug or have a feature request, please create an issue for it. If you decide to make a contribution, please fill out the Pull Request template with as much detail as possible. Also, when creating a title for your Pull Request, please do not include a prefix such as
Bug Fix:. Instead, please use the corresponding tag to label the purpose of the Pull Request.
We'd like to get your comments! Please read the plugin RFC document and raise an issue to add your comments and questions.
This project uses the Apache 2.0-licensed Non-Metric Space Library. Thank you to Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak and all those who have contributed to that project!
This project has adopted an Open Source Code of Conduct.
If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our vulnerability reporting page. Please do not create a public GitHub issue.
See the LICENSE file for our project's licensing. We will ask you to confirm the licensing of your contribution.
Copyright 2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.