Need help with faiss_tips?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

About the developer

161 Stars 20 Forks MIT License 8 Commits 0 Opened issues


Some useful tips for faiss

Services available


Need anything else?

Contributors list

# 95,992
5 commits

Faiss tips

  • Some useful tips for faiss
  • We suppose faiss is installed via conda:
    conda install faiss-cpu -c pytorch
    conda install faiss-gpu -c pytorch

Nearest neighbor search (CPU)

The most basic nearest neighbor search by L2 distance. This is much faster than scipy. You should first try this, especially when the database is relatively small (N<10^6). The search is automatically paralellized. ```python import faiss import numpy as np

D = 128 N = 10000 X = np.random.random((N, D)).astype(np.float32) # inputs of faiss must be float32


index = faiss.IndexFlatL2(D) index.add(X)


topk = 4 dists, ids =[:3], k=topk) # Use the top three vectors for queyring print(type(dists), dists.dtype, dists.shape) # float32 (3, 4) print(type(ids), ids.dtype, ids.shape) # int64 (3, 4)

Show params

print("N:", index.ntotal) print("D:", index.d) ``

Note that
index.add` function copies the data. If you would like to reduce the overhead of the copy, see this wiki and gists.

Nearest neighbor search (GPU)

The nearest neighbor search on GPU(s). This returns the (almost) same result as that of CPU. This is extremely fast although there aren't any approximation steps. You should try this if the data fit into your GPU memory. ```python import faiss import numpy as np import os

D = 128 N = 10000 X = np.random.random((N, D)).astype(np.float32) # inputs of faiss must be float32

GPU config

gpuids = "0" # can be e.g. "3,4" for multiple GPUs os.environ['CUDAVISIBLEDEVICES'] = gpuids


cpuindex = faiss.IndexFlatL2(D) gpuindex = faiss.indexcputoallgpus(cpuindex) gpuindex.add(X)


topk = 4 dists, ids =[:3], k=topk) ```

Approximate nearest neighbor search (CPU)

There are several methods for ANN search. Currently, HNSW + IVFPQ achieves the best performance for billion-scale data in terms of the balance among memory, accuracy, and runtime. ```python import faiss import numpy as np

D = 128 N = 1000000 Xt = np.random.random((10000, D)).astype(np.float32) # 10000 vectors for training X = np.random.random((N, D)).astype(np.float32)

Param of PQ

M = 16 # The number of sub-vector. Typically this is 8, 16, 32, etc. nbits = 8 # bits per sub-vector. This is typically 8, so that each sub-vec is encoded by 1 byte

Param of IVF

nlist = 1000 # The number of cells (space partition). Typical value is sqrt(N)

Param of HNSW

hnsw_m = 32 # The number of neighbors for HNSW. This is typically 32


quantizer = faiss.IndexHNSWFlat(D, hnsw_m) index = faiss.IndexIVFPQ(quantizer, D, nlist, M, nbits)






index.nprobe = 8 # Runtime param. The number of cells that are visited for search. topk = 4 dists, ids =[:3], k=topk)

Show params

print("D:", index.d) print("N:", index.ntotal) print("M:", index.pq.M) print("nbits:", index.pq.nbits) print("nlist:", index.nlist) print("nprobe:", index.nprobe) ```

Note that you might need to set

as member variables if you'd like to use them inside your class, e.g.,
self.quantizer = faiss.IndexHNSWFlat(D, hnsw_m)
self.index = faiss.IndexIVFPQ(self.quantizer, D, nlist, M, nbits)


Faiss index can be read/write via util functions:

faiss.write_index(index, "index.bin")
index2 = faiss.read_index("index.bin")  # index2 is identical to index
Or, you can serialize the index into binary array (np.array). You can save/load it via numpy IO functions.
chunk = faiss.serialize_index(index)"index.npy", chunk)
index3 = faiss.deserialize_index(np.load("index.npy"))   # identical to index
You can even use pickle:
import pickle
with open("index.pkl", "wb") as f:
    pickle.dump(chunk, f)
with open("index.pkl", "rb") as f:
    index4 = faiss.deserialize_index(pickle.load(f))   # identical to index

k-means (CPU/GPU)

k-means in faiss is much faster than that in sklearn. See the benchmark. ```python import faiss import numpy as np

D = 128 N = 10000 K = 10 # The number of clusters X = np.random.random((N, D)).astype(np.float32)


kmeans = faiss.Kmeans(d=D, k=K, niter=20, verbose=True)

For GPU(s), run the following line. This will use all GPUs

kmeans = faiss.Kmeans(d=D, k=K, niter=20, verbose=True, gpu=True)

Run clustering


Error for each iteration

print(kmeans.obj) # array with 20 elements

Centroids after clustering

print(kmeans.centroids.shape) # (10, 128)

The assignment for each vector.

dists, ids =, 1) # Need to run NN search again print(ids.shape) # (10000, 1)


print("D:", kmeans.d) print("K:", kmeans.k) print("niter:", kmeans.cp.niter) ```

Get posting lists

Given an IVF type index and queries, you can check the nearest posting lists by calling the search function in a quantizer: ```python

Suppose index = faiss.indexIVF...

dists, ids =[:3], k=1) print(dists) print(ids) # e.g., ids[2] is the ID of the nearest posting list for the third query (X[2])

Note that you can enumerate all posting lists (both indices and codes) if you want:
python idposlists = []
poslists = []

codesz = index.invlists.codesize

Code size per vector in bytes. This equals to

- 4 * D (if IndexIVFFlat)

- M (if IndexIVFPQ with nbits=8)

print("codesz:", codesz)

for listno in range(index.nlist): listsz = index.invlists.listsize(listno) # The length of list_no-th posting list

# Fetch
id_poslist = np.array(faiss.rev_swig_ptr(index.invlists.get_ids(list_no), list_sz))
code_poslist = np.array(faiss.rev_swig_ptr(index.invlists.get_codes(list_no), list_sz * code_sz))

id_poslists.append(id_poslist) code_poslists.append(code_poslist)

print(len(idposlists)) # nlist print(len(idposlists[3])) # e.g., 463 print(idposlists[3].dtype) # int64 print(idposlists[3][:4]) # e.g., [ 309 1237 6101]

print(len(codeposlists)) # nlist print(len(codeposlists[3])) # e.g., 7408 (i.e., codesz=16) print(codeposlists[3].dtype) # uint8 print(code_poslists[3][:4 * code_sz]) # e.g., [ 36 66 242 ... 126] ```

Multiple/Single threads

Faiss usually uses multiple threads automatically via openmp for parallel computing. If you would like to use a single thread only, run this line in the beginning.

Or, you can explicitly set an environment variable in your terminal.

Hamming distance table

Given two sets of binary vectors, pairwise hamming distances can be computed as follows.

The code is from

def pairwise_hamming_dis(a, b):
    """ compute the pairwise Hamming distances between two matrices """
    na, d = a.shape
    nb, d2 = b.shape
    assert d == d2

dis = np.empty((na, nb), dtype='int32')

    faiss.swig_ptr(a), faiss.swig_ptr(b),
    na, nb, d,
return dis

Each vector must be the form of "uint8 * ncodes",

where ncodes % 8 == 0

xq = np.array([[0, 0, 0, 0, 0, 0, 0, 2], # [0, 0, ..., 1, 0]

Merge results

You can merge search results from several indices by ResultHeap

import faiss
import numpy as np

D = 128 N = 10000 Nq = 3 X = np.random.random((N, D)).astype(np.float32) Xq = np.random.random((Nq, D)).astype(np.float32)


index = faiss.IndexFlatL2(D) index.add(X)


topk = 10 dists, ids =, k=topk) print("dists:", dists) print("ids:", ids)

Setup with two indices

index1 = faiss.IndexFlatL2(D) index1.add(X[:2000]) # Store the first 2000 vectors index2 = faiss.IndexFlatL2(D) index2.add(X[2000:]) # Store the remaining

Search for both indices

dists1, ids1 =, k=topk) dists2, ids2 =, k=topk)

Merge results

result_heap = faiss.ResultHeap(nq=Nq, k=topk) result_heap.add_result(D=dists1, I=ids1) result_heap.add_result(D=dists2, I=ids2 + 2000) # 2000 is an offset result_heap.finalize() print("dists:", result_heap.D) print("ids:", result_heap.I)

assert np.array_equal(dists, result_heap.D) assert np.array_equal(ids, result_heap.I)


When you would like to directly handle std::vector in the c++ class, you can convert std::vector to np.array by

. Similarily, you can call
to convert np.array to std::vector. See python/ for more details.
import faiss
import numpy as np

D = 2 N = 3 X = np.random.random((N, D)).astype(np.float32) print(X)

[[0.8482132 0.17902061]

[0.07226888 0.15747449]

[0.41783017 0.9381101 ]]


index = faiss.IndexFlatL2(D) index.add(X)

Let's see std::vector inside the c++ class (IndexFlatL2)

We cannot directly read/update them by nparray.

print(type(index.xb)) # print(index.xb) # <faiss.swigfaiss.floatvector proxy of object type float> *' at 0x7fd62822a690> >

Convert std::vector to np.array

xb_np = faiss.vector_to_array(index.xb) # std::vector -> np.array print(type(xb_np)) # print(xb_np) # [0.8482132 0.17902061 0.07226888 0.15747449 0.41783017 0.9381101 ]

We can also convert np.array to std::vector

X2 = np.random.random((N, D)).astype(np.float32) # new np.array X2_std_vector = faiss.FloatVector() # Buffer. print(type(X2_std_vector)) # faiss.copy_array_to_vector(X2.reshape(-1), X2_std_vector) # np.array -> std::vector. Don't forget to flatten the np.array

The obtained std:vector can be used inside the c++ class

index.xb = X2_std_vector # You can set this in IndexFlatL2 (Be careful. This is dangerous)

print(, k=3)) # Works correctly

(array([[0. , 0.09678471, 0.5692644 ],

[0. , 0.09678471, 0.23682791],

[0. , 0.23682791, 0.5692644 ]], dtype=float32), array([[0, 1, 2],

[1, 0, 2],

[2, 1, 0]]))


We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.