
MIT License


A Python implementation of the BM25 ranking function.
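For reference, the standard Okapi BM25 score of a document D for a query Q is shown below. It uses the natural logarithm and assumes no relevance information. The README does not say which BM25 variant the code implements, or which values of k_1 and b it uses, so treat this exact form as an assumption. In the formula, f(q_i, D) is the frequency of query term q_i in D, n(q_i) is the number of documents containing q_i, N is the number of documents in the collection, |D| is the length of D, and avgdl is the average document length.

```latex
\mathrm{score}(D, Q) = \sum_{q_i \in Q}
  \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
  \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
```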



The program has four main modules: the parser, the query processor, the ranking function, and the data structures. The parser reads the query file and the corpus file and produces a list of queries and a dictionary of documents, respectively. The query processor takes each query in the list and scores the documents against that query's terms. The ranking function implements BM25 and uses the natural logarithm in its calculations.

The data structures module contains an inverted index and a document length table. The inverted index is a dictionary that maps each word to a second dictionary, and this inner dictionary maps each document id to the number of times the word appears in that document. The document length table stores the length of each document. It also provides a function that calculates the average document length across the collection.
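The following is a minimal sketch of how these modules could fit together. It assumes the parser returns the corpus as a dictionary that maps each document id to its list of tokens, and returns each query as a list of tokens. The names `InvertedIndex`, `DocumentLengthTable`, `build_data_structures`, `score_bm25`, and `run_query` are hypothetical. The parameter values k1 = 1.2 and b = 0.75 are also hypothetical: they are common defaults, not values taken from this project. The IDF term is the classic Okapi form without relevance information, which may not match the project's ranking function exactly.

```python
import math
from collections import defaultdict

# Assumed parameter values (common BM25 defaults), not taken from this project.
K1 = 1.2
B = 0.75


class InvertedIndex:
    """Hypothetical inverted index: word -> {doc_id: frequency of word in doc_id}."""

    def __init__(self):
        self.index = defaultdict(lambda: defaultdict(int))

    def add(self, word, doc_id):
        self.index[word][doc_id] += 1

    def postings(self, word):
        # dict.get does not call the defaultdict factory, so unseen words are not inserted.
        return self.index.get(word, {})


class DocumentLengthTable:
    """Hypothetical table: doc_id -> number of tokens in that document."""

    def __init__(self):
        self.table = {}

    def add(self, doc_id, length):
        self.table[doc_id] = length

    def get_length(self, doc_id):
        return self.table[doc_id]

    def get_average_length(self):
        return sum(self.table.values()) / len(self.table)


def build_data_structures(corpus):
    """Build both structures from an assumed {doc_id: [token, ...]} corpus."""
    idx, dlt = InvertedIndex(), DocumentLengthTable()
    for doc_id, words in corpus.items():
        for word in words:
            idx.add(word, doc_id)
        dlt.add(doc_id, len(words))
    return idx, dlt


def score_bm25(n, f, N, dl, avdl, k1=K1, b=B):
    """BM25 weight of one query term in one document.

    n: documents containing the term, f: term frequency in the document,
    N: number of documents, dl: document length, avdl: average document length.
    """
    idf = math.log((N - n + 0.5) / (n + 0.5))  # natural logarithm
    tf = f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avdl))
    return idf * tf


def run_query(query, idx, dlt):
    """Score every document containing at least one query term, best first."""
    N = len(dlt.table)
    avdl = dlt.get_average_length()
    scores = defaultdict(float)
    for term in query:
        postings = idx.postings(term)
        for doc_id, freq in postings.items():
            scores[doc_id] += score_bm25(len(postings), freq, N, dlt.get_length(doc_id), avdl)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    corpus = {
        "d1": ["bm25", "ranking", "function"],
        "d2": ["python", "code"],
        "d3": ["pasta", "recipes"],
        "d4": ["bm25", "bm25", "python"],
        "d5": ["food", "pasta"],
    }
    idx, dlt = build_data_structures(corpus)
    # d4 (two occurrences of "bm25") ranks above d1 (one occurrence).
    print(run_query(["bm25"], idx, dlt))
```

With this IDF form, a term that appears in more than half of the documents receives a negative weight. Some implementations, such as Lucene's, add 1 inside the logarithm to keep the weight non-negative.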

How To Run

To run the program, execute the following from the src folder:

$ python
