Need help with bertsearch?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

About the developer

Hironsan
749 Stars 171 Forks MIT License 28 Commits 1 Opened issues

Description

Elasticsearch with BERT for advanced document search.

Services available

!
?

Need anything else?

Contributors list

# 1,944
Python
OpenCV
embeddi...
Keras
20 commits
# 194,806
Python
elastic...
Shell
HTML
2 commits
# 249,174
Python
elastic...
Shell
HTML
1 commit

Elasticsearch meets BERT

Below is a job search example:

An example of bertsearch

System architecture

System architecture

Requirements

  • Docker
  • Docker Compose >= 1.22.0

Getting Started

1. Download a pretrained BERT model

List of released pretrained BERT models (click to expand...)
BERT-Base, Uncased 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Large, Uncased 24-layer, 1024-hidden, 16-heads, 340M parameters
BERT-Base, Cased 12-layer, 768-hidden, 12-heads , 110M parameters
BERT-Large, Cased 24-layer, 1024-hidden, 16-heads, 340M parameters
BERT-Base, Multilingual Cased (New) 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Base, Multilingual Cased (Old) 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
BERT-Base, Chinese Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
$ wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip
$ unzip cased_L-12_H-768_A-12.zip

2. Set environment variables

You need to set a pretrained BERT model and Elasticsearch's index name as environment variables:

$ export PATH_MODEL=./cased_L-12_H-768_A-12
$ export INDEX_NAME=jobsearch

3. Run Docker containers

$ docker-compose up

CAUTION: If possible, assign high memory(more than

8GB
) to Docker's memory configuration because BERT container needs high memory.

4. Create index

You can use the create index API to add a new index to an Elasticsearch cluster. When creating an index, you can specify the following:

  • Settings for the index
  • Mappings for fields in the index
  • Index aliases

For example, if you want to create

jobsearch
index with
title
,
text
and
text_vector
fields, you can create the index by the following command:
$ python example/create_index.py --index_file=example/index.json --index_name=jobsearch
# index.json
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "dynamic": "true",
    "_source": {
      "enabled": "true"
    },
    "properties": {
      "title": {
        "type": "text"
      },
      "text": {
        "type": "text"
      },
      "text_vector": {
        "type": "dense_vector",
        "dims": 768
      }
    }
  }
}

CAUTION: The

dims
value of
text_vector
must need to match the dims of a pretrained BERT model.

5. Create documents

Once you created an index, you’re ready to index some document. The point here is to convert your document into a vector using BERT. The resulting vector is stored in the

text_vector
field. Let`s convert your data into a JSON document:
$ python example/create_documents.py --data=example/example.csv --index_name=jobsearch
# example/example.csv
"Title","Description"
"Saleswoman","lorem ipsum"
"Software Developer","lorem ipsum"
"Chief Financial Officer","lorem ipsum"
"General Manager","lorem ipsum"
"Network Administrator","lorem ipsum"

After finishing the script, you can get a JSON document like follows:

# documents.jsonl
{"_op_type": "index", "_index": "jobsearch", "text": "lorem ipsum", "title": "Saleswoman", "text_vector": [...]}
{"_op_type": "index", "_index": "jobsearch", "text": "lorem ipsum", "title": "Software Developer", "text_vector": [...]}
{"_op_type": "index", "_index": "jobsearch", "text": "lorem ipsum", "title": "Chief Financial Officer", "text_vector": [...]}
...

6. Index documents

After converting your data into a JSON, you can adds a JSON document to the specified index and makes it searchable.

$ python example/index_documents.py

7. Open browser

Go to http://127.0.0.1:5000.

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.