EmbedKGQA

This is the code for our ACL 2020 paper, "Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings".

Instructions

To run the code, first download data.zip and pretrained_model.zip from https://drive.google.com/drive/folders/1RlqGBMo45lTmWz9MUPTq-0KcjSd3ujxc?usp=sharing and unzip both files in the main directory.
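The unzip step can also be scripted. The following is a minimal sketch, assuming both archives have already been downloaded into the main directory; the helper name `extract_archives` is hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_archives(root=".", names=("data.zip", "pretrained_model.zip")):
    """Extract each archive found under `root` into `root` itself.

    Returns the names of archives that were not found, so the caller can
    tell which downloads are still missing.
    """
    root = Path(root)
    missing = []
    for name in names:
        archive = root / name
        if not archive.is_file():
            missing.append(name)
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
    return missing


if __name__ == "__main__":
    not_found = extract_archives()
    if not_found:
        print("Download these files first:", ", ".join(not_found))
```

Run it from the main directory; it only reports missing archives and never downloads anything itself.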

MetaQA

Change to the directory ./KGQA/LSTM. The following is an example command to run the QA training code:

python3 main.py --mode train --relation_dim 200 --hidden_dim 256 \
--gpu 2 --freeze 0 --batch_size 128 --validate_every 5 --hops 2 --lr 0.0005 --entdrop 0.1 --reldrop 0.2 --scoredrop 0.2 \
--decay 1.0 --model ComplEx --patience 5 --ls 0.0 --kg_type half
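
For readers unfamiliar with the `--model ComplEx` option: ComplEx scores a triple (h, r, t) as the real part of the sum, over embedding dimensions, of e_h * w_r * conj(e_t), where all embeddings are complex-valued. The sketch below illustrates that scoring function in plain Python. It is only an illustration: the function name `complex_score` and the toy vectors are made up, and the repository's own implementation may be organized differently.

```python
def complex_score(head, relation, tail):
    """ComplEx score: Re(sum_k head_k * relation_k * conj(tail_k))."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("all embeddings must have the same dimension")
    return sum(
        (h * r * t.conjugate()).real
        for h, r, t in zip(head, relation, tail)
    )


# Toy 2-dimensional complex embeddings (illustrative values only).
head = [1 + 1j, 0.5 - 1j]
relation = [0 + 1j, 1 + 0j]
tail = [1 - 1j, 0.5 + 1j]
score = complex_score(head, relation, tail)
print(score)
```

Because the tail embedding is conjugated, swapping head and tail generally changes the score, which lets the model represent asymmetric relations.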

WebQuestionsSP

Change to the directory ./KGQA/RoBERTa. The following is an example command to run the QA training code:

python3 main.py --mode train --relation_dim 200 --do_batch_norm 0 \
--gpu 2 --freeze 1 --batch_size 16 --validate_every 10 --hops webqsp_half --lr 0.00002 --entdrop 0.0 --reldrop 0.0 --scoredrop 0.0 \
--decay 1.0 --model ComplEx --patience 20 --ls 0.0 --l3_reg 0.001 --nb_epochs 200 --outfile half_fbwq

Note: This runs the code in the vanilla setting, without relation matching. Relation matching has to be done separately.

Also, please note that this implementation uses embeddings created with libkge (https://github.com/uma-pi1/kge). It is a very helpful library, and I suggest training embeddings with it, since it supports sparse embeddings and shared negative sampling, which speed up learning for large KGs like Freebase.

Dataset creation

MetaQA

KG dataset

There are 2 datasets: MetaQA_full and MetaQA_half. The full dataset contains the original kb.txt as train.txt with duplicate triples removed. The half dataset contains only 50% of the triples (randomly selected without replacement).
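
The construction of the two KG variants can be sketched as follows. This is an illustrative reconstruction, not the script used to produce the released data: the function names, the in-memory tuple representation, the toy triples, and the fixed random seed are all assumptions.

```python
import random


def deduplicate(triples):
    """Drop duplicate (head, relation, tail) triples, keeping first-seen order."""
    return list(dict.fromkeys(triples))


def sample_half(triples, seed=0):
    """Randomly keep 50% of the triples, sampled without replacement."""
    rng = random.Random(seed)
    return rng.sample(triples, len(triples) // 2)


# Toy knowledge base with one duplicated line.
kb = [
    ("movie_a", "directed_by", "person_x"),
    ("movie_a", "directed_by", "person_x"),  # duplicate
    ("movie_a", "release_year", "1999"),
    ("movie_b", "directed_by", "person_y"),
    ("movie_b", "has_genre", "genre_z"),
]
full = deduplicate(kb)    # 4 unique triples
half = sample_half(full)  # 2 of those 4 triples
```

`random.sample` draws without replacement, so no triple can appear twice in the half set.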

There are some lines like 'entity NOOP entity' in train.txt for the half dataset. These exist because, when triples were removed, some entities lost all of their triples, so a KG embedding implementation trained on train.txt alone would produce no embedding vector for them. The 'NOOP' triples add no information about these entities from the KG; they are there only so that any embedding implementation can be used directly to generate some random vector for them.
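
A minimal sketch of how such 'NOOP' triples could be generated, assuming triples are held as (head, relation, tail) tuples; the function name `add_noop_triples` is hypothetical and the released files may have been produced differently.

```python
def add_noop_triples(full_triples, half_triples):
    """Append (e, "NOOP", e) for every entity that appears in the full KG
    but in no triple of the half KG."""

    def entities(triples):
        return {e for head, _, tail in triples for e in (head, tail)}

    missing = entities(full_triples) - entities(half_triples)
    # Sorting only makes the output order deterministic.
    return list(half_triples) + [(e, "NOOP", e) for e in sorted(missing)]


full_kg = [
    ("movie_a", "directed_by", "person_x"),
    ("movie_b", "directed_by", "person_y"),
]
half_kg = [("movie_a", "directed_by", "person_x")]
patched = add_noop_triples(full_kg, half_kg)
for triple in patched:
    print("\t".join(triple))
```

Here `movie_b` and `person_y` lost their only triple, so each receives a self-loop with the placeholder relation.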

QA Dataset

There are 5 files for each dataset (1, 2 and 3 hop):
- qa_train_{n}hop_train.txt
- qa_train_{n}hop_train_half.txt
- qa_train_{n}hop_train_old.txt
- qa_dev_{n}hop.txt
- qa_test_{n}hop.txt

Of these, qa_dev, qa_test and qa_train_{n}hop_old are exactly the same as the original MetaQA dev, test and train files respectively.

For qa_train_{n}hop_train and qa_train_{n}hop_train_half, we have added each KG triple (h, r, t) in the form (head entity, question, answer). This prevents the model from 'forgetting' the entity embeddings while the QA model is trained on the QA dataset. qa_train.txt contains all triples, while qa_train_half.txt contains only triples from MetaQA_half.
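
The idea of adding KG triples as pseudo-questions can be sketched as below. The exact line format of the released files is not specified here, so this sketch assumes the MetaQA convention of a bracketed topic entity, a tab, and the answer, and uses the relation name as the question text. Both function names are hypothetical.

```python
def triple_to_qa_line(head, relation, tail):
    """Format a KG triple as a pseudo-question: '[head] relation<TAB>tail'.

    Assumption: the relation name stands in for the question text.
    """
    return f"[{head}] {relation}\t{tail}"


def augment_qa_lines(qa_lines, triples):
    """Return the original QA lines followed by one pseudo-question per triple."""
    return list(qa_lines) + [triple_to_qa_line(*t) for t in triples]


qa_lines = ["who directed [movie_a]\tperson_x"]
kg_triples = [("movie_a", "release_year", "1999")]
augmented = augment_qa_lines(qa_lines, kg_triples)
for line in augmented:
    print(line)
```

Passing only the triples of the half KG would give the half variant of the training file.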

WebQuestionsSP

KG dataset

There are 2 datasets: fbwq_full and fbwq_half.

Creating fbwq_full: We restrict the KB to be a subset of Freebase which contains all facts that are within 2-hops of any entity mentioned in the questions of WebQuestionsSP. We further prune it to contain only those relations that are mentioned in the dataset. This smaller KB has 1.8 million entities and 5.7 million triples.
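
The two pruning steps can be illustrated with a small breadth-first sketch. It is not the script used to build fbwq_full: the function name, the tuple representation, and the choice to follow edges in both directions are assumptions made for the example.

```python
from collections import defaultdict


def two_hop_subgraph(triples, seed_entities, allowed_relations, max_hops=2):
    """Keep triples within `max_hops` of any seed entity, then drop triples
    whose relation is not in `allowed_relations`."""
    # Index triples by both endpoints (assumption: edges are followed in
    # either direction).
    by_entity = defaultdict(list)
    for triple in triples:
        head, _, tail = triple
        by_entity[head].append(triple)
        by_entity[tail].append(triple)

    frontier = set(seed_entities)
    visited = set(seed_entities)
    kept = set()
    for _ in range(max_hops):
        next_frontier = set()
        for entity in frontier:
            for triple in by_entity.get(entity, ()):
                kept.add(triple)
                head, _, tail = triple
                for neighbor in (head, tail):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    # Second step: keep only relations mentioned in the QA dataset.
    return [t for t in triples if t in kept and t[1] in allowed_relations]


kb = [
    ("q", "r1", "a"),
    ("a", "r2", "b"),
    ("b", "r3", "c"),  # three hops from q, dropped
    ("a", "r9", "d"),  # within two hops, but relation not allowed
]
subgraph = two_hop_subgraph(kb, {"q"}, {"r1", "r2", "r3"})
print(subgraph)
```

At Freebase scale the same logic would need a streaming or indexed implementation; the in-memory version here is only meant to show the order of the two filters.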

Creating fbwq_half: We randomly sample 50% of the edges from fbwq_full.

QA Dataset

Same as the original WebQuestionsSP QA dataset.

Citation

Please cite the following paper if you use this code in your work.

@inproceedings{saxena-etal-2020-improving,
    title = "Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings",
    author = "Saxena, Apoorv  and
      Tripathi, Aditay  and
      Talukdar, Partha",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.412",
    doi = "10.18653/v1/2020.acl-main.412",
    pages = "4498--4507"
}

For any clarification, comments, or suggestions, please create an issue or contact Apoorv.
