Knowledge Graph Language Model

This repo contains an implementation of the KGLM model described in "Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling", Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner and Sameer Singh, ACL 2019 [arXiv].

Warning: To avoid confusion regarding placement of '@@END@@' tokens, we have explicitly added the '@@END@@' tokens to the Linked WikiText-2 dataset and removed the corresponding preprocessing steps from the dataset reader. If you are using an older version of the dataset, you will need to download the current version (see the Data section) for this codebase to work.

Setup

You will need Python 3.5+. Dependencies can be installed by running:

pip install -r requirements.txt

Data

KGLM is trained on the Linked WikiText-2 dataset, which can be downloaded at https://rloganiv.github.io/linked-wikitext-2.

Additionally, you will need embeddings for entities/relations in the Wikidata knowledge graph, as well as access to the knowledge graph itself (in order to look up entity aliases/related entities). For convenience, we provide pre-trained embeddings and pickled dictionaries containing the relevant portions of Wikidata here.
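As a minimal sketch of working with a pickled dictionary, the snippet below loads one and looks up an entity. The layout shown (a Wikidata entity ID mapped to a list of alias strings), the helper name load_pickled_dict, and the file name toy_alias.pkl are assumptions made for illustration only; the provided files may be structured differently.

```python
import os
import pickle
import tempfile


def load_pickled_dict(path):
    """Load a pickled object from `path` and check that it is a dict."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    if not isinstance(obj, dict):
        raise TypeError("expected a pickled dict, got %s" % type(obj).__name__)
    return obj


# Toy stand-in for a real alias dictionary (assumed layout: entity ID -> aliases).
toy_aliases = {"Q76": ["Barack Obama", "Obama"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_alias.pkl")  # hypothetical file name
    with open(toy_path, "wb") as f:
        pickle.dump(toy_aliases, f)
    loaded = load_pickled_dict(toy_path)

# Look up the aliases of one entity, falling back to an empty list.
print(loaded.get("Q76", []))
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.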

If you would like to apply our annotation pipeline to your own data, please refer to: https://github.com/rloganiv/kglm-data.

Training

To train the model, run:

allennlp train [path to config] -s [path to save checkpoint to] --include-package kglm

Example model configurations are provided in the experiments directory.
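If you want to launch training from a script (for example, to run several configurations in a row), the command above can be assembled in Python. This is only a sketch: build_train_command is a hypothetical helper, and the two paths in the usage line are placeholders rather than files known to exist in this repo.

```python
def build_train_command(config_path, serialization_dir):
    """Return the argument list for the `allennlp train` command shown above."""
    return [
        "allennlp", "train", config_path,
        "-s", serialization_dir,       # where checkpoints are saved
        "--include-package", "kglm",   # makes the kglm classes visible to AllenNLP
    ]


# Placeholder paths; substitute a real config and an output directory.
command = build_train_command("path/to/config.jsonnet", "path/to/checkpoints")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the dependencies are installed.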

Perplexity Evaluation

To estimate the perplexity of a trained model on held-out data, run:

python -m kglm.run evaluate-perplexity \
    [model_archive_file] \
    [sampler_archive_file] \
    [input_file]

where:

model_archive_file - Trained (generative) model checkpoint. This is the model whose perplexity will be evaluated.
sampler_archive_file - Trained (discriminative) model checkpoint. This is the model used to create annotations during importance sampling. See Section 4 of the paper for more details about importance sampling.
input_file - Path to dataset to measure perplexity on.
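To give a rough idea of how the two checkpoints interact, the sketch below shows a generic importance-sampling estimate of perplexity. It is not the code in kglm.run: the function name and its inputs are hypothetical, and it assumes that for each of K annotations z_k drawn from the sampler you already have the generative model's joint log-probability log p(x, z_k) and the sampler's log-probability log q(z_k | x).

```python
import math


def importance_sampled_perplexity(log_joint, log_proposal, num_tokens):
    """Estimate perplexity of a token sequence x from K sampled annotations.

    log_joint[k]    -- log p(x, z_k) under the generative model (assumed input)
    log_proposal[k] -- log q(z_k | x) under the sampler (assumed input)
    num_tokens      -- number of tokens in x
    """
    if not log_joint or len(log_joint) != len(log_proposal):
        raise ValueError("need one proposal score per joint score")
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")

    # Log importance weights: log p(x, z_k) - log q(z_k | x).
    log_weights = [lj - lq for lj, lq in zip(log_joint, log_proposal)]

    # log p(x) is approximated by the log of the mean weight,
    # computed with the max subtracted for numerical stability.
    max_w = max(log_weights)
    sum_exp = sum(math.exp(w - max_w) for w in log_weights)
    log_marginal = max_w + math.log(sum_exp) - math.log(len(log_weights))

    # Perplexity is exp of the negative average log-likelihood per token.
    return math.exp(-log_marginal / num_tokens)


# Toy check: when every weight equals 0.05, the estimate of p(x) is 0.05.
ppl = importance_sampled_perplexity(
    [math.log(0.02), math.log(0.03)],
    [math.log(0.4), math.log(0.6)],
    num_tokens=2,
)
```

In the toy check the proposal is proportional to the joint, so every sample yields the same weight and the estimate has no variance; in general, a sampler that matches the generative model's posterior more closely gives a lower-variance estimate.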

Sentence Completion

To perform sentence completion experiments, run:

python -m kglm.run predict --predictor cloze [model_archive_file] [input_file]

where:

model_archive_file - Trained (generative) model checkpoint. This is the model used to complete the sentences.
input_file - Path to the dataset of sentences to complete.
