
DialEval-1/LSTM-baseline


This repo contains an LSTM+BoW baseline model for the NTCIR-15 Dialogue Evaluation Task (DialEval-1).

Dialogue Quality Model

Each dialogue turn is represented as an N x D matrix, where N is the number of tokens and D is the embedding dimensionality. To convert each turn matrix into a vector, we apply Bag of Words (BoW), i.e., we take the sum of the word vectors. Stacked bidirectional LSTMs then encode the sequence of turn vectors to obtain a representation of the dialogue. Finally, the dialogue representation is fed into dense layers to estimate the distributions of the dialogue quality scores (a minimal sketch of this architecture is given after the figure below):

  • A-score: Accomplishment Score (2, 1, 0, -1, -2)

  • E-score: Efficiency Score (2, 1, 0, -1, -2)

  • S-score: Satisfaction Score (2, 1, 0, -1, -2)

(Figure: dialogue quality model architecture)
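
For reference, here is a minimal, hypothetical sketch of this architecture in tf.keras (TF 1.15). The layer sizes and head names are illustrative assumptions; the actual implementation lives in model.py.

import tensorflow as tf  # tensorflow-gpu 1.15, using tf.keras

def build_quality_model(max_turns, embed_dim, num_labels=5):
    # Input: one BoW vector per turn (the N x D token embeddings already summed).
    turn_vectors = tf.keras.Input(shape=(max_turns, embed_dim))
    # Stacked bidirectional LSTMs encode the sequence of turn vectors.
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True))(turn_vectors)
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128))(x)  # dialogue vector
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    # One softmax head per quality score (A, E, S), each a distribution over
    # the five labels (2, 1, 0, -1, -2).
    outputs = [tf.keras.layers.Dense(num_labels, activation="softmax", name=name)(x)
               for name in ("A_score", "E_score", "S_score")]
    return tf.keras.Model(inputs=turn_vectors, outputs=outputs)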

Nugget Detection Model

The nugget detection baseline model is similar to the dialogue quality model above, but it predicts a nugget-type distribution for each customer turn and each helpdesk turn (a minimal sketch follows the figure below).

  • Customer turn nugget types: Trigger Nugget (CNUG0), Not A Nugget (CNaN), Regular Nugget (CNUG), and Goal Nugget (CNUG*)

  • Helpdesk turn nugget types: Not A Nugget (HNaN), Regular Nugget (HNUG), and Goal Nugget (HNUG*)

(Figure: nugget detection model architecture)
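
As with the quality model, here is a minimal, hypothetical tf.keras sketch. Unlike the quality model, it keeps a prediction per turn by returning the full LSTM sequence and attaching a time-distributed softmax over the nugget types; layer sizes and head names are assumptions, and model.py contains the actual implementation.

import tensorflow as tf

def build_nugget_model(max_turns, embed_dim,
                       num_customer_types=4, num_helpdesk_types=3):
    # Input: one BoW vector per turn, as in the quality model.
    turn_vectors = tf.keras.Input(shape=(max_turns, embed_dim))
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True))(turn_vectors)
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True))(x)
    # Per-turn nugget-type distributions. A real turn is either a customer turn
    # or a helpdesk turn, so one of the two heads would be masked per position.
    customer = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(num_customer_types, activation="softmax"),
        name="customer_nuggets")(x)
    helpdesk = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(num_helpdesk_types, activation="softmax"),
        name="helpdesk_nuggets")(x)
    return tf.keras.Model(inputs=turn_vectors, outputs=[customer, helpdesk])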

We aim to provide a starting point for DialEval-1 participants and researchers who are interested in the dataset. Please feel free to fork this repo and modify model.py to implement your own models.

Get Started

Install

Recommended environment:

  • python>=3.6
  • tensorflow-gpu>=1.15 (TF 2.0 is not supported)
# Clone this repo
git clone https://github.com/DialEval-1/LSTM-baseline.git

# Copy the DialEval-1 training and dev datasets
cp -r /path/to/dialeval-data-folder LSTM-baseline

# Install dependencies
pip install -r requirements.txt

# Download the spaCy English model
python -m spacy download en  

Note: To obtain the dataset of DialEval-1, please check https://dialeval-1.github.io/dataset/.

Train all tasks

This command trains 4 models, covering both the nugget detection and dialogue quality tasks on both the Chinese and English training datasets. In addition, the predictions for the test set will be placed in ./output.

./train_all.sh

Commands

Train a single task

# Train Nugget Detection with English training dataset
python train.py \
    --task nugget \
    --language english \
    --learning-rate 1e-3 \
    --batch-size 128

By default, checkpoints are stored in ./checkpoint, and TensorBoard logs are written to ./log. After training, the prediction file for the test set is written to ./output. Note that test prediction is not performed when test_en.json and test_cn.json are not available in the dataset folder (the test data will be released later).

For more adjustable hyper-parameters, please check flags.py.

Load a checkpoint and generate a prediction file

You may use the following command to generate a prediction for the test set by loading a trained model checkpoint. The prediction file is placed in ./output by default.

python train.py \
    --task nugget \
    --language english \
    --resume-dir ./checkpoint/.... \
    --infer-test True \
    --output-dir ./output

Test Results

The distance scores (i.e., JSD, RNSS, RSNOD, NMD) were transformed by -log() for readability. Thus, the higher the transformed score, the better the model's effectiveness.
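
For example, assuming the natural logarithm, a hypothetical raw JSD of 0.023 would map to a transformed score of about 3.77 (the value reported for nugget detection on the Chinese track below):

import math

raw_jsd = 0.023                    # hypothetical raw distance; lower is better
transformed = -math.log(raw_jsd)   # -log() transform; higher is better
print(round(transformed, 2))       # 3.77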

Chinese Track

                  JSD    RNSS
Nugget Detection  3.77   2.55

                  RSNOD  NMD
A-Score           2.12   2.66
E-Score           2.47   2.85
S-Score           2.21   2.76

English Track

                  JSD    RNSS
Nugget Detection  3.71   2.49

                  RSNOD  NMD
A-Score           2.14   2.65
E-Score           2.57   3.00
S-Score           2.24   2.82

You may find that the dev scores are higher than the test scores. This is because the training and dev data were annotated by the same group of annotators, while the DialEval-1 test data were annotated by a different group. Thus, there may be a gap between the training data and the test data.
