Models for automatic detoxification of Russian texts.
The `data` folder contains:
- `data/train.csv` — train dataset, 262702 texts (213271 non-toxic, 37073 toxic);
- `data/test.txt` — test dataset, 12358 toxic texts;
- `data/parallel_corpus.csv` — parallel train dataset, 500 samples;
- `data/toxic_vocab.txt` — pre-defined vocabulary of rude and toxic words, 139490 words;
- `data/preds_delete.txt` — predictions of the Delete baseline on the test dataset.
We provide two baselines:
- Duplicate — simple duplication of the input;
- Delete (`baselines/delete.py`) — removal of rude and toxic words from the pre-defined vocabulary.
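The Delete baseline can be sketched roughly as follows (a minimal illustration, not the actual `baselines/delete.py`; the vocabulary file is assumed to contain one word per line):

```python
def load_vocab(path):
    """Load the toxic vocabulary, one word per line (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def delete_toxic(text, toxic_vocab):
    """Remove every whitespace-separated token found in the toxic vocabulary."""
    kept = [tok for tok in text.split() if tok.lower() not in toxic_vocab]
    return " ".join(kept)

# Toy example with a hypothetical mini-vocabulary:
vocab = {"дурак"}
print(delete_toxic("ты дурак и сам это знаешь", vocab))  # → "ты и сам это знаешь"
```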
The general algorithm of text detoxification:
- Toxic word detection — we train a binary classifier to detect toxic words;
- Toxic word replacement — to replace words classified as toxic, we use one of the pre-trained NLP models for Russian (either `ruBERT-large` or `ruRoBERTa-large`). From the model's top-10 predictions we select the one that is 1) non-toxic and 2) closest to the original word (word embeddings are generated with the FastText model);
- Toxic word deletion — if no non-toxic replacement is found in the top-10 model predictions, we delete the word.
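The candidate-selection rule above can be sketched like this. The real pipeline queries a masked language model for candidates and FastText for embeddings; here both are stubbed with toy stand-ins, and `pick_replacement` is a hypothetical helper name:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pick_replacement(original_vec, candidates, is_toxic, embed):
    """From the model's top-k candidates, keep the non-toxic ones and
    return the word whose embedding is closest to the original word's.
    Returning None signals the deletion fallback."""
    best, best_sim = None, -1.0
    for cand in candidates:
        if is_toxic(cand):
            continue
        sim = cosine(original_vec, embed(cand))
        if sim > best_sim:
            best, best_sim = cand, sim
    return best  # None → no non-toxic candidate → delete the word

# Toy stand-ins for the FastText embeddings and the toxicity check:
toy_vecs = {"orig": [1.0, 0.0], "good": [0.9, 0.1], "meh": [0.0, 1.0], "bad": [1.0, 0.0]}
choice = pick_replacement(toy_vecs["orig"], ["bad", "good", "meh"],
                          lambda w: w in {"bad"}, lambda w: toy_vecs[w])
print(choice)  # → good
```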
The evaluation consists of three types of metrics:
- style transfer accuracy (STA) — the average confidence of the pre-trained BERT-based toxic/non-toxic text classifier (`SkolkovoInstitute/russian_toxicity_classifier`);
- cosine similarity (CS) — the average cosine similarity between the embeddings of the input and output texts; the embeddings are generated with the FastText Skipgram model;
- fluency score (FL) — the average difference in confidence of the pre-trained BERT-based corrupted/non-corrupted text classifier (`SkolkovoInstitute/rubert-base-corruption-detector`) between the input and output texts.

Finally, the joint score (JS) is the sentence-level product of the STA, CS, and FL scores.
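A minimal sketch of the joint-score aggregation, assuming the per-sentence STA, CS, and FL values have already been computed and that the per-sentence products are averaged over the corpus:

```python
def joint_score(sta, cs, fl):
    """Multiply STA, CS and FL per sentence, then average over the corpus
    (the averaging step is an assumption about the aggregation)."""
    assert len(sta) == len(cs) == len(fl)
    per_sentence = [s * c * f for s, c, f in zip(sta, cs, fl)]
    return sum(per_sentence) / len(per_sentence)

# Toy per-sentence scores: 0.9*0.8*1.0 = 0.72 and 0.5*1.0*0.6 = 0.30
print(joint_score([0.9, 0.5], [0.8, 1.0], [1.0, 0.6]))  # → 0.51
```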
You can run the `metric.py` script for evaluation with the following parameters:
- `-i`, `--inputs` — the path to the input dataset, a `.txt` file;
- `-p`, `--preds` — the path to the model's predictions, a `.txt` file;
- `-b`, `--batch_size` — the batch size for the classifiers (default: 32);
- `-m`, `--model` — the name of your model (empty by default);
- `-f`, `--file` — the path to the output file; if not specified, the results are not written to a file.
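For example, evaluating the provided Delete predictions might look like this (the model name `delete` and the output path `results.txt` are illustrative):

```shell
python metric.py \
    -i data/test.txt \
    -p data/preds_delete.txt \
    -b 32 \
    -m delete \
    -f results.txt
```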
| Method | STA↑ | CS↑ | FL↑ | JS↑ |
|---|---|---|---|---|
| Baselines | | | | |
| Duplicate | 0.07 | 1.00 | 1.00 | 0.06 |
| Delete | 0.35 | 0.97 | 0.84 | 0.26 |
| Models | | | | |
| ruBERT-large | | | | |
| ruRoBERTa-large | | | | |