vyhuholl/russian_detoxification

Models for automatic detoxification of Russian texts.

Data

Folder data consists of:

Models

Baselines

We provide two baselines:

  • Duplicate — simple duplication of the input;
  • Delete (baselines/delete.py) — removal of rude and toxic words that appear in a pre-defined vocabulary.
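
The two baselines can be sketched in a few lines of Python. This is only an illustration, not the contents of baselines/delete.py: the vocabulary TOXIC_VOCAB, the function names, and the whitespace tokenization with punctuation stripping are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the pre-defined vocabulary of rude and toxic words.
TOXIC_VOCAB = {"дурак", "идиот"}


def duplicate(text):
    """Duplicate baseline: return the input unchanged."""
    return text


def delete_toxic(text, vocab=TOXIC_VOCAB):
    """Delete baseline: drop every token whose normalized form is in vocab."""
    kept = []
    for token in text.split():
        # Strip surrounding punctuation and lowercase before the lookup
        # (an assumed normalization; the real script may use lemmatization).
        normalized = re.sub(r"^\W+|\W+$", "", token).lower()
        if normalized not in vocab:
            kept.append(token)
    return " ".join(kept)
```

Note that in this sketch punctuation attached to a removed token is removed with it, which is one plausible reason a deletion baseline could lose some fluency.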

Models

The general algorithm of text detoxification:

  1. Toxic word detection — we train a binary classifier to detect toxic words;
  2. Toxic word replacement — to replace words classified as toxic, we use one of two pre-trained language models for Russian (either ruBERT-large or ruRoBERTa-large). From the model's top-10 predictions we select the one that is 1) non-toxic and 2) closest to the original word (word embeddings are generated with the FastText model).
  3. Toxic word deletion — if a non-toxic replacement wasn't found in the top-10 of model predictions, we delete the word.
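
The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the repository's code: is_toxic stands in for the binary classifier, propose for the masked language model's ranked predictions, and embed for the FastText lookup. All three are hypothetical callables supplied by the caller, and cosine similarity is assumed as the measure of closeness.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def choose_replacement(word, candidates, is_toxic, embed):
    """Return the non-toxic top-10 candidate closest to word, or None."""
    safe = [c for c in candidates[:10] if not is_toxic(c)]
    if not safe:
        return None  # signals that the word should be deleted
    target = embed(word)
    return max(safe, key=lambda c: cosine(embed(c), target))


def detoxify(tokens, is_toxic, propose, embed):
    """Replace each toxic token, or delete it when no safe candidate exists."""
    output = []
    for index, token in enumerate(tokens):
        if not is_toxic(token):
            output.append(token)  # step 1: token is not flagged, keep it
            continue
        best = choose_replacement(token, propose(tokens, index), is_toxic, embed)
        if best is not None:
            output.append(best)   # step 2: replacement found
        # step 3: otherwise the token is dropped
    return output
```

Passing the classifier, the language model, and the embedding lookup as arguments keeps the selection logic independent of any particular model, so either ruBERT-large or ruRoBERTa-large could sit behind propose.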

Evaluation

The evaluation consists of three types of metrics:

  • style transfer accuracy (STA) — the average confidence of the pre-trained BERT-based toxic/non-toxic text classifier (SkolkovoInstitute/russian_toxicity_classifier);
  • cosine similarity (CS) — the average cosine similarity between the embeddings of the input and output texts. The embeddings are generated with the FastText Skipgram model;
  • fluency score (FL) — the average difference in confidence of the pre-trained BERT-based corrupted/non-corrupted text classifier (SkolkovoInstitute/rubert-base-corruption-detector) between the input and output texts.

Finally, the joint score (JS) is the sentence-level product of the STA, CS, and FL scores.
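
A minimal sketch of the joint score, assuming the three metrics are available as per-sentence lists and that the sentence-level products are averaged over the dataset (the averaging step and the function name joint_score are assumptions, not taken from metric.py):

```python
def joint_score(sta, cs, fl):
    """Multiply the three scores sentence by sentence, then average."""
    if not (len(sta) == len(cs) == len(fl)):
        raise ValueError("all score lists must have the same length")
    if not sta:
        raise ValueError("score lists must not be empty")
    products = [s * c * f for s, c, f in zip(sta, cs, fl)]
    return sum(products) / len(products)
```

Under this assumption, the joint score is a mean of products, which in general differs from the product of the three dataset-level means.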

You can run the metric.py script for evaluation with the following parameters:

  • -i, --inputs — the path to the input dataset, a .txt file;
  • -p, --preds — the path to the model's predictions, a .txt file;
  • -b, --batch_size — batch size for the classifiers, default value 32;
  • -m, --model — the name of your model, empty by default;
  • -f, --file — the path to the output file. If not specified, results will not be written to a file.
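
For reference, the parameters listed above map onto an argparse definition along these lines. This is a hedged sketch rather than the actual parser in metric.py: the flag names and defaults come from the list, while the function name build_parser, the help strings, and marking --inputs and --preds as required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented metric.py parameters."""
    parser = argparse.ArgumentParser(description="Evaluate detoxification output.")
    # Assumed to be required, since evaluation needs both files.
    parser.add_argument("-i", "--inputs", required=True,
                        help="path to the input dataset (.txt)")
    parser.add_argument("-p", "--preds", required=True,
                        help="path to the model's predictions (.txt)")
    parser.add_argument("-b", "--batch_size", type=int, default=32,
                        help="batch size for the classifiers")
    parser.add_argument("-m", "--model", default="",
                        help="name of your model")
    # None means the results are not written to a file.
    parser.add_argument("-f", "--file", default=None,
                        help="path to the output file")
    return parser
```

With such a parser, build_parser().parse_args(["-i", "inputs.txt", "-p", "preds.txt"]) would fall back to the documented defaults for the remaining options; the two file names here are placeholders.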

Results

Method            STA↑   CS↑    FL↑    JS↑
Baselines
Duplicate         0.07   1.00   1.00   0.06
Delete            0.35   0.97   0.84   0.26
Models
ruBERT-large
ruRoBERTa-large
