Methods for Detoxification of Texts for the Russian Language (ruDetoxifier)
This repository contains models and an evaluation methodology for the task of detoxifying Russian texts. The original paper “Methods for Detoxification of Texts for the Russian Language” was presented at the Dialogue-2021 conference.
Inference Example
In this repository, we release our two best models, detoxGPT and condBERT (see Methodology for more details). You can try a detoxification inference example in this notebook.
Methodology
In our research, we tested several approaches:
Baselines
- Duplicate: simple duplication of the input;
- Delete: removal of rude and toxic words found in a pre-defined vocabulary;
- Retrieve: retrieval, based on cosine similarity between word embeddings, of the closest sentence from the non-toxic part of the RuToxic dataset.
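The three baselines can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the whitespace tokenization, the averaged word vectors, and the toy English data in the usage note are all assumptions made for illustration.

```python
import math


def duplicate(text):
    """Duplicate baseline: return the input unchanged."""
    return text


def delete(text, toxic_vocab):
    """Delete baseline: drop every token found in the toxic vocabulary.

    Tokens are split on whitespace and compared in lowercase with
    surrounding punctuation stripped (an assumption of this sketch).
    """
    kept = [
        tok for tok in text.split()
        if tok.lower().strip(".,!?") not in toxic_vocab
    ]
    return " ".join(kept)


def embed(text, word_vectors, dim):
    """Average the vectors of known words (zero vector if none are known)."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(text, neutral_corpus, word_vectors, dim):
    """Retrieve baseline: return the most similar non-toxic sentence."""
    query = embed(text, word_vectors, dim)
    return max(
        neutral_corpus,
        key=lambda s: cosine(query, embed(s, word_vectors, dim)),
    )
```

For example, with a hypothetical vocabulary `{"stupid"}`, `delete("You are a stupid person!", {"stupid"})` keeps every token except the listed one. Retrieve tends to produce fluent, non-toxic text but can drift far from the input content, which is consistent with its low WO and BLEU scores in the Results table.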
detoxGPT
Based on ruGPT models. This method requires a parallel dataset for training. We tested the ruGPT-small, ruGPT-medium, and ruGPT-large models in several setups:
- zero-shot: the model is taken as is (with no fine-tuning). The input is the toxic sentence we would like to detoxify, prepended with the prefix “Перефразируй” (Russian for “Paraphrase”) and followed by the suffix “>>>” to indicate the paraphrasing task.
- few-shot: the model is taken as is. Unlike the previous scenario, the prefix consists of parallel examples of toxic and neutral sentences.
- fine-tuned: the model is fine-tuned for the paraphrasing task on a parallel dataset.
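The zero-shot and few-shot inputs described above can be sketched as plain string construction. The function names, the single-space separators, the one-example-per-line layout, and the `>>>` default suffix are assumptions of this sketch; the exact prompt format used in the experiments may differ.

```python
def zero_shot_prompt(toxic, prefix="Перефразируй", suffix=">>>"):
    """Wrap a toxic sentence in the paraphrasing prefix and suffix."""
    return f"{prefix} {toxic} {suffix}"


def few_shot_prompt(examples, toxic, prefix="Перефразируй", suffix=">>>"):
    """Build a prompt from (toxic, neutral) pairs followed by the query.

    Each example is rendered as a completed zero-shot prompt, so the model
    sees the pattern it is expected to continue for the final sentence.
    """
    lines = [
        f"{zero_shot_prompt(src, prefix, suffix)} {tgt}"
        for src, tgt in examples
    ]
    lines.append(zero_shot_prompt(toxic, prefix, suffix))
    return "\n".join(lines)
```

The resulting string would be passed to the language model, and the generated continuation taken as the detoxified sentence.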
condBERT
Based on the BERT model. This method does not require a parallel dataset for training. One of the tasks on which the original BERT was pretrained, predicting the word that was replaced with a [MASK] token, suits the delete-retrieve-generate style transfer method. We tested the RuBERT and Geotrend pre-trained models in several setups:
- zero-shot where BERT is taken as is (with no extra fine-tuning);
- fine-tuned where BERT is fine-tuned on a dataset of toxic and safe sentences to acquire a style-dependent distribution, as described above.
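The mask-and-replace idea can be sketched independently of any particular model. In the sketch below, `predict` is a hypothetical stand-in for a masked language model such as BERT: it receives the token list and a mask position and returns candidate words, best first. The vocabulary-based detection of toxic words and all function names are assumptions for illustration, not the repository's implementation.

```python
MASK = "[MASK]"


def mask_toxic(tokens, toxic_vocab):
    """Replace every token found in the toxic vocabulary with [MASK]."""
    return [MASK if t.lower() in toxic_vocab else t for t in tokens]


def fill_masks(tokens, predict, toxic_vocab):
    """Fill each [MASK] with the best candidate that is not itself toxic.

    `predict(tokens, position)` is a hypothetical callable standing in
    for a masked language model. If every candidate is toxic, the masked
    token is dropped.
    """
    out = list(tokens)
    for i, tok in enumerate(out):
        if tok != MASK:
            continue
        candidates = predict(out, i)
        safe = [c for c in candidates if c.lower() not in toxic_vocab]
        out[i] = safe[0] if safe else ""
    return [t for t in out if t]


def detoxify(text, predict, toxic_vocab):
    """Mask toxic tokens, then fill the masks with non-toxic candidates."""
    tokens = text.split()
    masked = mask_toxic(tokens, toxic_vocab)
    return " ".join(fill_masks(masked, predict, toxic_vocab))
```

Filtering the candidates against the toxic vocabulary matters because a masked language model taken as is may simply restore the original toxic word; a model fine-tuned on toxic and safe sentences would instead be expected to rank safe words higher.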
Automatic Evaluation
The evaluation consists of three types of metrics:
- style transfer accuracy (STA): accuracy according to a toxic/non-toxic classifier (the resulting text is expected to be in a non-toxic style);
- content preservation:
  - word overlap (WO);
  - BLEU: score based on n-gram overlap (n = 1 to 4);
  - cosine similarity (CS): between the vectors of the texts’ embeddings;
- language quality: perplexity (PPL) of a language model.
Finally, an aggregated metric (GM): the geometric mean of STA, CS, and PPL.
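A minimal sketch of how some of these metrics could be computed is given below. It rests on several assumptions that the text above does not state: STA is taken as the share of outputs labeled non-toxic, WO is taken as the Jaccard overlap of the two word sets, and PPL enters the geometric mean as its inverse (since lower perplexity is better). The function names are hypothetical, and `ru_metric.py` remains the reference for the actual computation.

```python
def style_transfer_accuracy(toxic_flags):
    """Share of outputs that the classifier labels as non-toxic.

    `toxic_flags` holds one boolean per output (True means toxic).
    """
    return sum(1 for flag in toxic_flags if not flag) / len(toxic_flags)


def word_overlap(source, output):
    """Jaccard overlap between the word sets of input and output (assumed)."""
    x = set(source.lower().split())
    y = set(output.lower().split())
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def geometric_mean(sta, cs, ppl):
    """Geometric mean of STA, CS, and inverse PPL (assumed aggregation).

    Negative inputs are clipped to zero so the cube root stays real.
    """
    product = max(sta, 0.0) * max(cs, 0.0) * max(1.0 / ppl, 0.0)
    return product ** (1.0 / 3.0)
```

Under these assumptions, `geometric_mean(0.61, 0.77, 36.92)` gives roughly 0.23, which is close to the GM reported for fine-tuned detoxGPT-large in the Results table. Other rows match only approximately, so the reported values may have been aggregated differently (for example, per sentence before averaging).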
Launching
You can run the ru_metric.py script for evaluation. The fine-tuned weights for the toxicity classifier can be found here.
Results
Method | STA↑ | CS↑ | WO↑ | BLEU↑ | PPL↓ | GM↑ |
---|---|---|---|---|---|---|
Baselines | ||||||
Duplicate | 0.00 | 1.00 | 1.00 | 1.00 | 146.00 | 0.05 ± 0.0012 |
Delete | 0.27 | 0.96 | 0.85 | 0.81 | 263.55 | 0.10 ± 0.0007 |
Retrieve | 0.91 | 0.85 | 0.07 | 0.09 | 65.74 | 0.22 ± 0.0010 |
detoxGPT-small | ||||||
zero-shot | 0.93 | 0.20 | 0.00 | 0.00 | 159.11 | 0.10 ± 0.0005 |
few-shot | 0.17 | 0.70 | 0.05 | 0.06 | 83.38 | 0.11 ± 0.0009 |
fine-tuned | 0.51 | 0.70 | 0.05 | 0.05 | 39.48 | 0.20 ± 0.0011 |
detoxGPT-medium | ||||||
fine-tuned | 0.49 | 0.77 | 0.18 | 0.21 | 86.75 | 0.16 ± 0.0009 |
detoxGPT-large | ||||||
fine-tuned | 0.61 | 0.77 | 0.22 | 0.21 | 36.92 | 0.23 ± 0.0010 |
condBERT | ||||||
DeepPavlov zero-shot | 0.53 | 0.80 | 0.42 | 0.61 | 668.58 | 0.08 ± 0.0006 |
DeepPavlov fine-tuned | 0.52 | 0.86 | 0.51 | 0.53 | 246.68 | 0.12 ± 0.0007 |
Geotrend zero-shot | 0.62 | 0.85 | 0.54 | 0.64 | 237.46 | 0.13 ± 0.0009 |
Geotrend fine-tuned | 0.66 | 0.86 | 0.54 | 0.64 | 209.95 | 0.14 ± 0.0009 |
Data
The data folder contains all the training datasets used, the test data, and a naive example of a style transfer result:
- data/train: the RuToxic dataset, a list of Russian rude words, and 200 samples of parallel sentences that were used for ruGPT fine-tuning;
- data/test: 10,000 samples that were used to evaluate the approaches;
- data/results: an example of the style transfer output format, illustrated with naive duplication.
Citation
If you find this repository helpful, feel free to cite our publication:
@article{DBLP:journals/corr/abs-2105-09052,
author = {Daryna Dementieva and
Daniil Moskovskiy and
Varvara Logacheva and
David Dale and
Olga Kozlova and
Nikita Semenov and
Alexander Panchenko},
title = {Methods for Detoxification of Texts for the Russian Language},
journal = {CoRR},
volume = {abs/2105.09052},
year = {2021},
url = {https://arxiv.org/abs/2105.09052},
archivePrefix = {arXiv},
eprint = {2105.09052},
timestamp = {Mon, 31 May 2021 16:16:57 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2105-09052.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Contacts
For any questions please contact Daryna Dementieva via email or Telegram.