TY - GEN
T1 - Reflect, Reason, Rephrase (R³-Detox)
T2 - 12th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, BDCAT 2025
AU - Villate Castillo, Guillermo
AU - Del Ser, Javier
AU - Sanz, Borja
N1 - Publisher Copyright:
© 2025 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/12/24
Y1 - 2025/12/24
N2 - Traditional content moderation, while effective in reducing toxicity through content removal or censoring, can discourage user participation by making users feel restricted or unfairly targeted, especially in nuanced discussions. Text detoxification offers a more constructive alternative by rephrasing offensive language into respectful forms. We propose R3-Detox, a Reflect-Reason-Rephrase framework that structures detoxification into three steps within a single prompt. The model identifies potentially toxic elements guided by Shapley values to reduce fabricated predictions, evaluates overall toxicity, and then revises the text to eliminate toxicity while retaining meaning. We augment three offensive-text paraphrasing datasets (ParaDetox, Parallel Detoxification, APPDIA) with explicit detoxification reasoning. Evaluated with in-context learning, R3-Detox outperforms state-of-the-art methods, including instruction-following models.
AB - Traditional content moderation, while effective in reducing toxicity through content removal or censoring, can discourage user participation by making users feel restricted or unfairly targeted, especially in nuanced discussions. Text detoxification offers a more constructive alternative by rephrasing offensive language into respectful forms. We propose R3-Detox, a Reflect-Reason-Rephrase framework that structures detoxification into three steps within a single prompt. The model identifies potentially toxic elements guided by Shapley values to reduce fabricated predictions, evaluates overall toxicity, and then revises the text to eliminate toxicity while retaining meaning. We augment three offensive-text paraphrasing datasets (ParaDetox, Parallel Detoxification, APPDIA) with explicit detoxification reasoning. Evaluated with in-context learning, R3-Detox outperforms state-of-the-art methods, including instruction-following models.
KW - LLM
KW - Reasoning
KW - Self-Reflection
KW - Text Detoxification
UR - https://www.scopus.com/pages/publications/105026857582
U2 - 10.1145/3773276.3774282
DO - 10.1145/3773276.3774282
M3 - Conference contribution
AN - SCOPUS:105026857582
T3 - BDCAT 2025 - IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Co-Located Conference UCC 2025
BT - BDCAT 2025 - IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Co-Located Conference UCC 2025
PB - Association for Computing Machinery, Inc
Y2 - 1 December 2025 through 4 December 2025
ER -