
On the design and tuning of machine learning models for language toxicity classification in online platforms

  • Maciej Rybinski*
  • William Miller
  • Javier Del Ser
  • Miren Nekane Bilbao
  • José F. Aldana-Montes
  • *Corresponding author for this work
  • University of Málaga
  • Anami Precision
  • Basque Center for Applied Mathematics

Research output: Chapter in book/report/conference proceeding, Chapter, peer-reviewed

1 Citation (Scopus)

Abstract

One of the most concerning drawbacks of the lack of supervision in online platforms is their exploitation by misbehaving users to deliver offensive (toxic) messages while remaining anonymous. Given the huge volumes of data handled by these platforms, detecting toxicity in exchanged comments and messages has naturally called for machine learning models to automate the task. In the last few years, Deep Learning models and related techniques have played a major role in this regard due to their superior modeling capabilities, which have made them the prevailing choice in the related literature. By addressing a toxicity classification problem over a real dataset, this work aims to shed light on two aspects of this dominance of Deep Learning models: (1) an empirical assessment of their predictive gains with respect to traditional Shallow Learning models; and (2) the impact of different text embedding methods and data augmentation techniques on this classification task. Our findings reveal that, in our case study, non-optimized Shallow and Deep Learning models attain very competitive accuracy scores, leaving only a narrow margin for improvement through fine-grained refinement of the models or the addition of data augmentation techniques.

Original language: English
Title of host publication: Studies in Computational Intelligence
Publisher: Springer Verlag
Pages: 329-343
Number of pages: 15
DOI
Publication status: Published - 2018

Publication series

Name: Studies in Computational Intelligence
Volume: 798
ISSN (Print): 1860-949X
