Automatic assignment of microgenres to movies using a word embedding-based approach

  • Carlos González-Santos
  • Miguel A. Vega-Rodríguez*
  • Joaquín M. López-Muñoz
  • Iñaki Martínez-Sarriegui
  • Carlos J. Pérez

  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Streaming services increasingly use Artificial Intelligence (AI) technologies to improve content cataloging, content discovery, and personalization. A significant challenge in this domain is the automated assignment of microgenres to movies. This study introduces and evaluates approaches based on clustering, topic modeling, and word embedding to address this task. The evaluation employs a preprocessed dataset containing movie-related data (title tags, synopses, genres, and reviews) alongside a predefined microgenre list. Three activation functions (binary step, ramp, and sigmoid) are compared to gauge their effectiveness in augmenting microgenre tags. Results show that the word embedding approach outperforms clustering and topic modeling in terms of mean accuracy. Moreover, the word embedding approach is the only fully automated solution. Analysis indicates that incorporating review-based tags introduces noise and reduces accuracy. In addition, the word embedding approach performs best with the sigmoid function, doubling the number of assigned tags while maintaining matching quality. These findings shed light on the potential of word embedding methods within the movie domain.
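The abstract does not spell out how the activation functions are applied, so the following is only a minimal sketch of one plausible reading: cosine similarity between a movie's tag embeddings and each microgenre embedding is passed through an activation function, and a microgenre is assigned when the activated score reaches a cutoff. The function names, thresholds (`threshold`, `low`, `high`, `midpoint`, `steepness`, `cutoff`), and the 3-dimensional toy vectors are all assumptions made for illustration, not the authors' implementation or data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Three candidate activation functions; all parameter values are assumed.
def binary_step(s, threshold=0.5):
    return 1.0 if s >= threshold else 0.0

def ramp(s, low=0.3, high=0.7):
    # Linear rise from 0 at `low` to 1 at `high`, clipped outside that range.
    return min(1.0, max(0.0, (s - low) / (high - low)))

def sigmoid(s, midpoint=0.5, steepness=10.0):
    return 1.0 / (1.0 + math.exp(-steepness * (s - midpoint)))

def assign_microgenres(tag_vectors, microgenre_vectors, activation, cutoff=0.5):
    """Assign each microgenre whose best tag similarity, after activation,
    reaches `cutoff`. Returns a dict of microgenre name -> activated score."""
    assigned = {}
    for name, genre_vec in microgenre_vectors.items():
        best = max(cosine(tag_vec, genre_vec) for tag_vec in tag_vectors)
        score = activation(best)
        if score >= cutoff:
            assigned[name] = score
    return assigned

# Hypothetical toy embeddings (real word embeddings have hundreds of dimensions).
tags = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
microgenres = {
    "heist thriller": [1.0, 0.2, 0.0],
    "space opera": [0.0, 0.1, 1.0],
}

result = assign_microgenres(tags, microgenres, sigmoid)
print(sorted(result))
```

In this sketch the three functions differ only in how they treat similarities near the decision boundary: the binary step is all-or-nothing, while the ramp and sigmoid produce graded scores that can be combined with a separate cutoff.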

Original language: English
Pages (from-to): 48719-48735
Number of pages: 17
Journal: Multimedia Tools and Applications
Volume: 83
Issue number: 16
Publication status: Published - May 2024
Externally published: Yes

Keywords

  • Activation function
  • Clustering
  • Movie microgenre
  • Semantic similarity
  • Topic modeling
  • Word embedding
