Analysis and Application of Normalization Methods with Supervised Feature Weighting to Improve K-means Accuracy

Iratxe Niño-Adan*, Itziar Landa-Torres, Eva Portillo, Diana Manjarres

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

4 Citations (Scopus)

Abstract

Normalization methods are widely employed for transforming the variables or features of a given dataset. In this paper, three classical feature normalization methods, Standardization (St), Min-Max (MM) and Median Absolute Deviation (MAD), are studied on different synthetic datasets from the UCI repository. An exhaustive analysis of the transformed features' ranges and their influence on the Euclidean distance is performed, concluding that knowledge about the group structure gathered by each feature is needed to select the best normalization method for a given dataset. In order to effectively collect the features' importance and adjust their contribution, this paper proposes a two-stage methodology for normalization and supervised feature weighting based on a Pearson correlation coefficient and on a Random Forest Feature Importance estimation method. Simulations on five different datasets reveal that, in terms of accuracy, the proposed two-stage methodology outperforms or at least maintains the K-means performance obtained when only normalization is applied.
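To make the two stages concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the usual textbook definitions of the three normalization methods, and it assumes that a feature's weight is the absolute Pearson correlation between that feature and a binary class label, rescaled so the weights sum to 1. The function names, the toy data, and the particular form of the weighted Euclidean distance are illustrative assumptions. The Random Forest variant is omitted here.

```python
import numpy as np

def standardize(X):
    """Standardization (St): zero mean and unit standard deviation per feature."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(X):
    """Min-Max (MM): rescale each feature to the [0, 1] range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def mad_normalize(X):
    """MAD: centre on the median, divide by the median absolute deviation."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    return (X - med) / mad

def pearson_weights(X, y):
    """Assumed weighting scheme: |Pearson r| between each feature and the
    (binary, numerically encoded) labels, rescaled to sum to 1."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    w = np.abs(r)
    return w / w.sum()

def weighted_euclidean(a, b, w):
    """One common form of the weighted Euclidean distance."""
    return np.sqrt(np.sum(w * (a - b) ** 2))

# Toy data (hypothetical): one feature separates the two groups,
# the other is pure noise with a much larger range.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
informative = y * 5.0 + rng.normal(0.0, 1.0, size=100)
noise = rng.normal(0.0, 100.0, size=100)
X = np.column_stack([informative, noise])

# Stage 1: normalization. Stage 2: supervised feature weighting.
X_st = standardize(X)
w = pearson_weights(X_st, y)

# Scaling each feature by sqrt(w) makes the plain Euclidean distance on
# X_weighted equal to the weighted Euclidean distance on X_st, so a
# standard K-means implementation can be run on X_weighted unchanged.
X_weighted = X_st * np.sqrt(w)
```

All three normalizers assume non-constant features, since a constant feature would produce a zero denominator. The final rescaling step is a design convenience: it keeps the clustering code itself untouched and moves the weighting into the data.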

Original language: English
Title of host publication: 14th International Conference on Soft Computing Models in Industrial and Environmental Applications SOCO 2019, Proceedings
Editors: Francisco Martínez Álvarez, Alicia Troncoso Lora, Héctor Quintián, José António Sáez Muñoz, Emilio Corchado
Publisher: Springer Verlag
Pages: 14-24
Number of pages: 11
ISBN (Print): 9783030200541
Publication status: Published - 2020
Event: 14th International Conference on Soft Computing Models in Industrial and Environmental Applications, SOCO 2019 - Seville, Spain
Duration: 13 May 2019 to 15 May 2019

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 950
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365

Conference

Conference: 14th International Conference on Soft Computing Models in Industrial and Environmental Applications, SOCO 2019
Country/Territory: Spain
City: Seville
Period: 13/05/19 to 15/05/19

Funding

Acknowledgement. This work has been supported in part by the ELKARTEK program (SeNDANEU KK-2018/00032), the HAZITEK program (DATALYSE ZL-2018/00765) of the Basque Government and a TECNALIA Research and Innovation PhD Scholarship.

Funders: Eusko Jaurlaritza

    Keywords

    • K-means
    • Normalization
    • Pearson correlation
    • Random Forest
    • Standardization
    • Weighted Euclidean Distance
