Influence of statistical feature normalisation methods on K-Nearest Neighbours and K-Means in the context of industry 4.0

Iratxe Niño-Adan, Itziar Landa-Torres, Eva Portillo, Diana Manjarres

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Normalisation is a preprocessing technique widely employed in Machine Learning (ML)-based solutions for industry to equalise the features’ contribution. However, few researchers have analysed the effect of normalisation and its implications for ML algorithm performance, especially for Euclidean distance-based algorithms such as the well-known K-Nearest Neighbours and K-means. This paper formally analyses the effect of normalisation and obtains results that differ markedly from the traditional claims in the literature. In particular, it shows that normalisation does not equalise the contribution of the features, which in turn affects the performance of the learning process for a particular problem. Concretely, this is demonstrated on two Euclidean distance-based ML algorithms, K-Nearest Neighbours and K-means. The paper concludes that normalisation can be viewed as an unsupervised Feature Weighting method. In this context, a new metric (Normalisation weight) for measuring the impact of normalisation on the features is presented. Likewise, the effect of normalisation on the Euclidean distance is analysed, and a second new metric, Proportional influence, is proposed to measure the features’ influence on the Euclidean distance. Both metrics enable the automatic selection of the most appropriate normalisation method for a particular engineering problem, which can significantly improve both the computational cost and the classification performance of the K-Nearest Neighbours and K-means algorithms. The analytical conclusions are validated on well-known datasets from the UCI repository and on a real-life application from the refinery industry.
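The view of normalisation as feature weighting can be illustrated with a minimal sketch. It relies only on a standard identity: dividing feature j by a scale s_j before computing the squared Euclidean distance is the same as weighting that feature by 1/s_j^2 on the raw data. The sketch does not implement the paper's Normalisation weight or Proportional influence metrics, whose definitions are not given in the abstract. The toy data and the helper names (`scale`, `normalise`, `implied_weights`, `sq_dist`) are hypothetical and chosen only for illustration.

```python
import math
import statistics


def scale(column, method):
    """Divisor that a normalisation method applies to one feature."""
    if method == "minmax":
        return max(column) - min(column)   # range of the feature
    if method == "zscore":
        return statistics.pstdev(column)   # population standard deviation
    raise ValueError(f"unknown method: {method}")


def normalise(data, method):
    """Shift and rescale every column of `data` (a list of rows)."""
    columns = list(zip(*data))
    divisors = [scale(c, method) for c in columns]
    if method == "minmax":
        shifts = [min(c) for c in columns]
    else:
        shifts = [statistics.fmean(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, shifts, divisors)]
            for row in data]


def implied_weights(data, method):
    """Per-feature weights 1 / s_j**2 implied by the normalisation."""
    return [1.0 / scale(c, method) ** 2 for c in zip(*data)]


def sq_dist(a, b, weights=None):
    """(Weighted) squared Euclidean distance between two rows."""
    weights = weights or [1.0] * len(a)
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


# Hypothetical toy data: the second feature contains one extreme value.
data = [
    [1.0, 100.0],
    [2.0, 110.0],
    [3.0, 400.0],
    [4.0, 120.0],
]

for method in ("minmax", "zscore"):
    weights = implied_weights(data, method)
    normed = normalise(data, method)
    # Distance on normalised data equals weighted distance on raw data.
    assert math.isclose(sq_dist(normed[0], normed[1]),
                        sq_dist(data[0], data[1], weights))
    print(method, "weight ratio (feature 1 / feature 2):",
          weights[0] / weights[1])
```

On this toy data the two methods imply different relative weights for the same pair of features, so the choice of normalisation method changes which feature dominates the distance instead of making the features contribute equally.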

Original language: English
Article number: 104807
Journal: Engineering Applications of Artificial Intelligence
Volume: 111
Publication status: Published - May 2022

Keywords

  • Euclidean distance
  • Feature normalisation
  • Feature weighting
  • K-means
  • K-nearest neighbours
  • Machine learning
