TY - JOUR
T1 - Influence of statistical feature normalisation methods on K-Nearest Neighbours and K-Means in the context of industry 4.0
AU - Niño-Adan, Iratxe
AU - Landa-Torres, Itziar
AU - Portillo, Eva
AU - Manjarres, Diana
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/5
Y1 - 2022/5
N2 - Normalisation is a preprocessing technique widely employed in Machine Learning (ML)-based solutions for industry to equalise the features’ contribution. However, few researchers have analysed the normalisation effect and its implications on the ML algorithm performance, especially on Euclidean distance-based algorithms, such as the well-known K-Nearest Neighbours and K-means. In this sense, this paper formally analyses the effect of normalisation yielding results significantly far from the state-of-the-art traditional claims. In particular, this paper shows that normalisation does not equalise the contribution of the features, with the consequent impact on the performance of the learning process for a particular problem. More concretely, this demonstration is made on K-Nearest Neighbours and K-means Euclidean distance-based ML algorithms. This paper concludes that normalisation can be viewed as an unsupervised Feature Weighting method. In this context, a new metric (Normalisation weight) for measuring the impact of normalisation on the features is presented. Likewise, an analysis of the normalisation effect on the Euclidean distance is conducted and a new metric referred to as Proportional influence that measures the features’ influence on the Euclidean distance is proposed. Both metrics enable the automatic selection of the most appropriate normalisation method for a particular engineering problem, which can significantly improve both the computational cost and classification performance of K-Nearest Neighbours and K-means algorithms. The analytical conclusions are validated on well-known datasets from the UCI repository and a real-life application from the refinery industry.
AB - Normalisation is a preprocessing technique widely employed in Machine Learning (ML)-based solutions for industry to equalise the features’ contribution. However, few researchers have analysed the normalisation effect and its implications on the ML algorithm performance, especially on Euclidean distance-based algorithms, such as the well-known K-Nearest Neighbours and K-means. In this sense, this paper formally analyses the effect of normalisation yielding results significantly far from the state-of-the-art traditional claims. In particular, this paper shows that normalisation does not equalise the contribution of the features, with the consequent impact on the performance of the learning process for a particular problem. More concretely, this demonstration is made on K-Nearest Neighbours and K-means Euclidean distance-based ML algorithms. This paper concludes that normalisation can be viewed as an unsupervised Feature Weighting method. In this context, a new metric (Normalisation weight) for measuring the impact of normalisation on the features is presented. Likewise, an analysis of the normalisation effect on the Euclidean distance is conducted and a new metric referred to as Proportional influence that measures the features’ influence on the Euclidean distance is proposed. Both metrics enable the automatic selection of the most appropriate normalisation method for a particular engineering problem, which can significantly improve both the computational cost and classification performance of K-Nearest Neighbours and K-means algorithms. The analytical conclusions are validated on well-known datasets from the UCI repository and a real-life application from the refinery industry.
KW - Euclidean distance
KW - Feature normalisation
KW - Feature weighting
KW - K-means
KW - K-nearest neighbours
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85127123319&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2022.104807
DO - 10.1016/j.engappai.2022.104807
M3 - Article
AN - SCOPUS:85127123319
SN - 0952-1976
VL - 111
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 104807
ER -