TY - GEN
T1 - Scalable Data Profiling for Quality Analytics Extraction
AU - Nikolakopoulos, Anastasios
AU - Chondrogiannis, Efthymios
AU - Karanastasis, Efstathios
AU - Osa, María José López
AU - Aroca, Jordi Arjona
AU - Kefalogiannis, Michalis
AU - Apostolopoulou, Vasiliki
AU - Deligeorgi, Efstathia
AU - Siopidis, Vasileios
AU - Varvarigou, Theodora
N1 - Publisher Copyright:
© IFIP International Federation for Information Processing 2024.
PY - 2024
Y1 - 2024
N2 - In today’s modern society, data play an integral role in the development of global industry, since they have become a valuable asset for companies, institutions, governments, and others. At the same time, the data generated daily at a global scale require significant resources to pre-process, filter and store. When it comes to acquiring such stored data, it is essential to understand beforehand which dataset fits the needs of the user. One particularly important factor is the quality of a dataset, which could be determined based on a series of quality-related attributes derived from it. Such attributes constitute “Profiling”, the process of obtaining information from a data sample related to the complete dataset’s quality. However, in the era of Big Data, the ability to apply profiling techniques to complete large datasets should also be considered, in order to obtain complete quality insights. This paper attempts to provide a solution for this consideration by presenting “DaQuE”, a scalable framework for efficient profiling and quality analytics extraction in complete datasets of all volumes.
AB - In today’s modern society, data play an integral role in the development of global industry, since they have become a valuable asset for companies, institutions, governments, and others. At the same time, the data generated daily at a global scale require significant resources to pre-process, filter and store. When it comes to acquiring such stored data, it is essential to understand beforehand which dataset fits the needs of the user. One particularly important factor is the quality of a dataset, which could be determined based on a series of quality-related attributes derived from it. Such attributes constitute “Profiling”, the process of obtaining information from a data sample related to the complete dataset’s quality. However, in the era of Big Data, the ability to apply profiling techniques to complete large datasets should also be considered, in order to obtain complete quality insights. This paper attempts to provide a solution for this consideration by presenting “DaQuE”, a scalable framework for efficient profiling and quality analytics extraction in complete datasets of all volumes.
KW - Big Data
KW - Big Data analysis
KW - Data profiling
KW - Data quality
UR - http://www.scopus.com/inward/record.url?scp=85199206369&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-63227-3_12
DO - 10.1007/978-3-031-63227-3_12
M3 - Conference contribution
AN - SCOPUS:85199206369
SN - 9783031632266
T3 - IFIP Advances in Information and Communication Technology
SP - 177
EP - 189
BT - Artificial Intelligence Applications and Innovations. AIAI 2024 IFIP WG 12.5 International Workshops - MHDW 2024, 5G-PINE 2024, and AI4GD 2024, Proceedings
A2 - Maglogiannis, Ilias
A2 - Iliadis, Lazaros
A2 - Karydis, Ioannis
A2 - Papaleonidas, Antonios
A2 - Chochliouros, Ioannis
PB - Springer Science and Business Media Deutschland GmbH
T2 - 13th Mining Humanistic Data Workshop, MHDW 2024, 9th Workshop on 5G-Putting Intelligence to the Network Edge, 5G-PINE 2024 and 1st Workshop on AI in Applications for Achieving the Green Deal Targets, AI4GD 2024 held as parallel events of the IFIP WG 12.5 International Workshops on Artificial Intelligence Applications and Innovations, AIAI 2024
Y2 - 27 June 2024 through 30 June 2024
ER -