TY - GEN
T1 - Purity
T2 - 8th International Conference on Cloud and Big Data Computing, ICCBDC 2024
AU - Bonilla, Lander
AU - Osa, Maria José L.
AU - Diaz-De-Arcaya, Josu
AU - Torre-Bastida, Ana I.
AU - Almeida, Aitor
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/11/8
Y1 - 2024/11/8
AB - Data has become an asset for companies, originating from various sources, such as IoT paradigms. It is crucial to safeguard its life cycle using suitable, scalable, and effective technologies, like those enabled by cloud computing models. However, extracting value from this data requires complementary processes such as collection, refinement, cleaning, and modeling, among many others. Furthermore, organizations vary greatly in their methodologies and approaches to handling data, which further emphasizes the need for standardized techniques. In this regard, data management methodologies promote the adoption of the various dimensions of data quality to ensure the reliability of data across different systems and processes. The main contribution of this manuscript is the proposal of a new data quality dimension, coined purity, to measure the importance of the data in a processing pipeline topology. As a result, organizations can better guarantee the quality of their datasets and raise the success of their data-driven endeavors. The proposed methodology is validated in an urban mobility use case.
KW - Big Data
KW - Centrality
KW - Computing Continuum
KW - Data Quality
KW - DataOps
UR - http://www.scopus.com/inward/record.url?scp=85215702132&partnerID=8YFLogxK
U2 - 10.1145/3694860.3694862
DO - 10.1145/3694860.3694862
M3 - Conference contribution
AN - SCOPUS:85215702132
T3 - ACM International Conference Proceeding Series
SP - 8
EP - 14
BT - ICCBDC 2024 - 2024 8th International Conference on Cloud and Big Data Computing
PB - Association for Computing Machinery
Y2 - 15 August 2024 through 17 August 2024
ER -