Experimental Evaluation of Automated and Manual Data Cleaning Systems: A Case Study Using Organizational Data
DOI: https://doi.org/10.65278/IJTACI.2026.42
Keywords: Data cleaning, automation, missing values, data quality, comparative analysis
Abstract
This paper presents a comprehensive comparison of automated and manual data cleaning methods across several types of data anomalies: missing values, incorrect values, and inconsistent formats. The comparison assesses the efficiency of both approaches along the three most important dimensions of data quality. Using a Kaggle dataset of 50,000 employee records, we build a systematic evaluation framework comprising diagnostic measures, fine-grained diagnostics, and resource utilization profiles. The experiments quantitatively evaluate cleaning on a fixed distribution of anomaly types: 30% missing values (15,000 records), 25% incorrect values (12,500 records), and 45% inconsistent formats (22,500 records). Automated cleaning performed well at normalizing mixed formats (92% success) and at handling incorrect values (88%). However, manual cleaning outperformed automated methods on complex, context-dependent cases, reaching 95% accuracy on cases that require domain knowledge. The results show distinct benefits for each approach: (i) automated cleaning excels in speed and cost on large datasets, while (ii) manual cleaning is valuable for difficult cases that require domain knowledge. These findings contribute to the literature by presenting empirical evidence of the effectiveness of both approaches and by giving organizations a structured basis for choosing cleaning solutions suited to their data, needs, and organizational structure.
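As a quick check of the anomaly distribution stated in the abstract, the record counts can be reproduced from the percentages and the dataset size. This is only an illustrative sketch: the names `TOTAL_RECORDS`, `ANOMALY_SHARES`, and `anomaly_counts` are assumptions introduced here, not part of the paper's framework.

```python
# Figures taken from the abstract; the dictionary keys are illustrative labels.
TOTAL_RECORDS = 50_000
ANOMALY_SHARES = {
    "missing_values": 30,        # percent of records
    "incorrect_values": 25,
    "inconsistent_formats": 45,
}


def anomaly_counts(total, shares_percent):
    """Convert integer percentage shares into record counts."""
    return {name: total * pct // 100 for name, pct in shares_percent.items()}


counts = anomaly_counts(TOTAL_RECORDS, ANOMALY_SHARES)
for name, n in counts.items():
    print(f"{name}: {n}")
```

The three shares sum to 100%, so if each record is assigned exactly one anomaly type (an assumption, since the abstract does not say whether the categories overlap), the counts account for all 50,000 records.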
License
Copyright (c) 2026 Nagwa Elmobark, Amenah Y Abdzaid, Farqad Alaa, Aymen Saad

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright © by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


