This paper explores "on-the-fly" data cleaning in the context of a user query. A novel Query-Driven Approach (QDA) is developed that performs only the minimal number of cleaning steps necessary to answer a given selection query correctly. A comprehensive empirical evaluation of the proposed approach demonstrates its significant efficiency advantage over traditional techniques for query-driven applications. The significance of data quality research is motivated by the observation that the effectiveness of data-driven technologies such as decision support tools, data exploration, analysis, and scientific discovery tools is closely tied to the quality of the data to which such techniques are applied. It is well recognized that the outcome of an analysis is only as good as the data on which the analysis is performed. That is why organizations today spend a substantial percentage of their budgets on cleaning tasks such as removing duplicates, correcting errors, and filling in missing values, to improve data quality before pushing data through the analysis pipeline. Given the critical importance of the problem, many efforts, in both industry and academia, have explored systematic approaches to addressing the cleaning challenges.
Article Details
Unique Paper ID: 146026
Publication Volume & Issue: Volume 4, Issue 11
Page(s): 1862 - 1866