DaLiCo Glossary

data cleaning

Dirty data can lead to incorrect decisions and unreliable analysis. Common errors include missing values, typos, mixed formats, duplicate entries for the same real-world entity, and violations of business rules. Analysts must consider the effects of dirty data before making any decisions. It is often said that 80% of the time in data analysis is spent on cleaning and preparing the data.

Source: Dasu T, Johnson T (2003). Exploratory Data Mining and Data Cleaning. John Wiley & Sons.
Online: https://www.wiley.com/en-us/Exploratory+Data+Mining+and+Data+Cleaning-p-9780471268512 
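
As an illustration of the error types listed above, the following minimal Python sketch flags missing values, mixed date formats, possible duplicates, and a business-rule violation. The sample records, the field names, the function find_issues, and the 0 to 120 age rule are all assumptions made for this example; they do not come from the cited source, and typo detection is not covered.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are invented for illustration.
records = [
    {"id": 1, "name": "Alice Smith", "signup": "2021-11-24", "age": 34},
    {"id": 2, "name": "alice smith ", "signup": "24/11/2021", "age": 34},
    {"id": 3, "name": "Bob Jones", "signup": "2021-10-02", "age": None},
    {"id": 4, "name": "Carol Diaz", "signup": "2021-09-15", "age": -5},
]

# Assumed target format for dates: YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def normalize_name(name):
    """Lowercase and trim a name so near-identical spellings compare equal."""
    return name.strip().lower()


def find_issues(rows):
    """Return a list of (record id, issue label) pairs."""
    issues = []
    # Count normalized names to spot replicated entries of the same entity.
    name_counts = Counter(normalize_name(r["name"]) for r in rows)
    for r in rows:
        if r["age"] is None:
            issues.append((r["id"], "missing value"))
        elif not 0 <= r["age"] <= 120:  # assumed business rule
            issues.append((r["id"], "rule violation"))
        if not ISO_DATE.fullmatch(r["signup"]):
            issues.append((r["id"], "mixed format"))
        if name_counts[normalize_name(r["name"])] > 1:
            issues.append((r["id"], "possible duplicate"))
    return issues


issues = find_issues(records)
for record_id, label in issues:
    print(record_id, label)
```

Matching duplicates on a normalized name alone is deliberately crude; real data usually needs more fields or fuzzy matching to decide whether two records describe the same entity.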

Data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.

Source: Wu S (2013). A review on coarse warranty data and analysis. Reliability Engineering & System Safety. 114: 1–11.
Online: doi:10.1016/j.ress.2012.12.021.
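
The three corrective actions named in the definition (replacing, modifying, deleting) can be sketched in Python as follows. This is one possible reading under stated assumptions, not a method taken from the cited source: the sample records, the helper names to_iso and clean, the two accepted date formats, and the 0 to 120 age rule are all hypothetical.

```python
from datetime import datetime

# Hypothetical dirty records; field names and values are invented for illustration.
dirty = [
    {"name": "Alice Smith", "signup": "2021-11-24", "age": 34},
    {"name": "alice smith ", "signup": "24/11/2021", "age": 34},
    {"name": "Bob Jones", "signup": "2021-10-02", "age": None},
    {"name": "Carol Diaz", "signup": "2021-09-15", "age": -5},
]


def to_iso(text):
    """Modify: rewrite a date given in one of the assumed formats as YYYY-MM-DD."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # unrecognized format, left for manual review


def clean(rows):
    """Return cleaned copies of the rows; the input list is not changed."""
    cleaned, seen = [], set()
    for r in rows:
        # Modify: collapse whitespace and use consistent capitalization.
        name = " ".join(r["name"].split()).title()
        signup = to_iso(r["signup"])
        age = r["age"]
        # Replace: an age outside the assumed 0 to 120 range becomes missing.
        if age is not None and not 0 <= age <= 120:
            age = None
        # Delete: skip records that duplicate an earlier (name, signup) pair.
        key = (name, signup)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "signup": signup, "age": age})
    return cleaned


cleaned = clean(dirty)
for row in cleaned:
    print(row)
```

Invalid ages are marked as missing rather than guessed, so that later analysis can decide how to handle them.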

Date of creation: 24-Nov-2021
Accepted term: 24-Nov-2021
Descendant terms: 0
More specific terms: 0
Alternative terms: 6
Related terms: 4
Notes: 2