Dirty data can lead to incorrect decisions and unreliable analysis. Common errors include missing values, typos, mixed formats, duplicate entries for the same real-world entity, and violations of business rules. Analysts should assess the effects of dirty data before acting on any results. It is often said that 80% of the effort in data analysis is spent on cleaning and preparing the data.
Source: Dasu T, Johnson T (2003). Exploratory Data Mining and Data Cleaning. John Wiley & Sons.
Online: https://www.wiley.com/en-us/Exploratory+Data+Mining+and+Data+Cleaning-p-9780471268512
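As a rough illustration of how several of these error types might be detected, the following Python sketch counts problems in a handful of records. The records, the field names, the `profile` helper, and the 0 to 120 age rule are all hypothetical and chosen only for the example. Typos are not covered, since detecting them generally requires a reference vocabulary or fuzzy matching.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "name": "Ann Lee", "joined": "2021-03-04", "age": 34},
    {"id": 2, "name": "Bob Roy", "joined": "04/03/2021", "age": 41},  # mixed date format
    {"id": 3, "name": "", "joined": "2020-11-30", "age": 29},         # missing name
    {"id": 4, "name": "ann lee", "joined": "2021-03-04", "age": 34},  # same person as id 1
    {"id": 5, "name": "Cy Dunn", "joined": "2019-07-15", "age": -5},  # breaks the age rule
]


def is_iso_date(text):
    """Return True if text parses as YYYY-MM-DD."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def profile(rows):
    """Count how many rows show each kind of problem."""
    issues = Counter()
    seen = set()
    for row in rows:
        if not row["name"].strip():
            issues["missing_value"] += 1
        if not is_iso_date(row["joined"]):
            issues["mixed_format"] += 1
        if not 0 <= row["age"] <= 120:  # assumed business rule
            issues["rule_violation"] += 1
        # Treat rows with the same normalized name and join date as one entity.
        key = (row["name"].strip().lower(), row["joined"])
        if key in seen:
            issues["duplicate"] += 1
        seen.add(key)
    return issues


report = profile(records)
print(dict(report))
```

Matching duplicates on a normalized name plus a date is a deliberately simple choice; real entity resolution usually needs more careful comparison.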
Data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.
Source: Wu S (2013). A review on coarse warranty data and analysis. Reliability Engineering and System Safety. 114: 1–11.
Online: doi:10.1016/j.ress.2012.12.021.
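The three corrective actions named in the definition (replacing, modifying, deleting) can be sketched in Python as below. This is a minimal example under stated assumptions, not a general method: the input rows, the field names, the two accepted date formats, and the use of the mean as a replacement for a missing score are all hypothetical choices made for illustration.

```python
from datetime import datetime

# Hypothetical input; field names and values are illustrative only.
raw = [
    {"id": 1, "city": " Berlin ", "joined": "2021-03-04", "score": 7.0},
    {"id": 2, "city": "berlin", "joined": "04/03/2021", "score": None},
    {"id": 3, "city": "Paris", "joined": "not a date", "score": 9.0},
    {"id": 4, "city": "Paris", "joined": "2020-11-30", "score": 5.0},
]

# Assumed to be the only date formats present in the data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(rows):
    """Apply replace, modify, and delete steps to a list of records."""
    known = [r["score"] for r in rows if r["score"] is not None]
    # Mean of the observed scores, used to fill gaps (an assumed policy).
    fallback = sum(known) / len(known) if known else None
    cleaned = []
    for row in rows:
        joined = parse_date(row["joined"])
        if joined is None:
            continue  # delete: the date cannot be recovered
        cleaned.append({
            "id": row["id"],
            "city": row["city"].strip().title(),  # modify: trim and normalize case
            "joined": joined,                     # modify: one date format
            "score": fallback if row["score"] is None else row["score"],  # replace
        })
    return cleaned


result = clean(raw)
for row in result:
    print(row)
```

Whether an unrecoverable record should be deleted or flagged for manual review depends on the application; deletion is used here only to keep the sketch short.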