Description
Information scrubbing is a procedure that removes and/or corrects inaccurate information. Also known as "data cleansing" or "data scrubbing", the procedure is mostly used in databases to detect and correct inconsistent data, also known as "dirty data".
What is its significance?
Information scrubbing came about when businesses wanted to improve the accuracy of their data. Correspondingly, the higher the data quality, the lower the cost of correcting misinformation and the smaller the drain on profits. Data quality means that data must meet certain criteria to be accepted; for example, it should be accurate, complete, consistent, and up to date.
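As a rough illustration (the records, field names, and e-mail pattern below are invented for the example), a short Python sketch could score a handful of records against two such criteria, completeness and validity:

```python
import re

# Hypothetical customer records; None marks a missing value.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "zip": "10012"},
    {"name": "Grace Hopper", "email": None,              "zip": "02906"},
    {"name": None,           "email": "alan@example",    "zip": "SW1A"},
]

def completeness(rows):
    """Share of all fields that are actually filled in."""
    total = sum(len(r) for r in rows)
    filled = sum(1 for r in rows for v in r.values() if v not in (None, ""))
    return filled / total

def validity(rows):
    """Share of non-missing e-mails that match a simple address pattern."""
    emails = [r["email"] for r in rows if r["email"]]
    valid = sum(1 for e in emails if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", e))
    return valid / len(emails) if emails else 1.0

print(f"completeness: {completeness(records):.0%}")  # 78% of fields are filled
print(f"validity:     {validity(records):.0%}")      # 50% of e-mails look valid
```

The lower either score falls, the more it costs to track down and repair the underlying records, which is the cost relationship described above.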
How does it work?
Before computers developed the capability to do information cleansing, most of it was done by hand. Manual information scrubbing was time-consuming, expensive, and prone to error. Once computers gained the capability, companies could either create their own software or buy it from companies that specialized in information or data cleansing. Though there are different versions of data scrubbing, they follow the same general procedure: audit the data to find errors, define the cleansing rules, apply them, and verify the results.
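A minimal sketch of that procedure, using made-up customer rows rather than any vendor's actual software, might look like this in Python:

```python
from collections import Counter

# Hypothetical raw customer rows, as they might arrive from several branches.
raw = [
    {"name": " alice SMITH ", "phone": "555-0100"},
    {"name": "Alice Smith",   "phone": "5550100"},
    {"name": "Bob Jones",     "phone": ""},
]

def audit(rows):
    """Step 1: profile the data to see how dirty it is."""
    missing = Counter(k for r in rows for k, v in r.items() if not v.strip())
    print("missing values per field:", dict(missing))

def standardize(row):
    """Step 2: bring every record into one consistent format."""
    return {
        "name": " ".join(row["name"].split()).title(),
        "phone": "".join(ch for ch in row["phone"] if ch.isdigit()),
    }

def deduplicate(rows):
    """Step 3: drop records that are identical after standardization."""
    seen, unique = set(), []
    for row in rows:
        key = (row["name"], row["phone"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

audit(raw)
clean = deduplicate([standardize(r) for r in raw])
print(clean)  # two records remain: Alice Smith and Bob Jones
```

Commercial cleansing software performs the same steps at far larger scale, but the shape of the procedure is the same: profile the data first, transform it into a consistent form, remove duplicates, and verify what comes out.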
Methods used for information cleansing include parsing field values, standardizing formats, validating values against reference data, and eliminating duplicate records.
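As one hedged illustration of the duplicate-elimination method, the sketch below uses Python's standard difflib module to flag name pairs that are probably the same customer entered twice; the 0.85 cut-off is an arbitrary assumption that a real system would tune to its own data:

```python
from difflib import SequenceMatcher
from itertools import combinations

names = ["Jon Smith", "John Smith", "Jane Doe", "J. Smith"]

def similarity(a, b):
    """Ratio between 0 and 1; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

THRESHOLD = 0.85  # assumed cut-off, tuned per data set in practice
for a, b in combinations(names, 2):
    score = similarity(a, b)
    if score >= THRESHOLD:
        # e.g. flags 'Jon Smith' ~ 'John Smith'
        print(f"possible duplicate: {a!r} ~ {b!r} (score {score:.2f})")
```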
If a firm chooses to buy commercial data cleansing software, it can be expensive: prices for workflow execution software range from about $20,000 to $300,000. Many companies therefore weigh the cost of purchasing information scrubbing software against the cost of creating their own. Some choose to build their own or to skip the data cleansing process altogether because of the expense it would impose. Despite the short-term savings, most of these companies suffer long-term losses when the errors are never corrected.
How do errors occur?
Data containing errors, also known as "dirty data", can arise in a couple of ways. One is human error, such as customers entering incorrect information into a database. Another appears when information is transferred from one database to another, most often because the two databases use different formats. Branches of a company may also lack common input standards, meaning at least two branches of the same company enter data into their databases in different ways. Finally, errors can occur when old systems holding inconsistent and outdated data are never updated.
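To make the format-mismatch case concrete, the sketch below (with invented example values) shows the same date arriving from two sources in different conventions, and a loader normalizing both to one standard before the transfer:

```python
from datetime import datetime

branch_a = "04/03/2023"   # assumed US convention: month/day/year
branch_b = "2023-04-03"   # ISO convention: year-month-day

def normalize(date_string):
    """Try each known input format and return a single ISO date string."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d", "%d.%m.%Y"):
        try:
            return datetime.strptime(date_string, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {date_string!r}")

print(normalize(branch_a))  # 2023-04-03
print(normalize(branch_b))  # 2023-04-03
```

The loader still has to know which convention each source uses: a string such as "04/03/2023" means April 3 under one branch's standard and March 4 under another's, which is exactly the kind of silent inconsistency that produces dirty data.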
The combined result of all these sources of error can be significantly costly to a business. For example, take a firm that has entered some information into its database and a couple of errors occurred. Small errors in one database might seem unimportant to the firm, but if that database is merged with others the errors multiply. Now suppose there is a multitude of databases, each with small errors inside. When someone converges all of them into one large database, such as a data warehouse, all the small errors or "dirty data" accumulate in that one massive database. As a result, instead of many databases each holding a few errors, one database holds many errors, which lowers the quality of the data it represents. If people or other businesses spot inaccuracies in a database they rely on for their own business, they see the incorrect information as a potential loss of sales and market share.
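A small worked example with made-up branch tables shows how a few minor errors scattered across separate sources all end up concentrated in the merged warehouse table:

```python
# Three hypothetical branch databases, each containing one small flaw.
branch_dbs = {
    "east":  [{"id": 1, "zip": "10001"}, {"id": 2, "zip": "1001"}],   # zip missing a digit
    "north": [{"id": 3, "zip": None},    {"id": 4, "zip": "60601"}],  # zip never entered
    "west":  [{"id": 5, "zip": "94105"}, {"id": 1, "zip": "10001"}],  # id 1 repeats the east record
}

def merge(dbs):
    """Naively pour every branch table into one warehouse table."""
    return [{**row, "source": name} for name, rows in dbs.items() for row in rows]

def failing_zip(rows):
    """Rows whose zip is missing or not exactly five digits."""
    bad = []
    for row in rows:
        zip_code = (row["zip"] or "").strip()
        if not (zip_code.isdigit() and len(zip_code) == 5):
            bad.append(row)
    return bad

warehouse = merge(branch_dbs)
print(f"{len(failing_zip(warehouse))} of {len(warehouse)} merged rows fail the zip check")
# The repeated id 1 would also need resolving before the warehouse is usable.
```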
What are the Challenges and Problems using Information Cleansing?