In all the hype around Big Data, it is easy to forget that you still need to spend time cleaning data before you can use it effectively.
We just published an article in Supply Chain Digest highlighting this fact. Our article points to two interesting references that back this up: a blurb from a Freakonomics podcast and a link to a nice New York Times article on the importance of data cleaning.
We see this issue in every project we work on, from data science and machine learning problems to optimization. You need to budget time for cleaning data.
Here are the last few lines of our article:
…the need to answer questions with data won’t go away and access to new data sets won’t go away. Instead of worrying about the difficulty of getting clean data, build skills on your team so you can create clean data sets and come up with new insights faster than the competition.