Data cleansing strategy

Søren Block Olsen
We're not ready for BI because our data is a mess.

 

We hear this statement in the BI industry all the time. That's why we're outlining a strategy that shows you which systems and standards to put in place so you can finally have clean data.

The data within your data warehouse is a mess, or you don’t have a data warehouse at all. Meanwhile, decision-makers are asking questions, and you're working very hard to find the answers. We’ve got news for you: It’s time to make your job easier.

 

Stop stalling

 

It's time to get started with BI. Here are three reasons why starting a BI strategy is your fastest and best bet to finally have clean data. 

 

One: It's easier to identify what is inaccurate or missing

 

“Our data warehouse is a mess. We need to clean the data first before starting a BI project.” No, you don’t. Even the most well-oiled enterprise humming away with their analytics projects and decision loops started with an incomplete, inconsistent, or otherwise flawed data warehouse. And guess what: That same company is still cleaning their data. Because I’ve got news for you, the process of data cleansing never ends. Companies just learn how to manage it more effectively.

The best way to start the data cleansing process is by first identifying what is inaccurate or missing. And the only way to identify that clearly is through a report or analysis generated from a BI platform. That’s because data quality is visual. That’s also why data cleansing with nothing but Excel on your side is dangerous. Without first understanding the BI process, it’s impossible to know what to look for in a data cleansing project, or how to end up with data that is optimized for analysis.

For example, an analyst might look at the data warehouse and know there are 100,000 records to scrub, but they won’t know exactly how or why. But if they are given an incomplete sales report that doesn’t display revenue because the sales region was entered incorrectly or not at all, they’ll be better able to tackle those 100,000 records, knowing exactly why, what, and how.
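To make the sales report example concrete, here is a minimal Python sketch of a revenue-by-region report that also lists the records it could not place. The field names (`id`, `region`, `revenue`), the list of valid regions, and the sample records are all assumptions made up for illustration, not something taken from a specific BI platform.

```python
from collections import defaultdict

# Hypothetical list of valid sales regions (an assumption for this sketch).
VALID_REGIONS = {"North", "South", "East", "West"}


def revenue_report(records):
    """Sum revenue per valid region and collect the rows that cannot be placed."""
    totals = defaultdict(float)
    problems = []
    for row in records:
        # Treat None and whitespace-only values as a missing region.
        region = (row.get("region") or "").strip()
        if region in VALID_REGIONS:
            totals[region] += row["revenue"]
        else:
            reason = "missing region" if not region else f"unknown region: {region}"
            problems.append((row["id"], reason))
    return dict(totals), problems


# Made-up sample records standing in for rows from a sales system.
records = [
    {"id": 1, "region": "North", "revenue": 1200.0},
    {"id": 2, "region": "", "revenue": 800.0},
    {"id": 3, "region": "Nrth", "revenue": 450.0},
    {"id": 4, "region": None, "revenue": 300.0},
    {"id": 5, "region": "South", "revenue": 950.0},
]

totals, problems = revenue_report(records)
print("Revenue by region:", totals)
for record_id, reason in problems:
    print(f"Record {record_id}: {reason}")
```

The report itself shows the gap: revenue from three of the five records never reaches a region total, and the problem list says which records to fix and why.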

 


 

The problems must first be seen in order to be corrected. You can’t start cleaning if you don’t know what’s dirty. “Dirty data” is a position. “Missing region” is a concrete problem with an attainable solution. The moment that problem is revealed is the moment your company can start strengthening the building blocks of BI. Once the new process established by BI is put into place, changes can be made along the way to continuously strengthen the data warehouse.

 

Two: It helps establish business process

 

We’re afraid to invest in a big project before we have the right processes put in place.

 

Traditional BI platforms required companies to invest in the platform before any value was delivered. But today's modern BI landscape allows for a partnership of investment and return. That’s because modern, bimodal BI solutions are designed for the type of experimentation and evolution I’ve described above: in other words, “sandbox analytics.” A BI strategy that includes sandbox analytics is the easiest way to improve data quality with fast ROI.

Technology like data discovery makes it possible to play with data outside the data warehouse, determine its value, and then bring that data up to standards if it's found worthwhile. This allows analysts to gradually increase the quality of data along the way as data proves to be useful to the organization.

This evolution-style strategy delivers short-term ROI compared to what would typically be a massive data cleansing project. By thinking big and long term, companies can start small by investing in a BI tool that allows them to play with data inside a BI platform for a fraction of the cost of a full-scale solution.

In this way, dirty data can be identified within days, not weeks or months. Once the data that is most valuable to the company has been identified for cleansing, that process can begin and guidelines and standards can be put in place across the organization to ensure data is input correctly going forward.
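As a rough illustration of what such guidelines and standards can look like once they are written down, here is a small Python sketch that checks a record at the point of entry. The required fields, the allowed regions, and the rule about negative revenue are hypothetical examples, not rules from any particular system.

```python
# Hypothetical data-entry standard (assumptions for this sketch).
REQUIRED_FIELDS = ("customer", "region", "revenue")
ALLOWED_REGIONS = {"North", "South", "East", "West"}


def validate_entry(entry):
    """Return a list of rule violations; an empty list means the entry passes."""
    errors = []
    # Rule 1: every required field must be filled in.
    for field in REQUIRED_FIELDS:
        if entry.get(field) in (None, ""):
            errors.append(f"{field} is required")
    # Rule 2: a region that is filled in must come from the agreed list.
    region = entry.get("region")
    if region not in (None, "") and region not in ALLOWED_REGIONS:
        errors.append(f"region must be one of {sorted(ALLOWED_REGIONS)}")
    # Rule 3: revenue must not be negative.
    revenue = entry.get("revenue")
    if isinstance(revenue, (int, float)) and revenue < 0:
        errors.append("revenue must not be negative")
    return errors


# A made-up entry with a missing region.
print(validate_entry({"customer": "Acme", "region": "", "revenue": 100.0}))
```

Rules like these would typically grow out of the problems the first reports reveal, so the standard stays focused on the data the company actually uses.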

Once the company has seen the value of even a small amount of BI, it’s far more likely that further investment will be planned. As the BI strategy matures, an increasing number of tools and capabilities can be added to the BI environment to spread analytics to every role throughout the entire organization.

 

Three: It doesn't require a data warehouse

 

We don’t even have a data warehouse.

 

Good news! You don’t need one! Advances in in-memory technologies and the BI platforms that support them make it possible to fully utilize the power of a BI solution without investing in a data warehouse.

In addition to fast ROI, there are other significant benefits to launching BI without the confines of a data warehouse. Work process speeds are significantly increased, as users can load data in seconds. In-memory is also generally more user-friendly than a typical data warehouse setup. It doesn’t take significant training to acquire the skills needed to work directly in the system.

Most importantly, in-memory systems allow for near real-time experimentation with data from a variety of sources. That can be anything from Excel spreadsheets living on a desktop to big data repositories such as Hadoop. This flexibility also allows for easy scaling. Enterprises looking for the standardization of a data warehouse setup can scale up their BI when the time is right.

 

Time to begin the BI journey

 

Every day that goes by in which decisions are made based on hypotheses or assumptions is a lost day of making better decisions.

We have decades of experience in BI implementation, and we've never come across a case in which a company wasn’t surprised by the insight that was unveiled on the first day they gained real access to their data, regardless of whether or not they had access to everything they needed just yet. That means up until that point, their decisions were flawed.


Ready to become a data-driven organization?

Originally published March 15, 2020. Updated March 26, 2020

Written by

Søren Block Olsen

Director of Marketing & Sales Operations