Improve data quality with sandbox analytics

November 24, 2015

Experimenting with new data in smaller groups is the key to a successful bimodal BI strategy and high data quality.
To err is human. That applies to love, history, politics … and data management. When integrating data in an ERP system or other back-end data source, there are myriad ways data can be purposefully or accidentally manipulated, left to go out of date, or stored in incongruous formats. As a result, any report or analysis delivered by a company’s business intelligence solution may not reflect the realities of the company, its customers, or the industry in which it’s operating.

That’s why data quality is mission critical for a company’s business intelligence strategy. An analysis is only as good as the data behind it. When implementing a BI strategy, data must be accurate, authentic, and trustworthy. Standards of data quality may vary across verticals, but in every one of them the integrity of the information has to hold up under scrutiny.

Read about the steps to create a business intelligence strategy designed for your company: the Action Loop.

In recent years BI has become increasingly operational, playing an important role in daily tactical and strategic decision making. Decision makers can’t act unless they know they’re using reliable information to make those decisions.

Do more with bimodal BI 

Today, businesses are outgrowing the silos that had previously been the status quo within the company. In years past, it was easy enough for employees of particular departments to wrap their heads around the data that was relevant to their role. Now that information is shifting externally, it’s touching connections that have previously been undiscovered in daily business life.

As new and strengthening trends, such as big data, take over the BI environment, companies must learn how to handle the unstructured, semi-structured, incomplete, and massive amounts of data that are suddenly at their fingertips. A deliberate, strong, and strategic method for dealing with data security and quality is more important than ever.

As I mentioned in my last post, a bimodal BI solution tackles both the necessity of agile, user-friendly analytics and reliability and security of data. A bimodal BI strategy should not only facilitate traditional business operation—the classic data warehouse and continuous decision loops—but also discovery and innovation. Bimodal BI is both centralized (company-wide initiatives) and decentralized (change, innovation, and exploration). A proper business intelligence strategy is designed with flexibility and mass distribution in mind.

Sandbox analytics

But not all data—no matter how potentially useful—should immediately be shared company- or even department-wide until the right experimentation, finessing, and cleansing for quality has been performed.

That’s why I suggest making “sandbox analytics” a part of your BI practice. In other words, set up small, isolated groups to produce, experiment with, and share data before considering wider adoption. These groups should be cross-departmental to help better shape how this data would affect the greater company.

For example, the Marketing department will need to share information with the Sales department. Start with a smaller group from both departments when incorporating new and experimental data. With the right BI tool, there’s no need to compromise on who gets what information and when. Either/or is a thing of the past; it’s about embracing the “and.” Marketing and Sales. Sales and Finance.

The ability to play with big data sets and analyze them on top of what’s already in the data warehouse encourages employees to think strategically without the need to pull in IT, a must when employing a bimodal BI strategy. And a proper data governance model enables powerful sandbox analytics.

The data cleansing strategy

But what about the quality of data that’s being experimented with? As new data is pulled into the BI system to be analyzed, it likely won’t mesh seamlessly with current ERP system data without first going through data cleansing.

However, not all data needs to immediately be up to industry or company standards, especially during the period in which users are determining if it’s worthwhile to fully fold into the BI strategy. The process of cleansing data can be a massive undertaking, and should be reserved for data that’s guaranteed to make a positive impact in the business intelligence strategy. 

Some business intelligence consumers are not comfortable with data that’s not entirely standardized, and they should be left out of these sandbox security models. When the “sand castle” looks good, that is the time to bring the data quality up to 100 percent and display the results to the rest of the teams.

The trick is to gradually increase the quality of data along the way as the data proves to be useful to the organization. This strategy delivers short-term ROI, compared with what would typically be a massive cleansing project for all new data.
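One way to make "gradually increase the quality" measurable is to track a simple score on the sandbox data and only share it more widely once the score reaches the target. The Python sketch below is an illustration under stated assumptions, not a feature of any particular BI tool: the field names, the sample rows, and the `completeness` and `ready_for_wider_release` helpers are all hypothetical, and completeness is only one of several possible quality measures.

```python
def completeness(rows, required_fields):
    """Share of required field values that are present (not None or empty)."""
    total = len(rows) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for row in rows
        for field in required_fields
        if row.get(field) not in (None, "")
    )
    return filled / total


def ready_for_wider_release(rows, required_fields, threshold=1.0):
    """True once the sandbox data meets the quality target.

    The default of 1.0 mirrors the "100 percent" goal in the text;
    a team could choose a lower, hypothetical threshold for early sharing.
    """
    return completeness(rows, required_fields) >= threshold


# Hypothetical sandbox data set with two missing values.
sandbox_rows = [
    {"customer": "Acme", "region": "EMEA", "revenue": 1200},
    {"customer": "Globex", "region": "", "revenue": 800},
    {"customer": "Initech", "region": "APAC", "revenue": None},
    {"customer": "Umbrella", "region": "AMER", "revenue": 450},
]
required = ["customer", "region", "revenue"]

score = completeness(sandbox_rows, required)
print(f"completeness: {score:.0%}")
print("ready to share:", ready_for_wider_release(sandbox_rows, required))
```

Rerunning a check like this as the sandbox group cleans up records gives a simple signal for when the data is ready to leave the sandbox.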

It is, however, a necessary project. Without quality, standardized data, it’s impossible to create reports and analyses that can be trusted or that show the whole truth of a business scenario. Customer data, for example, doesn’t work if there are different versions of that customer within an ERP system.
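To make the duplicate-customer problem concrete, here is a minimal Python sketch that groups customer records whose names differ only in case, punctuation, or a legal suffix. The records, the suffix list, and the `normalize_name` and `find_duplicate_customers` helpers are assumptions made up for this example; real ERP data would usually call for matching on more fields than the name.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_name(name):
    """Lowercase the name, strip punctuation, and drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def find_duplicate_customers(records):
    """Map each normalized name to the record ids that share it,
    keeping only names that appear more than once."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_name(record["name"])].append(record["id"])
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


# Hypothetical customer records, as they might appear after an import.
customers = [
    {"id": 1, "name": "Acme, Inc."},
    {"id": 2, "name": "ACME Inc"},
    {"id": 3, "name": "Globex Corp."},
    {"id": 4, "name": "acme"},
]

duplicates = find_duplicate_customers(customers)
print(duplicates)  # {'acme': [1, 2, 4]}
```

A report that counts customers or totals revenue per customer would treat those three Acme records as separate companies until they are merged.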

I commonly see BI users who are frustrated with their inability to best use data to their advantage. They need to understand the process of connecting, cleaning, and correlating the data to actually use it. Once they fix or establish this process, they can focus on the KPIs knowing they can trust the data.

Get a little dirty playing in the sand, then clean up what matters most to see the full potential of your business intelligence strategy.

Next step: Learn how to incorporate the most important KPIs into your company's data strategy in the Metrics That Matter.

Further bimodal learning: Download the free on-demand webinar, The modern company's new BI strategy.