What is dark data?

Mikkel Oldenburg

With today's powerful data discovery tools, companies have more options than ever before with regard to the complexity, amount, and origin of the data they analyze. Indeed, it seems that many companies just can't get enough.

There are two options companies have when searching for more data to analyze:

  1. Expand data discovery into the world of external data
  2. Search for insights that are hidden within the existing data

The data that already exists in your company but is not currently being tapped for greater insight and analysis is known as dark data.

Many are excited by the idea of big data. It’s exciting! It’s sexy! But here’s why you should instead first focus on bolstering your current BI strategy with the dark data that exists within your company.
 
For the record, TARGIT Decision Suite supports integrating and combining data from your data warehouse with external data from almost any data source imaginable. If you want external data, you got it. But first we want to be sure you’re placing enough value on the useful data your company is producing.
 

Not another buzzword


Let’s say you’re a manufacturing company that is humming along with your business processes. You have a solid BI and Analytics strategy set in place that delivers the daily reports and analyses you need to measure, monitor, and make decisions. But then something happens. A quality issue has suddenly put you at risk of losing a valuable customer. Your existing analysis flags the problem: One of your production lines is less effective than the others. But your analysis doesn’t give you the information you need to figure out why.

It’s easier to figure out how to harness the unused data you already have than to dive into the world of big data, 99.9% of which is irrelevant to your business in the first place. That’s not to say harnessing internal data doesn’t involve a little bit of trial and error and experimentation. That’s the BI lifecycle. In particular, the practice of sandbox analytics within the BI lifecycle is designed to help bring dark data into the light.
 

Bimodal BI

 

With sandbox analytics, a small group of BI users experiment with potentially useful data. If that data does, in fact, prove valuable, only then is it distributed for greater use throughout the organization. The ability to play with big data sets and analyze them on top of what’s already in the data warehouse encourages employees to think strategically without the need to pull in IT. This is bimodal BI.

Let’s go back to the example of that manufacturing company that doesn’t have insight into its quality issue. There are a number of possible reasons for a sudden dip in the quality of the product reaching the customer. Perhaps it’s a worker satisfaction issue that is causing an increase in mistakes. Could it be a shift time issue? Would this have to do with the hours of the shift itself or the manager of that shift? Can it be narrowed down by employee? What about the supply chain?

With these hunches in mind, it’s time to dig into existing data to see what, if anything, supports these hypotheses. Let’s go with the inkling that a drop in quality might have to do with an issue regarding shifts. The current dashboard that displays employee shift data only includes the hours per week that each employee has worked, and doesn’t tell you which shift those hours correspond to. The company’s HR system tracks this, but that type of data isn’t currently set up in an existing data model for analysis. Not to worry.

 

Data discovery

 

A comprehensive data discovery tool allows users to pull in data that isn't already available for analysis and mash it up with the data already being used in current reports and analyses. Mashing up the times each employee on this particular production line clocks in and out each day with the data already used to analyze shifts across the company shows that these employees are alternating between day and evening shifts much more frequently than employees on other teams.
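To make the mash-up concrete, here is a minimal sketch in Python of the kind of comparison described above. It is not how TARGIT Decision Suite works internally. The record layout, the employee and line names, the sample timestamps, and the 14:00 cutoff between day and evening shifts are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical HR clock-in records: (employee, production line, clock-in time).
# Names, dates, and times are invented purely for illustration.
CLOCK_INS = [
    ("ana", "line_1", "2016-08-01 06:00"),
    ("ana", "line_1", "2016-08-02 06:00"),
    ("ana", "line_1", "2016-08-03 06:00"),
    ("ben", "line_2", "2016-08-01 06:00"),
    ("ben", "line_2", "2016-08-02 14:00"),
    ("ben", "line_2", "2016-08-03 06:00"),
    ("cia", "line_2", "2016-08-01 14:00"),
    ("cia", "line_2", "2016-08-02 06:00"),
    ("cia", "line_2", "2016-08-03 14:00"),
]


def shift_label(ts, evening_starts_at=14):
    """Classify a clock-in as a day or evening shift (assumed 14:00 cutoff)."""
    return "evening" if ts.hour >= evening_starts_at else "day"


def switch_rate_by_line(records):
    """Return, per production line, the share of consecutive workdays
    on which an employee changed shift type."""
    per_employee = defaultdict(list)
    for employee, line, stamp in records:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        per_employee[(line, employee)].append(ts)

    switches = defaultdict(int)
    transitions = defaultdict(int)
    for (line, _employee), stamps in per_employee.items():
        labels = [shift_label(ts) for ts in sorted(stamps)]
        # Compare each workday's shift with the previous workday's shift.
        for prev, cur in zip(labels, labels[1:]):
            transitions[line] += 1
            switches[line] += prev != cur
    return {line: switches[line] / transitions[line] for line in transitions}


rates = switch_rate_by_line(CLOCK_INS)
for line, rate in sorted(rates.items()):
    print(f"{line}: {rate:.0%} of consecutive workdays involve a shift switch")
```

A line whose switch rate is far above the others is the kind of signal worth examining alongside the existing quality figures before the data is promoted into a shared data model.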

This type of shift switching seems not only to affect productivity but also to cause general fatigue and employee dissatisfaction, which is likely the root cause of this dip in overall effectiveness. Frequent shift switching evidently correlates with lower employee satisfaction and lower production quality. This previously dark data can be moved into an existing data model now or at a later time if so desired.

Having identified the problem, the company can now fix it and, of course, monitor the progress with BI. From here, a scorecard is created to monitor hours, shifts, and product quality across all teams. Now that a root cause is in focus, new best practices can be applied across all production teams to improve even those that are already high-performing.

Uncovering a little bit of dark data in a relatively short period of time – days, as opposed to the weeks or months it would realistically take a business analyst to create a potentially valuable analysis with new, external data – adds tremendous value to the BI project as a whole.

Shining light on the right dark data can elucidate more than you might originally think. So next time you need to go digging for more data, first consider flipping on your flashlight.

Originally published August 15, 2016. Updated March 27, 2020

Written by

Mikkel Oldenburg

Business Development Executive