The Rise of Dark Data and How It Can Be Harnessed

Dark data isn’t a small portion of big data; it’s the biggest and fastest-growing part. It holds massive potential for those who can harness it successfully.

By Stephen Mackey, Kefron.

For any business, data is vital. It holds the key to attracting new customers, increasing growth, and boosting profits. That’s why big data is big business.

It’s also why there has been a huge rise in the importance of ‘dark data’ in recent years. Dark data isn’t just a small portion of big data. It’s the biggest slice of the pie, and holds a massive amount of potential for those who can harness it successfully.


How do you classify ‘dark data’?

In almost every instance, dark data can be defined as all the information that a business generates, collects, stores, and then never uses again. It’s data that is:

  • Ever-present
  • Unknown
  • Unmanaged

If data is dormant and inactive, it can be classified as dark.

Crucially, it’s data that is forgotten about, and that’s where the danger lies. Any files that go unmonitored and unsecured pose a threat to the business. Data must be kept for compliance reasons, but this often results in sensitive information being stored in file locations that no-one knows about.

When both the location and the contents of files are unknown, they can be easily mined by hackers and used against a company. This can result in legal action, expensive payouts, and lost business.

Dark data can be a security risk and a hindrance to operations when companies are unaware that the data even exists, let alone where to find it. If a business is asked for information but cannot find it quickly and effectively, its reputation can suffer.

But as well as being a problem, dark data is also an opportunity. By its nature, it holds undiscovered potential, unknown insights, and unanalysed value.

What types of data could be dark?

According to a recent IBM study, over 80% of all data is dark and unstructured. IBM estimates that this will rise to 93% by 2020, giving the example that cars will be generating 350 MB of data every second, all of which will need to go somewhere.

Dark data is different for each industry and individual company, but common examples include:

  • Spreadsheets (in one study, a business with 1,500 employees had 2.5 million spreadsheets, amounting to billions of cells of data)
  • Multiple old versions of documents
  • Email attachments and .zip files that are downloaded and then ignored
  • Inactive databases and unused customer information
  • Former employees’ files and content (e.g. project notes)
  • Analytics reports and survey data
  • Log files, account information and transaction history

Ultimately, it’s data that’s left behind from processes, scattered across every level of a business. It’s disregarded and considered unnecessary by one department, but may be highly valuable to another.

How to use dark data

There are three key steps to getting the most from dark data: finding, reviewing and determining value. Finding it is arguably the most difficult part, and businesses will need to employ multiple methods and resources, such as:

  • Getting administrative access to everything, including all servers, hard drives and any other storage facilities used
  • Searching for all file types and folders
  • Categorising by type and identifying ownership
  • Reporting on both usage and purpose
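
As a rough illustration of the searching, categorising and reporting steps above, the sketch below walks a folder tree, records each file's type, size and owner, and counts how many files of each type have sat untouched for a long time. It is a minimal sketch under stated assumptions, not a prescribed tool: the names `scan_tree`, `summarise` and `FileRecord` are invented for this example, the 365-day dormancy threshold is arbitrary, and "last modified" is only a proxy for "last used".

```python
import os
import time
from collections import defaultdict
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class FileRecord:
    path: str
    extension: str    # lower-cased, e.g. ".xlsx"; "" when the file has none
    size_bytes: int
    owner_id: int     # numeric owner reported by the OS (0 on Windows)
    idle_days: float  # days since the file was last modified


def scan_tree(root, now=None):
    """Walk `root` and return one FileRecord per readable file."""
    now = time.time() if now is None else now
    records = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file: skip rather than abort
            records.append(FileRecord(
                path=path,
                extension=os.path.splitext(name)[1].lower(),
                size_bytes=info.st_size,
                owner_id=info.st_uid,
                idle_days=(now - info.st_mtime) / SECONDS_PER_DAY,
            ))
    return records


def summarise(records, dormant_after_days=365):
    """Group records by extension and count files idle past the threshold.

    The 365-day default is an assumption for illustration only.
    """
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "dormant": 0})
    for record in records:
        row = summary[record.extension]
        row["files"] += 1
        row["bytes"] += record.size_bytes
        if record.idle_days > dormant_after_days:
            row["dormant"] += 1
    return dict(summary)
```

Calling `summarise(scan_tree("/srv/shared"))` (the path is hypothetical) would return a dictionary keyed by file extension, giving a first view of where dormant data is concentrated. Reviewing the contents and judging their value still needs people who understand the business context.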

From here, companies will be able to start determining the value of this dark data.

Why companies have been reluctant to harness it

For many businesses, at first glance the obstacles to finding and harnessing dark data can seem too big to overcome. The concerns most often cited are legal issues, workflow disruption and architectural costs.

As with any change, there is a fear that getting access to dark data will interfere with normal business processes and antagonise employees with new ways of doing things. Of course, disruption can be kept to an absolute minimum when the work is planned and carried out correctly.

Structural changes are required, and this costs time and money. But it’s worthwhile. The key is in the planning and organisation of how to move gigabytes of data from multiple locations into one integrated system, where it’s easy to access and not forgotten about again.

With this comes the need for context. There will be lots of data with a variety of origins, and it’s important to invest in employees who can understand and utilise the information in front of them, bringing further intelligence and insight to the data.

The value in data

It’s worth thinking about dark data via another definition: unfulfilled value. It’s information that can provide knowledge, which in turn can be used to generate profit.

By using newer business intelligence and IT tools, companies can join structured and unstructured data sets together to provide high-value results. When the work is done correctly, the benefits will easily outweigh the costs involved with mining dark data.