What is the importance of dark data in the big data world?
We will discuss what opportunities dark data holds for an organization.
Dark data is a subset of big data, but it constitutes the largest portion of the total volume of big data that organizations collect in a year. For various reasons, companies usually do not analyse or process dark data, but that does not lessen its business value. There are two ways to view the importance of dark data. One view is that unanalysed data contains important, undiscovered insights and represents a lost opportunity. The other view is that unanalysed data, if not handled well, can cause many problems, such as legal and security problems.
What is dark data?
Organizations gather huge volumes of data which, they believe, will help improve their products and services. For example, a company may collect data on how customers use its products, internal statistics about its software development processes, and website visits. However, a large portion of the collected data is never even analysed. According to IDC, 90% of unstructured data is never analysed. Such data is known as dark data. According to Gartner, dark data is “the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes.” Although the categories of dark data vary across companies, the following categories of unstructured data are usually considered dark data:
- Customer Information
- Log Files
- Previous Employee Information
- Raw Survey Data
- Financial Statements
- Email Correspondence
- Account Information
- Notes or Presentations
- Old Versions of Relevant Documents
Why is dark data handled the way it is?
This is surprising, because at the time of collection, companies assume that the data will provide value. Companies invest heavily in data collection, so, both monetarily and otherwise, the data should be considered important. Here are a few reasons why so much data remains dark.
Lack of priority
Take the example of a bank analysing online credit card applications. The credit card marketing team focuses solely on customer details and eligibility and pays no attention to the data on how customers arrived at the application page. That unattended data could have provided valuable insights into the usability of the bank's website and application page, but no one assigned it any priority.
Disconnect among departments
In large organizations, departments often have their own data collection and storage processes, which other departments may not know about. As a result, data lies unused even when it is relevant to other departments. This is clearly a process issue.
Technology and tool constraints
If different technologies and tools collect data within the same organization, technological constraints may prevent them from interacting with each other. This makes it hard to bring all the data together into a cohesive picture. It is especially common in companies that run different IT systems and data formats. For example, it may be difficult to integrate the contents of call center audio files with click data from websites. Companies at the early stages of a data analytics program often face these problems.
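To make the integration problem concrete, here is a minimal Python sketch that combines records from two separate systems, a call center and a website, into one view per customer. It assumes, hypothetically, that both systems already share a `customer_id` field; in practice, establishing such a shared key across different IT systems and formats is often the hard part. The record layouts and the `combine_by_customer` helper are illustrative only.

```python
from collections import defaultdict

# Hypothetical records from two separate systems. In practice the call
# center data might come from transcribed audio and the clicks from web
# server logs; both are ASSUMED here to carry a shared customer_id.
call_center = [
    {"customer_id": "C1", "call_topic": "card declined"},
    {"customer_id": "C2", "call_topic": "fee query"},
]
web_clicks = [
    {"customer_id": "C1", "page": "/apply"},
    {"customer_id": "C1", "page": "/faq"},
    {"customer_id": "C3", "page": "/home"},
]


def combine_by_customer(calls, clicks):
    """Group both sources under each customer_id so they can be analysed together.

    Customers present in only one source are kept, not dropped.
    """
    combined = defaultdict(lambda: {"calls": [], "pages": []})
    for call in calls:
        combined[call["customer_id"]]["calls"].append(call["call_topic"])
    for click in clicks:
        combined[click["customer_id"]]["pages"].append(click["page"])
    return dict(combined)


profile = combine_by_customer(call_center, web_clicks)
print(profile["C1"])  # {'calls': ['card declined'], 'pages': ['/apply', '/faq']}
```

The function amounts to a full outer join: customers who appear in only one system (`C2` and `C3` here) are kept with an empty list for the other source, so the gaps between systems stay visible instead of silently disappearing.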
Importance of dark data
As stated earlier, there are two ways to view the importance of dark data. Both perspectives are examined below.
Perspective of untapped opportunity
The area shown in black in the image below indicates dark data. The image illustrates the notional percentage of dark data that is present at any time.
Dark data represents a huge opportunity for companies to gain valuable insights which can drive their business. Take a look at the following examples:
- Server log files can reveal website visitor behavior.
- Customer call recordings can reveal customer sentiment.
- Mobile geo-location data can reveal traffic patterns.
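As an illustration of the first example, the following Python sketch extracts visitor behavior from server log files. It assumes the logs use the Common Log Format that many web servers can write; the sample lines, IP addresses and helper names (`page_views`, `visitor_paths`) are hypothetical. It counts successful page views per path and reconstructs the sequence of pages each visitor requested.

```python
import re
from collections import Counter, defaultdict

# Matches the Common Log Format:
# host ident authuser [timestamp] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def page_views(lines):
    """Count successful (2xx) requests per path, skipping unparseable lines."""
    views = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status").startswith("2"):
            views[match.group("path")] += 1
    return views


def visitor_paths(lines):
    """Return the paths each host requested, in log order.

    A host (IP address) is used as a rough stand-in for a visitor.
    """
    paths = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            paths[match.group("host")].append(match.group("path"))
    return dict(paths)


# Hypothetical sample log lines.
sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /apply HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:56:02 +0000] "GET /faq HTTP/1.1" 200 1043',
    '198.51.100.7 - - [10/Oct/2023:14:01:11 +0000] "GET /apply HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:14:01:15 +0000] "GET /missing HTTP/1.1" 404 0',
]

print(page_views(sample_log).most_common())  # [('/apply', 2), ('/faq', 1)]
print(visitor_paths(sample_log)["203.0.113.5"])  # ['/apply', '/faq']
```

Treating an IP address as a visitor is a simplification: shared or changing addresses mean a real analysis would need a better notion of a session, for example one based on cookies or time gaps between requests.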
Companies that do not tap into dark data are letting these opportunities go. At the same time, to use dark data appropriately, they need better processes, coordination and technologies.
Perspective of problems dark data can cause
Dark data can cause legal, financial and other problems if it is not handled well. In fact, companies with piles of accumulated dark data are already running into such problems. Companies could face the following issues with dark data:
Legal and regulatory issues
If the stored data falls under legal regulations, as credit card data does, its exposure could leave companies facing financial and legal liabilities.
Loss of proprietary information
Through deliberate or inadvertent disclosure, companies could lose proprietary or sensitive data on their business operations, products, financial status and business plans. This could seriously harm the business.
Loss of reputation
Companies are viewed as custodians of the data they collect, so any loss of data, especially sensitive and confidential data, can damage their reputation.
Opportunity cost
If a company decides not to invest in analysing and processing its dark data but its competitors do, the competitors are likely to pull ahead by using the insights that dark data provides. That is the price the company pays for missed opportunities.
Better ways to handle dark data
Whether you view dark data as an opportunity or as a reflection of problems, you cannot deny its importance. The ideal way to handle dark data is to use it well, but that may not be easy given the investment needed. Still, companies need to make a start. Some dark data may become redundant as it sits unused over time, and it is unlikely that all of it will be valuable. So you should neither throw out all of your dark data nor treat all of it as a goldmine. Here are some ways to get the best out of dark data.
- Regularly audit and prune the database. This means structuring old data or assigning it to categories so that you know what kind of data is stored and where. Pruning does not have to mean deleting: with storage becoming inexpensive, there is little need to discard data wholesale. If you suddenly need the data later, you can find it quickly because it is well organized.
- Apply strong encryption to the data, both on in-house servers and in cloud storage. Encryption can prevent many security issues.
- Have data retention and safe disposal policies in place. The policies should align with the standards set by the U.S. Department of Defense. Carefully formulate policies that identify which data should be erased or destroyed. Good retention policies will also help you keep valuable data for later use.
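The audit and retention practices above can be sketched together in Python: first categorize stored files so you know what is stored where, then apply per-category retention periods to decide what to keep. Everything here is a hypothetical stand-in: the extension-to-category taxonomy, the retention periods, the file listing and the helper names. Real values would come from legal, regulatory and business requirements. The sketch only makes the keep-or-dispose decision; secure erasure itself is out of scope. Categories with no defined policy are flagged for review rather than disposed of by default.

```python
from collections import defaultdict
from datetime import date, timedelta
from pathlib import PurePosixPath

# Hypothetical taxonomy: file extension -> data category.
CATEGORY_BY_EXTENSION = {
    ".log": "log files",
    ".csv": "raw survey data",
    ".eml": "email correspondence",
    ".pptx": "notes or presentations",
}

# Hypothetical retention periods in days per category.
RETENTION_DAYS = {
    "log files": 365,
    "raw survey data": 3 * 365,
    "email correspondence": 5 * 365,
}


def categorize(path):
    """Assign a category from the file extension, or 'uncategorized'."""
    return CATEGORY_BY_EXTENSION.get(PurePosixPath(path).suffix.lower(), "uncategorized")


def build_inventory(files):
    """Audit step: group file paths by category so you know what is stored where."""
    inventory = defaultdict(list)
    for path, _created in files:
        inventory[categorize(path)].append(path)
    return dict(inventory)


def retention_decision(category, created, today):
    """Retention step: 'retain' within the retention period, 'dispose' after it,
    and 'review' when no policy covers the category yet."""
    days = RETENTION_DAYS.get(category)
    if days is None:
        return "review"  # flag for a human decision instead of guessing
    return "retain" if today - created <= timedelta(days=days) else "dispose"


# Hypothetical listing of (path, creation date) pairs; in practice it might
# come from walking a file share or listing a storage bucket.
files = [
    ("web/server-2022.log", date(2022, 6, 1)),
    ("web/server-2023.log", date(2023, 6, 1)),
    ("marketing/survey.csv", date(2023, 1, 1)),
    ("hr/notes.txt", date(2020, 1, 1)),
]
today = date(2024, 1, 1)

inventory = build_inventory(files)
decisions = {path: retention_decision(categorize(path), created, today) for path, created in files}

print(decisions)
# {'web/server-2022.log': 'dispose', 'web/server-2023.log': 'retain', 'marketing/survey.csv': 'retain', 'hr/notes.txt': 'review'}
```

Returning "review" for unknown categories reflects the advice to neither toss out all dark data nor keep everything blindly: data that no policy covers yet goes to a person rather than being deleted automatically.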
Dark data certainly represents unused opportunities that many companies are letting go because of process, investment and technology constraints. In a sense, failing to use dark data also makes big data collection, itself a major undertaking, a partial failure. Although tapping the potential of dark data may require costly investment, the effort is worth it. And even if companies choose to simply sit on dark data and do nothing, they are in fact exposing themselves to the risks described earlier. The key is to do something about dark data rather than treat it as dead, useless data.