Dark data is data that is acquired through various computer network operations but is not used to derive insights or to support decision making.[1][2] An organisation's ability to collect data can exceed the rate at which it can analyse that data. In some cases the organisation may not even be aware that the data is being collected.[3] IBM estimates that roughly 90 percent of the data generated by sensors and analog-to-digital conversions is never used.[4]

Organizations retain dark data for a multitude of reasons, and it is estimated that most companies analyze only 1% of their data.[6] Often it is stored for regulatory compliance[7] and record keeping.[1] Some organizations believe that dark data could be useful to them in the future, once they have acquired better analytic and business intelligence technology to process the information.[3] Because storage is inexpensive, retaining data is easy. However, storing and securing the data usually entails greater expense (or even risk) than the potential return.[1]

In academic discourse, the term dark data was essentially coined by Bryan P. Heidorn. He uses it to describe research data, especially from the long tail of science (the many small research projects), that are not, or are no longer, available for research because they disappear in a drawer without adequate data management.[8] Without such management the data become dark; further contributing factors include missing metadata annotation, missing data management plans, and a lack of data curators.[9]

