Structured data is relatively easy to manage. Confidential or unstructured information, however, such as IT logs, emails, video, and other untagged, non-inventoried data, can turn into a monster for CIOs. Imagine what would happen if archives subject to prescribed data retention policies were haphazardly dumped into the data lake and forgotten. A client recently told us that a Fortune 500 company intends to do exactly that.
According to Isaac Sacolick, dark data is unstructured information:
"data that is kept 'just in case' but hasn't (so far) found a proper usage."
These massive amounts of untapped, largely unprotected data simply sit in your data lake, contributing little to the bottom line. Yet even unused and neglected, dark data can harm your organization if it falls into the wrong hands or slips outside its owner's control.
Dark Data Has High Potential Value, But Poses a Significant Threat
For organizations willing to allocate the resources needed to extract and use the information locked inside dark data, that potential value is genuinely attractive. These organizations must also understand that the same information can pose serious risk to the health and well-being of the business. Dark data stored with a cloud provider, whether in an active production system, in a test environment, or abandoned through disuse or non-payment, may be accessible to third parties without your knowledge or consent.
Given the type of data collected by most organizations, those risks can include:
- Legal risk
- Intelligence risk
- Reputation risk
- Opportunity costs
- Open-ended exposure
Reduce the Risk in Your Data Lake
What can organizations do to mitigate the risk associated with dark data? Consider implementing these strategies and technologies:
- Ongoing inventory and periodic evaluation
- Strong encryption and access controls
- Data retention policies and proper disposal
- Routine audits to manage risk of exposure
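As a rough illustration of the first and third strategies, the Python sketch below walks a directory tree standing in for a data lake, inventories every file, and flags anything older than a retention period. It is a minimal sketch under stated assumptions, not a production tool: the `inventory` function, the `InventoryEntry` record, and the 365-day retention period are hypothetical, and file modification time is used as a stand-in for data age, which real retention policies may define differently.

```python
import os
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


@dataclass
class InventoryEntry:
    """One inventoried file (hypothetical record layout)."""
    path: Path
    size_bytes: int
    age_days: float
    past_retention: bool


def inventory(root: str, retention_days: int,
              now: Optional[float] = None) -> List[InventoryEntry]:
    """Record size and age of every file under `root`.

    Assumption: modification time (st_mtime) approximates data age.
    Files are only flagged, never deleted, so disposal can be reviewed.
    """
    now = time.time() if now is None else now
    entries = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories; only files hold data
        stat = path.stat()
        age_days = (now - stat.st_mtime) / SECONDS_PER_DAY
        entries.append(InventoryEntry(path, stat.st_size, age_days,
                                      age_days > retention_days))
    return entries


if __name__ == "__main__":
    # Demo on a throwaway directory: one stale log, one fresh one.
    with tempfile.TemporaryDirectory() as lake:
        now = time.time()
        old = Path(lake) / "old_app.log"
        old.write_text("2019 request log")
        stale = now - 400 * SECONDS_PER_DAY
        os.utime(old, (stale, stale))  # backdate access and modification times
        (Path(lake) / "new_app.log").write_text("today's request log")

        for entry in inventory(lake, retention_days=365, now=now):
            status = "PAST RETENTION" if entry.past_retention else "ok"
            print(f"{entry.path.name}: {entry.age_days:.0f} days, {status}")
```

The sketch deliberately reports rather than deletes: flagged files can then go through whatever review the organization's retention policy requires before proper disposal, and rerunning it on a schedule gives the ongoing inventory and periodic evaluation the list calls for.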
With a proper understanding of both the potential value and the risks, organizations can manage dark data. By frequently evaluating their data and removing whatever carries more risk than reward, CIOs can proactively prepare their organizations for future productivity and profitability.