E-Discovery, the Internet of Things and the Big Data Landscape
By James Carpenter, CTO and CISO, Texas Scottish Rite Hospital for Children
Big Data – There are many definitions of Big Data and numerous articles concerning big data and E-Discovery. I won’t rehash those; rather, I’ll briefly define Big Data and summarize key considerations.
For the purposes of this article, Big Data is “Very large amounts of raw data transformed into business relevant and actionable information through data management techniques and analysis.” As it pertains to an effective E-Discovery program, organizations will need to understand their Big Data processes in the context of the various E-Discovery phases and litigation demands.
Identification and Preservation – Consider how your organization produces and uses big data. Consider, too, how it will preserve that data so it can be represented in the state it was in at the time of the litigation hold, and in the same manner in which it was presented and interacted with.
Collection and Production – Consider how your organization is going to collect this data in light of how it will produce the data. Remember that some of the data is going to be processed, reviewed, and filtered. Excluding some pieces of data may, by the nature of the data linkages, also remove the supporting elements that provided your business insight. Keep these linkages intact to preserve the original knowledge-design intent.
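One way to make the linkage concern concrete is to check, before production, whether filtering out some records would leave a retained record pointing at something that was excluded. The sketch below is only an illustration: the `id` and `depends_on` fields and the record names are hypothetical, and real systems will track linkages in their own way.

```python
def broken_linkages(records, excluded_ids):
    """Return {record_id: [excluded dependency ids]} for every retained
    record whose supporting records would be removed by the exclusion."""
    excluded = set(excluded_ids)
    broken = {}
    for rec in records:
        if rec["id"] in excluded:
            continue  # this record is itself being withheld
        missing = [dep for dep in rec.get("depends_on", []) if dep in excluded]
        if missing:
            broken[rec["id"]] = missing
    return broken


# Hypothetical data set: a report built from two underlying feeds.
records = [
    {"id": "sales_feed", "depends_on": []},
    {"id": "sensor_feed", "depends_on": []},
    {"id": "quarterly_report", "depends_on": ["sales_feed", "sensor_feed"]},
]

# Excluding the sensor feed would strip support from the report.
print(broken_linkages(records, excluded_ids=["sensor_feed"]))
```

A non-empty result is a prompt to revisit the filter, or at least to document why the supporting data was left out.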
Review and Analysis – Consider how your organization is going to review and filter big data. Big data is an ‘ecosystem of data’ that converges through data management into business relevant and actionable information. It’s that very end product that is likely contemplated in litigation. Your organization will need to consider the parts and pieces of data that were blended together to produce that result. Expect much discussion about process.
Internet of Things (IOT) – Roughly defined, IOT is the connection of “everything” to the Internet: your thermostat, your refrigerator, your car, your phone, and the little dongle on your key chain that keeps track of your keys for you. IOT is increasing both the amount and the ingress rate of Big Data. All of these connected devices are feeding data somewhere, even if only to assist the manufacturer with preventative maintenance and usage statistics. These IOT activity logs can be combined with other data to form meaningful business data. The logs, and the refined data derived from them, can also become subject to E-Discovery if the devices fall within the scope of litigation.
One of the key concerns is the accessibility of source data as it pertains to IOT. Is there a log file on a connected thermostat? How does one extract such data? Can these connected devices send their logs, via syslog or something similar, to a central location where the data can be normalized? Probably not – at least for now. Does your organization own the connected device? Maybe it’s an employee’s personal device.
Another concern is the representation of data from IOT devices for E-Discovery. Again we get back to the value of data that is created through linkages and associations. On its own, the raw data from IOT devices may be of only nominal value. The value comes from the ecosystem of other data that an organization combines with it.
Key Takeaways: Understand your organization’s big data processes by documenting the data flows and how they combine to form business valuable data. In many cases this will be an organization’s secret sauce – handle it with care. Knowing and documenting this will help you understand what is in and out of scope in litigation. It will also help you negotiate terms during the early phases of data discovery.
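Documented data flows can be as simple as a map from each data product to the sources that feed it. The following is a minimal sketch under that assumption; the dataset names are invented for illustration. Walking the map backwards from the product at issue gives a first cut of what may be in scope.

```python
def upstream_sources(flows, product):
    """Walk the documented data flows backwards and return every dataset
    that feeds, directly or indirectly, into `product`."""
    seen = set()
    stack = [product]
    while stack:
        current = stack.pop()
        for source in flows.get(current, []):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# Hypothetical documentation: product -> the datasets it is built from.
flows = {
    "energy_dashboard": ["normalized_iot_logs", "billing_records"],
    "normalized_iot_logs": ["thermostat_logs"],
    "marketing_report": ["crm_export"],
}

in_scope = upstream_sources(flows, "energy_dashboard")
print(sorted(in_scope))
# ['billing_records', 'normalized_iot_logs', 'thermostat_logs']
```

Anything that never appears in the result, such as the CRM export here, is a candidate for being argued out of scope.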
Clean up your organization’s E-Discovery house now – Before litigation surrounding Big Data occurs, shore up data management practices; ensure documented separation of production and non-production systems tied to decision making, and establish clearly defined systems of record. Policies and procedures related to document retention and classification should be put in place. Getting the basics in place will create a strong foundation for even the most complex E-Discovery case.
Consider centralized log collection facilities for IOT devices. Normalize the data before passing it to downstream data management processes. This will help segment E-Discovery scope.
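As a rough illustration of normalization at a central collector, the sketch below maps two differently shaped device payloads onto one common record layout. Every field name, device type, and payload shape here is an assumption made for the example, not a description of any real device or product.

```python
from datetime import datetime, timezone


def normalize_thermostat(raw):
    """Hypothetical thermostat payload: epoch seconds and Fahrenheit."""
    return {
        "device_id": raw["serial"],
        "device_type": "thermostat",
        "timestamp": datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat(),
        "event": "temperature_reading",
        "value": round((raw["temp_f"] - 32) * 5 / 9, 1),  # stored as Celsius
    }


def normalize_badge_reader(raw):
    """Hypothetical badge reader payload: timestamp already an ISO string."""
    return {
        "device_id": raw["reader"],
        "device_type": "badge_reader",
        "timestamp": raw["time"],
        "event": "badge_swipe",
        "value": raw["badge"],
    }


NORMALIZERS = {
    "thermostat": normalize_thermostat,
    "badge_reader": normalize_badge_reader,
}


def normalize(device_type, raw):
    """Convert a raw payload to the common layout, keeping the original."""
    record = NORMALIZERS[device_type](raw)
    record["raw"] = raw  # retain the source payload next to the normalized fields
    return record


print(normalize("thermostat", {"serial": "T-100", "ts": 0, "temp_f": 212}))
```

Keeping the raw payload next to the normalized fields is a deliberate choice in this sketch: the normalized form is what is searched and segmented, while the original is still available if the source data itself is requested.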
With the geometric increase in E-Discovery volume and the growing diversity of data types, plan on using automated analysis for discovery. Humans will be no match for future volumes of data. Products now exist that perform automated review. This saves time but, more importantly, shifts the role of humans to performing validation checks that ensure the algorithms identifying relevant data are accurate.
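One common form of validation check is to have reviewers label a sample of documents by hand and compare their calls with the algorithm's. The sketch below computes precision and recall on such a sample; the document ids and labels are made up, and the choice of these two measures is an assumption rather than something a particular review product prescribes.

```python
def precision_recall(algorithm_labels, human_labels):
    """Compare automated relevance calls with human calls on the same sample.
    Both arguments map document id -> True (relevant) or False."""
    tp = fp = fn = 0
    for doc_id, human_says in human_labels.items():
        algo_says = algorithm_labels[doc_id]
        if algo_says and human_says:
            tp += 1  # both agree the document is relevant
        elif algo_says and not human_says:
            fp += 1  # algorithm flagged an irrelevant document
        elif human_says and not algo_says:
            fn += 1  # algorithm missed a relevant document
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical validation sample of five documents.
algorithm = {"d1": True, "d2": True, "d3": False, "d4": False, "d5": True}
human = {"d1": True, "d2": False, "d3": True, "d4": False, "d5": True}

precision, recall = precision_recall(algorithm, human)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low recall means relevant documents are being missed; low precision means reviewers and opposing parties are being handed noise. Either is a reason to revisit the algorithm before relying on it.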
Review your backup technology. If you’re using legacy backup solutions, such as disk-to-tape with tapes shipped offsite, you may be in for a big surprise cost during litigation. Restoring data from tapes to search for documents during discovery is very costly. Modern backup solutions include indexes and cloud-based storage, which provide E-Discovery search and data retention capability across your backups. In addition to faster backups and easier searching, organizations can lower costs by eliminating offsite physical storage.
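To show why an index matters, here is a toy inverted index built at backup time, under the assumption that the backup system can see file text as it captures it. The catalog entries, backup ids, and paths are hypothetical, and commercial products do far more than this. The point is only that a search can name the backup and file to retrieve without restoring everything first.

```python
from collections import defaultdict


def build_index(catalog):
    """catalog: iterable of (backup_id, path, text) captured at backup time.
    Returns a mapping of term -> set of (backup_id, path)."""
    index = defaultdict(set)
    for backup_id, path, text in catalog:
        for term in set(text.lower().split()):
            index[term].add((backup_id, path))
    return index


def search(index, *terms):
    """Return the (backup_id, path) pairs whose text contains all terms."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits) if hits else set()


# Hypothetical backup catalog.
catalog = [
    ("2015-01-full", "/hr/policy.txt", "Retention policy for thermostat logs"),
    ("2015-02-full", "/ops/notes.txt", "Thermostat firmware upgrade notes"),
    ("2015-02-full", "/fin/q4.txt", "Quarterly billing summary"),
]

index = build_index(catalog)
print(search(index, "thermostat", "logs"))
# {('2015-01-full', '/hr/policy.txt')}
```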