2012 is the Year of the Dragon, which is fitting, because no other Chinese zodiac sign better represents the promise, challenge, and evolution of predictive coding technology. The few who have embraced predictive coding exemplify the Dragon’s symbolic traits, such as being unafraid of challenges and willing to take risks. In the legal profession, taking risks typically isn’t in a lawyer’s DNA. That might explain why predictive coding technology has seen lackluster adoption among lawyers despite the hype. This blog explores the promise of predictive coding technology, why it has not been widely adopted in eDiscovery, and why 2012 is likely to be remembered as the year of predictive coding.
What is predictive coding?
Predictive coding refers to machine learning technology that automatically predicts how documents should be classified based on limited human input. In litigation, predictive coding technology can be used to rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege.” This lets organizations reduce the time and money spent on traditional page-by-page attorney document review during discovery.
Generally, the technology works by ranking documents so that the most important ones are prioritized for review. This ranking helps attorneys find important documents faster. In certain situations, it can also eliminate the need to review the lowest-ranked documents at all. And since computers don’t get tired or daydream, many believe they can predict document relevance more accurately than their human counterparts.
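The ranking idea is easier to see in a small example. The following minimal Python sketch uses a tiny multinomial naive Bayes classifier as a stand-in for whatever model a given product actually uses, which the post does not specify. The names `tokenize`, `train`, `relevance_score`, and `rank` are hypothetical. Attorneys code a small seed set. The model then scores every unreviewed document, and the list is sorted so the likeliest-relevant documents come first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; deliberately simple for illustration."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(seed_set):
    """Fit a multinomial naive Bayes model to (text, is_relevant) pairs
    that attorneys have already coded (a hypothetical 'seed set')."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in seed_set:
        word_counts[is_relevant].update(tokenize(text))
        doc_counts[is_relevant] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_score(model, text):
    """Log-odds that a document is relevant; higher means more likely relevant."""
    word_counts, doc_counts, vocab = model
    # Class priors with add-one smoothing (the shared denominator cancels out).
    score = math.log(doc_counts[True] + 1) - math.log(doc_counts[False] + 1)
    denom_rel = sum(word_counts[True].values()) + len(vocab)
    denom_non = sum(word_counts[False].values()) + len(vocab)
    for tok in tokenize(text):
        if tok not in vocab:
            continue  # words never seen in the seed set carry no evidence
        score += math.log((word_counts[True][tok] + 1) / denom_rel)
        score -= math.log((word_counts[False][tok] + 1) / denom_non)
    return score


def rank(model, documents):
    """Return unreviewed documents sorted from most to least likely relevant."""
    return sorted(documents, key=lambda d: relevance_score(model, d), reverse=True)


seed_set = [
    ("merger pricing discussion with the board", True),
    ("pricing terms for the merger", True),
    ("lunch order for friday", False),
    ("office holiday party schedule", False),
]
model = train(seed_set)
ranked = rank(model, ["friday party lunch", "quarterly pricing memo", "board approved merger pricing"])
for doc in ranked:
    print(f"{relevance_score(model, doc):6.2f}  {doc}")
```

Documents at the bottom of such a ranking are the candidates that might not need review. Where to draw that line is a legal judgment, and it would normally be checked by sampling rather than taken on faith.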
Why hasn’t predictive coding gone mainstream yet?
Given the promise of faster and less expensive document review, combined with higher accuracy rates, many are perplexed as to why predictive coding technology hasn’t been widely adopted in eDiscovery. The answer boils down to one simple concept: a lack of transparency.
Difficult to Use
First, early predictive coding tools attempt to apply a complicated new technological approach to a document review process that has traditionally been very simple. Traditionally, attorneys read each and every document to determine relevance. Today’s predictive coding technology instead typically depends on review decisions that one or more experienced senior attorneys enter into a computer. The process commonly involves a complex series of steps: sampling, testing, reviewing, and measuring results. These steps fine-tune an algorithm that will eventually be used to predict the relevance of the remaining documents.
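One measurable piece of that loop can be sketched in Python. The sketch assumes three things. First, attorneys have coded a random control sample. Second, each control document already has a model score. Third, the team has agreed on a target recall, meaning the share of relevant documents the cutoff must capture. The function `pick_cutoff` is hypothetical. It returns the highest score cutoff that still meets the target, so the threshold can be explained with numbers rather than simply asserted.

```python
def pick_cutoff(scored_controls, target_recall):
    """Return the highest score cutoff that meets target_recall on a control set.

    scored_controls: (model_score, attorney_coded_relevant) pairs for control
    documents attorneys have reviewed. Documents scoring >= the returned
    cutoff would be treated as predicted relevant.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    total_relevant = sum(1 for _, relevant in scored_controls if relevant)
    if total_relevant == 0:
        raise ValueError("control set contains no relevant documents")
    found = 0
    # Lowering the cutoff one document at a time can only raise recall.
    for score, relevant in sorted(scored_controls, key=lambda p: p[0], reverse=True):
        if relevant:
            found += 1
            if found / total_relevant >= target_recall:
                return score
    raise AssertionError("unreachable: recall is 1.0 after the last relevant document")


controls = [(0.91, True), (0.85, False), (0.72, True), (0.40, True), (0.15, False)]
print(pick_cutoff(controls, 0.60))  # -> 0.72 (2 of 3 relevant control documents captured)
print(pick_cutoff(controls, 1.00))  # -> 0.4 (all 3 captured)
```

Each additional training round would produce new scores, and the same calculation can be re-run to show whether the model has improved.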
The problem with early predictive coding technologies is that most of these complex steps take place in a ‘black box’. In other words, the methodology and results are not always clear. This increases the risk of human error and makes the integrity of the electronic discovery process difficult to defend. For example, the methodology for selecting a statistically valid sample is not always intuitive to the end user, and improper sampling can taint the accuracy of the entire process. Similarly, the process must often be repeated several times to improve accuracy rates. Even if accuracy improves, it may be difficult or impossible to explain how accuracy thresholds were determined. It may be equally hard to explain why coding decisions were applied to some documents and not others.
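Part of making sample selection explainable is making it reproducible. The following is a minimal Python sketch that assumes document IDs are plain strings and that the seed is recorded in the case file. Neither is a feature of any particular tool. The hypothetical helper `draw_sample` draws a simple random sample without replacement. Anyone holding the seed, including opposing counsel or a court, could re-run the draw and get the same documents.

```python
import random


def draw_sample(doc_ids, sample_size, seed):
    """Draw a simple random sample of document IDs without replacement.

    Recording `seed` lets anyone repeat the draw and get the same sample
    (on the same Python version; the random module does not promise
    identical sample() results across versions).
    """
    ids = sorted(set(doc_ids))  # sort so the input order cannot change the result
    if not 0 < sample_size <= len(ids):
        raise ValueError("sample_size must be between 1 and the number of documents")
    rng = random.Random(seed)
    return rng.sample(ids, sample_size)


population = [f"DOC-{n:05d}" for n in range(1, 10_001)]
sample = draw_sample(population, 370, seed=2012)
print(len(sample), sample[:3])
```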
Early predictive coding tools also tend to lack transparency in the way the technology evaluates the language contained in each document. Rather than evaluating both the text and the metadata fields of a document, some technologies ignore metadata entirely. Suppose a client sends a privileged email to her attorney, Larry Lawyer, and the name “Larry Lawyer” appears only in the “recipient” metadata field and not in the document text. The computer might overlook that email. The obvious risk is privilege waiver if the email is inadvertently produced to the opposing party.
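The Larry Lawyer example can be made concrete in code. In this minimal Python sketch, the field names (`sender`, `recipient`, `cc`, `subject`), the attorney list, and both helper functions are hypothetical. `features` folds metadata into the tokens a model sees and prefixes each metadata token with its field name. `flag_possible_privilege` checks the text, and optionally the metadata, for known attorney names.

```python
import re

METADATA_FIELDS = ("sender", "recipient", "cc", "subject")  # hypothetical field names
ATTORNEY_NAMES = {"larry lawyer"}  # hypothetical list of counsel names


def words(value):
    """Lowercase word tokens from a string."""
    return re.findall(r"[a-z0-9']+", value.lower())


def features(doc):
    """Tokens from the body text plus field-prefixed metadata tokens."""
    tokens = words(doc.get("text", ""))
    for field in METADATA_FIELDS:
        tokens += [f"{field}:{w}" for w in words(doc.get(field, ""))]
    return tokens


def flag_possible_privilege(doc, include_metadata=True):
    """True if a known attorney name appears in the text (and, optionally, metadata)."""
    fields = ["text"] + (list(METADATA_FIELDS) if include_metadata else [])
    return any(name in doc.get(f, "").lower() for f in fields for name in ATTORNEY_NAMES)


email = {
    "text": "Can you look over the draft settlement before Friday?",
    "sender": "Client Contact",
    "recipient": "Larry Lawyer",
}
print(flag_possible_privilege(email, include_metadata=False))  # False: name is not in the body
print(flag_possible_privilege(email))                          # True: found in the recipient field
print([t for t in features(email) if t.startswith("recipient:")])
```

Prefixing metadata tokens keeps “Larry” in the recipient field distinct from “Larry” mentioned in passing in the body. A model can then weigh the two differently.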
Another practical concern is that some technologies do not allow reviewers to distinguish between relevant and non-relevant language within an individual document. For example, early predictive coding technologies are not intelligent enough to know that only the second paragraph on page 95 of a 100-page document contains relevant language. Because the computer cannot tell which language made the document relevant, its results may be skewed when it tries to identify other documents with the same characteristics. It is also more likely to retrieve an over-inclusive set containing many irrelevant documents. In information-retrieval terms, this is a problem of low precision (the share of retrieved documents that are actually relevant) rather than of recall (the share of relevant documents that are found). It matters because every irrelevant document retrieved still requires manual review, which directly increases eDiscovery cost.
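Precision and recall have standard definitions in information retrieval. Precision is the share of retrieved documents that are actually relevant. Recall is the share of relevant documents that were retrieved. The short Python sketch below uses made-up document IDs to show why an over-inclusive result hurts cost: recall can be perfect while precision is poor.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for sets of document IDs."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2, 3, 4}             # documents that truly matter (made-up IDs)
over_inclusive = set(range(1, 21))  # model pulls back 20 documents
tighter = {1, 2, 3, 5}              # model pulls back 4 documents

print(precision_recall(over_inclusive, relevant))  # (0.2, 1.0): 16 of 20 reviewed docs are irrelevant
print(precision_recall(tighter, relevant))         # (0.75, 0.75)
```

Neither number is enough on its own. The tighter result means far less review, but it misses document 4. That is why both measures belong in the process.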
Waiver & Defensibility
Perhaps the biggest concerns with early predictive coding technology are the risk of waiver and questions about defensibility. Notably, there have been no known judicial decisions that specifically address the defensibility of these new tools. This is so even though some members of the judiciary, including U.S. Magistrate Judge Andrew Peck, have opined that this kind of technology should be used in certain cases.
The problem is that today’s predictive coding tools are difficult to use, complicated for the average attorney, and opaque in how they work. All of these limitations increase the risk of human error. Human error, in turn, increases the risk of overlooking important documents or unwittingly producing privileged ones. It is also difficult to defend a technological process that isn’t always clear, in an era when many lawyers are still uncomfortable with keyword searches. In short, using black box technology that is difficult to use and understand is perceived as risky. Many attorneys have taken a wait-and-see approach because they are unwilling to be the guinea pig.
Why is 2012 likely to be the year of predictive coding?
Transparency may seem like a vague term, but it is the critical element missing from today’s predictive coding technology offerings. 2012 is likely to be the year of predictive coding because improvements in transparency will, for the first time, shine a light into the black box of predictive coding technology. In simple terms, increased transparency will simplify the user experience and improve accuracy. That, in turn, will reduce longstanding concerns about defensibility and privilege waiver.
Ease of Use
First, transparent predictive coding technology will help minimize the risk of human error by pairing a complicated solution with an intuitive user interface. New interfaces will include easy-to-use workflow management consoles. These consoles guide the reviewer step by step through selecting, reviewing, and testing data samples in a way that minimizes guesswork and confusion. Automating the sampling and testing process reduces the risk of human error. This in turn decreases the risk of waiver or discovery sanctions that could result from improperly coded documents. Similarly, automated reporting capabilities make it easier for producing parties to evaluate and understand how key decisions were made throughout the process. That makes it easier for them to defend the reasonableness of their approach.
Intuitive reports also help the producing party measure and evaluate confidence levels throughout the testing process until appropriate confidence levels are achieved. A confidence level, together with its margin of error, can be expressed as a percentage. Attorneys and judges can therefore negotiate and debate the desired level of confidence for a production set rather than relying exclusively on the representations or decisions of a single party. This added transparency supports the kind of cooperation between parties called for in The Sedona Conference Cooperation Proclamation. It also gives judges an objective tool for evaluating each party’s behavior.
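The sample size needed for a stated confidence level comes from a standard statistics calculation, which is part of why it can be negotiated openly. This minimal Python sketch uses the textbook formula for estimating a proportion, such as the share of relevant documents in a set. It assumes the worst-case proportion of 0.5 and applies an optional finite-population correction. `sample_size` is a hypothetical helper, not a feature of any product.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin_of_error, population=None, p=0.5):
    """Documents to sample to estimate a proportion within +/- margin_of_error.

    confidence: e.g. 0.95 for a 95% confidence level.
    p: expected proportion; 0.5 is the most conservative (largest sample).
    population: if given, apply the finite-population correction.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    n0 = z * z * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.95, 0.05))          # 385
print(sample_size(0.95, 0.05, 10_000))  # 370
print(sample_size(0.99, 0.02, 10_000))
```

Raising the confidence level or tightening the margin of error increases the sample, and so the review cost. This is exactly the trade-off the parties would be negotiating.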
Accuracy & Efficiency
2012 is also likely to be the year of transparent predictive coding technology because new tools will address the technical limitations that have hurt the accuracy and efficiency of earlier ones. For example, new technology will analyze both document text and metadata to avoid overlooking responsive or privileged documents. Similarly, smart tagging features will let reviewers highlight the specific language that makes a document relevant or non-relevant. Coding predictions will then be more accurate, and fewer non-relevant documents will be retrieved for review.
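A smart tagging feature like the one described could be approximated as follows. This minimal Python sketch assumes the reviewer’s highlights arrive as character offsets, and `training_tokens` is a hypothetical helper. When a relevant document has highlights, only the highlighted language is used as training evidence. Boilerplate elsewhere in a long document then does not teach the model to retrieve similar boilerplate.

```python
import re


def training_tokens(doc_text, highlights):
    """Tokens to train on for a document the reviewer coded relevant.

    highlights: list of (start, end) character offsets marking the language
    that made the document relevant. With no highlights, fall back to the
    whole document, which is what a tool without smart tagging would do.
    """
    spans = [doc_text[start:end] for start, end in highlights] or [doc_text]
    tokens = []
    for span in spans:
        tokens.extend(re.findall(r"[a-z0-9']+", span.lower()))
    return tokens


page_filler = "Reminder about parking permits and the cafeteria menu. " * 3
key_passage = "The board approved the merger price on March 1."
doc = page_filler + key_passage

start = doc.index(key_passage)
highlighted = training_tokens(doc, [(start, start + len(key_passage))])
whole_doc = training_tokens(doc, [])

print("parking" in whole_doc, "parking" in highlighted)  # True False
print(highlighted)
```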
Conclusion: Transparency Provides Defensibility
The bottom line is that predictive coding technology has not enjoyed widespread adoption in eDiscovery because of concerns about ease of use and accuracy, which breed larger concerns about defensibility. Defending black box technology that is difficult to use and understand is a risk many attorneys simply are not willing to take. That reluctance has deterred widespread adoption of early predictive coding tools. In 2012, next-generation transparent predictive coding technology will usher in a new era of computer-assisted document review that is easier to use, more accurate, and easier to defend. Given these exciting technological advancements, I predict that 2012 will not only be the Year of the Dragon, it will also be the year of predictive coding.