The Truth About Predictive Coding: Getting Beyond the Hype


If you’re reading this blog post, chances are you are an e-discovery practitioner, the keywords “predictive coding” caught your eye, you’re cautious about computer-assisted review (CAR), and curious about the constant hype that hangs over this topic.  You’re not alone!  One of the most highly acclaimed sessions at CEIC® 2015 (now Enfuse®) was the session called “The Truth About Predictive Coding: Getting Beyond the Hype.” 

Beyond the hype, this lecture laid out some practical reasons why predictive coding is not catching on faster and is still used in only a minority of cases. It went on to debunk some of the myths about CAR and to show how you can leverage the power of analytics and predictive coding today. 

Part of the session’s appeal lay in the lecturers themselves: litigation heavy hitters David Cohen, a partner with Reed Smith LLP and the firm’s Practice Group Leader, and Bryon Bratcher, Reed Smith’s Director of Litigation Technology Services. Their participation reflects the Enfuse conference’s ability to attract highly acclaimed presenters who are industry leaders in digital investigations.

If you missed this popular lecture, you can read a brief summary of it in this blog and download the complete slide presentation here: The Truth About Predictive Coding: Getting Beyond the Hype.  We’d also like to remind you to register early for Enfuse 2016, where you can hear sessions on similar topics to help you get the most from your EnCase® eDiscovery solution, reduce costs and risks, and streamline your e-discovery process. 

Predictive Coding and the Myths That Undermine It

Predictive coding is an industry-specific term for a CAR process, also known as technology-assisted review (TAR), that uses machine-learning algorithms and statistical probability tools to replicate human decision-making.  After a human reviewer codes a set of training documents, the software identifies the properties associated with those coding decisions and uses them to predict the relevance of the remaining documents. 
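
To make the train-then-predict idea concrete, here is a minimal sketch in Python. It is not how any particular review platform works: it swaps in a simple naive Bayes text classifier as a stand-in for whatever algorithm a commercial tool uses, and the seed documents, labels, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(coded_docs):
    """Tally word counts per label from (text, label) pairs coded by a human reviewer."""
    word_counts = {"relevant": Counter(), "not_relevant": Counter()}
    doc_counts = Counter()
    for text, label in coded_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts

def relevance_probability(text, word_counts, doc_counts):
    """Estimate P(relevant | text) with naive Bayes and add-one smoothing.

    Assumes the training set contained at least one document of each label.
    """
    vocab = set(word_counts["relevant"]) | set(word_counts["not_relevant"])
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn the two log scores into a probability for the "relevant" label.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["relevant"] / sum(weights.values())

# Hypothetical seed set coded by a human reviewer.
seed_set = [
    ("pricing agreement with distributor attached", "relevant"),
    ("revised pricing terms for the distributor contract", "relevant"),
    ("lunch menu for the office party", "not_relevant"),
    ("parking garage closed on friday", "not_relevant"),
]
word_counts, doc_counts = train(seed_set)

# Score documents that no human has reviewed yet.
unreviewed = ["draft distributor pricing schedule", "office party parking reminder"]
scores = [relevance_probability(doc, word_counts, doc_counts) for doc in unreviewed]
```

With this toy seed set, the first unreviewed document scores well above 0.5 and the second well below it. In a real matter the training set would be far larger, and the scores would typically be used to rank or prioritize documents for review rather than to replace it.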

The team at EDRM.net has prepared a Computer-Assisted Review Reference Model (CARRM) that documents the steps of the process.


Interest in predictive coding, as evidenced by the high attendance at this session in the Enfuse track “E-Discovery: Legal Issues, Technical Challenges and Solutions,” is on the rise because sometimes the volume of documents and/or value of a case make human review impractical.  According to Cohen and Bratcher, predictive coding can bring:

  • Cost savings
  • Time savings
  • Reduced risk of errors
  • Greater objectivity in classifications

If the value of CAR is being recognized and interest is increasing, why is human review still much more prevalent than computer review? Cohen and Bratcher set out to debunk the top three myths they believe are holding back predictive coding.

Myth #1: Computer review will never be as accurate as human review.

In 2012, Judge Andrew Peck issued his opinion in Da Silva Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y. 2012), noting:

“…while some lawyers still consider manual review to be the ‘gold standard,’ that is a myth, as statistics clearly show that computerized searches are at least as accurate, if not more so, than manual review.”

The presenters discussed some of the studies cited by Judge Peck, including one in which 28,209 documents were reviewed by seven different reviewer groups and which revealed inconsistency in 57 percent of the human-reviewed results. They noted, however, that there is a shortage of studies comparing well-designed, well-supervised human review to predictive coding in realistic litigation situations. 
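
The summary above does not say how the study defined inconsistency. As a rough illustration only, the sketch below assumes one plausible definition: a document counts as inconsistent when the reviewer groups did not all make the same relevance call. The document IDs and calls are hypothetical, not data from the study.

```python
def inconsistency_rate(calls_by_doc):
    """Return the share of documents on which the reviewer groups did not all agree."""
    disagreements = sum(1 for calls in calls_by_doc.values() if len(set(calls)) > 1)
    return disagreements / len(calls_by_doc)

# Hypothetical relevance calls (True = relevant) from three reviewer groups.
calls_by_doc = {
    "DOC-001": [True, True, True],
    "DOC-002": [True, False, True],
    "DOC-003": [False, False, False],
    "DOC-004": [False, True, True],
}
rate = inconsistency_rate(calls_by_doc)  # 2 of 4 documents have split calls
```

Under this definition, adding more reviewer groups can only keep the rate the same or raise it, which is one reason to read such figures alongside the study design.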

Myth #2: Computers will replace all attorney review.

Cohen and Bratcher made clear that there is no such thing as computer-assisted review that does not rely on at least some human review.  Moreover, a number of barriers to using predictive coding ensure that CAR will not entirely supplant attorney review anytime soon.

Those barriers include:

  • Limited, if any, cost savings in cases with fewer than 20,000 documents requiring review.
  • The frequent desire or necessity for human review of the production set, both for privilege screening and to know what is being produced.
  • The risk of spending more fighting with the other side about predictive coding than the predictive coding could save.
  • The time and expertise required to train the predictive coding engine (for optimal results, training often falls to partners or other busy, high-rate members of the trial team).
  • The problem of multiple cases: if the same documents need to be analyzed for multiple cases or jurisdictions and even one judge does not permit CAR or equivalent techniques, all of the documents will require human review anyway.
  • Unsympathetic judges or discovery masters who do not understand and/or are not willing to approve new and imperfect methods of identifying relevant documents.
  • The danger of losing the narrowing and cost savings of keyword filtering where opposing parties and/or judges insist that predictive coding be applied to the starting universe of documents in lieu of any keywords.

Myth #3: We can’t use predictive coding software because our opponents won’t agree to it.

Magistrate Judge Andrew Peck revisited his landmark decision in Da Silva Moore three years later in Rio Tinto PLC v. Vale S.A., 14 Civ. 3042 (RMB) (AJP) (S.D.N.Y. March 2, 2015), stating:

“The case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.”

While there is not yet much case law on predictive coding, Cohen and Bratcher noted that most of what exists has been fairly positive.  Specifically, in searching case law for “predictive w/2 coding,” they found about 34 cases:

  • 12 positive references, in commentary or tone
  • 18 neutral references, often judicial approval of proposed ESI protocols
  • Four that used the term in a non-ESI context
  • No cases that disapproved the use of predictive coding

Cohen and Bratcher Provide More Ammunition to Use Predictive Coding 

If you want to learn more about using predictive coding and arm yourself with hard facts from recent case studies and court decisions, click here to download the complete presentation by David Cohen and Bryon Bratcher: The Truth About Predictive Coding: Getting Beyond the Hype.  

Don’t forget that you can attend other top-notch sessions like this one at Enfuse 2016 in Las Vegas, May 23-26, 2016. Enfuse brings the power of hands-on labs, learning sessions, and networking events together in a way that will take your work—and your career—to a whole new level. 

Click here to learn more about Enfuse and how you can save over 40% off the regular conference registration fee if you act by November 30, 2015.
