Over the past decade, machine learning (“ML”) has brought a great deal of convenience to everyday life, including self-driving cars, practical speech recognition and effective web search. It is far more pervasive today: in most of our eDiscovery cases, whether investigation or litigation, we make full use of it. Machine learning is a type of artificial intelligence that learns from sample documents or human input, makes educated guesses about documents it has not yet seen, and then organizes or classifies those documents into categories. Since Judge Andrew Peck’s 2012 ruling in Da Silva Moore v. Publicis Groupe & MSL Group, machine learning technologies have been recognized as an approved protocol for discovery by courts in the US, the UK and Australia.

The methods we use most often are email threading, conceptual searching and technology assisted review (“TAR”).

  • Email threading

As the name implies, email threading identifies the relationships between emails and the people involved in a conversation, groups the emails together and marks the most representative ones (those with the most conversation content or with unique attachments). It gives reviewers a full picture of the conversation and reduces the number of documents to review. We use it in most of our eDiscovery cases, regardless of the nature of the project.
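To make the idea concrete, here is a minimal Python sketch of threading. It is an illustration under our own assumptions, not how any particular eDiscovery platform works: each email is a plain dictionary, the hypothetical `in_reply_to` field is modelled on the standard In-Reply-To email header, and "most conversation content" is approximated by the longest body.

```python
from collections import defaultdict


def thread_root(email_id, parent_of):
    """Follow reply links upward until a message has no known parent."""
    seen = set()  # guards against malformed reply loops
    while parent_of.get(email_id) is not None and email_id not in seen:
        seen.add(email_id)
        email_id = parent_of[email_id]
    return email_id


def build_threads(emails):
    """Group emails into threads keyed by the id of the thread's root message."""
    parent_of = {e["id"]: e.get("in_reply_to") for e in emails}
    threads = defaultdict(list)
    for e in emails:
        threads[thread_root(e["id"], parent_of)].append(e)
    return dict(threads)


def representative_emails(thread):
    """Pick the email with the longest body, plus any email that carries
    an attachment not already covered by the emails picked so far."""
    most_content = max(thread, key=lambda e: len(e["body"]))
    picked = [most_content]
    covered = set(most_content.get("attachments", []))
    for e in thread:
        extra = set(e.get("attachments", [])) - covered
        if e is not most_content and extra:
            picked.append(e)
            covered |= extra
    return [e["id"] for e in picked]


# Hypothetical sample data for illustration only.
emails = [
    {"id": "m1", "in_reply_to": None, "body": "Can we meet Friday?",
     "attachments": ["agenda.pdf"]},
    {"id": "m2", "in_reply_to": "m1", "body": "Yes.\n> Can we meet Friday?",
     "attachments": []},
    {"id": "m3", "in_reply_to": "m2",
     "body": "Great, see you then.\n> Yes.\n> > Can we meet Friday?",
     "attachments": []},
    {"id": "m4", "in_reply_to": None, "body": "Unrelated invoice",
     "attachments": []},
]

threads = build_threads(emails)
to_review = representative_emails(threads["m1"])
```

In this sample, the three-message conversation is reduced to two emails for review: the last reply, which quotes the whole exchange, and the first message, which carries the only attachment.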

  • Conceptual Searching

Unlike a traditional keyword search, which only counts an exact match of the word in the document content as a hit, conceptual searching has “a brain” study the content of each document, understand its underlying meaning, and then return a broader set of associated documents based on your input. For example, when you enter the term “song”, the “brain” associates it with other terms such as “key”, “tempo” and “lyrics”. In real investigations, exploring data with conceptual searching can bring surprises, such as uncovering hidden code names.
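One simple way to picture how a “brain” can associate terms is to compare words by the company they keep. The Python sketch below is a toy illustration under that assumption, using co-occurrence counts and cosine similarity on a made-up corpus; it is not the algorithm of any specific conceptual search engine, and the function names are our own.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(docs):
    """Map each term to a Counter of the other terms it shares a document with."""
    vectors = defaultdict(Counter)
    for doc in docs:
        terms = set(doc.lower().split())
        for term in terms:
            for other in terms:
                if other != term:
                    vectors[term][other] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def related_terms(term, vectors, top_n=3):
    """Return the terms whose co-occurrence profile is most similar to `term`."""
    query = vectors.get(term, Counter())
    scores = {t: cosine(query, v) for t, v in vectors.items() if t != term}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical mini-corpus for illustration only.
docs = [
    "song tempo lyrics",
    "song key tempo",
    "song lyrics key",
    "invoice payment bank",
    "payment bank transfer",
]

vectors = cooccurrence_vectors(docs)
related = related_terms("song", vectors)
```

On this corpus, the terms most related to “song” are “key”, “tempo” and “lyrics”, while the banking terms score zero because they never appear in the same context.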

  • Technology assisted review (“TAR”)

Machine learning for document classification in eDiscovery goes by several names, including technology assisted review (“TAR”) and predictive coding. Here we call it TAR.

It studies document concepts combined with human input and then “predicts” how documents should be classified. We can then rely on its predictions to prioritize relevant documents for review teams. As the review proceeds, the “brain” continuously studies the reviewers’ coding calls and feeds them back into the document pool to re-prioritize it, until no more “useful” documents float to the surface. The whole process is a continuously learning model that turns early case assessment from overwhelming to organized and speeds up eDiscovery and review. It is often used for review acceleration, review quality control and privilege determinations.
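The feedback loop described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a production TAR workflow: the scorer is a basic term-weight tally rather than a proper statistical classifier, the `reviewer` function stands in for human coding decisions, and the stopping rule (stop once a whole batch comes back non-relevant) is our own assumption.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def score(text, weights):
    """Sum the learned term weights over the distinct terms in a document."""
    return sum(weights[t] for t in set(tokenize(text)))


def prioritized_review(docs, reviewer, seed_ids, batch_size=2):
    """Review documents in priority order, learning from each coding call."""
    weights = Counter()  # term -> learned weight (unknown terms count as 0)
    labels = {}          # doc id -> True (relevant) / False (not relevant)

    def learn(doc_id):
        labels[doc_id] = reviewer(docs[doc_id])
        delta = 1 if labels[doc_id] else -1
        for term in set(tokenize(docs[doc_id])):
            weights[term] += delta

    for doc_id in seed_ids:  # initial human-coded sample
        learn(doc_id)

    while len(labels) < len(docs):
        unreviewed = [d for d in docs if d not in labels]
        batch = sorted(unreviewed,
                       key=lambda d: score(docs[d], weights),
                       reverse=True)[:batch_size]
        for doc_id in batch:
            learn(doc_id)
        if not any(labels[d] for d in batch):
            break  # assumed stopping rule: nothing useful surfaced in this batch
    return labels


# Hypothetical documents and a simulated reviewer, for illustration only.
docs = {
    "d1": "merger pricing agreement draft",
    "d2": "lunch menu friday",
    "d3": "pricing agreement signed",
    "d4": "merger timeline confidential",
    "d5": "office party lunch",
    "d6": "parking permit renewal",
    "d7": "gym membership lunch offer",
}


def reviewer(text):
    return "merger" in text or "pricing" in text


labels = prioritized_review(docs, reviewer, seed_ids=["d1", "d2"])
```

In this sample, the two remaining relevant documents are ranked first and reviewed in the first batch, and the review stops before the last low-ranked document is ever opened. In practice, a decision to stop would normally be backed by a more rigorous validation step.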

All of the above methods have been applied extensively across our eDiscovery cases globally. They have helped our clients substantially reduce the time and cost of document review, potentially saving thousands or even millions of dollars, while ensuring the quality and consistency of the analysis and minimizing the risk of error.

With all these advantages, you may wonder whether machine learning works for local projects and languages. The answer is absolutely yes: machine learning has been used extensively in a variety of cases in Asia.