Imagine you could build and train an Active Learning model on one data set, then apply it to an entirely different data set to identify the documents most likely to be relevant, based on historic work. The result would give your legal team a valuable head start on large and complex document reviews. This is what portable models aim to do.
So, what is a portable model? In brief, a portable model is made up of themes that are weighted according to their influence. If a document in the data set contains several of those heavily weighted themes, then the model will predict that the document is relevant. The ‘machine learning’ determines what these themes and weightings should be, based on human decisions.
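To make that concrete, here is a minimal Python sketch of the idea. The theme names, weights and threshold are invented purely for illustration; in a real portable model they are derived by the software from human review decisions.

```python
# Minimal sketch of how a portable model might score a document.
# Theme names, weights and the threshold below are illustrative only.

THEME_WEIGHTS = {
    "contract termination": 0.9,
    "penalty clause": 0.7,
    "board approval": 0.4,
    "office party": 0.05,
}
RELEVANCE_THRESHOLD = 1.0

def score_document(text: str) -> float:
    """Sum the weights of every theme that appears in the document."""
    text = text.lower()
    return sum(weight for theme, weight in THEME_WEIGHTS.items() if theme in text)

def predict_relevant(text: str) -> bool:
    """Predict relevance when the combined theme weight clears the threshold."""
    return score_document(text) >= RELEVANCE_THRESHOLD

doc = "The email discusses contract termination and the penalty clause."
print(score_document(doc), predict_relevant(doc))  # 1.6 True
```

In practice the themes are far more numerous and the weighting is handled by the software, but the scoring principle is the same: the more heavily weighted themes a document contains, the more likely the model is to call it relevant.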
A number of organisations that have built portable models claim their effectiveness increases cumulatively as they are exposed to more data. As the models are trained on documents from one case, then another, and so on, they become more intelligent and better able to identify relevant themes.
The premise is clearly attractive. But what about the practical application? To put portable models through their paces, our eDiscovery team carried out two controlled experiments. The first looked at what makes a ‘good’ portable model and the second gauged whether the model would give us a head start on a new set of similar matters. We went into this investigation with an open mind, rather than having hypotheses we wanted to validate. Here’s what we found.
We identified two similar matters where documents had already been manually reviewed, and built portable models on each using different combinations of the settings available in the software. To measure performance and compare the model predictions with the control of human decisions, we used industry-standard metrics - recall and precision. Recall is a measure of completeness (the proportion of genuinely relevant documents the model finds), while precision is a measure of accuracy (the proportion of the model's predicted-relevant documents that really are relevant).
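For readers who want the calculation behind those metrics, the sketch below shows how recall and precision can be computed by comparing model predictions against the human review decisions used as the control. The example labels are hypothetical and are not drawn from our experiments.

```python
# Recall and precision against human review decisions (the control).
# The example labels are hypothetical, for illustration only.

def recall_and_precision(human_relevant: list[bool], model_relevant: list[bool]) -> tuple[float, float]:
    true_positives = sum(h and m for h, m in zip(human_relevant, model_relevant))
    actual_relevant = sum(human_relevant)            # documents humans marked relevant
    predicted_relevant = sum(model_relevant)         # documents the model marked relevant
    recall = true_positives / actual_relevant        # completeness: relevant docs found
    precision = true_positives / predicted_relevant  # accuracy: predictions that were right
    return recall, precision

human = [True, True, True, False, False, False]
model = [True, True, False, True, True, False]
print(recall_and_precision(human, model))  # (0.666..., 0.5)
```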
During this experiment we observed that when the portable model was applied to a new data set with no human training on the new matter, its recall was high, but precision was low. We also noted that the model contained client-specific information as themes. These needed to be manually ‘cleansed’ before they could be shared and used on another matter.
Portable models can be shared quickly and easily. Without human training, however, we found that precision was poor. The performance of portable models can be improved by including metadata such as dates or file types, although an important date for one matter will not necessarily apply to another.
We identified three similar matters where documents had already been manually reviewed. We then built a model and trained it on each of the first two matters consecutively. To gauge the effectiveness of the model, we measured its performance when applied to the third matter. We also attempted training with different proportions of the review population in the third matter to assess the impact this had on the model’s effectiveness.
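As a rough illustration of that last step, the sketch below measures recall and precision on a matter when different proportions of its review population are used for training. It reuses the recall_and_precision helper from the earlier sketch, and train_and_predict is a hypothetical stand-in for the Active Learning platform rather than any real API.

```python
import random

def train_and_predict(train_docs, train_labels, test_docs):
    # Hypothetical stand-in for the Active Learning platform: it naively
    # predicts every document as relevant so the sketch runs end to end.
    return [True] * len(test_docs)

def evaluate_at_proportions(docs, human_labels, proportions=(0.05, 0.10, 0.25, 0.50)):
    """Train on a random sample of each size and score the remainder."""
    results = {}
    for p in proportions:
        sample = set(random.sample(range(len(docs)), int(len(docs) * p)))
        train_docs = [docs[i] for i in sample]
        train_labels = [human_labels[i] for i in sample]
        test_idx = [i for i in range(len(docs)) if i not in sample]
        predictions = train_and_predict(train_docs, train_labels,
                                        [docs[i] for i in test_idx])
        results[p] = recall_and_precision(
            [human_labels[i] for i in test_idx], predictions)
    return results
```

Incidentally, a naive everything-is-relevant baseline like the placeholder gives perfect recall but precision equal to the prevalence of relevant documents, which is one way to see why a matter with few relevant documents needs a larger training set to reach acceptable precision.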
As in the first experiment, we noted that while recall was high, precision was low. To achieve the necessary levels of recall and precision, we had to use more than 25% of the population as training (with or without the portable model); this is likely due to the small proportion of relevant documents in the third matter. We observed that applying both a portable model and training made little difference to performance when compared with conducting the training on its own.
The portable model didn't add value over and above what could already be achieved with a standard Active Learning model and workflow.
What do our experiments say about the prospects for portable models? Our findings raise questions over how well these models perform when they come up against practical challenges, ranging from data protection to the intrinsic differences between one case and another.
The portable model may contain client-sensitive data, which must be handled with care. Even if the data is cleansed, action must still be taken to avoid breaching data protection legislation, ensuring that information is only used for the purposes for which it was intended.
It would be near impossible to build a portable model that works for all types of cases. For instance, the definition of relevancy in an anti-bribery and corruption matter would be quite different from that in a data breach. One way around this would be to build a library of portable models for different types of case. But even where two cases are of the same type, the themes that make up the portable model will include specific information that is not transferable, such as dates, individuals and company names.
Could these limitations be worked around? Possibly, but the restrictions this would place on the performance of the model mean that it would deliver little or no value beyond what could be achieved through Active Learning on its own.
In this article we’ve set out our findings. But what about your experiences and perspectives? We’d love to hear your thoughts as part of the ongoing debate on portable models - will these models prove their worth as the technology evolves and improves? We hope so. Feel free to contact a member of the team below to further discuss this topic.