Part Two: Why TAR Is Important to Narrowing Your Search
If you are facing an ESI agreement that specifies search terms as the way to find relevant documents, rethink this agreement before you sign it! While it’s important to narrow the collection of documents, search terms are not the best way to do it. TAR models use learning algorithms trained on example documents to find new, similar ones. This approach is more reliable and more likely to get you to the data you actually need. If you rely solely on search terms to find relevant documents, you will run into problems such as:
- Playing without the Full Deck
A term search finds only the specified term or combination of terms. The more terms you combine, the fewer matches you get, and adding proximity operators or terms drawn from concept searching narrows the result set even further. In effect, you are examining only a tiny fraction of a document’s features, whereas TAR treats every word in every document as a feature.
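The narrowing effect of combined terms is easy to see in a toy sketch. The mini-corpus and query terms below are entirely hypothetical; the point is only that a boolean AND search can never grow the match set as terms are added:

```python
# Toy illustration (hypothetical mini-corpus): each additional ANDed
# term can only shrink the set of matching documents.
docs = {
    1: "merger agreement signed by the board",
    2: "board approved the merger terms",
    3: "quarterly earnings report",
    4: "merger discussion in meeting notes",
}

def matches(query_terms, text):
    """True only if every query term appears in the document (boolean AND)."""
    words = set(text.lower().split())
    return all(term in words for term in query_terms)

def hits(query_terms):
    """Return the IDs of all documents matching every query term."""
    return {doc_id for doc_id, text in docs.items() if matches(query_terms, text)}

print(len(hits(["merger"])))                      # 3 documents
print(len(hits(["merger", "board"])))             # 2 documents
print(len(hits(["merger", "board", "signed"])))   # 1 document
```

Each added term cuts the result set further, regardless of whether the excluded documents were actually relevant.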
- Poor Images and Bad OCR
Under- and overinclusiveness are exacerbated when much of the production arrives as images with OCR text. Poor OCR derived from grainy images can throw off query results, and for many document types, including maps, tables, and handwriting, the OCR can be badly corrupted with garbage characters that defeat any text search.
- Expertise and Intuition Ignored
Subject matter experts and legal professionals who run term searches rarely, if ever, rely on the hit summary alone to judge a document’s value; they rely on the entire document. TAR automatically evaluates the entire document, so it better reflects the highly refined experience and intuition of the professional.
- Arbitrary Term Choices
Even the best professionals will admit that choosing one term over another is often guesswork. Why does one term make the list while another does not? No canned or favorite list of terms can beat a TAR model that learns from examples and adjusts on the fly to new situations.
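The idea of learning from examples rather than a fixed term list can be sketched in miniature. This is a hypothetical toy scorer, not a production TAR engine: it simply accumulates positive weight for words seen in relevant example documents and negative weight for words seen in non-relevant ones:

```python
# Minimal sketch of example-driven relevance scoring. All documents and
# labels here are invented for illustration.
from collections import Counter

def train(labeled_examples):
    """Learn word weights: +1 for words in relevant examples, -1 otherwise."""
    weights = Counter()
    for text, relevant in labeled_examples:
        for word in set(text.lower().split()):
            weights[word] += 1 if relevant else -1
    return weights

def score(weights, text):
    """Sum learned weights over a document's words; higher = more likely relevant."""
    return sum(weights[word] for word in set(text.lower().split()))

examples = [
    ("board approved the merger terms", True),
    ("merger agreement signed by counsel", True),
    ("cafeteria menu for next week", False),
]
weights = train(examples)

print(score(weights, "draft merger agreement for the board"))  # positive score
print(score(weights, "updated cafeteria menu"))                # negative score
```

Unlike a static term list, the weights shift automatically as more labeled examples arrive, which is the core advantage the paragraph above describes.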
- No Measures of Success
If you run a straight search, all you get back is the number of documents returned; there are no statistics on how well the collection performed. TAR provides a battery of measures so you can see from run to run how well you are doing. For example, your result set may get smaller while your precision and recall go up, indicating the model is getting stronger. Keyword searching has no such capability.
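Two of the standard measures are easy to compute once you have a ground-truth set of relevant documents. The document ID sets below are hypothetical; the sketch shows how a smaller result set can nonetheless score higher on both recall and precision:

```python
# Hedged sketch: recall and precision for a review run, given a
# hypothetical ground-truth set and a retrieved set of document IDs.
def recall_precision(relevant, retrieved):
    """Recall = share of relevant docs found; precision = share of retrieved docs that are relevant."""
    true_positives = len(relevant & retrieved)
    recall = true_positives / len(relevant) if relevant else 0.0
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return recall, precision

relevant = {1, 2, 3, 4, 5}

run_1 = {1, 2, 6, 7, 8, 9}   # larger result set, more noise
run_2 = {1, 2, 3, 4, 10}     # smaller result set, better targeted

print(recall_precision(relevant, run_1))  # recall 0.4, precision ~0.33
print(recall_precision(relevant, run_2))  # recall 0.8, precision 0.8
```

Tracked run over run, numbers like these are what let you tell whether a shrinking result set reflects a stronger model or a worse one.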
The undercounting of relevant documents affects both requesting and producing parties. If you are requesting, your case will suffer if you lack key relevant documents. If you are producing, you risk complaints and motions to the court over underproduction.
For how to deal with search term search requests, read part one of this blog post series: “Underreporting Pitfalls.”