Part of the collection consisted of more than eight million pages of reports that had been bulk-scanned with no document boundaries and therefore required unitization. Under court order, special counsel had less than a year to produce an inventory of all MTBE-contaminated sites for the entire state and asked NimbleSystems to help.
NimbleSystems developed machine learning models to unitize the documents and to find MTBE measurements across a collection that topped several million documents. In addition, they extracted and compiled specific site locations by PA DEP ID number, as well as the type and amount of MTBE released into the environment. The final inventory included more than 8,000 sites.