PROTOCOL DB: Archiving and Querying Scientific Protocols, Data and Provenance

This project addresses a systemic problem in scientific research: although datasets collected through scientific protocols may be properly stored, the protocol itself is often recorded only on paper or stored electronically only as the script developed to implement it. Once the scientist who implemented the protocol leaves the laboratory, this record may be lost. Collected datasets without a description of the process used to produce them become meaningless; furthermore, the experiment designed to produce the data is no longer reproducible.

The goal of this research is to design and develop a database (ProtocolDB) to manage scientific protocols and the datasets collected from their execution. The approach will allow scientists to query, compare and revise protocols, and to express queries that span both protocols and data. The proposal also addresses the issue of recording and querying the provenance (the why and where) of data. ProtocolDB will benefit scientists by providing a scientific portfolio for the laboratory which not only enables querying and reasoning about protocols, executions of protocols and collected datasets, but also enables data sharing and collaboration between teams.
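
As an illustration only, the link between protocols, their executions and the collected datasets could be sketched as three relational tables, with a provenance query that traces a dataset back to the protocol version that produced it. The table names, column names and sample rows below are assumptions made for this sketch; they do not describe the actual ProtocolDB design.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE protocol (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            version INTEGER NOT NULL,
            description TEXT NOT NULL
        );
        CREATE TABLE execution (
            id INTEGER PRIMARY KEY,
            protocol_id INTEGER NOT NULL REFERENCES protocol(id),
            run_by TEXT NOT NULL,
            run_date TEXT NOT NULL
        );
        CREATE TABLE dataset (
            id INTEGER PRIMARY KEY,
            execution_id INTEGER NOT NULL REFERENCES execution(id),
            location TEXT NOT NULL
        );
    """)
    # Made-up sample rows, for illustration only.
    conn.execute(
        "INSERT INTO protocol VALUES "
        "(1, 'sequence-alignment', 2, 'Align reads, then filter by score')"
    )
    conn.execute("INSERT INTO execution VALUES (10, 1, 'alice', '2024-01-15')")
    conn.execute("INSERT INTO dataset VALUES (100, 10, 'data/run10.csv')")
    conn.commit()
    return conn


def provenance(conn, dataset_id):
    """Return (protocol name, version, run_by, run_date) for a dataset, or None."""
    return conn.execute(
        """
        SELECT p.name, p.version, e.run_by, e.run_date
        FROM dataset d
        JOIN execution e ON e.id = d.execution_id
        JOIN protocol p ON p.id = e.protocol_id
        WHERE d.id = ?
        """,
        (dataset_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(provenance(conn, 100))
```

Keeping executions in their own table is one way to let a single query span both protocols and data: the dataset row answers where the data came from, and the joined protocol row answers why it has the form it does.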



