Building Transparency—A Bottom-Up View
How strategically integrating informatics tools across a drug discovery workflow can improve transparency, trust and productivity.
This is Part II of a two-part series.
The goal of an informatics project implemented in the drug discovery analytical group at Lundbeck Research was to gain efficiencies and productivity by increasing transparency, improving data quality, and building trust. A willingness to embrace transparency is critical to the success of an organization, and an open environment in which data are readily shared and reviewed and in which the quality of the science is of utmost importance nurtures a corporate culture built on transparency and trust. Part I of this article (Lab Manager Magazine, June 2012, “Building Transparency—A Top-Down View”) presented the underlying principles and main objectives of this project. Part II describes the selection and implementation of software tools, their integration into an efficient and effective informatics network, and the outcomes and lessons learned.
The overall objective of this project was to gain efficiency by linking analytical data to chemistry and biology via electronic laboratory notebook (ELN) systems while they were in the process of being deployed. The vision was to create bidirectional pipelines for analytical data to flow to/from the ELN and proprietary corporate compound/pharmacology databases and to have these pipelines in place before deployment of the ELN. The project also provided an opportunity to improve data organization, workflow efficiency, and transparency.
The drug discovery analytical group at Lundbeck employs multiple tools and methods to determine the composition, structure, and purity of drug compounds; measure physico-chemical/ADME properties; perform mechanistic and other complex bioanalyses; and evaluate the solubility, stability, and other characteristics of drug compounds in various formulations. These functions encompass a broad range of analytical techniques, including liquid chromatography/mass spectrometry (LC-MS), supercritical fluid chromatography (SFC)/MS, gas chromatography (GC)/MS, MS-MS, automated pH (pKa)/Karl Fischer (water content) titration, and nuclear magnetic resonance (NMR). At Lundbeck, these analytical systems feed data into three main informatics platforms: Empower, used mainly for quantitative analysis and LC peak integration; MassLynx, for applications related to mass spectrometry; and NuGenesis® Scientific Data Management System (SDMS), which makes data readily viewable and serves as the conduit for transferring data from multiple instruments to the ELN.
Using only native software capabilities
Medicinal chemists and compound management groups use LC-MS to determine and verify the molecular weight and purity of compounds. The use of dual UV detectors improves the overall quality of science; it increases the dynamic range and reduces the need to adjust sample concentration. The results from both detectors are depicted on one screen for ease of comparison (Figure 1). OpenLynx, used in Open Access (OA) mode, automatically prints the chromatograms and accompanying computational analyses to SDMS. A laboratory using OA should be fully integrated in terms of capabilities, location, and electronic delivery of results, as is the case for all chemistry-related data at Lundbeck, regardless of hardware/software source. All processed data are printed into SDMS and available for immediate viewing. Ongoing support for maintenance of instruments and training, as well as ongoing communication, is essential.
Figure 1. Immediate visualization of Open Access LC-MS data via SDMS Vision.
Trust within a drug discovery group is especially critical with regard to the purification of experimental compounds. If a medicinal chemist, instead of a purification expert, insists on performing the purification, it can cost as much as ten-fold more in scientist time. Figure 2 illustrates a purification run in which an evaporative light-scattering detector (ELSD) is used to quantify the mass of a compound being collected and in which a UV detector monitors the waste stream. The purification data generated in FractionLynx are automatically printed to the SDMS in real time, injection by injection; in this way, the medicinal chemist can watch the purification in progress, know when it is completed, and see that none of the target compound is lost. The report generated records the mass of the target peak on completion of the purification and identifies the rack location of the fraction containing the compound. This transparency has been pivotal in establishing the crucial trust that must exist between medicinal chemists and purification scientists.
Figure 2. Immediate visualization of LC-MS-based purification data: the combination of quality of science in the experimental design and complete transparency of the data builds strong trust, which has high positive impact on speed and efficiency.
For bioanalyses that require difficult separations (including chiral compounds) using LC-MS-MS and SFC-MS-MS, the analytical group uses MassLynx for data acquisition and instrument control. The data are automatically converted using the Waters Data Converter, and Empower performs automated peak integration, determination of detailed system suitability parameters, and report generation, enabling rapid evaluation of data quality and assessment of assay performance (Figure 3). Real-time data acquisition and reporting allow analysts to monitor the integrity of the separations and to identify potential deterioration as a function of time. The data can provide evidence that assay performance is declining well before it would be evident to the naked eye.
Figure 3. Immediate visualization of SFC-MS-MS bioanalytical data, including system suitability parameters to monitor and ensure that the highest-quality data are generated.
Adding customized software capabilities
Customization of the software and links to other laboratory systems such as sample/pharmacology databases or an ELN can substantially improve workflow efficiency. Many of these desirable features are not currently available in native software. Although they are beginning to gain attention from software vendors, it could be years before they arrive as off-the-shelf products. They can be realized sooner by execution of targeted mini-projects aimed at improving workflow efficiency. For example, by linking data acquisition software to proprietary databases, it is possible to prepare and import sample lists for instruments simply by reading the barcodes on samples. This streamlines assay sample processing. Automatic electronic updates of the sample identification lists also safeguard experimental accuracy against human entry errors.
These capabilities are especially important when performing universal processing of all compounds to assess their ADME properties and when processing selected subsets of compounds to run against different project-specific assays. It is not efficient to process ADME assays on a project-by-project basis. Instead of processing ADME assays on bar-coded tubes separated into racks by project, racks are backfilled to perform these assays. The tube bar codes are then rescanned to generate an updated sample list from the sample database, automatically creating instrument analysis lists using an Empower add-in. Each sample rack can also be assigned to a specific instrument and undergo automatic processing according to a menu of assay templates. The same is true for MassLynx, OpenLynx, and FractionLynx, as shown in Figure 4.
Figure 4. Scanning sample bar codes to automatically generate sample analysis (assay) lists for any instrument type (MassLynx shown).
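The barcode-driven list generation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Lundbeck's actual integration: `SAMPLE_DB` stands in for a query against the corporate sample database, and the CSV columns are a hypothetical instrument sample list layout.

```python
import csv
import io

# Hypothetical stand-in for the corporate sample database lookup.
SAMPLE_DB = {
    "BC-0001": {"compound_id": "LU-1001", "assay": "logD"},
    "BC-0002": {"compound_id": "LU-1002", "assay": "logD"},
}

def build_sample_list(scanned_barcodes, rack_id):
    """Resolve each scanned tube barcode against the sample database
    and return instrument-ready sample list rows, in rack order."""
    rows = []
    for position, barcode in enumerate(scanned_barcodes, start=1):
        record = SAMPLE_DB[barcode]  # fails loudly on an unknown tube
        rows.append({
            "rack": rack_id,
            "position": position,
            "barcode": barcode,
            "compound_id": record["compound_id"],
            "assay": record["assay"],
        })
    return rows

def write_sample_list(rows, fileobj):
    """Write the rows as a CSV sample list for import into the
    acquisition software."""
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

rows = build_sample_list(["BC-0001", "BC-0002"], rack_id="R42")
buf = io.StringIO()
write_sample_list(rows, buf)
print(buf.getvalue())
```

Because the list is generated from the database rather than typed by hand, a mis-scanned or unknown tube raises an error immediately instead of silently propagating a wrong sample identity.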
Once assay data are recorded, they need to be processed (peaks located and integrated) and converted into assay readout values. For truly high-throughput assay output, customized tools to process and calculate values and create assay output sheets for upload to pharmacology databases can enhance productivity. Internal use of the development tool kit for native analytical software is particularly useful. Tools can be created that flag specific analyses with peak integration/selection issues, facilitate reprocessing and recalculation of results, and upload assay values, all within one application. Figure 5 depicts an example for Empower used to process ADME assay results. This tool allows routine processing of >6,000 chromatograms per month, fully profiling >600 compounds in seven assays, with consistent two-day turnaround performed by 0.5 FTE.
Figure 5. Flagging specific analyses with peak integration/selection issues, reprocessing raw data, recalculating assay results, and uploading assay values—all within one custom Empower SDK application.
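The flagging step can be illustrated generically. The thresholds and field names below are hypothetical; in a real deployment the acceptance criteria would come from the assay template in the processing software, not hard-coded values.

```python
def flag_integrations(results, min_signal_to_noise=10.0, rt_tolerance=0.2):
    """Return the subset of chromatogram results that need manual
    review: missing peaks, weak peaks, or retention-time drift.
    Thresholds here are illustrative defaults."""
    flagged = []
    for r in results:
        reasons = []
        if r["peak_area"] is None:
            reasons.append("no peak found")
        else:
            if r["signal_to_noise"] < min_signal_to_noise:
                reasons.append("low S/N")
            if abs(r["retention_time"] - r["expected_rt"]) > rt_tolerance:
                reasons.append("RT drift")
        if reasons:
            flagged.append({"sample": r["sample"], "reasons": reasons})
    return flagged

# Illustrative results: one clean injection, one missing peak,
# one weak peak that has also drifted in retention time.
results = [
    {"sample": "A1", "peak_area": 1200.0, "signal_to_noise": 55.0,
     "retention_time": 2.31, "expected_rt": 2.30},
    {"sample": "A2", "peak_area": None, "signal_to_noise": None,
     "retention_time": None, "expected_rt": 2.30},
    {"sample": "A3", "peak_area": 80.0, "signal_to_noise": 4.2,
     "retention_time": 2.62, "expected_rt": 2.30},
]
flagged = flag_integrations(results)
print(flagged)
```

The analyst then only inspects the flagged injections, which is what makes a two-day turnaround on thousands of chromatograms per month feasible for a fraction of an FTE.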
Medicinal chemists can more easily interpret the mass spectral component of OA LC-MS data generated for reaction monitoring using a customized program designed to intercept the data in OpenLynx. The program processes the MS data without printing it, considers all of the peaks in each MS spectrum, and identifies the molecular weight indicated in each spectrum using all the adduct peaks. The molecular weight, calculated based on both the positive and negative ion mass spectra, is inserted into the OpenLynx report at the top of each +/- spectrum pair (Figure 6) and printed to SDMS. This approach adds value to the data because the medicinal chemist immediately knows the molecular weight associated with each chromatographic peak and can effectively utilize the OA LC/MS resources without advanced knowledge of mass spectral interpretation.
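The logic of adduct-based molecular weight determination can be sketched as follows. This is a simplified illustration, not the actual OpenLynx interception program: the adduct table is a small subset of common species, and consensus is found by simple binning to 0.1 Da rather than a configurable mass tolerance.

```python
from collections import Counter

# Common adduct mass offsets in daltons (illustrative subset):
# observed m/z = neutral mass M + offset.
POS_ADDUCTS = {"[M+H]+": 1.0073, "[M+Na]+": 22.9892, "[M+K]+": 38.9632}
NEG_ADDUCTS = {"[M-H]-": -1.0073, "[M+Cl]-": 34.9689}

def infer_molecular_weight(pos_peaks, neg_peaks):
    """For every peak in the positive and negative ion spectra, back-
    calculate the neutral mass each adduct assignment would imply, then
    return the mass supported by the most peaks (with its vote count)."""
    candidates = []
    for mz in pos_peaks:
        for offset in POS_ADDUCTS.values():
            candidates.append(round(mz - offset, 1))
    for mz in neg_peaks:
        for offset in NEG_ADDUCTS.values():
            candidates.append(round(mz - offset, 1))
    mw, votes = Counter(candidates).most_common(1)[0]
    return mw, votes

# A compound of neutral mass 300.1 Da seen as [M+H]+ and [M+Na]+
# in positive mode and [M-H]- in negative mode.
mw, votes = infer_molecular_weight([301.1, 323.1], [299.1])
print(mw, votes)
```

Agreement between independently observed adducts in both polarities is what lets the report state one molecular weight per chromatographic peak with confidence, so the chemist never has to interpret the raw spectra.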
Time is critical when performing reaction monitoring and making real-time decisions. Emailing data to the scientist saves time by allowing the data to find the scientist. Lundbeck collaborated with Waters to create an automatic email tool for SDMS. The email contains all text fields captured by SDMS and a pdf of the full data report with the batch ID (lab notebook # or compound #) as the file name. The scientist can simply drag and drop the pdf into an ELN. At Lundbeck, the drag-and-drop feature is used approximately 30,000 times/year, at a savings of about one minute/use compared with the usual file import approach. This equates to an organization-wide savings of about 0.25 FTE/year, which recovered the initial cost of all SDMS components in the first year.
Figure 6. Automated mass spectral interpretation uses all peaks in positive and negative ion spectra to determine molecular weight for each chromatographic peak and inserts the information into the report.
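The quoted savings of about 0.25 FTE/year follows from simple arithmetic, assuming roughly 2,000 working hours in one FTE-year (an assumption not stated in the article):

```python
uses_per_year = 30_000        # drag-and-drop events per year (from the article)
minutes_saved_per_use = 1     # vs. the usual file-import approach
fte_hours_per_year = 2_000    # assumed working hours in one FTE-year

hours_saved = uses_per_year * minutes_saved_per_use / 60
fte_saved = hours_saved / fte_hours_per_year
print(f"{hours_saved:.0f} hours/year = {fte_saved:.2f} FTE")
# prints "500 hours/year = 0.25 FTE"
```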
A summary report containing all the data generated can help the project team review hit compounds across screening assays. For example, a single Excel file might include molecular weight, purity, and concentration data, together with compound identification, chemical structures, and formulas, as well as comments on the results, such as the presence of impurities or isomers. The summary report may also include an SDMS (or pdf) link, allowing users to click and view the analytical data (chromatograms and spectra) for each compound (Figure 7).
Figure 7. Combining the tools described here to produce highly effective results summary sheets.
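Assembling such a summary sheet amounts to joining per-compound results and attaching a data link to each row. A minimal sketch follows; the `sdms://` link scheme, field names, and example values are hypothetical, and a production version would write Excel rather than CSV.

```python
import csv
import io

def sdms_link(batch_id):
    """Hypothetical link scheme; real links would be issued by SDMS."""
    return f"sdms://vision/reports/{batch_id}"

# Illustrative per-compound results gathered from the screening assays.
compounds = [
    {"compound_id": "LU-1001", "mw_found": 300.1, "purity_pct": 98.5,
     "comment": "single peak"},
    {"compound_id": "LU-1002", "mw_found": 415.2, "purity_pct": 91.0,
     "comment": "minor isomer present"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["compound_id", "mw_found",
                                         "purity_pct", "comment",
                                         "data_link"])
writer.writeheader()
for c in compounds:
    # Each row carries a clickable link back to the raw analytical data.
    writer.writerow({**c, "data_link": sdms_link(c["compound_id"])})
print(buf.getvalue())
```

The link column is the key design point: reviewers see the summary numbers but remain one click away from the underlying chromatograms and spectra, which preserves the transparency the project set out to build.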
Conclusions and key lessons learned
In the end, all the defined project objectives were achieved (prior to ELN deployment), as were various opportunistic objectives that were added as the project progressed and generated additional efficiencies. Achieving trust was the ultimate goal, and the results suggest that over the course of a year of total transparency, combined with ongoing consistent and efficient delivery of high-quality results, output from the same full-time employees (FTEs) can increase more than threefold. The goals achieved and gains realized as a result of this project support the value of informatics as a powerful tool for promoting openness, competence, and efficiency in the research setting. In our experience, this results in much greater trust and collaboration, both crucial attributes of an optimal R&D environment.
Several key project management lessons contributed to the project's success: break the project into small pieces and working groups, focusing first on the most fundamental pieces (in this case, organizational ELN efficiency) and doing them well (including customization to the actual optimal workflow); use a balance of internal and external resources to achieve execution efficiency; set incremental milestones to monitor and demonstrate progress; minimize the number of decision makers, aiming for execution quality and efficiency; and make total transparency and the most efficient use of scientists' time the two main priorities in project decisions, to maximize the positive impact of integrating informatics solutions.