Arvind Rao, PhD, associate professor, Department of Computational Medicine and Bioinformatics, University of Michigan, talks to contributing editor Tanuja Koppal, PhD, about ways in which artificial intelligence (AI) will likely impact diagnostics development. He discusses some of the work he is doing in this area and points out the importance of understanding all the caveats and nuances of AI predictions before reaching any conclusions.
Q: Can you describe some of the work you are doing using AI and machine learning (ML) and how that is likely to impact the development of biomarkers and diagnostics?
A: Our lab is developing integrated image-based and genomics-based disease diagnosis and prognosis algorithms. We are using ML methods for the automated interpretation of imaging data, like radiology scans (CT/MRI/Tomography) and pathology slides. We are using these imaging data to build ML/AI models for disease grading and molecular diagnosis. Our intent is to make diagnostics as objective as possible by leveraging data that are commonly available in the clinical workflow. For example, we try to determine disease treatments for molecular sub-types based on radiology scans or hematoxylin and eosin (H&E) stained pathology slides. We work with radiology scans because you can get a quick assessment of the patient’s disease state from this non-invasive technique, which can really help downstream decisions. Our motivation for using H&E slides to predict patient disease states comes from a global health context. H&E slides are available in most pathology labs, and we can use them to help patients who don’t have access to molecular testing facilities.
Q: What are some of the common mistakes and misconceptions when using AI/ML-based predictions?
A: AI is now becoming a commodity that can be easily channeled for problem solving, without a lot of coding or programming experience. With no-code ML platforms, you use an “app” to quickly apply a variety of algorithms to positive- and negative-labeled training datasets and, with the right computational resources, build a fairly well-performing model. Now, the problem becomes how to interpret this model and how to deploy it responsibly. There are inherent intricacies to ML that can get skipped in the process. For example, a common mistake would be developing the model on a specific ethnic population and expecting it to work “out of the box” on a different ethnic population. This is a mismatch between training and validation: the testing cohort is so different from the training population that the results are essentially irrelevant. Anyone who overlooks these nuances is likely to make the wrong call.
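One rough way to catch the training/validation mismatch described above is to compare each input feature across the two cohorts before trusting a model's predictions. The sketch below uses a standardized mean difference; the function names, the toy cohorts, and the 0.25 cutoff are illustrative assumptions, not anything described in the interview.

```python
from statistics import mean, pstdev

def standardized_mean_difference(train, deploy):
    """Absolute difference in means, scaled by the pooled standard deviation."""
    pooled = ((pstdev(train) ** 2 + pstdev(deploy) ** 2) / 2) ** 0.5
    if pooled == 0:
        return 0.0 if mean(train) == mean(deploy) else float("inf")
    return abs(mean(train) - mean(deploy)) / pooled

def flag_shifted_features(train_cohort, deploy_cohort, threshold=0.25):
    """Names of features whose distribution differs noticeably between cohorts.

    The 0.25 threshold is an assumed rule of thumb, not a validated cutoff.
    """
    return [
        name
        for name in train_cohort
        if standardized_mean_difference(train_cohort[name], deploy_cohort[name]) > threshold
    ]

# Hypothetical toy data: similar ages, but a clearly shifted marker value.
train_cohort = {"age": [52, 60, 58, 65, 49], "marker": [1.1, 0.9, 1.0, 1.2, 0.8]}
deploy_cohort = {"age": [51, 62, 57, 66, 50], "marker": [2.1, 1.9, 2.3, 2.0, 2.2]}

shifted = flag_shifted_features(train_cohort, deploy_cohort)
print(shifted)
```

A flagged feature does not prove the model will fail on the new population, but it is a signal that performance should be re-validated on that population before deployment.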
There is also a lot of variation in how raw, unstructured data are pre-processed into the data used to build these AI/ML models. Even though the engineers may know the nuances, they don’t have a systematic rubric to communicate how the data were pre-processed or modified prior to being used in the modeling process. A lack of metadata rigor and reproducibility in data pre-processing, data curation, and data labeling, all of which take place upstream of the modeling process, can create problems during deployment if not attended to. Not understanding the interplay between data version, model version, and model performance can lead to errors in predictions. Hence, deployment should involve ensuring equivalence of training and test data, calibrating for concept drift, and assessing biases, among other considerations. Mistakes involving image classification in conventional (low-risk) AI applications have very different consequences from similar mistakes made in health care. The costs can be huge in health care, and most models do not factor in that variation in the “costs of erroneous inference.”
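To illustrate what factoring in the “costs of erroneous inference” can look like, the sketch below picks a decision threshold by minimizing a total cost in which a missed disease case (false negative) can be weighted differently from a false alarm (false positive). The labels, scores, candidate thresholds, and cost values are all made up for illustration.

```python
def total_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Sum of error costs when predicting positive for scores >= threshold."""
    false_neg = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    false_pos = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return cost_fn * false_neg + cost_fp * false_pos

def best_threshold(y_true, scores, candidates, cost_fn, cost_fp):
    """Candidate threshold with the lowest total cost (first one wins ties)."""
    return min(candidates, key=lambda t: total_cost(y_true, scores, t, cost_fn, cost_fp))

# Hypothetical labels (1 = disease) and model scores.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.65, 0.6, 0.35, 0.4, 0.45, 0.2, 0.1, 0.55]
candidates = [0.3, 0.5, 0.7]

# Treating both error types as equally costly.
symmetric = best_threshold(y_true, scores, candidates, cost_fn=1, cost_fp=1)
# Assuming a missed case is ten times as costly as a false alarm.
asymmetric = best_threshold(y_true, scores, candidates, cost_fn=10, cost_fp=1)
print(symmetric, asymmetric)
```

With these toy numbers, the threshold chosen under equal costs differs from the one chosen when missed cases are weighted more heavily, which is the kind of variation a purely accuracy-based assessment would hide.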
Q: What advice would you give to readers who are using AI for diagnostics development?
A: One needs to think about ML model deployment through a regulatory, compliance, and auditing “lens” rather than as a purely performance-based assessment. There is a need to adapt regulatory principles from FDA/NIST to continuously evolving ML systems. It’s important to certify both the model and the process for its iterative updates.
For engineers or model developers, I would strongly suggest developing rubrics around responsible reporting of the data, the model, and the process to go from model to predictions, in a very clear and understandable manner. Communication is key between the developer and the deployer. On the deployment side, the user must take an active interest in educating themselves on what is required to compare models with different price points and performance guarantees, to find out which product is going to best meet their needs. The companies that make these products also need to invest in education to ensure that the deployers are able to make meaningful comparisons in the context of their applications.
Q: What changes do you see occurring in the use of AI for diagnostics in the next two to three years?
A: The past year has alerted us to challenges in how we share data, how we build robust AI models, what standards these models adhere to in terms of interpretability and bias, and how models can continuously evolve as they receive more data. These challenges are now entering our collective consciousness and are going to be the rubric by which we are expected to develop AI models more responsibly going forward. No-code ML/AI (Auto ML/AI) is likely to be fairly routine going forward, making it easier to build complex models. At the same time, it will also make it easier for things to go wrong. There will likely be greater use of technologies like federated learning to work with data privacy restrictions, distributed model training, blockchain, and model certification.
Similarly, there is expected to be more interaction between community members coming from compliance, auditing, regulatory, and legal perspectives and those from the health care community, to ensure that things are done reliably, responsibly, and with quality. There is likely to be a much deeper conversation around liability for mistakes made by AI algorithms. Hence, it’s important to get in on the conversation early because it will help us all get educated about how AI algorithm-based health care will look, what reimbursements will look like, and how to price AI predictions in the context of offering superior quality care. It involves going beyond building models to figuring out how to responsibly deploy those models with a variety of stakeholders involved. It involves thinking about AI through a multifactorial lens.
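For readers unfamiliar with federated learning, mentioned above as a way to work within data privacy restrictions, the core idea can be sketched in a few lines: each site updates a shared model on its own data, and only the updated model parameters, never the patient-level records, are sent back and averaged. The example below is a minimal federated-averaging sketch for a one-parameter model y ≈ w·x; the sites, data points, and learning rate are hypothetical, and a real system would add secure communication, privacy safeguards, and validation.

```python
def local_update(weight, data, lr):
    """One gradient-descent step on a site's private (x, y) pairs."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad

def federated_round(global_weight, sites, lr):
    """Average the sites' updated weights, weighted by each site's sample count."""
    updates = [(local_update(global_weight, data, lr), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Hypothetical private datasets held at two sites; the raw pairs never leave a site.
sites = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(1.5, 3.0), (3.0, 5.9), (2.5, 5.0)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, sites, lr=0.05)
print(round(w, 3))
```

In this simple setting, with one local step per round and sample-count weighting, the averaged model moves toward the same fit that pooling all the data would give, even though no site shares its raw data.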
Arvind Rao, PhD, is an associate professor in the Department of Computational Medicine and Bioinformatics at the University of Michigan in Ann Arbor. His group uses image analysis and machine learning methods to link image-derived phenotypes with genetic data across biological scales (i.e., single cell, tissue, and radiology data). Such methods have found application in radiogenomics, pathology informatics, and drug repurposing algorithms based on phenotypic screens. Arvind received his PhD in Electrical Engineering and Bioinformatics from the University of Michigan, specializing in transcriptional genomics, and was a Lane Postdoctoral Fellow at Carnegie Mellon University, specializing in bioimage informatics. Prior to joining the University of Michigan, he was a faculty member in the Department of Bioinformatics and Computational Biology at the University of Texas MD Anderson Cancer Center in Houston.