Dive Brief:
- A paper published Tuesday in Nature's digital medicine journal argues FDA’s proposed approach to regulating artificial intelligence and machine learning misses an important point.
- Researchers from Harvard University and Europe-based INSEAD business school call for FDA to stop evaluating individual AI/ML products in favor of assessing systems. The case rests on the belief that the real-world performance of AI/ML products will vary too much for a narrow focus on the technology to be effective.
- Instead, the researchers want FDA to factor in wider issues, such as physician training, and potentially limit authorizations of AI/ML products to specific hospitals.
Dive Insight:
FDA and its peer institutions primarily regulate products. A company files data on a new drug or medical device and the regulator assesses whether the evidence supports marketing authorization. Factors including the skill of the physician influence the outcomes the product achieves in the real world, as shown by studies of the variance in results across health centers, but such considerations are outside of FDA’s remit.
Writing in npj Digital Medicine, researchers from Harvard and INSEAD argue that change is needed in the era of software as a medical device (SaMD) products that use AI/ML. The influence of external factors on the performance of AI/ML-based SaMD is too strong for FDA to limit its focus to the products themselves, the authors write.
The paper points to the impact that organizational factors such as resources, staffing, skills, training, culture, workflow and processes have on the use of software. Evidence of such impact led the authors to claim "there is no reason to expect that the adoption and impact of AI/ML-based SaMD will be consistent, or even improve performance, across all settings."
There are precedents in healthcare to support that viewpoint. As the authors note, computer-aided detection (CAD) for mammography proliferated in the early 2000s, only for a study to find the technology had no positive effect on diagnostic accuracy. The authors of the AI/ML paper interpret the lack of positive effect as a consequence of "the way physicians interacted with CAD," going on to argue that similar dynamics could play out in AI/ML-based SaMD.
Having concluded AI/ML could be beneficial in one healthcare setting and deleterious in another, the authors make the case that such variance complicates the task of assessing whether a product has a positive risk-benefit profile. The way around that complication, according to the authors, is to assess products in the context of the systems in which they are used.
"A full system approach would require the regulator to collect data on a myriad of information beyond its current regulatory gaze and perhaps even beyond its legal mandate, requiring additional statutory authority — the reimbursement decisions of insurers, the effects of court decisions on liability, any behavioral biases in the process, data quality of any third-party providers, any (possibly proprietary) machine learning algorithms developed by third parties, and many others," the authors wrote.
After performing such an assessment, the regulator would "issue a limited regulatory authorization that tracks factors like the ones discussed above." The authors even see scope for FDA to restrict its approvals to certain "trained and authorized" users at specific hospitals.
Given the limited precedent for such an approach, the authors accept that reviewing systems would be a big change for FDA and its peers and is, for now, an unrealistic proposition. Yet the authors also think regulators can follow stepping stones between the product and system approaches, for example by demanding more data on how healthcare professionals respond to the outputs of AI/ML devices and by mandating training to reduce variance.