Ep99: When AI goes wrong

8 August 2023

Fellows of the College can record CPD hours spent listening to the podcast and reading supporting resources. Log in to MyCPD, review the prefilled activity details and click ‘save’.

This is the fourth part in a series on artificial intelligence in medicine, in which we try to unpick the causes and consequences of adverse events resulting from this technology. Our guest David Lyell is a research fellow at the Australian Institute of Health Innovation (Macquarie University) who has published a first-of-its-kind audit of adverse events reported to the US regulator, the Food and Drug Administration. He breaks down those that were caused by errors in the machine learning algorithm, other aspects of a device or even user error.

We also discuss where these all fit into the four stages of human information processing, and whether this can inform determinations about liability. Uncertainty around the medicolegal aspects of AI-assisted care is one of the main reasons that practitioners report discomfort about the use of this technology. It's a question that hasn’t been well tested yet in the courts, though according to academic lawyer Rita Matulonyte, AI-enhanced devices don’t change the scope of care that has been expected of practitioners in the past.


>Rita Matulonyte PhD (Macquarie Law School, Macquarie University; ARC Centre of Excellence for Automated Decision Making and Society; MQ Research Centre for Agency, Values and Ethics)

>David Lyell PhD (Australian Institute of Health Innovation, Macquarie University; owner Future Echoes Business Solutions)

Produced by Mic Cavazzini DPhil. Music licenced from Epidemic Sound includes ‘Kryptonite’ by Blue Steel and ‘Illusory Motion’ by Gavin Luke. Music courtesy of Free Music Archive includes ‘Impulsing’ by Borrtex. Image by EMS-Forster-Productions licenced from Getty Images.

Editorial feedback kindly provided by physicians David Arroyo, Stephen Bacchi, Aidan Tan, Ronaldo Piovezan and Rahul Barmanray and RACP staff Natasa Lazarevic PhD.

Further Resources

Artificial intelligence in medicine: has the time come to hang up the stethoscope? [Lyell, IMJ. 2023]
More than algorithms: an analysis of safety events involving ML-enabled medical devices reported to the FDA [Lyell, J Am Med Inform Assoc. 2023]
How machine learning is embedded to support clinician decision making: an analysis of FDA-approved medical devices [Lyell, BMJ Health Care Inform. 2021]
A governance model for the application of AI in health care [Reddy, J Am Med Inform Assoc. 2020]
Machine learning in clinical practice: prospects and pitfalls [Coiera, Med J Aust. 2019]
When Artificial Intelligence Models Surpass Physician Performance: Medical Malpractice Liability in an Era of Advanced Artificial Intelligence [J Am Coll Radiol. 2022]
Artificial intelligence tools in clinical neuroradiology: essential medico-legal aspects [Neuroradiology. 2023]
Artificial intelligence clinical trials and critical appraisal: a necessity [ANZ J Surg. 2023] 
When Could You Be Sued for AI Malpractice? You're Likely Using It Now [Medscape]
Should AI-enabled medical devices be explainable? [Matulonyte, Int J Law Inform Tech. 2022]
Artificial Intelligence and Human Life: Five Lessons for Radiology from the 737 MAX Disasters [Radiol Artif Intell. 2020]

How do we safely, effectively and responsibly implement Artificial Intelligence in healthcare? [USyd]
Digital Health CPD Primer [RACP]


MIC CAVAZZINI:              Welcome to Pomegranate Health, a podcast about the culture of medicine. I’m Mic Cavazzini for the Royal Australasian College of Physicians. This is the fourth part in our series on artificial intelligence in medicine. We’ve focused mostly on AI-assistance in clinical decision-making, and how naturally clinicians will hand over some responsibility to machine learning devices.

One of the reasons that practitioners and regulators report discomfort about AI is the uncertainty around the medicolegal aspects of such care. Who takes responsibility when something goes wrong? The Royal Australian and New Zealand College of Radiologists offers this answer in its statement on Ethical Principles for AI in Medicine:

“Responsibility for decisions made about patient care rests principally with the medical practitioner… Medical practitioners need to be aware of the limitations of [machine learning] ML and AI, and must exercise solid clinical judgement at all times. However, given the multiple potential applications of [machine learning] ML and AI in the patient journey, there may be instances where responsibility is shared between the medical practitioner caring for the patient, the hospital or practice management who took the decision to use the systems or tools, and the manufacturer which developed the ML or AI”

This statement is predictably conservative and a little vague, but maybe it’s no different to the guidelines around use of other medical devices. In order to peel back some of the layers of the medicolegal onion, I spoke to academic lawyer Rita Matulonyte. But before we hear from her, I want to run over an important paper that I wish I’d found when I started out this series of podcasts.

It was written by my second guest, David Lyell, a research fellow at the Australian Institute of Health Innovation. His co-authors included Professor Farah Magrabi and Professor Enrico Coiera.

This study looked at all the machine learning devices that had been registered with the US FDA up to February 2020. From a literature search the reviewers identified 137 devices in total. While other publications and the FDA itself offer a list two to four times longer, David Lyell is not alone in observing that a single product might have multiple notifications, and that not all components actually contain an AI algorithm within.

In David Lyell’s BMJ paper from 2021, he and his colleagues narrowed their focus down to confirmed AI-devices that were intended to support clinical practice specifically, excluding those intended for consumers or practice managers. This filter identified 49 unique devices, 42 of which relied on imaging data, with CT, MRI and X-ray making up the lion’s share. The remaining handful of devices used signal data like ECG, phonocardiography and blood glucose measurements from an insulin pump.

The researchers assessed where the devices sat with regard to the four stages of human information processing, these being: first, information sensing; second, information analysis; third, choosing an appropriate response from a number of alternatives; and finally, acting on that decision.

The first and last categories only made up 12 percent of devices; those that might automate image enhancement during scanning, or automatically capture an ultrasound image only when suitable image quality was detected.

29 percent of the tools in this list were involved in the information analysis stage. For example, on a brain scan, feedback that provides the user with a volumetric quantification of key sub-structures. Or, on an ECG, feedback that quantifies the height of particular peaks and troughs in the cardiac rhythm. In such examples, it’s still up to the clinician to weigh that information up in determining the diagnosis.

59 percent of machines in the shortlist assisted with the decision-selection stage of information processing and this was divided into four further sub-categories. Clinical classification is exemplified by one tool that gives a formal breast density grading from mammograms. By contrast, another device, called Profound, analyses digital breast tomosyntheses to highlight regions of interest. This is what you’d call feature-level detection and it suits data formats that we have some natural feel for, such as imaging. For other less intuitive modalities like ECG, you often see AI performing case-level detection where it might say outright, “this looks like a particular arrhythmia or structural deficit.”

The final category were a bunch of tools that provide triage notifications, which do also make recommendations about caseness but at a different point in the workflow. Case-level detectors are consulted in real time by the diagnostician, whereas the triage notifications work in the background and then prioritise cases for review by the clinician. For example, an image analysis program called Briefcase flags scans that contain likely cervical spine fractures, large vessel occlusions, intracranial haemorrhages and pulmonary embolism so that they go to the top of the radiologist’s list.

In this sense, machines providing triage notifications were considered to be operating autonomously, as were those working earlier in the workflow to automate image capture or provide data analytics. Some of the machines that provide case-level findings to non-specialists were also classified as autonomous, because the output might go largely unquestioned by that user. But the patient would then be referred to someone with more expertise who acts as the safety net.

By contrast, nearly half of the devices in the list of 49 were determined to be assistive AIs. You can imagine why feature-level detectors naturally fall into this camp as they’re helping the user apply their own diagnostic skills. Most case-level detectors were also filed under assistive devices, though, because their output came with a message indicating that it was the clinician’s responsibility to make the final diagnosis.

That’s enough background for now, but just keep this schema in mind when we get stuck into David Lyell’s second paper about adverse events resulting from the use of machine learning devices.

DAVID LYELL:     So I’m David Lyell, I’m a postdoctoral research fellow at the Australian Institute of Health Innovation at Macquarie University. My interest is how humans use decision-support systems. So there's a lot of lessons that we already know from heavily automated areas, like aviation, that apply equally to AI. I have started to say it is just a fancy method of automation. I suppose the important difference is how is this automation intervening in the task? And is it doing something that people can rely on? Or do they have to hold its hand and double check everything it does?

RITA MATULONYTE:       My name is Rita Matulonyte and I’m a lecturer and researcher at Macquarie Law School in Macquarie University. I specialize in technology law and intellectual property law. And my special interest is in law and regulation of artificial intelligence technologies.

MIC CAVAZZINI: Okay, David, let's start with that paper that looked at all the different kinds of machine learning devices approved by the FDA. Of the 49 devices you identified as being practitioner-facing clinical tools, 14 of them, 29 percent, were just providing additional information analysis to help human problem solving, but no explicit recommendation. Is this the kind of interface that clinician users are most comfortable with, in that they’re not deferring judgement to the machine?

DAVID LYELL:     The clinicians I've talked to (and I'll say upfront that these are people I encounter in my travels looking at decision support, so it's probably not a representative sample), a lot of them like the idea of a second pair of eyes. And they like the idea of AIs that make their life easier. But there's a bit of an uneasy relationship because they like the idea of it, but they don't trust it, and they feel like they own the decision, and they have to be responsible for it. So there is that tension.

MIC CAVAZZINI:               Rita, does it also make lawyers more comfortable when it’s very clear cut that the decision-making is still with the user rather than the machine? Most devices whether they were helping with analysis or giving outright diagnostic recommendations came with a disclaimer like; “Should not be used in lieu of full patient evaluation or solely relied on to make or confirm a diagnosis.” That was from a Knee Osteoarthritis Labeling system called KOALA. Or a more explicit disclaimer; “Interpretations offered by device are only significant when considered in conjunction with healthcare provider over-read and including all other relevant patient data.” That was from a digital stethoscope called eMurmur. So Rita, are such disclaimers from developers pretty bombproof? Would the liability always rest with the clinician?

RITA MATULONYTE:       Yeah, so I suppose these disclaimers essentially they don't set any new rules in this space. Clinicians always kind of had different tools and they had to use them together with all the other tools they have. So the main legal standard in the field is that the doctor has to take all the reasonable steps in order to prevent harm. And I suppose this just confirms that doctors should not rely on AI as the only and single tool they can make the decision on, and they have to use it in conjunction with other procedures that they normally use, and consultation with doctors.

But of course it has to be reasonable, right? So it doesn't mean that they have to explore all the avenues available there. So they would have to act as any other professional in the area would act. If they've done everything that a reasonable expert would have done in the situation, and still ended up with an incorrect result, then we'd have to see what happened there. Maybe there was a defect also in the initial tool they used. And if that essentially led to the incorrect result and to the harm, then we could talk about, possibly, manufacturer’s liability.

What you might think of as an allocation of responsibility by those disclaimers doesn't mean that the manufacturer of those devices will never be liable if, for instance, the device is defective: let's say it provides biased results, or its error rate is much higher than claimed. Or if there isn't sufficient information provided about the device, for instance, about which particular situations it can be used in, and that's why the doctor is using it in a situation for which it's not suitable. So in this case, these sorts of disclaimers won't help, and the manufacturer can still be held liable and responsible for the harm that was caused at the end.

MIC CAVAZZINI:               Yeah, we'll pick apart some of those different examples. But as you say, probably the pathway you describe is more a question for that field of specialty, the consensus of best practice. You know, they're going to decide the most appropriate way to fit this decision aid in, not the lawyers.

RITA MATULONYTE:       Yeah, exactly, peer opinions, that's what lawyers refer to, and established best practices in protocols are very important for doctors in this case, you know, how they're supposed to use this device. And of course, let's say, AI tool—if there are no other ways to double check the advice, and the doctor relies only on that advice, it still might be reasonable, right? And so, it depends exactly on the particular tool, and how the doctors’ community agrees, where it fits in the entire workflow.

MIC CAVAZZINI:               Makes sense. Most of the examples we’ve talked about in these last few episodes have been of “assistive” AI and some people prefer to call it “augmented intelligence” rather than artificial intelligence. But there will come a time when certain AI tools become recognised as the standard of care. In which case you could end up in the inverse scenario where a clinician who chooses not to defer to the AI, and uses their own judgement instead, could be accused of “disregarding best practice”. Does that sound feasible, Rita, or is that a question more for the Colleges?

RITA MATULONYTE:       Yeah, I think it's a question that's been raised also in the legal literature, what happens then? Well, in law, there is this rule that if a clinician acts according to established practice, and in compliance with peer opinion, they are generally safe from liability. But that's not without exceptions. So the courts have acknowledged that, first of all, those established practices might be not reasonable. So sometimes, even if courts were provided evidence that doctors have been behaving in a particular way for a while, it doesn't mean that that's a sufficient standard of care, and they might have established an even higher one.

In other cases, as far as I understand, sometimes there is disagreement among the doctors as well about what the best practice is. So there are several different best practices essentially. And so in that situation, even if the doctor deviates from the best practice of applying AI to one particular situation, if they have sufficient support from their peers that deviating from the advice of AI in that particular situation is reasonable, for sure the courts will not punish them for that.

But there should be a good reason to do that, right? So if they say, like, “Look, I know, this skin detection tool is really good on white population, but from my experience, it's not very accurate on black populations, and so that's why I don't trust this result, you know, I take a next step and maybe deviate from the recommendation and do something else,” it will be perfectly fine. So yeah.

DAVID LYELL:     The other thing I'm just thinking of—going back to the automation literature—is that there's kind of a point where something goes from being automated—so where you have a machine that's doing something that a human would otherwise do, to being a machine task, where it is no longer capable of being performed by humans. So, perhaps a bad example is the starter motor in a car. So early cars were crank-started and at some point you could crank-start it and you'd have a starter motor. But modern cars can no longer be crank-started—it is a machine task now. And there is that crossover point, and I suppose that this is going to be a consensus amongst all the actors, you know, at what point the technology is good enough to stand by itself.

MIC CAVAZZINI:               Yeah there's a pretty high bar and again, as you say it's, it's not a new problem. In his book, “New Laws of Robotics” Professor Frank Pasquale from the Brooklyn Law School presents the example of prescribing software that identifies possible drug interactions. Already there have been cases of clinicians who think they know better, or they turn off the alerting system because it’s too annoying. But Pasquale says; “courts have also recognized that professional judgment cannot be automated, and they are loath to make failure to follow a machine recommendation an automatic trigger for liability if there are sound reasons for overriding the Clinical Decision Support System.”

DAVID LYELL:     Actually, for my PhD thesis, I looked at automation bias, so the risk that an automated system could adversely bias people. And talking to a Professor of pharmacology he made some comment at some point that everything interacts with everything. And so it does kind of seem that with medication alerts is that it alerts every interaction, but most doctors will go, “Yes, I know that. But that's not a sufficiently risky thing. It doesn't outweigh the benefits of the medication, so I’ll prescribe that anyway.”

MIC CAVAZZINI: It’s the boy who cried wolf.

DAVID LYELL:     Yeah, and so then this is alert fatigue, where people kind of just go, “This is not telling me anything useful. And it's just easier for me to do this task by myself and ignore what the alerts are saying.” So there's like this continuum of use; we've got automation bias on one hand, where there's an overreliance on it, where people are unquestioningly accepting it; and on the other hand, there's disuse where there's this fatigue, and people just switch off. And what manufacturers seem to be assuming (I don't know the legal implications of it) is that clinicians can effectively discriminate good from bad decision-support. And this, I think, is making a lot of extra work for doctors, because it's not just getting it, they've got to get the advice and then evaluate it.

MIC CAVAZZINI:               Interestingly, while the scenarios I’ve described are the ones that most easily come to mind, at least to my mind, David, I don’t think any of them were actually reflected in your audit of adverse events reported to FDA’s medical devices database, which was a first-of-its-kind study. So this was just published last June in the Journal of the American Medical Informatics Association, and in the six years between 2015 and 2021 you found that there were 266 events involving AI-equipped devices reported, associated with 25 unique devices. And only 28 reported events were classified as errors with the machine learning algorithm itself.

45 of the 266 adverse events resulted in patient harm, and they were certainly not all machine errors. But in two cases there were deaths. One of the fatalities which stood out was a machine error, but it was reported anonymously, and you weren’t able to corroborate it. But it was said to be from an insulin-dosing system that treated hyperglycaemia way too aggressively, with several patients developing metabolic and EKG changes as a result.

DAVID LYELL:     The report was quite vague, it listed several adverse outcomes. But what it seemed to be expressing was the reporter’s concern with how aggressively it treated hyperglycaemia. It wasn't so much that the device malfunctioned, but it was kind of almost like a disagreement with the design philosophy of it.

MIC CAVAZZINI:               Let's assume that it was a problem with the model, the AI model. Rita, is it easier to put that sort of error at the feet of the developers?

RITA MATULONYTE:       Well, certainly it is, if it is an error. So to say if it is a defect in the device, the manufacturer of the device would be held liable for that.

MIC CAVAZZINI:               Even with those sort of disclaimers.

RITA MATULONYTE:       Yeah, even with the disclaimers, exactly. So, in Australia and overseas as well, actually, to prove the liability of a manufacturer under so-called product liability laws, you don't even have to prove fault. So, as long as you prove that the device was defective, that's quite a straightforward case of manufacturer liability. And disclaimers, even if attached to these products, would not help them to avoid liability.

DAVID LYELL:     With a software system, what would make it defective? Or how would courts determine if the device is defective?

RITA MATULONYTE:       Yeah, it's a good and complex question. And actually, if we go a bit deeper into the topic, there is still a bit of a disagreement whether those product liability laws apply to a software device at all, whether it's a product. But courts increasingly accept that and apply even those kinds of defective product laws to these. So that's the first step, and then to prove that the software is defective, I suppose it will be very hard work. And of course it will rely on expert evidence. I imagine they will have to talk to manufacturers of those sorts of software products and say, “Okay, do you think the design adopted for this particular device followed the industry standards?”

You know, and then how it was implemented, trained, whether data sets used were appropriate, or maybe they were intrinsically biased and you didn't inform users about that. Because you can't rely, probably, on saying that the device does not perform 100 percent accurately, because none of the devices do; there always will be an error at a certain level. But if you prove that the device was, like, 80 percent incorrect when it's claimed that it was 80 percent correct, that's, of course, another piece of evidence to show that there is a defect in there.

MIC CAVAZZINI:               I talked in the last episode with Sandeep Reddy and Paul Cooper about the governance models that the TGA and the FDA are trying to apply, and how hard that is given the explainability problem of these black box models. What standards should we hold them to? And the response from Brent Richards was, in fact, “well, the same standard that we hold every drug to, you know, RCTs. We don't necessarily know how they work, but we know what good patient outcomes look like.”

But I wish I'd read your BMJ paper before, David, because you found that just under half of the hundred-odd devices approved by the FDA went through the softest approval pathway that sees the device as substantially equivalent to existing products. The other half went through the de novo registration but through a gentler door for low to moderate risk devices. Only two devices out of the whole lot went through the most stringent pre-market approval pathway that requires scientific review, clinical trials, quality and safety assessment. Is that appropriate given what you now know about adverse events?

DAVID LYELL:     With the de novo ones, it's not as extensive a clinical trial as the premarket approvals, but they do provide some sort of evidence. There's a thing they refer to as a pivotal trial. Most of the devices that we looked at, they're approved with a premarket notification, which meant that the manufacturers established substantial equivalence to an existing device on the market. So there's probably some legacy reasons why they are designed the way that they are. And even some of the premarket notifications do provide some data, so they might have done a small study or something like that. But yeah, the full clinical trial is always done for the class three devices.

MIC CAVAZZINI:               I'm still not sure whether the standards are as high as they would be. I mean, when there's a bad outcome from a drug that's been approved, or a surgery or prosthetic device, implant, we know about it, it's in the headlines. You know, Johnson and Johnson ended up with $12 billion in suits against it for their rushed vaginal mesh rollout, because they hadn't done enough trials. Are you satisfied that the approval process for these devices is within the same bounds of safety?

DAVID LYELL:     That’s the thing, we need clinical evidence that it is actually improving outcomes. We also want to test the weak points where we think it might fail. So we know some of the places where AI might be sensitive. So it's data intensive, right? So we should be testing different qualities of data with it, to make sure it is kind of robust to violations of some of the requirements.

MIC CAVAZZINI:               You didn't get the impression that the FDA was just cutting corners to get these things out to market?

DAVID LYELL:     My impression, and also with the TGA in Australia, is that it's set up around a risk-based assessment. So the same as like a drug trial, and all the rest of it, and they are being evaluated against that. And so I think we can have a good amount of confidence that the regulators are being very diligent. And they are very aware of the new challenges that are brought by it. But at the end of the day, it's in the clinical testing. You know, we need the clinical evidence.

I was just going to quickly add that the regulatory process is kind of built around the product classes. So it's kind of like the de novo establishes that the use case is safe and all the rest of it, and then these other devices, sort of, piggyback on it. So it allows for an incremental thing. With the triage devices that we talked about, there was a de novo, which was the one that introduced that use case. So it introduced radiological triaging notification software, and then the premarket notifications would then approve substantially equivalent devices. It does seem like there's some evolution where some mistakes have been picked up along the way. There was a letter from the FDA to providers, warning them about triage devices for large vessel occlusion (so this is ischaemic stroke), and it was warning people that you can't rely on this for diagnosis, it’s just for notification.

The other thing that came out is that doctors need to be aware that it only checks for occlusions in certain vessels. And I did notice that after that there was another premarket notification for one of the devices I knew about that specified that it only checked for large vessel occlusions in the M1 artery. So if a user is assuming that it does something that it doesn't, then this is where there's a risk of bias, because you could assume that it has checked all the arteries in the brain where it's only checked a couple. Would that be a defect actually, where the labelling hadn't specified that it only checked for occlusions in certain vessels?

RITA MATULONYTE:       Yeah, so certainly, certainly it is problematic, right? If their description claims that it can do more than what it actually does, and then a doctor relies on it, this particular lack of proper description certainly would weigh in favour of attributing liability to the manufacturer.

MIC CAVAZZINI:               The majority of events in your audit, David, were classed as errors in data acquisition but this category was inflated by one particular device for mammography that aborted mid-scan on 184 separate occasions. So that’s something like two thirds of all that were reported. There were also some clear examples of plain old user error. There were 11 reports out of the 266 that were classed as errors in commanding the device. A couple of these involved surgical machines or radiotherapy devices going off target because the landmarks hadn’t been calibrated properly by the users. Is that enough to call this negligence?

RITA MATULONYTE:                       Yeah, so it will depend, of course, on what information users are provided, what use instructions they're provided. If they are told that it's up to them to calibrate or recalibrate the devices regularly and instructions are provided on how to do that, then it will be on them, right? But if the device is initially not properly calibrated, and/or the instructions are not sufficient, or sufficiently clear, and the information is not clearly communicated, then it might be, you know, the fault of the manufacturer for not ensuring the safe use of the device on their side.

MIC CAVAZZINI:               Like all these things there’s a sliding scale. One specific example that resulted in a death involved a Doppler echo that failed to detect an acute mitral valve insufficiency, leading to a delayed diagnosis. It appears that staff at this particular centre had used their own calibration set rather than images provided by the manufacturer. I’m a little surprised it wasn’t better trained out of the box, but as long as it’s all spelled out in the instructions, “this is what we’ve provided, this is what’s required of you.”

RITA MATULONYTE:                       Yeah.

MIC CAVAZZINI:               And then finally in your catalogue of catastrophe, David, there were seven examples of contraindicated uses. Most of these were consumers putting too much faith in the ECGs on their smart watches, but one was a clinician misusing the guidance from an insulin-dosing device. You noted that across the board, manufacturers were quick to point out when “there was no device malfunction”, but rarely acknowledged problems of user interface or interoperability issues with other IT systems. Should they be doing more in the way of testing out the user experience before launching products?

DAVID LYELL:     I suppose that's the critical thing is that it's not just an AI algorithm, it's part of a system that needs to improve healthcare outcomes for patients, and it's going to be making sure that whole system works together and eliminating the variability that could be introduced into that system. Again, we know from the automation literature is that when you have an automated decision support system or something like that, I read someone describe it once as the brightest blinking red light in the cockpit. It garners more attention than every other source of information that's available.

MIC CAVAZZINI:               So the very last thing, so we, I haven't come across any actual events that went to court of medicolegal cases involving AI. But there are examples from outside of medicine which might be informative, and hopefully you can guide us through them. So one of them is the Uber self-driving car that killed a pedestrian during road testing in Tempe, Arizona in 2018. A year later the company was found by a local court not to be criminally liable. The car had been traveling at the speed limit and had alerted the safety driver that they should take control of the vehicle given the night-time conditions. But the driver had been watching their phone instead, and was later indicted on the count of negligent homicide or manslaughter. That trial has been delayed because it’s still such a complex area. But I think that outcome shouldn’t scare listeners too much, should it Rita, given that the driver was so clearly pushing the machine outside of its proclaimed performance limits?

RITA MATULONYTE:       Yeah, I suppose this case can only encourage the users of those AI devices, being in medical field and elsewhere, to kind of get informed about the device. Read the instructions, how to use it, and how not to use in which situations and follow them, and don't overtrust it. And that’s why we’re talking here, I think, to kind of try to highlight its limitations. And always, act very carefully, as long as there are no established best practices, standards, just be very careful, and take all precautions you might have. But still, I suppose, use them if you find them useful. You know, we don't want to discourage use of potentially beneficial technology overall.

MIC CAVAZZINI:               David, some final reflections?

DAVID LYELL:     Yeah, I'm, I'm going to present a different perspective on this one. I kind of think that…

[loud phone alarm sounds] That is an example of…

MIC CAVAZZINI:               How to overcome alert fatigue?

DAVID LYELL:     I can't turn it off. But this is a good example of how like—because technology has to fit the use cases. It has a function where, apparently, if you tap the power button five times in rapid succession, it assumes there’s an emergency and it calls emergency services, and I can't find out how to disable it. So, I mean, this is probably a good lead into my point here. Is that there's a lot of enthusiasm about what can be done with AI and all the possibilities, but I think our challenge is coming up with wise use cases.

And I think that there is a risk of setting humans up to fail. A self-driving car, it changes—being a safety driver is not the same as actively driving a car. And I think it's actually a more complicated task, because when you're driving, you're aware of what's going on and you're engaged with it. For the safety driver, they are supervising. So it's kind of like a supervising learner driver or something like that. So the safety driver they have to recognize the hazard. Then they have to recognize the car hasn't seen that hazard and I need to take over and that's going to that's going to lengthen braking distance. And I saw the dashcam video of it and this was something that happened—it was a split second,

MIC CAVAZZINI:               The crash itself was a split second, but if the car had told you 10-15 minutes earlier, “it's too dim for my sensors to work properly”

DAVID LYELL:   Did it, though?

MIC CAVAZZINI:               That was the impression I’d got.

DAVID LYELL:     I kind of think, with the –I think that the safety driver’s legal peril at the moment is because their job was to be a safety driver, and they were demonstrating complacency with it. So I think that yeah, to start off with the safety driver has a more difficult job than someone who's just a driver. And we know this from lots of research, from aviation, from, from all these other things is that humans are very poor monitors for rare events in highly reliable systems. And from what I understand is the Uber self-driving system was very, very reliable. And so some applications have been talked about in medicine, it’s the same thing, there's this assumption that the user will be in a position to recognize when AI is wrong.

And you are asking clinicians to verify the AI, like to supervise it. If we've got something's very complicated, it's not really explainable, is a clinician in a good position to be able to effectively supervise and monitor and control it? And to approve what it says? And my fear is, is that there could be use cases that set humans up to fail if something goes wrong. And so, I think we've got to make sure that the human’s part of the task is something that is realistically achievable for them. Wise use cases are ones that enhance healthcare but we got to do it in a way that doesn't set humans up to take the fall when something goes wrong.

MIC CAVAZZINI:               So the prosecution of the Uber safety driver ended on July 29th. Instead of being convicted for manslaughter they pleaded guilty to the lesser charge of reckless endangerment, which came with a three-year sentence of supervised probation. For David Lyell, this doesn’t satisfactorily resolve those underlying tensions you heard him describe. An investigation by the US National Transportation Safety Board did take issue with Uber’s software in that it only expected pedestrians in conjunction with marked crossings and not those pushing a bicycle as the victim had been.

Another question that hasn’t yet been explored by the courts, and which I didn’t get time to discuss with my guests, was how inexplainability of deep learning networks might affect determination of medicolegal responsibility. Rita Matulonyte has an excellent paper examining this in the International Journal of Law and Information Technology. Inexplainability would make it very hard to show that erroneous outputs of a model were due to oversights by the developers. But by the same token, how would you prove that the operator was at fault instead? And if the developer included some of those interpretability aids we talked about in the last episode, would that necessarily broaden the clinician’s scope of liability?

According to Rita Matulonyte and colleagues, such questions don’t really change the fundamental principle that “For a clinician to be liable under the tort of negligence in Australia, a patient would need to demonstrate a duty of care that was owed, a breach of that duty and damage suffered that is causally related.” Rita Matulonyte and colleagues argue that while explainability of neural networks is not always achievable, transparency about the design and intentions of the models might be a more important glue for ensuring trust; trust from clinicians in the developers, and from patients in the clinicians.

“A lack of explainability does not inhibit patient autonomy (nor their relationship with the clinician or trust in the medical system generally)... Patients have traditionally accepted the opacity of medical decision-making, whereby diagnostic procedures [like a biopsy] are usually embraced without unduly interrogating the practitioner’s method of determination…. Clinicians should be capable of providing such general knowledge about the AI technology they use (and should be able to outline to the patient in simple and understandable terms the advantages and risks), so that transparency around technology would be ensured.”

I’ve linked to this paper and all the other reference material we discussed at the website racp.edu.au/podcast. There you’ll also find a transcript, credit to the members of the podcast editorial group and to the musical composers who you’ve heard. For now, I want to thank David Lyell and Rita Matulionyte for contributing to this episode of Pomegranate Health. The views expressed are their own, and may not represent those of the Royal Australasian College of Physicians.

Stay tuned for the final episode of this series, where we’ll take a quick look at the large language model ChatGPT4 that has taken the world by storm. Are such AIs just a gimmick, or do they have a place in medical practice too?

If you found this podcast informative, please share it with your colleagues, but start them back at episode 95, where we explained exactly what machine learning is. I’m Mic Cavazzini, and this episode was produced on the lands of the Gadigal and the Walumadagal of the Eora nation. I pay respect to the storytellers who came long before me. Thanks for listening.

