MIC CAVAZZINI: Welcome to Pomegranate Health, a podcast about the culture of medicine. I’m Mic Cavazzini for the Royal Australasian College of Physicians. This is the second part in a trilogy about artificial intelligence in healthcare. In the last episode we gave a very brief introduction to the different types of machine learning models and how they are trained and tested. While there are many processes within the health system that could be facilitated by AI, we focused on those that you might soon be referring to directly in your clinical practice; prognostic and diagnostic aids that will require your interpretation and your trust.
The allure of such tools is that they can help overcome some of the natural limits of human cognition with regards to working memory and attention. We often use examples from medical imaging because visual pattern matching is an area in which AI has really excelled. Consider that a radiologist typically reviews 80 to 150 scans in a day, although in the most stretched settings that number can double. The repetitive nature of the task, that also requires careful focus, makes it cognitively taxing. But computers don’t need coffee to stay alert and can churn through images more quickly than any human.
You’ve probably heard it said that radiologists are on their way to becoming obsolete, but it doesn’t really work like that. AI models make mistakes too, and humans will be in the picture for a long time yet to supervise them. If you’re in any doubt that this is a big data problem, consider that an average-sized radiology department pumps out around 50 gigabytes a day, and globally that adds up to over 10 million gigs in a year. It’s estimated that the field of genomics produces four times as much data again and will reach an order of magnitude higher by the next decade. No surprise, then, that according to some analysts the rate of data growth is faster in health than in any other sector.
Artificial intelligence is a useful tool to help us make the most of all this data, and maybe buy some time back for doctors to spend with the complex patients who really need it. But the way all this pans out will really depend on how seamlessly the machine learning devices fit in with the clinical workflow. Which aspect of clinical decision-making do they support and how are the consequences of error mitigated? Because of all the ergonomic factors related to the use of AI decision support, only a small fraction of research projects make it all the way to clinical implementation. Professor Enrico Coiera of Macquarie University has described the translation gap in a 2019 article for the Journal of Medical Internet Research titled “The Last Mile: Where Artificial Intelligence Meets Reality”. I spoke to him online, along with another Fellow of the College who is leading the digital transformation of the Queensland Health system. Here’s Associate Professor Clair Sullivan.
CLAIR SULLIVAN: Clair Sullivan. I'm an endocrinologist at Royal Brisbane and Women's Hospital. And I'm also the director of the Queensland Digital Health Centre at University of Queensland.
MIC CAVAZZINI: Thank you, Clair. Enrico, tell us where you're coming from.
ENRICO COIERA: So, Enrico Coiera, and I'm a Professor of Medical Informatics at Macquarie Uni, where I also lead the Centre for Health Informatics. And I also, for my sins, lead the Australian Alliance for AI in Healthcare.
MIC CAVAZZINI: Alliance: that sounds rather sinister, doesn’t it? But our aim is to dismiss any sinister notions about AI. In the first episode we already explained how you’d train and test a model. And I used the example of a neural network that was trained to distinguish malignant skin lesions from their benign cousins. And on a set of new images from each disease classification the machine performed as well as or better than twenty-one board-certified dermatologists. The area under the ROC curve was 0.94. Enrico, such results often get a lot of headlines, and they’re often hyped up by developers. But for you, they’re still very much ‘in silico’ experiments, not real world validation.
ENRICO COIERA: Absolutely. So it's very common to build an AI model, test it on data that you've got, and say, “Look how good it is, you know, when I compare myself to humans answering this, we're as good or better”. But often those results mean nothing because they're not in the real world. So Clair comes from the clinical world and she knows that days are busy and messy and workflows get disrupted, decisions get made in all kinds of difficult circumstances. We know that hospital A and hospital B just do things slightly differently. We know that the images from two X-ray machines are going to be registered differently. So the reality is that you've got to test these models in real clinical practice to actually understand not whether they could in principle make a decision, but whether in the real world decisions get changed by humans in a way that's meaningful for patients. So that means doing what we've always done in healthcare, which is conduct trials to show in fairly realistic settings whether or not this thing adds or subtracts from the care we're delivering. So it doesn't mean that those studies that you talked about aren't useful (they’re indicative), but it's like an efficacy, not an effectiveness study.
MIC CAVAZZINI: That’s a perfect intro and of course we’re going to dissect those questions more. And just like any new device in the clinic there are also ergonomic considerations to think about. Where and how does the AI fit into the clinical workflow? One random example I came across: there was a Japanese study where a model was being validated that would classify small polyps imaged during colonoscopy and suggest whether they needed resection or not. The AI was super accurate and pretty easy to use because it didn’t require the user to draw a region of interest or slow down their practice. But even though this only added a couple of minutes to the procedure, endoscopists didn’t bother with using the aid for about 28 per cent of the polyps they were imaging. And the authors suggested that some proceduralists were put off by the extra requirement to snap good images to feed the AI. Clair, in your leadership roles and communication roles how much resistance do you come across from those that don’t want to change the way they practice?
CLAIR SULLIVAN: Oh, I think that's just normal human nature, isn't it? That's totally fine and to be respected. Doctors in particular have spent decades training and learning and perfecting various components of their practice, and often have been doing it that way and delivering world class care for some time, so absolutely, I respect that, and we all should. I guess if you want people to change you have to give them a really good reason to change and technology itself is not a good enough reason. So about 10 per cent of us are early adopters and we just love new technology almost for the technology's sake. In some ways, those people are a little bit dangerous. And actually it's the next component of people who are a bit more thoughtful about what they do, that I find interesting. And if we are to adopt new technologies or ways of working, they have to have very clear benefits which outweigh the downsides. And those downsides for busy clinicians often include increased cognitive load or slowing us down. So I don't think we should just adopt things for the sake of adopting them, we should have a very clear purpose and reason and benefit for doing so.
MIC CAVAZZINI: Sticking with the dermatology theme we started on, there was a 2020 paper published in Nature Medicine that gives an example of the ergonomics of the device. So the researchers were investigating the user experience of an AI that could classify seven different types of lesion. Three hundred dermatologists were recruited online to participate and were assigned to receive the information in different ways. One group received a single score for each image estimating the probability that the lesion was malignant. In another group the AI pulled up similar images of known pathology for the clinician to compare to. But the format that most improved diagnostic accuracy was one where a probability match was given for each of the seven different types. Clair, to me this example suggests that you want the technology to support a clinician’s own judgement, but without distracting them or demanding more attention. Would that be fair?
CLAIR SULLIVAN: Yes. I agree. That's correct. Because the algorithm itself will be good at a limited suite of things and what you're encountering may be outside that suite of things that it’s encountered before. So for example, you could see something wildly rare in that lesion that's just never been in the learning set before.
MIC CAVAZZINI: And have developers so far been good at recognising what clinicians need from the machines?
CLAIR SULLIVAN: I think they've trained the machines to be very good at tasks. And that's terrific, because some tasks, such as reading retinopathy pictures, we don't have the workforce to attend to. So I fully support training the algorithms to be good at tasks. What they're not good at, though, is care. And care is a combination of tasks, of perception, of empathy, that extends beyond a task. So I think the tasks are important, and having machines to take some of those tasks and make them more efficient is wonderful. But that's no substitution for care. And that's a bigger conversation.
MIC CAVAZZINI: In this example I’m thinking more about the way the information is provided and how it assists your clinical judgement. It didn’t occur to me that there could be so many ways the machine could spit out that number.
CLAIR SULLIVAN: Oh, yeah. And that's really important for things such as explainability. So how did they get to that percentage? There's a whole host of things around how it should be displayed to a clinician, I think.
MIC CAVAZZINI: The authors from this very international collaboration I just mentioned wrote: “We find that good quality AI-based support of clinical decision-making improves diagnostic accuracy over that of either AI or physicians alone, and that the least experienced clinicians gain the most from AI-based support.” But in some experiments they had the AI deliberately give incorrect suggestions, and even the most experienced clinicians would often be thrown off by the incorrect suggestion and fail to correct the error. Enrico, you sometimes use the term “automation-induced complacency”. Could we also think of it as an example of the Framing Effect or of Diagnostic Momentum?
ENRICO COIERA: So the other term we use is automation bias. And it refers to people putting too much trust in a tool like an AI. So, for example, if you're involved in a screening task, and the AI is usually right, then after a while, you just say, “Great. Tick, tick, tick.”
MIC CAVAZZINI: Don't need to double guess it.
ENRICO COIERA: Exactly. And there's another phrase, which we use in monitoring, it's called “out of loop unfamiliarity.” So if you're an anaesthetist, and you're not paying attention to the monitors and something goes wrong, there's a period where you've just got to come back and pay attention. And that gap is quite dangerous. So what happens with automation bias is that people trust the machine, assuming it's perfect. But it's not and it makes mistakes. And so that's what happens.
But it's not just an AI thing. So you talked about more junior people benefiting from AI. We know, from all sorts of research over the last 30 years—you know, I can think of an experiment we did about 20 years ago at Westmead providing handheld devices with access to labs in ICU, versus access to protocols and other data. And of course, the seniors just took the labs, they didn't need anything else and the more junior residents and registrars often will go to the material for support. So, it's what you would expect, really. So it's a tough thing, because how do you counter it? It's not easy. It's a human behaviour, it's kind of built into the brain that we've got. Training seems to make some impact on it. But it's actually one of those lurking concerns.
MIC CAVAZZINI: Do you want to say something about the electronic prescribing that you’ve written about?
ENRICO COIERA: Okay. Look, it's well known that clinicians will listen to what the prescribing system tells them to do. And I can think of one classic case where something was prescribed that shouldn't have been prescribed and the patient died. And when asked, “Why did you prescribe this cocktail?” the clinician said, “Well, I presumed the system was checking it”. And in fact, it turned out that the checker had been turned off. So that's a very good example of trusting that the machine is going to save you when you're out of your comfort zone.
MIC CAVAZZINI: Devolving too much responsibility.
ENRICO COIERA: Exactly. And so when people talk about replacing clinicians with AI, it's the wrong model. The model should be: this is a new member of your team. And you know your team members' strengths and weaknesses, and you work together as a team.
MIC CAVAZZINI: I can’t get through this without mentioning Professor Eric Topol from the Scripps Institute who’s written so much on AI in healthcare. In a 2019 review he noted that even an area under the curve of 0.99 is not indicative of clinical utility, as you mentioned, Enrico. He pointed to a deep learning model that had been touted by its developers as making radiologists obsolete because it was so good at detecting pneumonia in a chest CT. But that’s the only thing the AI was doing, whereas a radiologist would be considering many more alternatives. Another machine that was trained to make 14 different diagnoses actually did very poorly at picking out pneumonia. Enrico, presumably these narrow AIs will be more feasible to implement, but will radiologists always have an advantage with more undifferentiated patients?
ENRICO COIERA: That’s a very interesting question you ask, actually, and kind of the dogma in AI for many years has been that ‘knowledge trumps reasoning’. So if I've seen it before, I'm going to be better, and that Sherlock Holmes can only do so much without knowledge. So in the early days of AI, there was a lot of focus on reasoning and being really clever, and really amazing work was done. But what we've really learned in the last 20 years is that knowledge is critical. There's a lot of pattern detection, so if I've seen that before, I know what it is. Or if I don't know what it is, then I probably have a procedure on how to resolve that. When you get down to the ultra-specialized things you really have to have specialist knowledge. Now, that's not a barrier; it's very straightforward to get access to the right sorts of data sets and knowledge bases to create this. It's really about experience, and there's a human experience story and there's an AI experience story, that are achieved in different ways. So yeah, I think I'll stick to knowledge trumps reasoning.
MIC CAVAZZINI: So is the radiologist that’s looking for multiple things in a scan, not just one thing, you mean that they're using knowledge and…
ENRICO COIERA: That’s right.
MIC CAVAZZINI: Okay, yeah. I just find the language confusing because in my head knowledge is the vast sum of information that the AI has at its fingertips, that we don't, whereas reasoning is a unique human trait, but you've kind of flipped that.
ENRICO COIERA: Yeah, machines can reason. I’m sorry about that.
MIC CAVAZZINI: We have to accept it.
ENRICO COIERA: Yeah. You know, Marvin Minsky was a very famous American AI researcher, and he wrote a book called ‘The Society of Mind’ a long, long time ago. The idea being that cognition is actually lots of different mini things working in concert together.
MIC CAVAZZINI: I found an interesting example: a second AI used in endoscopy. But there were some important differences from the one we already described earlier. The model could handle an image rate of 25 frames per second so it didn’t require the clinician to stop and capture a perfect image, which is great. And the AI wasn’t performing any diagnostic classification itself, it would just raise an auditory and a visual alert when it thought it had seen a polyp. Often, small polyps are missed by endoscopists. So this illustrates a very different kind of insertion into the workflow. Rather than helping diagnosis, it’s actually supporting endoscopists when they’re fatigued or distracted. Enrico, could that be a more natural fit that takes away some of the pressure about getting the diagnosis wrong?
ENRICO COIERA: Yeah, look, it's very clear that what we call the user interaction model is critical to success. And a lot of people when they think of AI, think of “the Oracle model”, you know, you tell it something, it gives you the answer, which is really not what works in health care. So what you described is really just highlighting features of interest. So it could be radiologists scanning images. Radiologists are trained to read images, and ultra skilled at it. But we also know those images get seen by clinicians in ED. So those very same images might be marked up very differently for an ED physician compared to a radiologist.
And you don't even have to provide a diagnosis. You could just say, “Oh, just look over here. This is something” and if they want to know what we think it is, they might click on it and expand. But usually just letting people know that there's something to look at is what's important. There's another model called critiquing where you basically input what you think should happen, and then it's just watching in the background. And if it sees something it disagrees with it says, “Oh, by the way, maybe you want to consider this instead.” And sometimes very cleverly, it'll say something like, not that you're wrong, but “Oh, in similar circumstances, 93% of your colleagues elected to do something else.”
MIC CAVAZZINI: That’s what they call ‘behavioural nudges.’
ENRICO COIERA: Exactly. So designing an algorithm and testing its performance has nothing to do with designing the interaction model, which is really where the rubber hits the road. And the wrong interaction model will see people not use the system. People already are very familiar with over-alerting, which is one interaction model, you know, telling people through popup boxes, things are on, so people just end up ignoring them. They're so sick of low specificity, high volume alerts, that they just don't read anything.
MIC CAVAZZINI: The comparison between those two endoscopy models; what does it say about the relative advantages of humans and machines in, whether it's pattern recognition or other cognitive demands? I mean, there was an example in Eric Topol’s book, “Deep Medicine”, where they did that classic gorilla amongst the basketballers test on radiologists. They overlaid a gorilla on the scan, on some x ray scans or something for cancer, and 83 per cent of the radiologists missed the gorilla waving its fist. So what does it say about the cognitive skills and where are we best to target the advantages of AI versus the advantages of humans?
ENRICO COIERA: Clair, you probably have a few ideas. I think it's where we need help. So I think image-screening tasks are a great place to, for example, eliminate all the obviously normal ones. You know, “These are all obviously normal, so don't worry, check them a bit later if you want. Look, I'm just not sure about these ones. Please, you have a close look.” And especially with workforce issues. I don't know how many people now do mammography screening, compared to the need for such screening. So my answer is, wherever there's need is where we do it. And early on in the piece there were a lot of people solving problems that were solvable, [but] not important. And often, what's important is very different. Clair?
CLAIR SULLIVAN: So I think there is some evidence around what sort of tasks are easier for AI. And those are tasks that are high volume, repetitive with a limited suite of transactions. So anything that is—and I want to say low value, but it's not low value, because it's important to the consumers—but that has a very limited suite of transactions and only a few decision points is a great place for AI to help us, because you don't need a PhD in endocrinology to make those decisions. So that's what the AI is for, to take away some of that low-hanging fruit. So that actually the cognitive load of the physician can be freed up from those relatively routine tasks, and instead, we can focus our minds on, what are we missing? What else is going on here? That high level synthesis. And also, what is important to this consumer? So should we, in fact, be screening a 96 year old who's in terrible pain from her terminal cancer?
I think getting the routine, relatively limited, high-volume tasks done by AI, I hope, will free us up to think about the higher level, very top-of-our-scope work in terms of clinical diagnoses. But also being consumer-centred in the way that we then apply this. So a great example is you see a very junior doctor, they are absolutely focused on getting the right dose of insulin for a high blood sugar, right? They spend all their cognitive time trying to work that out. You see a very experienced physician, they'll do the insulin in two seconds and then spend the rest of the consultation talking to that person about their lives and how they are and what they want and how they're doing. So, you know, freeing up that bandwidth to work at the top of our scope, I think would be tremendous.
MIC CAVAZZINI: And that's what Topol says about radiologists. Far from becoming obsolete, it’ll allow them to come out of that dark room and speak to the consumer and explain what the results…
CLAIR SULLIVAN: I’m not sure they will want to, though.
MIC CAVAZZINI: It’s a special breed?
ENRICO COIERA: It’s a very nice, dark room.
MIC CAVAZZINI: To stick with this example of the polyp detection AI, the trial included more than a thousand consecutive patients half of whom were randomised to an endoscopist who was receiving the alerts, and the primary outcome was adenoma detection rate. This came out to be 29 per cent for patients seen with the AI assistance, and 20 per cent for those receiving treatment by a mere human. So that result is pretty solid but most of the effect was driven by smaller polyps which carry less risk for malignancy and there was also an increase in the detection of polyps determined by the operator to be benign. So that raises the age-old question about screening and overdiagnosis, that you’ve addressed in one of your papers, Enrico.
ENRICO COIERA: Yeah, so, I don’t know if you’ve heard the 60:30:10 story? But 60 percent of care is in line with evidence, 30 percent is low-value, or waste, and 10 percent is harm. And those numbers are a global estimate, but most countries robustly behave that way. So a lot of what we do is unnecessary, and if you look at being even better at screening and detecting more prostate cancers or more thyroid cancers, which will never harm patients in the long run, leading to overtreatment, that's pretty bad. So we need to understand the costs and benefits of ever more accurate screening. If you keep on looking, you will find something.
MIC CAVAZZINI: Yeah, and the researchers also admitted that they couldn’t tell if the improvement in adenoma detection was down to the AI itself, or just whether the endoscopists were on their best behaviour. Because you can get the same result simply by having a second observer in the room, whether it’s a nurse or a trainee. I think that goes back to the point you made earlier, Clair, that we don’t just use the tech because we can if there’s a simple way to solve the problem. We could just be introducing more avenues for distraction and error.
CLAIR SULLIVAN: Yep.
MIC CAVAZZINI: In the colonoscopy example I mentioned before, the software actually has an onboard message reminding the user that “Doctor's diagnosis has priority. Please use this output as a reference.” I hope it has a nice polite voice as well. Enrico, you and Professor Ian Scott have published a checklist of considerations that a sceptical clinician should ask before trusting in the AI. One of these suggestions was to form a pre-recommendation view of what’s going on before getting advice from the AI. And, Clair, the same would be true for any test or scan, wouldn’t it? You don’t just order it because you’re not sure what’s going on.
CLAIR SULLIVAN: Yeah.
MIC CAVAZZINI: Sometimes? Well, the whole EVOLVE mantra is that you want to know what that extra information will do to change your practice. Is it a fair comparison, do you think?
CLAIR SULLIVAN: Yep, so pre-test probability is important. So for example, there's algorithms that can predict if you have COVID, from your cough. Now, if everybody's got COVID, and everybody's coughing, it's always right. So, yes, you do have to think about that when you're interpreting your AI. So if you think that somebody is having a low blood sugar, and they look like they're having a hypo, and you see the machine says, “Actually, their sugar is nine,” which is normal, you'd be very sceptical about what you're seeing on that blood sugar, because that doesn't match what your pre-test probability for that reading is.
MIC CAVAZZINI: And the ergonomic question comes into more relief when you also think about the different settings in which the same AI might be used. So let’s use as an example, a device recently approved in Australia to analyse fundoscopic images for diabetic retinopathy, age-related macular degeneration and possible glaucoma. You could imagine this helping an ophthalmologist churn through the easy cases and buy more time to focus their expertise on the borderline cases. But it could also be used by a generalist in an under-resourced regional setting as more of a screening tool. So those two clinicians will be consulting the AI in a different way and will action the knowledge in different ways, and there will be different safety thresholds. Is that the way we should look at inserting these into different settings?
CLAIR SULLIVAN: So I think you have to be careful because, again, if it's only a limited suite of tasks that that algorithm can do, such as detect retinopathy or detect maculopathy, as a screening mechanism it might be entirely appropriate. If it's an asymptomatic population that you're screening, and you have a specific screening task that this algorithm is very good at, that sounds entirely appropriate. If, however, you're coming to the ophthalmologist with visual disturbance, sure, it could screen for those three things. But if it found that, that might not be the cause of your visual disturbance, or it might find nothing, because the cause of your visual disturbance is, in fact, an optic neuritis, for example. So I think it's clear that there's a role screening for a particular issue or diagnosis. I think when you're coming with a problem, and the algorithm’s only been trained on a certain suite of transactions, in a specialist setting, there's more of an issue.
MIC CAVAZZINI: What I was getting to was the specialist should have the expertise to then peel those apart, whereas the generalist might not. So if they're using this tool, you want to have a referral pathway in place to pick up the false positives.
ENRICO COIERA: There's an example that's related which involves consumers. So smartwatches, which a lot of people wear, and a lot use them to diagnose AF and arrhythmias. We had a recently published series of cases where things went wrong. And in this particular case an individual had the smartwatch, didn't feel good, had chest pain, and it was reporting normal. Of course, because it's designed only to look at arrhythmias, it was not designed to look for infarcts. So care was delayed, because the message from the machine was, “Everything's fine”. And so that was not understanding the purpose and limits of the AI. So you talked about generalists, but consumers are accessing these tools and making clinical decisions.
MIC CAVAZZINI: Let’s change tack, now. You’ve already mentioned the many other ways in which AI could streamline the workflow beyond clinical decision-making. There are many efficiencies in the workplace, in resource allocation, and in the next episode we’ll talk about natural language processors like ChatGPT. I want to keep it centred around clinical practice, and use an example that Associate Professor Jonathan Chen gave in a presentation for the RACP. And he described an interface that works a bit like the algorithm-driven recommendations in Google and YouTube. You’re ordering a complete blood count and a smear and the AI gives you a list of other tests that often get ordered within 24 hours of those. OK let’s do a reticulocyte count too, because anaemia was high among our suspicions. Then it says, “do you want to add a ferritin, or transferrin, or a haptoglobin assay to your cart?” Sure let’s go with that to rule out haemolysis, he said. This type of tool has lower stakes than the diagnostic AIs we’ve been talking about. Clair, do you think it will be easier to fit into the workflow and create efficiencies, or are there just as many dead ends?
CLAIR SULLIVAN: Frankly, it sounds terrible. I mean, “adding things to your cart,” that’s a very capitalist approach, isn’t it? I spend my life talking people out of doing blood tests. So just because people have done them doesn't always mean it's the right test. You know, we live in an era of very, very scarce health resources, and those tests do cost a lot and also throw up furphies. So I'm not sure I would advocate for a system that suggested and encouraged additional pathology testing. That's a very American model of more and more and more. I suspect their investigation model and business model are very different to the Australian model. So I think that thoughtful use of nudge, if that's really what your question is, is really wonderful. Often, though, I suspect in our setting, it would be nudging people to reconsider investigations. Because, you know, those statistics about how much care we deliver is useless or harmful, I can see a role for that type of AI nudging to reduce unnecessary care, to reduce unwanted variation in care and to reduce harm.
MIC CAVAZZINI: Rather than “more and more and more”, I wonder if it's again, a solution without a problem. Like the problem is perhaps the inexpertise of the clinician who's not quite sure where to go, and to say, “Have you forgotten something?” is not the right way to solve it.
CLAIR SULLIVAN: Yeah, being thoughtful and maybe prompting or directing people to guidelines or to review articles about the best way to go forward might be better than purchasing things in your cart.
MIC CAVAZZINI: I mean, a whole section of his talk and his research, in fact, was addressing overuse. There’s evidence from US inpatients that as many as sixty per cent of all lab tests could be unnecessary. And his own research from Stanford University showed that over a four-year period almost 800,000 tests were repeats ordered within a 24-hour window, “including tests that are physiologically unlikely to yield new information that quickly e.g., white blood cell differential, glycated haemoglobin, and serum albumin level.” But let’s go back to the AI; his team trained a model to advise whether a given order was likely to yield new information, based on what the EMR contained at the time. Are either of you aware of any AIs like this already in operation? Any evidence about how productive they are?
CLAIR SULLIVAN: In terms of tests?
MIC CAVAZZINI: Yeah, in terms of those in-line—not so much the high acuity kind of diagnostics, but just the workflow assistance.
CLAIR SULLIVAN: I’m not. Enrico might be.
ENRICO COIERA: No, the thing that I would comment about this whole line is that it's associational learning, it's not causal learning. So we spoke earlier about how different populations will yield different rules. So, you know, you do something at Cedars-Sinai or Beth Israel on different sides of the US continent, you'll learn different things, and so you wouldn't necessarily transport the rules from hospital A to hospital B. So biases in data, all that sort of stuff start to become critical when you do this associational mining. I think it's been well-reported that when you do associational learning in the US you will build in the biases of care. So people who are socioeconomically or racially disadvantaged get one kind of care and people who are rich get another kind of care. And the associational learning system will learn that that's normal.
MIC CAVAZZINI: Jonathan Chen acknowledged that these tools should “inform physician decision-making but not dictate or replace it. Ultimately, medical testing decisions are always based on varying levels of diagnostic certainty even if practitioners are only implicitly aware that they are empirically estimating [this]… For example, blood cultures are not performed for every febrile patient because a credible risk of bacteremia is qualitatively recognized only in certain situations.”
CLAIR SULLIVAN: Yeah, exactly.
MIC CAVAZZINI: But rather than building it up from a priori rules, if the AI is learning from clinician practice it's actually adapting to good changes in practice. And guidelines and consensus statements take so long to develop that they might not keep up with the very latest.
CLAIR SULLIVAN: Well, the AI will reinforce the set it's learning from and I think that's really important to understand. So the associational mining, just by presenting what everyone else is doing will not help you implement new guidelines. In fact, it will reinforce the status quo. Then when you want to change practice, it's very difficult. So I think I disagree with that, actually.
ENRICO COIERA: Cos you’re also learning bad practice.
CLAIR SULLIVAN: Exactly, you’re learning bad practice. So, new guidelines, sure, you could pop up and say this is not consistent with the current guidelines, maybe have a little read.
ENRICO COIERA: And on the point about how long it takes to produce guidelines, there's a concerted global effort to turn papers and guidelines into computable objects. What does that mean? That means that every time a new RCT is published, we import the data and the results from that RCT into the living digital guideline. So it's always up to date. That was science fiction 15 years ago. That's an active area of work now, and I can imagine within a decade that it will not be uncommon to consult a living digital guideline.
MIC CAVAZZINI: You might have already delivered your verdict, but researchers in Israel have developed an AI a bit like the one I’ve described but instead it was learning about prescribing patterns from the complete electronic record. So for example, the AI would figure out pretty quickly that statins were prescribed only in adults, and those with LDL above a certain level, but never in people with liver disease. So then if you were going to write a script for a patient outside these statistical norms it would be flagged as a potential medication error. Again, this tool hasn’t been programmed with a bunch of a priori rules about what’s indicated and contraindicated. It just follows the behaviour of clinicians. Enrico, everything you’ve said about advantages and the risks of this agnostic approach stands?
ENRICO COIERA: Yeah, absolutely. Look, it's not that they're necessarily wrong. But really, association and causation are just different things. So it's interesting, but not necessarily actionable.
MIC CAVAZZINI: Before we go, the question most asked by my review group was: where are they most likely to first encounter AI in their day-to-day practice? These kinds of tools? Is that the diagnostic aids? Should trainees be taking this into consideration when they're thinking about their future careers as radiologists or pathologists?
ENRICO COIERA: I think people are using it right now. Every time you get a pathology result, you know, the little asterisks? Where do they come from? So it might not be deep learning, it might not be ChatGPT, but it's computers making decisions using some kind of knowledge base. So it's already there. You know, ECG, smart interpretation has been here for ages. It's billable in some countries. But it will become more and more pervasive. So, therapy planning; precision therapy recommendation; care coordination scheduling; the telephone you call at the hospital when you want to talk to somebody, quite likely after ten years we’ll have a bunch of ChatGPT-like people doing it for you. So work does change but making that natural and seamless is the challenge.
MIC CAVAZZINI: Many thanks to Enrico Coiera and Clair Sullivan for contributing to this episode of Pomegranate Health. They’ve both published a lot on this topic, and I’ve provided a long list of useful references and a transcript at our website racp.edu.au/podcast. Keep an eye out for the third and final part of this podcast series, which will be on the governance of AI in medicine. And you can carry on this conversation about how AI will affect practice at the RACP Online Community. Just look for the thread in the general community forum.
Dr Sullivan has also kindly given her time to a video titled CPD Simplified, explaining how to perform Category 2 and 3 activities under the MBA’s new framework. If you have any further questions relating to your own professional development, my colleagues are always happy to help via the address MyCPD@racp.edu.au. They really deserve a thank you for helping 97 percent of Fellows complete their Professional Development record by the first of June deadline.
If you’ve not explored the online resources the College produces for you, have a browse at elearning.racp.edu.au. There are courses in microbiology and divisional exam readiness. And the College Learning Series now has hundreds of lectures in paediatrics as well as adult medicine. These are tailored precisely to the training curricula of the Royal Australasian College of Physicians. But we have plenty of Fellows who find these summaries helpful too. There are lectures on every topic you can imagine and they come in packages centred around a particular body system.
If you have feedback or ideas for future podcasts feel free to send them to email@example.com. And please pass this podcast round to colleagues who might be interested. They can subscribe through any pod-catcher app like Spotify, Castbox and Apple Podcasts. There’s even an email alerts list at our website. I also want to thank the College members and staff who gave this episode a listen before it was published.
This podcast was produced on the lands of the Gadigal people. I pay respect to their elders past and present. I’m Mic Cavazzini. Thanks for listening.