Correlation vs. Causation: How Causal AI is Helping Determine Key Connections in Healthcare and Clinical Trials

Aaron Mackey
Lokavant

With all the excitement around artificial intelligence (AI), many in healthcare are exploring ways to use this technology to improve how we discover, develop, and test new medicines. While the hype around AI can be somewhat overblown, there are real ways it can significantly improve clinical research, specifically with decision-making for trial design and execution.

Most people are using ChatGPT and other foundation model-based AI agents to generate content such as images, videos, text, and code. Drug discoverers are using this form of generative AI to design new molecules, new chemistry, new gene therapies, and new medicines, while drug developers are using it to draft regulatory, legal, and patent documents and even marketing content.

But generative AI, despite all its strengths and accomplishments, also has its limitations. No matter how powerful its apparent reasoning capabilities may seem, it cannot be relied upon to correctly infer cause-and-effect relationships. More explicitly, generative AI “out of the box” cannot distinguish between apparent correlations and actionable causality. This is a problem. Causal AI, a form of artificial intelligence designed to identify and understand cause-and-effect relationships in data, is now in its pilot stage and can address this challenge.

Correlation is Not Causation

Correlation is a statistical measure that shows the direction and size of a relationship between two or more variables, but it doesn’t necessarily mean that one variable causes the other to change. Causation, also known as the relationship between cause and effect, is when one event is the result of an antecedent event. The phrase “correlation does not imply causation” means we cannot assume a cause-and-effect relationship between two variables just because they’re correlated.

In healthcare, this distinction is vital. We can easily (sometimes too easily) identify any number of correlations, but only a subset of these has true causal underpinnings. For instance, observational studies in the past have shown that people who regularly drink red wine or consume high doses of vitamin C live healthier, longer lives. But further research has revealed that these connections are tenuous at best, if not completely wrong! What scientists observed were correlations: two things that occurred in the same group, but one did not cause the other. People who consumed wine tended to be wealthier and able to access healthcare, and people who consumed megadoses of vitamin C were likely already concerned about their health and pursuing an active lifestyle.

“In the healthcare domain, leveraging traditional machine-learning approaches has helped to elucidate relationships between patient genetic profiles and response to specific compounds. However, as we venture into deep learning, identifying the impact and effectiveness of treatments is exponentially harder to pinpoint,” said Jonathan Crowther, Pfizer’s Head of Predictive Analytics and Lokavant Product Advisory Board member. “This is where causal AI is relevant, as it will help us home in on events that may be driving the relationship between patient and drug response that illuminates us with greater insight than traditional predictive models.”

Correlation may not equate to causation for several reasons: random chance, a third variable that makes the relationship seem stronger or weaker, or multiple interacting factors. Hypothesis testing and controlled experiments can help rule out false positives and confirm relationships; this approach is foundational to clinical trials.
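To make the red wine example concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than real data: the `simulate` function, the probabilities, and the effect sizes are invented. In this toy world, wealth raises both the chance of drinking wine and a health score, while wine itself has no effect on health at all. The observational comparison still shows drinkers as healthier; randomly assigning wine, as a controlled experiment would, makes the gap disappear.

```python
import random


def simulate(n, randomized, seed=0):
    """Return mean health of wine drinkers minus mean health of non-drinkers.

    Assumed toy model: wealth improves health; wine has NO causal effect.
    """
    rng = random.Random(seed)
    drinkers, abstainers = [], []
    for _ in range(n):
        wealthy = rng.random() < 0.5
        if randomized:
            # Controlled experiment: a coin flip decides, independent of wealth.
            drinks = rng.random() < 0.5
        else:
            # Observational world: wealthier people are more likely to drink wine.
            drinks = rng.random() < (0.8 if wealthy else 0.2)
        # Health depends on wealth plus noise; wine is deliberately absent.
        health = (10.0 if wealthy else 0.0) + rng.gauss(0.0, 5.0)
        (drinkers if drinks else abstainers).append(health)
    return sum(drinkers) / len(drinkers) - sum(abstainers) / len(abstainers)


observational_gap = simulate(20_000, randomized=False)
randomized_gap = simulate(20_000, randomized=True)

print(f"Observational gap (confounded by wealth): {observational_gap:.2f}")
print(f"Randomized gap (confounder broken):       {randomized_gap:.2f}")
```

Under these assumed numbers, the observational gap should land near 6 health points and the randomized gap near zero, even though the causal effect of wine is zero in both runs.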

Causal AI’s Crucial Role in Clinical Trials

Understanding the difference between correlation and causation is crucial when planning to use AI to improve clinical trials, as even small changes in how a study is designed, where it takes place, the patients it recruits, and how the trial is carried out can have a big impact on the outcome—operationally, scientifically, and commercially.

“Causal AI is a game-changer for clinical trials,” said Colin Hill, CEO and founder of Aitia Bio, the leader in the application of causal AI and digital twins. “By untangling complex biological networks and identifying true drivers of disease progression, we can make more informed decisions about drug targets and patient selection. This approach has the potential to dramatically improve success rates and bring effective treatments to patients faster.”

It’s time to harness the power of causal AI methods to help study teams optimize their trials across multiple dimensions. This is a complicated technology, so teams can start by using causal reasoning to optimize geographic and site selection strategies where we know many of the underlying causal reasons why a trial could recruit faster or slower in certain regions and at certain sites, and we can account for those reasons directly in our prescriptive AI modeling approach.

But the opportunity for impact goes much further. Full-blown causal AI methodology can make strong, prescriptive recommendations on everything from eligibility criteria to assessment schedules, protocol design, and even portfolio-level strategy decisions. This is where causal AI can be truly transformative. By analyzing historical trial data through a causal lens, we can start to uncover the true drivers of trial performance and surface those insights to trial sponsors. Causal AI helps us distinguish which correlations are merely correlations and which are causal.

Before we get lost in all the “c’s,” here is an example from a current, ongoing pilot project:

We might observe that a particular historical comparator trial that happened to have looser eligibility criteria recruited patients faster than some other comparator trials. But is this a direct causal relationship between eligibility criteria and recruitment, or might there be additional confounding factors at play? Perhaps other observable differences in trial operation could also account for the differences in recruitment rates? Or, even more insidiously, unobserved differences, such as widespread direct-to-patient marketing campaigns, could account for them. And what if improving one metric (say, recruitment rate) leads to a decrease in some other metric (e.g., sites recruiting patients faster aren’t prepared to manage the larger influx of data and information required for the trial)? Causal AI can help us tease apart all these complex relationships between upstream causal factors and downstream indicators of interest and identify the individual changes most likely to impact our key performance indicators.

“In healthcare, distinguishing between correlation and causation is critical,” added Hill of Aitia. “We’ve seen countless examples where apparent correlations led us down the wrong path. This is one of the main reasons why more than 90% of new therapies fail in development: they are relying on limited information and correlations. Causal AI allows us to move beyond simple associations and understand the actual mechanisms driving disease, which is essential for developing truly effective interventions.”

Of course, inferring causality from observational data is difficult. One of the key challenges is dealing with variables that influence both the interventions and the outcomes, potentially leading us to draw false conclusions. There are the factors we know—such as patient age, gender, and socioeconomic status—the factors we suspect, and the unobserved, hidden “unknown unknowns.” Addressing these is crucial in planning clinical trials. Causal AI can address both the known and hidden factors, leveraging sophisticated techniques to control known factors and even infer the presence of hidden ones from their joint effects seen across large data sets. Even more exciting, early pilots of the technology show that we are then able to make individualized predictions, specific to every trial. This is what sets causal AI apart from traditional predictive modeling approaches that model average expected effects.
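Controlling for a known factor can be sketched with a simple stratified adjustment. The example below is hypothetical throughout: the site records are simulated, and the variable names, the `high_prevalence` confounder, and the assumed true effect of 2 extra patients per month are invented for illustration. It is not the method of any particular causal AI product, and it handles only an observed confounder; inferring hidden ones requires more sophisticated techniques than shown here.

```python
import random
from collections import defaultdict

rng = random.Random(42)
TRUE_EFFECT = 2.0  # assumed extra patients/month caused by looser criteria

# Simulated site records: (high_prevalence, loose_criteria, monthly_recruitment).
records = []
for _ in range(20_000):
    high_prevalence = rng.random() < 0.5
    # Confounding: high-prevalence regions more often ran looser criteria...
    loose = rng.random() < (0.7 if high_prevalence else 0.3)
    # ...and also recruit faster regardless of the criteria.
    rate = (3.0
            + (5.0 if high_prevalence else 0.0)
            + (TRUE_EFFECT if loose else 0.0)
            + rng.gauss(0.0, 1.0))
    records.append((high_prevalence, loose, rate))


def mean(values):
    return sum(values) / len(values)


# Naive comparison: ignores the confounder and mixes both effects together.
naive = (mean([r for _, loose, r in records if loose])
         - mean([r for _, loose, r in records if not loose]))

# Stratified adjustment: compare loose vs. strict within each prevalence
# stratum, then average the differences weighted by stratum size.
strata = defaultdict(lambda: {True: [], False: []})
for high_prevalence, loose, rate in records:
    strata[high_prevalence][loose].append(rate)

adjusted = sum(
    (len(g[True]) + len(g[False])) / len(records) * (mean(g[True]) - mean(g[False]))
    for g in strata.values()
)

print(f"Naive estimate:    {naive:.2f}")
print(f"Adjusted estimate: {adjusted:.2f} (assumed true effect: {TRUE_EFFECT})")
```

With these assumed parameters, the naive estimate should come out at roughly twice the true effect, while the stratified estimate recovers something close to the assumed value of 2.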

With prototype causal AI solutions, sponsors can confidently ask and answer questions such as:

How will loosening/tightening this eligibility criterion impact recruitment rate and patient retention for our specific trial?
What’s the optimal visit schedule to maximize data quality while minimizing patient burden?
How should we allocate our clinical development budget across different trials to maximize our chances of overall portfolio performance?

“Incorporating causal AI not only enhances operational efficiency but also significantly reduces trial risks by mitigating biases, identifying hidden variables, and fine-tuning interventions,” explained Crowther. “This leads to faster, more cost-effective trials, higher data integrity, and, ultimately, improved patient outcomes. By making causal AI part of their core strategy, sponsors can transform trial design and execution, improving both short-term trial success and long-term innovation in research.”

Benefits of Continuous, Iterative Feasibility Analysis

When it comes to clinical trial feasibility analysis, current AI-fueled technology platforms often misread red flags. For example, some sites recruit more participants in a month than projected, others fewer. The next month could be different. Many predictive tools mistakenly treat decreases in rates (spiky behavior that’s part of the natural ebb and flow of recruitment) as red flags. Many of the tools on the market also don’t properly account for realities like site exhaustion, when there are simply no more trial participants left to recruit in a given area; this is simply an example of mistaking correlation for causation.
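The difference between natural ebb and flow and a genuine red flag can be illustrated with a simple noise check. This sketch assumes, purely for illustration, that a site's monthly enrollment count follows a Poisson distribution around its projected rate; the function names and the 5% threshold are hypothetical choices. It is a plain statistical check, not a causal method, and it says nothing about why a site is falling behind.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def is_red_flag(observed, projected, alpha=0.05):
    """Flag a month only if a count this low would be unlikely by chance alone."""
    return poisson_cdf(observed, projected) < alpha


projected_per_month = 6.0  # assumed projection for one site
for observed in (4, 2, 1):
    p = poisson_cdf(observed, projected_per_month)
    print(f"observed={observed}  P(count <= observed)={p:.3f}  "
          f"red flag={is_red_flag(observed, projected_per_month)}")
```

Under this assumption, a site projected to enroll 6 patients that enrolls 4 is well within ordinary month-to-month variation, while a month with only 1 enrollment would be flagged.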

To conduct continuous feasibility analysis in a way that matters, operations teams need better solutions than just basic forecasting tools with the potential for inaccurate correlations or false alarms. They need to see the projected particulars of what would happen in alternate scenarios, combined with informed guidance on what to do when advanced tooling detects that things are beginning to veer off course, or when a new factor emerges that wasn’t there when feasibility was first examined. Causal AI offers prescriptive guidance by providing recommendations based on a combination of historical real-world data and current, fluctuating trial data in real time so teams can take remedial action before trials veer off course.

“By leveraging causal AI, we can go beyond traditional predictive models and uncover the hidden drivers of outcomes, allowing us to make more precise, individualized predictions for each trial,” concluded Crowther. “This capability is transforming how we design clinical trials and ultimately helps us deliver more effective, tailored approaches to strategies that will enable on-the-fly adjustments empowering study teams to react to dynamic changes during the study conduct phase.”

What if a competitor suddenly enters the picture after a trial is underway? What if an opportunity arises to end a trial six months earlier? What would that look like? How would we change the protocol? Clinical research is analogous to space research in that they are both risky and a lot can go very wrong, very quickly. Therefore, when NASA plans to launch a spacecraft, engineers carefully create a flight plan with a calculated, multistep trajectory: the initial rocket thrust required to achieve escape velocity, a clever slingshot around the moon to gain some additional acceleration, a reverse thrust burn to slow the approach, and so on.

But during the voyage, NASA engineers also know there will be many smaller details to monitor, and certainly some necessary adjustments to make: small engine burns, course corrections, vehicle attitude or orientation adjustments. NASA anticipates these midflight modifications, even without knowing exactly what they will entail or what specific challenges might necessitate them. NASA labels these “anticipatory work” and breaks them down into two groups: long-term and real-time. Prepping for the unknown, closely monitoring in-flight progress, and executing minor, just-in-time adjustments are essential. This kind of preemptive engineering is also a key component of clinical trial success.

Causal AI is key here and yields insights that can lead operations teams to consider scenarios they might not have otherwise, ones that could mean the difference between success and failure because they’ll be getting more of the “why.” Such insights also provide reliable, data-driven guidance to junior study staff, a key advantage, especially amidst current industry-wide staffing challenges.

Most importantly, feasibility analysis should be ongoing, an iterative process rather than just a box to check once at the beginning of a trial. Advance preparation, together with planned mid-journey checks and adjustments supported by causal AI, will help ensure successful clinical trial execution.

While the potential of causal AI is immense, it’s important to maintain a balanced perspective and not get too breathless about related new technologies. Causal inference from observational data remains a complex challenge, and we must be rigorous in our approach. Even so, recent advances in causal AI methods offer a significant step forward in our ability to design and execute more efficient, effective clinical trials.
