Breaking the Numbers Barrier: How AI and Bayesian Statistics Are Transforming Rare Disease Drug Trials
The Statistical Impossibility Problem
Victoria Gamerman
RWD Insights
There is a paradox keeping the rare disease research community awake at night. The challenge is the requirement for statistically significant evidence of efficacy.

Traditional trial designs and analysis methodologies typically need hundreds of patients to reach that significance with sufficient statistical power. But what happens when your entire global patient population is small, say 200 people, and you can recruit only 50 for a trial?

This mathematical impossibility can block therapeutic development for thousands of rare diseases. Conventional power calculations, designed for diseases with large populations, fail when patient numbers drop below critical thresholds.
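To make the arithmetic behind that failure concrete, here is a minimal sketch of a conventional power calculation for a two-arm trial with a continuous endpoint. It assumes the standard normal approximation, a two-sided 5% significance level, and 80% power; the function names and the effect sizes are illustrative choices, not figures from any specific trial.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def patients_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients needed in each arm of a two-arm trial (normal approximation).

    effect_size is the standardized mean difference (Cohen's d).
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def achieved_power(effect_size: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of the same test when only n_per_arm patients are available."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * sqrt(n_per_arm / 2) - z_alpha)


if __name__ == "__main__":
    for d in (0.3, 0.5):  # hypothetical standardized effect sizes
        print(f"effect size {d}: {patients_per_arm(d)} patients per arm")
    print(f"power with 25 per arm at effect size 0.5: {achieved_power(0.5, 25):.2f}")
```

Under these assumptions, a standardized effect of 0.3 calls for about 175 patients per arm, which across two arms is more than the 200-person global population in the example above. A trial of 50 patients split evenly (25 per arm) reaches only about 42% power, even for the larger effect of 0.5.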

The result? People with ultrarare conditions face a cruel challenge: too few of them exist to prove that a treatment works, even when it does.

Fortunately, a quiet revolution is underway. Artificial intelligence and advanced statistical methods are rewriting the rules of what’s possible with small sample sizes. Regulators are paying attention.

AI Strategies Changing the Game

Bayesian Statistics: Borrowing Strength from the Past
Traditional frequentist statistics treat each trial in isolation. Bayesian approaches flip this assumption, systematically incorporating prior knowledge from natural history studies, related diseases, and expert clinical experience.
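One common way to formalize this kind of borrowing is a power prior, in which historical data are down-weighted before being combined with the current trial. The sketch below assumes a binary response endpoint, a flat Beta(1, 1) starting prior, and a fixed discount weight chosen in advance; the function and class names are illustrative, and the counts in the usage note are invented.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class BetaPosterior:
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def sd(self) -> float:
        total = self.a + self.b
        return sqrt(self.a * self.b / (total ** 2 * (total + 1)))


def borrow(current_resp: int, current_n: int,
           hist_resp: int, hist_n: int, weight: float) -> BetaPosterior:
    """Posterior for a response rate under a power prior.

    weight discounts the historical data: 0 ignores it, 1 pools it fully.
    Assumes a Beta(1, 1) initial prior (an illustrative choice).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    a = 1.0 + current_resp + weight * hist_resp
    b = 1.0 + (current_n - current_resp) + weight * (hist_n - hist_resp)
    return BetaPosterior(a, b)


if __name__ == "__main__":
    # Hypothetical counts: 3 of 10 current controls respond, 12 of 40 historical.
    alone = borrow(3, 10, 12, 40, weight=0.0)
    shared = borrow(3, 10, 12, 40, weight=0.5)
    print(f"no borrowing:   mean {alone.mean:.3f}, sd {alone.sd:.3f}")
    print(f"half borrowing: mean {shared.mean:.3f}, sd {shared.sd:.3f}")
```

With these invented counts, half-weight borrowing moves the posterior from Beta(4, 8) to Beta(10, 22), narrowing the uncertainty around the control response rate without enrolling more patients. The choice of weight is the sensitive step: it encodes how comparable the historical patients are believed to be, and it would need to be justified rather than tuned after the fact.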

The impact can be dramatic:

Transfer Learning: Knowledge from Related Diseases
Another question the rare disease research community explores: Why start from zero when related diseases offer valuable insights?

With transfer learning, AI models trained on data sets from similar conditions can be adapted to yield insights on rare diseases. As a comprehensive Nature Methods review argued, such approaches have demonstrated success across multiple rare disease applications.

The MultiPLIER framework demonstrates this approach: researchers trained models on large public gene expression data sets and successfully transferred them to small rare disease cohorts, extracting meaningful biological signals that would be impossible to detect from disparate rare disease data sets alone. This enables pattern recognition in diseases with limited data by leveraging knowledge from related conditions with larger available data sets.
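The general pattern of learning on a large data set and applying the result to a small one can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the MultiPLIER method: it uses the first principal component instead of PLIER's latent variables, simulated data instead of real expression profiles, and hypothetical function names throughout.

```python
import random
from math import sqrt


def simulate_expression(n_samples, loadings, rng, noise_sd=0.1):
    """Simulated samples driven by one hidden factor plus noise (toy data)."""
    rows = []
    for _ in range(n_samples):
        factor = rng.gauss(0.0, 1.0)
        rows.append([w * factor + rng.gauss(0.0, noise_sd) for w in loadings])
    return rows


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def first_component(centered_rows, iterations=100):
    """Leading principal axis of centered data, found by power iteration."""
    n_features = len(centered_rows[0])
    axis = [1.0 / sqrt(n_features)] * n_features
    for _ in range(iterations):
        # Multiply by X^T X without building the covariance matrix.
        scores = [sum(x * w for x, w in zip(row, axis)) for row in centered_rows]
        axis = [sum(s * row[j] for s, row in zip(scores, centered_rows))
                for j in range(n_features)]
        norm = sqrt(sum(w * w for w in axis))
        axis = [w / norm for w in axis]
    return axis


def transfer_scores(source_rows, target_rows):
    """Learn an axis on the large source set, then project the small target set."""
    means = column_means(source_rows)
    centered_source = [[x - m for x, m in zip(row, means)] for row in source_rows]
    axis = first_component(centered_source)
    # Assumption: the target cohort is centered with the source means.
    scores = [sum((x - m) * w for x, m, w in zip(row, means, axis))
              for row in target_rows]
    return axis, scores


if __name__ == "__main__":
    rng = random.Random(0)
    loadings = [1.0, 1.0, 0.0, 0.0]  # hypothetical: two of four genes share a signal
    source = simulate_expression(500, loadings, rng)  # large related-disease set
    target = simulate_expression(8, loadings, rng)    # small rare disease cohort
    axis, scores = transfer_scores(source, target)
    print("axis learned on source:", [round(w, 2) for w in axis])
    print("target scores:", [round(s, 2) for s in scores])
```

The eight-patient cohort is never used to estimate the axis; it only receives scores along a direction that the 500-sample source set could estimate reliably. That division of labor is the part of the approach this toy example is meant to show.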

Such analytical breakthroughs extend beyond genomics. Some examples include:

Synthetic Data: Expanding the Possible
Generative adversarial networks (GANs) can create synthetic data statistically indistinguishable from real data while preserving privacy and clinical validity. This isn’t about fabricating results. Rather, it’s about augmenting real data to train more robust algorithms.

The evidence is compelling: synthetic data augmentation can improve machine learning model performance in small sample applications, enabling statistical analyses in diseases with few patients. However, it is important to be transparent about risks of bias and limits on interpretability and, as with all clinical research, to ensure sufficient and appropriate data quality.

As this Nature article demonstrates, regulatory guidance and acceptance are growing, provided researchers address multiple scientific components, including:

  • Validation ensuring relevance,
  • Representation of appropriate medical scenarios, and
  • Transparent methodology and limitations.
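One simple form the validation step can take is a fidelity check that compares a synthetic variable with its real counterpart. The sketch below computes a two-sample Kolmogorov-Smirnov statistic in plain Python. The generator is only a fitted normal distribution standing in for a trained GAN, and the biomarker values are invented for illustration.

```python
from statistics import NormalDist


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value equal to x so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


def gaussian_synthetic(real_values, n, seed=0):
    """Placeholder generator: draws from a normal fitted to the real values."""
    return NormalDist.from_samples(real_values).samples(n, seed=seed)


if __name__ == "__main__":
    # Hypothetical biomarker measurements from ten patients.
    real = [4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.1, 5.9, 4.7, 5.2]
    synthetic = gaussian_synthetic(real, 200)
    print(f"KS distance, real vs synthetic: {ks_statistic(real, synthetic):.2f}")
```

A statistic near 0 means the two distributions are hard to tell apart on that variable, while a value near 1 means they barely overlap. A check like this covers only one variable at a time, so it says nothing about whether relationships between variables, or clinically plausible scenarios, have been preserved.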

Real-World Impact Across Therapeutic Areas

Ultrarare Disorders: From Impossible to Approved
This is where Bayesian methods show their potential. By combining adaptive designs that use historical control data with machine learning integration of, for example, biomarker, imaging, and clinical data, researchers can show treatment efficacy and submit for regulatory approval, an outcome out of reach for traditional methods because of sample size limitations.

Rare Oncology: Precision Medicine Meets Small Numbers
In oncology, where genomic information and biomarkers can help with disease identification and precision medicine treatment, machine learning integration of diverse genomic data can reveal the genomic and clinical mechanisms at play, even from the relatively small genomic data sets available for rare cancers.

Pediatric Rare Diseases: The Smallest Populations
Pediatric rare diseases add complexities such as developmental changes, limited outcome measures, and ethical constraints on trial participation. Bayesian models that “borrow strength” across developmental stages can achieve meaningful statistical inference with small pediatric populations.
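Borrowing strength across developmental stages is often expressed as a hierarchical model in which each age group's estimate is pulled toward a shared mean. The sketch below assumes a normal-normal model with a between-group standard deviation fixed in advance; a full Bayesian analysis would place a prior on that quantity instead. The function name and the numbers are illustrative.

```python
def partial_pool(estimates, std_errors, between_sd):
    """Shrink per-group effect estimates toward a shared mean.

    Normal-normal hierarchical model with a fixed between-group standard
    deviation (between_sd > 0). Returns (shared_mean, shrunk_estimates).
    """
    tau2 = between_sd ** 2
    # Each group informs the shared mean in proportion to its precision.
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    shared_mean = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    shrunk = []
    for y, se in zip(estimates, std_errors):
        keep = tau2 / (tau2 + se ** 2)  # share of the group's own estimate retained
        shrunk.append(keep * y + (1.0 - keep) * shared_mean)
    return shared_mean, shrunk


if __name__ == "__main__":
    # Hypothetical treatment effects for infants, children, and adolescents.
    raw = [0.8, 0.5, 0.2]
    ses = [0.4, 0.2, 0.3]
    mean, pooled = partial_pool(raw, ses, between_sd=0.2)
    print(f"shared mean: {mean:.3f}")
    for name, before, after in zip(("infants", "children", "adolescents"), raw, pooled):
        print(f"{name}: raw {before:.2f} -> pooled {after:.2f}")
```

The group with the largest standard error (here, the hypothetical infant group) keeps the least of its own estimate and is pulled hardest toward the shared mean. A larger between_sd expresses less confidence that the age groups respond alike, and leaves each estimate closer to its raw value.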

Regulatory Reality: Growing Acceptance with Clear Guardrails
Both the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) have issued guidance specifically addressing statistical considerations for rare disease development. The message is clear: in specific circumstances, regulatory agencies will accept innovative approaches when properly justified.

FDA expectations include:

  • Comprehensive validation of novel statistical methods
  • Clear documentation of assumptions and limitations
  • Early engagement through formal meetings to align approaches
  • Demonstration that advanced methods maintain scientific rigor

Well-designed studies that use advanced statistical methods can achieve regulatory acceptance when they meet FDA guidance requirements for rare disease development. Examples include the use of external control data in the design of a phase 3 trial, multistage Bayesian studies including umbrella or platform trials, and a specific Bayesian adaptive platform trial case study with disease progression modeling.

The regulation of AI methodologies is uncharted territory for researchers, patients, and policymakers alike. To increase adoption across the rare disease research community, approaches that use AI methodologies must be developed in step with current and emerging regulatory frameworks.

Technical Reality: What Success Requires
Success isn’t automatic. Organizations must address three critical challenges:

  1. The expertise gap is at risk of growing. AI-enhanced rare disease drug development requires specialized expertise spanning rare disease clinical knowledge, advanced statistics, machine learning, and regulatory science. Organizations with comprehensive capabilities can achieve higher drug development success rates.
  2. Data quality is always critical, and it matters even more in small sample data sets, which demand exceptional rigor. Every missing data point, every measurement error, and every protocol deviation carries a magnified impact. In these cases, it is not possible to hide behind large numbers, regression to the mean, or trends.
  3. Validation requirements for novel analytical approaches should be planned for, executed, and explained. Regulatory agencies rightly demand robust validation. This means:
    • Rigorous validation strategies
    • Analytical techniques to prevent overfitting, which occurs when a model is fit so closely to the data it was trained on that it does not generalize well to new data
    • Transparent reporting of all analytical decisions and their rationale
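One widely used guard against overfitting in small samples is leave-one-out cross-validation, in which each patient is held out in turn and predicted from the rest. The sketch below is a toy illustration with invented data and hypothetical function names: it compares a nearest-neighbor model, which memorizes its training data, with a straight-line fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_nearest(xs, ys):
    """1-nearest-neighbor model: memorizes every training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def training_error(fit, xs, ys):
    """Mean squared error on the same data the model was fit to."""
    model = fit(xs, ys)
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def leave_one_out_error(fit, xs, ys):
    """Mean squared error when each point is predicted from all the others."""
    total = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (model(xs[i]) - ys[i]) ** 2
    return total / len(xs)


if __name__ == "__main__":
    # Invented dose-response style data for six patients.
    dose = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    response = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]
    for name, fit in (("nearest neighbor", fit_nearest), ("line", fit_line)):
        print(f"{name}: training {training_error(fit, dose, response):.2f}, "
              f"leave-one-out {leave_one_out_error(fit, dose, response):.2f}")
```

On this toy data the nearest-neighbor model has zero training error but a much larger leave-one-out error than the line, which is the signature of overfitting. Reporting both numbers side by side is one way to make that analytical decision transparent.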

A Call to Action for the Rare Disease Community

An ask of all members of the community
For researchers and sponsors, the statistical barrier is no longer insurmountable. Begin integrating AI-enhanced approaches into your trial designs now. Engage FDA, EMA, and other key stakeholders early, and ideally before phase 2, to align statistical frameworks and validation requirements. As the literature indicates, the agencies can be receptive, yet they expect rigor and transparency.

For regulatory scientists, continue developing guidance that balances innovation with patient protection. Consider establishing clearer methodologies for validating novel statistical approaches, particularly for diseases affecting fewer patients globally. International harmonization efforts should also be accelerated, especially in those rare diseases that don’t respect national boundaries.

For rare disease patient advocacy organizations, continue the discussion. Demand that researchers leverage these advances. Push for consortia that enable data sharing and federated learning approaches while protecting necessary proprietary intellectual property.

A view into our future with timelines and next steps

  • Immediate (next 6 months): The landscape of newer AI- and ML-based approaches is constantly evolving, so invest in education and literature review. Learn and teach others. Form cross-functional teams combining clinical, statistical, and regulatory expertise.
  • Near-term (6-12 months): Develop validated AI frameworks for your therapeutic area, with regulatory scientist engagement.
  • Medium-term (1-2 years): Implement AI-enhanced trial designs, with expected first regulatory submissions.

Expected outcomes

Imagine a world of wider access to meaningful clinical trials for rare disease patients, of faster regulatory approvals, of reduced development costs enabling more companies to enter the rare disease space. Ultimately, there would be more treatments for patients who are waiting and hoping.

The mathematics that once made rare disease trials impossible is yielding to AI-enhanced approaches that respect both statistical rigor and patient urgency.

The question is no longer whether we can achieve meaningful efficacy assessment with small samples. Instead, it’s how quickly we can make these advances standard practice.