Operationalizing GenAI in Clinical Development

The Interlocking Barriers to Regulated AI Deployment

Leila Pirhaji
ReviveMed, Inc.

Bringing a new drug to market remains a long, expensive process, and clinical trials account for much of that burden. Artificial intelligence (AI) now offers a credible way to compress parts of this cycle across scientific and operational work: trial simulation and synthetic controls, protocol design and feasibility, safety review, data cleaning and reconciliation, and drafting submission-ready documentation. In controlled settings, these applications have produced measurable productivity and quality gains. Yet a persistent gap remains between what AI can do in a pilot and what organizations can deploy in regulated, inspection-facing workflows.

The limiting factor is rarely “can the model generate an answer.” It is whether the system can satisfy the requirements that clinical development, quality, and IT and security teams must enforce: confidentiality, traceability, accountability, validation, and monitoring.

This article synthesizes structured discovery interviews with 15 leaders across Clinical Development, Clinical Operations, Quality, and Data/IT Security at biotech and large companies, alongside evidence from the AI in Clinical Research track at SCOPE 2026. The picture that emerges is that these barriers are interlocking, not sequential: Trust, data readiness, and workflow integration must be addressed in parallel. Organizations that solve one while neglecting the others find deployment stalled for reasons their governance framework was never designed to address.

Trust Splits in Two: Operational versus Scientific Trust

Trust was the first concern raised in nearly every discovery interview, but not for the reasons most frameworks emphasize. IT and security teams block external AI tools because trial materials—patient-level safety narratives, proprietary program information, and unblinded analyses—cannot risk being exposed outside corporate controls. In parallel, hallucinations have created lasting skepticism: Multiple interviewees described direct incidents in which models generated confident, fabricated clinical content with no reliable mechanism for detection.

The key point is that “trust” is not a single requirement; it separates by function. For operations leaders—those responsible for data management, monitoring, and regulatory submissions—trust means audit trails, document provenance, and the ability to answer “who changed what, when, and why” during inspections. For R&D and scientific leaders—those designing trials, selecting endpoints, and making go/no-go decisions—trust means something fundamentally different: validated, disease-relevant models with transparent performance characteristics and interpretable evidence that a clinician (not an AI expert) can evaluate.

One hematology clinical development leader was blunt: The bottleneck is not access to large language models or document tools; it is the lack of validated models and fit-for-purpose data sets to build and test them. This leader relies on internal AI colleagues to translate model behavior into clinically actionable judgments of credibility, a dependency that does not scale.

This distinction has a practical implication. A governance program optimized for provenance and audit trails will not satisfy the scientific side of the house. And a validation program designed for predictive performance will not satisfy operational inspection requirements. In regulated deployment, both trust requirements must be met simultaneously.

Siloed Data Block Reliable AI

AI performance depends on data quality, consistency, and access. Yet clinical development data are typically distributed across dozens of systems—electronic data capture, trial management, safety databases, laboratory systems, and patient-reported outcomes—each with its own definitions, structures, and permissioning. Before organizations can evaluate whether to trust an AI tool, they often cannot even provide it reliable access to the data it needs.

At SCOPE 2026, one large biopharma company reported building a production AI assistant that consolidated 60+ terabytes from 50+ source systems. Their AI-powered analytic tool reached ~95% accuracy on natural-language queries—but only after they collapsed hundreds of fragmented tables into a small set of standardized, domain-specific tables with consistent definitions and role-based controls. Without that foundation, even high-performing models produced confident but incorrect answers—not because “AI was unreliable,” but because the underlying data relationships were ambiguous or inconsistent.

This is also why data standards are re-emerging as an AI enabler, not a compliance exercise. CDISC’s Digital Data Flow work has been modernizing long-standing standards toward more connected, interoperable models—an approach that makes clinical data more computable and, in turn, makes AI outputs more reliable. The implication is consistent across our research: Data readiness is not a step that follows governance. It is a prerequisite that must be built in parallel with governance if AI is to move from pilots into production.

Workflow Fit Determines Whether AI Scales

Even the most capable AI tool delivers little value if it cannot fit into how teams already work. And where AI does add efficiency, it is rarely “push-button.” Multiple interviewees emphasized that AI-assisted work still requires human oversight, iteration, and domain expertise—particularly in regulated contexts. Organizations that promise instant automation often end up creating the opposite outcome: frustration when the workflow remains iterative, followed by skepticism about whether AI is worth the effort.

This “workflow embedding” gap was a consistent theme in the AI in Clinical Research track at SCOPE 2026. Both our interviews and conference evidence pointed to a repeatable pattern: Small, cross-functional teams (often one technical lead paired with one clinical operations lead) moved faster than large, centralized AI initiatives. Broad “chat with everything” approaches—applying conversational AI across every system and document type—consistently disappointed, while targeted, high-value use cases succeeded. In one such deployment, an AI-powered protocol optimization tool was introduced to a small cross-functional team to flag design risks such as overly complex eligibility criteria or endpoint selection issues before they became costly amendments.

Adoption also tends to follow a bottom-up path. In this deployment, the tool became mandatory only after early users demonstrated impact and recruited peers organically. Mandates imposed before proof of value reliably stalled.

Perhaps the most revealing finding came from this deployment: More than half of the AI’s protocol optimization recommendations did not introduce new insights. Instead, they validated concerns teams already had and helped convert uncertainty into decisions by providing a structured, documented rationale. The value was not in confirming a preferred answer; it was in providing source-linked, auditable evidence that could withstand regulatory scrutiny. AI’s most scalable role in clinical development may not be generating novel answers; it may be giving teams the evidence (and therefore the confidence) to act on what they already suspect, turning informal consensus into inspection-ready documentation.

What This Means for the Industry

The Manual State Is the Actual Trust Crisis
The industry’s instinct is to ask whether AI can be trusted. The more urgent question is whether today’s manual, duplicate-entry workflows deserve the trust they receive by default. In a multisite cancer clinical trials study of EHR-to-EDC workflows, manual entry produced a 5.8% field-level error rate; using an EHR-to-EDC application with human oversight reduced errors to 1.2%—a 79% reduction. Notably, the remaining errors were largely attributable to human entry, underscoring a practical point: The dominant risk in many workflows is not “AI hallucination” but routine transcription.

This matters because manual transfer is still common. A 2025 eSource survey reported that 72% of sites continue to transfer data from eSource to EDC manually. Meanwhile, avoidable protocol churn remains expensive: Tufts CSDD has estimated the median direct cost of substantial protocol amendments at $141k (phase 2) and $535k (phase 3). These are not abstract inefficiencies. Every day a trial is delayed is a day patients wait for therapies that may already exist in a sponsor’s pipeline.
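The 79% reduction cited for the EHR-to-EDC study is a relative figure, not the absolute difference in error rates. A minimal Python sketch of the arithmetic, using only the two rates quoted above; the function and variable names are illustrative and do not come from the study:

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after` (both are rates)."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (before - after) / before


manual_rate = 0.058    # 5.8% field-level error rate with manual entry
assisted_rate = 0.012  # 1.2% with the EHR-to-EDC application plus human oversight

reduction = relative_reduction(manual_rate, assisted_rate)
absolute_points = (manual_rate - assisted_rate) * 100

print(f"relative reduction: {reduction:.0%}")
print(f"absolute difference: {absolute_points:.1f} percentage points")
```

Reporting both numbers side by side helps avoid reading the relative reduction as if it were the drop in the error rate itself.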

Overlay Beats Replacement
A consistent finding from both our discovery interviews and SCOPE 2026 conference evidence: Organizations cannot rip out and replace their existing infrastructure. As one former pharmaceutical executive noted, replacing a global trial infrastructure is not flipping a switch—it is a change-management project measured in hundreds of millions of dollars. Companies hesitate to abandon systems that work well enough, even when more efficient alternatives exist.

The AI deployments that succeed are those that overlay existing systems: reading data where it lives, integrating with current tools, fitting within established governance gates. They do not ask teams to learn new systems or abandon familiar workflows. They make existing processes faster and more reliable without requiring organizational transformation as a prerequisite.

A Readiness Checklist

Based on the evidence from our discovery interviews and the SCOPE 2026 conference, teams preparing to move AI from pilot to production should assess readiness across all three barrier domains simultaneously:

Trust and governance:

Provenance, auditability, and accountability: Can every AI output be traced to its source data with timestamps, user actions, and model versions? Is a named human owner responsible for each final output, documented in SOPs?
Scientific validation: For predictive or scientific applications, is there a validation summary defining intended use, evaluation data sets, performance characteristics, error modes, and known limitations?
In practice: One enterprise deployment linked every AI-generated protocol recommendation to its source data, model version, and the named clinical lead who approved it, creating an inspection-ready audit trail.
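The provenance questions above can be made concrete as a record attached to each AI output. The following is a hedged sketch only: the `AuditRecord` class, its field names, and the sample values are hypothetical and do not describe any deployment mentioned in this article.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a record cannot be modified after creation
class AuditRecord:
    """One AI output plus the provenance an inspector may ask for."""
    output_id: str
    source_refs: tuple[str, ...]  # identifiers of the source data used
    model_version: str
    user_action: str              # e.g. "generated", "edited", "approved"
    accountable_owner: str        # named human responsible for the output
    recorded_at: datetime


def missing_fields(record: AuditRecord) -> list[str]:
    """Return the names of provenance fields that are empty or incomplete."""
    gaps = []
    if not record.source_refs:
        gaps.append("source_refs")
    for name in ("model_version", "user_action", "accountable_owner"):
        if not getattr(record, name).strip():
            gaps.append(name)
    if record.recorded_at.tzinfo is None:
        gaps.append("recorded_at (timezone)")
    return gaps


# Hypothetical example: the owner is deliberately blank to show the check failing.
record = AuditRecord(
    output_id="rec-001",
    source_refs=("protocol-draft-v3#eligibility",),
    model_version="model-2026.01",
    user_action="approved",
    accountable_owner="",
    recorded_at=datetime.now(timezone.utc),
)
print(missing_fields(record))
```

A check like this only confirms that the fields are present; whether the named owner actually reviewed the output remains a procedural matter for SOPs.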

Data infrastructure:

Standardization: Are critical sources standardized with consistent definitions, or is the model forced to navigate fragmented, inconsistent systems and table semantics?
Access controls: Does clinical data remain within the sponsor’s controlled environment, with explicit governance over AI access, permissions, and any model training or fine-tuning?
Data lineage: Can teams document where the data came from, how it was transformed, and which version was used for a given output?
In practice: One large biopharma company achieved ~95% query accuracy only after collapsing hundreds of fragmented tables into standardized, domain-specific definitions with consistent role-based controls.
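One way to approach the data lineage question is to fingerprint a data set before and after each transformation, so a given output can be tied to an exact version. This is a minimal sketch under stated assumptions: the helper names, the sample rows, and the choice of a SHA-256 hash over canonical JSON are illustrative, not a method reported by any organization cited here.

```python
import hashlib
import json


def fingerprint(rows: list[dict]) -> str:
    """Stable SHA-256 over a canonical JSON rendering of the rows."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apply_step(rows, step_name, transform, lineage):
    """Run one transformation and append a lineage entry describing it."""
    result = transform(rows)
    lineage.append({
        "step": step_name,
        "input_sha256": fingerprint(rows),
        "output_sha256": fingerprint(result),
    })
    return result


# Hypothetical lab rows; the second value is not numeric.
raw = [
    {"subject": "001", "alt_u_l": "42"},
    {"subject": "002", "alt_u_l": "n/a"},
]

lineage = []
cleaned = apply_step(
    raw,
    "drop_non_numeric_alt",
    lambda rows: [r for r in rows if r["alt_u_l"].isdigit()],
    lineage,
)

for entry in lineage:
    print(entry["step"], entry["input_sha256"][:12], "->", entry["output_sha256"][:12])
```

Storing the final fingerprint alongside an AI output lets a reviewer confirm later which version of the data the output was based on.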

Workflow integration:

Embedded gates: Is AI embedded into existing decision gates and governance processes (rather than operating as a standalone “side tool”)?
Realistic operating model: Are expectations calibrated? Does the organization plan for human oversight, iteration, and exception handling (instead of full automation)?
Adoption pathway: Is adoption designed to be bottom-up first (pilots with clear wins, project champions, and voluntary adoption by colleagues), before any mandate is imposed?
In practice: A protocol optimization tool gained traction only after a small cross-functional team demonstrated impact; it became mandatory only after colleagues had adopted it voluntarily, not through an up-front, top-down mandate.

Looking Ahead

The path from AI pilot to production in clinical development does not run through better models alone. It runs through solving three problems at once: earning trust through transparency and validation, building data infrastructure that AI can actually reason from, and embedding tools into workflows without demanding organizational upheaval as a prerequisite.

What is needed now is not more pilots. It is the unglamorous work of building trust infrastructure, standardizing data foundations, and redesigning workflows to include AI as a governed participant rather than a bolt-on experiment. A fully integrated approach remains ahead of current practice; to our knowledge, no organization has yet fully operationalized all three in parallel. Early enterprise deployments described in this article demonstrate progress on individual dimensions; organizations that begin this work in parallel, rather than sequentially, will be the first to move AI from conference demos to inspection-ready production.

For patients waiting for therapies already in development pipelines, the cost of that delay is not measured in dollars. It is measured in time they may not have.
