Time to Assess the Value of External Control Arms in the Development of Cancer Therapies

Tito Fojo
Columbia University Medical Center and James J Peters VA Medical Center
Mengxi Zhou
Columbia University Medical Center and James J Peters VA Medical Center
Krastan B. Blagoev
National Science Foundation and Johns Hopkins University
Susan Bates
Columbia University Medical Center and James J Peters VA Medical Center
Novel approaches to cancer drug development are urgently needed to successfully address the challenges of clinical trial enrollment, rising drug development costs, and persistently high failure rates of randomized clinical trials. While external control arms based on prior or historical data could offer a solution, adopting conventional clinical trial control arms as a standard can be difficult. Using a novel set of equations that define the growth and regression rates of tumors, we have demonstrated that the rate of tumor growth is a reliable biomarker of overall survival. Here, we propose an innovative application of this analysis method alongside an external control for more successful and rapid drug development.

Key Takeaways

  • Regulatory agencies are open to considering novel methods of assessing drug efficacy.
  • Tumors grow and regress exponentially, and the two components can be mathematically deconvoluted.
  • The rate of tumor growth (g) determined while a therapy is administered is an excellent biomarker of overall survival.
  • For purposes of the external control arms, the estimated values of g are the data outputs that anchor the analyses.
  • Data generated with an experimental therapy can be benchmarked against an external control with as few as twenty patients to inform drug development.
  • Provision of clinical trial data should be a moral imperative and made broadly available in public warehouses such as Project Data Sphere.
  • External control arms are ideas that, when validated, stand to benefit all stakeholders. The time has come to test these concepts.

Drug Development – Where We’ve Been and Where We Are

The development of cancer drugs remains a work in progress. With randomized trials viewed as the gold standard, drug development usually defaults to trials that are too often very large, designed to achieve statistical significance for benefits of minimal clinical significance. Furthermore, because progress has usually been incremental, or occurs in parallel with similar drugs being developed simultaneously, thousands of patients are enrolled annually in control arms where everyone receives the same standard of care or even placebo control. Examples include the use of docetaxel in prostate cancer, gemcitabine in pancreatic cancer, etoposide plus platinum in small cell lung cancer, and FOLFOX or FOLFIRI in colorectal cancer. With clinical trial enrollment always a challenge, the cost of drug development continually rising, and a persistently high rate of failures in randomized trials, novel approaches to drug development are needed.

One suggested approach is the use of an external control arm based on prior or historical data. However, adopting conventional clinical trial control arms as a standard can be difficult, because the differences between trials are often of the same order of magnitude as the effect being sought in a study that might use the standard.

Using a novel set of equations that define the growth and regression rates of tumors, we analyzed clinical trial data in thousands of patients and across a range of tumor types. We demonstrated that concurrent rates of tumor growth and regression can be reliably determined, and that the rate of tumor growth is a consistently reliable biomarker of overall survival. A novel use of this method of analysis alongside an external control could result in more successful and rapid drug development.

Estimating Rates of Concurrent Tumor Growth and Regression While a Therapy is Administered

Our studies have demonstrated that tumors grow and regress exponentially and that the two components can be mathematically deconvoluted. The novel paradigm we use to estimate efficacy has been extensively vetted across a broad range of cancers.

We previously confirmed that data from most patients fit exponential equations. The regression-growth models we use assume that the change in tumor quantity during therapy (assessed radiographically or by the levels of tumor markers) is the result of simultaneous exponential decay/regression, termed d, and exponential growth/regrowth of the tumor, termed g. By inputting radiographic measurements, tumor markers such as serial PSA or CA19-9 values, or M-spike determinations into the TUMGr package for R, the rates of tumor growth (g) and regression (d) can be calculated.
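As a rough illustration of the idea, one published form of the combined regression-growth model is f(t) = exp(-d·t) + exp(g·t) - 1, where f(t) is the tumor quantity at day t relative to baseline. The Python sketch below is an assumption-laden stand-in rather than the TUMGr implementation: the function names gd_model and fit_gd, the patient values, and the candidate-rate grid are all hypothetical, and a brute-force grid search replaces proper nonlinear regression.

```python
import math

def gd_model(t, g, d):
    """Tumor quantity at day t relative to baseline: exp(-d*t) + exp(g*t) - 1."""
    return math.exp(-d * t) + math.exp(g * t) - 1.0

def fit_gd(times, values, rates):
    """Least-squares fit of g and d by exhaustive search over candidate rates.

    Illustrative only: a real analysis would use nonlinear regression
    and would also consider the growth-only and regression-only models.
    """
    baseline = values[0]
    best = None
    for g in rates:
        for d in rates:
            sse = sum((v / baseline - gd_model(t, g, d)) ** 2
                      for t, v in zip(times, values))
            if best is None or sse < best[0]:
                best = (sse, g, d)
    return best[1], best[2]

# Hypothetical patient: a marker measured every 21 days, generated
# from known rates so that the fit can be checked against them.
true_g, true_d = 0.002, 0.03
days = list(range(0, 211, 21))
marker = [50.0 * gd_model(t, true_g, true_d) for t in days]

# Candidate rates from 0.0005 to 0.05 per day in steps of 0.0005.
grid = [k * 0.0005 for k in range(1, 101)]
g_hat, d_hat = fit_gd(days, marker, grid)
print(g_hat, d_hat)
```

Because both processes act at once, the curve falls early, when regression dominates, and rises later, when growth dominates. That shape is what allows the two rates to be separated from a single series of measurements.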

The Power of the Method

To illustrate our novel approach to use external control arms we mined data stored at Project Data Sphere, an easily accessed public data warehouse that currently houses data on over 100,000 patients. As examples we chose prostate and pancreatic cancer – two cancers where drug development would have benefited and still would benefit from external control arms. In these examples we used older trials whose data had been ratified by time. In the case of pancreatic cancer, the raw data for current standard of care therapies have yet to be made available. The goal with these examples is not to advance any therapy at this time, but rather to demonstrate the approach one would take using current standard of care data.

In our prostate cancer example, we looked at cases where prednisone or docetaxel was administered as first line after androgen deprivation therapy. For purposes of the external control arms, the estimated rates of tumor growth (g) are the data outputs that anchor the analyses. We analyzed datasets from 2,902 men with a diagnosis of prostate cancer who were enrolled in six clinical trials and randomized to the control arms. The therapies used in the control arms were either prednisone (1,132 men; 2 datasets) or docetaxel (1,770 men; 4 datasets). The median of the estimated g values—i.e. the rates at which the drug-insensitive tumor cells were growing while receiving treatment—was 0.0032/day for prednisone, and 0.009/day for docetaxel, consistent with the known efficacy of docetaxel. The distribution of these g values and their estimated doubling times are shown in the left panel of Figure 1.
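For readers who prefer doubling times to rate constants, the conversion for pure exponential growth exp(g·t) is ln(2)/g. The short Python sketch below applies it to the median prednisone value quoted above; the function name doubling_time_days is hypothetical.

```python
import math

def doubling_time_days(g_per_day):
    """Days for an exponentially growing quantity exp(g*t) to double."""
    return math.log(2) / g_per_day

# Median g reported above for the prednisone control arms.
print(round(doubling_time_days(0.0032)))  # about 217 days
```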

The middle panel in Figure 1 demonstrates that in all tumors and with every therapy analyzed to date, g is an excellent biomarker of overall survival. To illustrate how such data might be used as a reference against which to benchmark, we constructed the simulation analyses shown in the right panel of Figure 1. In these analyses, the reference or external control is the prednisone data – either the entire dataset from 1,132 patients or random samples of data from 100, 200, and 300 of the 1,132 men.

Figure 1: The quantity of tumor represented by PSA values available on Project Data Sphere was analyzed and fit using four exponential tumor growth models. Values for the estimated tumor growth rate constants (g) were pooled to form control and experimental groups (prednisone and docetaxel, respectively). [A] Box plot depicting distribution of g values showing that the rate of tumor growth is slower with docetaxel. [B] Kaplan Meier (KM) plot for each quartile of g values supports g as a biomarker of overall survival. KM curve to the left represents the quartile with the fastest growth rates; KM curve to the right shows the quartile with the slowest growth rates. Two intermediate quartiles are also shown. [C] Simulations using control and experimental groups. Control used data either from all 1,132 patients receiving prednisone or only from cohorts of 100, 200, or 300 of the entire prednisone group. To this reference or external control we compared a sample from the experimental group ranging in size (n) from 10 to 200, with the n cases drawn randomly with replacement. In each simulation, we conducted 1,000 replications, comparing the g values of the n cases against the control using a two-sided Wilcoxon Mann–Whitney U test. For each sample size n, we calculated the proportion of analyses for which the test result was significant (p < 0.05) as the simulated power, and recorded the minimum n required to achieve 80 or 90 percent power. Similar results were achieved with reference cohorts of as few as 200 and 100 patients, i.e., far fewer than were enrolled in each individual trial.

The power analysis is designed to determine the number of men who needed to be treated with docetaxel to establish, with 80 to 90 percent power, the superiority of docetaxel relative to prednisone at a significance level of 0.05 (see table in figure legend for the relevant numbers). The sample size plotted on the x-axis increases incrementally as data from one additional man are randomly chosen with replacement from the experimental docetaxel cohort, a process that was repeated 1,000 times at each sample size. With an alpha of 0.05, the y-axis quantitates the power to detect this difference. The remarkable power of this analysis is shown by the small number of men whose data would have been needed to predict that docetaxel was a superior therapy with powers of 80 percent and 90 percent (see table in figure legend). Similar results were observed with the individual studies or with all four studies combined, which speaks to the reliability of the estimates. The similar results with cohorts of as few as 200 and even 100 patients in the reference prednisone arm underscore the reliability of predictions made utilizing g as the endpoint with even very small reference cohorts of good quality. Thus, the experimental arm of even one well-designed and executed clinical trial can become the external control against which subsequent studies are benchmarked.
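The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis behind Figure 1: the g values are synthetic draws from hypothetical log-normal distributions (not trial data), the helper names mann_whitney_p and simulated_power are invented for the example, the test uses a tie-corrected normal approximation without continuity correction, and only 200 replications are run to keep it fast.

```python
import math
import random

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        t = j - i                     # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))

def simulated_power(control, experimental, n, reps=1000, alpha=0.05, rng=random):
    """Share of replications in which n resampled g values differ from control."""
    hits = 0
    for _ in range(reps):
        sample = rng.choices(experimental, k=n)  # drawn with replacement
        if mann_whitney_p(sample, control) < alpha:
            hits += 1
    return hits / reps

rng = random.Random(0)
# Synthetic, hypothetical g values (per day); NOT the trial data.
control = [rng.lognormvariate(math.log(0.0032), 0.8) for _ in range(300)]
experimental = [rng.lognormvariate(math.log(0.0016), 0.8) for _ in range(300)]

powers = {n: simulated_power(control, experimental, n, reps=200, rng=rng)
          for n in (10, 20, 40, 80)}
# Smallest tested sample size reaching 80 percent simulated power, if any.
min_n_80 = min((n for n, p in powers.items() if p >= 0.8), default=None)
print(powers, min_n_80)
```

The tighter the distribution of g in either group, the faster the simulated power climbs with n, which is why the spread of the reference data matters as much as its size.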

A recurrent observation in our analyses is that with effective therapies such as docetaxel, there occurs a tightening of the distribution of g values. In fact, the importance of tightening is strongly suggested in Figure 2 showing the results of our pancreatic cancer data analysis. Here, we used data stored in Project Data Sphere to compare 5-FU and gemcitabine. In this example, the values for 5-FU come from its use in the second line setting with those of gemcitabine coming from two of the innumerable studies where it was used as reference in the first line. Their use in different lines of therapy explains the apparent superiority of gemcitabine compared to 5-FU. Here, as with the prostate cancer data, a limited number of patients were needed to demonstrate inferiority of 5-FU relative to gemcitabine. The somewhat higher numbers needed to achieve an 80 percent power—64 in this analysis with 322 gemcitabine-treated patients as the reference arm—likely reflect the less tight distribution of the pancreatic cancer data compared to that in prostate cancer, especially the tighter distribution seen with docetaxel, a very effective therapy for prostate cancer. The latter tightening can be anticipated with both FOLFIRINOX and the combination of gemcitabine with nab-paclitaxel and will make it possible in future pancreatic cancer analyses to establish superiority of novel therapies or combinations with 80-90 percent power and with far fewer patients.

Figure 2: The quantity of tumor in patients with pancreatic cancer as estimated by CA19-9 values available on Project Data Sphere. Gemcitabine in the first line of therapy was used as a control or reference group; 5-FU in the second line was used as the experimental group. [A] Box plot of the distribution of g values shows that growth rates are slower for gemcitabine in first line than 5-FU in the second line. [B] Kaplan Meier (KM) plots show g values as biomarker of overall survival, with values for all patients divided into quartiles. KM to the left: quartile with the fastest g; KM to the right: slowest g; two intermediate quartiles are also shown. [C] Simulations as in Figure 1. In this case, 64 patients treated with 5-FU provide 80 percent power to demonstrate inferiority of 5-FU relative to gemcitabine. The higher numbers compared to the prostate cancer docetaxel analysis are likely indicative of a less tight distribution of the pancreatic cancer data.

In both examples one can see how the performance of an experimental cohort could be compared to an external control, even though the data here (as in any benchmark analysis) come from different sources with data collected independently. Indeed, the interval of assessment is irrelevant. The power models, especially for the tight docetaxel data, demonstrate how making such external controls public could allow for the benchmarking of data. Possible uses are discussed below.

Leveraging Existing Data to Develop External Controls

The past few decades have witnessed a gradual evolution in the way cancer drugs are developed and how regulatory agencies approve them. The current approach, admittedly imperfect, remains a work in progress. We propose a novel paradigm that, if properly deployed, can help accelerate drug development without jeopardizing safety or assessment of efficacy.

The potential uses of external control arms are manifold. Here are a few highlights:

  1. Widespread availability of such external control data on a public platform will give not only pharmaceutical companies but all those involved in cancer drug development a reference against which to benchmark their new therapies or drug combinations. As the results show, benchmarking against such a reference (which, if of good quality, could be data from as few as 100 to 200 individuals) provides insight that could inform go/no-go decisions.
  2. In drug development, such reference data could be leveraged to streamline clinical trial design. Starting with a randomized design where the control arm is the same as the external control, pre-planned analyses could inform those conducting the trial that the ongoing control arm emulates the reference and provide support for either dropping the control arm altogether or exaggerating the enrollment disparity so that a disproportionately high fraction of enrollees are randomized to the experimental arm.
  3. For drugs with accelerated approvals where full approval will require a second study, we would envision that a second study could be conducted utilizing the external control. This latter approach may help regulatory agencies appreciate and become comfortable with the concept of external controls by providing the security of a conventional trial design in the study that leads to the accelerated approval. Note here that with this approach, the “experimental data” in the confirmatory study would be benchmarked against both the external control and the “experimental data” from the registration trial, thus not only demonstrating its superiority to the external control, but also ratifying the registration trial data.
  4. In the management of rare diseases or tumors harboring infrequent mutations where the number of patients precludes randomized comparisons, external controls could play pivotal roles. Taking the concept of the external control even further, we would argue that in such diseases, where no or few drugs are approved, one could leverage the indifference of our novel platform to time of assessment, and develop an external control arm using data from diverse sources, including standard of care practice, in effect benchmarking against the existing options that the investigational therapy is trying to improve upon. We would argue that, while what is proposed here is an “out-of-the-box” approach to drug development, if validated it would replace the current approach used to manage such patients where unproven therapies are used “off label” or in desperation.

With appropriate safeguards, external control arms based on our novel assessment methods can be deployed, validated, and used in comparative analyses. The key becomes timely sharing of data. The innumerable hurdles that are encountered all too often as one tries to obtain data for a therapy that was published and received regulatory approval years earlier are unacceptable and unnecessary. Holders of such data should see their provision as a moral imperative and voluntarily make it broadly available. With its unhindered path to data acquisition and its planned expansion of tools for data analysis, Project Data Sphere is an excellent model that can be used or emulated. What we propose here or what others may advance as alternate approaches for external control arms are ideas that, when validated, stand to benefit all stakeholders. The time has come to test these concepts.

Correspondence and requests for references or other materials should be addressed to Tito Fojo at atf2116@cumc.columbia.edu.