In the era of targeted treatments and immunotherapies, it is relatively common to address multiple questions in a single phase III clinical trial. These questions often involve using more than one primary endpoint (either co-primary or hierarchically tested), planning a number of interim analyses, and testing more than one experimental treatment and/or cohorts from different populations (e.g., specific subgroups of interest). The feasibility of this strategy, provided that multiple testing is taken into account by adequately adjusting the overall type I error, relies on the expectation that the benefits achieved by new agents will be larger than we’ve seen in the past. It is fortunate that in this time of surging knowledge of tumor biology and, as a result, soaring progress in experimental treatments, large benefits often do materialize, providing on many occasions significant results with very impressive, very small p values. But this is not always the case, and statistical “traps” can emerge.
KEYNOTE-604 was a multicenter, randomized, placebo-controlled phase III study exploring the efficacy and safety of adding pembrolizumab to standard etoposide and platinum-based chemotherapy (EP) as first-line therapy for patients with extensive-stage SCLC (ES-SCLC).1,2 In 2019 and 2020, after almost 3 decades without progress in first-line therapy for ES-SCLC, approval was granted for the combination of each of two immunotherapy agents (anti–PD-L1) with standard EP.3,4 The pivotal studies (IMpower133 and CASPIAN)5,6 evaluating the addition of atezolizumab and durvalumab, respectively, to EP both showed a significant overall survival (OS) benefit, whereas KEYNOTE-604, evaluating pembrolizumab in the same therapeutic setting, succeeded only in showing an improvement for the co-primary endpoint of progression-free survival (PFS).
To control type I error, investigators for all three studies implemented multiple-testing procedures, allocating portions of the overall alpha (significance level) to different hypotheses, to co-primary and/or hierarchically tested endpoints, and to analysis times (one or multiple interim analyses and a final analysis). After these adjustments for multiplicity, the alpha allocated to the overall testing of OS corresponded to one-sided significance levels of 0.0225, 0.020, and 0.019 for the combination of EP with atezolizumab, durvalumab, and pembrolizumab, respectively. This type I error was further distributed to the interim and final OS analyses according to each study's multiplicity strategy,7-11 with one interim analysis planned for IMpower133 and CASPIAN, and two for KEYNOTE-604.
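How an allocated alpha is distributed over interim and final analyses can be illustrated with a Lan-DeMets spending function of the O'Brien–Fleming type. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the inputs (a one-sided alpha of 0.019 with looks at 62%, 93%, and 100% of events) are borrowed from the figures quoted in this article without any claim that KEYNOTE-604 used exactly this spending function.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_spending(alpha, info_fraction):
    """Cumulative alpha spent at a given information fraction (0 < t <= 1),
    using the Lan-DeMets O'Brien-Fleming-type spending function:
    alpha(t) = 2 - 2 * Phi(z_{1 - alpha/2} / sqrt(t)).
    """
    if not 0 < info_fraction <= 1:
        raise ValueError("info_fraction must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / info_fraction ** 0.5))


def spending_schedule(alpha, info_fractions):
    """Return (fraction, cumulative alpha spent, incremental alpha) per look."""
    cumulative = [obf_spending(alpha, t) for t in info_fractions]
    previous = [0.0] + cumulative[:-1]
    increments = [c - p for c, p in zip(cumulative, previous)]
    return list(zip(info_fractions, cumulative, increments))


# Illustrative inputs (assumptions, see the lead-in above).
for fraction, spent, increment in spending_schedule(0.019, [0.62, 0.93, 1.0]):
    print(f"t={fraction:.2f}  cumulative={spent:.5f}  increment={increment:.5f}")
```

The incremental alpha at the first look equals the nominal threshold for that look. At later looks the nominal thresholds depend on the joint distribution of the test statistics and need numerical integration, which this sketch does not attempt. What it does show is that very little alpha is spent at an early look and that the amount spent grows quickly as the information fraction approaches 1.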
In each of the three trials, the targeted benefit for OS was substantial, with HRs of 0.68, 0.69 (according to the online appendix; 0.71 in the manuscript), and 0.65 for the addition of atezolizumab, durvalumab, and pembrolizumab, respectively. The power to detect the corresponding OS benefit was set above 90% (91%, 96%, and 94%, respectively).
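To give a sense of what these design targets imply, the number of deaths needed for a log-rank test can be approximated with Schoenfeld's formula. The sketch below is a rough illustration, not a reconstruction of the actual designs: it assumes 1:1 allocation and a single fixed analysis, it ignores the interim analyses and multiplicity strategy, and the function name `required_events` is hypothetical. Its results should therefore not be expected to match the event targets in the trial protocols.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, one_sided_alpha, power, allocation=0.5):
    """Approximate number of events for a log-rank test (Schoenfeld formula).

    allocation is the proportion randomized to the experimental arm;
    0.5 (1:1 randomization) is an assumption of this sketch.
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - one_sided_alpha)
    z_beta = normal.inv_cdf(power)
    denominator = allocation * (1 - allocation) * math.log(hazard_ratio) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 / denominator)


# Target HR, one-sided alpha, and power as quoted in the text.
designs = {
    "atezolizumab + EP": (0.68, 0.0225, 0.91),
    "durvalumab + EP": (0.69, 0.020, 0.96),
    "pembrolizumab + EP": (0.65, 0.019, 0.94),
}
for name, (hr, alpha, power) in designs.items():
    print(f"{name}: about {required_events(hr, alpha, power)} deaths")
```

The formula makes the trade-off explicit: a smaller allocated alpha or a hazard ratio closer to 1 both increase the number of events required for the same power.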
For IMpower133 and CASPIAN, the OS benefit for the addition of immune checkpoint inhibitors (ICIs) to EP was found to be statistically significant at the single interim analysis, with p values of 0.007 and 0.0047 and corresponding O’Brien–Fleming thresholds for the allocated two-sided alpha of 0.0193 and 0.0178, respectively. It is notable that these interim analyses occurred at 78% and 79% of anticipated deaths, relatively late in each trial’s duration.
A common pattern has been observed in the survival curves for all three studies: an apparent overlap for approximately 6 months, followed by a progressively wider difference in favor of the ICI combination versus EP alone. This observed departure from the exponential survival assumption diminishes the design power, because a number of early events do not provide support for a benefit of the experimental treatment. On the other hand, it points to a long-term benefit for the combination of EP and the corresponding ICI, which was successfully captured at the interim analyses of IMpower133 and CASPIAN; these were wisely scheduled late in each trial to overcome the early lack of benefit.
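The delayed-separation pattern described above can be sketched with a piecewise exponential model in which the experimental arm has the same hazard as control during an initial period and a reduced hazard afterward. All numbers below are illustrative assumptions rather than trial data: a control median OS of 10 months, a hazard ratio of 0.65 after the delay, and a delay of 6 months taken from the approximate overlap noted in the text. The function names are hypothetical.

```python
import math


def control_survival(t, hazard):
    """Exponential survival probability at time t (months)."""
    return math.exp(-hazard * t)


def delayed_effect_survival(t, hazard, hazard_ratio, delay):
    """Survival when the treatment effect only starts after `delay` months.

    Before the delay the hazard equals the control hazard; afterward it is
    multiplied by `hazard_ratio`.
    """
    early = min(t, delay)
    late = max(t - delay, 0.0)
    return math.exp(-hazard * early - hazard * hazard_ratio * late)


# Illustrative assumptions (not trial data).
control_hazard = math.log(2) / 10.0  # control median OS of 10 months
hr_after_delay = 0.65
delay_months = 6.0

for month in (3, 6, 12, 18, 24):
    s_control = control_survival(month, control_hazard)
    s_experimental = delayed_effect_survival(
        month, control_hazard, hr_after_delay, delay_months
    )
    print(f"month {month:2d}: control={s_control:.3f}  experimental={s_experimental:.3f}")

# Share of control-arm patients expected to die before any effect begins.
early_share = 1 - control_survival(delay_months, control_hazard)
print(f"deaths before month {delay_months:.0f}: {early_share:.1%} of the control arm")
```

Under these assumptions the two curves coincide up to month 6 and diverge afterward, and a sizable share of deaths occurs before any treatment effect exists. Those early events count toward the information fraction at an interim analysis without contributing evidence of benefit, which is why an early look is less likely to cross its boundary.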
The design of KEYNOTE-604 included two interim analyses for OS, with an unusual schedule: the first occurred relatively early in the trial (62% of expected deaths) and the second occurred close to the full-information time point (93% of deaths). The remaining alpha at final analysis, according to the design, was 0.0167. Unfortunately, on the basis of the actual timing and implementation of the interim analyses, the remaining alpha at final analysis diminished to 0.0128. The observed p value of 0.0164, while well below 0.05, exceeded this threshold, and the OS benefit was not found to be statistically significant.
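The decisions described above reduce to comparing each reported p value with the threshold in force at that analysis. The short sketch below simply restates the figures quoted in this article; the helper name `is_significant` is hypothetical, and each p value is compared with a threshold on the same scale as reported for that trial.

```python
def is_significant(p_value, threshold):
    """A result crosses the boundary when the p value is below the threshold."""
    return p_value < threshold


# (p value, threshold) pairs as quoted in the text.
analyses = {
    "IMpower133, interim": (0.007, 0.0193),
    "CASPIAN, interim": (0.0047, 0.0178),
    "KEYNOTE-604, final (threshold per design)": (0.0164, 0.0167),
    "KEYNOTE-604, final (threshold as implemented)": (0.0164, 0.0128),
}

for label, (p_value, threshold) in analyses.items():
    verdict = "significant" if is_significant(p_value, threshold) else "not significant"
    print(f"{label}: p={p_value} vs {threshold} -> {verdict}")
```

The same observed p value of 0.0164 falls on opposite sides of the two KEYNOTE-604 thresholds, which is the crux of the argument: the outcome turned on how much alpha remained at the final analysis, not on the size of the observed effect alone.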
Whereas IMpower133 and CASPIAN were successful in detecting a statistically significant OS benefit, KEYNOTE-604 failed to do so, even as it achieved its goal for PFS. A wiser alpha allocation for OS, as implemented in the successful earlier studies of anti–PD-L1 combination with EP, would have led to a third ICI—the anti–PD-1 pembrolizumab—achieving a statistically significant improvement in OS for ES-SCLC.
When properly controlled, the flexibility of goals and hypotheses often corresponds to a higher bar for reaching significant results, and this strategy can sometimes backfire. The balance of risk versus expected benefit must always be weighed carefully before embarking on a clinical trial.
1. Rudin C, Awad M, Navarro A, et al. Pembrolizumab or placebo plus etoposide and platinum as first-line therapy for extensive-stage small-cell lung cancer: randomized, double-blind, phase III KEYNOTE-604 study. J Clin Oncol. 2020;38(21):2369-2379.
2. Rudin C, Awad M, Navarro A, et al. KEYNOTE-604: pembrolizumab or placebo plus etoposide and platinum as first-line therapy for extensive-stage small-cell lung cancer. Abstract presented at: 2020 American Society of Clinical Oncology Virtual Scientific Program; May 29-31.
3. Committee for Medicinal Products for Human Use. Atezolizumab summary of opinion (post authorisation). European Medicines Agency; 2019. EMA/CHMP/416072/2019. July 25, 2019. Accessed November 25, 2020. https://www.ema.europa.eu/en/documents/smop/chmp-post-authorisation-sum…
4. Committee for Medicinal Products for Human Use. Durvalumab summary of opinion (post authorisation). European Medicines Agency; 2020. EMA/CHMP/393807/2020. July 23, 2020. Accessed November 25, 2020. https://www.ema.europa.eu/en/documents/smop/chmp-post-authorisation-sum…
5. Horn L, Mansfield AS, Szczesna A, et al. First-line atezolizumab plus chemotherapy in extensive-stage small-cell lung cancer. N Engl J Med. 2018;379:2220-2229.
6. Paz-Ares L, Dvorkin M, Chen Y, et al. Durvalumab plus platinum-etoposide versus platinum-etoposide in first-line treatment of extensive-stage small-cell lung cancer (CASPIAN): a randomised, controlled, open-label, phase 3 trial. Lancet. 2019;394:1929-1939.
7. Ye Y, Li A, Liu L, et al. A group sequential Holm procedure with multiple primary endpoints. Stat Med. 2013;32:1112-1124.
8. Dmitrienko A, D’Agostino RB Sr. Multiplicity considerations in clinical trials. N Engl J Med. 2018;378:2115-2122.
9. Burman CF, Sonesson C, Guilbaud O. A recycling framework for the construction of Bonferroni-based multiple tests. Stat Med. 2009;28:739-761.
10. Lan K-KG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70:659-663.
11. Maurer W, Bretz F. Multiple testing in group sequential trials using graphical approaches. Stat Biopharm Res. 2013;5:311-320.