CMOR Lunch’n’Learn
11 April 2024
Ross Wilson
Consider, for example, a clinical trial to estimate a treatment effect
We may want to know:
What is the ‘true’ (average) treatment effect?
How certain are we about that estimate?
What is the range of plausible values of the effect?
What is the probability that the treatment is ‘effective’?
A useful way I have seen it described:
Classical methods tell us what the observed data (e.g. from this trial) tell us about the treatment effect
Bayesian methods tell us how we should update our beliefs about the treatment effect based on these data
Maximum likelihood - what true value of the treatment effect is most compatible with observed data?
Hypothesis testing - how likely is it that the observed data could be due to chance alone?
Confidence intervals - what range of values can we be ‘confident’ the treatment effect falls within?
Conceptually, we consider the treatment effect to be a random variable
What is the probability it will rain in Dunedin tomorrow?
\(\mathrm{Pr}(\text{Rain tomorrow})\)
\(\mathrm{Pr}(\text{Rain tomorrow}\ |\ \text{Today's weather})\)
\(\mathrm{Pr}(\text{Rain tomorrow}\ |\ \text{Tomorrow's forecast})\)
In the same way, Bayesian updating takes us from
\[\mathrm{Pr}(A)\]
to
\[\mathrm{Pr}(A\ |\ \mathit{data})\]
\[\mathrm{Pr}(A | B) = \frac{\mathrm{Pr}(B | A) \mathrm{Pr}(A)}{\mathrm{Pr}(B)}\]
\[f(\theta | y) = \frac{f(y | \theta) f(\theta)}{f(y)}\]
(updated knowledge about a parameter \(\theta\) given data \(y\))
\[f(\theta | y) = \frac{f(y | \theta) f(\theta)}{f(y)}\]
\[\mathrm{Posterior} \propto \mathrm{Likelihood} \times \mathrm{Prior}\]
In principle:
we start with some beliefs about the parameters of interest
review those beliefs in light of the evidence at hand
and calculate an updated belief as a combination of the prior and the new evidence
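As a minimal illustration of this updating step (the numbers are hypothetical, not from the talk), consider estimating a response probability with a conjugate Beta prior and binomial data, where the posterior is available in closed form:

```python
# Hypothetical illustration of Posterior ∝ Likelihood × Prior:
# for a Beta(a, b) prior and k successes in n binomial trials,
# the posterior is Beta(a + k, b + n - k).

def beta_binomial_update(a, b, k, n):
    """Return posterior Beta parameters after observing k successes in n trials."""
    return a + k, b + (n - k)

# Weakly informative prior Beta(2, 2), then observe 13 successes in 40 trials
a_post, b_post = beta_binomial_update(2, 2, 13, 40)
post_mean = a_post / (a_post + b_post)  # (2 + 13) / (2 + 2 + 40) = 15/44
print(a_post, b_post, round(post_mean, 3))  # 15 29 0.341
```

The prior acts like a small number of 'pseudo-observations' that the data gradually overwhelm as n grows.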
[Figure: prior, likelihood, and posterior densities]
Where do our ‘priors’ come from?
Previous research
Common sense/intuition?
‘Weakly informative’ or ‘non-informative’ priors
There is usually at least some prior evidence relevant to the research question
Can be much more flexible than traditional approaches
Using more—and more varied—sources of data
More flexible models tailored to particular situations
Bayesian analyses are often quite complex, and care is needed in presentation and interpretation
The prior(s) used should always be explicitly stated and justified
Bayesian analysis produces an estimated parameter distribution, not a single point estimate
It is important to note (as in classical analyses) that the distribution captures uncertainty in the parameter estimate, not between-person variability in treatment effects
The GREAT trial was a study of a new drug for early treatment after myocardial infarction, compared with placebo
The primary outcome was 30-day mortality rate, with data:
 | New | Placebo | Total |
---|---|---|---|
Death | 13 | 23 | 36 |
No death | 150 | 125 | 275 |
Total | 163 | 148 | 311 |
Standard analysis of these data gives an OR of (13 / 150) / (23 / 125) = 0.47, with 95% CI 0.24 to 0.97
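This classical result can be reproduced with the usual normal approximation on the log-odds-ratio scale (a sketch; small rounding differences from the quoted interval are possible):

```python
import math

# Standard analysis of the GREAT 2x2 table:
# OR = (13/150) / (23/125), with the 95% CI computed on the log-OR scale
# using SE = sqrt(1/a + 1/b + 1/c + 1/d).
a, b, c, d = 13, 150, 23, 125          # deaths / no deaths: new drug, then placebo
or_hat = (a / b) / (c / d)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
ci_lo = math.exp(math.log(or_hat) - 1.96 * se_log_or)
ci_hi = math.exp(math.log(or_hat) + 1.96 * se_log_or)
print(round(or_hat, 2), round(ci_lo, 2), round(ci_hi, 2))  # 0.47 0.23 0.97
```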
Prior distribution was based on the subjective judgement of a senior cardiologist, informed by previous published and unpublished studies
‘an expectation of 15–20% reduction in mortality is highly plausible, while the extremes of no benefit and a 40% relative reduction are both unlikely’
Two initial thoughts on this prior:
This is a strong prior judgement, compared to the amount of information provided by the trial
If we are already confident that the treatment is effective, why are we doing the trial?
Bayesian analysis can also be used to ask how the results of this trial should change the views of a reasonably skeptical observer
Assuming a prior centred on no effect (OR = 1), with 95% interval from 50% reduction (OR = 0.5) to 100% increase (OR = 2):
Posterior OR = 0.70 (95% interval 0.43 to 1.14), i.e., no effect would still be considered reasonably plausible
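An approximate reconstruction of this skeptical-prior analysis uses a normal-normal conjugate update on the log-OR scale, recovering the prior sd from its stated 95% interval and the likelihood sd from the trial's CI (results differ slightly from the quoted figures because the inputs are rounded):

```python
import math

# Skeptical prior: centred on log(1) = 0, with 95% interval from log(0.5)
# to log(2), so prior sd = log(2) / 1.96.
prior_mean, prior_sd = 0.0, math.log(2) / 1.96

# Likelihood: the trial's log-OR, with SE recovered from its 95% CI
data_mean = math.log(0.47)
data_sd = (math.log(0.97) - math.log(0.24)) / (2 * 1.96)

# Precision-weighted combination (precision = 1 / variance)
w_prior, w_data = 1 / prior_sd**2, 1 / data_sd**2
post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
post_sd = (w_prior + w_data) ** -0.5

or_post = math.exp(post_mean)
ci = (math.exp(post_mean - 1.96 * post_sd), math.exp(post_mean + 1.96 * post_sd))
print(round(or_post, 2), round(ci[0], 2), round(ci[1], 2))  # close to 0.70 (0.43, 1.14)
```

Note how the prior and the data carry roughly equal precision here, so the posterior sits about halfway between them on the log scale.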
School | Estimated treatment effect | Standard error of effect estimate |
---|---|---|
A | 28 | 15 |
B | 8 | 10 |
C | −3 | 16 |
D | 7 | 11 |
E | −1 | 9 |
F | 1 | 11 |
G | 18 | 10 |
H | 12 | 18 |
We can distinguish 3 different assumptions about the relationship between these estimates:
Identical parameters: All the true effects are identical, and the observed differences are due to sampling variation
Independent parameters: The true effects are independent—knowledge about one tells us nothing about the likely values of the others
Exchangeable parameters: The true effects are different, but drawn from a common distribution
What do these different assumptions imply for our results and interpretation?
Identical parameters: We can pool the results from all studies (weighted by the inverse of the sampling variances)
Independent parameters: Take the estimates in the table at face value
Exchangeable parameters: Estimate a Bayesian hierarchical model
Assume the ‘true’ effects in each school are drawn from a normal (or other) distribution, and estimate the parameters (mean, sd) of that distribution
This requires us to specify prior beliefs about the mean and standard deviation of the effect distribution
(For now, we assume non-informative prior distributions for both)
The effects for all schools are pulled towards the sample mean (between 5 and 10 points, instead of between –3 and 28, but with substantial uncertainty)
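The mechanics of this 'shrinkage' can be sketched with the table's estimates, fixing the between-school sd at an assumed value for illustration (a full Bayesian analysis would average over uncertainty in both the grand mean and this sd):

```python
import numpy as np

# School estimates and standard errors from the table
y = np.array([28, 8, -3, 7, -1, 1, 18, 12], dtype=float)
s = np.array([15, 10, 16, 11, 9, 11, 10, 18], dtype=float)
tau = 8.0  # assumed between-school sd (hypothetical value, for illustration)

# Precision-weighted grand mean
w = 1 / (s**2 + tau**2)
mu = np.sum(w * y) / np.sum(w)

# Each school's posterior mean is a precision-weighted compromise between
# its own estimate and the grand mean
theta = (y / s**2 + mu / tau**2) / (1 / s**2 + 1 / tau**2)
print(np.round(mu, 1), np.round(theta, 1))
```

Schools with larger standard errors (less informative data) are pulled more strongly towards the grand mean.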
Similar hierarchical models can be used for meta-analysis as well
We consider an example of a meta-analysis of beta-blockers for reducing mortality after myocardial infarction
The study included 22 clinical trials, with data for the first few as shown
Study | Control | Treated | Log(OR) | SE |
---|---|---|---|---|
1 | 3/39 | 3/38 | 0.028 | 0.850 |
2 | 14/116 | 7/114 | −0.741 | 0.483 |
3 | 11/93 | 5/69 | −0.541 | 0.565 |
4 | 127/1520 | 102/1533 | −0.246 | 0.138 |
5 | 27/365 | 28/355 | 0.069 | 0.281 |
6 | 6/52 | 4/59 | −0.584 | 0.676 |
7 | 152/939 | 98/945 | −0.512 | 0.139 |
8 | 48/471 | 60/632 | −0.079 | 0.204 |
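As a quick classical baseline, the 'identical parameters' assumption would pool these log-ORs by inverse-variance weighting (a sketch using only the eight studies shown above, not all 22 trials):

```python
import numpy as np

# Log-ORs and SEs for the eight studies shown in the table
log_or = np.array([0.028, -0.741, -0.541, -0.246, 0.069, -0.584, -0.512, -0.079])
se = np.array([0.850, 0.483, 0.565, 0.138, 0.281, 0.676, 0.139, 0.204])

# Fixed-effect (inverse-variance weighted) pooled estimate
w = 1 / se**2
pooled = np.sum(w * log_or) / np.sum(w)
pooled_se = np.sum(w) ** -0.5
print(round(pooled, 3), round(pooled_se, 3))
```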
As before, if we assume exchangeability between the studies, we can estimate a Bayesian hierarchical model for the treatment effects
The results, using a non-informative prior, are
Estimand | 2.5% | 25% | Median | 75% | 97.5% |
---|---|---|---|---|---|
Mean | −0.37 | −0.29 | −0.25 | −0.20 | −0.11 |
Standard deviation | 0.02 | 0.08 | 0.13 | 0.18 | 0.31 |
There are several estimands that may be of interest:
The mean of the distribution of effect sizes
The effect size in any of the observed studies
The effect size in a new comparable study
Estimand | 2.5% | 25% | Median | 75% | 97.5% |
---|---|---|---|---|---|
Mean | −0.37 | −0.29 | −0.25 | −0.20 | −0.11 |
Standard deviation | 0.02 | 0.08 | 0.13 | 0.18 | 0.31 |
Predicted effect | −0.58 | −0.34 | −0.25 | −0.17 | 0.11 |
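A crude simulation shows why the predictive interval for a new study is wider than the interval for the mean: the new study's effect adds between-study variation on top of uncertainty about the mean itself. This sketch plugs in point summaries from the table (an approximate sd for the mean derived from its quoted 2.5%/97.5% quantiles, and the posterior median of the between-study sd); it understates the tails because it ignores uncertainty in that sd:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Approximate posterior of the mean: N(-0.25, 0.066), where 0.066 is roughly
# (−0.11 − (−0.37)) / (2 × 1.96) from the table's quantiles
mean_draws = rng.normal(-0.25, 0.066, size=n)

# New study's effect: the mean plus between-study variation (sd fixed at the
# posterior median 0.13; a real analysis would use full posterior draws)
new_study = mean_draws + rng.normal(0, 0.13, size=n)
print(np.round(np.percentile(new_study, [2.5, 50, 97.5]), 2))
```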
Observational data can often complement RCT evidence
(Well-conducted) RCTs have good internal validity, but possibly limited external validity
Observational studies are more prone to bias, but better capture real-world practice
There may be value in incorporating both types of evidence to better answer real-world effectiveness questions
From a hierarchical Bayesian perspective, there are several ways we might conceptualise the relationships between observational and experimental evidence:
Irrelevance: Observational studies are subject to bias and we shouldn’t include them
Exchangeable: Studies are exchangeable within types (e.g. observational, RCT), and mean study-type effects are exchangeable
Discounted: Put less weight on the observational studies to reflect their higher risk of bias
Functional dependence: Model the effect as a function of e.g. participant characteristics, which might differ between observational and RCT studies
Equal: Use all evidence from both study types without adjustment
There is obviously a lot of scope to specify different models here—careful sensitivity analyses are crucial
Five RCTs and five observational studies
We consider a three-level hierarchical exchangeable model:
As in the educational coaching example, the ‘true’ effects in each study are assumed to be different (due to e.g. different screening protocols, populations, etc.), but drawn from a common distribution
We observe for each study a random outcome around the true effect due to sampling variation
Unlike the educational coaching example, the common distribution from which each study’s true effect is drawn is not universal, but study type-specific—these study-type mean effects are themselves drawn from a higher-level distribution
We need to specify prior distributions for the overall population effect, the between-type variance, and the between-study variance for each type
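The generative structure of this three-level model can be sketched by simulation (all numerical values here are hypothetical, chosen only to illustrate the hierarchy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three-level hierarchy (all parameter values assumed, for illustration):
mu = -0.3          # overall population effect
sigma_type = 0.10  # between-type sd
sigma_study = 0.15 # between-study sd within each type

# Level 1: study-type mean effects drawn around the overall effect
type_means = rng.normal(mu, sigma_type, size=2)  # [RCT, observational]

# Level 2: each study's true effect drawn around its type mean
true_effects = np.concatenate([
    rng.normal(type_means[0], sigma_study, size=5),  # 5 RCTs
    rng.normal(type_means[1], sigma_study, size=5),  # 5 observational studies
])

# Level 3: observed estimates vary around the true effects (sampling error)
obs_se = np.full(10, 0.2)  # assumed sampling SEs
observed = rng.normal(true_effects, obs_se)
print(np.round(observed, 2))
```

Fitting the model inverts this process: given the observed estimates, we infer the study, type, and population-level parameters jointly.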
Bayesian analysis helps us answer the fundamental question: What should we believe about a treatment effect (or other parameter), taking account of all available evidence?
Bayesian methods are particularly well-suited to (partial) pooling of evidence from different sources
Allows direct probability statements about quantities of interest, and (probabilistic) predictive statements about unobserved quantities
Limitations:
Mathematical (and computational) complexity
The use of subjective priors is sometimes perceived as controversial
This flexibility raises potential concerns about ‘data mining’, i.e. selecting model specifications to give desired results
Software: rstanarm, brms (high-level R interfaces to Stan); rstan, rjags (lower-level interfaces to Stan and JAGS)