When analysing binary outcomes, logistic regression is the analyst’s default approach for regression modelling. The logit link used in logistic regression is the so-called canonical link function for the binomial distribution. Estimates from logistic regression are odds ratios, which measure how each predictor is estimated to increase the odds of a positive outcome, holding the other predictors constant. However, most people find risk ratios easier to interpret than odds ratios. In randomized studies it is of course easy to estimate the risk ratio comparing the two treatment (intervention) groups. With observational data, where the exposure or treatment is not randomly allocated, estimating the risk ratio for the effect of the treatment is somewhat trickier.

The ideal situation – randomized treatment assignment

Ideally the assignment to treatment groups would be randomized, as in a randomized controlled trial. To illustrate the methods to come, we first simulate (in Stata) a large dataset which could arise in a randomized trial. In these data the risk ratio comparing z=1 to z=0 is estimated as 1.43, and because the dataset is large, the 95% confidence interval is quite narrow.

Estimating risk ratios from observational data

Let us now consider the case of observational data. To do so we simulate a new dataset in which treatment assignment now depends on x. The crude risk ratio is now biased upwards. This is because we have generated the data such that those with higher values of x are more likely to be in the z=1 group, and those with higher values of x are also more likely to have y=1. How then can we adjust for the confounding effects of x, and estimate the risk ratio for z=1 compared to z=0?

Using a log-link generalized linear model

The most obvious approach is to add x to our GLM command:

    glm y z x, family(binomial) link(log) eform
    Iteration 2: log likelihood = -5733.9167 (not concave)
    Iteration 3: log likelihood = -5733.9167 (not concave)

This, however, fails to converge, and Stata gives us repeated "(not concave)" warnings. This problem of log-link GLMs failing to converge is well known. It is an apparent roadblock to estimating a valid risk ratio for the effect of treatment, adjusted for the confounder x.

Estimating the risk ratio via a logistic working model

A relatively easy alternative is to use a logistic working model to estimate a risk ratio for treatment which adjusts for x. To do this we first fit an appropriate logistic regression model for y, with x and z as predictors:

    Logistic regression Number of obs = 10000

This of course gives us an odds ratio for the treatment effect, not a risk ratio. However, we can use this model to calculate predicted probabilities for each individual. We assume first that no individuals are treated (z=0), and then that all individuals are treated (z=1).

We first generate a new variable, zcopy, which keeps a copy of the original treatment assignment variable so that z can be restored afterwards. We then set z to 0 for all individuals and ask for the predicted probability that y=1. We then set z to 1 for all individuals and again calculate P(y=1). Now, to estimate the risk ratio for the effect of z=1 compared to z=0, we simply take the ratio of the marginal risks under these two conditions. That is, we divide the average predicted probability that y=1 when everyone has z=1 by the average predicted probability that y=1 when everyone has z=0. This gives an estimated risk ratio, comparing z=1 to z=0, of 1.43. This is identical to the risk ratio estimated when we first simulated data in which treatment assignment was entirely random (and in particular independent of x).

Using a logistic regression working model to produce the predictions overcomes the numerical difficulties often encountered when one instead tries to fit, directly, a GLM with a log link and binomial response that adjusts for the confounders. The approach we have described here is not new; see this paper by Sander Greenland. However, the approach is still perhaps not widely used.

We have found a point estimate for the risk ratio, but we would of course also like a confidence interval to indicate the precision of the estimate. One approach is based on the theory of estimating equations, and is implemented in Stata's teffects command. Thanks to David Drukker, of Stata Corp., for assistance with the code for this. First, we use teffects to estimate the marginal mean of the binary outcome (equivalent to the probability that y=1) when z is set to 0 and then to 1:

    Treatment-effects estimation Number of obs = 10000
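The contrast drawn at the start, between the odds ratios that logistic regression reports and the risk ratios most readers prefer, can be made concrete with a little arithmetic. Write p_1 and p_0 for the risks of y=1 under z=1 and z=0. The values p_0 = 0.30 and p_1 = 0.45 below are purely illustrative and are not taken from the post. When the outcome is this common, the odds ratio sits noticeably further from 1 than the risk ratio.

```latex
\begin{aligned}
\mathrm{RR} &= \frac{p_1}{p_0}, \qquad
\mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)} = \mathrm{RR}\,\frac{1-p_0}{1-p_1}, \\[4pt]
p_0 = 0.30,\; p_1 = 0.45:\quad
\mathrm{RR} &= \frac{0.45}{0.30} = 1.5, \qquad
\mathrm{OR} = 1.5 \times \frac{0.70}{0.55} \approx 1.91 .
\end{aligned}
```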
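The ratio of marginal risks described in the logistic working model section can be written as a standardization formula. Here \hat\beta_0, \hat\beta_1 and \hat\beta_2 are labels introduced only for this illustration: they stand for the fitted intercept and the fitted coefficients of z and x. The function \operatorname{expit}(t) = 1/(1+e^{-t}) is the inverse logit, and the sums run over all n individuals with their observed x_i.

```latex
\begin{aligned}
\hat p_0 &= \frac{1}{n}\sum_{i=1}^{n} \operatorname{expit}\bigl(\hat\beta_0 + \hat\beta_2 x_i\bigr), \\
\hat p_1 &= \frac{1}{n}\sum_{i=1}^{n} \operatorname{expit}\bigl(\hat\beta_0 + \hat\beta_1 + \hat\beta_2 x_i\bigr), \\
\widehat{\mathrm{RR}} &= \hat p_1 \big/ \hat p_0 .
\end{aligned}
```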
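The logistic-working-model procedure can also be sketched outside Stata. Below is a minimal pure-Python sketch built on a few assumptions:

- The data-generating model in `simulate` (coefficients -1.0, 0.5 and 1.0, seed 1234) is hypothetical. It is not the post's Stata simulation, so its numbers will not reproduce the 1.43 reported in the post.
- All function names are illustrative.

The sketch fits logit P(y=1) = b0 + b1·z + b2·x by Newton-Raphson. It then averages the predicted probabilities with z set to 0, and then to 1, for everyone while keeping each x as observed.

```python
import math
import random


def expit(t):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n, seed=1234):
    """Hypothetical confounded data (NOT the post's Stata simulation):
    x ~ N(0, 1); higher x makes z=1 more likely; higher x and z=1 make y=1 more likely."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = 1 if rng.random() < expit(x) else 0
        y = 1 if rng.random() < expit(-1.0 + 0.5 * z + x) else 0
        rows.append((x, z, y))
    return rows


def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def fit_logistic(rows, max_iter=50, tol=1e-10):
    """Logistic working model logit P(y=1) = b0 + b1*z + b2*x, fitted by Newton-Raphson."""
    b = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        grad = [0.0] * 3                      # score vector X'(y - p)
        info = [[0.0] * 3 for _ in range(3)]  # information matrix X'WX
        for x, z, y in rows:
            v = (1.0, z, x)
            p = expit(b[0] + b[1] * z + b[2] * x)
            w = p * (1.0 - p)
            for i in range(3):
                grad[i] += (y - p) * v[i]
                for j in range(3):
                    info[i][j] += w * v[i] * v[j]
        step = solve(info, grad)
        b = [bi + si for bi, si in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def standardized_risks(rows, b):
    """Set z=0 for everyone, then z=1, keep each x as observed, and average P(y=1)."""
    n = len(rows)
    r0 = sum(expit(b[0] + b[2] * x) for x, _, _ in rows) / n
    r1 = sum(expit(b[0] + b[1] + b[2] * x) for x, _, _ in rows) / n
    return r0, r1, r1 / r0


def crude_risk_ratio(rows):
    """Unadjusted P(y=1 | z=1) / P(y=1 | z=0)."""
    treated = [y for _, z, y in rows if z == 1]
    control = [y for _, z, y in rows if z == 0]
    return (sum(treated) / len(treated)) / (sum(control) / len(control))


if __name__ == "__main__":
    data = simulate(10000)
    coef = fit_logistic(data)
    risk0, risk1, rr = standardized_risks(data, coef)
    print(f"crude RR: {crude_risk_ratio(data):.3f}")
    print(f"P(y=1) if all z=0: {risk0:.3f}; if all z=1: {risk1:.3f}; standardized RR: {rr:.3f}")
```

Because treatment in this hypothetical simulation depends on x, the crude ratio should sit well above the standardized one, mirroring the upward bias described for the observational example. In real analyses you would use a library logistic fit rather than hand-written Newton-Raphson. It is written out here only to keep the sketch dependency-free.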
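For the confidence interval, the post turns to teffects, which estimates the two marginal means (potential-outcome means) using estimating equations. One common way to turn two estimated means and their covariance into an interval for their ratio is the delta method on the log scale. This is an assumption about how one might proceed, not necessarily the exact step the post takes. The function name `ratio_ci` and the example inputs are hypothetical.

```python
import math


def ratio_ci(mu0, mu1, var0, var1, cov01, z_crit=1.959963984540054):
    """Delta-method 95% CI for the ratio mu1/mu0, computed on the log scale.

    Var[log(mu1/mu0)] ~= var1/mu1**2 + var0/mu0**2 - 2*cov01/(mu0*mu1)
    """
    log_rr = math.log(mu1) - math.log(mu0)
    var_log = var1 / mu1**2 + var0 / mu0**2 - 2.0 * cov01 / (mu0 * mu1)
    se = math.sqrt(var_log)
    return math.exp(log_rr), math.exp(log_rr - z_crit * se), math.exp(log_rr + z_crit * se)


# Made-up inputs purely for illustration (not output from the post).
rr, lo, hi = ratio_ci(mu0=0.30, mu1=0.45, var0=1e-4, var1=1e-4, cov01=0.0)
print(f"RR {rr:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Working on the log scale keeps the interval within the positive range and makes it symmetric around log(RR). A positive covariance between the two means narrows the interval.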