- Jul 2018
-
europepmc.org europepmc.org
-
On 2016 Sep 05, David Evans commented:
Mendelian randomization (MR) is an epidemiological method that can be used to strengthen causal inference regarding the relationship between a modifiable environmental exposure and a medically relevant trait and to estimate the magnitude of this relationship [1]. Whilst the principles on which MR is based are relatively easy to comprehend, many scientists report finding it difficult to understand the method at first. Reviews of MR that present the methodology and concrete examples in the context of particular biomedical issues may be of considerable value in introducing the ideas to practitioners and researchers in different fields, and several have appeared [2,3].
In a recent issue of Nature Reviews Rheumatology, Robinson et al. review some of the theory behind MR as well as application of the technique to the field of rheumatology [4]. Whilst a useful introduction for a rheumatology audience, Robinson et al.’s article contains some errors and inaccuracies in the description of the MR method that might mislead researchers who attempt to apply the approach based on this review. There are some infelicities in the description of some biological processes (e.g. the authors contend in their abstract that “alleles for genetic variants are randomly inherited at meiosis”; this is false, as alleles are inherited at conception, not during meiosis), but here we have restricted ourselves to pointing out some of the issues that are directly relevant to the theory and practice of MR:
(1) The authors seem to be confused in the choice of independent and dependent variables in the two-stage least squares instrumental variables regression analyses. The correct way of describing the first stage of this procedure is that the exposure is regressed on the instrumental variable (not vice versa, as the authors sometimes do in their manuscript). In addition, the second panel of Figure 2 in their paper illustrates fitted values from the first stage analysis regressed on the outcome variable. This is not correct. The second stage of the two-stage least squares procedure is equivalent to regressing the outcome variable on the fitted values from the first stage regression (not vice versa). The authors also do not comment on the need to correct the standard errors of the parameter estimates should the analysis be performed in this two-step fashion (although most statistics packages that implement two-stage least squares regression will do this automatically for the user). We also point out that technically the authors describe a variant of a two-stage residual inclusion estimator rather than two-stage least squares [5].
(2) The visual description of residuals in the first panel of Figure 2 is incorrect. As in ordinary least squares regression, residuals refer to the part of the dependent variable (here urate) that is not predicted by the regression. The residuals should therefore be represented by vertical double-headed arrows between the individual data points and the regression line, not by horizontal double-headed arrows between the data points and the regression line.
(3) The authors claim that “The genotypic measure of exposure is simple to obtain and being objective is not subject to experimental biases (such as recall bias)”. Whilst it is true that genotypes are not subject to many of the biases common in classical epidemiology, they are still subject to possible measurement error (i.e. genotyping error, imputation uncertainty and population stratification), all of which must be borne in mind to ensure that the results of any MR analysis are robust.
(4) The authors mistakenly claim that it is straightforward to demonstrate that a genetic variant is not related to possible confounders of the exposure-outcome association. We disagree. Whilst it is usually elementary to show that a genetic variant is associated with an exposure of interest (hence satisfying one of the core assumptions for a valid genetic instrument), demonstrating that a genetic variant is not associated with factors that confound the association between exposure and outcome is impossible [6]. The best an investigator can hope to do is to show that the putative genetic instrument is unrelated to a range of potential confounding variables [7]. If no association is found (or fewer associations than are expected by chance), then this will increase confidence that the genetic variant fulfils this core assumption, but an investigator can never prove this assumption outright, since there may still be residual or unmeasured confounders that are associated with the genetic variants but have not been tested explicitly.
(5) The authors misunderstand the nature of two-sample MR analysis and the Wald statistic used in this procedure. The authors claim that the Wald method does not provide an estimate of the causal effect of the exposure on the outcome. This is false. The Wald method provides estimates of the causal effect of the exposure on the outcome and their standard errors [8]. The authors also claim that MR analysis requires measurement of the biological exposure. Again, this is false. Investigators can use two-sample MR on summary results data, obviating the requirement of measuring an exposure variable in their analyses, and indeed this is one of the benefits of this type of analysis [9].
(6) The authors misinterpret results obtained from the Durbin-Wu-Hausman statistic (a statistical test typically used to compare observational and instrumental variable estimates of the association between the exposure and the outcome). They incorrectly state that, in the presence of reverse causality, estimates from an MR analysis will be in the direction opposite to the observational association. This is not the case. In reality, reverse causality would result in a causal estimate of zero rather than an estimate in the opposite direction to the observational association (assuming that the genetic variant that instruments the exposure is a valid instrument). Typically a significant Durbin-Wu-Hausman statistic indicates a difference between observational and causal estimates of the exposure-outcome association and can be a result of the presence of latent confounding in the observational analysis or indeed reverse causality. The statistic makes the strong assumption that the model for the instrumental variable analysis is valid, and also often has low power.
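Points (5) and (6) can be illustrated with a short sketch. It is not taken from any of the papers discussed; the function names, the summary statistics and the simulated effect sizes are all hypothetical, and the standard error uses a delta-method approximation that assumes the two genetic association estimates come from independent samples. The first part computes a Wald ratio estimate and its standard error from summary data alone. The second part simulates pure reverse causality (the outcome influences the exposure, the exposure has no effect on the outcome) with a valid instrument for the exposure.

```python
import math
import numpy as np

def wald_ratio(beta_gx, se_gx, beta_gy, se_gy):
    """Wald ratio estimate of the effect of exposure on outcome.

    beta_gx, se_gx: variant-exposure association and its standard error
    beta_gy, se_gy: variant-outcome association and its standard error
    The standard error is a delta-method approximation that assumes the
    two associations were estimated in independent samples.
    """
    estimate = beta_gy / beta_gx
    se = math.sqrt(se_gy**2 / beta_gx**2 + beta_gy**2 * se_gx**2 / beta_gx**4)
    return estimate, se

# Hypothetical summary statistics; no individual-level exposure data needed.
est, se = wald_ratio(beta_gx=0.5, se_gx=0.05, beta_gy=0.1, se_gy=0.02)

def slope(a, b):
    """Ordinary least squares slope from regressing b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

# Simulated reverse causality: y causes x, x does not cause y.
rng = np.random.default_rng(0)
n = 50_000
g = rng.binomial(2, 0.3, size=n)             # genotype (0, 1 or 2 alleles)
y = rng.normal(size=n)                       # outcome, unaffected by x or g
x = 0.5 * g + 0.8 * y + rng.normal(size=n)   # exposure, driven by g and y

observational = slope(x, y)                  # clearly positive
mr_estimate = slope(g, y) / slope(g, x)      # Wald ratio, close to zero

print(f"Wald ratio from summary data: {est:.3f} (SE {se:.3f})")
print(f"observational slope: {observational:.3f}, MR estimate: {mr_estimate:.3f}")
```

Under these assumed parameters the observational slope is positive while the MR estimate sits near zero, rather than taking the opposite sign.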
Finally, we note that the authors have failed to mention several recent extensions of MR methodology that allow relaxation of some aspects of the IV assumptions, providing forms of sensitivity analysis to conventional approaches [10,11]. These will become progressively more useful as the number of genetic variants known to be related to various medically relevant exposures increases. Extensive discussion of the MR methodology as well as some recent developments in sensitivity analyses are available elsewhere [9,12,13].
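The ordering of variables described in point (1) can be sketched as follows. This is a minimal illustration on simulated data, not the analysis from the review; the function name and all effect sizes are assumptions made for the example. The first stage regresses the exposure on the instrument, the second stage regresses the outcome on the first-stage fitted values, and the standard error is corrected by computing the residual variance from the observed exposure rather than from the fitted values.

```python
import numpy as np

def two_stage_least_squares(g, x, y):
    """Return (first-stage slope, 2SLS estimate, corrected standard error).

    g: instrument (genotype), x: exposure, y: outcome (1-D arrays).
    """
    n = len(y)
    G = np.column_stack([np.ones(n), g])
    # Stage 1: regress the exposure on the instrument.
    a, *_ = np.linalg.lstsq(G, x, rcond=None)
    x_hat = G @ a
    # Stage 2: regress the outcome on the fitted values from stage 1.
    Xh = np.column_stack([np.ones(n), x_hat])
    b, *_ = np.linalg.lstsq(Xh, y, rcond=None)
    # Corrected standard error: residuals are formed with the observed
    # exposure, not the fitted values used in the naive second stage.
    X = np.column_stack([np.ones(n), x])
    resid = y - X @ b
    sigma2 = resid @ resid / (n - 2)
    cov = sigma2 * np.linalg.inv(Xh.T @ Xh)
    return a[1], b[1], np.sqrt(cov[1, 1])

# Simulated data with an unmeasured confounder u (all values hypothetical).
rng = np.random.default_rng(0)
n = 20_000
g = rng.binomial(2, 0.3, size=n)
u = rng.normal(size=n)
x = 0.5 * g + u + rng.normal(size=n)
y = 0.3 * x + u + rng.normal(size=n)         # true causal effect is 0.3

first_stage, beta_iv, se_iv = two_stage_least_squares(g, x, y)
beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # confounded estimate

print(f"first-stage slope: {first_stage:.3f}")
print(f"2SLS estimate: {beta_iv:.3f} (SE {se_iv:.3f}), OLS estimate: {beta_ols:.3f}")
```

In practice a dedicated instrumental variables routine in a statistics package would normally be preferred, since it applies the standard error correction automatically.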
David M Evans, Tom Palmer, George Davey Smith
References
[1] Davey Smith et al (2003). Int J Epidemiol, 32(1):1-22.
[2] Jansen et al (2014). Eur Heart J, 35(29):1917-24.
[3] Sekula et al (in press). J Am Soc Nephrol.
[4] Robinson et al (2016). Nat Rev Rheumatol, 12(8):486-96.
[5] Terza et al (2008). J Health Econ, 27:531-43.
[6] Didelez et al (2007). Stat Methods Med Res, 16:309-30.
[7] Davey Smith et al (2007). PLOS Med, 4(12):e352.
[8] Pierce et al (2013). Am J Epidemiol, 178:1177-84.
[9] Davey Smith et al (2014). Hum Mol Genet, 23(R1):R89-98.
[10] Bowden et al (2015). Int J Epidemiol, 44(2):512-25.
[11] Bowden et al (2016). Genet Epidemiol, 40(4):304-14.
[12] Evans et al (2015). Ann Rev Genom Hum Genet, 16:327-50.
[13] Burgess et al (in press). Epidemiology.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.