Reviewer #3 (Public Review):
To motivate the proposal, Karageorgiou et al. first identify a problem in applying current multivariable MR (MVMR) methods with many correlated exposures. I believe this problem can really be broken into two pieces. The first is that MVMR suffers from weak instrument bias. The second is that some traits may have nearly co-linear genetic associations, making it hard to disentangle which trait is causal. These problems connect in that inclusion of co-linear traits amplifies the problem of weak instrument bias - traits that are nearly co-linear with another trait in the study will have no or very few conditionally strong instruments.<br /> The authors then propose a solution: Apply a dimension reduction technique (PCA or sparse PCA) to the matrix of GWAS effect estimates for the exposures. The identified new components can then be used in MVMR in place of the directly measured exposures.
I think that the identified problem is timely and important. I also like the idea of applying dimension reduction techniques to GWAS effect estimates. However, I don't think that the manuscript in its current form achieves the goals that it has set out. Specifically, I will outline the weaknesses of the work in three categories:<br /> 1. The causal effects measured using this method are poorly defined.<br /> 2. The description of the method lacks important details.<br /> 3. Applied and simulation results are unconvincing.<br /> I will describe each of these in more detail below.
1. To me, the largest weakness of this paper is that it is not clear how to interpret the putatively causal effects being measured. The authors describe the method as measuring "the causal effect of the PC on outcome" but it is not obvious what this means.
One possible implication of this statement is that the PC is a real biological variable (say some hidden regulator) that can be directly intervened on. If this is the intention it should be discussed. However, this situation would imply that there is one correct factorization and there is no guarantee that PCs (or sparse PCs) come close to capturing that.
The counterfactual implied by estimating the effects of PCs in MVMR is that it is possible to intervene on and alter one PC while holding all other PCs constant.<br /> In the introduction, the authors note (and I agree) that one weakness of MR applied to correlated traits is that "MVMR models investigate causal effects for each individual exposure, under the assumption that it is possible to intervene and change each one whilst holding the others fixed." However, it is not obvious that altering one PC while holding the others constant is more reasonable.
2. This section combines a few items that I found unclear in the methods section. The most critical one is the lack of specification on how to select instruments.<br /> For the lipids application, the authors state that instruments were selected from the GLGC results; however, these only include instruments for LDL, HDL, and TG, so 1) it would not be possible to include variants that were independently instruments for one of the component traits alone and 2) there would be no instruments for the amino acids. There is no discussion of how instruments should be selected in general.<br /> This choice could also have a dramatic impact on the PCs estimated. The first PC is optimized to explain the largest amount of variance of the input data which, in this case, is the matrix of GWAS effect estimates. This means that the number of instruments included for each trait will drive the resulting PCs. It also means that differences in scaling across traits could influence the resulting PCs.
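As a toy illustration of the scaling concern, the following sketch runs PCA on a simulated SNP-by-trait matrix before and after rescaling one trait. Everything here is an assumption made for illustration: the matrix `beta` is random noise rather than real GWAS output, the helper `first_pc_loadings` is hypothetical, and column centring before the SVD may not match the authors' preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps, n_traits = 200, 4

# Simulated SNP-by-trait matrix of effect estimates (illustrative only, not real data).
beta = rng.normal(size=(n_snps, n_traits))

def first_pc_loadings(mat):
    """Absolute trait loadings of the first principal component (columns centred)."""
    centred = mat - mat.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return np.abs(vt[0])

load_raw = first_pc_loadings(beta)

# Rescale trait 0 by a factor of 10, as if it were reported in different units.
beta_scaled = beta.copy()
beta_scaled[:, 0] *= 10.0
load_scaled = first_pc_loadings(beta_scaled)

print("first-PC loadings, original scale:", np.round(load_raw, 2))
print("first-PC loadings, trait 0 x10   :", np.round(load_scaled, 2))
```

In this toy setting the first PC after rescaling is dominated by the rescaled trait, even though nothing about the underlying genetics has changed. A similar effect would be expected if one trait contributed many more instruments than the others.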
The other detail that is either missing or that I overlooked is which quantity is used as the variant-PC association in the MVMR analysis. Specifically, is it the PC loadings or is it a different value? Based on the computation of the F-statistic I suspect the former but it is not clear. If this is the case, what is the effect of using loadings that have been shrunk via one of the sparse methods? It would be nice to see a demonstration of the bias and variance of the resulting method, though it is not clear to me what the "truth" would be.
3. In the lipids application, the fact that M.LDL.PL changes sign in the MVMR analysis is offered as evidence of multicollinearity. I would generally associate multicollinearity with large variance and not bias. Perhaps the authors could offer some more insight on how multicollinearity would cause the observation.<br /> A minor point of confusion: I was unable to interpret this pair of sentences: "Although the method did not identify any of the exposures as significant at Bonferroni-adjusted significance level, the estimate for M.LDL.PL is still negative but closer to zero and not statistically significant. The only trait that retains statistical significance is ApoB." The first sentence says that none of the exposures were significant while the second sentence says that ApoB is significant. The GRAPPLE results don't seem clearly bad; indeed, if only ApoB is significant, wouldn't we conclude that of the 118 exposures, only ApoB is causal for heart disease? It would help to discuss in more detail how the conclusions from the PC-based MVMR analysis compare to the conclusions from GRAPPLE.
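To make the variance-versus-bias distinction concrete, here is a minimal simulation sketch. It is not the authors' model: it uses ordinary least squares on two simulated exposures, assumes the SNP-exposure associations are known without error (so weak instrument bias is excluded by construction), and all names and parameter values (`true_effects`, `simulate`, the correlation `rho`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_snps, n_reps = 100, 2000
true_effects = np.array([0.5, 0.0])  # hypothetical: only exposure 1 is causal

def simulate(rho):
    """Repeatedly fit OLS of outcome effects on two exposure effects with correlation rho."""
    est = np.empty((n_reps, 2))
    for r in range(n_reps):
        z = rng.normal(size=(n_snps, 2))
        x1 = z[:, 0]
        x2 = rho * z[:, 0] + np.sqrt(1.0 - rho**2) * z[:, 1]
        X = np.column_stack([x1, x2])
        y = X @ true_effects + rng.normal(size=n_snps)
        est[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    return est

est_low = simulate(rho=0.0)    # uncorrelated exposures
est_high = simulate(rho=0.99)  # nearly collinear exposures

for label, est in [("rho=0.00", est_low), ("rho=0.99", est_high)]:
    print(label,
          "mean:", np.round(est.mean(axis=0), 3),
          "sd:", np.round(est.std(axis=0), 3),
          "share with negative sign for exposure 1:",
          round(float((est[:, 0] < 0).mean()), 3))
```

Under these assumptions the average estimate stays near the true value in both settings, while the spread grows sharply under near collinearity, so a sign change in a single analysis can arise from variance alone. Whether this carries over to the setting with noisy SNP-exposure estimates is exactly the point on which more insight from the authors would help.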
It is a bit hard to interpret Table 4. I wasn't able to fully determine what "VLD, LDL significance in MR" means here. From the text, it seems to mean that any PC with a non-zero loading on VLDL or LDL traits was significant; however, this seems like a trivial criterion for the PCA method, since all PCs will be dense. This would mean this indicator only tells us whether any PCs were found to "cause" heart disease.
In simulations, I may be missing something about the definition of a true and false positive here. I think this is similar to my confusion in the previous paragraph. Wouldn't the true and false positive rates as computed using these metrics depend strongly on the sparsity of the components? It is not clear to me what ideal behavior would be here. However, it seems from the description that if the truth was as in Fig 7 and two methods each yielded one dense component that was found to be causal for Y, these two methods would get the same "score" for true positive and false positive rate regardless of the distribution of factor loadings. One method could produce a factor that loaded equally on all exposures while the other produced a factor that loaded mostly on X1 and X2 but this difference would not be captured in the results.
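The concern can be sketched with a toy scoring rule. The rule below is only my reading of the metric (an exposure is flagged as positive whenever it has a non-zero loading on a component found to be significant), and the six-exposure truth, the two loading vectors, and the helper `rates` are all hypothetical.

```python
import numpy as np

# Hypothetical truth: of six exposures, only X1 and X2 affect Y.
truly_causal = np.array([True, True, False, False, False, False])

# Two hypothetical dense components, each assumed to be significant in MR.
flat = np.full(6, 1.0 / np.sqrt(6.0))                    # equal loading on every exposure
peaked = np.array([0.70, 0.70, 0.07, 0.07, 0.07, 0.07])  # mostly X1 and X2
peaked = peaked / np.linalg.norm(peaked)

def rates(loadings, component_significant=True):
    """Flag an exposure if it has a non-zero loading on a significant component."""
    flagged = (loadings != 0) & component_significant
    tpr = float(flagged[truly_causal].mean())
    fpr = float(flagged[~truly_causal].mean())
    return tpr, fpr

def causal_share(loadings):
    """Fraction of squared loading that falls on the truly causal exposures."""
    return float((loadings[truly_causal] ** 2).sum() / (loadings ** 2).sum())

print("flat  :", rates(flat), round(causal_share(flat), 3))
print("peaked:", rates(peaked), round(causal_share(peaked), 3))
```

Under this reading both components receive identical true and false positive rates, although the second concentrates almost all of its weight on X1 and X2. A loading-weighted summary such as `causal_share` is one possible way to separate them; I offer it only as an illustration, not as a recommended metric.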