Reviewer #2 (Public Review):
While the question 'are AlphaFold-predicted structures useful for drug design' has largely been addressed by comparing AF models with experimental protein structures, this paper takes a less explored (but perhaps more practically important) angle: 'are AlphaFold-predicted structures any better than the previous generation of homology modeling tools' for the protein-ligand (rigid) docking problem. The conclusions of this work will be of greatest interest to readers less familiar with the precision required for successful rigid docking, while experts might find them obvious, yet a good summary of results previously shown in the literature. Further work to understand which structural objectives/metrics should be placed on future AlphaFold-like models for better pose prediction performance would greatly expand the practicality of the observations made here.
The main conclusion of the paper, that structural accuracy (expressed as RMSD) of the protein model is not a good predictor of the accuracy the model will show in rigid docking protein-ligand pose prediction, is a good reminder of the well-appreciated need for high-quality side chain placements in docking. The expected phenomenon of AlphaFold predicting 'more apo-like structures' is often discussed in the field, and readers should be cautious about drawing conclusions from the rigid (rather than flexible, as in some previous works) docking done here.
The authors have very clearly communicated that the use of AlphaFold-generated structures in traditional docking might not be a good idea, and argued that the time of a molecular designer might be better spent preparing a high-quality homology model. The visual presentation of the conclusions is very clear but might leave the reader wanting a more in-depth discussion of which structural elements of the AF models lead to bad docking outcomes. For example, Fig. 3 presents an example of a very accurate AlphaFold prediction leading to the ligand being docked completely outside of the binding pocket. Close inspection of the figure suggests that a clash between the ligand and a slightly displaced tryptophan residue in the AF model might be to blame. Readers can confirm this themselves by comparing the model with the PDB structure, but the authors have not discussed it. Only a few of the systems used are shown, even visually, leaving the reader unable to study the more interesting cases in depth without redoing the work themselves.
The authors acknowledged that several recent studies exist in this space. They point out two advancements made in their work, worthy of further review. Similarly, it's important to evaluate the novelty of this work's claims vs previously available results, and the diversity of information made available to the reader.
"First, we use structural models generated without any use of known structures of the target protein. For machine learning methods, this requires ensuring that no structure of the target protein was used to train the method." This is done by limiting the scope of the work to GPCRs whose structures became available only after the training date of AlphaFold (April 30, 2018), as well as not using templates available after that date during prediction. The use of a time limit seems less preferable than the approach taken in Ref. 1 of discarding templates above a sequence identity cutoff. On the other hand, the 'ablation test' performed in Ref. 2 showed no loss in accuracy when no templates were used at all. The authors should discuss in more detail whether these modeling choices could change any of their conclusions, and why they made these choices rather than those of previous work.
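To make the contrast between the two template-selection strategies concrete, here is a minimal sketch. The `Template` record, the helper names, the example entries, and the 0.3 identity cutoff are all hypothetical and are not taken from the manuscript or from Ref. 1; only the April 30, 2018 date comes from the text above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Template:
    """Hypothetical record describing one candidate template structure."""
    pdb_id: str           # placeholder identifier, not a real PDB entry
    release_date: date    # date the structure became publicly available
    seq_identity: float   # fractional sequence identity to the target (0..1)


# Training date quoted in the review text.
TRAINING_CUTOFF = date(2018, 4, 30)


def filter_by_date(templates: List[Template],
                   cutoff: date = TRAINING_CUTOFF) -> List[Template]:
    """Time-limit strategy: keep templates released on or before the cutoff."""
    return [t for t in templates if t.release_date <= cutoff]


def filter_by_identity(templates: List[Template],
                       max_identity: float = 0.3) -> List[Template]:
    """Identity strategy: discard templates above the (assumed) identity cutoff."""
    return [t for t in templates if t.seq_identity <= max_identity]


# Made-up candidates for illustration only.
candidates = [
    Template("T1", date(2017, 1, 1), 0.85),   # old, but a close homolog
    Template("T2", date(2019, 6, 1), 0.25),   # recent, but a distant homolog
    Template("T3", date(2016, 3, 15), 0.20),  # old and distant
]

kept_by_date = [t.pdb_id for t in filter_by_date(candidates)]
kept_by_identity = [t.pdb_id for t in filter_by_identity(candidates)]
print(kept_by_date, kept_by_identity)
```

In this made-up list, the date filter retains the close homolog `T1`, whereas the identity filter removes it. The two criteria can therefore select different template sets, which is why a discussion of the choice would be helpful.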
"Second, we perform a systematic comparison that takes into account the variation between experimentally determined structures of the same protein when bound to different ligands." Cross-docking is indeed a more appropriate comparison than self-docking (as done in previous works), and the observation that the accuracy of AF models is similar to that between different holo structures of the same protein is interesting. However, previous literature on cross-docking should be discussed, along with its well-known conclusion that small variations in side-chain positions, in otherwise highly similar structures, can lead to large changes in docked poses. It is important to realize that AlphaFold models are 'just another structure' - if previous literature is sufficient to show the sensitivity of rigid docking, repeating the exercise on AF structures does not add to our understanding. Further, Ref. 3 might have already addressed the question of correlation between binding site RMSD and docking pose prediction accuracy - see e.g. Supplementary Figure 3 there (also Figure S15 in Ref. 2).
Further, the authors should discuss the commonly raised problem of AlphaFold generating 'more apo-like structures'. Are the models used here actually 'holo-like', given their low RMSDs? What RMSD differences are to be expected between apo and holo structures of these systems? How are the volumes of the pockets affected? The position on this problem taken by previous works is worth mentioning - "much higher rmsd values are found when using the AF2 models (...), which reflect the difficulties in performing docking into apo-like structures" in Ref. 1 and "computational model structures were predicted without consideration of binding ligands and resulted in apo structures" in Ref. 2.
Because of this 'apo problem', Ref. 2 assumed that rigid docking (as done here) would not succeed and used flexible docking where "two sidechains at the binding site were set to be flexible". In fact, the reader of this new paper will be left to wonder whether it is simply presenting a subset of the results already seen in Ref. 2, where "the success ratios dropped significantly for them because misoriented sidechains prevented a ligand from docking (Figure S14)". While this conclusion is not made as clear in Ref. 2 as it is here, a comparison of Figures 4 and S14 there will lead the reader to the same conclusion, and more -- that flexible docking meaningfully improves the performance of AF models, and does so more than it improves that of homology models.
Finally, certain analyses present in previous works but absent here would make this work more informative to readers:<br />
a) Consideration of multiple top-ranked poses: e.g., in Ref. 2 (Figures 4 and S14, mentioned above), the comparison of success rates for the top 1 and top 3 docked poses adds much context.<br />
b) Notes on the structural features preventing successful docking; see, e.g., Table 2 in Ref. 1 or Tables 2 and 4 in Ref. 4.
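As an illustration of point (a), a top-k success rate could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name, the input layout (ligand RMSDs of docked poses listed in the docking program's rank order), the 2.0 Å success threshold, and all numbers are hypothetical and are not taken from the manuscript or from Ref. 2.

```python
from typing import Mapping, Sequence


def top_k_success_rate(
    pose_rmsds: Mapping[str, Sequence[float]],
    k: int,
    threshold: float = 2.0,
) -> float:
    """Fraction of systems with at least one of the k best-ranked poses
    within `threshold` angstroms of the experimental ligand pose.

    `pose_rmsds` maps a system name to the ligand RMSDs of its docked
    poses, ordered by docking rank (best-scored pose first).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not pose_rmsds:
        raise ValueError("no systems supplied")
    hits = sum(
        1
        for rmsds in pose_rmsds.values()
        if any(r <= threshold for r in rmsds[:k])
    )
    return hits / len(pose_rmsds)


# Made-up RMSD values (angstroms) for four hypothetical systems.
example = {
    "receptor_A": [5.1, 1.4, 3.0],
    "receptor_B": [0.9, 4.2, 6.0],
    "receptor_C": [7.3, 6.8, 8.1],
    "receptor_D": [3.5, 2.8, 1.9],
}

top1 = top_k_success_rate(example, k=1)
top3 = top_k_success_rate(example, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")  # top-1: 0.25, top-3: 0.75
```

Reporting both numbers side by side would show how often a correct pose is sampled but not ranked first, which distinguishes scoring failures from sampling failures.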
This work has the potential to become an important piece of the puzzle if the authors draw deeper insights into the reasons for AF model failures. These could include a discussion of the problematic structural elements (clashes of side chains with ligands, missing interactions/waters, etc.), potential solutions with some preliminary data (flexible docking, softening interactions, etc.), or proposals for metrics better than RMSD to score the soundness of pockets generated by AF for docking.
References:<br />
1. Díaz-Rovira, A. M., Martín, H., Beuming, T., Díaz, L., Guallar, V., & Ray, S. S. (2023). Are Deep Learning Structural Models Sufficiently Accurate for Virtual Screening? Application of Docking Algorithms to AlphaFold2 Predicted Structures. Journal of Chemical Information and Modeling, 63(6), 1668-1674. https://doi.org/10.1021/acs.jcim.2c01270<br />
2. Heo, L., & Feig, M. (2022). Multi-state modeling of G-protein coupled receptors at experimental accuracy. Proteins: Structure, Function, and Bioinformatics, 90(11), 1873-1885. https://doi.org/10.1002/prot.26382<br />
3. Beuming, T., & Sherman, W. (2012). Current assessment of docking into GPCR crystal structures and homology models: Successes, challenges, and guidelines. Journal of Chemical Information and Modeling, 52(12), 3263-3277. https://doi.org/10.1021/ci300411b<br />
4. Scardino, V., Di Filippo, J. I., & Cavasotto, C. (2022). How good are AlphaFold models for docking-based virtual screening? [Preprint]. Chemistry. https://doi.org/10.26434/chemrxiv-2022-sgj8c