Reviewer #2 (Public Review):
Summary:
This work presents a new machine-learning method, RaSP, to predict changes in protein stability due to point mutations, measured by the change in folding free energy ΔΔG.
The model consists of two coupled neural networks: a self-supervised 3D convolutional neural network that produces a reduced-dimensionality representation of the structural environment of a given residue, and a downstream supervised fully-connected network that takes this structural representation as input and predicts the ΔΔG of any given amino-acid mutation. The first network is trained on a large dataset of protein structures; the second is trained on the ΔΔG values of all point mutants of 35 proteins, as predicted by the biophysics-based method Rosetta.
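To make the two-stage design concrete, here is a minimal sketch of the pipeline as I understand it from the paper. This is not the authors' code: the toy "encoder" and "regressor" below are stand-ins for the actual 3D CNN and fully-connected network, and all names, dimensions, and weights are hypothetical placeholders chosen only to illustrate the data flow.

```python
import random

random.seed(0)

LATENT_DIM = 16       # assumed size of the reduced structural representation
N_AMINO_ACIDS = 20

def encode_environment(voxel_grid):
    """Stand-in for the self-supervised 3D CNN: maps the voxelized
    structural environment of a residue to a low-dimensional vector.
    Here a toy average-pooling replaces the learned convolutions."""
    bucket = max(1, len(voxel_grid) // LATENT_DIM)
    return [sum(voxel_grid[i:i + bucket]) / bucket
            for i in range(0, len(voxel_grid), bucket)][:LATENT_DIM]

def predict_ddg(latent, wt_index, mut_index, weights):
    """Stand-in for the supervised fully-connected network: combines the
    structural representation with a one-hot encoding of the mutation
    (wild-type and mutant amino-acid identities) and outputs a ΔΔG."""
    one_hot = [0.0] * (2 * N_AMINO_ACIDS)
    one_hot[wt_index] = 1.0
    one_hot[N_AMINO_ACIDS + mut_index] = 1.0
    features = latent + one_hot
    return sum(w * x for w, x in zip(weights, features))

# Toy usage: a random "structural environment" and random weights.
grid = [random.random() for _ in range(4096)]
w = [random.gauss(0, 0.1) for _ in range(LATENT_DIM + 2 * N_AMINO_ACIDS)]
latent = encode_environment(grid)
ddg = predict_ddg(latent, wt_index=3, mut_index=17, weights=w)
print(round(ddg, 3))
```

The point of the sketch is only the decoupling: the encoder is trained once, self-supervised, on many structures, and the cheap downstream regressor is then fitted to Rosetta ΔΔG labels.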
The paper shows that RaSP closely approximates Rosetta's ΔΔG predictions while being several orders of magnitude faster. Judging by a comparison made for a few proteins, RaSP and Rosetta perform similarly against experimental data. In addition, both methods are shown to be robust to variations in the input structure, so good predictions are obtained from homology models as well as from structures predicted by AlphaFold2.
Finally, the usefulness of a rapid approach such as RaSP is clearly demonstrated by applying it to calculate ΔΔG values for all mutations of a large dataset of human proteins, for which this method is shown to reproduce previous findings of the overall ΔΔG distribution and the relationship between ΔΔG and the pathological consequences of mutations. The RaSP tool and the dataset of mutations of human proteins are shared.
Strengths:
The single main strength of this work is that the model developed, RaSP, is much faster than Rosetta (by five to six orders of magnitude) while producing ΔΔG predictions of comparable accuracy (both against Rosetta and against experiment). The usefulness of such a rapid approach is convincingly demonstrated by applying it to predict the ΔΔG of all single-point mutations in a large dataset of human proteins, for which the new method reproduces previous findings on the relationship between stability and disease; a calculation of this scale would be prohibitive with Rosetta. Importantly, other researchers will be able to take advantage of the method because the code and data are shared, and a Google Colab notebook has been set up where RaSP can be easily run. An additional bonus is that the dataset of human proteins and their RaSP ΔΔG predictions, annotated as benign/pathogenic (according to the ClinVar database) and/or by allele frequency (from the gnomAD database), is also made available, which may be very useful for further studies.
Weaknesses:
The paper presents a solid case in support of the speed, accuracy, and usefulness of RaSP. However, it does suffer from a few weaknesses.
The main weakness is, in my opinion, that it is not clear where RaSP sits in the accuracy-versus-speed landscape of current ΔΔG-prediction methods. The paper shows that RaSP is much faster than Rosetta and provides evidence that its accuracy is comparable to Rosetta's, but RaSP is not compared to any other method. For instance, FoldX has been used in large-scale studies of similar size to the one used here to exemplify RaSP. How does RaSP compare with FoldX? Is it more accurate? Is it faster? Also, as the introduction mentions, several ML methods have been developed recently; how does RaSP compare with them in accuracy and CPU time? How RaSP fares against other fast approaches such as FoldX and/or ML methods will strongly affect the potential usefulness and impact of the present work.
Second, since this work presents a new model, a notable weakness is that the model is not sufficiently described. I had to read a previous paper from 2017, on which this work builds, to understand the self-supervised CNN used to model the structure, and even so, I still do not know which of the three different 3D grids used in that original paper is used in the present work.
A third weakness is, I think, that a stronger case needs to be made for fitting RaSP to Rosetta ΔΔG predictions rather than to experimental ΔΔG values. The justification put forward by the authors is that the dataset of Rosetta predictions is large and unbiased while the experimental dataset is smaller and biased, which may lead to overfitting. While I understand that this may be a problem, and that in general a large unbiased dataset is preferable to a small biased one, it is not obvious from reading the paper how severe this problem actually is, nor whether fitting the model to the predictions of another model rather than to empirical data introduces issues of its own.
Finally, the method is claimed to be "accurate", but it is not clear to me what this means. Accuracy is quantified by the correlation coefficient between Rosetta and RaSP predictions, R = 0.82, and by the mean absolute error, MAE = 0.73 kcal/mol. Also, both RaSP and Rosetta have R ~ 0.7 with experiment for the few cases where they were tested on experimental data. This seems rather modest; I would not call a method that produces this sort of fit "accurate". I suppose the point is that this may be as accurate as one can hope for, given the limitations of current experimental data, of Rosetta, of RaSP, and of other current methods, but if that is the case, it is not clearly discussed in the paper.
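For concreteness, the two summary statistics I refer to above can be computed from paired predictions as follows; the numbers here are toy values for illustration only, not data from the paper:

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient between two paired lists of ΔΔG values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mae(xs, ys):
    """Mean absolute error between paired predictions, in kcal/mol."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Toy example: six hypothetical Rosetta vs. RaSP ΔΔG pairs (kcal/mol).
rosetta = [1.2, 0.4, 3.1, -0.5, 2.0, 0.9]
rasp    = [1.0, 0.8, 2.5,  0.1, 2.4, 0.6]
print(round(pearson_r(rosetta, rasp), 2), round(mae(rosetta, rasp), 2))
```

The point is that R measures how well the rank/linear trend is reproduced, while MAE measures the typical error in kcal/mol; an R of 0.82 with an MAE of 0.73 kcal/mol still leaves individual predictions off by close to the ~1 kcal/mol scale that separates many stabilizing from destabilizing mutations.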