Discussion, revision and decision
Discussion and Revision
Author response
We thank the reviewers for their valuable comments. Below we provide point-by-point responses and describe the changes made in the revised manuscript.
To Dr. Jyotsnamayee Sabat
Pt-13: I want to know how the representative sequences were selected for different states. Is it based on no. of sequences submitted or positivity rate of a particular region?
All the Indian isolates available in GISAID for the period 27th Jan – 27th May 2020 were downloaded and considered for analysis. No state-wise selection was done.
To Dr Parvin Abraham
Pt-12: The dataset is only from 27th Jan – 27th May 2020. Maybe they can include more Numbers.
The period of data collection was restricted to 27th Jan – 27th May 2020 in order to understand the variations observed across different states of the country during the early phase of the pandemic. We are also interested in assessing the impact of the lockdown in containing the spread of COVID-19 and in identifying state-specific subclusters, if any.
To Hurng-Yi Wang:
Pt-13: Agarwal and Parekh analyzed 685 SARS-CoV-2 isolates collected during 27th Jan - 27th May 2020 from India and described the distribution of virus strains and mutations across the country. While the information might be valuable to some local readers, the results are mainly descriptive and the data are a bit out of date. In addition, I have the following comments.
The period of data collection is restricted to 27th Jan – 27th May 2020 in order to understand the variations observed across different states of the country during the early phase of the pandemic. We are also interested in assessing the impact of the lockdown in containing the spread of COVID-19 and in identifying state-specific subclusters, if any.
Some details of the methods are lacking. For example, the MUpro provides two methods, it is necessary to specify which method was used in the analysis. The confidence score of each prediction should also be provided. Besides, some results from I-Mutant and MUpro were conflicting, the authors may want to discuss the discrepancy.
In the revised manuscript we give the sign of DDG predicted by the tools I-Mutant2.0 and MUpro along with the respective confidence scores. For I-Mutant2.0, the predicted sign of the protein stability change and the reliability index (which indicates the confidence of the prediction) are now incorporated in Table-1. Similarly, the predicted sign and confidence scores given by MUpro using its SVM- and NN-based models have been incorporated. We expect all the models to give the same results, except in cases where the predictions may be hard to make. This has now been explicitly mentioned in the Materials and Methods section: “In I-Mutant2.0, the sign of DDG is based on SVM classifier, and the associated confidence score is given by the reliability index. On the other hand, MUpro provides sign change prediction using two models, one SVM-based and the other using Neural Networks. In Table-1, the predicted sign of DDG by I-Mutant2.0 and MUpro along with the respective confidence scores is reported.”
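The comparison of predicted signs across the three models can be sketched as follows. This is a minimal illustration, not the procedure used for Table-1: the function name, the model labels, and the example rows are hypothetical placeholders, and the signs are encoded as +1 (stabilizing) or -1 (destabilizing) purely for convenience.

```python
def summarize_predictions(signs):
    """Report whether all models agree on the sign of DDG.

    signs: mapping of model label -> +1 (stabilizing) or -1 (destabilizing).
    The labels and encoding are assumptions made for this sketch.
    """
    distinct = set(signs.values())
    if len(distinct) == 1:
        # All models predict the same direction of stability change.
        return {"consensus": next(iter(distinct)), "concordant": True}
    # At least one model disagrees; the prediction may be hard to make.
    return {"consensus": None, "concordant": False}


# Hypothetical placeholder rows, NOT values from Table-1.
example_rows = {
    "mutation_A": {"I-Mutant2.0": -1, "MUpro-SVM": -1, "MUpro-NN": -1},
    "mutation_B": {"I-Mutant2.0": +1, "MUpro-SVM": -1, "MUpro-NN": -1},
}

summary = {name: summarize_predictions(s) for name, s in example_rows.items()}
for name, result in summary.items():
    print(name, result)
```

Rows flagged as not concordant are the ones where the models disagree, which is where the confidence scores reported alongside each prediction become most relevant.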
The “Analysis of the Mutational Profile of Indian Isolates” should be moved to Materials and Methods.
There was indeed some redundancy between the information in the Materials and Methods section and in the section “Analysis of the Mutational Profile of Indian Isolates”. We have now edited the Materials and Methods section appropriately and deleted the paragraph under the above-mentioned section.
The authors provided lengthy discussion about the effect of each mutation in some lineages, such as 20A and I/A3i. However, as these mutations are tightly linked, the effect of each individual mutation is difficult to assess. It is possible that some of the mutations are just hitchhikers. They may want to address this alternative point.
For 20A we define the haplotype comprising four co-occurring mutations: D614G, C241T, C3037T, and C14408T. Similarly, six co-occurring mutations, C6312A, C13730T, C23929T, C28311T, C6310A (S2015R) and C19524T, are shown to be associated with subclade I/A3i. Taken together as a set, these are useful in identifying clusters or groups of isolates with a similar mutational profile. However, the non-synonymous mutations among them are likely to have some individual impact on the overall stability of the respective protein, and so we have presented both sets of results. To address this point, we have added a sentence at the end of the Materials and Methods section, which is reproduced below: “While we report individual effects of mutations on protein stability, some of the mutations in a haplotype may not be under natural selection and are just hitchhiking mutations.”
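Using co-occurring mutations as a set to group isolates can be sketched as below. The two mutation sets are copied from the response above, with the labels treated as plain strings exactly as written there; the function name and the example isolates are hypothetical and only illustrate the idea, not the authors' pipeline.

```python
# Haplotype-defining mutation sets as listed in the response above.
HAPLOTYPES = {
    "20A": {"D614G", "C241T", "C3037T", "C14408T"},
    "I/A3i": {"C6312A", "C13730T", "C23929T", "C28311T", "C6310A", "C19524T"},
}


def assign_haplotypes(isolate_mutations):
    """Return the haplotypes whose defining mutations are ALL present."""
    observed = set(isolate_mutations)
    return sorted(
        name for name, required in HAPLOTYPES.items() if required <= observed
    )


# Hypothetical isolates for illustration only (not real GISAID records).
isolates = {
    "isolate_1": ["C241T", "C3037T", "C14408T", "D614G", "C18877T"],
    "isolate_2": ["C6312A", "C13730T"],  # carries only part of the I/A3i set
}

assignments = {name: assign_haplotypes(m) for name, m in isolates.items()}
for name, haplotypes in assignments.items():
    print(name, haplotypes)
```

Requiring the full set is a deliberately strict choice for this sketch. It groups isolates by shared profile only and says nothing about which mutations in the set are under selection and which are hitchhiking.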
Several figures are confusing and lack detail. The diversity plots of Figure 3 and Figure 8 are hard to be precisely compared to the mutations that occurred among different plots. Phylogenetic trees, as well as their figure legends, are confusing, especially Figure 9 and Figure 10. For Figure 9, it is impossible to tell which mutation site had changed from C to T. For Figure 10, spots depicted in yellow are both position 29827 A>T and position 29830 G>T, green spot only notes as G, but A29827 is not mentioned in the figure. Furthermore, the mutation position of blue spot C cannot be found.
We have now redrawn the diversity plots in Figure 3 and Figure 8 (labelled Figure 2 and Figure 4, respectively, in the revised manuscript), which are shown below. We have introduced horizontal lines to show the height of the divergence line at the variant positions discussed in the manuscript, and these are marked with the same colour in the corresponding subplots for comparison.
In the revised manuscript, Figures 9 and 10 are now Supplementary Figures 2c and 2d respectively. The new figure legends are: Supplementary Figure 2: The sequences carrying the mutations a) C5700A b) C23929T c) C18877T d) G29830T are depicted in yellow colour. Figure 10 (now Supplementary Figure 2(d)) is now re-plotted, and we have removed the blue dot corresponding to ‘C’ since no samples from India had this variation.
Figure 9 and Figure 10 were not mentioned inside the text.
References to both figures have now been added in the manuscript: Supplementary Figure 2(c) – On Pg-9, in the first line under the heading “Identification of novel subclade I/GJ-20A and unique mutations in Maharashtra”. Supplementary Figure 2(d) – On Pg-11, in the last paragraph under the heading “Identification of novel subclade I/GJ-20A and unique mutations in Maharashtra”.
The Top 10 mutations in PCA analysis are the mutations in 20A and I/A3i. It is reasonable to observe a clear association of the clusters with the clades. It is not clear, however, how these distributions correlate with lockdown, contact tracing and quarantine measures.
From Supplementary Figure 1, clade 20A (shown in ‘Green’) is predominantly observed in Gujarat (178/201), and the distribution of clade 19A (shown in ‘Blue’) is high in Telangana (75/97), followed by Delhi (55/76), Maharashtra (31/80), and Tamil Nadu (19/34). Four mutations, C6312A, C13730T, C23929T, and C28311T, are reported to be associated with subclade I/A3i, which is an India-specific subclade of 19A. These co-occurring mutations are found in ~32% of the Indian samples sequenced (till 31st May 2020). Only 5 isolates of this subclade were observed in India after May, with the last one dated 13th June 2020 (according to data available in Nextstrain). This indicates that the spread of subclade I/A3i was largely contained during the lockdown through contact tracing and quarantining of infected individuals. Also, Telangana and Delhi isolates cluster together due to shared I/A3i mutations, primarily because of the Tablighi Jamaat congregation that occurred just before the lockdown was announced. Similarly, clade 20A defining mutations were observed in ~90% of Gujarat samples. Due to the countrywide lockdown from 25th March 2020, this clade and its sub-clusters, defined by Gujarat-specific mutations, e.g., I/GJ-20A, remained localized in the state.
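The state-wise proportions behind the counts quoted above can be recomputed with a short sketch. The counts are taken directly from this response; the variable names are illustrative only.

```python
# (isolates in the clade, total isolates from the state), as quoted above.
clade_counts = {
    ("Gujarat", "20A"): (178, 201),
    ("Telangana", "19A"): (75, 97),
    ("Delhi", "19A"): (55, 76),
    ("Maharashtra", "19A"): (31, 80),
    ("Tamil Nadu", "19A"): (19, 34),
}

# Percentage of each state's isolates that fall in the named clade.
percentages = {
    key: 100.0 * in_clade / total
    for key, (in_clade, total) in clade_counts.items()
}

for (state, clade), pct in percentages.items():
    print(f"{state}: {pct:.1f}% of isolates in clade {clade}")
```

For Gujarat this gives about 88.6% in clade 20A, which is consistent with the ~90% figure quoted above.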
Hurng-Yi Wang:
I agree to change to Verified manuscript.
Decision
Verified manuscript
Dr. Abraham: Verified manuscript
Dr. Sabat: Verified manuscript
Dr. Wang: Verified manuscript