Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.
Reply to the reviewers
Reviewer #1 (Evidence, reproducibility and clarity (Required)):
Marta Sanvicente-García and colleagues developed a comprehensive and versatile genome editing web application and a Nextflow pipeline to support gene editing experimental design and analysis.
The manuscript is well written and all data are clearly shown.
While I did not test it extensively, the software seems to work well and I have no reason to doubt the authors' claims.
I usually prefer ready-to-use web applications like Outknocker; they are in general easier to use for rookies (it would be good if the authors could cite it, since it is very well implemented), but the Nextflow implementation is nonetheless well suited.
We have been able to analyze the testing dataset that they provide, but when we tried to run it with our own dataset we were not able to obtain results. We also tried to run it with the testing datasets of CRISPRnano and CRISPResso2 without obtaining results. The error message was in all cases: "No reads mapping to the reference sequence were found."
A few minor points:
Regarding the methods to assess whether the genome editing is working or not, I would definitely include High Resolution Melt Analysis, which is by far the fastest and probably the most sensitive among them.
Following Reviewer 1's suggestion, we have added this technique to the introduction: "Another genotyping method that has been successfully used to evaluate genome editing is high-resolution melting analysis (HRMA) [REFERENCE]. This is a simple and efficient real-time polymerase chain reaction-based technique."
Another point that would be important to tackle is that these pipelines often do not define the system they are working with (e.g. diploid vs. haploid, etc.). This will change the number of reads needed to unambiguously call the detected genotypes and to perform the downstream analysis (the CRISPRnano authors mentioned this point).
The introduction already states: "it is capable of analyzing edited bulk cell populations as well as individual clones". In addition, following this suggestion, we have added a recommended sample coverage to the help page of the CRISPR-A web application and to the documentation of the Nextflow pipeline to orient users.
I am also wondering whether the name CRISPR-A is appropriate since someone could confuse it with CRISPRa.
CRISPR-A is an abbreviation for CRISPR-Analytics. Even though it can be pronounced in the same way as CRISPRa screening libraries, it is spelled differently and is easily differentiated by context.
CROSS-CONSULTATION COMMENTS
Reviewer 2 did excellent work and raised important concerns about the software that need to be addressed carefully.
In the meantime we had more time to test the software and can confirm some of the findings of Reviewer 1:
1) We spent hours running (unsuccessfully) CRISPR-A on Nextflow. The software does not seem to run properly.
2) No manual or instructions can be found in either of their repositories (https://bitbucket.org/synbiolab/crispr-a_nextflow/ and https://bitbucket.org/synbiolab/crispr-a_figures/).
We have added a readme.md file to both repositories, and we hope that with the new documentation the software can be downloaded and run easily. We have also added an example test to the CRISPR-A Nextflow pipeline to facilitate testing of the software. Currently, the software is implemented in DSL1 instead of DSL2, making it impossible to run with the latest version of Nextflow. We are planning to make the update soon, but we want to do it while moving the pipeline to the nf-core crisprseq pipeline, to follow better standards and make it fully reproducible and reusable.
A few more points to be considered:
- UMI clustering is not proper terminology. Barcode multiplexing/demultiplexing (SQK-LSK109 from Oxford Nanopore).
We have added more details in the methods section "Library prep and Illumina sequencing with Unique Molecular Identifiers (UMIs)" to clarify the process and the terminology used: "Unique Molecular Identifiers are added through a two-cycle PCR, called UMI tagging, to ensure that each identifier comes from just one molecule. Barcodes to demultiplex by sample are added later, after the UMI tagging, in the early and late PCR."
We had already explained the computational pipeline through which these UMIs are clustered together to obtain a consensus of the amplified sequences in the "CRISPR-A gene editing analysis pipeline" section of the methods:
"An adapted version of the extract_umis.py script from the pipeline_umi_amplicon pipeline (distributed by ONT, https://github.com/nanoporetech/pipeline-umi-amplicon) is used to get UMI sequences from the reads when the three-PCR experimental protocol is applied. Then vsearch⁴⁸ is used to cluster UMI sequences. UMIs are polished using minimap2³² and racon⁴⁹ and consensus sequences are obtained using minialign (https://github.com/ocxtal/minialign) and medaka (https://github.com/nanoporetech/medaka)."
We have also added the following to the "CRISPR-A gene editing analysis pipeline" methods section to help readers understand the differences between the barcodes that can be used: "When working with pooled samples, demultiplexing of the samples has to be done before running the CRISPR-A analysis pipeline, using the appropriate software for the sequencing platform used. The resulting FASTQ files are the main input of the pipeline."
The SQK-LSK109 protocol from Oxford Nanopore is then followed through the steps specified in the methods: "The Custom PCR UMI (with SQK-LSK109), version CPU_9107_v109_revA_09Oct2020 (Nanopore Protocol) was followed from the UMI tagging step to the late PCR and clean-up step."
Finally, we want to highlight that, as can be seen in the methods as well as in the discussion, UMIs are used to group sequences that have been amplified from the same genome, not to identify different samples: "Precision has been enhanced in CRISPR-A through three different approaches. [...] We also removed indels in noisy positions when the consensus of sequences clustered by UMI is used after filtering by UBS." This is also shown in the results (Fig. 5C).
- Text in Figure 5 is hard to read.
We have increased the font size in Figure 5.
- They should test the software based on the ground truth data
We have added a human-classified dataset for the final benchmarking, and we can see that for all examined samples CRISPR-A has an accuracy higher than 0.9. As shown in the figure with manually curated data, CRISPR-A shows good results in noisy samples using the empirical noise removal algorithm, without the need to filter by quantification windows.
- The alignment algorithm is not the best one; I think minimap2 would be better for general purpose (at least it works better for ONT).
As can be seen in Figure 2A, minimap2 is one of the alignment methods that gives better results for the aim of the pipeline. In addition, we have tuned the parameters (Figure 2B) for better detection of CRISPR-based long deletions, which can be more difficult to report as a single open gap in the alignment.
- The minimum configuration for installation was not mentioned (for their Docker/Nextflow pipeline).
Proper documentation indicating the configuration requirements for installation has been added to the readme.md of the repository.
- Fig 2: why do they use PC4/PC1?
Principal Component Analysis is used to reduce the number of dimensions in a dataset and helps to understand the effect of the explanatory variables, detect trends or samples that are labeled in incorrect groups, simplify data visualization, etc. Even though PC4 explains less variability than PC2 or PC3, it helps us to understand and better decipher the effect of the 4 analyzed parameters, even if the differences are not big. We have decided to include the other PCs as a supplementary figure to show this.
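As an illustration of this kind of exploration, here is a minimal R sketch (the data frame and parameter names are hypothetical, not those of our analysis) showing how a lower-variance component such as PC4 can still be inspected against PC1:

```r
# Hypothetical table of alignment runs described by 4 tuned parameters (invented values).
set.seed(1)
runs <- data.frame(
  gap_open   = runif(40, 5, 30),
  gap_extend = runif(40, 0.5, 5),
  match      = runif(40, 1, 5),
  mismatch   = runif(40, 1, 8)
)

pca <- prcomp(runs, center = TRUE, scale. = TRUE)

# Variance explained by each component.
summary(pca)$importance["Proportion of Variance", ]

# PC4 explains little variance, but plotting it against PC1 can still help to
# separate the effect of one specific parameter.
plot(pca$x[, 1], pca$x[, 4], xlab = "PC1", ylab = "PC4")
```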
- There are still typos and unclear statements throughout the whole manuscript.
One more drawback is that the software seems to support only single FASTQ uploading (or we cannot see the option to add more FASTQ files).
In the case of paired-end reads instead of single-end reads, in the web application these can be selected at the beginning by answering the question "How should we analyze your reads? Type of Analysis: Single-end Reads; Paired-end Reads". In the case of the pipeline, the documentation now explains how to indicate whether the data are paired-end or single-end; this is set with the "input" and "r2file" configuration variables.
In the case of multiple samples, and therefore multiple FASTQ files, there is a button to add more samples in the web application. In the pipeline, multiple samples can be analyzed in a single run by placing them all in one folder and indicating it with the "input" variable.
Since people usually analyze more than one clone at a time (we usually analyze 96 clones together), this would mean that I have to upload each one of them manually.
All files can be added to the same folder and analyzed in a single run using the Nextflow pipeline. The web application has a limit of ten samples, which can be added by clicking the "Add more" button.
Also, the software (the web server; the Docker does not work) works with Illumina data in our hands but not with ONT.
This should be clarified in the manuscript.
If a FASTQ file is uploaded to CRISPR-A, the analysis can be done even though we have not specifically optimized the tool for long-read sequencing platforms. We have checked the performance of CRISPR-A with the CRISPRnano Nanopore testing dataset and the analysis succeeded. See the results here: https://synbio.upf.edu/crispr-a/RUNS/tmp_1118819937/.
Summary of the results:
Sample           | CRISPRnano                         | CRISPR-A
'rep_3_test_800' | 42.60% (-1 del); 12.72% (-10 del)  | 71% (-1 del); 16% (-10 del) – 36 (logo)
'rep_3_test_400' | 37.50% (-1 del); 15.63% (-10 del)  | 65% (-1 del); 28% (-10 del) – 38 (logo)
'rep_1_test_200' | 39.29% (-1 del); 8.33% (-17 del)   | 10 del; 17 del; 1 del
'rep_1_test_400' | 80.11% (-17 del)                   | del17; del20; del18; del16; del16
'rep_0_test_400' | 80.11% (-17 del)                   | del17; del20; del18; del16; del16
'rep_0_test_200' | 71.91% (-17 del)                   | del17; del18
As we can see from these examples, CRISPR-A reports all indels in general without classifying them as edits or noise. Since Nanopore data contains a high number of indels arising from sequencing errors, the percentages reported by CRISPR-A are not accurate. Even so, CRISPR-A reports more diverse outcomes, which are probably edits, than CRISPRnano.
Therefore, we have added the following text in results:
"Even though single-molecule sequencing (e.g. PacBio, Nanopore) can be analyzed by CRISPR-A, targeted sequencing-by-synthesis data is required for precise quantification."
Reviewer #1 (Significance (Required)):
As I mentioned above, I think this could be a useful software for those people who are screening genome-edited cells. Since CRISPR is widely used, I assume that the audience is broad.
There are many other software tools that perform similarly to CRISPR-A, but it seems that this software adds a few more things and seems to be more precise. It is hard to tell whether everything the authors claim is accurate, since that requires a lot of testing and time and the reviewing time is just two weeks. But 1) I have no reason to doubt the authors and 2) the software works.
Broad audience (people using CRISPR)
Genetics, Genome Engineering, software development (we develop a very similar software), genetic compensation, stem cell biology
Reviewer #2 (Evidence, reproducibility and clarity (Required)):
Summary:
CRISPR-Analytics, abbreviated as CRISPR-A, is a web application implementing a tool for analyzing editing experiments. The tool can analyze various experiment types - single cleavage experiments, base editing, prime editing, and HDR. The required data for the analysis consist of NGS raw data or simulated data in FASTQ format, the protospacer sequence, and the cut site. The amplicon sequence is also needed in cases where the amplified genome is absent from the genome reference list. The tool pipeline is implemented in NextFlow and has an interactive web application for visualizing the results of the analysis, including embedding the results into an IGV browser.
The authors developed a gene editing simulation mechanism that enables the user to assess an experiment design and to predict expected outcomes. Simulated data was generated by SimGE over primary T-cells. The parameters and distributions were also fitted for 3 cell lines to make it more generalized (Hek293, K562, and HCT116). The process simulated CRISPR-Cas9 activity and the resulting insertions, deletions, and substitutions. The simulation results are then compared to the experimental results. The authors report the Jensen-Shannon (JS) divergence between the results. The exact distributions that served as input to the JS are not well defined in the manuscript (see below).
To clarify the distributions used in the JS divergence calculation, we have changed the following piece of text in the "Simulations evaluation" section of the methods:
"Afterward, we tested the performance on the fifth fold, generating the simulated sequences with the same target and gRNA as the samples that belong to the fifth fold, in order to calculate the distance between these. The final validation, with the mean parameters of the different training iterations, was performed on a testing data set that was not used in the training. Validation was done with samples that had never taken part in the training process. The Jensen distance is used to compare the characterization of real samples and simulated samples, since this is the explored distance that best differentiates replicates among samples. In order to obtain the different distributions, the T cell data, including 1,521 unique cut sites, was split into different datasets based on the different classes: deletions, insertions and substitutions. For each of these classes, giving as input the datasets with only that class, we obtained the distribution of size and then of position of the indels. The same was done for the other three cell lines: K562, HEK293 and HCT116, which included 96 unique cut sites, with three replicates each. The whole datasets (with 1,521 and 96 unique cut sites) were split into five folds (four for training and one for testing) and validation, in order to train and validate the simulator. Using the parameters obtained during the training-test iterations (the average value of the 5 iterations), we generated simulated sequences with the same target and gRNA as the samples assigned to the test subset, to calculate the Jensen-Shannon (JS) divergence between the simulated and real samples of that subset. Finally, the same was performed for validation. The inputs for the distance calculations were the class distributions of the generated simulated subset and of its real equivalent (same target and gRNA)."
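To make the comparison concrete, the following minimal R sketch (with invented counts; this is not our actual SimGE evaluation code) computes the JS divergence between the deletion-size distributions of a real sample and its simulation:

```r
# Deletion-size distributions (size -> read count) of a real and a simulated sample;
# the counts are invented for illustration.
real_counts <- c("1" = 120, "2" = 40, "3" = 15, "10" = 60)
sim_counts  <- c("1" = 100, "2" = 55, "3" = 10, "10" = 70)

js_divergence <- function(p_counts, q_counts) {
  # Align both distributions on the union of observed sizes and normalize to probabilities.
  sizes <- union(names(p_counts), names(q_counts))
  p <- p_counts[sizes]; p[is.na(p)] <- 0; p <- p / sum(p)
  q <- q_counts[sizes]; q[is.na(q)] <- 0; q <- q / sum(q)
  m <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))  # Kullback-Leibler divergence
  (kl(p, m) + kl(q, m)) / 2                                    # JS divergence (0 = identical)
}

js_divergence(real_counts, sim_counts)
```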
The authors also report an investigation of different alignment approaches and how they may affect the resulting characterization of editing activity.
The authors examine three different approaches to increase what they call "edit quantification accuracy" (aka, in a different place, "precise allele counts determination" - what is this???): (1) spike-in controls, (2) UMIs, and (3) using a mock to denoise the results. See below for our comments about these approaches.
Moreover, the authors developed an empirical model to reduce noise in the detection of editing activity. This is done by using a mock (control), and by normalization and alignment of reads with indels, with the notion and observation that indels that are far from the cut site tend to be classified as noise.
The authors then perform a comparison between 6 different tools, in the context of determining and quantifying editing activities. One important comparison approach uses manually curated data. However - the description of how this dataset was created is far from being sufficiently clear. The comparison is also performed for HDR experiment type, which can be compared only to 2 other tools.
We have changed "alleles" to "editing outcomes" in the section title, "Three different approaches to increase precise editing outcomes counts determination", trying to be clearer.
There is already a section in the methods, "Manual curation of 30 edited samples", explaining how the manual curation was done.
We see the potential contribution aspects of the paper to be the following:
- NextFlow pipeline implementation is an important engineering contribution. Same is true for the interactive web application
- The option to simulate an experiment to assess it is a nice feature and can help experiment design
- Identification of amplicons when not provided as input
- CRISPR-A seeks substitutions along the entire amplicon sequence and is less dependent on the quantification window and on the putative cutsite
- Analysis of the difference, in edit activity, comparing different cell lines
- CRISPR-A supports the use of UMIs
- Interesting sequence pattern insights - like "...found certain patterns associated with low diversity outcomes: free thymine or adenine at the 3' nucleotide upstream of the cut site that leads to insertions of the same nucleotide, a free cytosine at the same place that leads to its loss, and strong micro-homology patterns that lead to long deletions." We further comment on the soundness of these contributions in our comments below and on their significance in our comments related to the general potential significance of the paper.
Major comments:
- Upon attempting to run an analysis from the web interface (https://synbio.upf.edu/crispr-a) using a FASTQ of Tx and mock (control), the human genome, and the gRNA sequence provided as input for the protospacer field, our run was not successful. In fact, the site crashed with no interpretable error message from CRISPR-A.
We have improved the error handling together with the explanations in the help page, where you will find a video. Hopefully, these improvements will avoid unexpected crashes.
- Moreover, there should be clearer context. There is no information regarding the type of experiments that can be analyzed with the tool. We figure it is multiplex PCR and NGS, but can the tool also be used for GUIDE-seq, Capture, CIRCLE-seq, etc.?
The experiments that can be analyzed are specified in the Results: "CRISPR-A analyzes a great variety of experiments with minimal input. Single cleavage experiments, base editing (BE), prime editing (PE), predicted off-target sites or homology directed repair (HDR) can be analyzed without the need of specifying the experimental approach." We have also specified this in the Nextflow pipeline documentation as well as in the web application help page.
- No off-target analysis, only on-target.
The accuracy of the tool allows checking whether edits at predicted off-target sites are produced, which constitutes an off-target analysis with some restrictions, since only variants at the predicted off-target sites are assessed. Translocations or other structural off-targets will not be detected by CRISPR-A, since the input data analyzed by this tool are demultiplexed amplicon or targeted sequencing samples.
- No translocations and long/complex deletions.
The source of the data used as input does not allow us to do this. There are other tools, like CRISPECTOR, available for this kind of analysis. We have added this to Supplementary Table 1.
- We view the use of a mock experiment as control as a must for any sound attempt to measure editing activity. This is even more so when off-target events need to be assessed (any rigorous application of GE, certainly any application aiming for clinical or crop engineering purposes). We therefore think that all investigation of other approaches should be put in this context.
We agree with the necessity of using negative controls to assess editing. For that reason, we have included the possibility of using mocks in the quantification. In addition, few other tools include this functionality.
- It's a nice feature to have simulated data; however, it is not a good approach to rely on it.
As can be seen in the manuscript, we highlight the support that simulations can give without pretending to substitute experimental data with simulated data alone. Simulated data has been useful in the development and benchmarking of CRISPR-A, but we are aware of the limitations of simulations. Here are some examples from the manuscript explaining how we have used simulated data or how it can be used:
“Analytical tools, and simulations are needed to help in the experimental design.”
“simulations to help in design or benchmarking”
“We developed CRISPR-A, a gene editing analyzer that can provide simulations to assess experimental design and outcomes prediction.”
“Gene editing simulations obtained with SimGE were used to develop the edits calling algorithm as well as for benchmarking CRISPR-A with other tools that have similar applications.”
Even though simulated data has been useful for the development and benchmarking of CRISPR-A, we have also used real data and human-validated data.
- In P7 the authors indicate the implementation of three approaches to improve quantification. They should be clear as to the fact that many other tools and experimental protocols also use these approaches; for example, ampliCan, CRISPResso2 and CRISPECTOR all take into account a mock experiment run in parallel to the treatment.
Even though on page 7 (Results) we do not mention the other tools that also use mocks for noise correction, we detail this information in Supplementary Table 1. CRISPResso2 was not included since it can run mocks in parallel but only to compare results qualitatively, i.e. there is no noise reduction in its pipeline. It has been added to the table.
- Figure 1:
○ The figure certainly provides what seems to be a positive indication of the simulation approach being close to the measured results. Much more detail is needed, however, to fully understand the results.
We have added more details.
○ Squema = scheme ??
We have changed the word "schema" to "diagram".
○ What was the clustering approach?
As stated in the caption of Figure 1, the clustering is hierarchical: "hierarchical clustering of real samples and their simulations from validation data set." We have also added that "The clustering distance used is the JS divergence between the two subsets."
○ What is the input to the JS calculation? What is the dimension of the distributions compared? These details need to be precisely provided.
The distributions have two dimensions: sizes and counts, or positions and counts.
As said before, to clarify the distributions used in the JS divergence calculation, we have changed the following piece of text in the "Simulations evaluation" section of the methods:
"Afterward, we tested the performance on the fifth fold, generating the simulated sequences with the same target and gRNA as the samples that belong to the fifth fold, in order to calculate the distance between these. The final validation, with the mean parameters of the different training iterations, was performed on a testing data set that was not used in the training. Validation was done with samples that had never taken part in the training process. The Jensen distance is used to compare the characterization of real samples and simulated samples, since this is the explored distance that best differentiates replicates among samples. In order to obtain the different distributions, the T cell data, including 1,521 unique cut sites, was split into different datasets based on the different classes: deletions, insertions and substitutions. For each of these classes, giving as input the datasets with only that class, we obtained the distribution of size and then of position of the indels. The same was done for the other three cell lines: K562, HEK293 and HCT116, which included 96 unique cut sites, with three replicates each. The whole datasets (with 1,521 and 96 unique cut sites) were split into five folds (four for training and one for testing) and validation, in order to train and validate the simulator. Using the parameters obtained during the training-test iterations (the average value of the 5 iterations), we generated simulated sequences with the same target and gRNA as the samples assigned to the test subset, to calculate the Jensen-Shannon (JS) divergence between the simulated and real samples of that subset. Finally, the same was performed for validation. The inputs for the distance calculations were the class distributions of the generated simulated subset and of its real equivalent (same target and gRNA)."
○ What clustering/aggregation approach did the authors use here (average dist, min dist, dist of centers?)
Hierarchical clustering.
○ 5 pairs were selected out of how many? Call that number K.
We have 100 samples in the validation set. Following the suggestion to indicate the total number of samples in the testing set, we have added this information to the figure caption.
○ What does the order of the samples in 1C mean? Is 98_real closer to 22_sim than to 98_sim? If so then state it. If not - what is the meaning of the order? Furthermore - how often, over K choose 2 pairs does this mis-matching occur for the CRISPR-A simulator??
Exactly, it is a hierarchical clustering where samples are sorted by JS divergence. This was already stated in the Results: "In addition, on top of comparing the distance between the experimental sample and the simulated, we have included two experimental samples, SRR7737722 and SRR7737698, which are replicates. These two and their simulated samples show a low distance between them and a higher distance with other samples." As well as in the Figure 1 caption: "For instance, SRR7737722 and SRR7737698, which cluster together, are the real sample and its simulated sample for two replicates." Since these samples are replicates, their simulations come from the same input, and it is expected to find a low distance between these two real samples as well as between both of them and their simulations. We have stated this in the discussion.
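For illustration, a hierarchical clustering like the one shown in Fig. 1C can be reproduced in R from a matrix of pairwise JS divergences; the values below are invented:

```r
# Invented pairwise JS divergences between two real samples and their simulations.
samples <- c("98_real", "98_sim", "22_real", "22_sim")
d <- matrix(c(0.00, 0.05, 0.30, 0.32,
              0.05, 0.00, 0.28, 0.31,
              0.30, 0.28, 0.00, 0.04,
              0.32, 0.31, 0.04, 0.00),
            nrow = 4, dimnames = list(samples, samples))

# as.dist() lets hclust() reuse the precomputed divergences instead of Euclidean distances.
hc <- hclust(as.dist(d), method = "complete")
plot(hc)  # each real sample ends up next to its simulation because their divergence is lowest
```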
- "From the characterized data we obtained the probability distribution of each class" (page 3) - How is this done? How many guides? How many replicates? What is a class? Where do you elaborate on it? How do you obtain the distributions? More details of the methods need to be provided.
Added in methods.
- The 96 samples used for development here - where are they taken from? This should be indicated the first time these samples are mentioned, namely at the bottom of P6.
Added: "The 96 samples, from these cell lines, are obtained from a public dataset, BioProject PRJNA326019."
- CRISPECTOR is not mentioned in the comparison in the section "CRISPR-A effectively calls indels in simulated and edited samples" (Table S2). Is there a specific reason for having left it out?
CRISPECTOR, as well as ampliCan, is not in Table S2, since this table shows detailed data from Figure 2. CRISPECTOR is compared with CRISPR-A in Figure 5, where the different approaches to enhance precision, like using a negative control, are explored.
- In the section "Improved discovery and characterization of template-based alleles or objective modifications", part of the analysis was made over simulated data and then over real data. The authors state "it is difficult to explain the origin of these differences...". Thus, it needs to be investigated in more detail ... :) (P5) Moreover, the performance over real data is, at the end of the day, the more interesting one for comparison purposes.
We have added this sample to the human-validated dataset to better understand what was happening in this case, and the results and pertinent discussion have been added to the manuscript: "CRISPResso2 detects 2% more reads classified as WT. This 2% corresponds to the percentage classified as indels by CRISPR-A. In total, the percentage difference between CRISPResso2 and CRISPR-A for the template-based class is 0.6%, higher in CRISPR-A. The CRISPR-A percentage is closer to the ground truth data than that of CRISPResso2."
- We found no explanation of "spike-in"/"spike experimental data" across the entire article. There is some general language about lengths but the scheme is still totally unclear.
We have indicated in the methods section when we are talking about the spike-in controls.
- Description of the 96 gRNAs? Is this data from REF26? If so, where do you state this? If so, how do the methods described herein avoid the unique characteristics of the data of REF26?
We have added the reference: "The 96 samples, from these cell lines, are obtained from a public dataset, BioProject PRJNA326019." In addition, there are other sources of data, simulations, and now even human-validated data.
- "distance between the percentage of microhomology mediated end-joining deletions of samples with the same target was calculated and the mean of all these distances was used to reduce the information of the 96 different targets to a single one." (P6) What is the exact calculation used? Which distance? How was the clustering performed? What is the connection to gene expression?
The distance used was the Euclidean distance, and the clustering was performed using hierarchical clustering. We have added this information to the manuscript. Regarding the connection to gene expression, we are exploring the correlation of two phenotypes: the gene expression of the proteins differentially related to the NHEJ and MMEJ pathways, and the gene editing landscape (indel patterns that are related to MMEJ and those that are more prone to be generated by NHEJ). We have tried to improve this explanation in the manuscript.
- "we have fitted a linear model to transform the indels count depending on its difference in relation to the reference amplicon" (P7) - needs more explanation. Is this part of the pipeline?
We have explained better how we fitted the linear model in the methods: "A linear regression model was fitted to obtain the parameters of Equation 1 using spike-in controls experimental data (original count, observed count, and size of the change in the synthetic molecules). We have used the lm function from R. Parameter m in Equation 1 is equivalent to the obtained coefficient estimate of x, which was 0.156, and n is the intercept (n = 10)."
The model is optionally used as part of the pipeline, as explained at the end of the section "CRISPR-A gene editing analysis pipeline", to correct amplification biases due to differences in amplicon size. Thus, the part that belongs to the pipeline is the use of this model to transform the observed counts into the predicted original counts. This is done with Equation 1 and can be found in the pipeline (VC_parser-cigar.R).
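To illustrate the general approach in R (the data, the column names, and the assumed model form, a linear effect of indel size on the observed counts, are ours for illustration and do not necessarily reproduce Equation 1 or the code in VC_parser-cigar.R):

```r
# Invented spike-in data: synthetic molecules of known original abundance, observed counts
# after amplification, and the size change of each molecule relative to the reference amplicon.
spikein <- data.frame(
  size           = c(-20, -10, -5, 0, 5, 10, 20),
  original_count = rep(1000, 7),
  observed_count = c(1210, 1105, 1050, 1000, 955, 905, 820)
)
spikein$bias <- spikein$observed_count - spikein$original_count

# Fit the size-dependent amplification bias; the slope and intercept play roles analogous to
# the "m" and "n" parameters mentioned above (the values here are arbitrary).
fit <- lm(bias ~ size, data = spikein)
coef(fit)

# Apply the correction to new observed indel counts.
new_indels <- data.frame(size = c(-17, -1, 3), observed_count = c(480, 2100, 150))
new_indels$corrected_count <- new_indels$observed_count - predict(fit, newdata = new_indels)
new_indels
```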
- What is the "...manually curated data set"? (page 8)
This is explained in "Manual curation of 30 edited samples" in the methods.
- Section "CRISPR-A empiric model removes more noise than other approaches" - with what data were the comparisons performed? Moreover, how were the comparison criteria selected (efficiency and sensitivity)? The literature has already used several approaches to compare data analysis tools for editing experiments. See for example ampliCan, CRISPResso (1 and 2) and CRISPECTOR. Maybe the authors should follow similar lines.
The data used in this comparison come from reference 26: "26. van Overbeek, M. et al. DNA Repair Profiling Reveals Nonrandom Outcomes at Cas9-Mediated Breaks. Mol. Cell 63, 633–646 (2016)." We have added it to the manuscript.
The values of efficiency and sensitivity were not used directly for the comparison. We first wanted to evaluate our own algorithm. For that, we obtained the values of efficiency and sensitivity for the previously mentioned dataset. These values were chosen to get an idea of, firstly, how much noise the algorithm is able to detect, and secondly, how much of it can be reduced after the Tx vs. mock process. That established a framework of comparison in which we can then directly compare the reported editing percentages of the different tools.
Regarding the approaches used to compare data analysis tools for editing experiments, we explain below why we have not followed similar lines or how we have now incorporated them:
In the case of ampliCan, the comparison they perform is with a synthetic dataset with introduced errors:
"synthetic benchmarking previously used to assess these tools (Lindsay et al. 2016), in which experiments were contaminated with simulated off-target reads that resemble the real on-target reads but have a mismatch rate of 30% per base pair".
In CRISPResso2, they benchmarked the efficiency against an in-house dataset, but this dataset is not published. Finally, for the benchmarking of CRISPECTOR, a manually curated dataset is used as a standard: "Assessment of such classification requires the use of a gold standard dataset of validated editing rates. In this analysis, we define the validated percent indels as the value determined through a detailed human investigation of the individual raw alignment results". In this sense, we have added a human-validated dataset to do something similar and complement the analysis that we had already done.
In the end, we consider that simulated or synthetic datasets, such as those used by ampliCan or CRISPResso2, do not capture the complete landscape of confounding events that can be detrimental to the analysis results. Similar limitations apply to the use of a gold-standard dataset of validated editing rates, since the number of reads or samples that can be validated by humans is small because it is time consuming. In addition, humans can also make errors and have biases. Even so, we have found it very valuable to take this suggestion into consideration and add a human-validated dataset to complete our exploration.
- In the section "CRISPR-A empiric model removes more noise than other approaches" the authors state, incorrectly, that CRISPECTOR only reports the percentage of editing activity per site (there is much more information reported in the HTML report, including the type of edit event detected - deletions of various lengths, insertions, substitutions, etc.). (P8)
We thank the reviewer for the observation, as indeed the statement is incorrect. What we wanted to express is that with CRISPECTOR we cannot trace each of the called indels individually, as no spreadsheet or file with this content is given in the output. Therefore, we cannot investigate which events have been corrected. To be precise in our statement, we changed this sentence to the following:
"CRISPECTOR, although providing extensive information and statistics about the indels, does not make it possible to track the reads along its pipeline; thus we cannot know which have been corrected and which have not."
- Section "CRISPR-A noise subtraction pipeline" describes a pretty naive method for noise subtraction (P12). It should be rigorously compared, for Tx vs. Mock experiments, to CRISPECTOR and to CRISPResso2.
In the section "CRISPR-A empiric model removes more noise than other approaches", we perform an exhaustive comparison with a dataset that contains 288 mock files vs. 864 Tx files. This can be better appreciated in the now included Figure Sup. 13A. CRISPResso2 was intentionally left out since its pipeline does not use a model to reduce noise but other approaches, like reducing the quantification window.
- "recalculated using a size bias correction model based on spike-in controls empiric data.." (P14). Where is the formula?
The formula comes from Equation 1. It is now correctly referenced.
- Section "Noise subtraction comparison with ampliCan and CRISPECTOR" - a fake mock was generated for comparison. We consider the avoidance of a Mock control in experiments designed to measure editing activity to not be best practice. It is OK to support this approach in CRISPR-A. However, the comparison to tools that predominantly work using a Mock control (including ampliCan and CRISPECTOR) should be done with an actual Mock, not with a fake Mock .... (P15)
We understand the reviewer's concerns on this point, as the use of a "fake" mock may not be the best practice for general comparisons. Nevertheless, what we wanted to compare here is the difference in the editing percentages with and without using a mock. Since CRISPECTOR requires a mock to run on-target data, the only way to replicate the "no mock" condition was to use a synthetic file with the same characteristics as the treated files in terms of depth, but with no editing/noise events, to avoid any correction outside this framework. The other run was made with the 288 real mocks. This was an ad hoc solution for CRISPECTOR; with ampliCan we used only real mocks, since it allows on-target runs without a mock.
We changed the word "fake" to "synthetic" in the "Noise subtraction comparison with ampliCan and CRISPECTOR" section:
“As for CRISPECTOR, since it requires a mock file to perform on-target analysis, synthetic mock files were generated”.
Minor comments:
- "Also, most of these tools lack important functionalities like reference identification, clustering, or noise subtraction" - bold part incorrect for CRISPECTOR, although it is not aiming only for CRISPECTOR In supplementary table 1, it is already elucidated which are the functionalities that each tool has. We have also added more context to that statement to highlight the differences between different tools:
“Even not all of them have the same missing functionalities, as can be seen in the Supplementary table 1, CRISPR-A is the only tool that can identifies the amplicon reference from in a reference genome, correct errors through UMI clustering and sequence consensus, correct quantification errors due to differences in amplicon size, and includes interactive plots and a genome browser representation of the alignment.”
- "Same parameters and probability distributions were fitted for three other cell lines: Hek293, K562, and HCT116²⁶, to make SimGE more generalizable and increase its applicability" (page 3) - how was it fitted?
It was fitted in the same way as for the T-cell samples, as specified in the methods. We have added more detail to the methods explaining how SimGE is built.
- What is the "nature of modification"? (P5)
We have changed "nature" to "type" for better understanding.
- In the section "CRISPR-A effectively calls indels in simulated and edited samples" (P5), towards the end, the authors write that the CRISPR-A algorithm did not give good results for a few examples. They then state that this was corrected and then yielded good results. There is no explanation of what correction was done, whether it was implemented in the code, and how to avoid/detect it in further cases.
The problem was that the reference sequence used was too short. There is no modification of the CRISPR-A code; we have simply used the whole amplicon reference sequence obtained with the amplicon reference identification functionality of CRISPR-A. We have tried to explain it better in the manuscript: "Once the reference sequence used is the one corresponding to the whole reference amplicon, obtained with the CRISPR-A amplicon sequence discovery function, CRISPR-A shows a perfect editing profile."
- Cell culture, transfection, and electroporation - explanation only for HEK293, what about the others? (P15)
We had already explained it for HEK293 and for C2C12, which are the experiments done by us. In the case of the analysis of the three cell lines and 96 targets, we reference the source of the data, as this data was not produced in our lab.
- Typos and unclear wording:
○ "obtention" (P8) → changed to "obtaining"
○ "mico" >> micro (P 7,10) → changed
○ "Squema" >> scheme (Fig.1) → changed
○ "decombuled" (P10) → changed by separated
○ "empiric" >> empirical (P8 and other places) → changed
○ "Delins" (P14) → this is not a typo, it is used to indicate that a deletion and insertion has take place (http://varnomen.hgvs.org/recommendations/DNA/variant/delins/)
○ "performancer" (P9) → Change to performance
○ Change word across the whole article - "edition" to "editing" → changed. In the case of "edition windows", the term has been changed to "quantification windows".
○ "...has enough precision to find" (P6) not related to "results" section → We have moved to discussion.
- Comments on figures:
○ Fig. 2C:
■ No CRISPECTOR in the analysis
It is not included because for on-target analysis this tool requires a mock control sample. For this reason, it is compared in Figure 5D, where samples using negative controls are compared, and in Figure 5E where all tools and their different analysis options are compared.
■ It is simulated data only
Yes, it is. Comparison with real data is done in Figures 2D and 2E. We have now also added ground truth data to our comparisons, obtained from human validation of the classification of more than 3,000 different reads.
■ It is not violin plot as mentioned in the description
It is a violin plot, but in general there is not much dispersion of the data points making the density curves flat.
○ Fig. 3A - Is it significant?
Yes, it is. We have added this information to the caption of the figure.
○ Fig. 4:
■ A
- Each row/column is a vector of 96 guides?
No, as stated in the caption of the figure, it is the "mean between the distances calculated for each of the 96 different targets."
- How is the replicate number decided? Is it a different experiment by date? What separates the experiments? Rep numbers?
All this information can be found in the referenced paper from which this dataset comes, as already cited.
■ B - Differential expression:
We have realized that the caption was not correct: the explanation for Fig. 4B was missing and the explanations for all the following panels were shifted to the previous letter.
- How? Did you measure RNA?
It is already stated in the methods that RNA-seq data was obtained from the SRA database and the analysis was done using the nf-core/rnaseq pipeline: "RNAseq differential expression analysis of samples from BioProject PRJNA208620 and PRJNA304717 was performed using nf-core/rnaseq pipeline⁵²."
- Is the observed data in the figure sufficiently strong in terms of P-value?
Yes, as is highlighted in the plot with ** and ***. We have also added the p-value to the caption of the figure.
- Where is the third cell-line?
As mentioned in the text, we have just chosen the cell lines that show the largest differences in the percentage of MMEJ: "HCT116 than in K562, which are the cell lines with the major and minor ratios of MMEJ compared with NHEJ, respectively".
○ Fig.13 - There is no A and B as mentioned in the text
We thank the reviewer for the observation as we mistakenly uploaded the wrong figure. We corrected it.
Reviewer #2 (Significance (Required)):
We repeat the aspects of contribution, as listed in the first part of the review, and comment about significance:
- NextFlow pipeline implementation is an important engineering contribution. Same is true for the interactive web application
Significant engineering contribution. Nonetheless, we were not able to run the analysis. So - needs to be checked.
Hopefully, now that the documentation has been properly added to the repository, it will be easier to run the analysis.
- The option to simulate an experiment to assess it is a nice feature and can help experiment design
An important methodology contribution
- Identification of amplicons when not provided as input
Not important in the context of multiplex PCR and NGS measurement assays, as amplicons will be known. Not clear what other contexts the authors were aiming at.
It is useful to save time, since there is no need to look for the sequence of each amplicon and add it as input. Also, it can help to detect unspecific amplification, since all amplicons of the same genome can be retrieved from the amplicon discovery process. In addition, we have already found one example where this avoids getting incorrect results: "Once the reference sequence used is the one corresponding to the whole reference amplicon, obtained with the CRISPR-A amplicon sequence discovery function, CRISPR-A shows a perfect editing profile." We have added this to the discussion of the manuscript.
- CRISPR-A seeks substitutions along the entire amplicon sequence and is less dependent on the quantification window and on the putative cutsite
Importance/significance needs to be demonstrated
Figure 3 shows the results of template-based and substitution detection. CRISPR-A is a versatile and agnostic tool for gene editing analysis. This means that it is prepared for the analysis of gene editing produced by future tools, since the cut site or other elements of the experimental design are not required. In addition, it has been shown that when a mock is used, its performance is comparable to filtering by quantification windows, while avoiding the loss of edits when the cut site is shifted.
- Analysis of the difference, in edit activity, comparing different cell lines
Significant contribution. However - the methods need to be much better explained and the results better described in order for this to be useful to the community.
We have made an effort to be clearer in the description of the results.
- CRISPR-A supports the use of UMIs
Mildly significant technical contribution. However - only addresses on-target. Also addressing off-target would have been significant.
The use of UMIs is something that has never been done before in this context. Without them, sequencing biases are not taken into account and editing percentages are reported as observed. Being able to differentiate between the molecules present at the beginning of the amplification allows higher precision, avoiding under- or overestimation of each of the species in a bulk of cells.
In the case of off-targets, this can certainly be done by sequencing the predicted off-target sites. In addition, there are other methods, like GUIDE-seq, that can be used to discover off-targets, but this kind of data is out of the scope of CRISPR-A. Even so, we are aware of the importance of being able to analyze off-targets in the context of a broad analysis platform, and we will take this into consideration when participating in the building of the crisprseq pipeline from nf-core.
- Interesting sequence pattern insights - like "...found certain patterns associated with low diversity outcomes: free thymine or adenine at the 3' nucleotide upstream of the cut site that leads to insertions of the same nucleotide, a free cytosine at the same place that leads to its loss, and strong micro-homology patterns that lead to long deletions "
As stated - interesting.