This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag031), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer 1:
The authors assembled the genomes of three Cobitis species native to Eurasia in an attempt to investigate the effects of structural variants on hybrid meiotic failure. This is certainly an interesting topic given the advances in our abilities to study hybridization that have been enabled by modern genomic sequencing methods, and the evolutionary consequences of asexually reproducing species that result from rare instances of these hybrid events.
Major comments:
The introduction of the manuscript is well-written and focused on the topic at hand. Language was mostly clear throughout the manuscript. However, the paper overall is very lengthy and would benefit from extensive revision. Personally, I think the assembly and annotation of the three genomes is worthy of being a paper (genome report) on its own. Extraction of this material into a separate manuscript would allow the authors to hone the remainder of the paper into a much more concise and focused manuscript.
Some aspects of the methods section related to genome assembly and annotation could be clarified and/or bolstered. Presentation of methods is mostly clear, but the description of genome annotation methods is a bit tough to follow. This procedure included many complicated steps and may benefit from a flow chart, even if included only as a supplemental figure.
Several important quality control steps pertaining to genome assembly and DNA/RNA sequence processing were not mentioned. Authors do not report methods used for quality filtering or trimming. They do not report any process for removal of sequencing adapters. Additionally, they do not report screening of the genome assemblies for contamination from other species. These are critical steps in producing high-quality genome assemblies that need to be addressed.
Presentation of statistics describing genome assembly quality, contiguity, and completeness could be improved. Authors might want to take some inspiration from statistics required for reporting in genome reports published by other journals, such as G3 or Genome Biology and Evolution. Sequencing depth is not reported in any context for the initial assemblies. Only log-transformed values are available in a single figure. Throughout the manuscript, authors conflate sequencing coverage (the proportion of a genome or genomic region that has been sequenced) with sequencing depth (the number of times a base or genomic region has been sequenced).
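To illustrate the distinction between the two quantities, a minimal sketch in Python follows. The function names and the per-base depth values are hypothetical and are not taken from the manuscript; in practice the per-base depths would come from a tool that reports depth at each position of the assembly.

```python
def breadth_of_coverage(depths, min_depth=1):
    """Coverage: fraction of positions sequenced at least `min_depth` times."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def mean_depth(depths):
    """Depth: average number of times each position was sequenced."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


# Hypothetical per-base depths for a 10 bp region (illustration only).
per_base_depth = [0, 0, 12, 15, 9, 30, 28, 0, 14, 12]

coverage = breadth_of_coverage(per_base_depth)  # 7 of 10 positions covered
depth = mean_depth(per_base_depth)              # 120 reads-bases over 10 positions
print(f"coverage = {coverage:.0%}, mean depth = {depth:.1f}x")
```

In this toy region the coverage is 70% while the mean depth is 12x, which shows that the two numbers answer different questions and should be reported separately and untransformed.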
For the sex-linked primers designed by the authors, I would recommend developing an internal positive control that would be expected to amplify in both sexes and be easily distinguishable from the sex-linked locus by size or fluorescent label. This would allow users to distinguish between failed PCRs and identification of the homogametic sex. This is especially important because the fish selected for marker development were collected from a relatively small portion of the species' distributions (Figure 1), so there could be population-specific differences that affect the reliability of these markers for identifying sex. This is a problem I regularly encounter in my own work on wide-ranging species.
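The logic of the recommended control can be sketched as a small decision function. This is only an illustration of the reasoning, assuming a presence/absence marker scored on a gel; the function name and the result labels are hypothetical.

```python
def interpret_genotyping_pcr(control_band: bool, sex_linked_band: bool) -> str:
    """Interpret one reaction that co-amplifies a control and a sex-linked locus.

    control_band:    the internal positive control amplified (expected in both sexes)
    sex_linked_band: the sex-linked locus amplified
    """
    if not control_band:
        # Without the control band, absence of the sex-linked band says nothing.
        return "inconclusive: PCR failed, repeat the reaction"
    if sex_linked_band:
        return "sex-linked locus present"
    return "sex-linked locus absent"


for control, marker in [(True, True), (True, False), (False, False)]:
    print(control, marker, "->", interpret_genotyping_pcr(control, marker))
```

Without the control, the second and third cases would look identical on a gel, which is the ambiguity described above.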
I was also surprised that the authors did not conduct a GWAS analysis. That seems to be a fairly typical analysis included in studies of this type to elucidate sex-linked SNPs. It would add to an already extensive manuscript; however, this is a further argument for splitting the manuscript in two, which would provide more space to include the analysis in a more focused manuscript.
The results section contains many statements that would be more appropriate in the Methods section, or could be deleted entirely because they are redundant with statements already present in the Methods section. Additionally, there are some sentences that are more appropriate for inclusion in the Discussion section because they are interpretive. I have included examples under the 'Minor comments' section of this review. Some of the material presented as results in the Supplementary tables is presented in a confusing manner, and appears to contain errors (see examples in 'Minor comments' section below).
The first several paragraphs of the Discussion section either repeat material already covered in the Results section, or go on tangents that are not directly related to the main purpose of the paper. However, some of it could be more appropriate to include in a genome report if the authors split the manuscript in two.
Given the above issues, I find that the paper needs extensive editing and possibly more analytical work (if some of the methodological deficiencies were overlooked in the analysis phase as well as the writing phase of this project). It is unlikely this work could be accomplished in the normal window for a revision. Therefore, I regrettably suggest rejection of the manuscript.
Finally, I have no meaningful experience with FISH probes or chromosomal painting so unfortunately, I can't provide much comment on those portions of the paper.
Minor comments:
Line 291: please provide specific version number for Hisat2
Line 319: version numbers for D-Genies and SyRI missing
Line 331: version number for NGenomeSyn missing
Lines 439 - 440: Authors provide N50 values, but the paper would benefit from providing some additional metrics, such as N90 and L90, to help readers gauge the contiguity of these genomes.
Lines 442 - 443: I'm having a hard time understanding how the authors are calling these 'chromosome-level' assemblies when nearly a third (>30%) of the genome of two species (C. tanaitica and C. elongatoides) could not be assembled into chromosomal scaffolds.
Lines 457 - 458: Either the term 'topologically associated domains' is missing, or the authors need to remove the parentheses from around TADs if it was defined earlier in the manuscript.
Line 470: change 'less' to 'fewer'
Lines 483 - 486: The statements that observed patterns of repeat families 'suggest' something are interpretive and should be moved to the discussion.
Lines 499 - 500: This sentence repeats content of the methods section. I suggest deleting it.
Lines 540 - 564: If I am understanding correctly, the discussion of 'coverage' here would be more accurately described as 'depth' since the authors seem to be talking about average sequencing depth in different areas of the genome. Furthermore, authors never provide untransformed measures of sequencing depth in any context (the initial genome assemblies, pool-seq data, re-sequenced individuals, etc.). Therefore, it is difficult to determine if the differences being discussed here are derived from data with enough statistical power to measure differences in sequencing depth between male and female fish.
Lines 614 - 619: This could be explored with GWAS
Lines 635 - 641: Much of this paragraph is a description of methods and belongs in the Methods section.
Lines 664 - 667: Much of this is interpretive - more appropriate for the discussion.
Lines 700 - 711: This paragraph has little or no relevance to the main topic of this paper (hybrid meiotic failure).
Line 745: remove "loci's"
Lines 813 - 815: PMER was already defined earlier in the paper.
Line 854: I suggest removal of "the first of their kind in an asexually reproducing vertebrate," because such statements rarely age well, and the concept behind the paper is interesting enough to stand on its own without pointing out the novelty of it being the 'first' time it was detected.
References section: Capitalization of article titles varies from one reference to the next. Scientific names are sometimes italicized; other times they are not.
Table 2: 'L50' and 'Number of Chromosomes' are always going to be integers. Why are they reported with two decimal places?
Supplementary Figure S2: 'Cobitis' should be italicized.
Supplementary Table S7: This table presents pre- and post-HiC values in a confusing manner that is nonsensical and probably erroneous. For example, the N50 values seem problematic. How do you have a 154 Kbp pre-HiC N50 contig value for C. elongatoides, but a 154 Mbp post-HiC N50 contig value for the same species? This is longer than the longest reported chromosome for any species (C. taenia) in Supplementary Table S8 (99 Mbp).
Supplementary Table S10: I don't know what the percentages in line 33 refer to.
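For the additional contiguity metrics requested in the comment on the N50 values, a minimal sketch of how Nx and Lx are computed from a list of sequence lengths is given below. The function name and the example lengths are hypothetical and are not drawn from the manuscript or its supplementary tables.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    Nx is the length of the shortest sequence in the smallest set of longest
    sequences that together span at least `percent` % of the total length.
    Lx is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count


# Hypothetical scaffold lengths in Mbp (illustration only).
example_lengths = [40, 30, 15, 10, 5]

n50, l50 = nx_lx(example_lengths, 50)
n90, l90 = nx_lx(example_lengths, 90)
print(f"N50 = {n50}, L50 = {l50}, N90 = {n90}, L90 = {l90}")
```

Two properties follow directly from this definition: Lx is always an integer, and Nx can never exceed the length of the longest sequence in the assembly. Both are useful sanity checks for the values reported in Table 2 and Supplementary Table S7.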