Reviewer #3 (Public Review):
The authors reanalyze an existing single-cell Sperm-seq dataset to search for signals of transmission distortion (TD). They develop an improved genotype imputation method and use it to phase the donors and to characterize the landscape of ancestry across each sperm genome. Using these data, the authors conclude that no region in any donor's genome displays a significant excess of TD. The main biological claim of the paper is that human males adhere strictly to Mendelian transmission ratios.
The computational approach for accurately phasing and reconstructing haplotypes from individual, lightly sequenced gametes is a potentially useful advance that I expect will be valuable to geneticists analyzing similar datasets. The software documentation and usability are of high quality. However, I have concerns about whether the benchmark comparisons chosen for this approach are appropriate, and the algorithm does not appear particularly novel.
I have no doubt about the authors' basic conclusion that there are no strong TD loci in the male donors examined. However, I find their statement about "strict adherence to Mendelian ratios" and their many references to strong statistical power to be oversold. The power of this study remains quite limited relative to the strength of TD that we would expect to find in human populations.
Major Concerns:
There are really two distinct papers here: one on improved imputation and crossover analysis from Sperm-seq data, and one on TD. The bulk of the methodological development reworks the approach to genotype imputation and haplotype phasing in Sperm-seq, yet the major conclusions focus on a scan for TD. I am left wondering whether analyzing these data with the original method from the Bell et al. paper would have produced different conclusions on either front. If not, is there a systematic bias that would lead one to an excess of false TD detections? Phasing slightly more markers is not a particularly compelling link between the two halves: because of linkage, even fairly sparse markers would certainly suffice for a TD scan within a single individual, provided they are correctly phased. If a stronger link cannot be shown, I wonder whether this work would be better split into two manuscripts: a more technical paper describing the differences in recombination maps associated with rhapsodi, and a brief report stating that strong TD is probably uncommon in human males.
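To make the linkage argument concrete, here is a minimal sketch that computes the expected transmission ratio at a marker linked to a hypothetical driver locus. It assumes no crossover interference (the Haldane map function) and a correctly phased marker; the function names are illustrative. Under these assumptions, a driver transmitted with probability p leaves a skew of (p - 0.5) * exp(-2d) at a marker d Morgans away, so markers spaced a few centimorgans apart retain nearly all of the signal.

```python
from math import exp

def haldane_r(d_morgans: float) -> float:
    """Recombination fraction for map distance d (Morgans), assuming no interference."""
    return 0.5 * (1 - exp(-2 * d_morgans))

def marker_transmission(p_driver: float, d_morgans: float) -> float:
    """Expected transmission of the marker allele in phase with a (hypothetical) driver.

    The marker allele is transmitted when the driver allele is transmitted without
    an intervening crossover, or when the other allele is transmitted with one.
    """
    r = haldane_r(d_morgans)
    return p_driver * (1 - r) + (1 - p_driver) * r

# A 60:40 driver observed at markers 0, 1, 5 and 10 cM away.
for d_cm in (0, 1, 5, 10):
    print(d_cm, round(marker_transmission(0.6, d_cm / 100), 4))
```

Under these assumptions, a 60:40 driver still yields roughly 59.8:40.2 transmission at a marker 1 cM away, so any dense, correctly phased marker set is adequate for a within-donor scan.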
It is not surprising that rhapsodi outperforms Hapi, since Hapi was designed for very different numbers of gametes and sequencing depths. I appreciate the authors' point that Hapi outperformed other methods in comparisons run by the Hapi authors. However, those comparisons used very few gametes (around 10, I believe), so they do not address how the methods perform on datasets like those analyzed in this paper. The authors should include an analysis comparing rhapsodi against HapCUT2, PHMM, and other methods suited to the full scale and sequencing depth of these data. Additionally, the original Bell et al. paper applied some form of phasing + HMM approach to exactly these data. Why wasn't that approach considered as a point of comparison?
Regarding the imputation method, the authors neither compare their results to known recombination maps nor compare the maps derived from different donors. An improved method that does not motivate novel biological conclusions is not compelling in itself. I suggest the authors expand the analysis to address these related questions. For example, are there donors whose recombination maps differ in specific regions? Are such differences associated with known major chromosomal abnormalities? Are the maps consistent with estimates from LD, pedigrees, and Bell et al.?
Most of the validations presented are based on simulated data. This is fine and has some advantages, but real data impose challenges that these analyses do not address. My understanding is that the Bell et al. (2020) paper includes a donor with a phased diploid genome. A comparison of rhapsodi's phasing accuracy against that genome should be included.
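The switch error rate is a standard metric for this kind of comparison. Here is a minimal sketch, assuming both phasings have been reduced to the same ordered list of heterozygous sites; the encoding and function name are illustrative and do not reflect rhapsodi's actual output format.

```python
def switch_error_rate(truth, inferred):
    """Switch error rate between a reference phasing and an inferred phasing.

    Each list gives, at successive heterozygous sites, which allele (0 = ref,
    1 = alt) lies on haplotype 1; None marks a site missing from that phasing.
    """
    # True where the inferred haplotype-1 allele matches reference haplotype 1.
    agree = [t == i for t, i in zip(truth, inferred) if t is not None and i is not None]
    if len(agree) < 2:
        raise ValueError("need at least two sites phased in both callsets")
    # A switch error occurs wherever agreement flips between adjacent sites,
    # so a global swap of haplotype labels costs nothing.
    switches = sum(prev != cur for prev, cur in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)

truth    = [0, 0, 1, 1, 0, 1]
inferred = [0, 0, 0, 0, 1, 0]  # phase flips once, after the second site
print(switch_error_rate(truth, inferred))
```

Reporting this rate, along with the fraction of sites phased, for the donor with an independently phased genome would address how rhapsodi performs on real data.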
The main biological conclusion of a "strict adherence to Mendelian expectations across sperm genomes" is an overstatement. The statistical power of this study remains limited relative to the strength of TD that would be expected within human populations. One reason is the multiple-testing correction. Another is that 1,000 to 3,000 draws from a binomial distribution with expected p = 0.5 are simply not enough to overcome binomial sampling variance. Given this concern and the paper's central conclusion, the authors' discussion of power is inadequate. The main text should state explicitly how strongly the transmission ratio must be skewed for TD to be detected with good power in each donor. Given previous pedigree studies, it is not surprising that no significant TD exceeding the ~10% effect size needed for detection was found. Recent, much more powerful analyses in mice, Drosophila, and plants indicate that strong TD is probably uncommon, and that even weak effects, though detectable, are uncommon.
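The requested power statement can be approximated with a few lines of code. This is a minimal sketch, not the authors' test: it assumes a two-sided binomial test of p = 0.5 at a single locus, the normal approximation, and a Bonferroni correction, and the numbers of tests shown are placeholders, since the effective number of independent tests depends on how windows are defined and on linkage.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_skew(n_gametes: int, n_tests: int,
                        alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate smallest |p - 0.5| detectable at one locus in one donor.

    Two-sided test of H0: p = 0.5 at a Bonferroni-corrected threshold, using
    the normal approximation to the binomial. The SD under the alternative is
    taken as 0.5/sqrt(n), its maximum, so the result is slightly conservative.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / (2 * n_tests))  # per-test critical value
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * 0.5 / sqrt(n_gametes)

# Placeholder numbers of tests; substitute the study's effective number of tests.
for n in (1000, 3000):
    for m in (1, 1000):
        skew = min_detectable_skew(n, m)
        print(f"n={n}, tests={m}: transmission must reach ~{0.5 + skew:.3f}")
```

The detectable skew shrinks only with the square root of the number of gametes, while the Bonferroni correction inflates the critical value; reporting this quantity per donor would make the limits of the null result explicit.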
This manuscript would benefit from a much clearer examination of statistical power and a detailed comparison of the power of this approach with that of pedigree-based analyses and bulk gamete sequencing. Although the authors are correct that all scans for TD in human genomes have been pedigree- or single-cell-based, more powerful alternatives exist, based on sequencing pools of individuals or gametes (e.g., Wei et al. 2017, Corbett-Detig et al. 2019). Each of those studies identified signatures of segregation distortion below the thresholds required for significance in this study. These and related works should be acknowledged in both the Introduction and the Discussion. Although I appreciate that phasing the genome in a single experiment may be appealing, phasing diploid genomes via Hi-C or Omni-C is straightforward, and the gain in statistical power suggests that approaches using pools of gametes are preferable for well-powered scans for TD.
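For the requested power comparison, here is a back-of-envelope sketch of the effective sample size behind a pooled allele-frequency estimate. It uses a generic two-stage binomial approximation (gametes sampled into the pool, then reads sampled from the pool), not the specific models of Wei et al. (2017) or Corbett-Detig et al. (2019). It idealizes reads at fully linked SNPs as independent draws, and it ignores extra variance from unequal gamete representation or mapping bias. The pool size, coverage, and SNP count are hypothetical.

```python
def pool_effective_n(n_gametes: float, coverage_per_site: float,
                     n_linked_sites: int = 1) -> float:
    """Effective number of binomial draws behind a pooled allele-frequency estimate.

    Two-stage sampling gives Var(p_hat) = p(1-p) * [1/N + (1 - 1/N)/R], where N is
    the number of gametes pooled and R the total reads. Reads at fully linked SNPs
    are treated as independent draws from the same pool frequency (an idealization),
    so R = coverage * sites.
    """
    reads = coverage_per_site * n_linked_sites
    return 1 / (1 / n_gametes + (1 - 1 / n_gametes) / reads)

# Hypothetical design: 10^6 sperm pooled, 100x coverage, 100 linked SNPs per window.
print(round(pool_effective_n(1e6, 100, 100)))  # compare with 1,000 to 3,000 single cells
print(round(pool_effective_n(1e6, 100)))       # a single SNP is limited by read depth
```

Because the detectable skew scales roughly as 1/sqrt(n_eff), this comparison gives a first-order sense of how far a pooled design could push detection thresholds below those of a single-cell study. It also shows that pooled power comes from aggregating reads across linked sites rather than from any single site.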