37 Matching Annotations
  1. Mar 2025
    1. as preliminary analysis indicated that most features of interest were represented at this point

      It doesn't seem that Fig S5 shows how layer 26 was selected. It would be helpful to have at least a short description in the methods of how this layer was chosen. Other work on mechanistic interpretability in protein language models has shown that different types of features can be learned in different layers of the model.

    2. Together, these results highlight the competitive performance of Evo 2 in predicting the pathogenic effects of human coding SNVs

      As an evolutionary geneticist, I find the PhyloP scores the most interesting benchmark here. When I see models like Evo 2, my concern is always that they may simply be memorising phylogenetic conservation. This is entirely valid from a biological standpoint; however, it can be done with far simpler, phylogenetically explicit methods such as PhyloP or GERP. What is far more exciting is the possibility that a flexible, large model like Evo 2 could pick up on non-linear (e.g., epistatic) patterns, to which PhyloP-type methods are blind. That PhyloP is so competitive across all these tasks is, I think, quite telling: for the most part, the power of these models comes from identifying conservation rather than more general 'biological rules'. Nonetheless, that PhyloP can be improved upon in some instances is very exciting; in my opinion, this is the gold-standard benchmark to beat.

    3. These values were then used as a predictive variable in a logistic regression model of gene essentiality, and directly compared to simple genetic metrics such as GC content and transcript length. Gene age values from the original lncRNA essentiality study (Sarropoulos et al., 2019) were used where available as an additional control.

      Aside from NT, these alternative metrics of lncRNA essentiality seem overly simplistic compared with a model as complex as Evo 2. Are there no other models of lncRNA essentiality to compare against? Perhaps an adaptation of sequence-conservation methods could work here too.

  2. Feb 2025
    1. Adult HA were sampled at four sites: two in the native area (Russia [Siberia] and China)

      The results certainly suggest that these native populations might be bottlenecked too. Is there any indication of how central these sampling locations are within the species' native range? Is it possible that the range edge was sampled?

    2. In all populations studied and for each species, derived alleles were mostly rare (with frequencies below 0.1)

      Site-frequency spectrum plots per population and mutation category would demonstrate these patterns quantitatively, without the need to bin allele frequencies arbitrarily.
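
      As a minimal sketch of the suggested summary, assuming alleles are polarised (ancestral state known) and derived-allele counts have been tabulated per site for one population and one mutation category, the unfolded SFS can be built as below. The function name and the toy counts are hypothetical.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n_chromosomes):
    """Unfolded SFS: entry i-1 is the number of segregating sites whose
    derived allele appears on i of n_chromosomes sampled chromosomes."""
    counts = Counter(derived_counts)
    # Classes 1 .. n-1; sites with count 0 or n are not segregating.
    return [counts.get(i, 0) for i in range(1, n_chromosomes)]


# Hypothetical derived-allele counts (10 chromosomes) for one
# population and one mutation category.
example = [1, 1, 2, 1, 3, 1, 9, 2, 1, 5]
sfs = unfolded_sfs(example, 10)
print(sfs)  # [5, 2, 1, 0, 1, 0, 0, 0, 1]
```

      Dividing each entry by the total number of segregating sites gives proportions that can be overlaid across populations or mutation categories.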

    3. and a negative correlation between t

      This correlation is based on two non-independent (autocorrelated) measures, since synonymous theta pi contributes to both the x- and y-axis values, so it should be interpreted with caution.

    4. crop pest (DVV)

      I wonder how much the fact that DVV is a crop pest might influence the results for this species. I can easily imagine that most DVV populations (native and invasive) have experienced agriculture-related bottlenecks and/or population expansions. Pests such as corn rootworm have repeatedly adapted to pesticides and GM crops, a process that often involves a bottleneck (followed by expansion) and may have similar effects on the evolution of load in native and invasive populations. Data on population ecology or local agricultural practices (including the history of pest load) may help determine whether the selective landscape of these populations could produce such effects.

  3. Jan 2025
    1. but that selection acts against EGC near codons specifying fixed derived amino acids, i.e., mutations that differentiate Arabidopsis thaliana gene copies from those of Arabidopsis lyrata.

      Very cool result!

    2. (Fig. 2)

      It would be helpful to have colour-labelled legends in the figures.

    3. deed, visual inspection identified numerous linked specific variants at polymorphic sites, sometimes spanning hundreds of positions, indicative of gene conversion tracts.

      As you point out, some of the strongest evidence for EGC seems to be that polymorphisms are not just shared, but that shared polymorphisms are linked. One way to quantify this statistically would be to run a tool that scans for identity by descent (IBD), or to use a tree-sequence approach, treating each gene from each accession as an individual genome (as in the multiple sequence alignment you construct). This isn't strictly what IBD tools are designed for, but it should provide a good proxy given that A. thaliana has low polymorphism and high linkage disequilibrium. Relatedly, it would help if the intervals of putative EGC tracts were filled in, rather than only their limits being marked as in Fig S6; this would make the lengths of EGC tracts easier to see.
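
      One way to draw filled-in tracts, rather than only their limits, is to collapse runs of consecutive informative sites that support conversion into intervals. The sketch below assumes each informative site in a given gene copy has already been scored as carrying (True) or not carrying (False) the allele diagnostic of the other copy; the function name, the `min_sites` threshold, and the toy data are hypothetical.

```python
def conversion_tracts(positions, supports, min_sites=2):
    """Collapse runs of consecutive informative sites that support
    conversion into (first_position, last_position) intervals.

    positions: sorted coordinates of informative sites.
    supports:  one bool per site, True if the site carries the allele
               diagnostic of the other gene copy (hypothetical scoring).
    """
    tracts, run = [], []
    for pos, flag in zip(positions, supports):
        if flag:
            run.append(pos)
            continue
        if len(run) >= min_sites:
            tracts.append((run[0], run[-1]))
        run = []
    if len(run) >= min_sites:  # run that extends to the last site
        tracts.append((run[0], run[-1]))
    return tracts


# Toy data: one supported run of three sites and one isolated site.
positions = [10, 25, 40, 58, 90, 120, 133]
supports = [False, True, True, True, False, True, False]
print(conversion_tracts(positions, supports))  # [(25, 58)]
```

      Each returned interval is a minimum tract; the true breakpoints lie somewhere between the outermost supporting sites and the nearest non-supporting sites on either side.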

    4. This can explain why there is more segregating fitness variation within populations than predicted under mutation selection balance (1).

      This seems to me a fairly strong statement that doesn't quite line up with the goals and results of this study. Technically, what this study shows is how EGC can contribute to standing genetic variation in a population. The paradox of standing genetic variation in phenotypes, however, concerns why there is more variation than expected under mutation-selection balance (MSB). In practice, MSB is agnostic to the type of mutation and to when or where it arose; only fitness effects matter. As detailed in the first reference, SNPs alone are clearly inconsistent with MSB, which is unsurprising since they are only a fraction of the genetic variants found in populations. However, as also detailed in the first reference, quantitative genetic approaches that use mutation accumulation experiments are agnostic to mutation type, and they provide a framework for testing whether MSB is sufficient to explain standing levels of genetic variation. Paradoxically, they often find that MSB is not enough to explain the high genetic variation in phenotypes that we observe (see https://doi.org/10.1098/rspb.2018.1864 for an example), implying that forces such as balancing selection must also be maintaining excess genetic variation. What this study demonstrates is that non-SNP variants can contribute to variation (as other studies have shown for transposons, inversions, indels, etc.). This alone, however, does not demonstrate that MSB is sufficient to explain observed genetic variation in phenotypes.

  4. Dec 2024
    1. Conversely, at larger time scales, the dynamical noise contribution dominates and the trajectory-to-trajectory fluctuations are large enough to hide the signal coming from the ancestral sequence, precluding the possibility to reconstruct i

      It might be interesting to compare the range of pairwise Hamming distances in the underlying MSAs for the focal protein families with the Hamming distances at which such effects appear in the simulations. One potential concern is that couplings (epistasis) are estimated from the MSA at one scale of sequence divergence, while the simulations are pushed to much larger divergences, at which point the epistatic interactions inferred from the MSA may no longer be accurate.
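
      A minimal sketch of the suggested comparison: compute the distribution of pairwise normalised Hamming distances in an MSA, then compare it with the distances from the ancestral sequence reached in the simulations. Skipping gapped columns pairwise is one assumption among several reasonable choices; the toy alignment is hypothetical.

```python
from itertools import combinations


def pairwise_hamming(msa, gap="-"):
    """Normalised Hamming distance for every pair of aligned sequences,
    skipping columns where either sequence has a gap."""
    distances = []
    for a, b in combinations(msa, 2):
        columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
        if columns:  # skip pairs with no shared ungapped columns
            mismatches = sum(x != y for x, y in columns)
            distances.append(mismatches / len(columns))
    return distances


toy_msa = ["ACDEFG", "ACDEYG", "AC-EYH"]  # hypothetical alignment
print(pairwise_hamming(toy_msa))  # approximately [0.167, 0.4, 0.2]
```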

    1. For over a century, scholars and dog-enthusiasts alike have sought to unravel the complex evolutionary history of man’s best friend

      I enjoyed reading your paper! This canid dataset offers such a great opportunity to explore genotype-phenotype mapping, and it's great to see how such associations can be teased apart in studies like this one.

    2. For small individuals, 34 SNP

      Relatedly, it might be interesting to compare effect-size estimates across these different data subsets. Large swings in the additive effect sizes of markers across populations have implications for the epistatic interactions in which the focal locus may be involved; see https://www.nature.com/articles/nrg3627
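
      A simple check, using a standard z-test rather than anything from the manuscript, is whether the difference between two estimates for the same marker exceeds what their standard errors allow. This assumes the subsets are independent samples; overlapping individuals would require the covariance between estimates. The numbers are hypothetical.

```python
import math


def effect_difference_test(beta1, se1, beta2, se2):
    """z-test for the difference between two effect estimates from
    independent samples; returns (z, two-sided p)."""
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical estimates for one SNP in the small and large subsets.
z, p = effect_difference_test(0.30, 0.05, 0.10, 0.05)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.83, p = 0.0047
```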

    3. For breed average height, we found 27 SNPs

      It would be interesting to see a simple summary in the text of how many significant markers were shared among the breed-average, small, and large subsets.

    4. detect non-additivity

      It's hard to tell from the description in this manuscript how non-additivity is captured by this analysis. A one-sentence explanation might be helpful for readers.

    5. ROH sharing matrix as a kinship matrix.

      Do you have a sense of how much difference it makes to use an ROH-sharing matrix rather than a general whole-genome SNP- or marker-based kinship matrix? My initial reaction was that a more independent, genome-wide matrix capturing structure across the whole genome might be better at accounting for fine-scale population structure, ancestry differences, etc. But perhaps using the ROHs themselves is better, since they are the target of the analyses.
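
      For the comparison raised here, one option is to build a standard marker-based genomic relationship matrix (VanRaden's 2008 method 1 is shown as one common choice) and correlate its off-diagonal entries with those of the ROH-sharing matrix. The ROH-sharing matrix itself is not reconstructed here; function names and toy genotypes are hypothetical.

```python
import numpy as np


def vanraden_grm(genotypes):
    """Genomic relationship matrix, VanRaden (2008) method 1.
    genotypes: individuals x markers array of 0/1/2 allele counts."""
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0                 # sample allele frequencies
    z = m - 2.0 * p                          # centre each marker
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def offdiag_correlation(a, b):
    """Pearson correlation between the upper off-diagonal entries of two
    relationship matrices (e.g., GRM vs a hypothetical ROH-sharing matrix)."""
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]


# Hypothetical genotypes: 4 dogs x 5 markers.
geno = np.array([[0, 1, 2, 1, 0],
                 [1, 1, 2, 0, 0],
                 [2, 0, 1, 1, 1],
                 [0, 2, 0, 2, 1]])
grm = vanraden_grm(geno)
print(np.round(grm, 2))
```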

    6. The second genomic narrative

      A small content suggestion: the introduction of this paper spends a considerable amount of time discussing the potential history of dog domestication. However, this background seems only tangentially related to the content of the paper, which instead aims to exploit population structure in dogs to explore genotype-phenotype associations in the context of domestication. More background on the genomics of domestication might be more relevant.

  5. Oct 2024
    1. We expected that purging would remove putatively deleterious alleles from small populations but have no effect on frequencies of alleles in less deleterious categorie

      I'm a little confused by this statement. It would make sense in certain cases (e.g., highly recessive mutations), but SnpEff and GERP scores give no information on recessiveness. If these are simply deleterious variants of unspecified dominance, the expectation should be the opposite: populations with high Ne should purge load more easily.
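
      The reasoning for additive variants can be illustrated with Kimura's diffusion approximation for the fixation probability of a new semidominant mutation (heterozygote fitness 1 + s, homozygote 1 + 2s), assuming census N equals Ne. Relative to the neutral value 1/(2Ne), a mildly deleterious mutation is far less likely to fix in a large population. This sketch covers additive effects only; recessive load, where purging in small populations is expected, would need a model with dominance.

```python
import math


def _log_expm1(x):
    """log(exp(x) - 1) for x > 0, without overflow for large x."""
    return x + math.log(-math.expm1(-x))


def fixation_probability(n_e, s):
    """Kimura's approximation for a new additive mutation starting at
    frequency 1/(2N), assuming N = n_e:
        u = (1 - exp(-2s)) / (1 - exp(-4 n_e s))
    s > 0 beneficial, s < 0 deleterious (heterozygote fitness 1 + s)."""
    if s == 0:
        return 1.0 / (2 * n_e)
    if s > 0:
        return math.expm1(-2 * s) / math.expm1(-4 * n_e * s)
    a = -s  # deleterious: rewrite as (e^{2a} - 1) / (e^{4 n_e a} - 1)
    return math.exp(_log_expm1(2 * a) - _log_expm1(4 * n_e * a))


# Fixation probability relative to a neutral mutation, s = -0.001.
for n_e in (100, 1_000, 10_000):
    ratio = fixation_probability(n_e, -0.001) * 2 * n_e
    print(n_e, f"{ratio:.3g}")
```

      The log-space branch is a design choice so that large values of 4Ne|s| return a probability near zero instead of raising an overflow error.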

    2. such as in recently bottlenecked populations

      Does this seem plausible in the case of kelp? If a bottleneck is recent, its effect on Ne will be immediate, but it will take time for the signal of differences in purging to build up. How recent have the strong kelp declines been (e.g., more or less than 50 years ago)?

    3. A strong isolation-by-distance pattern of increasing genetic distance (dXY) with geographic distance (Figure S4) and the presence of populations admixed between clusters (Figure 1A-D) suggest that adjacent clusters are connected by gene flow.

      In bull kelp, a lot of gene flow seems to occur across the southern tip of Vancouver Island, whereas the northern tip seems to act as a barrier (judging by the clusters in Fig. 1). Are there hydrological or oceanographic reasons to expect this?

    4. Bull kelp and giant kelp are the principal canopy-forming species in kelp forests of the northeast Pacific, supporting highly productive and biodiverse ecosystems (13)

      I really liked reading this paper. It's great to see such detailed sampling and interrogation of the population genetics of these keystone species.

    5. We estimated effective population size (Ne)

      Does this mean you report selfing Ne (rather than typical coalescent Ne) in your analyses? This would be useful to highlight in the main text.

    6. 1) All else being equal, indivi

      Is there any correlation in Ne, inbreeding, or low diversity between the two species of kelp where they co-occur? This could be a useful indicator for conservation efforts.

    7. We observed no evidence of purging in either species. We predicted that smaller populations would show a reduction in DMA frequency at evolutionarily conserved sites (GERP analysis) due to increased homozygosity and exposure to selection, yet DMA frequency was uncorrelated with population size (Figure 3A-B)

      While there are no differences between populations, the regression lines for the GERP and SnpEff analyses clearly show that less constrained sites harbor more diversity than more constrained sites, implying that purifying selection is purging deleterious variants in both species. It seems that purging is present in this dataset, and that detecting differences among populations is more a question of time scale. This makes particular sense for recessive variants, which remain hidden from selection for a long time.

  6. Sep 2024
    1. Capacity to generate adaptive variation can evolve by natural selection. However, the idea that mutation becomes biased toward specific adaptive outcomes is controversial. Here, using experimental bacterial p

      I greatly enjoyed reading this preprint. The dissection of the origin of the de novo contingency locus was very cool.

    2. Central to our findings was a selective process where lineages better able to generate, by mutation, adaptive phenotypic variants, replaced those that were less proficient (Figure 1). In one metapopulation, a single lineage emerged with capacity to transition rapidly between phenotypic states via expansion and contraction of a short nucleotide repeat in a manner precisely analogous to that of contingency loci in pathogenic bacteria

      Do you have any thoughts on why global mutator alleles underpinned evolvability in two populations, but a local mechanism did so in the other? Increased mutation rates seem to be a common by-product of experimental evolution (e.g., several instances in the LTEE). A nice study in yeast demonstrated that mutator alleles tend to be favoured when local population size is large, which allows selection to act more efficiently on the beneficial variants they produce. It might be relevant here: Raynes et al. (2018), Sign of selection on mutation rate modifiers depends on population size.

    3. As the number of repeats increased, the rate at which transitions occurred visibly increased (Figure 3E).

      What a cool result. This reminds me of the observation in stickleback that repeated, independent evolution of pelvic reduction tends to target the same locus because of the specific molecular characteristics of that stretch of sequence. This may be relevant here: Xie et al. (2019), DNA fragility in the parallel evolution of pelvic reduction in stickleback fish.

    4. The selective regime employed was contrived, with selection on lineages being strictly enforced. Such stringent conditions are likely limited in nature. However, microbial pathogens faced with the challenge of persistence in face of the host immune response, will experience strong lineage-level selection, with repeated transitions through selective bottlenecks [38]. As we have shown here, precisely these conditions can promote the evolution of evolvability.

      It seems to me that the key component of the experimental selection regime is that individual-level and lineage-level selection were allowed to act in separate, consecutive time steps, shielding lineage-level selection from being swamped by 'short-sighted' individual-level selection. I wonder how likely this scenario is to play out in settings such as pathogen evolution, where I would imagine both levels of selection act concurrently. I agree, however, that the existence of contingency loci implies that ecological conditions allowing this must exist in some form.

  7. Aug 2024
    1. The magnitude of the raw difference is typically much larger than that of the posterior effects. The difference is likely caused by LD, in that the raw difference of a single mutation contains contributions from other linked mutations, which may inflate the estimates.

      Could you restrict this analysis to mutations that are in linkage equilibrium (LE) with the other de novo mutations to test this hypothesis?
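
      A minimal sketch of the suggested filter, assuming a lines-by-mutations genotype matrix (1 = mutant, 0 = ancestral) from the RI(AI)Ls: keep only mutations whose squared correlation (r²) with every other mutation falls below a threshold. The `r2_max` value of 0.1 is an arbitrary, hypothetical choice, and the toy matrix is made up.

```python
import numpy as np


def mutations_in_approx_le(genotypes, r2_max=0.1):
    """Indices of mutations whose r^2 with every other mutation, computed
    across lines, is below r2_max (a hypothetical threshold).
    genotypes: lines x mutations, 1 = mutant allele, 0 = ancestral."""
    g = np.asarray(genotypes, dtype=float)
    keep = np.flatnonzero(g.std(axis=0) > 0)   # r undefined if monomorphic
    r = np.atleast_2d(np.corrcoef(g[:, keep], rowvar=False))
    r2 = r ** 2
    np.fill_diagonal(r2, 0.0)                  # ignore self-correlation
    return keep[r2.max(axis=1) < r2_max]


# Toy matrix: 8 lines x 4 mutations; mutations 0 and 1 are perfectly
# linked, mutations 2 and 3 are uncorrelated with everything else.
geno = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 1],
                 [1, 1, 1, 0],
                 [1, 1, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 0, 0, 0]])
print(mutations_in_approx_le(geno).tolist())  # [2, 3]
```

      With 169 mutations and a limited number of lines, few mutations may pass a strict threshold, which is itself informative about how much LD the design leaves.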

    2. Here we employ a classical line-cross strategy with MA lines, to break down the linkage disequilibrium among the accumulated mutations. We then combine whole-genome sequencing with high-throughput competitive fitness assays to estimate the DFE of a set of 169 spontaneous mutations.

      I greatly enjoyed reading this paper. True experimental estimates of the DFE in MA studies are super valuable and provide a very interesting comparison for pop-gen based DFE methods as pointed out by the authors.

    3. Averaged over all RI(AI)Ls, accounting for variation among assay blocks and removing two outlying lines, the regression of W on number of mutations is not significantly different from 0 (slope = −0.0051, F1,509=1.83, P>0.17), although the trend suggests that mutations are deleterious, on average.

      Is there a chance that false-negative mutation calls (i.e., mutations that occurred in the MA lines but were not detected) could contribute to this result?

    4. The simplest way to infer the mutational effect at a locus is to calculate the mean value of all lines with a mutant allele and all lines with an ancestral allele at that locus; the difference is the raw difference (uRAW) of the mutation at that locus. As a sanity check, we plotted the inferred Bayesian posterior effect against the raw difference; ideally, the correlation should be +1. The correlations were positive, but well below 1 in all three cases (Figure 4). The magnitude of the raw difference is typically much larger than that of the posterior effects. The difference is likely caused by LD, in that the raw difference of a single mutation contains contributions from other linked mutations, which may inflate the estimates.

      Two quick suggestions for further sanity checks: 1) Does this regression look any different for SNPs vs. indels? 2) Do the individual mutation-specific effects conform to the expectations one might have based on the functional annotations available for these mutations?
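
      The first check could be sketched as follows: compute the raw difference (uRAW) per mutation as defined in the quoted passage, then correlate it with the posterior effects separately within each mutation class. The posterior effects are assumed to come from the authors' Bayesian model and are not computed here; all names and toy values are hypothetical, and each mutation is assumed to segregate among the lines.

```python
import numpy as np


def raw_differences(genotypes, fitness):
    """u_RAW per mutation: mean fitness of lines with the mutant allele
    minus mean fitness of lines with the ancestral allele.
    genotypes: lines x mutations, 1 = mutant, 0 = ancestral."""
    g = np.asarray(genotypes, dtype=bool)
    w = np.asarray(fitness, dtype=float)
    return np.array([w[g[:, j]].mean() - w[~g[:, j]].mean()
                     for j in range(g.shape[1])])


def correlation_by_class(u_raw, u_posterior, classes):
    """Pearson r between raw and posterior effects within each class
    (e.g., 'SNP' and 'indel')."""
    u_raw = np.asarray(u_raw, dtype=float)
    u_post = np.asarray(u_posterior, dtype=float)
    classes = np.asarray(classes)
    return {str(c): np.corrcoef(u_raw[classes == c], u_post[classes == c])[0, 1]
            for c in np.unique(classes)}


# Hypothetical toy values.
u = raw_differences([[1, 0], [1, 1], [0, 1], [0, 0]], [0.9, 0.8, 1.0, 1.1])
print(u)  # approximately [-0.2, -0.1]
by_class = correlation_by_class([1, 2, 3, 1, 2, 3], [1, 2, 3, 3, 2, 1],
                                ["SNP"] * 3 + ["indel"] * 3)
print(by_class)
```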

  8. Jul 2024
    1. An important caveat is that, although the DE framework makes reasonable fitness predictions for these two drug pairs, it fails in many other environments and for many other genotypes, again highlighting the prevalence of ExExG.

      The DE approach seems quite powerful, especially since it adds a 'benign E' reference for fitness comparisons. I would love to see how this model's predictions line up with observed fitness in Figure 2 for all lines tested.

    2. In terms of synergy vs antagonism, our results suggest that a small number of mutations can change a drug combination from having a synergistic to an antagonistic effect. For example, figure 2C shows a case where LRLF acts synergistically on a yeast strain harboring a single nucleotide mutation to the HDA1 gene, but acts antagonistically on a different evolved yeast mutant. Similarly, figure 3 shows cases where a drug pair changes from having a synergistic to an antagonistic effect across different mutants.

      From Figures 2 and 3, it seems the dominant pattern in the dataset is antagonistic interaction (at least with respect to the additive model). This made me wonder two things. 1) Are there general biological explanations for such a pattern, or reasons why it might be expected? I'm thinking of the GxG equivalent, where we know, for example, that diminishing-returns epistasis is a common feature of adapting populations, and this can be linked to theoretical models of fitness landscapes such as Fisher's geometric model. 2) Is this the correct biological null model to use? In quantitative genetics the additive model would certainly be the default starting point, but is it relevant for these fitness estimates? My first instinct was that the average null model might be more relevant. A population-genetic multiplicative model might be another potential null.
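
      To illustrate why the choice of null matters, the sketch below computes expected two-drug fitness under three candidate nulls, with single-drug fitness expressed relative to a no-drug reference of 1. These definitions are illustrative assumptions and may not match the models used in the paper; the fitness values are hypothetical.

```python
def expected_fitness(w_a, w_b, model):
    """Expected two-drug fitness from single-drug fitness values, each
    relative to a no-drug reference of 1 (illustrative definitions)."""
    if model == "additive":
        return w_a + w_b - 1.0      # fitness deficits (1 - w) add
    if model == "multiplicative":
        return w_a * w_b            # proportional effects compound
    if model == "average":
        return (w_a + w_b) / 2.0    # mean of the single-drug fitnesses
    raise ValueError(f"unknown model: {model!r}")


def interaction(w_ab, w_a, w_b, model):
    """Observed minus expected. For drugs that lower fitness, positive
    values suggest antagonism and negative values synergy."""
    return w_ab - expected_fitness(w_a, w_b, model)


# Hypothetical mutant: fitness 0.8 and 0.6 in single drugs, 0.55 in the pair.
for model in ("additive", "multiplicative", "average"):
    print(model, round(interaction(0.55, 0.8, 0.6, model), 2))
```

      With these numbers, the same observation reads as antagonistic under the additive and multiplicative nulls but as synergistic under the average null.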

    3. Here, we take a large collection of roughly 1,000 antifungal drug-resistant yeast mutants evolved using this method and ask how often fitness in multidrug environments is predicted by fitness in single drug environments (Figure 1D)

      I enjoyed reading this paper and the novel ExExG framing of the study! This is a great dataset; I hope more genomic data can be attached to it in the future, enabling even more mutation-specific questions to be asked.

    4. Four different models (horizontal axis) are used to calculate expected fitness for each of roughly 1000 mutants per drug pair

      It would be useful to have a short description of these models here (in addition to the Methods) for clarity.