23 Matching Annotations
  1. Jul 2024
    1. Strikingly, the resulting landscape was dominated by generated proteins, which comprised 94.1% of the total phylogenetic diversity (as measured by cumulative branch length) and resulted in a 10.3-fold increase in diversity relative to the entire CRISPR-Cas Atlas (Fig. 2b). Novel phylogenetic groups were distributed across the tree, suggesting that the model has captured the full diversity of Cas9 and is not overfitting to any particular lineage.

      I find it hard to interpret the importance of these results without more context.

      For example, how surprising is it to see this enrichment given the initial n of natural and generated proteins?

      How might decisions about tree construction affect the branch length distribution? It seems possible that you would get a different outcome if you varied the mmseqs parameters or implemented different criteria for choosing representative proteins.

      Furthermore - though novel phylogenetic groups are distributed throughout the tree - it would be interesting to know if the overall distribution across clades is predicted by the abundance of natural proteins across the tree. I.e. do clades with more natural proteins in the training data tend to produce more generated proteins?
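
      One way to put a number on that last question is a rank correlation between per-clade counts of natural and generated proteins. The sketch below is only illustrative: the clade counts are made-up placeholders, and `rank` and `spearman` are hypothetical helper names, not anything from the paper.

```python
from statistics import mean

def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Placeholder counts per clade (NOT real data from the study).
natural_per_clade = [120, 45, 300, 8, 60]
generated_per_clade = [900, 400, 2500, 30, 350]

rho = spearman(natural_per_clade, generated_per_clade)
print(f"Spearman rho = {rho:.2f}")
```

      A high rho would suggest that generated diversity mostly tracks training-set abundance, while a low one would point to lineage-independent sampling.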

    1. We ran the analysis using a rooted time-calibrated species tree obtained from timetree.org 33.

      What was the rationale for using a tree from timetree.org as opposed to inferring one from the gene families?

      I imagine that comparing the effects of using the timetree.org tree vs. an inferred tree on CSUBST outputs would be enlightening. Such a comparison could be an empirical way to assess the effects of topological error in this data set (and would be a nice complement to some of the analyses in Fukushima and Pollock).

  2. May 2024
    1. where nj is the raw sequence count for species j, d(i, j) is the time to last common ancestor between species i and j collected from the TimeTree of Life resource (Kumar et al., 2022), and α ∈ R≥0 is a hyperparameter used to scale d appropriately. Under the assumption that mutations occur at a fixed rate, <img class="highwire-embed" alt="Embedded Image" src="https://www.biorxiv.org/sites/default/files/highwire/biorxiv/early/2024/03/12/2024.03.07.584001/embed/inline-graphic-3.gif"/> gives the expected overlap in sequence between two species’ orthologs, to approximate the effective sequence counts they contribute to each other4.

      It's great that even with the use of fixed rates you see a substantial increase in fraction of bias explained. Since mutation rates obviously do vary, I wonder just how much better you might do using a model that doesn't explicitly fix them...

    2. Under the assumption that mutations occur at a fixed rate, <img class="highwire-embed" alt="Embedded Image" src="https://www.biorxiv.org/sites/default/files/highwire/biorxiv/early/2024/03/12/2024.03.07.584001/embed/inline-graphic-3.gif"/> gives the expected overlap in sequence between two species’ orthologs, to approximate the effective sequence counts they contribute to each other4.

      What does it look like if you just use the branch lengths from the phylogeny to do this weighting? I would guess you get at least some increase in the Spearman correlations and it's a straightforward approach.

  3. Mar 2024
    1. For each behavior, all individuals seem to exhibit very similar bout duration distributions.

      It is hard not to notice that the distributions for certain states (e.g. Meerkat vigilant/resting state) are noisier than others. It would be interesting to see a comparison of the variance of these distributions as a function of species and/or state to see if the claim in this sentence is statistically supported.

  4. Feb 2024
    1. We hypothesize that sea robins initially developed fin ray-like legs for locomotion. Ancestral organs then evolved limited sensory capability to facilitate manipulation of the visible substrate in search of food. Finally, evolution of sensory papillae further specialized legs to localize and uncover buried prey.

      How much history/ecological data are available for these species? It could be interesting to pair the phylogenetic patterns with other trait data to explicitly test different evolutionary hypotheses, e.g. is there a relationship with prey type? Substrate? Depth? Biotic diversity?

    2. To test this ability, we developed a simple behavioral assay in which sea robins (Prionotus carolinus) were housed in a controlled tank with either mussels or capsules containing crude or filtered mussel extract buried in sand without visual cues (Fig. 1a, b, Supplementary movie 1). Sea robins alternated between short bouts of swimming and walking (Fig. 1b) and appeared to “scratch” at the sand surface with their legs while walking, which we hypothesized represented sensory behavior.

      Do these behaviors vary at all as a function of what prey are used? I'm guessing you tested squid and crabs with P. carolinus as you did with P. evolans?

      Presumably motile prey (squid/crabs) would give off a different set of cues than less motile or non-motile prey (mussels)? Specifically, I wonder if there is a tradeoff between chemo- and mechanosensation that depends on the amount of movement? Examining this relationship could be a potential route into the neural computations underlying digging behavior...

  5. Jan 2024
    1. We classified individual haploid yeast cells into five different cell cycle stages (M/G1, G1, G1/S, S, G2/M) via unsupervised clustering of the expression of 787 cell-cycle-regulated genes30 in combination with 22 cell-cycle-informative marker genes (Figures 1B, S2 and S3)

      How sparse is this matrix? Given an average of ~1,500 UMIs and ~800 cell-cycle genes, I'm assuming the expression of the cell-cycle genes is quite unevenly distributed across the cells?

      If very uneven, I wonder if some of the cell cycle designations might be driven by sparsity as opposed to canonical expression signatures associated with each stage? One way to parse this out might be to look at the loadings of the PCs used as input to clustering/UMAP/etc. Do any show signatures of extreme sparsity (e.g. binary expression of only one or several genes)?

      More broadly, it might be helpful to report the average # of cell-cycle genes detected in each cell.
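
      For concreteness, the two summaries I have in mind (overall sparsity and genes detected per cell) could be computed along these lines. This is a minimal sketch assuming a cells-by-genes count matrix restricted to the cell-cycle genes; the tiny matrix and the function names are placeholders of mine.

```python
def sparsity(matrix):
    """Fraction of entries in a cells x genes count matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total

def genes_detected_per_cell(matrix):
    """Number of genes with at least one UMI in each cell (row)."""
    return [sum(1 for value in row if value > 0) for row in matrix]

# Placeholder UMI counts: 3 cells x 6 cell-cycle genes (NOT real data).
counts = [
    [0, 3, 0, 1, 0, 0],
    [0, 0, 0, 0, 2, 0],
    [1, 0, 0, 4, 0, 1],
]

detected = genes_detected_per_cell(counts)
print(f"sparsity = {sparsity(counts):.2f}")
print(f"genes detected per cell = {detected}")
print(f"mean genes detected = {sum(detected) / len(detected):.1f}")
```

      Reporting the distribution of `detected` across cells (not just its mean) would show whether a subset of cells is being staged on only a handful of genes.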

  6. Dec 2023
    1. Second, we evaluated whether sequences of codas reflect longer-term trends. To do so, we collected coda triples of the same discrete coda type, and measured the correlation between tempo drift across adjacent pairs. We found a significant positive correlation, compared to a null hypothesis that drift between adjacent pairs is uncorrelated (test: Spearman’s rank-order correlation (two-sided), r(2586) = 0.57, p = 2e−220, 95% CI= [0.54, 0.60], n = 2588). Thus, rubato is distributed across sequences of multiple codas. Finally, we evaluated whether rubato is perceived and controlled by measuring whales’ ability to match their interlocutors’ coda durations when chorusing. We measured the average absolute difference in duration between (1) pairs of overlapping codas from different whales, and (2) pairs of non-overlapping codas of the same discrete coda type. Durations are significantly more closely matched for overlapping codas (0.099s on average) than would be expected under a null hypothesis that chorusing whales match only discrete coda type (which would give a drift of 0.129s on average) (test: permutation test (one-sided), p = 0.0001, n = 908; see Supplementary Section 6).

      I wonder if calculating the autocorrelation of coda durations might be a nice complementary measure here. Autocorrelation could give you a sense of the time scale over which the rubatos decay and, seemingly, might also provide a sense for the timescale of longer-term trends.

      Similarly, I wonder if cross-correlation might be useful for comparing the information quantity shared with interlocutors? The correlation value would be interesting, in addition to any patterns of temporal lag between codas. It might be a comprehensive metric for comparing the similarities of codas over time (as opposed to just looking at overlapping codas).
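
      To make the suggestion concrete, here is a rough sketch of both measures for evenly indexed coda sequences. It assumes equal-length duration series for two whales; the durations are invented placeholders, and `autocorr` and `crosscorr` are hypothetical helper names.

```python
from statistics import mean

def autocorr(x, lag):
    """Sample autocorrelation of x at a non-negative lag (biased estimator)."""
    m = mean(x)
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    return num / denom

def crosscorr(x, y, lag):
    """Normalized cross-correlation of x[t] with y[t + lag], for lag >= 0.

    Assumes x and y have the same length.
    """
    mx, my = mean(x), mean(y)
    sx = sum((v - mx) ** 2 for v in x) ** 0.5
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    num = sum((x[t] - mx) * (y[t + lag] - my) for t in range(len(x) - lag))
    return num / (sx * sy)

# Placeholder coda durations in seconds (NOT real data).
whale_a = [0.80, 0.85, 0.90, 0.88, 0.82, 0.78, 0.81, 0.87]
whale_b = [0.79, 0.81, 0.86, 0.91, 0.87, 0.83, 0.77, 0.80]

for lag in range(4):
    print(lag, round(autocorr(whale_a, lag), 3), round(crosscorr(whale_a, whale_b, lag), 3))
```

      The lag at which `autocorr` falls toward zero gives a rough timescale for rubato, and the lag maximizing `crosscorr` hints at how quickly one whale follows the other.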

  7. Nov 2023
    1. cellPLATO performs UMAP on morphological/motility parameters then uses HDBSCAN cluster analysis to define behavioural clusters

      It is hard to tell from the text if HDBSCAN is run on the behavioral parameters or on the UMAP output. If the latter, then I would take extreme caution in thinking about the generalizability of the method given the numerous issues with clustering on nonlinear manifolds. Either way, it would also be helpful to report more information on what the specific morphological/motility parameters are and any normalizations/manipulations that were done on them prior to UMAP and clustering.

      Also, any justification for choosing UMAP and HDBSCAN would be useful.

    2. UMAPs 1, 2 and 3

      This might be a slightly confusing way to refer to UMAP dimensions (is it accepted that a UMAP dimension = a single 'UMAP'?)

    3. We first investigated two fundamental measurements of cell migration and morphology, namely cell speed and cell area. When comparing conditions, the median migration speed of NK cells on VCAM-1 was 3.48 μm/min and 2.54 μm/min on ICAM-1 (Fig. 2A). The effect size distribution for VCAM-1 was greater, demonstrating statistical significance (p <0.00001) (41), and its distribution did not overlap with the control condition (ICAM-1). NK cells migrating on VCAM-1 also had smaller median cell area (114 μm2) compared with ICAM-1 (175 μm2) (Fig. 2B), with nonoverlapping effect size distribution (p < 0.00001).

      Does donor identity have any effect here? Do the donors differ at all in their speed/area distributions and effect sizes? This would be useful to know here and for many other analyses presented in the manuscript. More broadly, it is a little hard to assess the generalizability of the behavioral results presented here (including the cellPLATO analyses) without knowing more about the influence of experimental variables like this.

  8. Oct 2023
    1. We next adapted an experimental paradigm used to study prey capture in zebrafish for these other species (Mearns et al., 2020). Individual larvae were placed in chambers with prey items (either artemia or paramecia).

      These species are ecologically diverse (e.g. benthic vs. riverine) and likely possess corresponding sensory differences. Given this, it seems possible that their prey capture behaviors may vary as a function of sensory environment. For example, benthic species may display different repertoires in dark conditions.

      Have you tested the effect of varying the sensory environment on prey capture behaviors? Is there intra-specific variation? Are species-specific behaviors invariant? Whatever the outcome, these experiments would help refine the picture of how these behaviors evolved and could lead to more specific sensorineural hypotheses.

  9. Sep 2023
    1. Finally, 14 convergent amino acid substitutions with high confidence among known echolocating mammalian lineages were obtained (Table S3), and these sites were found to be effective in differentiating echolocating and nonecholocating mammals (Fig. 1A; Fig. S2).

      I wonder if it might be worth including a brief comment on the identity of these genes and/or their potential relationships with echolocation? Do they seem to be sensible functional hits? Seeing as the echolocation score appears to work quite well, it would be interesting to know a bit more about any molecular context for these predictive loci.

    1. phonotypes

      phenotypes

    2. (1) at least two orthology prediction algorithms agree the human and worm genes are orthologs; (2) the WormBase (version WS270) (Harris et al., 2020) gene description includes either ‘neuro’ or ‘musc’ (this captures variants of neuronal, neural, muscle, muscular etc.);

      I'm wondering how varying these criteria would affect the number and identity of the genes detected.

      For criterion 1, what was the rationale for requiring agreement between at least two algorithms? From Fig 1C, it's hard to tell if there is a relationship between % homology and the number of agreeing algorithms. What benefit do you get from using this cutoff? What are the tradeoffs? It might be helpful to include a figure similar to 1C, but including the full set of genes before filtering, and to walk through the outcomes of different cutoffs.

      Similarly, for criterion 2, what type of/how many hits do you get if you don't select for 'neuro' or 'musc'? Is there any chance that, though you are using a behavioral readout, genes not annotated 'neuro'/'musc' might still contribute to a behavioral phenotype (e.g. via pleiotropy/epistasis)? It would be useful to include a statement of your thinking on this!

  10. Jun 2023
    1. Our study is unique in that instead of using gene expression values directly, we use principal components calculated from gene expression values as our phylogenetic characters. In addition, we remove later principal components that may represent highly heterogeneous cell-specific signal.

      Seems like it would be worth including a direct comparison of Brownian motion to other evolutionary models. The computational overhead shouldn't be very high and, if the comparison supports the use of Brownian motion, it could be a more compelling argument than this.

    2. This dataset was chosen for the uniformity of sampling, consistency of lab and sequencing protocols, the high quality of its cell type annotations, and the abundance of genomic resources available for the five model species. UMI counts were downloaded as CSV files from the NCBI GEO database (GSE146188). A file containing meta-data, including cluster assignment and cell type labels, was obtained from the Broad Institute Single Cell Portal

      I wonder about the effect of scRNA-seq methodology on downstream results here. How do droplet-based approaches (like that used for van Zyl et al.) compare to others (e.g. Smart-seq2) when generating cell type trees? There can be substantial differences in the # of genes detected by these methods, with droplet-based approaches often generating datasets with fewer genes. Does this affect the estimation of rank and/or the outputs of the PCA you use for evolutionary modeling? It seems like this would be an important issue to solve since droplet-based methods are essentially downsampling informative data in a nonrandom way that may bias evolutionary inference.

      TLDR: are cell tree topologies consistent independent of sequencing methodologies?

  11. May 2023
    1. we designed and constructed a low-cost parallel imaging platform capable of measuring C. elegans growth for 60 individual animals simultaneously over the course of their ≈ 70 hour development at a temporal resolution of 0.001 Hz, resulting in a time series of ≈ 200 observations per animal. In addition to length and area measured automatically, egg hatching, and first egg-laying by mature adults are manually recorded.

      Is there a reason for the coarse sampling at 0.001 Hz? Mechanical constraints of the XY plotting robot? Data size constraints? Obviously faster sampling would open up locomotion/behavior as a read out of other possibly interesting, orthogonal phenotypes (with their own developmental modes). Given the video data you are already collecting, it seems like if faster sampling is possible this would be a relatively straightforward - and informative - set of phenotypes to add in?
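
      To spell out what 0.001 Hz means in practice, here is a quick back-of-the-envelope calculation. The 1 Hz comparison rate is purely my own hypothetical, and the function names are placeholders.

```python
def n_frames(duration_hours, rate_hz):
    """Number of frames captured over a duration at a fixed sampling rate."""
    return duration_hours * 3600 * rate_hz

def interval_seconds(rate_hz):
    """Time between consecutive frames at a fixed sampling rate."""
    return 1 / rate_hz

# Values quoted in the manuscript: ~70 h of development sampled at 0.001 Hz.
current_frames = n_frames(70, 0.001)
current_interval_min = interval_seconds(0.001) / 60

# Hypothetical faster rate (an assumption, not proposed in the manuscript).
faster_frames = n_frames(70, 1.0)

print(f"0.001 Hz: ~{current_frames:.0f} frames, one every ~{current_interval_min:.1f} min")
print(f"1 Hz:     ~{faster_frames:.0f} frames per animal")
```

      At the quoted rate that is one frame roughly every 17 minutes (about 250 frames over 70 hours, in line with the ≈ 200 observations reported), which is fine for growth curves but far too coarse to resolve locomotion.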

  12. Mar 2023
    1. This inter-annotator variability can be associated with (a) subjective differences of behavior definition among human labelers (b) varying level of annotator’s expertise, and (c) training with-in and across labs.

      What about intra-annotator variability? Seemingly this could also be an important contributor to inter-annotator variation. Might it make sense to compare multiple annotations from a single annotator and use the average as the basis for the ethograph generation?

    2. In order to test inter-annotator variability, we use generated a set of single mouse behavior classifiers for two simple behaviors, left and right turn. We inferred behavior from all four classifies on a large set of videos and compared the two pairs of classifiers from each annotator

      How do these comparisons look for other (potentially more 'complex') behaviors? Presumably, turning should be among the more straightforward behaviors for a human to recognize. Do the patterns of inter-annotator agreement change with other behaviors (e.g. grooming) and, if so, would accounting for this increase/decrease performance of the neural network? This is a general risk when using human-based annotations for behavioral classification and it seems to me not easily solved by focussing on a single behavior.
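
      As one concrete option, agreement could be summarized per behavior with Cohen's kappa, which corrects raw agreement for chance. The sketch below is not the authors' method; the frame labels are invented placeholders, and the same function would work for intra-annotator comparisons by passing two labeling passes from one person.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences.

    Undefined (division by zero) if both sequences use one identical label
    throughout, since chance agreement is then 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Placeholder per-frame labels (NOT real data): L = left turn, R = right turn, N = neither.
annotator_1 = ["L", "L", "R", "R", "N", "N", "L", "R"]
annotator_2 = ["L", "L", "R", "N", "N", "N", "L", "R"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

      Computing this separately for turning, grooming, and other behaviors would show directly whether agreement degrades for the harder categories.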

  13. Feb 2023
    1. To assess nuclear density, we measured the average distance from each nucleus to its nearest neighbour

      I wonder if it might be useful to analyze the specific spatial arrangement of nuclei across the centrifugation experiments. While it is sensible that density may be the primary signal driving cellularization, there may also be higher-order features of cell distribution or spatial organization that are predictive of the different outcomes (i.e., Flip, lysis, irregular invaginations), since centrifugation is a relatively forceful and disruptive approach. Spatial relationships of nuclei could in principle be extracted by segmentation/registration, followed by some basic statistical comparisons (e.g. PCA) to uncover the relationships between images.
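
      As a starting point, the nearest-neighbour measure described in the text can be extended with the Clark-Evans ratio, which compares the observed mean nearest-neighbour distance with the value expected for a random (Poisson) arrangement at the same density. This is only a sketch under simplifying assumptions: 2D centroids, a known field area, and no edge correction. The coordinates are placeholders.

```python
from math import dist, sqrt

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point."""
    nearest = [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(nearest) / len(nearest)

def clark_evans_ratio(points, area):
    """Observed mean NN distance divided by its expectation under randomness.

    Values near 1 suggest a random arrangement, below 1 clustering, and
    above 1 regular spacing. Edge effects are ignored in this sketch.
    """
    density = len(points) / area
    expected = 1 / (2 * sqrt(density))
    return mean_nearest_neighbor_distance(points) / expected

# Placeholder nuclear centroids on a regular grid (NOT real data).
nuclei = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
field_area = 4.0  # assumed area of the imaged field, same units squared

print(mean_nearest_neighbor_distance(nuclei))
print(clark_evans_ratio(nuclei, field_area))
```

      Two embryos with the same mean nearest-neighbour distance could still differ in this ratio, which is the kind of organization signal I am wondering about.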