42 Matching Annotations
  1. Jun 2024
    1. As phenotypic changes during keratinocyte differentiation span across both space and time, this application perfectly showcases the power of ESPRESSO spatiotemporal omics in identifying not only the presence of distinct phenotypes, but also providing insights about their spatiotemporal evolution.

      Again, I think a baseline here would make this claim more convincing. In other words, what aspects of the differentiation dynamics described here could only be captured by ESPRESSO?

    2. As shown in Figures 1c and 1d, GMM clustering easily identified the cell type-specific phenotypes and allowed the quantification of properties of interest in their organelle network

      It would be helpful to compare this result to some baseline obtained from an established method like cell painting. In other words, can existing techniques also readily distinguish these cell types?

    3. to increase the acquisition speed 16-fold

      it would be helpful to also provide some absolute measures of throughput here, such as how many FOVs of a given size and resolution can be imaged per unit time.

    4. organelle properties are normalized, selected and reduced in dimensionality by PacMAP [35], generating low-dimensional embeddings that encode the high-dimensional organelle properties of each cell. A Gaussian Mixture Model (GMM [36]) clustering algorithm is then applied

      It sounds like the clustering was done after the embedding step; that is, using the low-dimensional embeddings from PacMAP, rather than the original feature matrix. If so, I'm worried that this will result in inaccurate clusters, as PacMAP (like all such methods) does not perfectly preserve the relationships between the original high-dimensional feature vectors.
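
      As a concrete way to address this concern, one could cluster the full standardized feature matrix and the PaCMAP embedding separately and check how well the two partitions agree. A minimal sketch of what I mean (assuming the `pacmap` and `scikit-learn` packages; the feature-matrix file name and number of clusters are hypothetical):

      ```python
      import numpy as np
      import pacmap
      from sklearn.preprocessing import StandardScaler
      from sklearn.mixture import GaussianMixture
      from sklearn.metrics import adjusted_rand_score

      X = np.load("organelle_features.npy")   # hypothetical (cells x organelle-properties) matrix
      Xz = StandardScaler().fit_transform(X)

      # low-dimensional embedding, as described in the quoted passage
      emb = pacmap.PaCMAP(n_components=2).fit_transform(Xz)

      n_clusters = 4                           # e.g. the number of expected phenotypes
      labels_full = GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(Xz)
      labels_emb = GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(emb)

      # an ARI close to 1 would suggest the embedding step is not distorting the clusters
      print("ARI (full-dim vs embedding):", adjusted_rand_score(labels_full, labels_emb))
      ```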

  2. May 2024
    1. This suggests that genes not annotated by eggNOG-mapper are probably proteins that either catalyze some protein, RNA, or DNA chemical modification, or bind to other molecules, form macromolecular complexes, and are involved in the regulation of essential processes for animals

      This is a bit confusing; it's so vague and general that it sounds like it could describe almost any protein.

    2. We therefore considered this evidence as supportive for not filtering

      I'm not sure that two examples can constitute evidence for or against filtering. Is it possible to use a ground-truth dataset to make this kind of filtering/no-filtering decision with more confidence?

    3. We show that protein language model-based annotations outperformed deep learning-based ones

      This is a bit confusing, because protein language models are a kind of deep learning model. It would help to clarify what "deep-learning-based models" refers to in this context.

    4. with a reliability index of 1

      What does a reliability index value of "1" mean?

  3. Mar 2024
    1. The search starts with a cytochrome from corn (Zea Mays), and within the first 50 hits, we find similar structures originating from various animals (fish, eagle, mouse, cat, horse, etc.)

      The phrase "within the first 50 hits" feels tantalizing. What else appeared among the top hits? Were there hits that were surprising or potentially false positives? And were there proteins that should have appeared among the top hits, but didn't?

    2. Here, AlphaFind shows us (Figure 2) that highly similar hemoglobin structures can also be found in other species.

      Again, it would be really great to quantify what "highly similar" means here.

    3. in an average of 7 seconds with negligible back-end load

      It would be helpful to mention details about the hardware here, as the time cost is hard to interpret without that information.

    4. Therefore, high occurrence of unstructured regions in the input structure can bias the search. This phenomenon is more prevalent in coiled-coil structures but can be also observed in some small structures

      Again, it would be great to quantify this and/or to discuss some examples of proteins for which this is a real problem.

    5. We tested AlphaFind on a diverse set of proteins varying in size, complexity, and quality. AlphaFind provided biologically relevant results even for small, large and lower quality structures. When AlphaFind did not offer structures with high TM-scores, the results remained biologically relevant.

      I think these claims would be more convincing if they could be quantified and if the performance of AlphaFind could be compared to other existing tools, if possible.

    6. The latter two methods in conjunction with (10) establish the basis of the indexing solution presented in here

      What is the relationship between this approach and approaches to indexing or similarity-based lookup used by common vector databases?

    7. In the offline phase, we first extract semantic information from raw cif files into vector embeddings,

      It would be helpful to explain in more detail how this is done, since it seems like a crucial step.

  4. Feb 2024
    1. Every chromosome group is combined into a single sequence, with chromosome order randomly determined.

      It's surprising to me that chromosomes are randomly ordered; this feels a bit like the equivalent of randomly shuffling the clauses of a sentence. It would be helpful to explain this choice or discuss reasons why it might or might not be a concern.

    2. Start tokens are unique to each chromosome and species

      This feels confusing: if start tokens are unique to species, how is UCE able to generate embeddings for datasets from species it was not trained on?

    3. However, beyond that, the effect levels off (Supplementary Fig. 6). This is expected due to the curse of dimensionality in high-dimensional spaces and the variability in the level of ontological refinement in different branches of the ontology

      This feels awfully hand-wavy. I can understand that a leveling off is expected at some distance, but why at 5 hops in particular?

    4. For all three species we observed very high agreement between independent annotations of the novel species’ data and the nearest cell type centroids in the IMA

      It would be helpful to mention here what these three species were and how distantly related they are to the eight species on which UCE was trained.

    5. We train a simple logistic classifier on the UCE embeddings of the Immune Cell Atlas [38], and then apply the classifier to B cell embeddings from Tabula Sapiens v2. This classifier accurately classifies the Tabula Sapiens v2 cells as memory and naive B cells

      This result feels hard to interpret without a comparison to other approaches or models. In other words, are embeddings from UCE uniquely able to capture the information required for this classification task?
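
      For instance, fitting the same classifier on a naive baseline representation (e.g. a PCA of log-normalized expression) and comparing held-out accuracy would make the claim easier to interpret. A rough sketch of the comparison I have in mind, with hypothetical file names and shapes:

      ```python
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      y = np.load("b_cell_subtype_labels.npy")              # memory vs naive labels (hypothetical)
      uce = np.load("uce_b_cell_embeddings.npy")            # UCE embeddings (hypothetical)
      lognorm = np.load("log_normalized_expression.npy")    # baseline representation (hypothetical)

      representations = {
          "UCE": uce,
          "PCA of log-normalized counts": PCA(n_components=50).fit_transform(lognorm),
      }
      for name, X in representations.items():
          acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
          print(f"{name}: mean CV accuracy = {acc:.3f}")
      ```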

    6. UCE embeddings distinctly separate cell types more effectively than other methods tested in zero-shot

      This feels a bit subjective; I think this claim would be more convincing if it were grounded in a quantitative measure of clustering accuracy.
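
      For example, clustering each method's zero-shot embedding and reporting agreement with the annotated cell types would make the comparison concrete; a short sketch, with hypothetical inputs:

      ```python
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.metrics import adjusted_rand_score, silhouette_score

      emb = np.load("zero_shot_embeddings.npy")       # (cells x dims), hypothetical
      labels = np.load("cell_type_annotations.npy")   # hypothetical annotations

      pred = KMeans(n_clusters=len(np.unique(labels)), random_state=0).fit_predict(emb)
      print("ARI vs annotations:", adjusted_rand_score(labels, pred))
      print("silhouette by annotation:", silhouette_score(emb, labels))
      ```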

    7. We compared several methods and found that UCE substantially outperforms the next best method Geneformer by 9.0% on overall score, 10.6% on biological conservation score, and 7.4% on batch correction score

      If possible, it would be helpful to contextualize these relative increases in performance, particularly given that none of the models listed in Supp Table 1 appear to significantly outperform using the log-normalized raw data (the "overall score" is 0.74 for UCE and 0.72 for "log-normalized expression"). Without more context, it's hard to know what this means, whether it should be surprising, whether it reflects limitations of the metrics or of the models, etc.

      Also, I think it would be more transparent to mention here that there are two metrics for which UCE does not outperform other models (the ARI score and the "ASW (batch) score").

    8. Genes belonging to the same chromosome are grouped together by placing them in between special tokens and are then sorted by genomic location

      It would be helpful to understand the context and motivation for this design decision. In other words, what aspects of UCE's performance depend (or are suspected to depend) on including information about genomic position?

    9. This allows UCE to meaningfully represent any gene, from any species, regardless of whether the species had appeared in the training data

      It would be good to clarify here if "training data" refers to the data used to train the protein language model or UCE itself.

    1. When calculating doubling times based on mitotic events in the remaining cells that were not undergoing apoptosis (Figure 6D), the doubling times are similar to those for unexposed cells

      Again, it's great to see something like this quantified so carefully!

    2. Higher intensities of excitation light exposure led to significant cell death that was apparent by manual inspection of images, and by the reduced relative cell numbers as shown by the green lines

      It seems surprising that there is such a big difference from 1x to 1.4x. Is this by design? (was the 1x intensity chosen from prior experience or experiments to be as high as possible without inhibiting cell division?)

    3. As shown in Figure 6A, exposure of cells to the minimal intensity of fluorescence excitation light (56 mJ/cm2 referred to as 1x)

      It's super helpful that an absolute measure of intensity is provided here, but it would be great to also include the wavelength (or range of wavelengths) of the excitation light.

    4. Individual cells in the center of the colony tend to move less than cells near the edge

      Is it possible to correct this for the fact that, as the colony itself expands, cells near the edge necessarily must move more than cells in the center (which will not move at all, if the colony as a whole is stationary)?
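
      One possible correction, sketched below under the (strong) assumption that the colony expands roughly uniformly in the radial direction; all names are hypothetical and this is only meant to illustrate the idea:

      ```python
      import numpy as np

      def corrected_displacement(pos_t0, pos_t1, center, radius_t0, radius_t1):
          """Subtract the outward drift expected from uniform colony expansion.

          pos_t0, pos_t1: (n_cells x 2) positions at consecutive frames
          center: colony centroid; radius_t0, radius_t1: colony radii at the two frames
          """
          r = np.linalg.norm(pos_t0 - center, axis=1, keepdims=True)
          radial_dir = (pos_t0 - center) / np.clip(r, 1e-9, None)
          # a cell at fractional radius r/R is carried outward by (dR) * r/R under uniform expansion
          expected_drift = radial_dir * (radius_t1 - radius_t0) * (r / radius_t0)
          return (pos_t1 - pos_t0) - expected_drift
      ```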

    5. Average mitotic rates do not appear to depend on distance from the colony edge (Figure 5D) and do not correlate with the increased cell motion

      It's great to see a subtle detail like this quantified so carefully! Is this consistent with prior work (if there is any)?

    6. The manual data was paired to the 3D U-Net inferenced results using a linear sum assignment routine with the cost function being proportional to the distance between mitosis events in space with an empirically determined spatial cutoff of 15 pixels and a time cutoff of 6 frames.

      This is a bit hard to understand. How is distance in time measured? (i.e. the difference between the time of mitosis onset in the manual annotations and the segmentation results)
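
      To make my question concrete, here is how I imagined the pairing might work; this is purely a guess at the procedure, and the event arrays, cost definition, and the way the 6-frame cutoff enters are my assumptions, not the authors' code:

      ```python
      import numpy as np
      from scipy.optimize import linear_sum_assignment
      from scipy.spatial.distance import cdist

      # hypothetical arrays: one row per mitosis event, columns (x, y, z, frame)
      manual = np.load("manual_events.npy")
      predicted = np.load("unet_events.npy")

      spatial_cost = cdist(manual[:, :3], predicted[:, :3])       # pixels
      time_diff = np.abs(manual[:, 3:4] - predicted[:, 3:4].T)    # frames

      # forbid pairings outside the cutoffs (15 px, 6 frames) with a large cost
      cost = spatial_cost.copy()
      cost[(spatial_cost > 15) | (time_diff > 6)] = 1e6

      rows, cols = linear_sum_assignment(cost)
      matched = [(i, j) for i, j in zip(rows, cols) if cost[i, j] < 1e6]
      print(f"{len(matched)} matched events out of {len(manual)} manual annotations")
      ```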

  5. Jan 2024
    1. are 1 to 2 to 20 to 20.

      how were these weights chosen? And is it correct to think of these weights as a kind of correction for the class imbalance between non-mitotic and mitotic nuclei?
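
      For reference, this is how I read the weighting: the rare mitotic classes are up-weighted in the loss so they contribute more to the gradient. A PyTorch-style sketch; the class ordering is my assumption, only the 1:2:20:20 values come from the quoted text:

      ```python
      import torch
      import torch.nn as nn

      # assumed class order: 0 = background, 1 = non-mitotic nucleus, 2 = mitotic nucleus, 3 = daughter cells
      class_weights = torch.tensor([1.0, 2.0, 20.0, 20.0])
      criterion = nn.CrossEntropyLoss(weight=class_weights)

      logits = torch.randn(8, 4, 64, 64)            # (batch, classes, H, W), dummy values
      targets = torch.randint(0, 4, (8, 64, 64))    # dummy per-pixel labels
      loss = criterion(logits, targets)
      ```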

    2. in each of the 5 frames before division, and as class 3 (one or two daughter cells) in each of the 3 frames after division.

      how were these numbers of frames chosen?

    3. The binary masks are created by inferencing with 3 instances of the same model and thresholded by 2 (as explained in more detail in Supplemental Figure 1B and 1C.)

      This is a little confusing, especially the "thresholded by 2" part, and I didn't find the caption in Supp Fig 1 to be that much clearer. It would help to explain the origin of the variability in the predictions (in other words, what is an "instance of the same model"?)
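
      Here is my best guess at what "thresholded by 2" means, written out as a sketch: three independently trained copies of the same architecture each predict a binary mask, and a pixel is kept only if at least two of the three agree. The model objects and probability threshold below are placeholders:

      ```python
      import numpy as np

      def ensemble_mask(models, image, prob_threshold=0.5):
          # each model instance returns a per-pixel foreground probability
          votes = [(m.predict(image) > prob_threshold).astype(np.uint8) for m in models]
          # majority vote: foreground if >= 2 of the 3 instances agree
          return (np.sum(votes, axis=0) >= 2).astype(np.uint8)
      ```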

    4. We trained a 2D U-Net to segment single-cell nuclei from phase contrast images starting with a pre-trained U-Net (14) as our initial network

      It would be great to mention what kind of images the pre-trained model was trained on. Do you have a sense for how important it is that pre-training be done on similar images? (and what kinds of similarity are most important: cell type, imaging modality, magnification, etc)

  6. Dec 2023
    1. We report the number of gene KOs with an AU-ROC > 0.55

      Why 0.55 and not 0.5?

    2. We trained a ViT-small model with patch size = 8, number of global crops = 2, number of local crops = 8 on 4 nodes x 8 NVIDIA-V100 GPUs per node (32 GPUs) for 100 epochs

      would it be possible (and meaningful) to mention how many GPU hours this required? Also, some more details would be helpful for non-ML experts; e.g., why the choice of 100 epochs, was a stopping criterion used, which epoch was used for the final analysis/results, etc.

    3. we re-parameterized the first layer of the model as:

      This equation is a bit opaque; it would be helpful to explain what the superscripts and subscripts of theta mean.

    4. (both ~1-1.5 million cell tile images)

      Does the 1-1.5m figure mean single-cell images? or FOVs? It would also be super helpful to comment on how this dataset size was chosen. Was it the minimum amount of data required for this level of performance? More generally, did you do any experiments varying the quantity or diversity of the training data?

    5. The superior performance of CP-DINO 1640 is unlikely a result of trivial memorization, as the 1640-genes druggable genome library and 300-genes MoA library share similar numbers of overlapping genes with the 124 PoC library (30 and 26 genes respectively).

      I think to make this claim more convincing, it would be important to show how many genes in the 1640 library are very similar to (rather than merely identical to) genes in the 124 PoC library ("very similar" is obviously subjective but I'm thinking of homologs/paralogs or genes that are components of the same complex or pathway)

    6. Anti-phospho-S6 (pS6) antibody with AlexaFluor 750-conjugated secondary antibody was used in the 6th channel as an established biomarker

      it would be helpful to mention here what cellular structures or features the pS6 antibody labels, and also (for the non-biologists among us) what mTORC1 is

    7. Nevertheless, CP-DINO 300 trained on bioimaging data yielded a more informative embedding that has higher median prediction accuracy than the other two models (Fig. S4a-b), and correctly classified more perturbations with better accuracy (Fig. 4c). CP-DINO 300 also recovered more known biological relationships from StringDB as measured by cosine similarity of the aggregate gene KO embeddings (Methods) than the other two models (Fig. 4d)

      It's awesome to see such an explicit and direct comparison of classic feature engineering with modern unsupervised ML models!

      If possible it would be great to quantify how much better the DINO-based approach is; Figures 4a-d are a bit hard to understand at first and obscure the relative differences; Fig 4d in particular doesn't give the impression that DINO is that much better than the CellStats approach (even though the 0.12 of DINO vs the 0.09 of CellStats is actually a 30% improvement!). Also, some measure of statistical significance would be helpful; in particular, how likely is it that the 0.09 vs 0.12 in Fig 4d is reproducible?
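
      For example, a bootstrap over gene KOs would give a sense of how stable the 0.09 vs 0.12 gap is. A rough sketch, assuming a per-gene "recovered or not" indicator, which may not be exactly what Fig 4d reports:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      hits_dino = np.load("dino_hits.npy")            # hypothetical: 1 if a gene's known partners were recovered
      hits_cellstats = np.load("cellstats_hits.npy")  # hypothetical, same genes in the same order

      deltas = []
      for _ in range(10000):
          idx = rng.integers(0, len(hits_dino), len(hits_dino))
          deltas.append(hits_dino[idx].mean() - hits_cellstats[idx].mean())
      deltas = np.array(deltas)
      print("95% CI for the difference:", np.percentile(deltas, [2.5, 97.5]))
      ```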

    8. phenotypic clustering of genes by their annotated mechanism of action,

      It feels like there's a typo here somewhere, since genes don't really have a "mechanism of action" and the screen here does not involve compounds but rather gene KOs. Is the idea to use the phenotype of the KOs to cluster genes by the MoA of the compounds that target them? In any case, the reference to MoAs here is doubly confusing because the clustering shown in Fig 4E appears to capture cellular localization (and also pathway membership?), but I couldn't see any discussion of the clustering relative to the MoAs of the compounds used to select the 300 genes