35 Matching Annotations
  1. Last 7 days
    1. The dataset was normalized to 10000 counts per cell, Log1p transformed and filtered to contain 2000 highly variable genes. The first important observation is that state-of-the-art approaches, except CPM

      Does marker‑gene expression change monotonically along the CPM geodesic from root to leaf?
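
      The preprocessing described (normalize to 10,000 counts per cell, log1p, keep 2,000 highly variable genes) can be sketched in plain NumPy. This is a simplified variance-based HVG criterion, not necessarily the exact dispersion-based selection the authors used:

      ```python
      import numpy as np

      def preprocess(counts, target_sum=10_000, n_top_genes=2000):
          """Normalize each cell (row) to a fixed count depth, log1p-transform,
          and keep the most variable genes (simplified HVG criterion)."""
          counts = np.asarray(counts, dtype=float)
          depth = counts.sum(axis=1, keepdims=True)        # total counts per cell
          logged = np.log1p(counts / depth * target_sum)   # depth-normalize, then log1p
          # rank genes by variance across cells, keep the top n_top_genes
          top = np.argsort(logged.var(axis=0))[::-1][:n_top_genes]
          return logged[:, np.sort(top)]
      ```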

    1. You observed that for ambiguous cases or high levels of missing data, the model tended to predict the PUR population, suggesting it acts as a "default". Since PUR is an admixed population, does this imply the model learns that a state of high uncertainty or mixed/missing signals is most characteristic of admixed genomes in the training set? Could this "default" behavior be mitigated by training with a null or "uncertain" class?

  2. Jul 2025
    1. We employed a random forest classifier [51] in R (4.3.1), training it on these 290 data points with the 'ntree' parameter set to 100

      What was the accuracy of the random forest classifier used to create the unified state annotations?
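
      The R call described (randomForest with ntree=100 on 290 points) has a direct scikit-learn analogue, sketched here on synthetic data since the paper's features and labels aren't shown; `n_estimators` plays the role of `ntree`, and cross-validation like this would answer the accuracy question directly:

      ```python
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(290, 20))                  # 290 data points, hypothetical features
      y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic labels for illustration

      clf = RandomForestClassifier(n_estimators=100, random_state=0)  # ntree=100
      scores = cross_val_score(clf, X, y, cv=5)       # held-out accuracy per fold
      mean_accuracy = scores.mean()
      ```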

    1. Euclidean distances from a residue of interest to all disease residue positions in a structure were calculated using amino acid residue centers of mass as the reference point

      How does this calculation account for protein flexibility, which isn't represented in a single static model?
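
      The described calculation can be sketched as follows (function names are mine, not the authors'). An ensemble of structures, e.g. NMR models or MD snapshots, could address the flexibility concern by averaging these distances over models:

      ```python
      import numpy as np

      def center_of_mass(atom_coords, atom_masses):
          """Mass-weighted mean of a residue's atom coordinates."""
          m = np.asarray(atom_masses, dtype=float)
          return (np.asarray(atom_coords, dtype=float) * m[:, None]).sum(axis=0) / m.sum()

      def distances_to_disease_sites(residue_com, disease_coms):
          """Euclidean distances from one residue's center of mass to every
          disease-residue center of mass, in a single static structure."""
          diff = np.asarray(disease_coms, dtype=float) - np.asarray(residue_com, dtype=float)
          return np.linalg.norm(diff, axis=1)
      ```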

    1. Catfish is a dominant species, accounting for 59% of all food fish sales in 2013, valued at $480.0 million.

      Sorry but I'm a bit confused here. It looks like $480 million is 59% of the 2023 total ($819.6 million), but the text says 2013. Am I misreading this, and it means it was 59% of the 2013 total, which just happens to also come out to 59% of the 2023 total?
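
      For what it's worth, the ratio behind this question checks out against the 2023 total mentioned in the comment:

      ```python
      # $480.0M over the $819.6M total cited in the comment
      share = 480.0 / 819.6 * 100
      print(round(share, 1))  # 58.6, i.e. roughly the quoted 59%
      ```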

  3. Jun 2025
    1. highly consistent performance for variants carrying fewer than five mutations

      For the DMS dataset for GFP there is a pretty sharp drop in fitness once 5 mutations are reached. For this model, would much larger DMS datasets be needed to get accurate predictions for larger numbers of mutations? Do you see any way of advancing this computationally so that it isn't limited by experimental measurements, given the size of these spaces even with a handful of mutations?

    1. These benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings

      Two questions:

      First, do you think approaches using alternative modalities (e.g., visual explanations like https://arxiv.org/abs/2502.19400v2) could help detect the conceptual misalignments you identify?

      Second, as models improve, do you envision needing continuously evolving datasets to detect new potemkin patterns, or do you think there might be an optimal set of evaluation tasks that would be more fundamentally robust to training advances?

  4. May 2025
    1. the binding site and mutation site of the protein-nucleic acid before and after mutation

      Do you think adding additional sites would cause much of a performance difference?

  5. Mar 2025
    1. Structure-tuning is a fine-tuning technique where a sequence-only model is trained to predict structure tokens – rather than masked amino acids – for each protein residue

      Is this technique novel? This seems like a good approach for adding in other features that can be relatively predicted using sequence only. Are there any plans to do that?

  6. Feb 2025
    1. If this limit is exceeded, we mark the generation as unsuccessful

      I know this isn't the goal of this paper, but I was curious whether you have information about how the performance metrics change across multiple video generations. For example, if you generated several videos for the same topic and agent, how much do the metrics vary across them?

    2. We set a maximum of N attempts where N = 5.

      Is there any particular reasoning for choosing N=5? I'm not familiar with how these evaluations have been done historically, so this may be obvious, but it's not clear to me why 5 would be a natural choice.

    1. \(\sum_{i=0}^{720} P(\mathrm{result}^{c}_{A,B} = i) \cdot E(e_i)\)

      This sum has 721 terms, since 0 is included. I see that 720 of these represent possible elements; does the value of 0 represent a non-viable element?
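
      The term count the comment refers to is just the inclusive index range:

      ```python
      # i runs from 0 to 720 inclusive, so the sum has 721 terms
      indices = range(0, 721)
      print(len(indices))  # 721
      ```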

  7. Jan 2025
    1. encoding strategies

      Using mean pooling for the larger embeddings of some of the transformer-based models makes sense, since that's such a common approach. However, I was curious whether you looked into any other pooling strategies, and what impact that had? Or, using a small embedding model, what does performance look like when the embeddings aren't reduced?
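
      The alternatives the question has in mind can be sketched like this (a minimal version; real pipelines would also respect padding masks):

      ```python
      import numpy as np

      def pool(token_embeddings, strategy="mean"):
          """Collapse a (tokens, dim) embedding matrix to a single (dim,) vector.
          'cls' assumes the first row is a [CLS]-style summary token."""
          E = np.asarray(token_embeddings, dtype=float)
          if strategy == "mean":
              return E.mean(axis=0)
          if strategy == "max":
              return E.max(axis=0)   # element-wise max over tokens
          if strategy == "cls":
              return E[0]
          raise ValueError(f"unknown strategy: {strategy}")
      ```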

  8. Dec 2024
    1. 320 neurons to 10,420 latent features

      What was the motivation for the value 10,420? Is the ability to extract features fairly consistent (not the actual weights, obviously, but the behavior of being able to extract meaningful features) as long as the latent space is large enough?

    2. visualizations primarily focus on features from the fourth layer

      Was the fourth layer chosen arbitrarily or did it have any particularly nice properties for visualization?

    1. assuring that the discriminator can well capture the sequence-function relationship

      Is the discriminator learning something much more complicated than, say, how to identify MDH domains?

    2. convolutional neural network (CNN)-based protein discriminator

      This seems to work well based on the results; was there previous work or concepts that led to this as the architecture for the discriminator?

  9. Nov 2024
    1. Guider1 and Guider2, were designed to improve the network’s ability to distinguish between different sequence types. Guider1 consists of a multi-head self-attention mechanism with 8 heads and two fully connected layers, while Guider2 is a Gated Recurrent Unit (GRU) with 256 neurons

      Sorry if I missed this, but what was the motivation for choosing these particular discriminator models? They seem very reasonable given the results, but I'm curious how these two types of models were chosen based on the structure of the initial problem?

    1. recombination, or a haplotype switch, occurs between two consecutive vertices \(a_i.u\) and \(a_{i+1}.u\) in \(P\) if \(a_i.h \neq a_{i+1}.h\)

      Would an "ideal" path be one where there is a single haplotype path that crosses every single \(a_{i}.u\) vertex? This would be something that generally exists for a given sample, but if present this would be the best path?
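
      Under that definition, counting recombinations along a path reduces to comparing consecutive haplotype labels; the "ideal" path in the question is exactly one where this count is zero:

      ```python
      def count_haplotype_switches(labels):
          """Number of recombinations along a path: positions where
          consecutive vertices carry different haplotype labels a_i.h."""
          return sum(h1 != h2 for h1, h2 in zip(labels, labels[1:]))
      ```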

  10. Oct 2024
    1. \(\mathrm{PED}(X, Y) = \frac{1}{2}\,\mathbb{E}\left[\inf_\pi \left\| d(X, X') - d(Y, X')_\pi \right\|_p\right] + \frac{1}{2}\,\mathbb{E}\left[\inf_\pi \left\| d(Y, Y') - d(X, Y')_\pi \right\|_p\right]\)

      Sorry if this is clear, but I'm a little unclear on the notation. Is X the input data (so empirical results from a scRNA-seq experiment) and Y the generated distribution? If so, are X' and Y' subsets of the respective distributions?
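
      One reading of each bracketed term, as an assumption to check against the paper's definition: d(X, X') and d(Y, X') are profiles of distances to a shared batch of reference points, and the infimum over permutations \(\pi\) of an L_p difference between two scalar profiles is attained by sorting both profiles:

      ```python
      import numpy as np

      def ped_term(d_ref, d_other, p=2):
          """||d(X, X') - d(Y, X')_pi||_p minimized over permutations pi,
          computed by sorting both scalar distance profiles (assumed matching)."""
          a = np.sort(np.asarray(d_ref, dtype=float))
          b = np.sort(np.asarray(d_other, dtype=float))
          return np.linalg.norm(a - b, ord=p)
      ```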

  11. Jul 2024
    1. \(x_{-i,j}(t)\), which is the expression of gene j (the expression of all other genes is masked)

      Is this a vector with only a value at position j? So a vector of size N with only position j having a value set, hence being different than x_{j}(t)?
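
      If the reading in the question is right, \(x_{-i,j}(t)\) would be built roughly like this (assuming masking means zeroing; the paper may instead use a special mask value):

      ```python
      import numpy as np

      def mask_all_but(x, j, mask_value=0.0):
          """Same length-N vector as x, but only gene j's expression kept;
          every other entry set to mask_value."""
          out = np.full_like(np.asarray(x, dtype=float), mask_value)
          out[j] = x[j]
          return out
      ```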

  12. May 2024
    1. The green and light blue clusters are on one side, and the other colors (especially the dark blue and magenta) are on the other side of the hole.

      It's hard to tell exactly which part of the structure is being referenced (at least for me). It might be helpful to add an annotation like a circle to show which area is being discussed.

    1. \(x_{i,j} - \tilde{x}\)

      The indexing of x_{ij} doesn't seem to be a unique element but a pair (u_{ij}, s_{ij}). Is this loss function calculating the difference between the spliced and unspliced differences combined?

  13. Feb 2024
    1. scGPT v1 outperformed the scGPT model overall, raising the issue of the need for increasing the size of pre-training datasets for this task

      Wasn't scGPT v1, which outperformed scGPT, trained on a smaller pre-training dataset?

    1. The fact that this narrative captured so much attention despite a complete lack of supporting evidence prompts us to reflect on how our biases shape our interpretation of data, and how extreme differences in believing people based on where they work can lead to incorrect and harmful conclusions. Here, we are reflecting on our experiences, and we invite readers to do the same.

      Really interesting article!

      Given the impact this had, do you feel there are changes or criticisms needed around the review and publication process of the Bloom results? I'm also curious whether you have any thoughts on how pre-prints and open science can do a better job with contentious results and the discussions around them.

  14. Oct 2023
    1. We also found associations to p53, telomere maintenance, and cell fate within 1 Mbp of our top 25 loci of interest. Our top 25 loci also have links to cancer and height or body size, though these prevalent diseases and biomarkers are of course heavily studied and consequently commonly annotated, and so we cannot know whether their appearance is simply due to their frequency

      Is this 1 Mbp in either direction of a locus of interest? Just binning the human genome into 25 points gives about 1.7% of the genome within 1 Mbp of these uniform bins. Depending on what percent of genes are associated with the traits of interest, that could be very rare or fairly common. Is there a way of viewing how impactful this result is in comparison to the size of the genome annotated as relevant to these traits?
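
      The ~1.7% back-of-envelope figure in the comment can be reproduced directly, ignoring window overlaps and assuming a ~3 Gbp genome:

      ```python
      n_loci = 25
      window_bp = 2e6      # 1 Mbp on each side of each locus
      genome_bp = 3.0e9    # approximate human genome length
      fraction = n_loci * window_bp / genome_bp
      print(round(fraction * 100, 1))  # 1.7
      ```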