Figure 5 shows the normalized PIM values across the three SNP selection thresholds. F1 score consistently emerges as the most influential criterion, with its importance growing as more SNPs are included.
The PIM values (Figure 5) quantify the relative contribution of each selection metric to the final model, yet they appear to be point estimates from a single train/validation split. How stable are the PIM rankings across different random splits, bootstrap samples, or subsamples of the training data? If the dominance of F1 score as the top-weighted metric is sensitive to the specific data partition, the reproducibility of MIXER-selected feature sets in independent applications could suffer. Have you characterized the variance of the PIM estimates, and does the ranking of metrics remain consistent?
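To be concrete about the kind of check we have in mind, a minimal sketch follows. This is not MIXER's actual pipeline: since the PIM computation is not described here, scikit-learn's permutation_importance on a random forest stands in for it, and `X`, `y`, and the metric names are synthetic placeholders. The idea is simply to repeat the train/validation split, recompute the importances, and summarize rank agreement with Kendall's tau.

```python
# Hypothetical sketch, not MIXER's actual code: the PIM computation is not
# described here, so sklearn's permutation_importance on a random forest
# stands in for it, and X, y, and metric_names are synthetic placeholders.
import numpy as np
from scipy.stats import kendalltau
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                  # stand-in predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)
metric_names = ["F1", "AUC", "MCC", "Precision", "Recall"]  # placeholders

pims = []
for seed in range(50):                         # 50 random train/val splits
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    pim = permutation_importance(
        model, X_val, y_val, n_repeats=20, random_state=seed
    ).importances_mean
    pims.append(pim)

# Pairwise Kendall's tau between splits: values near 1 => stable rankings
taus = [
    kendalltau(pims[i], pims[j])[0]
    for i in range(len(pims))
    for j in range(i + 1, len(pims))
]
print(f"mean pairwise Kendall tau: {np.mean(taus):.2f} +/- {np.std(taus):.2f}")

# How often does the same metric come out on top across splits?
top = [metric_names[int(np.argmax(p))] for p in pims]
print("fraction of splits where F1 ranks first:",
      np.mean([t == "F1" for t in top]))
```

Reporting the mean and spread of the pairwise tau values, along with the fraction of resamples in which F1 score ranks first, would directly address the stability concern raised above.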