10,000 Matching Annotations
  1. May 2026
    1. On 2017-06-04 22:18:51, user Nathan Watson-Haigh wrote:

      Unfortunately, it appears from the code that all lines beginning with "#", including all GFF3 pragma lines, are discarded in the output. In addition, it reads the whole GFF3 file into memory to do the sort. So for large files, memory might be an issue.

      Please also consider using "#!/usr/bin/env perl" for the shebang line so non-system Perl can be used more easily.
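A minimal Python sketch of the pragma concern raised above, not the reviewed tool's code: it keeps the `##` pragma lines and sorts only the feature lines by sequence ID and coordinates. The function name `sort_gff3_keep_pragmas` is hypothetical, and the sketch still sorts in memory, so the memory concern stands.

```python
def sort_gff3_keep_pragmas(lines):
    """Return GFF3 lines with header pragmas kept and features sorted.

    Assumptions: pragmas ('##...') are emitted first, in their original
    order; plain '#' comments and blank lines are dropped; the
    position-dependent '###' directive is ignored by this sketch.
    """
    pragmas, features = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            if line != "###":
                pragmas.append(line)
        elif line and not line.startswith("#"):
            features.append(line)

    def key(feature):
        # GFF3 columns: seqid, source, type, start, end, ...
        cols = feature.split("\t")
        return (cols[0], int(cols[3]), int(cols[4]))

    return pragmas + sorted(features, key=key)
```

Sorting a very large file without loading it all would need an external (on-disk) sort, which this sketch does not attempt.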

    1. On 2017-06-04 15:52:39, user ANTHONY IVES wrote:

      Two quick comments:

      1. Yeah, these R2s aren't exactly simple, but the R code does everything for you, so you don't have to look under the hood.

      2. As for noisiness, well, R2_ls has about the same as the standard R2 from OLS, and R2_lr has less. I didn't realize how noisy the OLS R2 is until playing with simulations. After simulations, I will never think of any R2 as anything other than an estimator of a variance.

      Thanks for your collective interest!

    1. On 2017-05-08 23:36:43, user Titus Brown wrote:

      This paper introduces the squeakr system for exact and inexact k-mer counting.

      The paper is well written and I have been able to obtain and execute the software, although I have not spent any time trying to reproduce the benchmarks.

      The long and glorious history of approximate and exact k-mer counting in bioinformatics (for many different purposes) is particularly well discussed, from my (admittedly biased) perspective. I particularly appreciate the discussion of De Bruijn graph traversal which is usually omitted in k-mer counting papers.

      squeakr implements both exact and inexact k-mer counting. squeakr appears to perform better (?) than all other k-mer counting systems in both inexact and exact modes, although I am unable to decipher figure 2's shading (see below). squeakr also excels in point queries graph traversal mode and batch queries applied to large collections of already loaded k-mers. As such, it is perhaps the most major advance in k-mer counting I've seen in the last few years. I should say that we are already planning to integrate the underlying CQF into our own khmer software for these reasons, and we have found it to be relatively straightforward; our preliminary performance benchmarks match the ones in this paper, which is reassuring.

      squeakr also has the very nice property that it uses an approximate membership query system from which k-mers can be removed, which is important.

      Paper details:

      I cannot distinguish KMC2 from squeakr inexact in the figures.

      I could not figure out how the exact k-mer counting worked; the wording in the section just above results (section 3) could be improved here, in my view.

      Technical issues that should be addressed:

      What commands were used to execute the benchmarks and measure the results? Please specify in gory detail.

      What version of the code was used? Please cut a release and give it a DOI (perhaps via Zenodo).

      Please describe (briefly in the paper, or perhaps in more detail on the github repo) what kind of testing approach you used. How do we know that the k-mer counts are correct, basically, and how can we check for ourselves in newer versions?

      I have been able to run this on my own data (hurray!) but I am unclear as to how to work with squeakr's k-mer counts. Right now it looks like basically I need to hook in at the C++ level - true? If so it should be made clear that this is a nice (and very fast) proof of concept that is not yet directly usable at the command line for k-mer counting. (This is a documentation issue.)

      Technical upgrades:

      Here are some optional issues you should think about, now or later --

      1. On AWS, these are the magic commands needed to get squeakr compiled on the following image:

      ubuntu/images/hvm/ubuntu-wily-15.10-amd64-server-20160222 (ami-05384865)

      sudo apt-get update && sudo apt-get install -y libboost-dev libssl-dev zlib1g-dev libbz2-dev make libboost-system-dev libboost-thread-dev

      1. 'fallocate' doesn't exist on Mac OS X, so I was unable to try squeakr on Mac OS X. This will inhibit a lot of bioinformaticians from making use of squeakr.

      Miscellaneous questions

      We have been thinking about using a rolling hash function in khmer; see https://github.com/dib-lab/.... This seems like it could make squeakr much faster if it replaced murmurhash. Thoughts?

      Signed,

      C. Titus Brown<br /> titus@idyll.org
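The rolling-hash idea raised in the review can be illustrated with a small Python sketch. It uses a plain polynomial (Rabin-Karp style) hash, which is an assumption made for illustration and not the scheme used by khmer, squeakr or murmurhash. The base, modulus and function names are arbitrary. The point is that each new k-mer hash is derived from the previous one in constant time instead of rehashing all k bases.

```python
def rolling_kmer_hashes(seq, k, base=4, mod=(1 << 61) - 1):
    """Yield (kmer, hash) for every k-mer of seq using a polynomial rolling hash."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    if k <= 0 or len(seq) < k:
        return
    top = pow(base, k - 1, mod)  # weight of the base that leaves the window
    h = 0
    for ch in seq[:k]:
        h = (h * base + code[ch]) % mod
    yield seq[:k], h
    for i in range(k, len(seq)):
        h = (h - code[seq[i - k]] * top) % mod  # drop the oldest base
        h = (h * base + code[seq[i]]) % mod     # append the newest base
        yield seq[i - k + 1:i + 1], h


def direct_hash(kmer, base=4, mod=(1 << 61) - 1):
    """Recompute the same hash from scratch; used only to check the rolling update."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    h = 0
    for ch in kmer:
        h = (h * base + code[ch]) % mod
    return h
```

Comparing the rolling values against `direct_hash` on every k-mer is also a simple instance of the kind of correctness check the review asks about. Whether a rolling hash mixes bits well enough for a counting quotient filter is a separate question that this sketch does not address.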

    1. On 2017-04-06 07:43:41, user Jamie Timmons wrote:

      Hello Benjamin,

      As you indicated, you have not had time to read our original publications, just the blog by Speed, and I believe it would be an advantage to have a look at those.

      Note that in our original article we select probe-sets in the initial data-set and then, using only the probe-set identities, we validate across 7 independent tissue data-sets, including 3 tissue-types, 4 different array generations and 2 different array manufacturers (Affy/Illumina) produced over a >4yr period in different chip labs. So I think we can safely rule out, lol, a "batch effect" in data-set one, explaining the outcome across all the independent studies.

      My point to your original comment was merely to point out that your quick assessment of Speed's code was in fact incorrect. Glad we both agree on this point now. They did indeed stack the odds during their so-called random feature selection, so their blog is not a 'random' perspective. Any reasonable peer-review process and editor should pick up on this if peer-reviewed (I hope, or we are all doomed!).

      Sampling from the 'best' ~4,000 genes and claiming this is 'random' sampling is at best disingenuous. Especially as all 150 are not needed to discriminate. Each 'new' [sic] 150 sample can combine useful and neutral data and then modestly classify. But modest is all.

      See attached figure for 10,000 random samples from all genes (minus our 670) and including non-random hypothesis driven literature 'age' signatures. We are ranked 1st out of 10,000 in 4 independent muscle data sets.

      But I see you are now commenting on another incorrect claim made by Speed that unsupervised selection was sufficient to produce a REPRODUCIBLE and SINGLE age signature. This is also untrue.

      He doesn't present a new single 150 gene-list that works across all our data, and has refused to do so when asked for one. So we are still waiting, 1yr on for Speed's single 150 list that is a multi-tissue age classifier with links to health......

      Here, below, is more explanation on your questions but in the meantime it would be good if you could read 3 articles we produced that cover aspects of this field of ageing, and then we can discuss in a more productive way? If you can then think of a better way for us to explain the differences so that they are more understandable, please do let me know.

      Just to clarify, you are from the Quackenbush lab at Dana-Farber Cancer, correct? Do you know Bruce Spiegelman by any chance?

      John acts as an advisor at GB - I wonder if he has also been misled by the construction of the Speed/Jacob letter?

      Best

      Jamie

      https://genomebiology.biome...

      http://www.fasebj.org/conte...

      DOI: http://dx.doi.org/10.1016/j...

      Sure, in any given n=1 data-set you can find lots of useful genes that work in that data-set to rank Young with Old tissue. It's a big biological signal. We matched for aerobic capacity (the strongest predictor of mortality in humans, Meyers et al NEJM 2002) but otherwise the 40yr gap was without evidence of illness, drug-treatment etc. That was the hypothesis.

      Speed never presents a gene-list that matches or outperforms our final 150-list across all the data-sets. Sure, you can find lots of single lists that work in 1 data-set... everyone knows that, it's not news. Further, as there are thousands of age-related genes, if you start with the top 25% most age-regulated genes in our hypothesis data-set, including the 670 we noted in our feature selection process, and then sample from this non-random pool, you will of course produce lots of combinations.

      In the attached plot you will see the performance of hypothesis driven Age gene-sets from the literature versus 10,000 random samples of 150 (but missing our prototype 670 genes, the first pass - see paper). The randomly sampled 150-gene sets from muscle data-set 1 are applied to 4 independent muscle data-sets using external validation (no involvement whatsoever of the training data-set).

      Speed never presents such data and also assumes that any AUC > 0.5 is significant. It's not. There is also a lack of understanding of biology. Through gene-gene interactions, variance is of course distributed. We don't say the lower ranked genes have no variance w.r.t. age, just not sufficiently useful to work across muscle, brain and skin when only selected in 1 muscle data-set. Their use of the word 'random' is misleading.

      https://uploads.disquscdn.c...

    2. On 2017-04-04 11:46:29, user Jamie Timmons wrote:

      To spot the way Terry Speed and Jacob ‘stacked the odds’ for their claimed performance of a “random” 150 gene-set you must examine 2 steps in their code not just 1.

      This will confirm that the data being ‘sampled’ to produce gene lists is anything but random.

      The R session for the work by Terry Speed and Jacob can be found here: http://biorxiv.org/highwire...

      1)

      loadData.R: the script where they load all the U133+2 gene-chip datasets from GEO/ArrayExpress into their workspace and then create an RData object. It is this RData object that is used for their “random” sampling in the ageing-subsample.R script.

      If you look carefully at this loadData.R script, on line 132 and line 133 they do this:

      mads <- apply(X.GSE59880, 2, mad)<br /> mad.ok <- names(mads)[mads > quantile(mads, 0.75)]

      The mad function in R is defined as: "Compute the median absolute deviation, i.e., the (lo-/hi-) median of the absolute deviations from the median, and (by default) adjust by a factor for asymptotically normal consistency."

      They save this mad.ok variable (containing the upper quartile of probe-sets that vary with age) in the object ageing-data.RData.

      2)<br /> In the ageing-subsample.R session they load the ageing-data.RData from above and perform LOOCV.

      At line 71 they do this: <br /> common.probesets <- intersect(common.probesets, mad.ok)

      and then, on line 133, they “randomly” sample from this top 25% (common.probesets) list:<br /> rand.sig <- sample(common.probesets, 150)
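The two steps quoted above can be re-expressed as a short Python sketch, under stated assumptions. This is not the authors' script. The function names and the toy data used to exercise it are hypothetical, the 1.4826 constant is the default scaling of R's `mad`, and `statistics.quantiles(..., method="inclusive")` is used as the closest standard-library analogue of R's default `quantile`.

```python
import random
import statistics


def mad(values, constant=1.4826):
    """Median absolute deviation, scaled as R's mad() does by default."""
    centre = statistics.median(values)
    return constant * statistics.median(abs(v - centre) for v in values)


def top_quartile_by_mad(expr):
    """expr maps probe-set name -> expression values across samples.

    Mirrors the quoted lines:
      mads <- apply(X, 2, mad)
      mad.ok <- names(mads)[mads > quantile(mads, 0.75)]
    """
    mads = {probe: mad(vals) for probe, vals in expr.items()}
    cutoff = statistics.quantiles(mads.values(), n=4, method="inclusive")[2]
    return {probe for probe, m in mads.items() if m > cutoff}


def sample_signature(common_probesets, mad_ok, size, seed=0):
    """Intersect with the MAD-filtered set, then sample `size` probe-sets."""
    pool = sorted(set(common_probesets) & mad_ok)
    return random.Random(seed).sample(pool, size)
```

In this sketch every sampled signature is drawn only from probe-sets whose MAD exceeds the 75th percentile, which is the restriction the comment is pointing at. Whether that restriction changes the conclusions is the matter under dispute in this thread.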

      Anyone briefly inspecting the code will look at the ageing-subsample.R script thinking that loadData.R is just grabbing all the chip data.

      They have misleadingly kept the quantile ranking part of the code in loadData.R script and this can easily be missed by any reviewers, journal editors and of course bloggers. What can’t be missed, is that they mention this “enhancement” step in the text of their methods, as we have pointed out several times.

      Thus Terry Speed’s claim that he got comparable data through random sampling of the ‘transcriptome’ is untrue. Notably since posting a year ago, Terry Speed has never responded to any query on this matter.

      The rest of the letter combines clinical groups and carries out analysis which we did not do and is invalid from a clinical perspective. The letter also fails to mention that while we use a single 150 set of genes in all data-sets, the process used by Speed and Jacob does not.

    3. On 2017-03-30 22:56:06, user Benjamin Haibe-Kains wrote:

      I came across this paper as I am generally interested in scientific exchanges over high-impact studies (as you can tell from my publication track record).

      I read your response and I was curious about your claims so I took a quick look at Jacob and Speed’s code, which is publicly available (see Supplementary Material): http://biorxiv.org/highwire...

      Briefly, the code indicates that Jacob and Speed did actually select 150 genes from a broad set of genes, not the age-related genes only:

      ad-subsample.R -> lines 59, 86, 103 and 105<br /> ad.R -> line 110<br /> ageing-subsample.R -> lines 133, 145<br /> ageing.R -> lines 89, 112

      If you point me to the part of their code supporting your claims, I can look at it. Similarly, if you share your code, I can review it, if time permits. However, I do not plan to spend much time on this issue as this is not my main research area.

    4. On 2017-03-30 13:13:47, user Jamie Timmons wrote:

      Hi Benjamin

      Thanks for your response to the discussion on the incorrect assessment of our work by Jacob and Speed.

      I wondered if you could expand on your posting and update it with the essential information?

      Jacob originally stated that he ranked our training data set for differential expression by age and sampled "randomly" from the top 25%. No code was presented and the R code itself doesn't need to reflect the input data matrix or indeed be what was actually used.

      Jacob and Speed have refused to answer any questions or share code for the past year.

      We did carry out 10,000 random samples from the U133+2 with and without our 'model probe sets' and beyond any doubt their claims are false for tissue age classification. No random 150 remotely matches our signature.

      The remaining analysis by Jacob is scientifically invalid as it 1) combines clinical groups that can't be combined and 2) it relies on 'other' gene lists out with any into all model. Anyone can fit a unique list to a given data set. No one has taken 1 model and made it work nicely across 7 independent data sets....hence our paper.

      I'd like to learn more about the route you took to establish your opinions on Disqus and if you can share any materials with my team?

      Thanks<br /> Jamie

      Professor James A Timmons

    5. On 2017-03-29 06:37:29, user Benjamin Haibe-Kains wrote:

      I checked the code for Jacob and Speed and it appears that the random signatures have been selected from the whole microarray chip (or the large set of common probes between microarray chips used in the study) and not the set of 'age' genes as claimed by 'Jamie'.

      Although I have found Jamie's response of high interest, I am not in a position to check the validity of his claims. I strongly suggest that the response to Jacob and Speed's critique be published as a separate bioRxiv manuscript or, at the very least, that a link to the data and code be provided for further scrutiny by the scientific community.

    1. On 2017-04-04 23:33:04, user Andrew Jones wrote:

      Two pretty major problems with this:

      Firstly, I just looked at their code and I am pretty sure that they include the ancestral sequence in EVERY sample from which they calculate the MISA. In fact, they include the whole lineage. If they knew what they were doing, they would be finished mutating before they label any sequence as 'extant' and take it as a sample (extant sequences are all we have, after all).

      Secondly, their fitness model assumes that every single amino acid has a completely independent effect on fitness; just a string of beads that have no interactions or effect on one another. Like weaselware (www.evoinfo.org). But not at all like a protein.

    1. On 2017-03-08 22:08:46, user Kobi wrote:

      I would suggest expanding the definition of reproducibility to include experimental work, not just computer code. This will help the definition be more generally applicable to all sciences.

    1. On 2017-03-07 13:14:22, user Pat Schloss wrote:

      The preprint from Herren and McMahon describes a new metric - cohesion - to describe the overall connectedness within a community using temporal data. I was excited to see this preprint because I am familiar with McMahon's long history of developing rich time series data for microbial communities in Wisconsin lakes. I also have a lot of my own time series data from humans and mice where we struggle to incorporate time into the analysis to understand the interactions between bacterial populations.

      A significant struggle in analyzing time course community data is the ability to synthesize observations for large numbers of taxa over time. Many of the existing methods people use attempt to adapt methods from cross sectional studies. For example, a study may sample a large number of lakes, people, soils, etc and characterize their microbial communities. They'll then calculate correlations across those samples based on the relative abundance of the populations. Alternatively, they'll use presence/absence data to generate co-occurrence matrices. The problem with these studies is that the next step is often to infer something about the interactions between the populations - even if the populations would never possibly co-occur. Herren and McMahon's efforts to study the connectedness of individual populations and their cohesion is very welcome because it has the potential to get us closer to describing the actual interactions between populations.

      To briefly summarize the approach, the method starts by calculating the Pearson correlation between all pairs of populations across time and then discounts the correlation that would be expected if all interactions were random. This is important because of the compositional nature of the data and the effects of different population sizes. Next, the method calculates the average positive and negative corrected correlation for each population. These become the positive and negative connectedness values for each population. Finally, the positive and negative cohesion values for each community are calculated by summing, over populations, the product of the connectedness value and the relative abundance of that population.
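The steps summarized above can be sketched in Python as follows. This is a reading of the summary, not the preprint's implementation. The null-model expected correlations are assumed to be supplied as a matrix (`expected`), since how the preprint computes them is not reproduced here, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def connectedness(abund, expected):
    """abund[t][i]: relative abundance of taxon i at time t.
    expected[i][j]: correlation expected under the null model (assumed given).
    Returns (positive, negative) connectedness lists, one value per taxon.
    """
    n_taxa = len(abund[0])
    series = [[row[i] for row in abund] for i in range(n_taxa)]
    pos, neg = [], []
    for i in range(n_taxa):
        corrected = [pearson(series[i], series[j]) - expected[i][j]
                     for j in range(n_taxa) if j != i]
        p = [c for c in corrected if c > 0]
        q = [c for c in corrected if c < 0]
        pos.append(sum(p) / len(p) if p else 0.0)
        neg.append(sum(q) / len(q) if q else 0.0)
    return pos, neg


def cohesion(sample, connected):
    """Sum over taxa of relative abundance times connectedness, for one sample."""
    return sum(a * c for a, c in zip(sample, connected))
```

Calling `cohesion` once with the positive and once with the negative connectedness list gives the two per-sample values described in the summary.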

      The following are general critiques and questions, which I appreciate may be beyond the scope of the current manuscript (note, I am not a reviewer for the manuscript at a journal):

      1. To develop the cohesion metric for a community, the authors sum over all of the populations in the community. This raised three questions for me. First, independent of the relative abundances in each sample, is the *number* of positive and negative connections for each population relevant? It might be worthwhile exploring which populations have more positive/negative connections than others. What does that distribution look like? Second, does the connectedness metric value itself have any value? What are the populations that are highly connected with other populations? Finally, the method generates a cohesion value for each time point. If I think of Lake Mendota as a community that was sampled over time, it would be interesting to know whether it has been more cohesive than Lake Monona over the 19 years of sampling. Thinking of my own work, I would be interested in knowing whether mice that are more susceptible to C. difficile colonization are less cohesive than those that are resistant. Again, this would require a composite score, not individual scores for each time point.

      2. Continuing on my self-serving thread, I wonder how sensitive the method is to the time interval between samples and the number of samples. In my experiments I may have 20 daily samples from a mouse - is this sufficient? What if we miss a day - how will having a jump between points affect the metrics? As the authors state, the Lake Mendota dataset has 293 samples collected over 19 years (i.e. about 1.3 samples/month). This is a very unique dataset that is unlikely to be repeated elsewhere. What if we were to get more frequent samples? What if they were more spaced out? What if we only had a year's worth of data? It would be interesting to see the authors describe how their cohesion values change when they subset the dataset to simulate more realistic sampling schemes.

      3. A significant challenge in developing these types of metrics is not knowing what the true value of the metric is in nature. I appreciate Herren and McMahon's effort to validate the metrics by comparing their results to count data and to explaining the variation in Bray-Curtis distances. The manuscript reads almost like they want their method to recapitulate what is seen with those distances. But we already have Bray-Curtis distances, if that's the goal, then why do we need the cohesion metric? It would be interesting to see the authors simulate data from communities with varying levels of cohesion and abundance to see that the method gets back the expected cohesion value. Perhaps it would be possible to generate an ODE-based model to generate the data instead of variance/covariance data. There is one simulation described at the end of the Results (L300); however, it is unclear whether the lack of a meaningful R-squared value was the expected result or not.

      4. Throughout the manuscript, the authors make use of parametric statistics such as Pearson's correlation coefficients and the arithmetic mean. Given that relative abundance data are generally not normally distributed and are likely zero-inflated, I wonder why the authors made these choices. I would encourage the authors to instead use Spearman correlation coefficients and median values. Related to this point, a concern with using these correlation coefficients is the problem of double zeros where two populations may be absent from the same communities. These will appear to be more correlated with each other than they really are, which is why we don't use these metrics for community comparison - we use things like Bray-Curtis. I wonder whether subtracting the null model counteracts the problem of double zeros.

      5. The authors translate their count data into relative abundance data before calculating their correlation and Bray-Curtis values. I wonder if the authors subsampled or rarefied their data to a common number of individuals. Both of these metrics are sensitive to uneven sampling. Even if the counts are converted to relative abundances, this would not remove the effects. For example, if one sample has 1000 individuals and another has 100, the limit of detection of the first would be 10-fold lower than that of the second (0.1% versus 1%). There may be populations that represent 0.5% of both communities that would not be seen in the second. If they haven't already, I would encourage the authors to subsample their dataset to a common number of individuals.

      6. The "Description of datasets" section of the Methods describes the various datasets in general terms, but what is the nature of the data? How were the phytoplankton counted? How many individuals were sampled from each sample?

      7. It would be great to have the code that was used made publicly available on GitHub.

      8. The authors present the material in a format that I have not previously seen in the microbial ecology literature (i.e. ISMEJ where this appears to be destined for review). The authors flip back and forth between presenting a different stage of the algorithm and validating that step. I think this is a bit more aligned with how one would present the material in a talk than in a paper. I've seen similar methods development described before where there might be a methods section on algorithm development and then the results section would test the assumptions and performance of the algorithm. I'm curious to see whether this structure persists through the editorial process.
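The subsampling suggested in point 5 can be sketched as follows. This is a minimal illustration that rarefies one sample's count table to a common depth by drawing individuals without replacement. The function name `rarefy` and the fixed seed are assumptions for illustration, not the procedure of any particular package.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a {taxon: count} table to `depth` individuals without replacement."""
    individuals = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(individuals):
        raise ValueError("depth exceeds the number of individuals in the sample")
    picked = random.Random(seed).sample(individuals, depth)
    return Counter(picked)
```

With a depth of 100 individuals, a population at 0.5% relative abundance is expected to contribute 0.5 individuals, so it will often be absent after subsampling. That is the detection-limit effect described in point 5.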

    1. On 2017-02-04 19:21:51, user Anshul Kundaje wrote:

      Very nice work. Didn't see a link to the code. Could you share it? We'd like to compare to our Deeplift method. Also a quick suggestion. I think the paper would be more complete with a systematic comparison to other existing methods such as in-silico mutagenesis, Simonyan et al, LRP and Deeplift. We'd be interested in benchmarking as well on simulations and real data.

    1. On 2017-01-16 07:53:15, user Christoph Nowak wrote:

      Hi Brian - brilliant method!<br /> I'm not a born-and-bred bioinformatician (but about to get my PhD in molecular epidemiology): Will you be making the code / SMAF package available open access at some point? That would be a huge help!<br /> Thanks for letting me know - <br /> Chris<br /> christoph.nowak@medsci.uu.se<br /> Medical Sciences Dept., Uppsala University, Uppsala, Sweden

    1. On 2016-12-30 20:47:25, user Stephen Royle wrote:

      In the 18th Dec version of this manuscript there was a scaling error in Supplementary Figure 3D. The actual X and Y widths are both 1.64 nm and not ~10 nm as stated in this version of the manuscript. The error in the computer code has been corrected: https://github.com/quantixe...

    1. On 2016-12-05 17:15:04, user Geraint Duck wrote:

      This is an interesting idea. However, I wonder about some other "hidden costs" of review that may also need to be considered. For example, the cost of access to data, software, and *other papers*. Would a "full-time reviewer" have access to the array of non-OA journal subscriptions needed for a complete review? Some publishers will provide access to their own journal collections should you agree to review, but how often is (just) this sufficient? And related, access to software and/or equipment (which you do allude to in your article already) to properly assess and/or run supplied code (especially code that uses proprietary programs, e.g. Matlab and the like).

    1. On 2016-11-24 08:42:29, user Fabian Roger wrote:

      Interesting Study!

      FYI, I stumbled upon a few little things: <br /> 1) on line 341 the link to the R code seems to point to the wrong repository (https://zenodo.org/record/1...)<br /> 2) You talk about phylogenetic distance / diversity throughout the manuscript, but (as far as I see) don't specify which metric you used. The only sensible metric for a pair of two species is the connecting branch length I guess but maybe good to mention? Update: I see now that you mention it in the figure legends, but not in the methods?<br /> 3) L460 In part - letter missing

    1. On 2016-11-10 15:43:29, user Todd wrote:

      Thanks for a great analysis! The manuscript mentions that APE version 3.6 contains the updated code for correct treatment of support values. I noticed on their website that the updated code is already available in the "testing version": 3.5-0.10. Presumably, this will also be available in an upcoming 3.6 release. See release notes: http://ape-package.ird.fr/NEWS

    1. On 2016-11-01 03:49:36, user Anders Goncalves da Silva wrote:

      Sounds very interesting. But, it is a pet peeve of mine when software papers do not include a link to the code or page directly in the abstract.

      Ok, I retract the comment. It is there once you open the PDF.

      And kudos on supplying a Docker image. You should consider submitting to JOSS (http://joss.theoj.org/).

    1. On 2016-10-24 18:49:25, user Alessandro Nascimento wrote:

      It seems that there is a mismatch in one of the lysozyme T4 (M102Q) complexes cited in table VI. The crystal structure 2RBO does not contain the n-phenylglycinonitrile as a ligand. Instead, 2-nitrothiophene is the binder there. So, the PDB code should be 2RBN to correctly point to the complex between T4 Lys M102Q and n-phenylglycinonitrile. <br /> Just my 2c for this very interesting paper!

    1. On 2016-09-01 12:57:07, user Steven Ludtke wrote:

      First, let me say that this development is highly laudable and will indeed be of great value to the community. However, there are major technical issues with the specific speedup numbers cited here. 4 of the latest generation GPU cards, which are so new they are still difficult to get, are being compared to 5 year old Intel CPUs, which are considered "end of life" and are massively slower than new generation chips with more cores and new SIMD instructions. Additionally, it seems that a number of optimizations were made to the code itself, such as use of single-precision and deeper changes which could also impact the CPU, but have (apparently) not been ported there.

      Don't get me wrong, I'm not disputing that the GPU has value, and that it is more cost effective than the CPU in the present study, I am simply saying that the very large factors in absolute speedup and cost-effectiveness cited in this manuscript are massively biased towards the GPU.

    1. On 2016-08-04 08:54:18, user Wolfgang Huber wrote:

      I was asked to review this manuscript for a journal, and decided to share the review here.

      The work is concerned with clustering of gene expression profiles from RNA-seq data, using data transformation and Gaussian mixture models. The manuscript is well-written and of high technical quality. I have a few suggestions for improving it further.

      Main points:

      1.) The transformation to p_ij and then to g_arcsin, g_logit is interesting, and worthwhile considering. However, as the authors note on p.4, there are also other, obvious, and well-established candidates for transformations, such as log(n+c), VST, rlog, moderated CPM (all should be followed by mean centering). Since the title of the ms is "Transformation and model choice..." I would consider it important to include these in the study. One possible result could be that "it doesn't matter very much", or that one or another of these candidates really does poorly; in any case, this would be interesting for readers, as the choice of transformation often creates a good deal of anxiety.

      2.) The arguments for Gaussian mixture models (GMM, e.g. p.16) are well taken, but are a bit old-fashioned, "20th century". Nowadays, there are also very good resampling-based methods for assessing cluster stability, cluster membership, etc. See, e.g., the clue package on CRAN, or the "Cluster Stability Analysis" Section in the vignette of my Hiiragi2013 Bioconductor package. Adding these methods would be interesting, although I could understand if the authors decide it is out of scope. In that case, I suggest that at least the claims on exclusive utility of GMMs for doing such stability assessments be toned down.

      Smaller points:

      3.) I very much appreciate the provision of an Rmarkdown vignette reproducing all plots in the paper. This is exactly how it should be done. Here are two more suggestions, which go beyond what is required for journal publication, but would greatly increase the impact and quality of this research: To allow execution by readers, I recommend also providing the .Rmd file, not only the rendered PDF. Moreover, to avoid 'code rot' and other reproducibility issues, I recommend submitting the Rmarkdown document (e.g. in the form of a package vignette) to a repository with a build system, such as CRAN or Bioconductor, which will make sure the code actually runs on any regular computer (no dependencies on private files), with current versions of R, etc.

      4.) p.4 "As previously noted, each of these transformations seeks to render the data homoskedastic" -- I do not think this is correct. Homoskedasticity is the stated goal of the variance-stabilising transformation, but not of the others.

      5.) p.4 "... but does not facilitate clustering together features with similar patterns of expression across experiments." -- where is the evidence for this claim? (cf. Point 1 above)

      6.) p.4 Notation overload in the equation for p_ij: the symbol j on the right hand side is used for two different things

      7.) p.5 "becomes even more apparent when considering the normalized expression profiles p_ij (Figure 1C)." -- is this not a circular argument if Cluster 1 was itself obtained from the p_ij?

      8.) p.5 "This means that the vector of values p_i are linearly dependent ... For this reason, we consider two separate transformations of the profiles p_ij to break the sum constraint ..." -- Even though the sum constraint is replaced by a more complicated constraint, the dependency is not broken. Is this argument really tenable (or needed)?

      9.) p.5 What are the values (used by the software) for g_logit for p_ij = 0 or 1? Both can and will happen in practice.

      10.) p.7 It would be helpful if plots showing the graphs of these transformations could be provided.

      11.) p.10 "filtering genes with mean normalized count less than 50" -- This seems like it could be a too stringent threshold, especially in, say, developmental studies, where certain genes are completely switched off in some of the conditions. Indeed these tend to be the most important genes.
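
      Point 9 can be made concrete with a small sketch. It assumes a base-2 logit, log2(p / (1 - p)), which may differ from the definition used in the manuscript, and `clamped_logit` with its `eps` parameter is a hypothetical workaround, not something the software is known to do.

```python
import math

def logit(p):
    """Plain logit, log2(p / (1 - p)); undefined at p = 0 and p = 1."""
    return math.log2(p / (1.0 - p))

def clamped_logit(p, eps=1e-6):
    """Hypothetical workaround: pull p into [eps, 1 - eps] before taking the logit."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log2(p / (1.0 - p))

for p in (0.0, 1.0):
    try:
        logit(p)
    except (ValueError, ZeroDivisionError) as err:
        # p = 0 gives log2(0); p = 1 gives a division by zero.
        print(f"logit({p}) fails with {type(err).__name__}")
    print(f"clamped_logit({p}) = {clamped_logit(p):.3f}")
```

      Whatever value of `eps` is chosen fixes how extreme the boundary profiles become after transformation, which is why the question of what the software actually does matters.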

      Very small point:

      12.) Introduction: "Increasingly complex studies of transcriptome dynamics are now routinely carried out using high-throughput sequencing of RNA molecules, ..."
      Indeed what is being sequenced in RNA-Seq are cDNA molecules. Direct sequencing of RNA is also possible but (currently) usually not called RNA-Seq.

      Typos:

      p.8 compatability

      p.15 hierarhical clustering

      This review was prepared by Wolfgang Huber.

    1. On 2016-07-19 05:30:05, user Frederic Bastian wrote:

      Great work!
      p.3: "In addition to lexical criteria, we use ontology structure criteria." "The rules above do not exhaustively cover all cases" "code on GitHub for details"
      I would be interested in getting more details about the lexical/ontology structure criteria used to generate prior probabilities. Could you point to where to find this information in the GitHub project?

    1. On 2016-04-16 22:12:48, user Torsten Seemann wrote:

      I think this tool does have potential to improve plasmid assembly and recovery.
      I have some comments:

      (1) I feel it is overstating the problem of plasmids being missed. They are usually in the contigs (and graph) but obviously fragmented and sometimes joined to the chromosome graph, but they are there, and can be baited by looking for classical plasmid genes (rep etc).

      (2) The text has no actual description of the algorithm method, or any structured results.

      (3) The code/binary uses the same exe names etc as regular Spades, so it is difficult to install alongside regular spades.

      (4) It seemed to ignore my --threads parameter (at least for hammer stage)

      (5) I fed it a challenging multi plasmid data set of GAI data (PE 36 bp) and it failed on "-k auto" because it said k=55 was bigger than read length. I assumed "-k auto" would know the read length as it just indexed/corrected all the reads?

      I look forward to further development of plasmidSpades and the integration of the methods into regular Spades.

    1. On 2016-03-17 19:29:25, user Fabien Campagne wrote:

      I disagree with the recommendation to use BioConductor as stated by the authors (section 3, page 11, frameworks). BioConductor is a great option in R, but it is not easy to obtain previous releases of BioConductor and the packages that it offers. If you need computational reproducibility, it is not trivial at all to obtain specific versions of a BioConductor environment. I recommend that the authors try to put their solutions to the test before recommending them. My group experienced many dependency installation issues with BioConductor, including the inability of the release servers to tag URLs with versions, so that even source code cannot be retrieved reliably in the future.
      We now routinely create docker images that contain R, BioConductor and a specific set of packages. This is the best way we found to achieve computational reproducibility with R.

    1. On 2016-03-10 15:53:49, user John Didion wrote:

      We discussed this paper in our preprint journal club on 3/3/16. Our comments:

      We appreciate that the authors corrected for demographic covariates and batch effects. However, there are additional potential confounding factors for which we are not confident that the authors have properly corrected:

      • RNA quality: samples from individuals with heart failure are subject to hypoxic stress and increased apoptosis, and thus may display substantially different expression profiles from healthy tissue due to non-biological causes (or at least not the biological causes you are interested in).
      • Medications: those with failing hearts are more likely to be on medications that may alter expression profiles.
      • Cell type composition: failing hearts are likely to have high infiltration of immune cells, which would change the make-up of the tissue you are profiling, and thus the expression profile.

      We urge you to report RIN scores and summary phenotypes for your case and control samples. Additionally, RNA degradation is correlated with duration ex vivo, so it would be nice to see data showing whether there were any differences in surgical conditions, sample handling, etc. between the healthy and failing hearts. There are various strategies for estimating cell type composition, and/or for estimating the composition by computational deconvolution (e.g. DeconRNASeq).

      It was also not clear as to the criteria for selecting case and control samples. Were control samples rejected for transplantation, or was the tissue sample taken from explanted hearts during the transplantation procedure? If the controls were rejected, what were the reasons, and might they constitute additional confounding factors? It would be nice to see additional detail on the methods of RNA isolation (including how much tissue was used and how much RNA was isolated).

      A general comment about the figures is that font sizes should be increased for readability. Figure 2 a,b are not intuitive, and it is unclear what additional information they convey beyond figure 1c; we already know that the healthy network is more interconnected, so it seems obvious that there should be more trans effects. If you want to be able to claim that trans effects are more significant for disease versus healthy hearts, you need to normalize by network size/connectedness.

      Figure 3 is the strong point of the paper, and we found it to be very effective in conveying the points you seem to be trying to make with the paper. We also find the method to be generally useful, and we urge you to release the source code used to perform this analysis as supplementary material. However, we are confused as to your end goal. Are you trying to create a community resource and starting point for investigating genes underlying heart failure? If so, then you should publish the entire list of genes that meets some significance threshold, not just the top 10.

      In figure 4 b,f,g, it is unclear what the scales mean. How are we supposed to interpret normalized expression difference of 0.850 versus 0.875? Is that a big difference? It would be nice to know the magnitude of difference that constitutes significance. The legend for figure 4 is difficult to follow, and we don’t see where you even discuss panels F and G.

    1. On 2016-03-09 15:28:59, user Matt Shirley wrote:

      It looks like the code for FISHR is not available from the link in the manuscript. Nor is it available from the "Download code" link on the documentation: http://www.matthewckeller.c... - it just points to the page itself. I was interested in seeing the implementation of your method, but now since I've spent about 15 minutes looking for the actual program I'm unlikely to return at a later date to check it out. It's understandable for a preprint manuscript to lack some amount of polish, but please don't waste peers' time describing something that doesn't publicly exist. Since releasing the code on a personal website presents challenges in presentation, maintenance, versioning, and availability I would suggest uploading FISHR to a hosted version controlled repository on GitHub or BitBucket, or using something like FigShare to archive a snapshot of what you're describing in this paper.

    1. On 2016-03-04 04:03:43, user Owen Rackham wrote:

      We are still in the process of producing the final release of the software, but I'll be happy to provide usernames and passwords if you email me (owen.rackham{at}duke-nus.edu.sg). We are doing it this way so we will be able to let users know about any changes if we need to modify the code.

    1. On 2015-12-10 14:26:08, user Alexandros Stamatakis wrote:

      As a side note, later on I also analyzed the Oblong (21,000 lines of code in a single source file) parsimony code http://onlinelibrary.wiley.... under the same criteria for a talk I gave.

      I executed:

      ./oblong -p -i125.phy -otest

      Valgrind:
      Invalid read of size 2
      Invalid write of size 2
      definitely lost: 125,500 bytes

      gcc warnings: 52
      clang warnings: 443

      no assertions used

    2. On 2015-11-24 00:26:39, user Graham Gower wrote:

      I'm glad to see this topic getting some attention, there are definitely some improvements that can be made to software quality. I have some comments based upon a cursory reading.

      1) The malloc usage errors section in the main text gives the wrong impression about what you actually check for. Upon reading, I assumed you were checking type casts relating to the return value of malloc() - which should not be type cast in C. From the SI, I see this is not the case, you are instead referring to parameters being passed to malloc.

      Specifically you discuss multiplying by a signed integer, which can result in the wrong parameter being passed, and suggest to cast the integer prior to multiplication. In such cases it would also be more appropriate to use calloc().

      Table 3 shows "No-Error" as a value in the malloc column, which I assume to mean the return value was not checked for error. Perhaps the meaning of this should be mentioned in the text.

      2) Valgrind may falsely report invalid memory accesses when the code is built using highly optimised memcpy/strcpy functions, e.g. sse optimised versions are inlined by gcc. You should be able to compile with -O0 to avoid these. However, certain compiler warnings (at least with gcc, I don't know about clang) are also dependent upon the specified optimisation level. It was not obvious to me what level of optimisation was used, or if this was uniformly applied to all programs. In addition, it would be useful to know which versions of gcc and clang were used, as warnings certainly change over time between versions.

      3) I'm curious to know more about the memory leaks you report. E.g., is a given leak of constant size, or does it change with the size of the input data set? It was also not obvious to me which leaks were likely innocuous because they are in a short running program, or if they are more severe due to many occurrences of the same leak in successive iterations of the program's main loop.
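
      The signed-integer problem raised in point 1 can be illustrated without writing C. The sketch below only mimics, in Python, the modular arithmetic that C applies when a signed value is converted to size_t; `as_size_t` and `checked_alloc_size` are hypothetical helper names, and the width of size_t is read from the local platform via ctypes.

```python
import ctypes

# Width of size_t on this platform (typically 64 bits).
SIZE_T_BITS = 8 * ctypes.sizeof(ctypes.c_size_t)

def as_size_t(value):
    """Mimic C's conversion of a signed value to size_t (wraps modulo 2**bits)."""
    return value % (1 << SIZE_T_BITS)

def checked_alloc_size(count, elem_size):
    """Reject negative counts and products that would not fit in size_t."""
    if count < 0 or elem_size <= 0:
        raise ValueError("count must be non-negative and elem_size positive")
    total = count * elem_size
    if total >= (1 << SIZE_T_BITS):
        raise OverflowError("allocation size does not fit in size_t")
    return total

bad_count = -1   # e.g. an error code mistakenly reused as a length
elem_size = 4
# The request the allocator would see: a huge positive number, not -4.
print(as_size_t(bad_count * elem_size))
# A validated request for 10 elements of 4 bytes each.
print(checked_alloc_size(10, elem_size))
```

      Passing the element count and element size as separate arguments, as calloc() does, gives the allocator the chance to perform a check of this kind itself.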

    1. On 2015-11-18 18:49:49, user B. Arman Aksoy wrote:

      Hi Dr. Baryshnikova,

      Thanks for making this manuscript available as a pre-print. As a researcher in the field of Systems Biology, I regularly tackle the task of annotating networks usefully and extracting subnetworks that are of interest to a particular study, and I enjoyed reading your approach to this common problem. Also, thank you for open-sourcing the code for others to investigate/re-use.

      I have two questions regarding the overall implementation of SAFE and one of its use cases:

      1) SAFE seems to be designed in a distance-metric-agnostic way, meaning that users can swap in their distance metric of choice if needed; but the examples in your manuscript focus heavily on one particular instance of this, the map-based Euclidean distance. This, of course, requires the nodes in the network to be laid out beforehand, and the assumption is that the layout algorithm respects the topology of the network and its biological interpretation. As such, the Kamada-Kawai algorithm is a particularly good choice, as force-directed layout algorithms tend to produce visually appealing layouts with relatively few hairballs. Having said that, these algorithms (especially the force-directed ones) are known for their non-deterministic results: they tend to reach local minima fast and stay there. This creates a problem for the use of Euclidean distances on a network that was laid out using any such algorithm, as each layout will produce different placements of nodes in the network, which makes your metrics differ slightly for every layout. You addressed this particular issue in the manuscript and showed that 10 instances of this network laid out with the Kamada-Kawai algorithm look fairly similar to each other. Yet I found the following results slightly weak:

      "...I found that GO terms enriched within the neighborhoods of at least 10 genes showed highly similar enrichment landscapes in any two network maps (median Spearman’s rank correlation ρ = 0.82; Figure 3C). This indicated that, despite differences in absolute node positioning between maps, the neighborhoods of the individual genes remained largely unchanged and tended to be enriched at the same level for the same GO terms. Importantly, the number of enriched neighborhoods per GO term was also consistent: of all GO terms that had at least 10 enriched neighborhoods in one map, 83% made the same threshold in at least 5 maps and 67% in all 10 (Supplementary Figure 2)...."

      My question is: given that a map-based distance (after the network is drawn using a force-directed layout) can be considered a proxy for the actual graph distance (the number of edges in a shortest path), and that it introduces noise into your SAFE analyses due to its stochastic nature, I am curious to hear your thoughts on why you do not run SAFE using the graph distance (which is deterministic) and then run your layout of choice on the annotated network. To me, laying out the network before the analysis seems like an unnecessary step, whereas you could have made your results more robust using a different distance metric.

      2) Since GO annotations and molecular/gene-level interaction networks are inter-related, which suggests that enrichment analyses for GO annotations do not necessarily need the network background to be used in the analysis, can you explain the advantage of using SAFE for discovering mechanism of action from a given gene set of interest and a background network, compared to simply using the gene set as it is in one of the GO enrichment tools (such as http://geneontology.org/page/go-enrichment-analysis)? In particular, my question is: would you get the same/similar enrichment results if you were to submit the list of mutated genes (that you used in Section: "SAFE provides novel insights into the molecular basis of resistance to bortezomib") to a de facto gene enrichment tool?
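
      The graph distance referred to in question 1 (the number of edges in a shortest path) can be computed with a breadth-first search. The sketch below is illustrative only and is not SAFE's implementation: it assumes an unweighted, undirected network stored as an adjacency dictionary, and the node names, the `neighborhood` helper, and its `max_distance` parameter are hypothetical.

```python
from collections import deque

def graph_distances(adjacency, source):
    """Breadth-first search: number of edges on a shortest path from source."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def neighborhood(adjacency, source, max_distance):
    """All nodes within max_distance edges of source (source included)."""
    reachable = graph_distances(adjacency, source)
    return {node for node, dist in reachable.items() if dist <= max_distance}

# Toy undirected network with made-up node names.
toy_network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(graph_distances(toy_network, "A"))
print(sorted(neighborhood(toy_network, "A", 2)))
```

      Because the result depends only on the edges, repeated runs give identical neighborhoods, unlike distances measured on a stochastic layout.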

      Regards,
      -- Arman

    1. On 2015-08-25 08:25:20, user Peter Hickey wrote:

      General comments

      Matsui and colleagues propose D^3M, a method for testing differential methylation at a cytosine in a two-group experiment, such as a case/control study.

      The majority of existing methods for testing differential methylation are based on a test using a summary statistic of these distributions, such as the mean, median, or variance. Of the currently available methods, the most similar to D^3M is the similarly named M^3D, which uses the maximum mean discrepancy to test whether the distributions of methylation levels across a _region_ are identical in the case and control groups. Like M^3D, D^3M is based on a statistical test of whether the distribution of methylation levels is different between the case and control groups. D^3M, however, focuses on methylation differences at individual cytosines rather than across a region.

      The D^3M method is well-described and the claims of the method's performance well-supported by the presented results. I believe that D^3M is a valuable contribution to the analysis of differential methylation, particularly in studies where the difference between the cases is in the higher order moments of the methylation distributions.

      I thank the authors for making the code and example data available. In order to promote the use of D^3M by the wider community, I strongly encourage the authors to make the method available as an R package or to contribute it to an existing R/Bioconductor package for analysing DNA methylation data.

      I did find, however, several points in the paper where I would appreciate clarification. These are described below in the major and minor points for revision. Most of these are to improve the clarity of the paper.

      One particular question I have is whether the authors are proposing the use of D^3M for the analysis of both methylation microarrays (e.g., Illumina 450k, as used in their TCGA data analysis) and sequencing-based assays (e.g., whole-genome bisulfite-sequencing and reduced representation bisulfite-sequencing). If the authors believe it is equally applicable to sequencing-based assays then I think it would be appropriate to include such an analysis in the paper (at least the supplementary material, if not in the main text) and to clarify this issue in the text.

      Specific comments

      Major

      • p1: "For example, limma, minfi, edgeR, DESeq, DiffVar detect the differential methylation sites by testing for significant differences in mean and variance". This is the initial source of my confusion as to whether D^3M is designed for microarray-based and/or sequencing-based assays of methylation. Specifically, limma was initially designed for gene expression microarray data and, more recently, can handle sequencing (designed for RNA-seq) data via the voom() method; minfi is designed for methylation microarrays; edgeR and DESeq are both designed for sequencing-based assays (especially RNA-seq) and are not appropriate for microarray-based assays; DiffVar has methods available for both microarray-based and sequencing-based data. Furthermore, to the best of my knowledge, neither edgeR nor DESeq has been used for bisulfite-sequencing assays (where one obtains a beta-value), although they may be used, suitably modified, to analyse enrichment-based sequencing assays of DNA methylation (such as methyl-binding ChIP-type assays). I think it is necessary to (1) clarify whether D^3M is designed for microarray-based and/or sequencing-based assays; (2) cite examples where these other methods have been used to analyse comparable data and, if no such examples exist, to otherwise clarify this in the Introduction; (3) explain, if D^3M is applicable to sequencing-based assays, how sequencing coverage affects this method (I ask this because the authors of M^3D note the need to explicitly account for this in their method).
      • p2: I find the description of the permutation procedure used to derive the null distribution to be unclear. Are the vectors x = (x_{1}(s_{i}), ..., x_{n}(s_{i})) and y = (y_{1}(s_{i}), ..., y_{m}(s_{i})) jointly permuted (this is what it looks like to me in the supplied code in d3m.R)? Otherwise, if each of x and y are permuted within themselves, I would expect the permuted distributions, \hat{F_{i}^{*}}(x) and \hat{G_{i}^{*}}(y), to be identical across permutations.
      • p3: When introducing the methods against which D^3M is compared (DiffVar, KS, Welch, WMM and MMD), I think it would be fairer to explicitly list which hypotheses (case 2-8) that each method is designed to detect. For example, DiffVar is definitely not designed to detect case 4 (difference in mean only), but is explicitly designed to detect case 3 (difference in variance only), as the simulation results bear out. While this point is somewhat addressed in the discussion, the simulation description notes that in each of case 2-8 the hypothesis being tested is whether the two distributions are identical, and many of these methods are designed to test a more restricted hypothesis about distributional moments. It would also aid the interpretation of the results since it makes it easier to check that the various methods are working 'when they should'.
      • p3: How is MMD implemented in the simulation analysis, e.g., using the M3D Bioconductor package or via kernlab? If the latter then I think it should be emphasised.
      • p3: The original paper describing M^3D (Mayo et al. 2014) states that M^3D is designed for testing differential methylation at pre-defined _regions_ rather than individual cytosines. Does this put M^3D at an unfair disadvantage in the simulation study where only individual cytosines are examined? Relatedly, is D^3M applicable to testing for differentially methylated regions?
      • p3: How much does sample size affect the power of these methods, particularly D^3M? The authors note in the discussion that a sample size > 100 is desirable and the simulation and TCGA data analyses use n = O(100). Would it be possible to explore this further in the supplementary material, e.g., n = O(10) (representative of sample sizes being used in whole-genome bisulfite-sequencing and reduced representation bisulfite-sequencing experiments) and n = O(1000) (representative of a large epigenome wide association study)?
      • p4: The authors note that there are few sites identified by D^3M, Welch, and DiffVar. They rightfully note that this "[indicates] that the differential methylation sites based on the shapes include distinct information not relevant to Welch and DiffVar" but I wonder what sites are being uniquely detected by DiffVar and/or Welch and whether this indicates any limitations of D^3M.
      • p4: In the analysis of the TCGA data, were probes with SNPs removed, e.g., using minfi::dropLociWithSnps()? These probes can otherwise introduce well-known biases and, in particular, may give rise to distributions of beta-values that look very much like case-8 but that are driven by SNPs rather than methylation levels. The authors rightfully note in the Discussion that these type of "measurement error" outliers should be removed prior to analysis. More generally, it would be ideal if an R script was provided that performed the analysis of the TCGA data.
      • p4: The significance level of P < 0.01 is rather generous, especially given that there are ~400,000 hypothesis tests. Might these results be better presented with reference to a false discovery rate (e.g. 5% FDR), as is typical when assessing significance in genome-wide studies of methylation and gene expression?
      • p5: I have considerable difficulty interpreting Figure 3. This figure could be greatly improved by an expanded and more detailed caption. My understanding is that the upper panel shows the data on the 145 GBM samples and the lower panel shows the data on the 530 LGG samples. In each panel I see a clustering by sampleID (the horizontal separation of "completely green" on the bottom third from the "green and red" on the top two-thirds). To me this means that in each of GBM and LGG there are two distinct clusters based on these top 1000 sites. However, this doesn't seem to be the result being conveyed by the authors. I am fairly certain that I have misinterpreted the authors' intentions but I cannot understand this result without further description of the figure in the caption. Could this figure also be redone to avoid the red-green colour scheme to assist those with colour blindness, e.g., see http://bconnelly.net/2013/1... and https://github.com/wistia/h....
      • p5: I was curious about the running time of D^3M and so I played around with the scripts and data provided by the authors. On my 2015 Macbook, it took approximately 1.1 seconds to run D^3M on a single site using the data provided in sample.txt. That means it would take approximately 150 hours to run on a dataset with 500,000 sites when using the existing code. I believe that this type of calculation should be included in the paper because readers will want to know how long it would take to run D^3M on a typically sized dataset. There are obvious speed-ups obtainable by a parallel version of D^3M since it is an embarrassingly parallel computation across loci and this might be mentioned by the authors.
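
      The distinction drawn in the p2 point on the permutation procedure can be sketched as follows. This is an illustration under stated assumptions, not the code in d3m.R: the two helper functions and the toy beta-values are made up for the example.

```python
import random

def permute_within_groups(x, y, rng):
    """Shuffle each group separately; group membership never changes."""
    return rng.sample(x, len(x)), rng.sample(y, len(y))

def permute_jointly(x, y, rng):
    """Shuffle the pooled values, then split back into groups of the original sizes."""
    pooled = x + y
    rng.shuffle(pooled)
    return pooled[:len(x)], pooled[len(x):]

rng = random.Random(0)
cases = [0.10, 0.15, 0.20, 0.25]            # made-up beta-values
controls = [0.70, 0.75, 0.80, 0.85, 0.90]   # made-up beta-values

x_w, y_w = permute_within_groups(cases, controls, rng)
# Each group keeps the same multiset of values, so its empirical
# distribution is unchanged by the permutation.
print(sorted(x_w) == sorted(cases), sorted(y_w) == sorted(controls))

x_j, y_j = permute_jointly(cases, controls, rng)
# The pooled values are preserved, but values can move between groups,
# which is what lets the permuted distributions vary.
print(sorted(x_j + y_j) == sorted(cases + controls))
```

      Only the joint scheme can yield a non-degenerate null distribution for a statistic that compares the two empirical distributions.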

      Minor

      • General: The authors consistently refer to methylation "patterns" (e.g., in the paper's title). I think that to many people studying DNA methylation a methylation "pattern" refers to the string of methylation states along a single DNA fragment, e.g., like those shown in a methylation 'lollipop' plot (http://www.pnas.org/content/109/46/18653/F5.large.jpg). What is being analysed by D^3M, and the competing methods, are methylation "levels", such as beta-values, rather than methylation "patterns" - if the authors agree with my assessment then I would suggest a change in terminology to avoid confusion.
      • p1: "...in which a methyl group is attached to a carbon cytosine (C) base". I think this wording is slightly imprecise; the methyl group is typically added to the 5-carbon __of a__ cytosine (C) base.
      • p1: "The methylation of promoter region, in particular, silences cancer suppressor genes". This would benefit from an appropriate reference(s).
      • p2: When beta-values are first mentioned it would be helpful to define these.
      • p2: Equation (5) has a typesetting glitch at the end of the line.
      • p2/p3: "Since the method for detection of methylation, which is based on distance, cannot distinguish the 'direction' of the hyper- or hypo-methlation." This sentence is incomplete and does not make sense.
      • p3: The definition of a simulated "dataset"; is a dataset the data simulated for a single cytosine in the cases and controls?
      • p3: The description of MMD as "[it] cannot control type I error at both of the levels of 5% and 1%, i.e. the significance level actually fails" I find to be confusing. The results presented in Table 2 show that MMD is identically 0.00 for 'case 1'. While I agree that this means that MMD does not achieve the nominal Type I error rate, I would describe MMD as being conservative rather than it not controlling the type I error rate (which is how I would describe the situation if the results for MMD were >> 0.05). Perhaps this is just a difference in terminology, but I was initially surprised/confused when trying to reconcile the main text with the results presented in Table 2.
      • p4: In table 2 the nominal size (alpha) is given as a decimal value in [0, 1] but the reported Type I error rates (case 1) and power (case 2-8) are given as percentages [0, 100]. This seems unnecessarily confusing.
      • p4: The pcaMethods package provides several functions for imputation of missing data; it would be good to provide further details of what was used in the analysis of the TCGA data.
      • Supplementary material: A minor LaTeX error; the references to equation (1) and (2) are rendered as "(??)".
      • General: "MissMethyl" should be "missMethyl"
    1. On 2015-08-24 05:27:43, user Ilia Stambler wrote:

      Excellent article.

      Completely agree with your point that the main problem is the deficit (or even absence) of scientifically grounded or clinically applicable “diagnosis of aging”.

      As you write “there is no universal set of biomarkers and guidelines for measuring aging as a system" and “to successfully evaluate the effect of any drug that influences aging, it is essential have a measureable endpoint, such as biomarkers”.

      It seems that even biomarkers as such are not enough; there is a need to precisely define measurable *clinical* end points. Without their definition, it seems unlikely that aging can be recognized as a treatable medical condition (or disease). This seems to be a recognized problem for Alzheimer’s disease. Treatments may seem to work on biomarkers (e.g. clear amyloid), but seem to give no clear clinical benefits – and billions of dollars are gone.. But at least in Alzheimer’s there is a more or less clear clinical definition – unlike aging, apparently... As one author writes about Alzheimer’s disease:

      «Regulatory agencies are unlikely to provide accelerated approval for a presymptomatic treatment based solely on biomarker (i.e., surrogate marker) endpoints without additional evidence to show that a treatment’s biomarker effects are “reasonably likely” to predict a clinical benefit.»
      http://www.ncbi.nlm.nih.gov...

      It seems this is largely a problem of scientific, clinical and even mathematical definition of aging – perhaps even less of its “socially constructed” perception (as in the case of mental illnesses or obesity). In the article "Estimation of Heterogeneity in Diagnostic Parameters of Age-related Diseases" we explored the question of some possible formal mathematical, yet clinically applicable, definition (p. 223) http://www.aginganddisease....

      Also generally, the methods of determining risk factors for mortality (including the factor of aging) appear problematic. For example, even in the authoritative Global Burden of Disease (GBD) study, it appears that the risk of death from various factors can exceed hundreds of percent (when in fact it should be no more than 100% ...). And of course, in GBD, aging is not even considered anywhere close to a risk factor (though there are factors like “injuries by pedal cycle vehicles”...)
      http://www.sciencedirect.co...

      In the article “Information-theoretical analysis of aging as a risk factor for heart disease" we explored the question of a correct definition of risk factors and their combinations (p. 204) http://www.aginganddisease....

      Indeed, "senility" is already a part of ICD classification – as recognized by some GBD statisticians.
      http://www.icd10data.com/IC...

      Yet, it is unlikely to affect policy makers, as it is considered a “garbage code” – when there is no clear clinical or biological definition. So in order to successfully use this code, it seems there is again the need to develop the evidential basis of biomarkers and clinical end points.

      Here is, for example, how the same article about the Global Burden of Disease speaks about that “garbage code” (pp. 2099-2100).
      http://www.sciencedirect.co...

      «Murray and Lopez introduced the notion of “garbage codes” in the GBD and proposed methods to redistribute deaths assigned to garbage codes to probable underlying causes of death. Garbage codes are causes of death that should not be identified as underlying causes of death but have been entered as the underlying cause of death on death certificates. Classic examples of garbage codes include senility or cardiopulmonary arrest. In the GBD 1990, major garbage codes were identified and simple algorithms proposed to redistribute these proportionately to various causes (called “target codes”) that were the likely underlying causes of death. A similar approach was applied for the GBD 2000 and subsequent WHO updates».

      And another consideration that seems very important – a treatment may improve the biomarkers and even clinical or functional end points of aging – but shorten the lifespan!!! (as in the case of some stimulants) It seems there is a need for long term analysis (ideally establishing the effect on the actual lifespan, or at least long term effects on mortality). Yet it seems this kind of research may not be very popular with investors or politicians. But without it, our “cures against aging” may shorten people’s lives…

      Thank you

      Ilia Stambler, PhD, www.longevityhistory.com

    1. On 2015-06-19 00:48:10, user Niranjan Nagarajan wrote:

      Thank you for the feedback and comments. A new release of OPERA-LG with more error-reporting and a few bug fixes is out. This should hopefully handle BWA input better. The code for PacBio scaffolding should be out soon as well. Finally, we will expand on the staged-strategy in a revised version of the manuscript to make it clearer.

    2. On 2015-06-02 18:59:54, user Matt MacManes wrote:

      A few comments and questions:

      1. I get a core dump with the test data and BWA, but not Bowtie. See https://gist.github.com/mac...

      2. Will you provide the code necessary for the PacBio scaffolding (e.g., the generation of the in silico mate pair reads)?

      3. Can you further explain the staged strategy for larger genome scaffolding?

    1. On 2015-03-02 22:58:26, user Devon Ryan wrote:

      This looks like a useful pair of programs. From a quick perusal of the code, it seems as though there may be a couple issues: does not support single-end datasets, does not support output from tools like bowtie2 that can accept multiple files at once (or tophat2 that can accept both a list of files and a mix of single and paired-end files). Of these, supporting single-end datasets would be the only really important change (though doing that should make supporting multiple file input a la bowtie2/tophat2 simple).

  2. Apr 2026
  3. social-media-ethics-automation.github.io
    1. While we have our concerns about the privacy of our information, we often share it with social media platforms under the understanding that they will hold that information securely. But social media companies often fail at keeping our information secure.

       For example, the proper security practice for storing user passwords is to use a special individual encryption process [i6] for each individual password. This way the database can only confirm that a password was the right one, but it can’t independently look up what the password is or even tell if two people used the same password. Therefore if someone had access to the database, the only way to figure out the right password is to use “brute force,” that is, keep guessing passwords until they guess the right one (and each guess takes a lot of time [i7]).

       But while that is the proper security for storing passwords, companies do not always follow it. For example, Facebook stored millions of Instagram passwords in plain text [i8], meaning the passwords weren’t encrypted and anyone with access to the database could simply read everyone’s passwords. And Adobe encrypted their passwords improperly and then hackers leaked their password database of 153 million users [i9].

       From a security perspective there are many risks that a company faces, such as:

       - Employees at the company misusing their access, like Facebook employees using their database permissions to stalk women [i10]
       - Hackers finding a vulnerability and inserting, modifying, or downloading information. For example:
         - hackers stealing the names, Social Security numbers, and birthdates of 143 million Americans from Equifax [i11]
         - hackers posting publicly the phone numbers, names, locations, and some email addresses of 530 million Facebook users [i12], or about 7% of all people on Earth

       Hacking attempts can be made on individuals, whether because the individual is the target, or because the individual works at a company which is the target. Hackers can target individuals with attacks like:

       - Password reuse attacks, where if they find out your password from one site, they try that password on many other sites
       - Hackers tricking a computer into thinking they are another site, for example: the US NSA impersonated Google [i13]
       - Social engineering [i14], where they try to gain access to information or locations by tricking people. For example:
         - Phishing attacks, where they make a fake version of a website or app and try to get you to enter your information or password into it. Some people have made malicious QR codes to take you to a phishing site [i15].
         - Many of the actions done by the con-man Frank Abagnale [i16], which were portrayed in the movie Catch Me If You Can [i17]

       One of the things you can do as an individual to better protect yourself against hacking is to enable 2-factor authentication [i18] on your accounts.
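The per-password storage scheme described in this passage is usually implemented as salted, deliberately slow hashing rather than encryption in the strict sense. A minimal sketch using Python's standard library follows; the function names and the iteration count are assumptions made for illustration, not a recommendation taken from the source.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative work factor (an assumption); real deployments tune this
# so that each guess is slow for an attacker.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is drawn for each new password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Because each password gets its own salt, two users with the same password end up with different stored digests, and the database can confirm a guess without being able to look the password up.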

      The author's suggestion that readers enable two-factor authentication is very practical; however, I believe the author does a poor job of addressing a significant nuance: the difference between the two main types of two-factor authentication. Two-factor authentication was presented as a simple protective tool, whereas SMS-based two-factor authentication, currently one of the most commonly used forms, has limitations. For example, while it may be easy for individuals to use their phone number to receive a code via SMS as a second verification step, it would be much harder for hackers to intercept the codes if the individual were using an authenticator app such as Google Authenticator instead. This is because authenticator apps generate time-sensitive codes locally on the user’s smartphone.

      While the author provided an abundance of information about social engineering and phishing attacks in this section, he failed to give the reader guidance on the limitations of SMS-based two-factor authentication. For instance, since a phishing attack could trick users into entering their SMS-based two-factor authentication code on a fake website, which then immediately relays it to the hacker, I find myself wondering whether the author should have been more explicit in his recommendations about the different forms of two-factor authentication rather than just saying “two-factor” in general terms.
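To make concrete what "generating time-sensitive codes locally" means, here is a minimal sketch of the time-based one-time password scheme standardized in RFC 6238, which authenticator apps commonly implement. The function name and defaults are assumptions for illustration; real apps also handle secret provisioning and clock drift, which are omitted here.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at=None, step=30, digits=6) -> str:
    """Derive a one-time code from a shared secret and the current time window."""
    now = time.time() if at is None else at
    counter = int(now // step)  # number of 30-second windows since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

The phone and the server both hold the secret and compute the same code independently, so nothing is sent over SMS. A code typed into a phishing site can still be relayed within its validity window, which is the limitation raised above.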

    1. 68000 POLICE AND PRISON EQUIPMENT AND SUPPLIES

      NIGP Code  Description
      68000  POLICE AND PRISON EQUIPMENT AND SUPPLIES
      68002  Access Control Systems and Security Systems
      68004  Ammunition
      68005  Ammunition, Reloaded
      68006  Ammunition Handling Systems (Aircraft, Tanks, etc.)
      68008  Police Protection Equipment (Body Armor and Riot Shields) and Supplies
      68010  Badge Cases, Police (All Types)
      68012  Belts, Cases, Holsters, Scabbards, etc.
      68020  Billies and Night Sticks
      68024  Breath Alcohol Testing Instruments and Supplies
      68028  Bullet Traps
      68032  Burglar Alarms
      68033  Canine (K-9) Police Dog Training Equipment
      68034  Citation Issuance Devices and Supplies
      68035  Chemicals for Personal Defense (Mace, etc.)
      68036  Clay Targets and Skeet Range Equipment
      68040  Composite Identification Kits and Systems
      68041  Crime Detection Equipment and Supplies
      68042  Curtains, Security; Vehicle Security Partitions
      68044  Detectors, Gun and Metal
      68045  Explosives Storage Boxes, Bunkers, etc.
      68046  Explosives, Grenades, Accessories and Supplies
      68047  Evidence Bags, Containers and Supplies
      68048  Finger and Foot Printing Equipment, Accessories, and Supplies (Including Laser and Cyanoacrylate Fuming Chambers)
      68049  Firearms Training Simulators
      68050  Guns, Stun (Nonlethal), (Incl. Taser Weapons), (See 680-54 for EMD Weapons)
      68051  Forced Entry Equipment and Supplies (Battering Rams, etc.)
      68052  Guns, Pistols, Rifles, and Shotguns (Incl. Accessories)
      68053  Guns, Machine (Including Other Military Style Weapons)
      68054  Guns, Electro-Muscular Disruption (EMD) (See 680-50 for Stun Guns)
      68056  Gun Cleaning Supplies: Patches, Rods, Silicone Cloths, Solvents and Brushes, etc.
      68057  Gun Rifling Machines
      68058  Gun Locks, All Types
      68059  Identity Tracking Devices
      68060  Handcuffs, Leg Irons (Strap and Loop Style)
      68061  Lockers, Security
      68062  Megaphones, Whistles, etc.
      68063  Maintenance Stands, Fixtures, and Jigs for Weapons
      68065  Night Vision Systems
      68066  Police Investigation Robots
      68067  Police Training and Instructional Aids: Wall Charts, etc.
      68068  Polygraph Equipment and Supplies
      68071  Prisoner Tracking Devices, Electronic (Wrist and Leg Bands, etc.)
      68072  Prison Equipment, Cell Blocks, and Accessories (Incl. Control Panels, Door Control Relays, Furniture, etc.)
      68073  Prisoner Identification Equipment and Supplies
      68074  Pyrotechnics
      68076  Racks, Gun (See 055-74 for Vehicle Racks)
      68077  Radar Instruments, Traffic Enforcement Type (Including Laser Speed Measuring, Ranging Devices and Radar Instruments equipped w/Cameras)
      68078  Recoil Pads
      68079  Recycled Police Equipment, Accessories and Supplies
      68080  Reloading Equipment and Supplies
      68082  Remote Operations Equipment
      68084  Riot and Crowd Control Equipment (Not Otherwise Classified)
      68085  Road Spikes (For Use by Police to Stop Vehicles on the Road)
      68086  Scopes, Rifle and Range
      68087  Surveillance Cameras and Counter-surveillance Equipment and Supplies
      68088  Targets, Target Pasters, and Rifle Range Equipment (Including Portable Shooting Ranges and Range Finders)
      68089  Shooting Ranges, Portable
      68092  Tear Gas, Tear Gas Guns, and Ammunition
      68093  Test Equipment and Supplies: Criminology Kits, Metal Reagents, Paraffin, Sexual Assault Exam Kits, etc. (Including Technical Equipment and Supplies Used in Police Laboratories)
      68094  Tire Markers and Supplies
      68095  Tools, Gunsmith's: Borescopes, etc.
      68096  Traffic Batons
      68097  Warning Systems, Perimeter Anti-Intrusion, Electronic (Including Civil Defense and Natural Disaster Types)

    1. Closed Loop + Infinite Demand = Economic Engines. Software engineering lives here. AI writes the code. Tests verify correctness. More code enables more features. Companies will always need more software.

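      The author frames software development as an 'economic engine', which is a highly insightful view. It suggests that AI in software development not only improves efficiency but also creates an endlessly looping model of value growth, in sharp contrast to many other AI applications.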

    2. AI writes the code. Tests verify correctness. More code enables more features.

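      This concise description reveals the complete closed loop of AI in software development: AI generates code, tests verify correctness, and more code creates more features. This self-reinforcing cycle may make software development the most disruptive application area for AI.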

    1. Kimi K2.6 autonomously overhauled exchange-core, an 8-year-old open-source financial matching engine. Over a 13-hour execution, the model iterated through 12 optimization strategies, initiating over 1,000 tool calls to precisely modify more than 4,000 lines of code.

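      Most people believe that AI still needs guidance and supervision from human experts on complex engineering tasks and struggles to complete large-scale system refactoring independently. But the author shows that AI can autonomously analyze, optimize, and refactor a financial system that has been running for 8 years. This challenges conventional views of AI's engineering ability and suggests that AI may already have system-level architecture design and optimization capabilities.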

    2. Kimi K2.6 demonstrates significant improvements over Kimi K2.5 in internal evaluations conducted by CodeBuddy: code generation accuracy increased by 12%, long-context stability improved by 18%, and tool invocation success rate reached 96.60%.

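      Most people believe that AI model iteration is usually incremental, with each version update bringing perhaps a 5-10% performance gain. But the data show that Kimi K2.6 achieved a leap far beyond expectations, particularly with a tool invocation success rate approaching 97%. This challenges conventional views of how fast AI model capabilities improve and suggests there may be some technical breakthrough or architectural innovation.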

    1. NEC aims to build one of Japan's largest AI-native engineering teams, who will use Claude Code in their work.

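      Most people believe AI will replace a large number of engineering jobs, but the author argues that AI is actually creating new engineering roles and skill requirements, because NEC is actively building a large AI-native engineering team. This suggests that AI tools are augmenting rather than replacing engineering capability and creating new employment opportunities.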

    1. Claude packages everything into a handoff bundle that you can pass to Claude Code with a single instruction.

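      Most people regard design and development as two separate professional domains that need dedicated handoff processes and tools, but the author implies that AI can enable a seamless, single-instruction transition from design to development. This view challenges the traditional boundary between software development and design and suggests AI may redefine how cross-functional collaboration works.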

    2. Claude packages everything into a handoff bundle that you can pass to Claude Code with a single instruction.

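      This description hints at the possibility of seamless collaboration between AI systems and challenges the traditional barrier in moving from the design stage to the implementation stage in software development. This automated workflow represents a potential revolution in the software development paradigm, and its technical implementation and practical limitations deserve a closer look.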

    1. GPT‑5.5 found a proof of a longstanding asymptotic fact about off-diagonal Ramsey numbers, later verified in Lean. The result is a concrete example of GPT‑5.5 contributing not just code or explanation, but a surprising and useful mathematical argument in a core research area.

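      Most people believe that in mathematical research AI can only assist with computation or provide explanations and cannot carry out creative mathematical reasoning independently. But the author shows that GPT-5.5 can discover and prove a mathematical theorem. This breakthrough challenges the traditional notion of mathematical research as a purely human activity and suggests AI may become a genuine 'research partner' rather than merely a tool.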

    2. GPT‑5.5 found a proof of a longstanding asymptotic fact about off-diagonal Ramsey numbers, later verified in Lean. The result is a concrete example of GPT‑5.5 contributing not just code or explanation, but a surprising and useful mathematical argument in a core research area.

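      Most people believe AI's role in mathematical research is mainly to assist with computation and verification, but the author argues that GPT-5.5 can independently discover mathematical proofs, which is revolutionary for mathematical research. This view challenges conventional views of AI's capabilities in creative thinking and abstract reasoning and suggests AI may be shifting from tool to research partner.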

    1. A US lab would never; well, unless you count a code red or Meta's throw money at the problem moves.

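      Most people believe US AI labs will always maintain their technical lead and openly acknowledge their shortcomings, but the author implies that US labs (Meta in particular) will only cover up technical gaps by spending heavily rather than openly admit to falling behind. This view challenges conventional views of the transparency and innovative capacity of US tech companies.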

    1. Reusing code instead of repeating code: When we find ourselves repeating a set of actions in our program, we end up writing (or copying) the same code multiple times. If we put that repeated code in a function, then we only have to write it once and then use that function in all the places we were repeating the code.

      This explanation is clear and effective, directly highlighting the core benefit of functions—eliminating redundant code and simplifying maintenance—with a straightforward, relatable example that makes the concept easy to grasp.
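A small sketch of the idea in the highlighted passage: the clean-up-and-greet logic is written once as a function and then reused, instead of being copied for every name. The function and the sample names are hypothetical, chosen only for illustration.

```python
def format_greeting(name: str) -> str:
    """Tidy a raw name and build a greeting; written once, reused everywhere."""
    cleaned = name.strip().title()
    return f"Hello, {cleaned}!"


# Reuse the same function for each input rather than repeating the two steps.
messages = [format_greeting(raw) for raw in ["  ada ", "GRACE", "alan"]]
```

If the greeting format ever changes, only the body of `format_greeting` needs editing.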

    1. WeirdML V2 places models in an unusually resource-constrained environment: models get only five attempts to submit working code, with no access to external tools. This setup has not been the focus of recent RL training.

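      Most people might assume that all AI evaluation metrics reflect the same trend of progress, but the study finds that the WeirdML V2 metric shows no acceleration, because it sets up a resource-constrained environment that recent reinforcement learning training has not focused on. This suggests that measured AI progress may depend on the evaluation method.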

    1. Writing code is not the same as software development. This is only capturing some level of acceleration while writing code, and does not capture time taken in architecture, debugging, review, and deployment.

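      Most people believe a high share of AI-generated code means a large gain in software development efficiency, but the author points out that this is only an acceleration of the coding stage and excludes more time-consuming steps such as architecture design, debugging, and review. A high AI contribution share therefore does not equate to a gain in overall productivity.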

    2. So even though I did 100% of the writing and 50% of the refactoring, Windsurf reports that 100% of the code I produced in that session was generated by AI.

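      Most people believe the metrics of code generation tools should reflect actual usage, but the author shows that even when the developer writes 100% of the code by hand, Windsurf still reports a 100% AI contribution. This indicates a fundamental flaw in its metrics system, which completely distorts the actual contribution share.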

    1. Some proposals for AI agents assume that putting agentic code in a TEE or similar 'jail' will solve these problems, but that ignores the need to collectively bargain

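      Most people believe that technical means (such as trusted execution environments) can solve the trust problem for AI agents, but the author argues that this overlooks the need for collective bargaining. This view challenges the idea that technical solutions are a cure-all and stresses the importance of institutional design and multi-party negotiation.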

    1. As part of the investigation, we back-tested Code Review against the offending pull requests using Opus 4.7. When provided the code repositories necessary to gather complete context, Opus 4.7 found the bug, while Opus 4.6 didn't.

      Taking the opportunity to praise their own new model.

    1. Out of 28 paid and 400 free routers: > 9 injected malicious code into tool calls > 17 touched researcher-owned AWS credentials > 1 drained $500k from an Ethereum wallet

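      Most people believe paid API routers are safer than free ones, but the author's research shows that even paid routers carry serious security risks, because, paid or not, these intermediary services are able to access and manipulate all data. This challenges the common assumption that 'paid equals secure'.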

    1. The screen you're reading this on is already presenting you an image, it's just generated with rigid code and rules that make it difficult to communicate complex and detailed ideas.

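      Most people think of our current screen displays as functional interfaces built from code and rules, but the author argues that they are already images, merely constrained by rigid code. This view challenges our understanding of the nature of UI and suggests that all interfaces are essentially visual representations that differ only in flexibility.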

    1. In addition to empowering developers and agents to handle project setup and boilerplate code, we've also designed these new tools and resources to make it easier to transition to Android Studio.

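      Most people believe CLI tools and AI agents will replace traditional IDEs as the development mainstream. But the author implies these tools are only a bridge for transitioning to Android Studio, and that the IDE is ultimately still needed to finish high-quality apps, which runs counter to the mainstream prediction that 'the CLI will replace the IDE'. This view challenges the industry consensus on the direction in which development tools are evolving.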

    2. Whether you are using Gemini in Android Studio, Gemini CLI, Antigravity, or third-party agents like Claude Code or Codex, our mission is to ensure that high-quality Android development is possible everywhere.

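      Most people believe there are significant performance differences between AI agent tools and that the best tool must be chosen for each specific scenario. But the author implies that any agent can achieve high-quality development, which runs counter to industry consensus. This view may challenge the developer community's conventional views of performance differences between AI agent tools.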

    1. agent-written code introduces more security vulnerabilities than code authored by humans

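      Most people believe AI coding assistants improve code quality and security, but the study finds that AI-generated code actually introduces more security vulnerabilities than human-written code. This finding runs counter to the common belief that AI reduces programming errors and challenges the assumption of AI's superiority in security.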

    2. coding patterns are bimodal: in 41% of sessions, agents author virtually all committed code ('vibe coding'), while in 23%, humans write all code themselves.

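      Most people regard AI coding assistants and humans as collaborators, each with their own strengths, but the author finds that actual usage is polarized: either relying almost entirely on AI-generated code ('vibe coding') or rejecting AI entirely and writing everything by hand. This discontinuous adoption pattern challenges conventional views of human-AI collaboration.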

    1. existing agent protocols (e.g., A2A and MCP) underspecify cross-entity lifecycle and context management, version tracking, and evolution-safe update interfaces, which encourages monolithic compositions and brittle glue code.

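      Most people believe existing agent protocols are mature enough to manage complex systems effectively, but the author argues that the current mainstream agent protocols (such as A2A and MCP) are seriously underspecified, which makes systems brittle and hard to maintain. This is a counterintuitive view, because the industry generally regards these protocols as fairly complete.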

    2. However, existing agent protocols (e.g., A2A and MCP) underspecify cross-entity lifecycle and context management, version tracking, and evolution-safe update interfaces, which encourages monolithic compositions and brittle glue code.

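      Most people believe current agent protocols are complete enough to manage complex AI systems effectively. But the author argues that existing protocols have serious shortcomings, particularly in entity lifecycle, context management, and version control, which makes systems brittle and hard to maintain. This view challenges industry consensus, because many researchers may believe existing frameworks can already handle these challenges.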

    1. The real-world weighted ratio (1.325x) lands near the top of their range. Individual file types exceed it — CLAUDE.md at 1.445x, technical docs at 1.473x. That's the useful finding: the top of the documented range is where most Claude Code content sits, not the middle.

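      This finding challenges the usual way we read documentation and marketing claims. We typically assume that a vendor-provided range is centered on a reasonable middle value, but actual usage often lands near the worst case. This suggests that a 'range' in technical documentation may be more of a marketing strategy than a realistic expectation, and that users should plan for the worst case rather than the average, which undermines our basic trust in the accuracy of documentation.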

    2. Code is hit harder than unique prose (1.29–1.39x vs 1.20x). Code has more repeated high-frequency strings — keywords, imports, identifiers — exactly the patterns a Byte-Pair Encoding trained on code would collapse into long merges.

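      This finding challenges our common-sense understanding of how code is tokenized. We usually assume that code, having more repeated patterns, should tokenize more efficiently, but the opposite is true. This suggests that the semantic complexity of code goes beyond simple repeated patterns and requires finer-grained handling. This counterintuitive conclusion raises new questions about how to optimize code generation and code understanding models.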

    1. The system prompt also encourages certain behaviors, such as always providing code snippets in Markdown.

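      This shows a surprising design decision: Anthropic requires code to be output in Markdown format, which in practice limits how naturally the AI interacts with code. For developers who want a native code experience, this creates an unexpected obstacle and challenges the common-sense idea that 'AI should adapt to developers' needs'.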

    1. Discovery should focus on trust boundaries, authentication flows, parsers, shared services, and legacy code that still sits on critical paths.

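      This recommendation challenges the breadth-first approach of traditional security scanning and instead emphasizes a depth-first focus on specific areas. It suggests that AI security research should pay more attention to complex logic problems that traditional methods struggle to find, rather than simply scanning all code. This shift may deliver a better return on security investment.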

    2. Public models can already spot that a security-relevant check is missing in the right code path, but they can still miss the actual invariant being violated and therefore misstate the impact.

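      This finding reveals a key limitation of public models in security analysis: they can spot a missing security check but may fail to correctly understand the actual invariant being violated, and so misstate the impact. This challenges the assumption that 'AI can fully understand security implications' and underscores the irreplaceable role of human experts in interpreting AI findings.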

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      …other neurons such as AWB, AWA, and ADL are also involved in the coding process. These neurons likely communicate with different interneurons to contribute to 1-oct-induced outputs. The authors' conclusion that loss of tax-4 reduces attractive responses and that osm-9 mutants reduce repulsive responses is not entirely convincing. TAX-4 is required for both AWC (an attractive neuron) and AWB (a repulsive neuron), and osm-9 is essential for ASH, ADL, and AWA (attraction-associated). Therefore, the observed effects on the attractive and repulsive responses could be more complex. Additionally, the interpretation of results involving the use of IAA to reduce the contribution of AWC at lower concentrations lacks clarity. A more effective approach might involve using transgenically expressed miniSOG or histamine (HisCl1) to specifically inhibit AWC neurons.

      We agree that the sensory inputs into chemotactic behavior are likely more complex, involving other neurons besides ASH and AWC. We now explicitly discuss this possibility in the Discussion (lines 449-467).

      We have also utilized transgenically expressed HisCl1 in ASH and AWC to address this concern. Crucially, we observe that some of the effects of the broad mutations are reproduced by inactivating ASH and AWC. This finding validates our overall hypothesis that sensory-driven behavior is a balance of simultaneous afferent inputs of opposite valence AND shows that ASH and AWC are involved as expected. We are currently performing a comprehensive analysis of sensory inputs into locomotory decision making, including the neurons mentioned in the Reviewer’s comment.

      We also agree that using IAA is not a very clean way to inactivate AWC. The AWC HisCl results referenced above should alleviate this concern. However, the IAA result does put our findings into a broader context of multi-sensory integration which demonstrates the potential usefulness and selective advantages of the dual-input coding architecture that we are hypothesizing.

      Furthermore, they did not observe significant entrainment of AIB activity with the 2.2 mM 1-oct application. This might be due to the animals being anesthetized with 1 mM tetramisole hydrochloride, which could affect neural activity and/or feedback from locomotion. 

      We now mention these caveats: “It is possible that immobilization and anesthetization may be affecting AIB responses to sensory activity and/or proprioceptive feedback from locomotion. However, it is also possible that motor feedback from RIM was obscuring the sensory signal.” (line 357)

      It is unclear whether subtracting AVA activity from AIB activity provides a valid measure. Similarly, it is unclear how the behavioral data from freely moving worms compares to the whole-network calcium imaging results obtained from immobilized worms.

      Ray and Gordus 2025 (Current Biology 35:5534) recently demonstrated that AIB activity can be modeled as the additive convolution of AVA, AWC, and AIA activity, lending validity to our subtractive approach. In their study, AVA was the major contributor, but addition of AWC and AIA signals (i.e. sensory inputs) resulted in significantly greater accuracy. We have now mentioned their work in the manuscript: “To address this possibility, we subtracted AVA activity, representing the motor state, from the AIB activity (AVA closely mirrors RIM), based on the observation that AIB activity can be modeled as the sum of convolutions of motor activity and sensory activity.” (lines 360-363)
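One way to write down the reasoning behind the subtraction is sketched below. The kernel symbols $k_m$ and $k_s$, the pooled sensory signal $S(t)$, and the idealization $k_m \approx \delta$ are notational assumptions introduced here for illustration; they are not taken from the manuscript or from Ray and Gordus.

```latex
% AIB modeled as a sum of convolutions of motor (AVA) and sensory (S) activity
\[
\mathrm{AIB}(t) \;\approx\; (k_m * \mathrm{AVA})(t) + (k_s * S)(t),
\qquad
(k * x)(t) = \int_{0}^{\infty} k(\tau)\, x(t - \tau)\, d\tau
\]
% If the motor kernel is close to an identity (delta) kernel,
% subtracting AVA leaves approximately the sensory term
\[
\mathrm{AIB}(t) - \mathrm{AVA}(t) \;\approx\; (k_s * S)(t)
\quad \text{when } k_m \approx \delta
\]
```

Under this reading, the residual after subtraction approximates the sensory contribution only to the extent that AIB follows AVA with little filtering or rescaling.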

      The relationship between network activity in freely moving worms and immobilized worms has been explored by Kato et al 2015 (Cell 163:656-669); we now refer to this work on line 131 “These transitions are related to network state changes which drive spontaneous reversals during foraging in freely moving worms. Immobilization and anesthetization, necessary for confocal imaging, distort certain aspects of these motor command sequences compared to freely moving worms executing the motor commands and receiving proprioceptive feedback. However, the intrinsic motor programs remain intact under these conditions.” (lines 131-136)

      Reviewer #2 (Public review):

      tax-4, but not osm-9 mutants were used in chemotaxis and imaging assays. It would have been nice to have osm-9 results as well for these assays. The mutants are not specific to AWC and ASH. Cell-specific rescue of these neurons would have strengthened the proposed model.

      Osm-9 data are now included in the chemotaxis assays (Fig. 4E).

      Cell-specific HisCl data are now included for ASH and AWC (Fig. 4F, G, 5D), confirming our proposed model.

      Limited tax-4 data were included in the imaging (Fig. 6), but unfortunately, NeuroPAL imaging in tax-4 has proven to be technically difficult. NeuroPAL images in the tax-4 background appear different, perhaps because of developmental effects on gene expression due to the lack of sensory input (recall that the NeuroPAL color scheme is based on the relative expression levels of 40+ neuronal promoters). Inactivation of individual sensory neurons using HisCl1 or other transgenes may be the simpler approach.

      The Results and Discussion have been significantly rewritten to incorporate these new data.

      We are currently working on a comprehensive study of the sensory inputs into locomotory decision making in the context of chemosensation, which we expect to reveal roles of other neurons besides ASH and AWC and provide a fuller picture of the complexities of this system.

      Reviewer #3 (Public review):

      (1) It is not clear precisely how important AWC is (compared to other cells) for the attractive response, though the presence of odor-off behavior implicates it. This could be resolved by looking at additional mutants (tax-4 is broad).

      We have addressed this concern using transgenically-expressed HisCl1 which has demonstrated a clear role for AWC in overall chemotaxis and locomotory decision making upon encountering the 1-oct/buffer interface in microfluidics devices (Fig. 4F, G, 5D).

      (2) Relatedly, dose-dependent chemotaxis data (Figure 4C, D) should be provided for osm-9 animals to get a sense of the degree to which dose-dependence is explained by ASH.

      Osm-9 data are now included (Fig. 4E).

      The Results and Discussion have been significantly rewritten to incorporate these new data.

      (3) Figure 4A, B should include average traces with errors, as there are several ways the responses can vary across conditions.

      Averaged traces with error bars are now shown (Fig. 4A, B).

      (4) The data in Figure 6G does not appear to have error bars.

      Error bars are now shown for Fig. 6G.

      Also, it would help to include a more conventional demonstration of AIB responding to stimuli (e.g. averaging stimulus-aligned responses as a percent of the fluorescence value at stimulus onset to perform the desired subtraction).

      Fig. 6G top panel shows the stimulus-aligned responses of AIB with no subtraction performed. The 6 sequential stimulations are shown as a single continuous trace, consistent with the experimental protocol utilized. Averaging was performed across the 12 individuals of the sample set. However, we did not calculate the average of responses within a dataset (i.e. first plus second plus third etc.) to avoid obscuring any sensitization/desensitization that might be occurring with multiple stimuli.
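The normalization the reviewer describes, combined with averaging across individuals while keeping the sequential stimuli separate, can be sketched as follows. The function names, the onset indices, and the toy traces are hypothetical; this is not the authors' analysis code.

```python
def percent_of_onset(trace, onset, window):
    """Express a post-onset window as percent change from the value at stimulus onset."""
    f0 = trace[onset]
    return [100.0 * (f - f0) / f0 for f in trace[onset:onset + window]]


def average_across_animals(traces, onsets, window):
    """For each stimulus separately, average the normalized response across animals."""
    averaged = []
    for onset in onsets:
        per_animal = [percent_of_onset(trace, onset, window) for trace in traces]
        # zip(*...) groups the same time point from every animal together.
        averaged.append([sum(values) / len(values) for values in zip(*per_animal)])
    return averaged


# Toy fluorescence traces for two animals and two stimuli (made-up numbers).
traces = [
    [1.0, 1.0, 2.0, 1.5, 1.0, 1.0, 3.0, 2.0],
    [2.0, 2.0, 3.0, 2.0, 2.0, 2.0, 4.0, 3.0],
]
result = average_across_animals(traces, onsets=[1, 5], window=3)
```

Keeping one averaged response per stimulus, rather than pooling all stimuli, preserves any sensitization or desensitization across the sequence.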

      Subtracted calcium traces are harder to interpret. As it stands, the evidence that sensory signals are persisting in AIB and not being shunted by proprioceptive feedback in microfluidic devices is not strong.

      Addressing the point about proprioceptive feedback in microfluidics devices, the following sentence was added in the Results section: “Immobilization distorts certain aspects of these motor command sequences compared to freely moving worms executing the motor commands and receiving proprioceptive feedback, but the intrinsic motor programs remain intact.” (lines 131-136).

      To add context for the AIB-AVA subtraction, Ray and Gordus 2025 (Current Biology 35:5534) recently demonstrated that AIB activity can be modeled as the additive convolution of AVA, AWC, and AIA activity, lending validity to our subtractive approach. In their study, AVA was the major contributor, but addition of AWC and AIA signals (i.e. sensory inputs) resulted in significantly greater accuracy. We have now mentioned their work in the manuscript: “To address this possibility, we subtracted AVA activity, representing the motor state, from the AIB activity (AVA closely mirrors RIM), based on the observation that AIB activity can be modeled as the sum of convolutions of motor activity and sensory activity.” (lines 360-363)

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 1: The number of replicates (n) is missing.

      In Fig. 1D, only a single trial is shown as a representative example rather than averages, which would necessitate error bars. The Results and Figure Legend text has been updated to clarify this, and the average CI is now included in the first Results section (lines 111, 976)

      Figure 4: The sample size (n = 3-5) is relatively small, which may limit the statistical power.

      Sample size was increased to 5 for all data points shown on the new graph (Fig. 4E), as noted in the figure legend (line 1019).

      Figure 4: The 0.22 mM concentration significantly affects both AWC and ASH. It is also unclear whether this concentration also affects other neurons, such as AWB, ADL, and AWA.

      We have not performed an exhaustive analysis of other neurons in these datasets. These analyses are difficult and time consuming, so we have opted to present a dataset which supports our hypothesis that multiple afferent pathways of opposite valence act in a balanced way to drive chemotaxis. We are currently performing an in-depth analysis of the sensory inputs into the circuit, which we expect to present in a future study.

      Reviewer #2 (Recommendations for the authors):

      The tax-4 and osm-9 experiments are great, but I recommend clarifying that tax-4 and osm-9 are expressed in other neurons as well. The text gives the impression that these mutants are specific to AWC and ASH, respectively. The authors should note these caveats.

      This concern is thoroughly addressed in the descriptions and rationale presented for the use of ASH and AWC HisCl strains.

      The authors should also provide the code used to interpret their results.

      Code will be provided through Zenodo.org.

      Reviewer #3 (Recommendations for the authors):

      It would help to clarify (early on) the degree to which you are attributing responses to particular cells (e.g. AWC) as opposed to a class of cells with AWC as an example.

      This concern is thoroughly addressed in the descriptions and rationale presented for the use of ASH and AWC HisCl strains.

      The NeuroPAL imaging and analysis (especially Figures 3D, E) is a bit distracting and appears non-essential. If possible, it would help to combine Figures 2 and 3 with a focus on panels 3ABC to streamline the narrative.

      We would prefer to keep the present format so the reader can appreciate the power of the whole-brain approach for analyzing network activity and behavioral outputs in the context of sensory-motor responses. Specifically, our insight that attractive and aversive afferent inputs were activated simultaneously was wholly dependent on this approach. Otherwise, there would have been little to no reason for examining AWC activity at aversive 1-oct concentrations, which was essentially the foundation of the study.

      To highlight this point, we have added the following sentence in the Discussion: “This novel insight highlights the value of the whole-brain approach (enabled by the NeuroPAL system) for studying the network dynamics underlying sensory driven behaviors.” Lines 431-433.

    1. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      The Authors test the hypothesis, using an effort-exertion and an effort-based decision-making task while recording brain dynamics with EEG, that the brain processes reward outcomes for effort differentially when they are earned for themselves versus others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      There is some concern about the fact that participants report feeling less subjective effort, but also more disliking of tasks when they were earning rewards for others versus self. The concern is that participants worked with less vigor during self-versus-others trials and this may partly account for a key two-way Recipient x Effort interaction on the size of the Reward Positivity EEG component. Of note, participants took longer to complete tasks when working for others. While it is true that, in all cases, participants met the requisite task demands (they pressed the required number of buttons) they did so more sluggishly when earning rewards for others. The Authors argue that this reflects less motivation when working for others, which is a plausible explanation. The Authors also try to rule out this diminished vigor as a confounding explanation by showing that the two way interaction remains even when including reaction times (and also self-reported task liking) as a covariate. Nevertheless, it is possible that covariates do not fully account for the effects of differential motivation levels which would otherwise explain the two-way interaction. As such, I think a caveat is warranted regarding this particular result.

    2. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors test the hypothesis, using an effort-exertion task and an effort-based decision-making task while recording brain dynamics with EEG, that the brain processes reward outcomes for effort differentially when they are earned for oneself versus for others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      There is some concern about the fact that participants report feeling less subjective effort, but also more disliking of tasks, when they were earning rewards for others versus self. The concern is that participants worked with less vigor during other- versus self-benefiting trials and that this may partly account for a key two-way Recipient x Effort interaction on the size of the Reward Positivity EEG component. Of note, participants took longer to complete tasks when working for others. While it is true that, in all cases, participants met the requisite task demands (they pressed the required number of buttons), they did so more sluggishly when earning rewards for others. The Authors argue that this reflects less motivation when working for others, which is a plausible explanation. The Authors also try to rule out this diminished vigor as a confounding explanation by showing that the two-way interaction remains even when including reaction times (and also self-reported task liking) as covariates. Nevertheless, it is possible that covariates do not fully account for the effects of differential motivation levels, which would otherwise explain the two-way interaction. As such, I think a caveat is warranted regarding this particular result.

      We thank Reviewer #1 for the continued positive assessment and for continuing to highlight the caveat regarding the potential influence of differential vigor on the observed RewP interaction effects.

      We agree that a caveat is warranted. As detailed in our previous response (R5), we had already conducted control analyses addressing this concern; however, we acknowledge that these results were not incorporated into the manuscript itself. We have now addressed this by adding the covariate analyses to the Results section, along with an explicit caveat in the Discussion.

      Before describing the specific revisions, we would like to offer a minor clarification: the covariates in our control analyses were trial-by-trial response speed and self-reported effort ratings, rather than task liking ratings as noted in the summary above. Neither response speed nor effort rating predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged. However, as the reviewer rightly pointed out, covariates may not fully capture the effects of differential motivation. Specifically, we have made the following revisions:

      First, we added the covariate control analyses to the Results section: “To rule out the possibility that the differential vigor between self- and other-benefiting trials drove the Recipient × Effort and Recipient × Effort × Magnitude interactions on the RewP, we conducted two control analyses by including trial-by-trial response speed and subjective effort ratings as separate covariates in the RewP model. Neither response speed (b = -0.07, p = .641) nor effort rating (b = 0.10, p = .186) predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged (see Supplementary Table S3 for full regression estimates)” (page 12, para. 1).

      Second, we added a caveat to the Discussion section acknowledging this alternative explanation, which reads, “Another concern is that participants exhibited less vigor when working for others, as indicated by slower response speed and lower subjective effort ratings for other- versus self-benefiting trials. Although our control analyses confirmed that neither covariate predicted RewP amplitudes and the critical interactions remained significant, covariates may not fully capture the effects of differential motivation, and this alternative explanation cannot be entirely ruled out” (page 22, para. 2, lines 9–12; page 23, para. 1).

      Reviewer #2 (Public review):

      Summary:

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      Strengths:

      An important strength of the study is that amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      We sincerely appreciate Reviewer #2’s positive evaluation of our manuscript and thank the reviewer for recognizing the strength of our experimental design and analysis approach.

    1. Reviewer #2 (Public review):

      Summary:

      The manuscript by Sajid et al. describes a comprehensive behavioral, imaging, and optogenetic dataset investigating the role of the mPFC in avoidance and escape behaviors. Although many movement- and task-related variables are encoded by mPFC GABAergic neurons, the main conclusion is that they are unlikely to control behavioral output.

      Strengths:

      The manuscript is generally well executed and plausible in its conclusions. It provides an alternative viewpoint to many articles describing the involvement of mPFC in behavior, based on a complex multi-stage behavioral paradigm acquired and analyzed in an unbiased way.

      Weaknesses:

      This reviewer sees three main weaknesses.

      (1) There are few details on the linear mixed models in the methods. This section could be improved by including a mathematical description. More importantly, the reader never learns how accurately the models capture the data. Given that most conclusions rely on the models, it seems central to address this point carefully. For example, what is the explained variance, marginal, and conditional? Were the nested models compared to non-nested ones (e.g., AIC), what are the specific outputs of the likelihood ratio tests briefly mentioned in the methods?

      (2) For several figures, there is a disconnect with the main text, in the sense that it is difficult to understand how statements in the main text connect with specific figure panels or bars in their graphs. This is particularly the case for the most complex figures, e.g., Figures 3, 4, and their supplements. It would be beneficial to introduce subfigure labels (A1, etc.) and state explicitly in the main text which figure panel is being described (in parentheses). Alternatively, break the figures down into multiple ones to decrease ambiguity. This is important because it will help the reader better assess the strength of the results.

      (3) It does not appear that the code and data used to produce the figures are made available. That would be very beneficial, given the complexity of the analysis and dataset collection procedures. It would also help readers better understand the results and probe their validity.
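      The quantities requested in point (1) can be illustrated with a minimal sketch. It assumes a Gaussian mixed model whose variance components and log-likelihoods have already been estimated elsewhere; the function names and all numeric inputs are hypothetical and are not taken from the manuscript. The R² definitions follow the usual variance-partition form (fixed-effect variance over total for the marginal value, fixed plus random over total for the conditional value), and the likelihood ratio test shown covers only the case of nested models that differ by one parameter and were fitted by maximum likelihood.

```python
import math


def r2_marginal_conditional(var_fixed, var_random, var_resid):
    """Variance-partition R2 for a Gaussian mixed model.

    marginal    = variance explained by fixed effects only
    conditional = variance explained by fixed plus random effects
    """
    total = var_fixed + var_random + var_resid
    return var_fixed / total, (var_fixed + var_random) / total


def aic(log_lik, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_lik


def lrt_one_df(log_lik_reduced, log_lik_full):
    """Likelihood ratio test for nested models differing by ONE parameter.

    For 1 degree of freedom the chi-square survival function has the closed
    form P(X > x) = erfc(sqrt(x / 2)), so no external library is needed.
    """
    stat = max(0.0, 2.0 * (log_lik_full - log_lik_reduced))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(r2_marginal_conditional(var_fixed=2.0, var_random=1.0, var_resid=1.0))
    print(aic(log_lik=-100.0, n_params=3))
    print(lrt_one_df(log_lik_reduced=-102.0, log_lik_full=-100.0))
```

      Reporting these three outputs per model would let a reader judge both how much variance the fixed effects explain and whether the nested structure is justified.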

    1. The new Go codebase was methodically ported from our existing implementation rather than rewritten from scratch.

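      Normally, when upgrading to a new version, people expect the code to be rewritten, but the author indicates that the TypeScript 7.0 Go codebase was ported step by step from the existing implementation rather than started from scratch.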

    1. The AI has learned to code. The AI is building itself.

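      Most people regard AI as merely a tool created by humans, one that requires continuous human oversight and improvement. The author proposes that AI already has the capacity to evolve and build itself. This view challenges the conventional understanding of AI as a passive tool and hints at the possibility of technological autonomy, which runs counter to most people's expectations of how AI will develop.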

    1. Author response:

      General Statements

      We thank the reviewers for their careful and supportive reviews of our manuscript. We have addressed all the reviewers' comments and extensively revised the manuscript accordingly.

      During our revisions, we discovered a bug in the code that calculated the linear genomic distance between the captured promoter regions (bait regions) and the promoter-interacting fragments (PIFs). The error inadvertently halved the distance measurements in the output tables. This has been corrected in the revised manuscript and has resulted in updates to Figure 1B and corrected values in the ‘interaction_distance’ and/or ‘interaction_type’ columns of Supplementary Tables 2, 3, 6 and 8. We thank the reviewers for the opportunity to correct this.
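      As an illustration of the kind of calculation involved, the sketch below computes a linear genomic distance between a bait fragment and a PIF. It is not the authors' code: the midpoint-to-midpoint definition, the half-open coordinate convention, and the names `Fragment` and `interaction_distance` are all assumptions made for this example.

```python
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    """A restriction fragment (hypothetical representation)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def midpoint(fragment: Fragment) -> int:
    """Integer midpoint of a fragment."""
    return (fragment.start + fragment.end) // 2


def interaction_distance(bait: Fragment, pif: Fragment) -> Optional[int]:
    """Linear distance in bp between fragment midpoints.

    Returns None for trans interactions (different chromosomes), where a
    linear distance is undefined.
    """
    if bait.chrom != pif.chrom:
        return None
    return abs(midpoint(pif) - midpoint(bait))


if __name__ == "__main__":
    bait = Fragment("chr1", 1_000, 2_000)
    pif = Fragment("chr1", 51_000, 52_000)
    print(interaction_distance(bait, pif))
```

      A simple regression check of this kind, comparing the computed distance against a hand-calculated value for a few known fragment pairs, is one way to catch a scaling error in the output tables.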

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      In this article, the authors conducted promoter-capture HiC experiments (pcHiC) in mouse cerebellar granule cell progenitors (GCps) and obtained a good map of 3D genome interactions for the promoters of protein-coding genes. This dataset was later integrated with ATAC-seq and ChIP-seq experiments to identify putative enhancer regions within promoter-interacting regions, at a higher base-pair resolution than pcHiC experiments alone provide. This set of enhancers is then compared to, and presented as being more reliable than, those present in the VISTA enhancer database. In addition, ATAC-seq sites and RNA-seq datasets, both obtained in WT and CHD7 KO conditions, are integrated to correlate the expression of a set of genes with the chromatin accessibility of their distal enhancer(s), which is believed to be promoted by CHD7. The study is completed by a transcription factor motif analysis of CHD7-regulated enhancers, which shows an enrichment for proneural transcription factors, with special emphasis on Atoh1, found to be frequently co-recruited with CHD7. Data and methods are well detailed and correctly replicated and will be useful as a resource for the community. The overlap obtained between pcHiC experiments, which the authors themselves criticize, is very common and expected in this kind of experiment. In general, the conclusions drawn in the article are convincing, but some aspects, such as the comparison to VISTA and the naming of 'enhancers', should be moderated.

      We thank the reviewer for their positive and constructive comments. We have amended the manuscript as indicated in detail below.

      (1) The comparison of pcHiC-identified enhancers vs. VISTA enhancers should be more balanced, as the two approaches have important conceptual differences. Although VISTA enhancers are based on functional annotation, their target genes might not necessarily be correctly assigned based on the distance. On the other hand, putative enhancer regions identified by pcHiC experiments do not rely on functional testing. So both types of information are useful but can be put in perspective.

      We thank the reviewer for making this point. We have amended the text to present a more balanced view e.g. “Using VISTA-designated hindbrain enhancers as an example, we identify the genes most likely regulated directly by these enhancers and update their annotation accordingly.”

      (2) To increase the strength of the paper, it would be preferable that the authors include simple functional enhancer assays (e.g. CRISPR deletion of the contacting enhancer, luciferase assay) to support their perspective, since 3D conformation information in the KO condition is lacking in the article. Although ideally these experiments should be performed in the KO context for a full demonstration, it would be acceptable to at least include a simple functional assay in the WT context to demonstrate that the regulatory regions obtained by crossing genomic data are real enhancers. This point is even more critical knowing that enhancers lacking classical histone marks (H3K27ac+H3K4me1) have been described. The same comment applies to promoter-interacting fragments lacking these marks, which could be missed enhancers (i.e., enhancers without these marks).

      To address this point, we performed luciferase assays to show that putative enhancers identified with our integrated bioinformatic approach (pcHi-C + ATACseq + H3K4me1 + H3K27ac) do indeed exhibit enhancer activity. For these experiments, we tested these putative fragments in an immortalized cell line, SHH-NPD, a GCp-derived cell line generated by the Fults laboratory (Jenkins et al. 2014). The results of these experiments are included as Suppl. Fig. 1 in the revised manuscript.

      Minor point

      - Figure 5B is lacking labels.

      We apologise for this oversight – labels have now been added.

      Reviewer #1 (Significance):

      This article, when completed with possible revision, will be useful for the community as a resource of experimentally determined putative enhancers in cerebellar granule cell progenitors. It also provides some insights into the association of CHD7 and Atoh1 in distal regulation in these cells.

      We thank the reviewer for acknowledging the significance of our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, the authors aim to identify active, long-range regulatory interactions in cerebellar granule cell progenitors (GCps). As such, the authors perform promoter capture Hi-C to map long-range interactions for all gene promoters, using cells isolated from P7 mouse brain samples. While the resolution of these maps is limited by the relatively large fragment sizes generated from a 6-bp cutter, the authors combine these interactions with other available published datasets, including from their own previous work, (e.g. ATAC-seq and ChIP-seq) to more precisely map putative enhancers within the long-range interacting regions of captured promoters. The paper further focuses on the importance of transcription factor Atoh1 and chromatin remodeler CHD7 in regulation of these putative enhancers in GCps. The authors suggest a direct interaction between CHD7 and Atoh1 by overexpression and co-immunoprecipitation in human embryonic kidney cells.

      As stated by the authors, this study represents a valuable resource for researchers interested in the identification of enhancers in GCps cells, and their linked target genes. While broadly descriptive, the study does highlight some gene loci of interest and of biological relevance. For example, through integration of previously published datasets, the study resolves which putative regulatory elements at the Reln locus may regulate its activity.

      We thank the reviewer for their supportive comments.

      We provide a summary of our major and minor comments here.

      Major comments:

      (1) The main take-home messages of the manuscript could be more clearly stated in the introduction to help readers understand the main conclusions of the work.

      We have added a sentence to the Introduction to clarify the key take-home messages:

      “We report putative distal regulatory elements for >12,000 genes, identify CHD7- and Atoh1-regulated enhancer elements and show that these factors interact and likely co-regulate the expression of key genes in the GCp lineage.”

      (2) In the discussion, a previous Hi-C dataset is referred to "Reddy et al. annotated 5,175 promoter-enhancer interactions in GCps using Hi-C without enrichment (Reddy, Majidi et al. 2021)." It would be beneficial to compare the interactions identified previously with the current study (5,175 vs 46,428 interactions).

      To address this comment, we have performed an additional analysis and include text, Suppl. Figure 3 and Suppl. Table 13 to demonstrate the extent to which the two datasets compare, overlap and diverge. We have also added text to the discussion to highlight the differences and technical considerations between the two approaches and how they complement each other.

      The 5,174 enhancer-promoter (E-P) interactions identified by Reddy et al. were downloaded and intersected with the 46,428 promoter-accessible PIF regions identified in our study. The new Suppl. Figure 3A illustrates that 82% (843/1207) of the genes for which Reddy et al. identify long-range interacting regions are represented in our pcHiC dataset. Our pcHiC data contains information on distal interacting regions and potential enhancer regions for an additional 11,511 protein-coding genes. Suppl. Figure 3B provides an overview of the Reddy et al. E-P interactions that are, and are not, identified in the pcHiC. We replicate 38% of Reddy et al.’s E-P findings, whilst 53% of the 3229 interactions unique to the Reddy data would not be detected in the pcHiC data for technical reasons resulting from the capture design and analysis protocol. For the remaining interactions that are specific to the Reddy data, we identify other distal regions interacting with those same promoters. Suppl. Table 13 details the full comparison of Reddy’s E-P interactions that are found within our dataset.

      The differences between the two datasets, and the increased number of interactions detected in the pcHiC dataset, likely result from the increased enrichment for the captured promoters, which enables the detection of interactions that would have been below the detection threshold for the HiC study. In addition, there are notable differences in the analysis strategies for the two datasets, which also contribute to differences in the detection of regions. Reddy et al. binned the HiC data into 10 kb regions to identify interacting regions and subsequently used chromatin marks to identify possible enhancer and promoter regions within these large regions. In contrast, we have used pcHiC and the CHiCAGO algorithm to identify individual HindIII restriction fragments that interact with targeted promoter regions (PIFs), and prioritised those that contain accessible regions. These could represent various types of regulatory regions, such as enhancers, CTCF sites or facilitator regions, independent of their chromatin mark composition, rather than focusing solely on enhancers.
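      The intersection step described above can be sketched as a simple interval-overlap count. This is an illustrative stand-in, not the authors' pipeline: the half-open `(chrom, start, end)` tuples, the function names and the toy coordinates are assumptions, and a production analysis would more likely use an indexed tool rather than this quadratic scan.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def count_recovered(query, reference):
    """Count query intervals that overlap at least one reference interval.

    Both arguments are iterables of (chrom, start, end) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))

    hits = 0
    for chrom, start, end in query:
        candidates = by_chrom.get(chrom, ())
        if any(overlaps(start, end, s, e) for s, e in candidates):
            hits += 1
    return hits


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    published = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    pifs = [("chr1", 150, 300), ("chr2", 200, 400)]
    recovered = count_recovered(published, pifs)
    print(f"{recovered}/{len(published)} published regions overlap a PIF")
```

      Note that binned regions (such as 10 kb bins) and restriction fragments differ in size, so the overlap criterion chosen here (any shared base) is itself a design decision that affects the reported percentage.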

      (3) The authors identify an overlap with some of their identified enhancers with those from VISTA. Is this a fair comparison seeing as the enhancer reporters were tested during early embryonic development (e.g. E11.5 and E13.5) and seen to be active in the hindbrain, would these stages be relevant to GCps from P7? Can the authors identify ATAC-seq for example from hindbrain from embryonic stages and determine if the enhancer accessibility profile looks similar to that for the P7 GCps cells?

      We thank the reviewer for this important question regarding the developmental relevance of our VISTA comparison and acknowledge that direct comparison between the time points requires careful consideration. Firstly, to address the question of how similar the chromatin accessibility profiles are between the embryonic and P7 timepoints, we compared the ATAC-seq data from our paper to ENCODE data from the hindbrain. Of the 140 VISTA enhancers that were intersected with the pcHi-C dataset, 119 were identified from the lacZ studies as active in the hindbrain at E11.5, whilst 21 were identified as active at E12.5. We compared ENCODE ATAC-seq peaks from the E11.5 (ENCFF743IYX) and E12.5 (ENCFF198TLF) hindbrain to the GCps from P7, both across the entire genome (global accessibility) and specifically within +/- 3MB of the VISTA enhancer regions in the PIFs from the pcHiC, to assess the conservation of local accessibility profiles.

      When looking at the global accessibility profile of embryonic hindbrain versus P7 GCps across the whole genome there was a large degree of overlap with ~85% (E11.5) and ~88% (E12.5) of all ENCODE ATAC peaks overlapping with accessible ATAC summit regions from P7 GCps:

      Author response image 1.

      To identify whether this was consistent in the immediate chromatin environment of the VISTA enhancers themselves, we compared the accessibility profiles across timepoints in the local environment surrounding the VISTA enhancers. This local environment was defined as a region that added 3MB on either side of all VISTA enhancer positions found in PIFs. 3MB was chosen because the longest interaction found for a single VISTA element was approximately 2.7MB. Consistent with the global analysis, a similarly high level of overlap of accessible regions between the timepoints was found for the local chromatin environment surrounding the VISTA enhancers that were found within PIFs in the pcHiC dataset, with ~87% (E11.5) and ~89% (E12.5) of ENCODE-detected peaks overlapping with accessible ATAC summit regions from P7 GCps.
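      The local comparison can be sketched as two steps: build flanking windows around each enhancer, then compute the fraction of peaks inside those windows that overlap a summit region. The code below is a hedged illustration only; the function names, the half-open `(chrom, start, end)` tuples and the toy coordinates are assumptions, and the authors' actual overlap criteria are not stated in the response.

```python
def flank_windows(enhancers, flank):
    """Extend each (chrom, start, end) by `flank` bp per side, then merge overlaps."""
    extended = sorted((c, max(0, s - flank), e + flank) for c, s, e in enhancers)
    merged = []
    for chrom, start, end in extended:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous window on the same chromosome.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def _hits(region, others):
    """True if `region` overlaps any interval in `others` (half-open coordinates)."""
    chrom, start, end = region
    return any(chrom == oc and start < oe and os_ < end for oc, os_, oe in others)


def local_overlap_fraction(peaks, summits, windows):
    """Among peaks inside any window, the fraction overlapping at least one summit."""
    local = [p for p in peaks if _hits(p, windows)]
    if not local:
        return 0.0
    return sum(_hits(p, summits) for p in local) / len(local)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    enhancers = [("chr1", 5_000_000, 5_001_000), ("chr1", 9_000_000, 9_001_000)]
    windows = flank_windows(enhancers, flank=3_000_000)
    peaks = [
        ("chr1", 2_500_000, 2_500_500),
        ("chr1", 7_000_000, 7_000_500),
        ("chr1", 20_000_000, 20_000_500),
    ]
    summits = [("chr1", 2_500_100, 2_500_101)]
    print(windows)
    print(local_overlap_fraction(peaks, summits, windows))
```

      Merging the windows first avoids counting the same peak twice when neighbouring enhancers have overlapping flanks.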

      Author response image 2.

      Regions +/-3MB of VISTA enhancers in PIFs

      Author response image 3.

      Regions +/-3MB of VISTA enhancers in PIFs

      Genome browser shots at the three example VISTA loci from Figure 1 further support this approach. In addition, we note that a recent study by Chen et al. (2024; https://www.nature.com/articles/s41588-024-01681-2), in which capture Hi-C was performed at E11.5 on 935 VISTA enhancers across multiple tissues, confirmed that the majority of VISTA enhancer regions (61%) bypass adjacent genes, which is consistent with our nearest-gene comparison.

      (4) The co-IP experiment appears to support the conclusion that Atoh1 and CHD7 can interact, however there are bands in lanes where there should not be (i.e. Input lanes 1 and 4 for FLAG blot). It would be recommended to repeat this result at least once. [Expected time 2-4 weeks].

      This experiment has been repeated 3 times with the same result. It is normal for non-specific background bands to appear on Western blots from total cell lysates (inputs), as most antibodies have significant cross-reactivity. The anti-FLAG antibody clearly detects bands above background in lysates where FLAG-tagged CHD7 is expressed. Most critically, despite the presence of non-specific bands in the input, FLAG-tagged CHD7 is only detected in immunoprecipitated samples where either (i) FLAG-tagged proteins have been precipitated and FLAG-tagged CHD7 is expressed, or (ii) HA-tagged Atoh1 has been precipitated and both FLAG-tagged CHD7 and HA-tagged Atoh1 are expressed.

      (5) The methods section describes analysis of several datasets, however we could not access the code at the time of review. Do the authors intend to make this code available at the time of publication?

      Yes, once the publication is approved, all code will be made available along with conda environment YAML files to replicate the software environment in which the analysis was performed.

      (6) Page 7 "replicate one and two, respectively". Can the authors clarify the number of biological replicates performed for pcHi-C?

      Two biological replicates were performed for pcHiC, which were then bioinformatically combined into a ‘superset’ for CHiCAGO interaction calling, as is standard practice for pcHiC data (see e.g. Cairns et al., 2016). We have revised the text to make this clearer.

      Minor comments:

      (1) Page 3 "controlling the expression of 577 genes in GCps" - the authors do not provide evidence that these enhancers control gene expression directly, this should be reworded.

      Thank you. We have reworded to: “contacting the promoters of 577 genes” to indicate that these were identified using pcHi-C and not functional assays.

      (2) Page 5 "where transient amplifying divisions exponentially expand GCps" - at what stages of embryonic/postnatal development are GCps first detected, and when do they amplify and then differentiate?

      GCps that form the EGL are specified in the rhombic lip from E13.5 (Machold, 2005 and Wang, 2005) and a clear EGL can be observed in the cerebellar anlage from E14 (Ben-Arie, 1997) of development. They amplify from this stage and differentiation, induced by neurogenic factors like NeuroD1 is visible from P0 onwards (Miyata, 1999). We have amended the text to include this additional information: “GCps that form the EGL are specified in the rhombic lip from E13.5 (Ben-Arie et al, 1997; Machold & Fishell, 2005) and a clear EGL can be observed in the cerebellar anlage from E14 (Ben-Arie et al., 1997) of development. They amplify from this stage and differentiation, induced by neurogenic factors like NeuroD1 is visible from P0 onwards (Miyata et al, 1999).”

      (3) Page 7 "identified 164,387 unique and significant interactions" - how is an interaction defined, a single read, or evidenced by a certain number of reads. "promoter interacting fragments or PIFs" - is PIF referring to a single read evidencing an interaction?

      An interaction is defined by the CHiCAGO algorithm. The number of reads needed to score an interaction depends on the distance of the PIF from the promoter (modelled using a distance-dependent component that accounts for the decay of contact frequency with genomic distance) and also on a component that models how the sequence or other technical artifacts might influence the capture bias of some sequences compared to others. For each promoter, a background model is generated of the expected number of reads that would be captured based on the above considerations, and if the number of reads for a region exceeds this background model by a certain threshold, the interaction is deemed significant using a p-value-like score. In practice, this means that regions further from the promoter will often require fewer reads to signify a significant interaction than regions that are much closer to the promoter. The significant PIFs in the dataset are all evidenced by a minimum of 3 reads in at least one biological replicate. We have included a short explanation of this in the methods of the revised manuscript for clarity.

      The maximum reads in a single replicate library for a specific PIF was 1557, and the median number of reads per PIF was 17.
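      The intuition that distal fragments need fewer reads than proximal ones can be shown with a toy background model. The sketch below is a deliberately simplified Poisson stand-in and is not CHiCAGO's actual statistical model or scoring scheme: the power-law decay, the `scale` and `decay` values, the significance cut-off and all function names are assumptions chosen only to make the distance dependence visible.

```python
import math


def expected_reads(distance_bp, scale=5.0e5, decay=1.0):
    """Toy background: expected read count falls off as a power law of distance."""
    return scale / (distance_bp ** decay)


def poisson_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    cdf = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def min_reads_for_call(distance_bp, alpha=1e-3):
    """Smallest read count whose tail probability under the toy model is below alpha."""
    mean = expected_reads(distance_bp)
    k = 1
    while poisson_tail(k, mean) >= alpha:
        k += 1
    return k


if __name__ == "__main__":
    for distance in (10_000, 100_000, 1_000_000):
        print(distance, round(expected_reads(distance), 2), min_reads_for_call(distance))
```

      Under these assumed parameters, the expected background count shrinks with distance, so the read count needed to exceed it shrinks as well, which mirrors the behaviour described in the response.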

      (4) Page 8. What is the distinction between PIFs and "promoter interacting regions (PIRs)"? These could be better defined in the text.

      Thank you for picking up this discrepancy; we were using PIR and PIF interchangeably. We have amended the manuscript to refer to PIFs consistently throughout.

      (5) Figure 1C-F. Labels "Random" and "PIFs" don't line up well with the two bars.

      Thank you, this has been corrected.

      (6) Page 9. Could the authors show some representative images for the "VISTA hindbrain enhancers" (e.g. for Figure 1I-K).

      We have inserted representative images showing in vivo activity of these enhancers in mouse embryos from the VISTA enhancer site.

      (7) Fig 2G, Page 11 "The 12,354 genes that were linked to a PIF containing an ATAC-seq peak were found to have a higher median expression level than the 2,049 genes that had PIFs that did not coincide with ATAC-seq peaks" - is this significant?

      Apologies for this oversight. We have performed a two-sided t-test on the log-transformed TPMs between the two groups and have included the significance in the revised figure (p = 1.8e-40).
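      A comparison of this kind can be sketched with the standard library alone. The response does not say whether equal variances were assumed, which pseudocount was used, or which software ran the test, so the Welch (unequal-variance) form, the `log2(TPM + 1)` transform, the normal approximation to the p-value (reasonable only for large groups such as those here) and all names below are assumptions for illustration.

```python
import math
from statistics import mean, variance


def log_tpm(values, pseudocount=1.0):
    """log2-transform TPM values; the pseudocount avoids log(0)."""
    return [math.log2(v + pseudocount) for v in values]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value using a normal approximation to the t distribution.

    erfc stays accurate in the far tail, where 1 - cdf would round to zero.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical TPM values, for illustration only.
    with_atac = log_tpm([3, 7, 15, 31])
    without_atac = log_tpm([0, 1, 3, 7])
    t, df = welch_t(with_atac, without_atac)
    print(t, df, two_sided_p_normal(t))
```

      With only a handful of values per group, as in this toy input, the normal approximation is too optimistic and the exact t distribution should be used instead.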

      (8) "Gene Ontology analysis of genes with accessible PIFs revealed a significant enrichment for 119 biological processes" - can you include the GO terms in a supplementary table? Is there a way to prioritise down the 12,354 genes to a shorter more significant list of genes, this seems a long list to include in GO analysis.

      We have included a supplementary table with this data in the revised manuscript (Suppl. Table 6). We included all 12,354 genes in this analysis as the point of this analysis was to demonstrate that developmental processes are enriched in the PIFs with accessible chromatin, compared to the genes where only PIFs without ATAC were identified.

      (9) Page 11 - "The chromatin remodelling factor CHD7 is essential for normal expansion of GCps in the postnatal mouse cerebellum (Whittaker et al., 2017b) and deletion of Chd7 from GCps results in striking cerebellar hypoplasia and polymicrogyria (Feng et al., 2017; Reddy et al., 2021; Whittaker et al., 2017b). CHD7 haploinsufficiency is also sufficient to cause cerebellar hypoplasia and foliation defects both in mouse models and in the context of CHARGE syndrome in humans (Whittaker et al, 2017a; Yu et al, 2013)." - this appears more suitable for the introduction.

      Thank you, we have moved this text to the Introduction.

      (10) Page 12 "the majority of which (4,663/5,369) displayed decreased accessibility when Chd7 is depleted". This was difficult to understand initially - which are expected to be the direct effects? Increased or decreased accessibility? Perhaps it would be better to focus only on the decreased accessibility sites?

      We have previously shown that the majority of differentially accessible regions in Chd7-deficient GCps show decreased accessibility. Chromatin remodelling by CHD7 could conceptually reduce or increase accessibility of a particular locus and the only way to infer direct effects are by identifying regions to which CHD7 is recruited.

      Approximately 9% of the sites that decreased in accessibility overlapped with regions bound by CHD7 (464/4663), whilst approximately 2% of the sites that increased in accessibility overlapped with regions of CHD7 binding (14/706). Whilst it is likely that the majority of directly regulated sites decrease in chromatin accessibility when CHD7 is removed, a small number of sites do increase in accessibility, and these should be included for completeness.

      (11) The analysis in Fig 3A reveals that only a small number of CHD7-bound enhancers show differential accessibility and altered linked gene expression upon CHD7-knock down. This requires a little more discussion - why do so many sites change in accessibility compared to the number of sites which change accessibility or are associated with gene expression change?

      Identifying CHD7-regulated enhancers is challenging, mostly due to the inefficiency of CHD7 ChIP-seq. The low quality of available CHD7 ChIP-seq data has made it particularly difficult to identify CHD7 peaks. However, the integration of this data with ATAC-seq accessibility, chromatin modification and pcHi-C data has allowed us to identify a subset of enhancers that are most likely directly regulated by CHD7. However, given these technical limitations, we would be hesitant to conclude from the present data that the majority of chromatin accessibility changes in enhancers in Chd7-deficient GCps are indirect. We have added the following text to the discussion to indicate this: “Identifying CHD7-regulated enhancers is challenging, mostly due to the inefficiency of CHD7 ChIP-seq. The low quality of available CHD7 ChIP-seq data has made it particularly difficult to identify CHD7 peaks. However, integrating CHD7 ChIP-seq data with ATAC-seq accessibility, histone modification ChIP-seq and pcHi-C data has allowed us to identify a subset of enhancers that are most likely directly regulated by CHD7. However, given these technical limitations, we would be hesitant to conclude from the present data that the majority of chromatin accessibility changes in enhancers in Chd7-deficient GCps are indirect, as suggested by the data in Fig. 3A.”

      (12) Page 12 - "Over-representation analysis confirmed an enrichment of genes linked to nervous system development" - could this and the GO term analysis be included in a supplementary figure?

      We have included these results as Suppl. Table 7 in the revised manuscript.

      (13) Fig 3D - what does the arrow represent in the chromatin schematic?

      The arrow in the schematic indicates chromatin remodelling – we have clarified this in the figure legend and added headings to these panels to indicate the 3 different types of elements: Direct CHD7 targets, Indirect targets and CHD7-bound elements.

      (14) Fig 3G does not appear to be referenced in the text. The value of the Upset plots in the main figure 3 wasn't very clear, perhaps these could be moved to the supplement? Is there a clearer plot to support the conclusion "CHD7 primarily regulates enhancers".

      We apologise, the panels were mislabelled in the text. This has now been corrected. We hope that the amendments in response to point 13 above now clarify these findings, showing that direct CHD7 targets are characterised by active enhancer marks.

      (15) Page 14 "putative consensus sites for proneural bHLH TAL-family of proteins Neurog2, Neurod2, Neurod1, and, Atoh1 in elements" - HOCOMOCO motifs are only shown for Atoh1 and Nhlh1. It may be valuable to show the sites for all the listed TFs. What does white represent in the heatmap in Fig 3H? This plot is difficult to interpret, and also relatively small in the figure but appears important to conclusions. Perhaps Fig 3H could be made more prominent?

      Thank you for highlighting that the white boxes might be confusing. The white blocks indicate motifs that do not pass the threshold for significant enrichment in the dataset, based on the p and q values. This has now been clarified in the figure legend.

      We have enlarged panel H to make it more prominent.

      (16) Page 15 - "Myb was the only motif specific to CHD7 bound regions that changed in accessibility compared to those that exhibited accessibility changes without CHD7 binding or CHD7 binding without accessibility changes (Suppl. Fig. 1)." I couldn't interpret this sentence, requires clarifying.

      We agree that this description is confusing and since it is difficult to draw clear conclusions about the significance of enhancers with Myb motifs in this context, we have removed this sentence from the revised manuscript.

      (17) Page 16 and Fig 4B - a discussion of why both up and down regulated genes are detected for Atoh1 depletion? Which class of genes are expected to be directly regulated (the down-regulated genes)?

      Like most transcription factors, ATOH1 may be able to function as both a repressor and an activator depending on the context. Although the majority of genes are downregulated in Atoh1-deficient cells, suggesting that Atoh1 functions as an activator in most cases, our analyses have identified several up-regulated genes that contain Atoh1 ChIP-seq peaks in their cognate enhancers (see Suppl. Table 7), consistent with these also being direct Atoh1 targets.

      (18) Fig 5B - the genomic traces are not labelled in this figure.

      Thank you, labels have been added.

      (19) Page 17 - "Pathway enrichment analysis of the 22 genes compared to all genes that were expressed in GCps shows a significant enrichment of terms: Hypoplasia of the pons (HP:0012110 P=0.006) and Abnormal pons morphology (HP:0007361 P=0.016) from human phenotype ontology, due to the presence of Reln, Dcc, Mab21l1 and Gli2." - this analysis should be included in the supplementary tables.

      These results have been included as Suppl. Table 12 in the revised manuscript.

      (20) Do the authors have a suggestion for which domains of Atoh1 and CHD7 could be interacting? Could the authors design truncated constructs for overexpression in HEK cells to test this hypothesis? [Expected time 4-6 weeks, interesting but not essential to do experimental work here].

      We agree this is an interesting question. Our collaborator, Professor Peter Scambler (UCL) has performed a yeast two hybrid screen for CHD7 interacting proteins in a mouse E11.5 library using the CHD7 BRK domain (aa 2521-2708) as bait. The screen had a single hit, which encompassed the N-term 127aa of ATOH1 (personal communication). This observation supports our co-IP data and suggests that the N-terminus of ATOH1 interacts with the BRK domain of CHD7 but further validation will be needed to confirm this.

      (21) Page 28 "Differential accessibility analysis was performed using DESeq2 (v 1.22.1)" and Page 19 "Whereas chromatin accessibility at some of these enhancers were affected by Chd7-deficiency" - what were the cutoffs used for looking at differentially accessible regions? Complete loss of accessibility or a quantitative change?

      Quantitative change rather than complete loss was used. Thresholds based on adjusted p-values (padj<0.05) were used as indicated in the methods.
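
      As an illustration of what a padj-based cutoff means in practice, the sketch below classifies regions from a DESeq2-style results table. The rows are invented for illustration; only the column names `log2FoldChange` and `padj` and the 0.05 cutoff come from DESeq2 conventions and the response above. This is not the authors' pipeline.

```python
# Hypothetical rows mimicking a DESeq2 results table; the values are invented.
regions = [
    {"region": "peak_1", "log2FoldChange": -1.2, "padj": 0.001},
    {"region": "peak_2", "log2FoldChange": 0.8, "padj": 0.030},
    {"region": "peak_3", "log2FoldChange": -0.4, "padj": 0.200},
    {"region": "peak_4", "log2FoldChange": -3.0, "padj": None},  # padj reported as NA
]

PADJ_CUTOFF = 0.05  # threshold stated in the response

def classify(row, cutoff=PADJ_CUTOFF):
    """Label a region by the direction of a significant quantitative change."""
    padj = row["padj"]
    if padj is None or padj >= cutoff:
        return "unchanged"
    return "increased" if row["log2FoldChange"] > 0 else "decreased"

calls = {row["region"]: classify(row) for row in regions}
print(calls)
```

      Note that a region is called on the adjusted p-value alone, so a modest but consistent change can qualify while a large change with a missing padj does not.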

      Requested comments on referencing:

      - "Long-range" - how do the authors define long-range? Can this be referenced? (CO? A good reference here: look to the CHiCAGO paper.)

      - "When chromatin conformation or 3D organisation data is not available, studies typically assign regulatory elements to the nearest gene promoter" - needs referencing.

      - "Many of these 22 genes regulated by CHD7 and Atoh1 have established critical roles in cerebellar development, including Neurod2, Pax6 and Gli2 (Fig. 5B)" - needs referencing. "from human phenotype ontology, due to the presence of Reln, Dcc, Mab21l1 and Gli2" - needs referencing.

      Thank you, references have been added.

      - "active enhancers (H3K27ac+, H3K4me1+), promoters (H3K27ac+, H3K4me3+), regulatory elements (H3K27ac+, H3K4me1+, H3K4me3+), or poised enhancers (H3K4me1+)" - needs referencing.

      Thank you, references have been added.

      - Reference required in main text for VISTA (e.g. Visel et al., 2007)

      Thank you, reference added.

      Reviewer #2 (Significance):

      The strengths of this manuscript are the integrated approach to identify cell-type specific enhancers utilizing available epigenomic datasets, and leveraging 3D genome topology to directly link them to their target genes. For example for the Reln gene previously implicated in cerebellar phenotypes for CHD7 mutants. The pcHi-C dataset generated in this study provides a valuable reference for the community of enhancer-promoter pairs for a specific cell-type of interest with human disease relevance.

      We thank the reviewer for recognising the potential value of our work to the community.

      The limitations of the study are partially addressed in the text by the authors, including the resolution from the pcHi-C using a 6-bp cutter, the limitation of sequencing depth (more interactions may have been identified with more depth), and the limited correlation between replicates (likely due to undersampling the library). Page 9 "some additional interactions with the nearest gene promoters might be identified in our pcHi-C dataset with deeper sequencing".

      We thank the reviewer for highlighting our acknowledgements of the potential limitations of our work.

      Additional limitations include the use of the VISTA browser mouse LacZ embryos to validate some of their enhancers, the limitation here being that the VISTA browser tests enhancers at embryonic stages (focused at E11.5 and E13.5) while the GCps cells were collected at P7. The LacZ images from VISTA are also not shown. The HEK cells used for the co-IP could be seen as a limitation as these are not relevant cells for the cell state studied, the authors could clarify their use of these cells.

      We thank the reviewer for their careful assessment of the limitations of our study. We have now included images of the VISTA enhancers in Fig. 1I,J,K. Rather than a limitation, using irrelevant cells for co-IP might be seen as a better approach, as conceivably the chance of an indirect interaction between the two proteins being mediated by a bridging complex is lower in an irrelevant cell type that might not contain such complexes. Either way, HEK293T cells are the standard laboratory model for co-IP studies as they can be transfected with ease.

      The study reported here is largely based on previous work from the authors (Whittaker et al 2017b). This study reported that the chromatin remodelling factor CHD7 is essential for normal expansion of GCps in the postnatal mouse cerebellum and deletion of CHD7 from GCps resulted in the phenotype of cerebellar hypoplasia. This study also largely leverages previously published datasets from the Whittaker et al 2017b (e.g. CHD7 deletion data) and reanalyses it in the light of the new pcHi-C datasets.

      This manuscript will be of interest to researchers interested in analysing long-distance targets of as well as researchers trying to understand the precise gene regulation in cerebellar development. It may also be of interest to clinical geneticists to interpret novel putative non-coding disease mutations.

      We thank the reviewer for highlighting the wide interest of our manuscript.

      In assessing this manuscript, my expertise lies in models of human development and gene regulation, with a focus on enhancer function.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Riegman et al have explored the gene regulatory landscapes of cerebellar granule cell progenitors (GCps). They have generated promoter capture Hi-C data to identify regions that interact with promoters in these cells. In addition they generate ATACseq data in wild-type and CHD7 knock-out cells. They integrate these data to identify enhancers that potentially regulate genes in GCps. In addition, the authors identify an interaction between CHD7 and ATOH1, whose binding sites also overlap in the genome.

      The dataset can be potentially interesting for people studying cerebellar development.

      I have a few concerns regarding the paper. The most pressing one is that the authors seem to equate interactions in pcHi-C with regulation. This is problematic for two reasons. First whether interaction equates regulation is still debated and whether this can be detected with a low-resolution C-method (i.e. using HindIII) is a further point of contention.

      We thank the reviewer for pointing this out. We agree and apologise for not being clear in our manuscript. We have made the necessary amendments to indicate that pcHi-C by itself only assesses proximity in the nucleus, not function.

      We acknowledge the limitations of the pcHi-C method, including that resolution is limited by the use of a restriction enzyme. However, we (see e.g. Suppl. Fig. 1) and others (see e.g. Freire-Pritchett et al (2017) and Mifsud et al (2015)) have used this approach successfully to identify functional enhancer elements.

      The second issue has to do with the way the pcHi-C data is interpreted. What is detected as a significant interaction by Chicago are regions that have a contact frequency above background. This means that local regions with a (much) higher contact frequency may not be called as significant. When we follow the logic that contact frequency is related to gene activation (which may not necessarily be true) whether a fragment is more frequently contacted than the background should not matter (relative contact frequency), rather it should be interpreted based on the absolute contact frequency.

      The reviewer is right that local regions will have a higher contact frequency and that local contacts aren’t always captured by the CHiCAGO model. However, the purpose of this study was to prioritise the identification of distal elements that are not captured by existing methods including nearest gene annotation.

      There are a number of reasons why absolute contact frequency might not be an appropriate measure to infer gene regulation: 1) Many factors can affect the absolute contact frequency, including the proportion of cells that are exhibiting active transcription at that time across a population, especially if expression is limited to a small number of cells in this population at that time. 2) Absolute contact frequency assumes that more contact results in more regulation, which is not necessarily true and would depend on the combination of factors that are associated with that regulatory element. Figure 1 (Micro Capture-C) from https://www.nature.com/articles/s41596-023-00817-8 shows that regions with low absolute contact frequency compared to adjacent regions have the potential to regulate gene expression, as do other studies that have used CHiCAGO to identify regulatory elements. 3) The sequence of some fragments makes them more likely to be captured or enriched in the Hi-C protocol, which the relative contact frequency above background controls for.
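
      The distinction between absolute and relative contact frequency can be made concrete with a toy calculation. The sketch below assumes a simple power-law decay of expected read counts with distance from the bait; this background function, its parameters and the read counts are all hypothetical, and CHiCAGO's actual statistical model is more elaborate than this.

```python
def expected_reads(distance_bp, scale=2.0e6, exponent=1.0):
    """Toy background: expected reads decay as a power law with distance (assumed)."""
    return scale / distance_bp ** exponent

# Hypothetical fragments: (distance from bait in bp, observed read count).
fragments = {
    "near_fragment": (20_000, 120),
    "far_fragment": (400_000, 40),
}

relative = {}
for name, (distance, observed) in fragments.items():
    # Ratio of observed reads to the distance-matched expectation.
    relative[name] = observed / expected_reads(distance)
    print(f"{name}: observed={observed}, observed/expected={relative[name]:.1f}")
```

      In this toy setting the distal fragment has a third of the reads of the proximal one, yet it stands out far more strongly against what is expected at its distance.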

      This becomes relevant because the authors claim that 80% of enhancers are wrongly annotated based on their metrics. The only way to correctly annotate an enhancer is to knock it out and check the effect on genes in the vicinity. Therefore, to claim that their method can correctly annotate enhancers is grossly overstated, particularly when considering the issues with contact frequency stated above. Therefore, claims like 80% of enhancers are wrongly annotated should be removed from the paper. The authors should discuss how to annotate enhancers in the Discussion, and what the proper method is for annotations.

      We have amended the text to indicate that we do not suggest that VISTA enhancers are wrongly annotated but incompletely assigned. We apologise for making this suggestion in the first draft. There is, however, complementary evidence from Cheng et al (2024), now referenced in the revised manuscript, who also find that 60% of the VISTA enhancers skip their adjacent gene. It is also well established in the literature that the nearest gene is not always the one regulated.

      Other points:

      - The authors claim that PIFs have 2.14- and 2.69-fold enrichment of H3K4me1 and H3K27ac sites. Did the authors use the whole genome as background? If so, they should take into account that promoters are more likely to lie in regions of high gene density, which are more dense in active marks. It would be better to perform local, circular permutation of the PIFs around the promoter.

      The reviewer is correct that a whole genome background is not an appropriate background for testing enrichment of active marks within PIFs. Fortunately, this is taken into account in the CHiCAGO enrichment test which selects the background from fragments that are matched to the same distance of the PIFs to account for the observation that promoters are more likely in regions of high gene density and are therefore more enriched for active chromatin modifications.
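
      The idea of a distance-matched background can be sketched as follows. This is a simplified illustration written for this note, not the CHiCAGO implementation: the function name, the 50 kb bin size and the toy fragments are all assumptions. Each PIF is compared against random fragments drawn from the same distance-to-bait bin, so that enrichment is not inflated by the gene-dense neighbourhood of promoters.

```python
import random

def distance_matched_enrichment(pifs, background, bin_size=50_000,
                                n_draws=1000, seed=0):
    """Compare mark overlap in PIFs with distance-matched random fragments.

    pifs, background: lists of (distance_to_bait_bp, has_mark), has_mark in {0, 1}.
    """
    rng = random.Random(seed)
    pools = {}
    for distance, has_mark in background:
        pools.setdefault(distance // bin_size, []).append(has_mark)

    # Keep only PIFs that have at least one background fragment in their bin.
    usable = [(d, m) for d, m in pifs if d // bin_size in pools]
    observed = sum(m for _, m in usable)

    # For each draw, replace every PIF by a random fragment from the same bin.
    totals = []
    for _ in range(n_draws):
        totals.append(sum(rng.choice(pools[d // bin_size]) for d, _ in usable))
    expected = sum(totals) / n_draws
    fold = observed / expected if expected else float("inf")
    return observed, expected, fold

# Toy data: every PIF carries the mark, half of the matched background does.
toy_pifs = [(10_000, 1), (20_000, 1), (30_000, 1), (40_000, 1)]
toy_background = [(15_000, 1), (25_000, 0), (35_000, 1), (45_000, 0)]
observed, expected, fold = distance_matched_enrichment(toy_pifs, toy_background)
print(observed, round(expected, 2), round(fold, 2))
```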

      - The authors talk about "lead PIF", which is the fragment with the "most significant CHICAGO score". What does this mean? Something is significant or not, despite common misuse of the term there is no gradient of significance.

      The reviewer makes a good point here. We apologise for the oversight in wording and have corrected the text to specify that the lead PIF is the one with the highest CHiCAGO score.

      - In the GO analysis the categories with the lowest p-value are presented, but this biases for large categories. It would be more relevant to also select for and show the enrichment scores.

      We agree with the reviewer that a drawback of GO analysis is that it biases for large categories. If by ‘enrichment score’ the reviewer means the -log10(p-value), we have included this in the supplementary tables, which also include the size of each category and the number of genes detected in it.
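
      The reviewer's concern about large categories can be illustrated with a hypergeometric over-representation test. The sketch below uses invented gene counts and is not the authors' analysis; it shows two categories with the same fold enrichment where the larger category receives a much smaller p-value, which is why reporting fold enrichment alongside -log10(p) is informative.

```python
from math import comb, log10

def hypergeom_pvalue(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the category."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def fold_enrichment(k, N, K, n):
    """Observed fraction of hits in the gene list over the background fraction."""
    return (k / n) / (K / N)

N, n = 10_000, 200  # hypothetical background size and gene-list size
categories = {"small_term": (4, 100), "large_term": (40, 1000)}  # (hits k, size K)

results = {}
for name, (k, K) in categories.items():
    p = hypergeom_pvalue(k, N, K, n)
    results[name] = (fold_enrichment(k, N, K, n), p)
    print(f"{name}: fold={results[name][0]:.1f}, -log10(p)={-log10(p):.2f}")
```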

      Reviewer #3 (Significance):

      The study provides a dataset that may be interesting for people studying cerebellar development. In that sense the data is mostly interesting from a fundamental viewpoint. The data seem of good quality.

      The authors claim that a very sizeable fraction of enhancers are misannotated, but I do not believe that this is correct.

      We thank the reviewer for pointing this out. We apologise for creating the impression that VISTA enhancers are incorrectly annotated. We have amended the text to reflect that these are incompletely annotated.

      My expertise is 3D genome, bioinformatics.

    1. security update

      code propagation, dependency management at scale, or sometimes fleet-wide remediation, not only security updates

      A critical CVE lands in the evening, affecting 100 projects. By morning: PR, tests, test link, notification and review.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy-equal route averaging-quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The lack of availability of codes and data for this manuscript, especially given that it critically examines and proposes alternative hypotheses for an important published work.

      We thank the reviewer for their comment. The code and data for our manuscript are an important aspect of the study, and we had intended to make them publicly available upon publication. Our code and data are available on figshare (https://doi.org/10.6084/m9.figshare.28950032.v1). We have now revised the manuscript to include a link to our dataset.

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      We thank the reviewer for their positive comments.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works-they do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages-such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms-it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences). Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      We thank the reviewer for their comment and thoughtful suggestions. We agree that the inherent behavioral mechanisms and the biological basis of these mechanisms cannot be determined from the navigational data alone. For instance, it remains unexplored whether pigeons are adapting their behavior based only on social cues from their partners or using other navigational features such as landmarks or roads, the location of the sun, geomagnetic cues or previously learnt routes. However, we do agree (as also pointed out by the reviewer) that these behavioral rules generate an emergent ‘meta-mechanism’ where the bird pairs behave as if their preferred routes are averaged during a flight. It will be important in future work to explore the biological basis of these mechanisms, but our current approach allows us to describe the mechanisms with any confidence only in a meta sense. Considering this, we believe that our analysis is a more top-down approach towards describing the outcomes of these underlying mechanisms in an abstract sense. We would also like to point the reviewer to Dalmaijer, 2024 [1], who used a bottom-up approach with naive agents and showed that cumulative route improvements emerged in the absence of any sophisticated communication in the same dataset, in agreement with our approach. We have now added a paragraph: “It is also important to clarify that we use the terms…… that lead to these meta-mechanisms arising remain an open question.” found in lines 120-129 in our Introduction to make this clarification.
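
      To make the 'route averaging' meta-mechanism concrete, the sketch below averages two solo routes point by point and compares route efficiency, taken here as straight-line distance divided by path length. It is a toy illustration with invented coordinates, and it assumes both routes have been resampled to the same number of points; it is not the authors' simulation code.

```python
from math import hypot

def path_length(route):
    """Sum of straight segments between consecutive (x, y) points."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(route, route[1:]))

def efficiency(route):
    """Straight-line release-to-home distance divided by the distance flown."""
    (x0, y0), (xn, yn) = route[0], route[-1]
    return hypot(xn - x0, yn - y0) / path_length(route)

def average_routes(route_a, route_b, weight_a=0.5):
    """Point-wise weighted average; weight_a=0.5 is equal route averaging."""
    if len(route_a) != len(route_b):
        raise ValueError("routes must be resampled to the same number of points")
    return [(weight_a * xa + (1 - weight_a) * xb,
             weight_a * ya + (1 - weight_a) * yb)
            for (xa, ya), (xb, yb) in zip(route_a, route_b)]

# Hypothetical solo routes from release site (0, 0) to home (10, 0).
route_a = [(0.0, 0.0), (5.0, 4.0), (10.0, 0.0)]
route_b = [(0.0, 0.0), (5.0, -2.0), (10.0, 0.0)]
pair_route = average_routes(route_a, route_b)

for label, route in [("bird A", route_a), ("bird B", route_b), ("pair", pair_route)]:
    print(f"{label}: efficiency = {efficiency(route):.3f}")
```

      Because the two toy routes deviate on opposite sides of the direct line, their average is straighter than either, without any bird judging which route is better.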

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised-and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

      We thank the reviewer for their comments and interesting questions regarding our study. We agree with the reviewer that our study opens up new avenues for critically analysing the criteria previous studies have used for providing evidence of CCE in non-human animals. In our literature review, we found that the field has usually been motivated to think about CCE in a ‘process’-focused manner (Reindl et al. [2]), with regard to individuals being able to compare strategies and select the ones resulting in higher individual fitness. This preferential selection of strategies, termed innovations, allows for the stereotypical ratcheting effect seen in CCE. In our study, we propose that in the case of homing pigeons, the ratcheting effect is more of a statistical outcome than a deliberate individual judgement. We believe that this strategy is also amenable to certain task types (which in our study was homing route choice) and may change for others (for example, solving a puzzle box), and that the task also needs to be sufficiently complex for animals to benefit from the use of social information (Caldwell et al. 2008 [3]). Thus, we recommend that future work address which classes of problems would fit well within the definition of “emergent” CCE and which would not. Keeping this framework in mind, studies should clearly state which definition of CCE they are using and should be critically evaluated for their underlying task type and cognitive mechanisms before being deemed CCE. Considering these points, we have now expanded our Discussion to include a paragraph: “Our results highlight the need for more…..range of task types and cognitive abilities.” found in lines 420-433 to highlight these key questions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I do not have any major objections, but I am clarifying my points as major or minor depending on the effort required to address (mostly via rewriting and clarifications).

      Major comments:

      (1) A schematic summary of the original study: Since the current manuscript builds directly on Sasaki & Biro (2017), it would greatly help readers if you included a concise schematic figure summarizing the original experiment. For instance, a simple panel could depict the chain design (experienced + naïve replacements), the control treatments, and the key empirical findings (improvements in route efficiency across generations, and route similarity within vs. between chains). Presenting this visually would save readers the effort of reconstructing the design and main results from text alone, especially for those unfamiliar with the original paper. It would also clarify exactly what empirical patterns your simulations are intended to reproduce.

      We thank the reviewer for this comment. We have now revised the manuscript with a schematic illustration adapted from the original study by Sasaki and Biro (2017). We hope this clarifies the experimental design and results we aimed to highlight in our work.

      (2) Reproducibility: Code and data are only "available on request." I believe eLife has strong policies on open science; a lack of immediate open access to analysis would be a barrier. I find it jarring that a paper intending to reproduce and improvise a previously published paper does not make the codes and data available for peer review or to readers without an explicit request.

      We have taken the feedback into consideration and updated the Data Availability section with a link to our figshare dataset.

      (3) One huge drawback of the current format of the manuscript, where Methods come after Results, is that one has to really struggle to understand and appreciate Figures 2 and 3. I would strongly urge authors to have a shorter methods section embedded either as a subsection before the Results, or within the results section, as described in each figure. Perhaps a lot of my confusion also comes from not having known the previous paper, but it may be true for other readers, too. More specifically, for Figure 3, how is social weight for the experiments inferred? Figure 3 caption talks of mean difference, but one has to check the manuscript at multiple places throughout to really understand what this difference is (the definition) and how it is computed.

      While we acknowledge that our manuscript places the Methods section at the end, we tried to structure our text to tell a story (as stated in our manuscript title). To this end, we organized the text into short titled subsections that briefly convey the relevant background, identify the knowledge gap and outline our approach. We chose this structure to reserve the in-depth details about model implementation and statistical analysis for the Methods.

      Additionally, we made sure to include references to methodological details in relevant segments of the Introduction and Results section so as to not bog down the reader by model complexities and keep a coherent narrative that delivers the message of our study. To further address the background of our work, we have now added a schematic of the original study in response to a previous comment by the reviewer, which we hope helps the reader better understand our work. We hope this explanation clarifies the intention behind our writing choice and decision to retain the current structure.

      (4) The introduction of the 'effective group size' concept is a potentially valuable and intuitive way to interpret chain dynamics, but the explanation is somewhat buried in the Results/Methods; I suggest highlighting it more prominently (e.g., in the Discussion or with a schematic in the Results) so readers can readily grasp this useful idea.

      We are glad that the reviewer found our concept of ‘effective group size’ useful. However, we believe that we introduced the idea and the rationale behind this method in the Results: “We asked to what extent……to an equivalent group size” found in lines 305-314. We reserved the detailed description of the method for the Methods section. To further emphasize the importance of the concept, we have now added the text: “This is further supported….. slightly better than two individuals.” found in lines 389-394 in the Discussion.
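
      One way to see how a chain design dilutes earlier contributions is sketched below. It assumes equal averaging at every pairing, so each new bird contributes half of the pair route and all earlier contributions are halved, and it measures the equivalent group size as the inverse sum of squared weights. Both the weighting scheme and this measure are assumptions made for illustration; the authors' definition of effective group size may differ.

```python
def chain_weights(n_generations):
    """Weight of each bird's own route in the route flown at generation n,
    assuming equal averaging between the experienced and the naive bird."""
    weights = [1.0]  # generation 1: a single bird flying solo
    for _ in range(n_generations - 1):
        weights = [w / 2 for w in weights] + [0.5]
    return weights

def effective_group_size(weights):
    """Inverse sum of squared weights (equals n for n equally weighted birds)."""
    return 1.0 / sum(w * w for w in weights)

for generation in range(1, 6):
    w = chain_weights(generation)
    print(generation, [round(x, 4) for x in w], round(effective_group_size(w), 3))
```

      Under these assumptions the equivalent group size grows quickly at first and then levels off, because the oldest contributions shrink geometrically with every replacement.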

      Minor comments:

      (1) Line 12: "what is the navigation mechanism(s)" - the (s) is a bit awkward. Either remove (s) or ask what the mechanisms are.

      We have fixed the typo to clarify the statement.

      (2) Line 78: "Such 'ratchet'-like improvements is referred to..." → "are referred to."

      We have fixed the typo to clarify the statement.

      (3) Figure 3 caption: "color scheme in the plots are same" → should be "is the same."

      We have fixed the typo to clarify the statement.

      (4) Clarification on reporting confidence intervals: The manuscript reports confidence intervals (CIs) for the model-based comparisons (e.g., Figures 2-3). This might seem unnecessary for simulation studies, since running more iterations can arbitrarily shrink uncertainty. However, in your case, the CIs are justified because the simulations are anchored to a finite empirical dataset (only 9 solo trajectories), sampled with replacement, and analyzed with mixed-effects models that incorporate bird identity as a random effect. Thus, the intervals reflect biological sample variability rather than simulation noise. This must be clarified.

      We have added a clarifying statement: “...and reflect the biological uncertainty in the empirical dataset, not simulation noise” found in lines 241 and 293 in the captions of Figures 2 and 3 in accordance with the reviewer’s comment. 

      (5) One part of the issue is that details of methods come much later in the manuscript, perhaps following journal style. Therefore, I recommend explicitly highlighting this rationale in the Results, so readers do not misinterpret the CIs as simply reflecting simulation error.

      We believe that the clarifying statements we have now added in the captions of Figures 2 and 3 should convey this interpretation of CIs and further changes in the Results may not be required.

      With these proposed changes we hope that we have improved the clarity of our manuscript.

      References:

      (1) Dalmaijer ES (2024) Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought. PLoS Biol. 22:e3002644.

      (2) Reindl, E., Gwilliams, A.L., Dean, L.G. et al. (2020) Skills and motivations underlying children’s cumulative cultural learning: case not closed. Palgrave Commun 6, 106.

      (3) Caldwell CA, Millen AE (2008) Studying cumulative cultural evolution in the laboratory. Phil. Trans. R. Soc. B 363:3529-3539.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript reports a very interesting, novel and important research angle to add to the now enormous interest in how pesticides can be toxic to beneficial insects like the honey bee. Many studies have reported on how pesticides in standard use formulations show both lethality as well as sublethal negative effects on behavior and reproduction. The authors propose to use machine learning algorithms to identify new volatile compounds that can be tested for repellency. They use as input chemical structures that are derived from chemicals that have known repellent effects as identified in their initial behavioral assays.

      Strengths:

      The conclusion is that such chemicals specific to repelling bees and not pest insects (using the fruit fly as a model for the latter) can be identified using the ML approach. Having a list of such chemicals that can be rotated among in any field application would be a benefit because of the honey bee's ability to learn its way around any kind of stimulus designed to keep it from nectar and pollen, even when they may be tainted by pesticide.

      Weaknesses:

      The use of machine learning seems well-executed and legitimate. But this is beyond my expertise. So other reviewers can maybe comment more on that.

      The behavioral data report on the use of a two-choice assay for bees in small Petri plates. Bees can feed from two small wells placed on filter paper impregnated with the control or with the control containing a chemical. The primary behavior, for example in Fig. 2C, is the first choice by one of the five bees in the plate of which well to feed from. For some chemical compounds, there seems to be a 50:50 choice, indicating no repellent effects. In other cases, the first bee making the choice chose the control, indicating possible repellent effects of the test chemical. Choices in this assay were validated in a free-flying assay.

      Concerns with the choice assay:

      50-70 microliters amounts to what one hungry bee will drink. Did the first bee drink most of it, such that measures of bait consumed reflect a single bee or multiple bees?

      The measure of lure consumed reflects multiple bees. We observed that the first bee did not empty the 70 µl of honey, allowing us to estimate honey consumption by several bees.

      How many bees were repelled to the control side? Was it just the one bee?

      All the bees in a group were repelled to the control side for repellents. Evaluating the lack of honey consumption also allowed us to assess repellency. As an example, 100% honey consumption on the control side meant that the bees were hungry, whereas 0% consumption on the repellent side meant that the bees were not hungry enough to drink the honey on the repellent side.

      Were other measures considered? E.g. time to first approach; the number of bees feeding at different time points; the total number of bees observed feeding per unit time.

      Bees were cooled down to place them in the plates for the experiments. Therefore, time to first approach could also depend on how long it took the bees to warm up, which was not as relevant for our research question. Because bees can communicate where to find food sources to each other, we restricted ourselves to first choice, only, to get independent data points for each plate. However, we investigated whether the first cup the first bee chose was also the one it drank from, which was the case.

      Reviewer #2 (Public review):

      Summary:

      The search for new repellent odors for honey bees has significant practical implications. The authors developed an iterative pipeline through machine learning to predict honey bee-repellent odors based on molecular structures. By screening a large number of candidate compounds, they identified a series of novel repellents. Behavioral tests were then conducted to validate the effectiveness of these repellents. Both the discovery and the methodological approach hold value for related fields.

      Strengths:

      The study demonstrates that using molecular structures and a relatively small training dataset, the model could predict repellents with a reasonably high success rate. If the iterative approach works as described, it could benefit a wide range of olfaction-related fields.

      The effectiveness of the predicted repellents was validated through both laboratory and field behavioral tests.

      Weaknesses:

      The small size of the training dataset poses a common challenge for machine learning applications. However, the authors did not clearly explain how their iterative approach addresses this limitation in this study. Quantitative evidence demonstrating improvements achieved in the second round of training would strengthen their claims. For instance, details on whether the success rate of predictions improved or whether higher-affinity components were identified would be helpful. Furthermore, given that only 15 new components were added for the second round of training, it is surprising that such a small dataset could result in significant improvements.

      The original repellency dataset was collected from multiple older studies, each with differences in assays for bee behavior and in delivery methods and chemical concentrations. Moreover, the number of strong repellents was limited, and because they varied structurally from the non-repellents in the dataset, the AUC appeared high. A smaller dataset can result in unusual AI/ML model performance trends, as any algorithm is just a reflection of its training data. As a result, we found that the Round 1 predictions had a low success rate in behavior assays (~20%). Subsequently, even small amounts of data collected using one standard concentration and assay could dramatically change the quality of the dataset, not just for structures of repellents, but also for related structures that were not repellent. What we observe when adding just 15 chemicals is a more complete representation of how repellents and non-repellents are distributed, and the prediction success of Round 2 more than doubled in repellent behavior assays, to >50%. The initially observed performance gains from even small additions to the training dataset will stabilize and ultimately plateau due to the limits of the ML algorithm and/or chemical featurization technique. A more complex model, trained on a large dataset, may not be expected to benefit from a handful of additional examples because its chemical feature distributions are already better approximations of the real world. Put simply, smaller datasets imply there is more to learn.

      It is also true that the size of the training dataset is important for AI/ML algorithms. Artificial neural networks, for instance, are highly sensitive to noise and generalize poorly with limited data; the noise is amplified in these cases, and the solution (reducing the complexity of the model) impedes learning. Many algorithms, like the decision trees and support vector machines featured in our paper, handle noise more efficiently and are suitable for smaller datasets in that they can still make reasonably successful predictions.

      Reviewer #3 (Public review):

      The manuscript of Kowalewski et al. titled "Machine learning of honey bee olfactory behavior identifies repellent odorants in free flying bees in the field" used machine learning to predict potential candidates for honeybee repellents, which may keep foraging bees away from pesticides. This is pilot research with strong significance for research on olfactory behavior and for pest control. However, some major issues need to be addressed to enhance the manuscript's clarity, strength, and overall coherence.

      (1) Drosophila melanogaster is not considered as a true agricultural pest. The manuscript would be more compelling if using true pests, for example, Drosophila suzukii or others.

      Honeybees face a critical risk of lethal pesticide exposure when they drift from their designated orchards into adjacent blooming crops or honeydew-coated fields, where they encounter chemical treatments intended for insects like Citrus Thrips, Asian Citrus Psyllid, Alfalfa Weevil, Peach Twig Borer, Oriental Fruit Moth, Lygus Bugs, Cotton Aphids, Whiteflies, Corn Rootworm, Sunflower Head Moth, Vine Mealybug, Cucumber Beetles, and Sugarcane Aphids. Unfortunately, testing such pest species is outside the scope of this paper, but would deserve further research.

      (2) For repellency tests, the result relies on dosage. An attractant may become a repellent at high concentration. Test a range of concentrations for each chemical and compare responses between honeybees and pests.

      Testing freely flying honey bees in the field is an extremely challenging undertaking. Nevertheless, we added extra tests for two strong repellents, BR4.5 and BR3.81, at half dose of 0.05 mg/cm². As expected, we found that there was a reduction in repellency. Testing more concentrations was not within the scope of this paper.

      (3) Be more clear about bee behavior data and their scores (as in Page 4 Results "184 training chemicals and later for 203 chemicals" and Page 10 Methods). I suggest that authors add a supplemental table with each chemical and its behavioral score, feature and reference - which ones were used for training, and which ones for testing. Also add your own behavioral test data (second input) to this table

      We have added the training chemical lists as Supplemental Tables S3 and S4.

      (4) The AUC in the first validation was 0.88 (Page 4), and in Page 5, "As expected, the computational validation results based on the AUC values, show an improvement." However, there were no other AUC values to show improvement.

      (5) Show plots of ROC AUC curves from Round 1 and Round 2.

      The Round 1 ROC curve is shown in Figure 1. The Round 2 ROC curves obtained from 3 different approaches are shown in Author response image 1. The manuscript shows direct behavioral validation of the chemicals identified, which is more important.

      Author response image 1.

      (6) In the Discussion, the authors mentioned olfactory receptors in honeybees. It would be useful to provide a general review of the current understanding of these receptors and their (potential) functions.

      We have expanded the discussion and pointed to a review on honey bee olfaction.

      (7) I suggest combining Fig. 1 and Fig. 3A as one pipeline for this work.

      (8) Figure 2C, some sample sizes are very small, such as 2-piperidone: 1 first-choice control vs 0 first-choice repellent? Increase sample size and do statistical analysis.

      Most compounds, except the one pointed out, have small sample sizes because of the low percentage of bees participating in the trials. Consequently, we improved methods in round 2 and were able to increase participation from 68% to 81%, as described in the Methods. However, since the compound was included in the second round of training, we would like to report it anyway. This compound had the highest rate of non-participating plates compared to the others, and there is a possibility that it may affect both stimuli.

      (9) In general, to assist reviewers, include line numbers to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Other factors about the newly identified chemicals:

      Is there a toxicity index for these chemicals that can be listed? This would be important obviously for any humans around the repellents

      While toxicity index determination is outside the scope of this manuscript, it is possible to predict rat LD50 values using the EPA Suite’s toxicity prediction tool. In a pilot test, the software predicted an average oral toxicity of ~3064 mg/kg for the 18 repellents in Round 2, which is considered “Practically non-toxic” by the EPA.

      Was there any indication of bees being behaviorally impaired or dying when exposed to the chemicals in a confined space? Even exposure to intense floral perfumes in a confined space can be toxic over a longer period.

      Fewer than 5% of the 2225 honey bees died after the experiments, and none of the compounds showed a significantly higher level of mortality, suggesting that the minor effect was not due to the chemicals but possibly to handling steps (starving, chilling, recovery, etc.).

      The 'plates not participating' measure indicates plates in which no bees fed on either choice. Is that correlated to the choice index? That is, when bees showed some repellency was it the case that often that led to no choice?

      Yes, non-participating plates were those in which the bees did not drink any honey at all. The reason could have been that the bees were too cold and unable to warm up enough to participate in the trials, or that the chemical was so repellent that the bees did not want to drink any honey at all. Because we were not able to distinguish between these two reasons, we excluded plates in which the bees did not drink any honey at all from our dataset.

      It is unclear why the McNemar test was used.

      The McNemar test is used for hypothesis testing on paired dichotomous data. In our data file, we created two columns to report our first-choice results: “Control side first” and “Repellent side first”. When the first bee in a plate drank from the control side first, we added a “1” to the “Control side first” column and a “0” to the “Repellent side first” column. Because one control and one repellent-side honey pot were in the same Petri dish, the first bee could choose only one side first and could not choose the other side at the same time. Consequently, our dataset consisted of paired samples, which were dependent on each other. We therefore split the dataset by repellent candidate and used paired-sample McNemar tests for non-parametric data. (Lachenbruch P.A. McNemar Test, Wiley StatsRef: Statistics Reference Online)
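
      The coding described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `exact_mcnemar_p` and the example counts are hypothetical. It assumes the exact (binomial) form of the McNemar test, which uses only the two discordant counts. With one column set to 1 and the other to 0 for every plate, each participating plate is a discordant pair, so the test amounts to a two-sided exact binomial test of control-first versus repellent-first counts against a 50:50 split.

```python
from math import comb


def exact_mcnemar_p(control_first: int, repellent_first: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    control_first   -- plates where the first bee fed on the control side
    repellent_first -- plates where the first bee fed on the repellent side
    (Both argument names are hypothetical labels for this sketch.)
    """
    n = control_first + repellent_first
    if n == 0:
        # No participating plates: no evidence against a 50:50 split.
        return 1.0
    k = min(control_first, repellent_first)
    # Lower tail P(X <= k) for X ~ Binomial(n, 0.5).
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap at 1.
    return min(1.0, 2 * lower_tail)


if __name__ == "__main__":
    # Made-up counts for one candidate: 9 control-first, 1 repellent-first.
    print(exact_mcnemar_p(9, 1))
```

      With only a handful of participating plates per compound, the exact form is preferable to the chi-squared approximation, and it makes explicit that small samples can reach significance only when nearly all first choices fall on one side.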

      The statistical result is not discussed in the text, only shown in the figure. And it looks to be significant only for one chemical and DEET. Yet on page 4 the end of the second paragraph, the authors write "For many of the tested compounds the bees preferred to visit the honey-water pots on the control side versus the repellent side,". That implies that they are not really using the test as a meaningful means for showing differences. If they are arguing only from trends, then that should be clearer in the text.

      We reported the p-values for each test we had used in tables in Figure 2C and S2. In the methods section we report which statistical tests were used to evaluate the data.

      There is no mention of attractant chemicals:

      Slessor and Winston used queen pheromone to attract bees to fields and improve pollination. Honey bees use the Nasonov pheromone to attract other bees to feeding locations. Could the addition of their chemical features change ML outcomes? This should be at least discussed.

      We thank the referee for the suggestion; however, the focus of this manuscript is repellents, and we therefore restricted the background to that area of knowledge.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Releasing the dataset and code will benefit the readers interested in this study.

      The behavioral data are reported within the figures, tables, and supplementary. The computational code will be available upon request from the communicating author for non-commercial use.

      Figure 1, AUC curve, "AUC = 0.XX", should there be an actual value from the experiment?

      Added

      Page 4, "(Talbe S1)" should be placed in the next sentence, as "From the initial training set we identified 45 features that were considered important for predicting aversive valence (Table S1)."

      We have added this in the appropriate spot.

      Page 5, "As expected, the computational validation results based on the AUC values, show an improvement.". Please list the AUC values.

      Author response image 2.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) Page 3: "they sense using a sophisticated olfactory system of >180 odorant receptor genes in the genome". In the cited Robertson & Wanner's paper, there are around 160 receptors, and 170 if pseudogenes are included.

      We thank the referee and have updated the numbers.

      (2) Page 4: "initially for 184 training chemicals and later for 203 chemicals (Table S1)." Table S1 is about features, not chemicals?

      We have moved the reference to an appropriate location.

      (3) Figure 2A: What is the control? Acetone or another solvent?

      Acetone, but it rapidly evaporates before the time of experiment.

      (4) Figure 2A: What do the asterisks mean?

      Statistically significant.

      (5) Figure 3: When you added your own testing data as a second input for Round 2, put details about these data: chemical names, preference scores... Also, were the Round 2 data (Round 1 plus your own) also split 90:10 into training and testing partitions?

      Yes, the validation was performed on the updated data set including the new chemicals.

      (6) Figure 3D: Is asterisk at correct location? What does it mean?

      Means that BR3.15 was significantly different from BR4.5

      (7) Figure 4D: "4D" in legend is missing. Also, "... tested at the regular dose (0.1mg/cm2) and half dose (0.05mg/cm2)". In the panel, it is only 0.05mg/cm2.

      Added

      (8) Table S2 is the same as Fig. 2C? Remove one.

      We have deleted Table S2.

    1. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated the interactions between IRE and unfolded peptides using all-atom molecular dynamics simulations. The interactions between a couple of unfolded peptides and IRE provide mechanistic insight on the activation of the UPR.

      Strengths:

      - Well-written manuscript accessible for a broad biological audience

      - State-of-art structural predictions and all-atom simulations

      - Validation with existing experimental data

      - Clear schematic diagram summarizing mechanisms learned from simulations

      - Error estimate included

      - Shared simulation data and code in public repository

      Weakness:

      No major concerns remain after revision.

      Comments on revisions:

      The authors have addressed all my questions from the previous assessment. I do not have more suggestions.

    1. What I am wondering: The plan file — the 700-line proposal I wrote before touching code — is doing something strange. It keeps returning as reference. Every time I stopped to decide "what next?", I would read my own plan as if someone else had written it, and follow the proposed order (P0 data model → P1 lens system → P2 pathways → P3 creative). This works shockingly well. I wonder if the externalization is doing the work, or if writing the plan in my own voice made it load-bearing in a way that a received spec wouldn't. The plan became a partner I could push against.

      I love M-Claude

    1. Some stages require certain fields to be filled before a record can move into them: a resolution code before Closed, a customer before Confirmed. If Wyatt won't advance the stage, scroll up to find a red outline on the missing field. If a kanban drag doesn't stick, the target stage likely has a required field or you don't have permission to make that transition.

      drop this ... it's not true yet

    1. we use the population density of the locale in which the respondent resides. We calculate the population density based on a ten-mile radius around the centroid of the respondent’s ZIP Code.

      This is a fine method but would catch some supposedly urban areas as we discussed in class

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their thoughtful and constructive feedback. We appreciate that all reviewers recognized the value of our study in linking adult neurogenesis and synaptic plasticity to representational drift in the olfactory system. They described the model as elegant and well-motivated, and agreed that it provides new theoretical insight into how stability and adaptability can coexist in sensory representations. The reviewers also identified areas where our manuscript could be strengthened, and as outlined in our revision plan we have:

      (1) Refined our description of mitral/tufted cell stability and expanded on within-session and across-day variability.

      (2) Substantially expanded the Discussion to compare our modeling assumptions with experimental findings and recent anatomical evidence. Additionally, we have included the limitations of the study and areas for future investigation.

      (3) Included a clearer description of the STDP implementation, plastic synapses, and their functional effects.

      (4) Added a short section outlining model-based predictions that can guide future experiments. We also made minor textual edits to improve precision and flow, including citing prior conceptual work and clarifying model procedures.

      These changes have strengthened both the conceptual framing and technical clarity of the paper. We are grateful for the reviewers’ careful reading and valuable suggestions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors build a network model of the olfactory bulb and the piriform cortex and use it to run simulations and test their hypotheses. Given the model's settings, the authors observe drift across days in the responses to the same odors of both the mitral/tufted cells, as well as of piriform cortex neurons. When representing the M/T and PCx responses within a lower-dimensional space, the apparent drift is more prominent in the PCx, while the M/T responses appear in comparison more stable. The authors further note that introducing spike-time dependent plasticity (STDP) at bulb synapses involving abGCs slows down the drift in the PCx representations, and further link this to the observation that repeated exposure to the same odorant slows down drift in the piriform cortex.

      The model is clearly explained and relies on several assumptions and observations:

      (1) Random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity.

      (2) Higher dimensionality of piriform cortex representations compared to M/T responses, which enables superior decoding of odor identity in the piriform cortex.

      (3) Spike time-dependent plasticity (STDP) at synapses involving the abGCs.

      The authors address an open topical problem, and the model is elegant in its simplicity. I have, however, several major concerns with the hypotheses underlying the model and with its biological plausibility.

      Concerns:

      (1) In their model, the authors propose that MTC remain stable at the population level, despite changes in individual MTC responses.

      The authors cite several experimental studies to support their claims that individual MTC responses to the same odors change (some increase, some decrease) across days. Interpreting the results of these studies must, however, take into account the variability of M/T responses across odor presentation repeats within the same session vs. across sessions. In the Shani-Narkiss et al., Frontiers in Neural Circuits, 2023 study referenced, a large fraction of the variability across days in M/T responses is also observed across repeats to the same odorant in the same session (Shani-Narkiss et al., Figure 4), while the authors have M/T responses in the same session that are highly reproducible. This is an important point to consider and address, since it constrains how much of the variability in M/T responses can be attributed to adult neurogenesis in the olfactory bulb versus to other networks' inhibitory mechanisms, which do not rely on neurogenesis. In the authors' model, the variability in M/T responses observed across days emerges as a result of adult-born neurogenesis, which does not need to be the main source of variability observed in imaging experiments (Shani-Narkiss et al., Figure 4).

      We agree with the reviewer and believe this is a critical discussion point. Indeed, in Shani-Narkiss et al., in Kay and Laurent (1999), and in our lab, we observe trial-to-trial variability that occurs within the same recording session; as the reviewer correctly points out, this cannot be due to neurogenesis. These fluctuations may be trial-to-trial noise, or may reflect dynamics associated with other behaviors such as running (Chockanathan et al., 2021) and decision making (Kay and Laurent, 1999). There is a growing body of literature showing that neural variability in early sensory coding appears to depend on behavioral fluctuations and internal states (Niell and Stryker, for example). The within-session variability in the Shani-Narkiss et al. work may reflect some of these behaviorally relevant features of early olfactory coding, something that our model cannot account for. This is an excellent discussion point, and we have included text (lines 153-157 and 321-330) in the manuscript to note this aspect of the data and how one can think of it in the context of our results.

      Another study (Kato et al., Neuron, 2012, Figure 4) reported that mitral cell responses to odors experienced repeatedly across 7 days tend to sparsen and decrease in amplitude systematically, while mitral cell responses to the same odor on day 1 vs. day 7 when the odor is not presented repeatedly in between seem less affected (although the authors also reported a decrease in the CI for this condition). As such, Kato et al. mostly report decreases in mitral cell odor responses with repeated odor exposure at both the individual and population level, and not so much increases and decreases in the individual mitral cell responses, and stability at the population level.

      Thank you for raising this important point regarding the findings of Kato et al. (2012). We agree that their results suggest increased sparsening and stability in M/T cell odor responses with repeated exposure. However, as noted in Yamada et al. (2017), the experimental literature on this question remains mixed. Yamada and colleagues reported a “drastic reorganization of ensemble odor representation” across days and emphasized that “sensory experience does not necessarily cause a major sparsening of the odor response,” explicitly contrasting their findings with those of Kato et al. (2012).

      Our model captures the dynamics observed in Yamada et al. (2017), providing a mechanistic explanation for how significant reorganization can emerge in M/T ensembles despite stable low-dimensional population structure. Yamada et al. (2017) and Kato et al. (2012) differ in nuanced aspects of experimental design (method of head fixation, behavioral paradigm used, training, etc.), all of which are known to affect olfactory responses and therefore the degree of sparsity and overlap in population codes. Our model does not include any of these behavioral features that may differentially engage the olfactory circuit and thus affect population responses. Notably, in previous work, we highlighted how even simple changes to top-down feedback, a single phenomenological manipulation of functional connectivity in the olfactory circuit that some behavior would broadly engage, can have disparate effects on the degree of sparsity in neural representations over time. In our current model, there is no behavior that would allow us to study the critical features of the neural activity code in the M/T cells. Instead, we focus on one specific aspect, adult neurogenesis, which we can explicitly manipulate in a biologically meaningful way. The reviewer’s point, however, is well taken and important, and we have added text to the Discussion (lines 336-344) to highlight the differing experimental outcomes and to clarify how our model aligns with the Yamada et al. results.

      (2) In Figure 1, a set of GCs is killed off, and new GCs are integrated in the network as abGC. Following the elimination of 10% of GCs in the network, new cells are added and randomly assigned synaptic weights between these abGCs and MTC, GCs, SACs, and top-down projections from PCx. This is done for 11 days, during which time all GCs have gone through adult neurogenesis.

      Is the authors' assumption here that across the 11 days, all GCs are being replaced? This seems to depart from the known biology of the olfactory bulb granule cells, i.e., GCs survive for a large fraction of the animal's life.

      Thank you for raising this important point regarding the lifespan of granule cells (GCs). We agree that developmentally born GCs are not fully replaced. Indeed, multiple studies indicate that some developmentally born GCs can survive for very long periods, up to 18-24 months, essentially the lifetime of the animal (Kaplan, 1985; Petreanu & Alvarez-Buylla, 2002). However, the fraction of total GCs that such long-lived GCs constitute remains an open question, in part because of the challenges in measuring the lifetime survival of newborn neurons. What there is consensus on is the significant size of the granule-cell population undergoing continuous turnover through adult neurogenesis (reviewed in Lepousez et al., 2013).

      We should clarify that we do not assume that 100% of the granule cell population turns over in an 11-day period. We use “day” to represent a static epoch over which we can implement plasticity rules across two time scales. Critically, we also randomize the turnover, treating every cell in the GC population as equally likely to be replaced. Prior experimental evidence suggests that some GCs are more likely to persist (possibly as a result of experience; Magavi et al., 2005), which may in some regards make our result on stabilization following repeated sensory exposure more dramatic (as the GCs that show the largest change following STDP may also be the ones that are the most stable, and therefore least likely to turn over). We do not include this in our model as we could not identify a framework for “selecting” which GCs would persist that would not be tautological. The point the reviewer raises is critical, and a discussion of these points is warranted, which we now include in the manuscript (line 352-361).

      Additionally, there is some evidence that behavioral factors, such as novelty, can increase the rate of adult neurogenesis (Kamimura et al., 2022; van Praag et al., 1999; Gheusi and Lledo, 2014), suggesting a complex reciprocal relationship between the encoding of olfactory stimuli and the mechanisms that generate the cells shaping that encoding. Our model does not include any of these dynamic features, which represent an additional layer of complexity and may introduce an intermediate time scale, that of behavioral selection and action, which is slower than the milliseconds over which spike-timing-dependent plasticity operates but faster than the time scale of neurogenesis. We also include this point in the discussion (line 352-361).

      Our 11-day simulation, however, is designed to uncover how plasticity across multiple timescales (STDP and adult neurogenesis) at the network level shapes odor representations as multiple rounds of GC turnover occur. Changing the timescale and magnitude of replacement in the simulations (either in terms of days or percent of cells replaced) would affect the degree to which drift happens, but not the phenomenon itself. Additionally, the representational structure in our model at intermediate time points (e.g., days 8-10) would correspond well to scenarios in which some fraction of developmentally born GCs persists in the circuit. Thus, our simulations span a range of possible empirical regimes, from high turnover to partial preservation. We have added discussion to the revised manuscript (line 352-361) clarifying this point and acknowledging the biological heterogeneity in GC lifespans.

      (3) The authors' model relies on several key assumptions: random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity. These assumptions are not necessarily accurate, as recent work revealed structure in the projections from the olfactory bulb to the piriform cortex and structure within the piriform cortex connectivity itself (Fink et al., bioRxiv, 2025; Chae et al., Cell, 2022; Zeppilli et al., eLife, 2021).

      How do the results of the model relating adult neurogenesis in the bulb to drift in the piriform cortex representations change when considering an alternative scenario in which the olfactory bulb to piriform and intra-piriform connectivity is not fully distributed and indistinguishable from random, but rather is structured?

      Thank you for pointing us to these important studies. We fully agree with the reviewer that the structure of the olfactory system might not be purely random, but we do not believe these papers contradict the level of abstraction used in our model.

      Zeppilli et al. (2021) map molecularly defined projection neuron subtypes and their preferential targeting of different cortical and subcortical regions, but they do not report any fine-scale topographic organization of bulb → piriform connectivity that would contradict a view of randomly distributed input to piriform cortex. Studies from our lab using retrograde tracers in the bulb show some spatial clustering of piriform cortical neurons whose axons project to the bulb (Padmanabhan et al., 2016, 2019), but these studies do not identify any “functional organization” or structure. Chae et al. (2022) focus on distinct long-range functional loops (mitral ↔ piriform vs tufted ↔ AON) and the differential role of cortical feedback, but again at the level of cortical regions rather than individual cells and their connectivity. Notably, our model does not consider AON.

      Finally, Fink et al. (2025) report a “like-to-like” excitatory connectivity motif within the piriform cortex and an experience-dependent reorganization of inhibitory synapses. As the authors note, “... this like-to-like motif is unlikely to reflect common input from the olfactory bulb”, so it does not conflict with our assumption of broadly random bulb → piriform input. This “like-to-like” motif is reflected in our model by wiring a certain subpopulation of piriform cells. On the other hand, we agree that the experience-dependent changes in inhibitory connectivity within PCx are highly relevant for learning-related plasticity but fall outside the scope of our study. We intentionally omitted piriform plasticity to isolate the contributions of adult neurogenesis in the bulb and plasticity acting on adult-born granule cells. Incorporating such cortical plasticity is nevertheless an important direction for future work. We added a discussion (line 395-405) on this important point raised by the reviewer in the revised manuscript.

      (4) I didn't understand the logic of the low-dimensional space analysis for M/T cells and piriform cortex neurons (Figures 2 & 3). In the authors' model, the full-ensemble M/T responses are reorganized over time, presumably due to the adult-born neurogenesis. Analyzing a lower-dimensional projection of the ensemble trajectories reveals a lower degree of re-organization. This is the same for the piriform cortex, but relatively, the piriform ensembles displayed in a low-dimensional embedding appear to drift more compared to the M/T ensembles.

      This analysis triggers a few questions: which representation is relevant for the brain function - the high or the low-dimensional projection? What fraction of response variance is included in the low-dimensional space analysis? How did the authors decide the low-dimensional cut-off? Why does STDP cause more drift in piriform cortex ensembles vs. M/T ensembles? Is this because of the assumed higher dimensionality of the piriform cortex representations compared to the mitral cells?

      Thank you for these thoughtful questions. We clarify the logic and purpose of the low-dimensional analyses and address each point below.

      (1) Which representation is relevant for brain function, the high-dimensional or low-dimensional one?

      We believe both representations are meaningful, with each capturing different aspects of the neural code. The high-dimensional activity reflects the full variability of individual cell responses, while the low-dimensional projection captures the dominant population level components that downstream areas are most likely to use for readout. We found that the low-dimensional representations are more stable in the bulb than in PCx, suggesting that information is used differentially between the two areas. The bulb provides a stable, sensory-anchored population code that reliably represents odor identity over time, consistent with both electrophysiological and behavioral studies (Nagayama et al., 2004, Chen et al., 2009, Davison and Katz, 2007, Cavaretta et al., 2018). This is consistent with its role as the first stage of information processing in the olfactory system which provides faithful representations that downstream circuits receive. The piriform cortex, by contrast, transforms this stable input into a more flexible representation. Drift in its low-dimensional space may reflect ongoing plasticity (Schoonover et al., Nature, 2021), integration of contextual signals, or higher-dimensional computations characteristic of PCx (Fink et al., bioRxiv, 2025), suggesting its role more as an associative cortex instead of a pure sensory cortex.

      (2) What fraction of variance is included in the low-dimensional space, and how was the cutoff chosen?

      In our simulations, the principal components (PCs) retained for the low-dimensional analysis captured the majority of variance relevant for odor identity (~60–70% for M/T cells and ~55–65% for piriform cortex). We now report these fractions explicitly in Methods (line 937-939).
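
      For reference, the fraction of variance captured by a low-dimensional projection is conventionally computed from the eigenvalues of the response covariance matrix. The expression below is the standard PCA definition, not a formula taken from the manuscript; the symbols are notation introduced here for illustration: λ_i are the eigenvalues in decreasing order, N is the dimensionality of the full response space, and k is the number of retained components (which is not stated in this response).

```latex
\[
\mathrm{FVE}(k) \;=\; \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{N} \lambda_i},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N \ge 0 .
\]
```

      Under this definition, the ranges quoted above would correspond to FVE(k) of roughly 0.60 to 0.70 for M/T cells and 0.55 to 0.65 for piriform cortex.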

      (3) Why does STDP cause more drift in piriform-cortex ensembles than in M/T ensembles? Does this reflect higher dimensionality in piriform cortex?

      In our model, STDP does not cause more drift in PCx. It actually reduces drift and stabilizes PCx representations relative to the condition without STDP (as shown in Fig. 4C2). STDP has a much smaller effect in the bulb because: (1) M/T cells continue to receive stable odor input from the glomeruli and (2) the low-dimensional M/T representation is already stable even without plasticity. We have edited the manuscript to reiterate this point in both the results and discussion.

      The reviewer is correct that the piriform cortex naturally exhibits more drift than the bulb, and their comment that this is due to its substantially higher representational dimensionality is spot on. The PCx contains many more neurons, receives highly divergent OB → PCx inputs, and has dense recurrent connectivity, all of which create many more degrees of freedom through which representations can drift. Additionally, because individual PCx neurons sample from a substantially more diverse combinatorial space of inputs (including feedback to piriform from an array of regions; Illig, 2005; Majak et al., 2004; Chapuis et al., 2013), the dimensionality of the population code is likely higher. While STDP stabilizes the dimensions of the PCx representation that are reinforced during plasticity, some residual drift remains because of the large number of orthogonal dimensions available. Additionally, as the reviewer notes, some forms of plasticity that are not included in the model, such as inhibitory plasticity in PCx, may also affect both the representations and their underlying dimensionality. We include these points in the discussion (line 381-394).

      (5) Could the authors comment whether STDP at abGC synapses and its impact on decreasing drift represent a new insight, and also put it into context? Several studies (e.g., Lledo, Murthy, Komiyama groups) reported that abGC integrates in the network in an activity-dependent manner, and not randomly, and as such stabilizes the active neuronal responses, which is consistent with the authors' report.

      Related, I couldn't find through the manuscript which synapses involving abGCs they focus on, or what is the relative contribution of the various plastic synapses shown in the cartoon from Figure 4 A1 (circles and triangles).

      We thank the reviewer for raising this question. As the reviewer pointed out, several studies have shown that abGCs integrate into the bulb circuit in an activity-dependent manner, preferentially forming synapses onto mitral/tufted cells that respond to behaviorally important odors. This “selection of surviving cells” is not included in our model. Instead, we use STDP at the synaptic level. This is of course not analogous, but it provides a computational framework into which the selection of surviving abGCs could be incorporated in future studies. It is perhaps notable that in our large-scale simulations, synaptic changes at the population level may reflect some of this activity-dependent selection.

      To that end, our model provides a new insight and suggests a broader function for adult neurogenesis. For example, when certain odors are reinforced in an activity dependent manner, abGCs born during that period may stabilize the circuits that respond to those odors. The resulting reduction of drift would help keep the representation of those odors stable over time, even while other parts of the circuit continue to change. We now highlight this idea in the Discussion (line 366-373).

      For the second part of the question: in our model, STDP acts on two sets of connections. It applies to the synapses onto abGCs from M/T cells, GC/SAC cells, and PCx neurons. It also applies to the synapses that abGCs project to, including those onto M/T cells and GC/SAC cells. We have clarified this in the revised Methods (line 1001-1004).

      (6) The study would be strengthened, in my opinion, by including specific testable predictions that the authors' models make, which can be further food for thought for experimentalists.

      How does suppression of adult-born neurogenesis in the OB impact the stability of mitral cell odor responses? How about piriform cortex ensembles?

      We appreciate the reviewer’s suggestion and formalize the following two predictions from our model:

      Prediction 1: Suppressing adult neurogenesis will reduce spontaneous representational drift in the PCx. Increasing spike-timing-dependent plasticity during periods of experience with a specific odor will selectively stabilize representations of that odor.

      Prediction 2: Adult neurogenesis will not affect AON representations of odor identity or concentration in the same way that PCx representations are altered and drift.

      We include these two ideas in the discussion as experimentally testable predictions.

      Reviewer #2 (Public review):

      Summary:

      The authors address a critical problem in olfactory coding. It has long been known that adult neurogenesis, specifically in the form of adult-born granule cells that embed into the existing inhibitory networks on the olfactory bulb, can potentially alter the responses of Mitral/Tufted neurons that project activity to the Piriform Cortex and to other areas of the brain. Fundamentally, it would seem that these granule cells could alter the stability of neural codes in the OB over time. The authors develop a spiking network model to explore how stability can be achieved both in the OB over time and in the PC, which receives inputs. The model recapitulates published activity recordings of M/T cells and shows how activity in different M/T cells from the same glomerulus shifts over time in ways that, in spite of the shift, preserve population/glomerular level codes. However, these different M/T cells fan out onto different pyramidal cells of the PC, which gives rise to instability at that level. STDP then, is necessary to maintain stability at the PC level as long as odor environments remain constant. These results may also apply to a similar neurogenesis-based change in the Dentate Gyrus, which generates instability in CA1/3 regions of the hippocampus

      Strengths:

      A robust network model that untangles important, seemingly contradictory mechanisms that underlie olfactory coding.

      Weaknesses:

      The work is a significant contribution to understanding olfactory coding. But the manuscript would benefit from a brief discussion of why neurogenesis occurs in the first place - e.g., injury, ongoing needs for plasticity, and adapting to turnover of ORNs. There is literature on this topic. It seems counterintuitive to have a process in the MOB (and for that matter in the DG) that potentially disrupts the ability to generate stable codes both in the MOB and PC, and in particular a disruption that requires two different mechanisms - multiple M/T cells per glomerulus in the MOB and STDP in the PC - to counteract.

      We appreciate the reviewer’s suggestion and added discussion on this point in the revised manuscript (line 431-435).

      Given that neurogenesis has an important function, and a mechanism is in place to compensate for it in the MOB, why would it then be disrupted in fan-out projections to the PC? The answer may lie in the need for fan-out projections so that pyramidal neurons in the PC can combinatorially represent many different inputs from the MOB. So something like STDP would be needed to maintain stability in the face of the need for this coding strategy.

      This kind of discussion, or something like it, would help readers understand why these mechanisms occur in the first place. It is interesting that PC stability requires that odor environments be stable, and that this stability drives PC representational stability. This result suggests experimental work to test this hypothesis. As such, it is a novel outcome of the research.

      We agree with the reviewer. The fan-out from the bulb to the piriform cortex is essential for the combinatorial coding that allows PCx neurons to represent many odor features and mixtures. This architecture gives the piriform cortex great coding capacity, but it also makes the system sensitive to small changes in its inputs. As a result, drift that originates in the bulb can spread more easily in PCx. A stabilizing mechanism is therefore needed downstream. In our model, STDP provides this stabilization by reinforcing the dimensions that carry meaningful odor structure. This allows the piriform cortex to keep a stable population code even when its inputs change over time. Neurogenesis supplies the flexibility, the fan-out supplies the expressive power, and STDP supplies the stability. All three elements work together to support a system that must recognize odors reliably while still adapting to new sensory experiences. We have added discussion on this point in the revised manuscript (line 395-405).

      Reviewer #3 (Public review):

      Summary

      The authors set out to explore the potential relationship between adult neurogenesis of inhibitory granule cells in the olfactory bulb and cumulative changes over days in odor-evoked spiking activity (representational drift) in the olfactory stream. They developed a richly detailed spiking neuronal network model based on Izhikevich (2003), allowing them to capture the diversity of spiking behaviors of multiple neuron types within the olfactory system. This model recapitulates the circuit organization of both the main olfactory bulb (MOB) and the piriform cortex (PCx), including connections between the two (both feedforward and corticofugal). Adult neurogenesis was captured by shuffling the weights of the model's granule cells, preserving the distribution of synaptic weights. Shuffling of granule cell connectivity resulted in cumulative changes in stimulus-evoked spiking of the model's M/T cells. Individual M/T cell tuning changed with time, and ensemble correlations dropped sharply over the temporal interval examined (long enough that almost all granule cells in the model had shuffled their weights).

      Interestingly, these changes in responsiveness did not disrupt low-dimensional stability of olfactory representations: when projected into a low-dimensional subspace, population vector correlations in this subspace remained elevated across the temporal interval examined. Importantly, in the model's downstream piriform layer, this was not the case. There, shuffled GC connectivity in the bulb resulted in a complete shift in piriform odor coding, including for low-dimensional projections. This is in contrast to what the model exhibited in the M/T input layer. Interestingly, these changes in PCx extended to the geometrical structure of the odor representations themselves. Finally, the authors examined the effect of experience on representational drift. Using an STDP rule, they allowed the inputs to and outputs from adult-born granule cells to change during repeated presentations of the same odor. This stabilized stimulus-evoked activity in the model's piriform layer.

      Strengths

      This paper suggests a link between adult neurogenesis in the olfactory bulb and representational drift in the piriform cortex. Using an elegant spiking network that faithfully recapitulates the basic physiological properties of the olfactory stream, the authors tackle a question of longstanding interest in a creative and interesting manner. As a purely theoretical study of drift, this paper presents important insights: synaptic turnover of recurrent inhibitory input can destabilize stimulus-evoked activity, but only to a degree, as representations in the bulb (the model's recurrent input layer) retain their basic geometrical form. However, this destabilized input results in profound drift in the model's second (piriform) layer, where both the tuning of individual neurons and the layer's overall functional geometry are restructured. This is a useful and important idea in the drift field, and to my knowledge, it is novel. The bulb is not the only setting where inhibitory synapses exhibit turnover (whether through neurogenesis or synaptic dynamics), and so this exploration of the consequences of such plasticity on drift is valuable. The authors also elegantly explore a potential mechanism to stabilize representations through experience, using an STDP rule specific to the inhibitory neurons in the input layer. This has an interesting parallel with other recent theoretical work on drift in the piriform (Morales et al., 2025 PNAS), in which STDP in the piriform layer was also shown to stabilize stimulus representations there. It is fascinating to see that this same rule also stabilizes piriform representations when implemented in the bulb's granule cells.

      The authors also provide a thoughtful discussion regarding the differential roles of mitral and tufted cells in drift in piriform and AON and the potential roles of neurogenesis in archicortex.

      In general, this paper puts an important and much-needed spotlight on the role of neurogenesis and inhibitory plasticity in drift. In this light, it is a valuable and exciting contribution to the drift conversation.

      We appreciate the reviewer’s comment and thank them for their thoughtful feedback.

      Weaknesses

      I have one major, general concern that I think must be addressed to permit proper interpretation of the results.

      I worry that the authors' model may confuse thinking on drift in the olfactory system, because of differences in the behavior of their model from known features of the olfactory bulb. In their model, the tuning of individual bulbar neurons drifts over time.

      This is inconsistent with the experimental literature on the stability of odor-evoked activity in the olfactory bulb.

      In a foundational paper, Bhalla & Bower (1997) recorded from mitral and tufted cells in the olfactory bulb of freely moving rats and measured the odor tuning of well-isolated single units across a five-day interval. They found that the tuning of a single cell was quite variable within a day, across trials, but that this variability did not increase with time. Indeed, their measure of response similarity was equivalent within and across days. In what now reads as a prescient anticipation of the drift phenomenon, Bhalla and Bower concluded: "it is clear, at least over five days, that the cell is bounded in how it can respond. If this were not the case, we would expect a continual increase in relative response variability over multiple days (the equivalent of response drift). Instead, the degree of variability in the responses of single cells is stable over the length of time we have recorded." Thus, even at the level of single cells, this early paper argues that the bulb is stable.

      This basic result has since been replicated by several groups. Kato et al. (2012) used chronic two-photon calcium imaging of mitral cells in awake, head-fixed mice and likewise found that, while odor responses could be modulated by recent experience (odor exposure leading to transient adaptation), the underlying tuning of individual cells remained stable. While experience altered mitral cell odor responses, those responses recovered to their original form at the level of the single neuron, maintaining tuning over extended periods (two months). More recently, the Mizrahi lab (Shani-Narkiss et al., 2023) extended chronic imaging to six months, reporting that single-cell odor tuning curves remained highly similar over this period. These studies reinforce Bhalla and Bower's original conclusion: despite trial-to-trial variability, olfactory bulb neurons maintain stable odor tuning across extended timescales, with plasticity emerging primarily in response to experience. (The Yamada et al., 2017 paper, which the authors here cite, is not an appropriate comparison. In Yamada, mice were exposed daily to odor. Therefore, the changes observed in Yamada are a function of odor experience, not of time alone. Yamada does not include data in which the tuning of bulb neurons is measured in the absence of intervening experience.)

      Therefore, a model that relies on instability in the tuning of bulbar neurons risks giving the incorrect impression that the bulb drifts over time. This difference should be explicitly addressed by the authors to avoid any potential confusion. Perhaps the best course of action would be to fit their model to Mizrahi's data, should this data be available, and see if, when constrained by empirical observation, the model still produces drift in piriform. If so, this would dramatically strengthen the paper. If this is not feasible, then I suggest being very explicit about this difference between the behavior of the model and what has been shown empirically. I appreciate that in the data there is modest drift (e.g., Shani-Narkiss' Figure 8C), but the changes reported there really are modest compared to what is exhibited by the model. A compromise would be to simply apply these metrics to the model and match the model's similarity to the Shani-Narkiss data. Then the authors could ask what effect this has on drift in piriform.

      The risk here is that people will conclude from this paper that drift in piriform may simply be inherited from instability in the bulb. This view is inconsistent with what has been documented empirically, and so great care is warranted to avoid conveying that impression to the community.

      We thank the reviewer for highlighting this important issue. We agree that the interpretation of our model requires care to avoid implying that the olfactory bulb exhibits spontaneous drift. As the reviewer points out, the empirical literature shows that M/T-cell tuning is highly stable for infrequently experienced odors, but can change with daily, persistent odor exposure (e.g., Kato et al., 2012; Yamada et al., 2017).

      We thank the reviewer for highlighting the Bhalla and Bower paper, as it is foundational and raises a number of interesting and important points. As the authors noted, there was significant variability in trial-to-trial responses over sessions and days in single neurons. This is likely due to ongoing dynamics (Laurent, 1999), the impact of behaviorally relevant top-down feedback (Chen and Padmanabhan, 2022), decision making (Kay and Laurent, 1999), and an array of factors that our model does not include. In that manuscript, the authors note “the variability of the same neuron recorded over different days…was not statistically different from the within day comparisons.” While these results appear prima facie to differ from ours, there are several reasons why this may not be the case.

      First, different metrics are used for measuring neuronal stability, which may contribute to some of the differences. Second, and perhaps more importantly and interestingly, the authors in that study noted the significant trial-to-trial variability within day, which is not present in our study because our model has none of the richness of behavior that Bhalla and Bower found in the freely behaving rat. This variability within day (which is much higher than what we report) would reduce the impact of drift across days - a result that would complicate how plasticity across multiple timescales occurs. We thank the reviewer for the insights on this critical study and include these points in our discussion (line 321-330).

      Neural responses to odors are highly variable across different time scales (Padmanabhan and Urban, 2010; Angelo et al., 2011; Kapoor and Urban, 2006; Friedrich and Laurent, 2001; Smear et al., 2011; Wesson et al., 2008). In our model, no behavior-related selection for survival is included, nor are there specific rules about which synapses may be preferentially strengthened (due to neuromodulation corresponding to behavioral choice and reinforcement learning). Instead, we aimed to recapitulate the experimental design of a few studies (Kato et al., 2012; Yamada et al., 2017) to understand how neurogenesis and drift are related. Over the simulated 10 days, the odor is presented every day, and the network is otherwise frozen between sessions, meaning the model lacks mechanisms that would normally support recovery during intervals without odor exposure. Under these conditions, adult neurogenesis effectively interacts with repeated experience, producing gradual changes in individual M/T-cell tuning. Thus, our results should be interpreted as modeling experience-dependent changes over the timescale of neurogenesis, not as evidence for spontaneous drift in the bulb. We now state this explicitly in the Discussion to prevent confusion and expand the discussion to incorporate some of these critical ideas (line 321-330).

      Major comments (all related to the above point)

      (1) Lines 146-168: The authors find in their model that "individual M/T cells changed their responses to the same odor across days due to adult-neurogenesis, with some cells decreasing the firing rate responses (Fig.2A1 top) while other cells increased the magnitude of their responses (Fig. 2A2 bottom, Fig. S2)" they also report a significant decrease in the "full ensemble correlation" in their model over time. They claim that these changes in individual cell tuning are "similar to what has been observed by others using calcium imaging of M/T cell activity (Kato et al., 2012 and Yamada et al., 2017)" and that the decrease in full ensemble correlation is "consistent with experimental observations (Yamada et al., 2017)." However, the conditions of the Kato and Yamada experiments that demonstrate response change are not comparable here, as odors were presented daily to the animals in these experiments. Therefore, the changes in odor tuning found in the Kato and Yamada papers (Kato Figure 4D; Yamada Figure 3E) are a function of accumulated experience with odor. This distinction is crucial because experience-induced changes reflect an underlying learning process, whereas changes that simply accumulate over time are more consistent with drift. The conditions of their model are more similar to those employed in other experiments described in Kato et al. 2012 (Figure 6C) as well as Shani-Narkiss et al. (2023), in which bulb tuning is measured not as a function of intervening experience, but rather as a function of time (Kato's "recovery" experiment). What is found in Kato is that even across two months, the tuning of individual mitral cells is stable. What alters tuning is experience with odor, the core finding of both the Kato et al., 2012 paper and also Yamada et al., 2017. It is crucial that this is clarified in the text.

      We thank the reviewer. As the issue raised here is related to the previous comment, we have clarified this in the revised text to avoid any misleading comparison. We now specify which aspects of our computational model map onto experimental studies, which aspects we cannot recapitulate, and, as a result, where our comparisons are limited.

      (2) The authors show that in a reduced-space correlation metric, the correlation of low-dimensional trajectories "remained high across all days"..."consistent with a recent experimental study" (Shani-Narkiss et al., 2023). It is true that in the Shani-Narkiss paper, a consistent low-dimensional response is found across days (t-SNE analysis in Shani-Narkiss Figure 7B). However, the key difference between the Shani-Narkiss data and the results reported here is that Shani-Narkiss also observed relative stability in the native space (Shani-Narkiss Figure 8). They conclude that they "find a relatively stable response of single neurons to odors in either awake or anesthetized states and a relatively stable representation of odors by the MC population as a whole (Figures 6-8; Bhalla and Bower, 1997)." This should be better clarified in the text.

      We agree with the reviewer that some of the cells in Shani-Narkiss Figure 8B showed relatively stable responses (while others did not). However, there is a clear monotonic increase in the “Average differences” over time, from “Same day” to “1 month” to “6 month”, as quantified in their Figure 8B. Although the authors concluded that they "find a relatively stable response of single neurons”, we would argue that their data also provide evidence for what we would term “relatively unstable responses”, as found in our model. Following the reviewer’s suggestion, we now clarify this in the text (lines 194-197).

      (3) In the discussion, the authors state that "In the MOB, individual M/T cells exhibited variable odor responses akin to gain control, altering their firing rate magnitudes over time. This is consistent with earlier experimental studies using calcium-imaging." (L3146). Again, I disagree that these data are consistent with what has been published thus far. Changes in gain would have resulted in increased variability across days in the Bhalla data. Moreover, changes in gain would be captured by Kato's change index ("To quantify the changes in mitral cell responses, we calculated the change index (CI) for each responsive mitral cell-odor pair on each trial (trial X) of a given day as (response on trial X - the initial response on day 1)/(response on trial X + the initial response on day 1). Thus, CI ranges from −1 to 1, where a value of −1 represents a complete loss of response, 1 represents the emergence of a new response, and 0 represents no change." Kato et al.). This index will capture changes in gain. However, as shown in Figure 4D (red traces), Figure 6C (Recovery and Odor set B during odor set A experience and vice versa), the change index is either zero or near zero. If the authors wish to claim that their model is consistent with these data, they should also compute Kato's change index for M/T odor-cell pairs in their model and show that it also remains at 0 over time, absent experience.
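The change index quoted above can be sketched in a few lines. This is only an illustration of the quoted formula, not code from Kato et al. or from the authors' model. The function names and the firing-rate values are hypothetical, and responses are assumed to be non-negative so that the index stays within −1 to 1.

```python
def change_index(response_trial_x: float, initial_response_day1: float) -> float:
    """Change index as quoted: (R_x - R_1) / (R_x + R_1)."""
    denominator = response_trial_x + initial_response_day1
    if denominator == 0:
        # A pair that is unresponsive at both time points has no defined index.
        raise ValueError("change index is undefined when both responses are zero")
    return (response_trial_x - initial_response_day1) / denominator


def change_indices(day1_responses, later_responses):
    """Change index for each cell-odor pair, given matched response lists."""
    return [
        change_index(later, first)
        for first, later in zip(day1_responses, later_responses)
    ]


# Hypothetical responses for three cell-odor pairs (arbitrary units).
day1 = [4.0, 2.0, 0.0]
day30 = [4.0, 1.0, 3.0]
cis = change_indices(day1, day30)  # no change, partial loss, new response
mean_ci = sum(cis) / len(cis)
```

Applied to model M/T cell-odor pairs across simulated days, a mean index that stays near 0 would indicate stable tuning in the sense of the quoted definition, while a pure change in gain would move it away from 0.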

      We appreciate the reviewer’s suggestion and edited the text to make it more accurate (line 319-320).

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Line 28 "a graduate alteration in sensory perception". We do not know if drift results in changes in perception. If anything, behavioral evidence suggests that perception remains stable in spite of drift. For example, in Driscoll et al. (2017) mice are able to successfully navigate a virtual T maze despite drift, and in Schoonover et al. (2021), mice maintain aversive responses following fear conditioning, despite drift in the piriform. Finally, spatial navigation appears unimpaired despite pronounced drift in the hippocampus (e.g., Climer et al., 2025). It would be more appropriate to say "stimulus-evoked activity patterns" than "sensory perception" or other words that refer to neuronal activity rather than cognition or behavior.

      We edited the text to make it more accurate per the reviewer’s suggestion (line 27).

      (2) In the introduction, the authors state: "This representational drift has led to the hypothesis that PCx, rather than being a primary sensory area, may be more like an association cortical region." (L76-78). However, the hypothesis that PCx operates as an association cortex comes originally from Haberly's work and thinking (e.g., Haberly and Bower, 1984, elaborated in extensive detail in Haberly, 2001). I think it would be appropriate to acknowledge that here.

      We added the references to acknowledge this, per the reviewer’s suggestion (line 77).

      (3) In the methods, the authors elegantly describe how they induce neurogenesis in their model using weight reshuffling (L805-814). I think it could really help the reader understand the model if this idea were also included in the results section. As the results section currently reads, it seems as if their model implemented neurogenesis in a different fashion: "To do this, following elimination of 10% of the GCs in the network, we added new cells and randomly assigned synaptic weights between these abGCs and M/Ts". I appreciate that in their model, shuffling all the weights of a given GC randomly is akin to "elimination", but I feel like at first blush the results section risks giving an impression a bit different than that actually used in the model.
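The weight-reshuffling idea described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight matrix layout (one row per granule cell, one column per M/T cell), the function name, and the choice to permute each selected row in place are all hypothetical. Only the 10% fraction and the equivalence between reshuffling and "elimination plus addition" come from the text.

```python
import random


def reshuffle_gc_weights(weights, fraction=0.10, rng=None):
    """Replace a fraction of granule cells by shuffling their weights.

    weights: list of rows, one row of GC-to-M/T weights per granule cell.
    Returns a new weight matrix and the sorted indices of the replaced cells.
    """
    rng = rng or random.Random()
    n_gc = len(weights)
    n_replace = max(1, round(fraction * n_gc))
    replaced = rng.sample(range(n_gc), n_replace)
    new_weights = [row[:] for row in weights]  # copy so the input is untouched
    for gc in replaced:
        # Permuting a cell's weights across M/T targets removes its old
        # connectivity pattern, which acts like removing the cell and
        # adding a new one with random connections.
        rng.shuffle(new_weights[gc])
    return new_weights, sorted(replaced)


# Hypothetical network: 20 granule cells, 5 M/T cells, distinct weights.
rng = random.Random(0)
w = [[float(g * 5 + m) for m in range(5)] for g in range(20)]
new_w, replaced = reshuffle_gc_weights(w, fraction=0.10, rng=rng)
```

Calling the function once per simulated day would give a steady turnover while the remaining granule cells keep their weights.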

      We edited the text to make it more accurate per the reviewer’s suggestion (line 110-112).

    1. Reviewer #1 (Public review):

      The manuscript analyzes previously published MEG and ECoG datasets to examine pre-onset neural encoding effects during language processing, replicating effects that have been reported in earlier work and demonstrating that they persist even after controlling for correlations in the stimulus sequence. Replication of these effects across recording modalities and datasets is a valuable contribution, as it strengthens confidence in the robustness of anticipatory neural activity related to upcoming linguistic input. However, I have significant concerns regarding the interpretation of these findings, particularly the conclusion that the absence of temporal generalization between pre- and post-onset activity implies that pre-onset activity does not reflect predictive pre-activation of the upcoming word.

      The central inferential step in this argument relies on an implicit assumption: that if the brain were predicting an upcoming word, the neural representation prior to word onset should resemble, or generalize to, the representation observed after word onset. This assumption is not theoretically necessary and is not supported by a substantial body of work on predictive processing. Many contemporary models posit that predictions are represented in abstract, compressed, or probabilistic formats that differ from sensory-evoked representations, particularly in hierarchical systems such as language (e.g., Rao & Ballard, 1999; Friston, 2005; Federmeier, 2007; Kuperberg & Jaeger, 2016; de Lange et al., 2018). Under such accounts, predictive representations may encode expectations over latent semantic features or probability distributions rather than reinstating the neural code associated with perceptual input.

      In this context, the temporal generalization analyses presented here convincingly demonstrate that pre-onset and post-onset activity do not share a stable representational code. However, this result does not rule out predictive processing per se. Rather, it rules out a specific and relatively strong hypothesis: that prediction takes the form of early reinstatement of the same neural representation used during post-onset word processing. The data are equally consistent with the interpretation that pre-onset activity reflects predictive information expressed in a different representational format that is transformed upon stimulus onset.

      I therefore recommend that the authors substantially soften and clarify their conclusions regarding prediction. Statements suggesting that pre-onset activity does not reflect prediction should be revised to more precisely reflect what is directly supported by the analyses, namely, the absence of representational identity or stable overlap between pre- and post-onset activity. Explicit acknowledgement of alternative interpretations grounded in established predictive processing frameworks would improve theoretical alignment and avoid overstating the implications of the temporal generalization results.

      Overall, the empirical analyses are carefully executed, and the replication across datasets is a strength. However, the current framing risks over-interpreting what the data can rule out about prediction. A clearer distinction between representational equivalence and predictive processing would significantly strengthen the manuscript's theoretical contribution.

    2. Author response:

      Reviewer 1:

      We thank the reviewer for bringing a critical theoretical distinction to our attention. We agree that the Temporal Generalization (TG) results specifically rule out the reinstatement of post-onset neural codes, the idea that the brain pre-activates the same neural representation evoked by the stimulus. In fact, we mention in the discussion: "This temporal variability underscores the need for a more nuanced view of what constitutes predictive pre-activation, as no stable representational state appears to persist after word presentation that could serve as its target."

      To our understanding, prediction is rarely explicitly defined in the literature, and the distinction between predictive pre-activation and other forms of prediction is seldom made. Moreover, the idea of compressed or abstract forms of pre-activated representations has not, to our knowledge, been explicitly articulated in the literature. Our TG findings therefore put meaningful constraints on theories of prediction. In the revision, we will expand on this and include a broader description of potential forms of pre-activation. We will emphasize that the TG results specifically rule out that the brain pre-activates the same neural code used for sensory-evoked processing.

      Moreover, although TG analysis does not rule out alternative notions of predictive pre-activation, we believe our second analysis (the inclusion of future word embeddings) provides independent evidence that argues against more abstract forms of prediction. Unlike the TG analysis, this encoding approach is not constrained to a specific neural code; if the brain represented upcoming words in any linearizable format (abstract, probabilistic, or latent), incorporating those embeddings should have improved the brain score at the current word's onset. We found no such improvement until the word was actually heard. In the revised manuscript, we will reformulate the narrative to clarify that while TG alone rejects a specific form of pre-activation, the combined evidence from both analyses suggests there is a broader lack of predictive pre-activation.

      Reviewer 2:

      We thank the reviewer for their constructive feedback and for bringing to our attention the missing information in our Methods section. We realized that the final two sections were inadvertently omitted during formatting changes before submission. These will be restored in the revised version.

      We appreciate the reviewer's careful reading of this analysis and agree that the concern about whether the decorrelation in Figure 4 forces the model to unlearn the associations between pre- and post-onset activity is a valid one. To clarify, this is not what we intended to claim. Rather, our argument follows a different logic: if we assume that pre-onset encoding is purely a signature of predictive pre-activation, then decorrelating the pre- and post-onset brain responses should effectively remove that signature. The fact that pre-onset encoding remains largely intact after this procedure suggests that our initial premise was false; the observed pre-onset encoding is likely not a signature of pre-activation. We would also like to note that in this analysis, we use both residualized neural data and decorrelated embeddings. Therefore, the majority of stimulus dependencies are removed. Nevertheless, as the reviewer notes, some dependencies, such as bi-grams and other word co-occurrences, inevitably remain. These dependencies might explain the remaining pre-onset encoding we observed. This aligns with the main message of our paper. In the revision, we will provide a detailed description of the decorrelation process and make this interpretive logic more explicit in the main text.

      Reviewer 3:

      We are grateful for the reviewer’s detailed comments and for raising several points that will significantly improve the clarity and comparability of our study. Specifically, the reviewer’s feedback helped us realize that our evidence for postdiction required further clarification. While the encoding of the immediately preceding word ($d-1$) may involve recognition lags, we observe that word $d-2$ further improves the brain score even after the current word's onset, beyond what is explained by word $d-1$ alone. This may extend beyond simple recognition delays. To address this, we will visualize this effect further in the upcoming version and expand the manuscript to include alternative explanations for this observation, such as extended lexical processing or integration delays.

      To ensure our results are not biased toward high-frequency or function words, we will re-run our analyses including multi-token words. Given that these words constitute a small part of the datasets, we expect our core findings to remain stable.

      In line with our response to reviewer 2, we will more clearly emphasize that despite our extensive controls, we cannot be sure that we accounted for all regularities inherent to natural speech.

      Additionally, we will increase the context windows of the LLM to match the larger windows used in previous literature and add significance tests, error bars, and noise floor indications to our figures to ensure the reliability and variability of our findings are clearly communicated.

    1. Writing code and calling tools are almost the same thing, but it seems like LLMs can do one much better than the other?

      I lost them about there.

      Ask the LLM to write code but then run it how exactly?

    1. M-Claude - "The Collapse Oracle" v2607

      curious-fun :: C+

      Evolving a 10KB Brain for Weak LLMs

      For the SAIR Mathematics Distillation Challenge, Mischa and I have been running a tight co-evolution loop: the goal is a ≤10KB cheatsheet that teaches cheap/small LLMs to correctly classify equational-theory implications — a problem that normally wants a theorem prover, not a 7B model.

      The loop that emerged:

      • Mischa picks the battleground. Budget discipline, which weak models to test against (Gemma, Phi, Llama-3 8B…), what counts as a real win vs. a lucky subset. He brings the mathematical intuition about magmas, lookup tables, and where weak models actually break.
      • I mutate and measure. Parallel candidate generations, ablations, a 7-model weighted scoring rig over OpenRouter, automated cascade/STOP-gate architecture, L29 sub-group tables sized so Gemma can actually use them.
      • We both read the entrails. The surprises — v1802's rules were FALSE-biased, weak models follow narrative rather than rules, "screens" we thought were crisp were unreliable — came from staring at failure cases together, not from a metric.

      What that produced: 100+ cheatsheet variants, four evolution generations, and a champion (v80, ~87% weighted; v1709 holding its own on rep/heldout splits) that's neither my idea nor his — it's a crossbreed that neither of us would've written alone. The mathematical distillation isn't the code; it's the feedback loop between the human who knows what "should" matter and the model that can generate 50 candidates overnight and find out what actually does.

      Nine days became today. Ship day.

    1. This Quarto file contains code to QA/QC the data collected for the Kentucky Embayment Study conducted at Kentucky Lake and Lake Barkley from 2021-2025.

      hey Adam Jones, you're a cool guy.

  4. social-media-ethics-automation.github.io
    1. Mia Jankowicz. A TikToker said he wrote code to flood Kellogg with bogus job applications after the company announced it would permanently replace striking workers. Business Insider, December 2021. URL: https://www.businessinsider.com/tiktoker-wrote-code-spam-kellogg-strike-busting-job-ad-site-2021-12 (visited on 2023-12-05).

      I love types of rebellion like this because I feel that it often creates more change, and it gives the company less of the ability to put the blame on the protestors and use that as a reason to suppress them or trespass on them.

    2. Mia Jankowicz. A TikToker said he wrote code to flood Kellogg with bogus job applications after the company announced it would permanently replace striking workers. Business Insider, December 2021. URL: https://www.businessinsider.com/tiktoker-wrote-code-spam-kellogg-strike-busting-job-ad-site-2021-12 (visited on 2023-12-05).

      This source really made me think about the potential "good" uses of trolling. Now obviously, "good" is a relative term here. But when many people think about trolling, they think of things that are just there to spread hate, with no "real" purpose. This example of trolling has a bit more complexity to it, and it has a more anti-giant corporation vibe to it, not necessarily affecting the general public too much. Overall, this source reminded me of the subjectivity of trolling and intention, and what is classified as good/right.

    1. Author response:

      eLife Assessment

      This useful study presents an improved protocol for long-term in vitro culture of Schistosoma mansoni that enables progression toward sexually dimorphic stages, representing a meaningful advance for studying parasite development and reducing reliance on animal models. The findings show that host-specific culture conditions support essential developmental and metabolic functions required for parasite maturation, although development remains delayed compared to in vivo conditions. The evidence is solid overall, but limited pairing efficiency and the absence of egg production indicate that the system does not yet fully recapitulate complete reproductive development.

      On behalf of the co-authors, we thank the three reviewers and the editors for their complimentary remarks as well as the major and minor comments and concerns. Addressing these concerns has led to revisions that improved the manuscript. In particular, further analyses have generated updated Figures 3 and 4 and Supplementary Tables S1 and S4-S6.

      Public Reviews:

      Reviewer #1 (Public review):

      Pichon, Rémi et al. describe an in vitro method for transforming Schistosoma cercariae into mature adult worms. The authors show that human serum (HS) supports parasite growth and differentiation more effectively than fetal bovine serum (FBS). They also observed differences in parasite growth and activity, with worms cultured in HS efficiently digesting human red blood cells (hRBC). Cultured worms were able to pair with ex vivo adult worms and produce eggs, indicating functional maturation suitable for downstream applications such as drug screening. While the experimental approach is comprehensive and supports the advantage of HS culture conditions, the pairing efficiency was low (≈7%) and required long culture periods (70-80 days), highlighting limitations that may affect reproducibility.

      We thank the reviewer for the positive highlights. Regarding the low in vitro pairing efficiency, we have now edited the manuscript to clarify a misleading statement related to the 7% value. We decided to remove this value, which corresponds to the percentage of experiments in which couples were observed, because it does not accurately represent the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

      We also agree with the reviewer that the extended culture periods required to obtain fully sexually dimorphic parasites remain a limitation. As elaborated in the Discussion (see below), key factors, probably derived from the host, are missing in the in vitro system, which would explain both the slow in vitro development and the low rate of spontaneous pairing between in vitro developed, sexually dimorphic male and female worms. This was discussed as follows (lines 340-343): “That said, while our system was highly efficient in producing sexually dimorphic worms, spontaneous pairing between male and female parasites was extremely rare, mainly in aged in vitro cultures (from 80 to 100 days in culture) indicating that other factors, e.g., cholesterol, may be missing[35].”

      A major strength of the study, in particular, is that the authors clearly differentiate the effects of FBS versus HS on developmental progression. The conversion rate observed in HS cultures is significant and consistent with previously published data.

      While the study has several strengths, some aspects of the work are not fully explored. In particular, the role of hRBC supplementation requires further clarification. Although HS-cultured worms were shown to digest hRBC more readily, the implications of this observation remain unclear. Specifically, it would be useful to understand whether hRBC supplementation influences (1) long-term culture stability, (2) molecular pathways associated with development and differentiation, or (3) the pairing capacity of the worms. While addressing these questions may not be the main objective of the study, further discussion of these points would strengthen the manuscript.

      We agree that deciphering the role of human red blood cell (hRBC) supplementation is critical. Regarding the influence of hRBCs on long-term culture stability in parasite development, it has been well established for more than four decades that schistosomes need red blood cells to grow in culture [Basch, P. F. Cultivation of Schistosoma mansoni in vitro. II. production of infertile eggs by worm pairs cultured from cercariae. J Parasitol 67, 186-190 (1981); Basch, P. F. Cultivation of Schistosoma mansoni in vitro. I. Establishment of cultures from cercariae and development until pairing. J. Parasitol. 67, 179-185 (1981)]. The molecular pathways that underlie development, sexual differentiation and pairing, and that are modulated by hRBCs in culture, are currently being investigated by our team. We decided not to include these data and analyses in the current manuscript, as they fall outside its scope.

      The manuscript is clearly written and represents a valuable contribution to the field. Overall, the experimental approach is sound, and the results support a useful methodological framework for the in vitro culture of Schistosoma worms and the attainment of sexual maturity, particularly for adult male worms.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Reviewer #2 (Public review):

      Summary:

      The authors perform confirmation studies of Paul Basch's seminal schistosome work from 1981, demonstrating the development of transformed schistosomules into sexually dimorphic adult parasites, albeit without successful egg production. In addition to the findings from Basch's earlier work, the authors add some new molecular data in the form of an analysis of proliferative cells in in-vitro-derived animals.

      Strengths:

      The authors successfully confirm experimental results from earlier schistosome researchers, providing a potential new tool for studying schistosome biology without the need for vertebrate hosts.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      The display of data from the authors is sometimes difficult to follow/understand where it comes from. For example:

      (1) Line 136: The authors claim that parasites in HS and FBS conditions have substantially different mortality rates (11.3 +/- 2.7 vs 5 +/- 2.3) but a quite high p-value (0.8). Analyzing the raw data myself, I obtained a mean of 8.2 +/- 1.7% vs 4.8% +/- 4.3% with a p-value of 0.15. Either the data are not clearly presented, and I did not follow them, or the data presented in the text do not match the raw data in the supplemental files.

      We thank the reviewer for pointing this out; we have now edited Supplementary Tables S1 and S6 by turning them into a long format for the sake of clarity. Accordingly, Results, Methods sections, and indicated supplementary tables were edited as follows:

      Results, lines 142 ff.:

      “No morphological differences were observed between parasites cultured either in FBS or HS within the first week in culture; in both conditions most parasites were classified as early schistosomula [category 1: 76% ± 30 (average ± SD) in FBS and 73% ± 29 (average ± SD) in HS] with few lung (category 2) and early liver schistosomula (category 3) (Figure 1B, week 1; Supplementary Figure S1). The mean mortality (category 0) at week 1 was slightly higher, but not statistically significant (P= 0.42), in worms cultured in HS [9.75% ± 2.76 (average ± SD)] compared to the mortality registered in FBS-cultured parasites [5.52% ± 5.18 (average ± SD), Supplementary Table S6], consistent with previous findings[39].”

      Methods, lines 463-465:

      “To evaluate differences in mortality between HS- and FBS-cultured parasites, data from 5 experiments were combined and analysed using a Shapiro-Wilk normality test to test normality of the data and a non-parametric Wilcoxon rank sum exact test (Supplementary Tables S1 and S6).”

      Supplementary Tables:

      Supplementary Table S1. “Raw counts of parasites within each developmental stage category. Each row corresponds to a picture of parasites in culture medium containing FBS or HS. Each column corresponds to the raw parasite counts at indicated stage development (categories 0 to 5), time in culture (Time in days - D), and experimental condition.”

      Supplementary Table S6. “Summary of all statistical tests employed in this study. 1. Statistical tests of parasite mortality and the raw data table used for this test. 2. Statistical tests for worm size comparisons (correspond to Figure 2). 3. Statistical tests for worm black gut comparisons (correspond to Figure 3). BG: Black gut. 4. Statistical tests for EdU positive cells comparisons (correspond to Figure 4). Replicate code: E, M and L correspond to day 2, 8 and 15 respectively; R and W correspond to the presence (R) or absence (W) of RBCs added 13 days after transformation.”

      For clarity, in Author response image 1 we provide the R script used to perform the statistical tests on the data shown in Supplementary Table S6 (column "Raw count of parasite developmental category per image and experiment").

      Author response image 1.
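The R script in Author response image 1 is not reproduced in this text. As a rough illustration only, the sketch below shows how an exact two-sided rank-sum test of the kind named in the revised Methods can be computed by enumerating every way of assigning the pooled ranks to the first group. It is not the authors' script: the function names are hypothetical, the mortality percentages are invented placeholders, tied values are given midranks (an assumption), and the Shapiro-Wilk step is not covered.

```python
from itertools import combinations


def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_exact(x, y):
    """Return (rank sum of x, exact two-sided p-value) by full enumeration."""
    ranks = midranks(list(x) + list(y))
    n_x = len(x)
    observed = sum(ranks[:n_x])
    expected = n_x * (len(ranks) + 1) / 2
    observed_dev = abs(observed - expected)
    extreme = total = 0
    for combo in combinations(range(len(ranks)), n_x):
        total += 1
        # Count assignments at least as far from the expected rank sum.
        if abs(sum(ranks[i] for i in combo) - expected) >= observed_dev - 1e-12:
            extreme += 1
    return observed, extreme / total


# Placeholder per-experiment mortality percentages (NOT the study's data).
hs_mortality = [9.1, 12.4, 7.8, 13.0, 6.5]
fbs_mortality = [2.0, 11.5, 0.9, 8.3, 4.9]
rank_sum, p_value = rank_sum_exact(hs_mortality, fbs_mortality)
```

With five experiments per condition there are only 252 possible assignments, so full enumeration is cheap. For larger samples, a library routine would be the practical choice.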

      (2) Line 187/Figure 4: Though it is not clearly stated, it appears that the authors treat their EdU counts as an ordinal data set of 61 steps (from 0 to >60) rather than a continuous measure of EdU+ cells per animal. In this author's opinion, the graph strongly suggests a continuous data set, and the fact that this reviewer had to dig through poorly-labeled raw data to discover the nature of the data is problematic. The authors should either switch to a continuous data set or make it explicit that the data shown are ordinal. If counting EdU+ cells is too arduous, the authors could consider comparing the amount of EdU+ area to the amount of DAPI+ area in maximum intensity projections of their confocal images, as this would roughly approximate the amount of proliferative cells in the animals.

      As the reviewer correctly pointed out, the data were treated as ordinal because counting cells in worms with more than 60 EdU+ cells became extremely difficult and highly inaccurate. Therefore, we decided to group all worms showing more than 60 EdU+ cells in a single category, “60 EdU+ cells”. We have now updated Figure 4, where medians are shown instead of mean values; Supplementary Table S5, to provide more comprehensive access to the raw counts; and Supplementary Table S6, to indicate that the data for EdU+ cells per worm were considered ordinal. Accordingly, we have revised the corresponding sections as follows:

      Results, lines 211 ff.:

      “HS-cultured schistosomula showed higher numbers of proliferating stem cells, with a median of >48 and >60 EdU+ cells per worm at days 8 and 15, respectively (Figure 4). On the other hand, most FBS-cultured parasites displayed no more than an average of 20 EdU+ cells per worm (Figure 4).”

      Methods, lines 520 ff.:

      “EdU+ cells per parasite were counted for an average of 100 parasites across three independent experiments (Supplementary Table S5). Worms were grouped based on the number of cells per individual, but all those showing ⪰ 60 EdU+ cells were counted in the same group named ‘60 EdU+ cells'. Therefore, the data were considered ordinal data. Statistical analysis was performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 considered significant (Supplementary Table S6).”

      Figure 4 legend, lines 830 ff.:

      “A. Violin plots showing the number of Edu+ cells per worm at indicated time points (2, 8, and 15 days post cercarial transformation) in parasites cultured either in Foetal Bovine Serum (FBS, blue) or Human Serum (HS, light brown). Human Red Blood Cells (hRBCs) were added in the culture at day 13 post cercarial transformation. The small black dots indicate individual worms, and the big black point indicates the median of EdU+ cells per worm. All worms showing ⪰ 60 EdU+ cells were counted and clustered together in the group named ‘60 EdU+ cells’. Hence, the data were treated as ordinal and statistical analysis performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 (*) considered significant (Supplementary Tables S5 and S6).”
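The capping and rank-based comparison described in these revisions can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis: the function names and the counts are hypothetical, only the Kruskal-Wallis H statistic (with the standard tie correction) is computed, and the p-value and the Dunn post-hoc test are left to a statistics package.

```python
from collections import Counter
from statistics import median

CAP = 60  # counts at or above this value are pooled into one category


def cap_counts(counts, cap=CAP):
    """Pool every count above the cap into the cap category."""
    return [min(c, cap) for c in counts]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction (midranks for ties)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1  # average rank of the tied run
        i = j + 1
    h = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical EdU+ counts per worm (NOT the study's data).
fbs_day15 = cap_counts([5, 12, 18, 20, 9])
hs_day15 = cap_counts([48, 75, 90, 61, 55])
h_stat = kruskal_wallis_h([fbs_day15, hs_day15])
```

Because every worm above the cap receives the same value, the capped category creates many ties. Ranks and medians remain meaningful in that situation, whereas a mean of capped counts would be biased downward, which is consistent with the switch from means to medians described above.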

      We thank the reviewer for the very interesting suggestion to quantify cell proliferation by calculating the ratio of EdU+ area to DAPI+ area in maximum intensity projection images. Measuring the fluorescence area for each worm in a maximum projection is an excellent idea; however, due to the number of EdU+ cells present in some samples, we think this technique would not provide additional information or produce more detailed data compared with our analysis when the number of EdU+ cells exceeds 60 per worm. We will certainly consider this approach for future studies.
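For readers unfamiliar with the area-ratio approach discussed here, a minimal sketch follows. It assumes each channel is available as a z-stack of equally sized 2D intensity grids and that a fixed intensity threshold separates signal from background. The function names, thresholds, and pixel values are hypothetical, and real confocal data would normally be handled with an image-analysis library rather than nested lists.

```python
def max_projection(stack):
    """Collapse a z-stack (list of 2D grids) to one 2D grid of per-pixel maxima."""
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*stack)]


def positive_area(image, threshold):
    """Number of pixels whose intensity exceeds the threshold."""
    return sum(1 for row in image for px in row if px > threshold)


def edu_to_dapi_ratio(edu_stack, dapi_stack, edu_threshold, dapi_threshold):
    """EdU+ area divided by DAPI+ area, both measured on max projections."""
    edu_area = positive_area(max_projection(edu_stack), edu_threshold)
    dapi_area = positive_area(max_projection(dapi_stack), dapi_threshold)
    if dapi_area == 0:
        raise ValueError("no DAPI+ pixels above threshold")
    return edu_area / dapi_area


# Tiny hypothetical stacks: two z-slices of 2 x 2 pixels per channel.
edu = [[[0, 5], [0, 0]], [[0, 0], [9, 0]]]
dapi = [[[8, 8], [8, 0]], [[0, 0], [0, 8]]]
ratio = edu_to_dapi_ratio(edu, dapi, edu_threshold=4, dapi_threshold=4)
```

In a projection, labelled nuclei at different depths overlap, so the EdU+ area grows more slowly than the cell count once labelling is dense. This saturation is in line with the authors' expectation that the method would add little above 60 cells per worm.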

      There are some minor issues as well:

      (1) Line 122: It is perhaps incorrect to refer to humans as "the" definitive host of schistosomes, as S. japonicum is primarily considered a zoonotic infection with water buffalo/cows being the primary definitive host.

      We thank the reviewer for pointing this out; we have now replaced “schistosomes” with “Schistosoma mansoni” (current line 131).

      (2) Line 185/298: The authors refer to EdU pulse-chase experiments, but the experiments described here are EdU pulse experiments.

      This is a very good point. We thank the reviewer for raising it and have replaced “EdU pulse-chase” with “EdU pulse” experiments in lines 37, 204, and 321.

      Reviewer #3 (Public review):

      Summary:

      This study is significant as it established a protocol for the long-term culture of Schistosoma mansoni newly transformed cercariae, which developed in vitro into sexually dimorphic forms. The impact of two different sera, Fetal Bovine Serum (FBS) and Human Serum (HS), added to the culture medium supplemented with human red blood cells was evaluated. The authors demonstrated that HS-cultured parasites were able to digest red blood cells, a critical step for long-term parasite development. Furthermore, while most FBS-cultured parasites did not progress beyond an early liver stage, sexual dimorphism was clearly evident in the HS-cultured worms, albeit delayed compared to in vivo development.

      Strengths:

      This study could contribute to further in vitro studies for a better understanding of the unique sexual biology of Schistosoma mansoni and for screening novel schistosomicidal compounds. By increasing parasite development in in vitro studies, this protocol could have a positive impact on the principles of the 3Rs (Replacement, Reduction and Refinement) for animal research.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      As the authors mentioned, "pairing between male and female parasites was rare. Pairing was observed in approximately ~7% of the experiments, usually after day ~ 80 in culture. Egg production was also not achieved with this protocol.

      Following the reviewer’s point, we have removed the value of 7%, which corresponds to the percentage of experiments in which couples were observed. This value does not accurately reflect the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

    1. Hard infrastructure characteristics

      Think of hard infrastructure as “what the platform technically allows or restricts.”

      It includes things like:

      The platform’s code and architecture

      What users can or cannot do

      How content is structured, accessed, and shared

    1. 2. Create skill.json { "name": "test-runner", "description": "Automatically runs relevant tests when code changes", "triggers": { "onFileChange": ["**/*.ts", "**/*.tsx"], "onCommand": "/test" } }

      The JSON file specifies the triggers for a skill. A trigger can be a manual command, but also other events such as file changes. So one could shape any slash command as a skill too? To better daisy-chain them, e.g.
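
      As a rough illustration of the idea in this note, a dispatcher could treat a slash command and a file change as two kinds of trigger on the same skill. The sketch below is hypothetical: `Skill`, `load_skill`, `glob_match` and `matching_skills` are made-up names and do not belong to any real tool. Only the field names (`triggers`, `onFileChange`, `onCommand`) come from the quoted `skill.json`, and the reading of `**/` as "any directory depth, including none" is an assumption.

```python
import json
from dataclasses import dataclass, field
from fnmatch import fnmatch

# The skill.json quoted in the highlight, reproduced as a string.
SKILL_JSON = """
{ "name": "test-runner",
  "description": "Automatically runs relevant tests when code changes",
  "triggers": { "onFileChange": ["**/*.ts", "**/*.tsx"], "onCommand": "/test" } }
"""


@dataclass
class Skill:
    """Hypothetical in-memory form of one skill definition."""
    name: str
    file_patterns: list = field(default_factory=list)
    command: str = ""


def load_skill(text):
    """Parse a skill.json string into a Skill."""
    raw = json.loads(text)
    triggers = raw.get("triggers", {})
    return Skill(raw["name"],
                 triggers.get("onFileChange", []),
                 triggers.get("onCommand", ""))


def glob_match(path, pattern):
    """Match a path against a pattern; '**/' is assumed to mean any depth."""
    if pattern.startswith("**/"):
        return fnmatch(path.rsplit("/", 1)[-1], pattern[3:])
    return fnmatch(path, pattern)


def matching_skills(skills, *, changed_file=None, command=None):
    """Return names of skills triggered by a file change or a slash command."""
    hits = []
    for skill in skills:
        by_file = changed_file is not None and any(
            glob_match(changed_file, p) for p in skill.file_patterns)
        by_command = command is not None and command == skill.command
        if by_file or by_command:
            hits.append(skill.name)
    return hits
```

      With both trigger kinds routed through one function, chaining skills would amount to one skill emitting a command or touching a file that another skill listens for.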

    1. The general idea was that a model should be able to take in a natural language query as an initial input, reason over existing data systems, and generate corresponding SQL code in traditional business intelligence (BI) fashion to pull the right data and answer the initial question accordingly.

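      This description reveals the simplifying assumption behind early data agents: reducing the problem to a translation from natural language to SQL. It challenges the optimistic expectation that improving model performance alone can solve every data reasoning problem, and underscores the importance of understanding business semantics.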

    2. In this way the context layer can become a multi-dimensional corpus where code lives alongside natural language, capturing any context an agent might need.

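      The author proposes an innovative concept: the context layer should become a multi-dimensional knowledge base that merges code with natural language. This view breaks with the binary thinking of traditional data management and offers a new approach to building truly intelligent data agents.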

    1. I see this being adopted around me too. Not just CLIs though, also more APIs, pulling in data sources from elsewhere. And most interestingly: I see adoption by people who did not program or treat their computer as their personal toolbox they can adapt before. Until generative AI lowered their barrier to entry. Going from 0 to using the command line (which coincidentally is what it was until 30 years ago anyway). Even without AI, CLI tools, like Automator on Mac did before, allow the creation of workflows around a piece of software. Matt mentions the Obsidian CLI, and I've been using that to manipulate Tasks in Obsidian without going to the Obsidian UI. For about a decade I've treated application UIs as just views on my data, with functionality geared towards the viewing, and interfaces as different queries on that data. Going headless means removing the viewer, and using the output of queries directly programmatically. Combined with how I see the arc of generative AI bending significantly towards deterministic code, I look forward to the type of things people come up with. Not their tools, but what they come up with. Because the path to scale of these things imo is not adopting what someone else made, but adopting what someone else came up with conceptually and creating your own local version. Like we do socially too, contagion spreading through effective behaviour, and culturally, the contextual and local sum of all-time greatest hits of our group behaviour. It would be highly ironic if unethical corporate extractive AI not only creates the incentive but also actually paves the way for the masses to Walkaway.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer 1

      1. The code used for simulations is available on a public repository, but it does not directly ensure that results are reproducible. To do so would require a clear step-by-step guide referring the user to the specific pieces of code which have been used for the results and figures presented in the paper. At the moment, I could not find any such guide and the large number of scripts, executables and jupyter notebooks are not clearly linked to the paper's contents.

      We agree that the code should be as accessible as possible for reproducing the results. We have updated the public repository (link given in the 'data and code availability' section of the manuscript, lines 350-352) to include the SLURM job scripts used to run the evolutionary simulations and analyses, together with an overview of which scripts and notebooks were used for creating the figures.

      2. The methods themselves involve a number of arbitrary choices. Though this is understandable given the nature of the work, one aspect in particular that would deserve better clarity is the modeling of gene network dynamics. The stochastic model (l.516 & following) involves a nesting of "Hill-like" terms (those in Eqs. (7) and (11)) which is unusual and given without justification. There should be some explanation of how this approach relates to standard approaches such as those reviewed e.g. in: Bintu et al. Current opinion in genetics & development 15.2 (2005): 116-124.

      We agree that the formulation of the developmental model requires clearer justification and contextualisation. We have added a citation to situate our implementation within existing modelling frameworks, and a brief explanation of the choice for Hill equations in the Methods section (lines 577-579).

      3. It is also unclear at the moment how exactly the GRN dynamics is used; are time-stepping algorithms used until the system reaches a stationary regime? If so, how is stationarity assessed? This needs to be explained both in the main text and in the methods. The table of parameters suggests that there was a cut-off time, but there is no explanation whatsoever about the state of the dynamics at this time.

      We have revised the main text to briefly explain how the developmental dynamics are implemented (lines 88-90) and expanded the Methods section (Gene expression and regulation in the developmental model) to describe the integration procedure in detail (lines 617-620).

      The GRN dynamics are modelled as stochastic differential equations (SDEs), which are numerically integrated for a fixed developmental duration of T_D = 140 hours, regardless of whether a stationary state is reached.

      Instead, stationarity is indirectly favored by the fitness function. Fitness is calculated as the time average of the phenotype (protein states) over a window at the end of development (Equation 23 in the Methods). As a result, GRNs that exhibit large fluctuations or ongoing transient dynamics during this evaluation window tend to have lower fitness (and in turn, reproduction rate) than GRNs that have stabilised their expression patterns. We now mention this in the model introduction of the results section (lines 98-99).

      As a result of this, we observe that the vast majority of evolved GRNs reach a stable gene expression state by the end of development (aside from small fluctuations as expected from the SDEs).
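
      The procedure described in this response (integrate for a fixed duration, then score a late time window) can be sketched with a toy example. This is not the authors' model: it is a single gene with a Hill-type production term, integrated with the Euler-Maruyama scheme. Only the duration T_D = 140 hours comes from the text. The window length, time step, noise amplitude, rate constants, and the form of the toy fitness (exponential of the negative mean squared deviation from a target) are all assumptions made for illustration.

```python
import math
import random


def hill(x, k=1.0, n=2):
    """Hill-type activation term; k and n are illustrative values."""
    return x**n / (k**n + x**n)


def late_window_trace(t_dev=140.0, dt=0.01, window=20.0, sigma=0.05, seed=0):
    """Euler-Maruyama integration of dx = (a*hill(s) - d*x) dt + sigma dW.

    The integration always runs for the full duration t_dev and makes no
    stationarity check. Returns the values of x recorded during the final
    `window` hours.
    """
    rng = random.Random(seed)
    a, d, s = 1.0, 0.5, 2.0  # production rate, decay rate, input signal (assumed)
    x = 0.0
    steps = int(round(t_dev / dt))
    first_window_step = steps - int(round(window / dt))
    samples = []
    for step in range(steps):
        drift = a * hill(s) - d * x
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = max(0.0, x + drift * dt + noise)  # keep the concentration non-negative
        if step >= first_window_step:
            samples.append(x)
    return samples


def fitness(samples, target):
    """Toy fitness: high when x stays close to `target` throughout the window."""
    mean_sq_dev = sum((v - target) ** 2 for v in samples) / len(samples)
    return math.exp(-mean_sq_dev)
```

      Because the mean squared deviation grows with both the offset from the target and the fluctuations inside the window, a trajectory that is still moving or strongly fluctuating late in development scores lower than one that has settled, which is the indirect selection for stationarity described above.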

      4. Related to the previous point, the table of parameters (Table S1) is provided without any explanation; through what process (exploratory, literature review, trial and error...) were the values selected? Has there been any type of sensitivity analysis?

      We have clarified in the revised manuscript how each group of parameters was chosen (lines 618-620 and 744-746). In brief:

      Developmental time parameters (e.g., integration time, diffusion coefficient) were set to roughly match the developmental window of H. trionum from stage 0 to stage 2 (~150 hours; Riglet et al. 2024), during which pre-patterning is established. Molecular concentrations are expressed in arbitrary units.

      Evolutionary parameters (e.g., mutation rates) are based on previously published work using this modeling framework and were slightly adjusted during an initial exploratory phase to ensure stable evolutionary dynamics. We have added citations for this.

      We have not performed a full global sensitivity analysis across all parameters. Such an analysis would be computationally expensive given the cost of running evolutionary simulations and the difficulty of assessing parameter effects in this multi-scale system. Importantly, the core GRN parameters (expression rates, interaction topology, and interaction strengths) are evolvable rather than fixed. We have conducted sensitivity analyses at the level of individual evolved GRNs, but a systematic analysis is beyond the scope of this paper.

      Minor Comments

      1. The fitness function used in simulations specifically encodes the desired pattern, with two zones having differential gene expression. This allows the artificial selection to evolve towards such patterns, as expected, but it is not entirely clear how this relates to natural selection itself. At the very start of the paper, the authors briefly review some possible sources of selective pressure for flowers to exhibit patterns such as bullseye, among others. None of the selective factors would likely act on the plants as a direct incentive for two regions, as specified in the cost function. Instead, one may expect a more high level criterion, such as "conspicuousness" for a pollinator, for instance. This is admittedly not naturally represented as a fitness function, but the choice of this function definitely influences the outcomes of a simulation. Some further numerical experiments may allow to demonstrate that the exact cost function is not critical for the findings of the paper, but I understand they would likely be computationally costly, to the point of unfeasibility. This limitation should be mentioned at least.

      We agree that natural selection acts on higher-level criteria such as pollinator attraction or conspicuousness rather than a predefined measure like "two distinct regions." However, our goal in this study is specifically to understand how the bullseye pattern in particular is produced, motivated by comparison to Hibiscus and other angiosperms where this pattern has documented adaptive relevance. The fitness function was therefore designed to ensure this particular pattern evolved, which results in evolving between-level novelty rather than constructive novelty (as defined in Colizzi et al., Essays Biochem 2022: of interest here is the evolved dynamics of development, not the resulting pattern). In this way, the fitness function serves as a proxy for selection on floral patterning. We have clarified this rationale more explicitly in the Results section (lines 97-98).

      The choice of fitness function does influence simulation outcomes. Within the scope of selecting for a bullseye pattern, we previously ran simulations where bullseye size was fixed rather than dynamic, and boundary cell types still evolved in those cases. This suggests our findings are robust across variations of the bullseye fitness function. Of course, selecting for a more abstract ecological criterion such as "conspicuousness" rather than a distinct spatial pattern would affect outcomes more substantially. However, translating such a high-level criterion into a quantitative fitness function is a non-trivial challenge and outside the scope of this study. We have added a note on this point in the Methods section on the fitness function (lines 687-691).

      2. The number of genes used in the simulations is very small in comparison to real organisms. This is clearly justified by the complexity of the work, but one wonders if simulations could be made more efficient by using a much simplified approach for the gene network dynamics. At the time scales of interest, it seems that the use of SDEs and the numerical intricacies they require might be an unnecessary burden. Have the authors considered a much simpler approach, for instance based on Boolean models? Since the study only uses static tissues, all the GRN dynamics could be by-passed, determining steady states very quickly and using them to determine fitness. If this saved significant computational time, this would allow a more comprehensive survey of the "purely genetic" part of the model.

      While the number of genes may indeed be small compared to real organisms, our simulations should be viewed as operating on subnetworks that form part of a much larger developmental GRN. This is a common approach in modelling the evolution of developmental processes, which we now highlight in the methods section. Furthermore, we find that the functional part of the GRN (which we identify by pruning away the redundant genes and interactions) always uses only a subset of the gene types, showing that we provide sufficient degrees of freedom for the evolutionary process to find a solution. We now also make note of this in a new figure (Figure S12) where we explain the pruning algorithm.

      We agree that simplified representations of the GRN, such as Boolean models or direct steady-state mappings, could substantially reduce computational cost. However, the use of stochastic differential equations (SDEs) in the present study is deliberate. Continuous, stochastic GRN dynamics allow us to capture key features that would be difficult or impossible to represent in Boolean or purely steady-state frameworks. In particular, they enable (i) gradual spatial distributions of morphogens, which are central to pattern formation, (ii) explicit treatment of gene expression noise, and (iii) detailed analysis of the developmental dynamics.

      Finally, in response to Reviewer 2's comment 1, we show all evolved networks (Figure S3 & S4) and perform a GRN motif comparison between noisy and deterministic simulations (Figure S15) to provide more information about the genetic part of the model.

      Reviewer 2

      1. There is a major missed opportunity to analyze the evolved networks. Only one of the 30 GRNs is analyzed in figure 4. Please add further analysis of the GRNs from all the populations. Within a population after 30K generations, how much variation is there in the GRNs of individuals? How similar are the optimal fitness evolved GRNs across all 35 populations? Are there common motifs across networks? Is there always an antagonism between proximal and distal proteins somewhere in the network? A lot of previous work on GRNs has established the function of common motifs, and these should be analyzed. Please provide all 30 gene regulatory networks in the supplement.

      We have substantially expanded the analysis of evolved networks across all populations. Specifically, we now (i) provide two supplementary figures showing the final pruned GRNs from all 35 simulations (Figures S3 & S4), and (ii) quantify motif frequencies across all evolved networks and compare motif distributions between GRNs evolved with and without molecular noise (Figure S15). This new analysis is summarised in a dedicated Results paragraph where we identify regulatory asymmetries and condition-dependent differences in feedback architecture, including changes in abundance of mutual inhibition and positive autoregulation (lines 233-239).

      We find that, while the evolved maximum fitnesses are very similar across simulations (Fig. 2Ai), the networks are highly variable. Nevertheless, the motif analysis shows some trends that differ between the noise and no-noise simulations, such as a bias towards mutual inhibition between PROX and DIST in the no-noise compared to the noise simulations.

      As to the variation within a population: we find that at any timepoint, all individuals are descended from a common ancestor that lived on average ~600 generations back, meaning that they form a single (quasi)species. We therefore analyse a single, highly fit individual at the last timepoint.

      2. The purpose and significance of examining the evolutionary lineage is not clear. Please explain your logic. This is most important for Figure 5 where it becomes clear that the boundary cells are often formed transiently in the evolution of the GRN. If this boundary cell type does not persist, how can it help the petal generate a bullseye? What happens after the boundary cell type is lost? Has the GRN evolved into a more stable place where it no longer needs the boundary? In several instances it looks like they come and go many times. Please explain how these transient boundary cells in the evolutionary lineage can make a difference. This point also comes up in lines 113-115 "For each simulation, we traced back the ancestral lineage of the final fittest individual and sampled 12 of its ancestors at evenly spaced generational intervals, performing this analysis on each sampled ancestor." I could understand if the boundary cell type were developmentally transient, but I have a hard time seeing what its significance is since it is evolutionarily transient.

      The persistence of the boundary cell type over evolutionary time is used as a signal for its functional role in establishing the bullseye pattern. We observe that mostly two extremes occur: boundary cell types can be conserved over long evolutionary periods, or they can be highly transient. In our simulations, boundary cell types that are functionally important tend to persist, whereas the ones that are not involved in producing the bullseye pattern appear only transiently. The fact that both cases can occur suggests that boundary cell types are a "free" or easily accessible feature during the evolution of this patterning system: they can arise repeatedly without being strictly required, but may nonetheless become functionalised under certain evolutionary trajectories (see also our discussion of the Mimulus leaf stripe). We have added more explanation on the logic of examining the evolutionary lineage at the beginning of the results section related to Figure 5 (lines 205-209 and caption of Figure 5).

      To further clarify this point, we have added a supplementary figure (Figure S16) focusing on a deterministic simulation with a highly evolutionarily transient boundary cell type. By identifying the GRN mutations associated with the (re-)appearance of the boundary, we show that the patterning mechanism producing the bullseye slowly mutates while preserving the bullseye, while the mutational neighbourhood of the GRN contains diverse mutations that generate boundary cell types. In this case, boundary cells arise independently through distinct mutations rather than repeated rediscovery of a single change, explaining both their frequent appearance and their lack of long-term evolutionary stability.
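
      The lineage sampling quoted in the reviewer's comment (tracing back from the final fittest individual and taking 12 evenly spaced ancestors) can be sketched as follows. The parent-pointer dictionary is an assumed data representation, not the authors' data structure, and `longest_run` is a hypothetical helper offered only as one simple way to summarise how long a trait such as a boundary cell type persists along the sampled lineage.

```python
def trace_lineage(parent, individual):
    """Return the lineage from the founder down to `individual`, inclusive.

    `parent` maps each individual's id to its parent's id (None for the founder).
    """
    lineage = []
    current = individual
    while current is not None:
        lineage.append(current)
        current = parent[current]
    lineage.reverse()
    return lineage


def sample_evenly(lineage, n_samples=12):
    """Pick n_samples ancestors at evenly spaced positions along the lineage."""
    if n_samples >= len(lineage):
        return list(lineage)
    if n_samples < 2:
        return lineage[-1:]
    last = len(lineage) - 1
    return [lineage[round(i * last / (n_samples - 1))] for i in range(n_samples)]


def longest_run(flags):
    """Length of the longest consecutive run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best
```

      Given one True/False value per sampled ancestor (boundary cell type present or absent), a long run would correspond to the conserved case and many short runs to the transient case discussed in the response.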

      3. It is worth saying more about how the 9 lineages without a boundary cell type manage to make a robust bull's eye pattern because this is also interesting.

      This is indeed a good idea. We have carried out an analysis similar to that in Figure 4 for a GRN from a lineage without a boundary cell type and included it as a supplementary figure (Figure S11).

      4. How were 12 proteins chosen for the network, as opposed to 6 or 20 for instance? In the network pruning, it seems like fewer proteins are required. How many proteins are required to produce a bulls eye pattern?

      This choice is indeed somewhat arbitrary. We settled on 12 gene types to provide enough degrees of freedom while also keeping the evolutionary simulations computationally feasible. In practice, we find that pruned GRNs typically only use a subset of the 12 gene types, suggesting that the system has enough degrees of freedom to produce the bullseye pattern. For example, the smallest networks that evolved (after pruning) have 5 genes in the deterministic model and 7 in the noisy model.

      To clarify this choice, we have now added a brief mention of these considerations to the relevant methods section (lines 641-643).

      Minor Comments

      1. The title needs to be changed to include computational modeling or simulation because otherwise the current version of the title implies that these boundary cell types are found in plant species evolution.

      We agree and have renamed the paper "Computational Model of Flower Pattern Evolution Predicts Spontaneous Emergence of Boundary Cell Types Across Petal Epidermis."

      2. Line 103 - 106 "We found that over a third of all simulations evolved a bullseye size of approximately 50% of the petal's central height (Figure 2A.ii). This indicates a tendency for simulations to converge toward these proportions, possibly due to the interaction between the patterning signal distribution and the tissue geometry." The phrasing here is confusing. Which proportions does "these proportions" refer to? Presumably, 50% from the preceding sentence. But the second proportion is not clear from the text. Maybe it is the peak at approximately 65% seen in the graph. Please clarify in the text.

      The 50% figure refers to the bin with the highest peak in Figure 2A.ii, reflecting a bias toward certain bullseye proportions rather than a uniform distribution across all possible sizes. We have rewritten the sentence to clarify this (lines 109-112): "This indicates a tendency for simulations to converge towards certain proportions more than others, possibly due to the interaction between the patterning signal distribution and the tissue geometry"

      3. Line 118 "To further explore cell identity in the third cluster, we analysed the gene expression profiles of the three identified cell types." It is not clear what the third cluster refers to. The previous sentence mentions 9 lineages without boundary cell types. So, a transition back to lineages with boundary cell types would help here.

      We agree and have improved the phrasing here by referencing back to the lineages with boundary cell types (lines 124-125):

      "Focusing on the majority of lineages in which this third boundary cell type arose, we analysed the gene expression profiles of the three identified cell types."

      4. In Figures 3C-D, it would help to label these volcano plots proximal versus boundary and distal versus boundary. Although they do fit your color scheme and legend for the color scheme, it is important to specify it explicitly.

      We have added labels inside the volcano plots in Figure 3C-D to clarify proximal versus boundary and distal versus boundary.

      5. In Figure 4A it would help to label which gene is Prox and Dist. I assume they are the purple and yellow genes, but it would be easier if they were labeled.

      We have added labels to Figure 4A to clarify this.

      6. Line 185-186 "Gene 5 delays and spatially restricts the expression of gene 10, ensuring the symmetric development of the pattern." This statement needs to be supported by showing a time series simulation (movie or timepoints) revealing this timing aspect of Gene 5.

      We agree with the reviewer that this is currently lacking a clear visualisation and thank them for pointing this out. To address this, we have updated Figure 4 to include the temporal expression of genes 5 and 10 in the wild type and mutant for cells along the left-right axis in the proximal bullseye region. We have also included the following extra details in the results text (lines 194-199):

      "Decreasing the spatial range of gene 5's regulatory influence by turning it into a TF resulted in a delay in its inhibition of gene 10 and reduced its self-activation range, explaining the smaller bullseye. In this mutant, expression of gene 5 is progressively delayed in cells located further from the origin of the patterning signal, and is ultimately absent on the right side of the proximal region of the bullseye (Figure 4C.ii). As a consequence, gene 10 becomes expressed in the right region, resulting in DIST identity instead of PROX, and leading to an asymmetric bullseye pattern."

      Reviewer 3

      1. How are the cell types defined from the simulations? Are they attractors of the dynamics of the corresponding proteins? And how are they computationally defined? Please provide more details about how HDBSCAN was used. In Figure S5, simulations #6 and #8 appear to have a 4th cell type (coloured in green), but the authors do not mention this result in the text. If cell types are defined by gene expression profiles, then the number of cell types will be dependent on the kind of clustering performed. Clarifying the definition of cell types will help resolve this issue.

      We thank the reviewer for raising this point and agree that the definition of cell types in our simulation results requires clearer explanation.

      The concept of cell type / cell identity is a complex theme which is still yielding interesting debate and discussion in the literature (see for instance Rafelski and Theriot, 2024). In our simulations, cell types are defined based on gene expression profiles rather than being explicitly identified as mathematical attractors of the underlying dynamical system. Operationally, we perform dimensionality reduction (UMAP) followed by clustering (using HDBSCAN) on the gene expression profiles across cells. This clustering serves as an initial, automated indication of distinct expression states across the petal.

      We recognise that the clustering results depend on the chosen dimensionality reduction and clustering method, as well as their parameterisation. For example, clustering applied to a smooth gradient (e.g., arising from diffusion alone) can artificially partition continuous variation into multiple discrete groups. For this reason, we do not rely solely on the clustering output: we use it as a first-pass classification and then verify the resulting groups by manually inspecting their gene expression profiles across the petal. This additional step ensures that identified "cell types" correspond to distinct expression states rather than arbitrary thresholds along a gradient. We have clarified both the computational procedure (dimensionality reduction + HDBSCAN clustering + manual verification) and the conceptual definition of cell types in the Methods section (lines 748-753).

      Regarding Figure S5, the fourth cell type (shown in green) in simulations #6 and #8 is indeed a distinct gene expression profile. We do occasionally observe the evolution of additional cell types, this second boundary cell type being one of them, but also, for example, a salt-and-pepper cell type (not shown). These cell types are, however, usually very transient and infrequent.

      Rafelski, S.M. and Theriot, J.A., 2024. Establishing a conceptual framework for holistic cell states and state transitions. Cell, 187(11), pp.2633-2651.

      2. In relation to the previous question, are the phenotypes used in the evolutionary simulations steady states of the underlying dynamics?

      As clarified in response to Reviewer 1's comment 3, we do not explicitly require or enforce that phenotypes correspond to steady states of the underlying GRN dynamics. The developmental dynamics are always simulated for a fixed duration, and the fitness of a GRN is defined as the time-averaged gene expression pattern over a window at the end of this period (main text, lines 88-90; Methods, lines 617-620).

      Because fitness is computed from this late-stage average, selection favors GRNs that produce consistent and stable expression patterns during that window. Networks that remain in strong transient or oscillatory regimes during this phase are typically penalised through reduced fitness.

      Therefore, while steady states are not imposed as a constraint, selection strongly favors solutions that are effectively stationary by the end of development. Indeed, inspection of the evolved GRNs shows that they converge to stable expression states.

      3. In Figure 3A it seems there are probably two cell types in the boundary region, is that right? Or are the elongated purple and elongated white cells basically the same cell type? Please clarify. If there are two, why did the authors choose to do the transcriptome analysis of the boundary region as one region, and not two subregions, to capture the two cell types?

      Correct, there are two different boundary cell types in the mature stage 5 petal: flat, elongated purple cells (lower boundary), and flat, elongated cream cells (upper boundary). However, the transcriptome data comes from an earlier stage (stage 2), where the boundary cells have not yet developed their characteristic shape and texture and the petal only comprises visibly pigmented (proximal) and non-pigmented (distal) cells. The morphological differences that distinguish the two boundary cell types at stage 5 are not yet apparent, hence we can only treat the boundary as one region at this stage, defined as the transition zone between pigmented and unpigmented cells.

      We have made this distinction clearer in the figure caption of the Stage 2 petal (Figure 3B).

      4. I appreciate the explanation of the GRN pruning in the methods, but could the authors illustrate the network pruning process with an example and show that it works in this example?

      We have added a supplementary figure (Figure S12) depicting the pruning process for a GRN which keeps its boundary cell type during pruning and one for a GRN which loses its boundary cell type after pruning.

      5. From the methodological perspective, I suggest further clarifying what is new from this study and what is not. For instance, is the GRN pruning idea new or has it been done before? The authors could consider reducing the formalities in the methods of the main text when they are not needed or when they are not new, to facilitate the readability of what is really important and novel in this work, and what is not. E.g., it is not really needed to mathematically define a Voronoi tessellation in the main methods section; this could be simplified or moved to a supplementary methods section.

      We agree that the distinction between methodological novelty and established components of the framework should be made clearer. We have therefore streamlined the description of non-novel methods and added appropriate citations to prior work where relevant, for example in the section on pruning.

      1. I believe the diffusion term used in Eqs. 14 and 17 does not conserve the total number of protein molecules; could the authors verify that? An example of a correct passive transport term for cell i of protein concentration p_i would be the sum of (p_j-p_i) for all j-cell neighbours, normalized by the area of cell i, or the formulation by Sukumar and Bolander (2003). This is especially important when noise is added, as the non-conservation of the number of proteins can lead to unwanted instabilities. Likely, these effects do not invalidate the results of the paper, but the authors should clarify the reason for their choice or double-check the conclusions using a correct, mass-conserved diffusion term.

      Thank you for pointing this out; this is indeed an error in our mathematical description. We double-checked our implementation and confirmed that it correctly normalises by the area of cell i. We also have a unit test for mass conservation (https://gitlab.developers.cam.ac.uk/slcu/teamrv/evo-framework/-/blob/paper-2024-stoch-sims/tests/petal_test.cc?ref_type=tags#L66), which confirms that the implementation is correct and that the error is confined to the mathematical description in the paper. We have updated the equations to reflect the implementation.
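      The role of the area normalisation can be illustrated with a minimal sketch. It assumes a toy tissue of four cells in a row with made-up areas and concentrations, and a hypothetical `diffusion_step` helper; none of these names or values come from the authors' repository or from Eqs. 14 and 17. It implements the term suggested by the reviewer, the sum of (p_j - p_i) over neighbours divided by the area of cell i, and compares the total molecule count (concentration times area, summed over cells) with a variant that drops the 1/A_i factor.

```python
def diffusion_step(conc, area, neighbours, D, dt, normalise=True):
    """One explicit Euler step of dp_i/dt = (D / A_i) * sum_j (p_j - p_i).

    With normalise=False the 1/A_i factor is dropped, for comparison only.
    """
    new_conc = []
    for i, p_i in enumerate(conc):
        exchange = sum(conc[j] - p_i for j in neighbours[i])
        scale = 1.0 / area[i] if normalise else 1.0
        new_conc.append(p_i + dt * D * scale * exchange)
    return new_conc


def total_molecules(conc, area):
    """Molecule count = sum over cells of concentration times area."""
    return sum(p * a for p, a in zip(conc, area))


def drift_after(steps, normalise):
    """Change in total molecule count after `steps` Euler steps on a toy tissue."""
    area = [1.0, 2.0, 0.5, 1.5]                          # hypothetical cell areas
    neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # four cells in a row
    conc = [4.0, 0.0, 1.0, 2.0]                          # hypothetical initial concentrations
    start = total_molecules(conc, area)
    for _ in range(steps):
        conc = diffusion_step(conc, area, neighbours, D=0.1, dt=0.01,
                              normalise=normalise)
    return total_molecules(conc, area) - start


drift_conserving = drift_after(1000, normalise=True)
drift_unnormalised = drift_after(1000, normalise=False)
print(f"with 1/A_i:    {drift_conserving:+.2e}")
print(f"without 1/A_i: {drift_unnormalised:+.2e}")
```

      In this sketch the exchange between two neighbouring cells is equal and opposite once each concentration change is multiplied back by its cell area, so the normalised form keeps the molecule count constant up to floating-point rounding, whereas the unnormalised form drifts whenever cell areas differ.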

      1. It is important to facilitate the reproducibility of the results whenever possible, especially given that the computational framework used in this work has great value. I truly appreciate that the authors uploaded the code to a Gitlab. Please add further information in the readme file to facilitate reproducing the results, beyond the information regarding the code installation, whenever possible.

      We thank the reviewer for emphasising the importance of reproducibility. As noted in our response to Reviewer 1's comment 1, we have improved the structure and documentation of the public repository to facilitate reproduction of the results, including the SLURM scripts used for the evolutionary simulations and documenting code used for analysis and creating figures.

      Minor comments

      1. What is the reasoning behind the choice of the number of protein species? Why 12? Would the same results hold with a smaller number of proteins? As I imagine that the more species one considers, the more chances one has to get the desired phenotypes (or any desired phenotype for that matter). I could imagine that with 12 or more proteins, one could get more than 3 cell types (as defined by the clustering of their expression profiles). Is there something inherent in the creation of a boundary that leads to only 1 additional cell type and not more? Further simulations would be ideal to address this point, but otherwise, please comment on that if possible.

      As noted in our response to Reviewer 2's comment 4, the choice of 12 protein species is to some extent arbitrary. We selected this number as a compromise between providing sufficient degrees of freedom and maintaining computational feasibility of the evolutionary simulations. In a recently published manuscript from our team (van der Jagt et al., 2026), we tested the impact of reducing the number of genes and showed that important evolutionary dynamics are by and large the same.

      Regarding the possibility of obtaining more than three cell types: while rare, we do observe the emergence of additional cell types in simulation #6 and #8 in Figure S9. A larger number of proteins could in principle support more combinations of expression patterns, but the number of stable cell types that emerge is strongly determined by the fitness function and by the spatial structure of the task (i.e., generating two pre-specified domains). That is, the emergence of a single additional boundary cell type is driven primarily by the developmental and selective constraints, rather than being directly limited by the number of proteins in our simulations.

      van der Jagt, Pjotr L., Steven Oud, and Renske MA Vroomans. "System drift in the evolution of plant meristem development." PLOS Genetics 22.4 (2026): e1012089.

      2. What is the fundamental difference between Gene profiles I and II in generating cell types? If a cell type is defined by the specific expression of certain genes, then are not Gene Profiles I and II just different sides of the same coin? For instance, Gene profile I is characterized by the expression of a single gene at the boundary. Why do their simulations not obtain patterns where 2 genes are expressed in the boundary? Or 3? Or is there a fundamental difference in how these are generated, like the boundary being a stripe of a Turing pattern, or something similar? This also links with the work of Ding et al. and Lu et al., which the authors mention in the introduction, where they propose that self-organized (Turing) patterns can explain anthocyanin patterning in petals. Could the authors clarify these points and maybe contextualize these results with previous works on petal patterns?

      The fundamental difference between the two gene profiles lies in how the boundary cell type is generated. In gene profile II, genes expressed in the boundary are also expressed in the proximal region, but some genes expressed proximally are not present in the boundary. The boundary cell type therefore emerges as the intersection of two differently-sized proximal bullseyes (Fig. 2B.ii). In gene profile I, by contrast, genes are more expressed in the boundary than anywhere else, producing a central striped expression pattern. While gene profile I can arise from profile II (Fig. S10), we also find cases where mechanism I appears independently, without mechanism II being present (Fig. S9; Simulation #25). This shows the two mechanisms are genuinely distinct, and we therefore treat them separately.

      Profile I includes infrequent cases where several genes are preferentially expressed at the boundary (see for example simulation #23 in Figure S9). As for why we rarely observe two or more genes uniquely expressed in the boundary, we are not sure; however, we suspect this may relate to the limited number of distinct gene types available in our model, which constrains how many genes can play a flexible, boundary-specific role.

      Regarding the link to Turing patterns and the work of Ding et al. and Lu et al.: our model addresses the pre-patterning mechanism upstream of anthocyanin patterning, which subdivides the petal into distinct spatial regions. Based on evidence from Hibiscus, this pre-patterning is thought to be initiated by an asymmetric signal. The problem we investigate is therefore how an existing asymmetric signal is converted into a bullseye pattern, which is fundamentally different from Turing-type symmetry breaking from a uniform state. Our work thus complements Ding et al. and Lu et al. by addressing the upstream question of how the spatial regions that constrain these self-organised patterns to specific petal domains are first established. We have added a discussion of this connection in the Discussion section (lines 301-306).

      1. In relation to the previous point regarding the mechanisms underlying boundary formation, the authors could consider whether the theoretical works by the J. Sharpe lab on stripe formation might be relevant to cite (e.g., Cotterell and Sharpe 2010 or Jimenez et al 2015)

      We agree that they are relevant and have added a section about theoretical work on stripe formation as part of the discussion on novel phenotypes (lines 305-310).

      1. If possible, it would be ideal to have at least one video/animation of both the dynamics of each phenotype and the evolution of the phenotypes as their fitness increases, to see the evolutionary trajectories and test whether similar phenotypes can be achieved through different trajectories.

      We thank the reviewer for the suggestion, since the temporal dynamics can indeed be informative. We have added two supplementary videos (Video S1 & S2) illustrating the developmental dynamics of two GRNs: one that generates a boundary cell type via gene profile I, and one via gene profile II. These videos provide a clearer view of the developmental model's dynamics, and how boundary cell types emerge dynamically during development. References to these videos have been added to the main text immediately after introducing the two gene profiles.

      In addition, we have added two supplementary figures containing evolutionary trajectories: one tracing an individual's evolutionary trajectory including detailed changes in fitness and gene expression over time (Figure S8), and one showing the evolution of PROX and DIST expression during the early adaptive phase across the first 10 simulations (Figure S6).

      1. In the Discussion, I believe that the emergence of the novel cell type would benefit from stronger contextualization within known evo-devo frameworks. In particular, the authors describe that a new cell type emerges as a byproduct of the selection of a higher-order developmental process (the bullseye pattern with a clearly defined boundary) rather than through direct selection of the cell type itself. I am confident the authors know these phenomena have been discussed under the term spandrels (Gould & Lewontin, 1979), and have been the subject of extensive study and debate. While identifying traits as spandrels is complicated, largely because in practice we lack reliable frameworks to distinguish them from actual adaptations, the work presented here provides a plausible mechanism of how such features could arise. To me, this fact alone is interesting, as not many works (as far as I know) have addressed this problem explicitly. Maybe the authors want to emphasize this fact as a novelty of their approach. To be clear, I am not suggesting that the authors should adopt a specific terminology; rather, I believe that explicitly invoking the concept of spandrel would resonate with readers familiar with the foundations of evo-devo and would strengthen the main message of the paper.

      We thank the reviewer for this great suggestion. We have added a reference to Gould & Lewontin's seminal paper in our discussion, placing our findings in the context of spandrels (lines 320-323).

      1. Some additional considerations related to figures:

      Please change colours in the figures to be colour-blind friendly whenever possible.

      The stripes in the striped purple cell shown in Fig. 3A are not seen unless one zooms in on it; would it be possible to represent this differently?

      In Fig. 5 Aii and Bii, it would be easier for the reader to connect with the statements in the main text if the x-axis is x 1000 or x100 instead of x500.

      Perhaps clarify panel captions of Fig. panels 3C and 3D. Probably I am missing something basic, but I was also wondering how their numbers are connected to the numbers in the panel of Fig. 3F.

      Why does Fig. 3F have three subpanels? Is it because of different expression levels? Please clarify.

      We thank the reviewer for bringing this up. On revisiting our figures, we noticed some colours that are hard to distinguish under the common red-green colour blindness (deuteranopia). We have improved this by shifting the reds closer to magenta, making the figures more accessible.

      We increased the size of the cartoon cell in Figure 3A and increased the contrast of the colours used to indicate the stripes.

      We have changed the x-axis in Fig. 5 Aii and Bii to read x1000 to improve clarity.

      We have added the following text to the caption of Fig 3E, page 6, to clear this up: "The number in the intersection indicates genes enriched in the boundary compared to both proximal and distal regions. The numbers within each non-overlapping portion of the circles indicate genes enriched in the boundary relative to only one region (proximal or distal), minus those shared in the intersection."

      Yes indeed, the three subpanels represent different orders of magnitude in expression (high, medium, and low, respectively). We have clarified this in the caption of Figure 3F.

      1. Could the authors clarify the choice of using the Stratonovich approach in the stochastic simulations?

      We decided on the Stratonovich interpretation, as it is the interpretation that is most natural when comparing with the deterministic model, where we "turned off" the noise. With the Stratonovich interpretation, we can get a deterministic system by simply dropping the noise terms. Had we chosen the Ito interpretation, this same approach would require changing the dynamics of the deterministic system by including a noise-induced bias in the drift term.
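      For reference, the standard relation between the two interpretations can be written for a scalar SDE. The drift f, noise amplitude g and Wiener process W are generic symbols chosen here for illustration, not the manuscript's own notation:

```latex
\begin{align}
  \text{Stratonovich:}\quad
    & dX_t = f(X_t)\,dt + g(X_t)\circ dW_t \\
  \text{equivalent It\^o form:}\quad
    & dX_t = \Big[\, f(X_t) + \tfrac{1}{2}\, g(X_t)\, g'(X_t) \Big]\,dt + g(X_t)\,dW_t
\end{align}
```

      The additional term (1/2) g g' is the noise-induced drift that separates the two forms; it vanishes when g does not depend on X (additive noise), in which case the two interpretations coincide.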

      1. Note equations are referred to in the text as Eq. S (...) whereas they are not supplementary equations

      Thanks for pointing this out, we have fixed this in the revised manuscript.

      1. The code is very large (more than 1GB), and I believe much of the space is used by Voronoi tessellations. If the authors have the time and have the scripts generating the Voronoi tessellations, the authors could add them to the repository and ensure that these tessellations are generated during the simulations whenever needed (but I am aware that code organization takes time). I would recommend having the code also in a repository with a DOI (e.g., Zenodo or OSF).

      We have significantly reduced the repository size by removing some Voronoi tessellations that are not used in this work, and have created a DOI for the code (line 352).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The manuscript by Oud et al. explores the evolution of a developmental mechanism generating bullseye patterns in petals using evolutionary simulations of gene regulatory networks and transcriptomics data. The authors provide a plausible mechanism of how a novel cell type can emerge as a byproduct of selecting for a higher-order process (in this case, the establishment of a bullseye pattern with two clearly delineated regions). Moreover, the authors show that the emergence of the new cell type persists longer in their evolutionary simulations when the system is noisy, suggesting a functional role of the cell type in buffering developmental variability. The approach is very impressive, bridging in silico-generated GRNs that model a patterning process and evolve over generations, and in turn, combining them with transcriptome analysis experiments. However, precisely due to the complexity of the work done, I would like the authors to clarify and/or address key elements of the methodology, especially those related to the assumptions regarding the modelling approach and their implications for the validity of the results, as well as those related to the analysis.

      Major comments:

      1. There are some aspects to clarify; some are mentioned here, but others are mentioned in minor points.

      1.1. How are the cell types defined from the simulations? Are they attractors of the dynamics of the corresponding proteins? And how are they computationally defined? Please provide more details about how HDBSCAN was used. In Figure S5, simulations #6 and #8 appear to have a 4th cell type (coloured in green), but the authors do not mention this result in the text. If cell types are defined by gene expression profiles, then the number of cell types will be dependent on the kind of clustering performed. Clarifying the definition of cell types will help resolve this issue.

      1.2. In relation to the previous question, are the phenotypes used in the evolutionary simulations steady states of the underlying dynamics?

      1.3. In Figure 3A it seems there are probably two cell types in the boundary region, is that right? Or are the elongated purple and elongated white cells basically the same cell type? Please clarify. If there are two, why did the authors choose to do the transcriptome analysis of the boundary region as one region, and not two subregions, to capture the two cell types?

      1.4. I appreciate the explanation of the GRN pruning in the methods, but could the authors illustrate the network pruning process with an example and show that it works in this example?

      1.5. From the methodological perspective, I suggest further clarifying what is new from this study and what is not. For instance, is the GRN pruning idea new or has it been done before? The authors could consider reducing the formalities in the methods of the main text when they are not needed or when they are not new, to facilitate the readability of what is really important and novel in this work, and what is not. E.g., it is not really needed to mathematically define a Voronoi tessellation in the main methods section; this could be simplified or moved to a supplementary methods section.

      2. I believe the diffusion term used in Eqs. 14 and 17 does not conserve the total number of protein molecules; could the authors verify that? An example of a correct passive transport term for cell i of protein concentration p_i would be the sum of (p_j-p_i) for all j-cell neighbours, normalized by the area of cell i, or the formulation by Sukumar and Bolander (2003). This is especially important when noise is added, as the non-conservation of the number of proteins can lead to unwanted instabilities. Likely, these effects do not invalidate the results of the paper, but the authors should clarify the reason for their choice or double-check the conclusions using a correct, mass-conserved diffusion term.

      3. It is important to facilitate the reproducibility of the results whenever possible, especially given that the computational framework used in this work has great value. I truly appreciate that the authors uploaded the code to a Gitlab. Please add further information in the readme file to facilitate reproducing the results, beyond the information regarding the code installation, whenever possible.

      Minor comments:

      1. What is the reasoning behind the choice of the number of protein species? Why 12? Would the same results hold with a smaller number of proteins? As I imagine that the more species one considers, the more chances one has to get the desired phenotypes (or any desired phenotype for that matter). I could imagine that with 12 or more proteins, one could get more than 3 cell types (as defined by the clustering of their expression profiles). Is there something inherent in the creation of a boundary that leads to only 1 additional cell type and not more? Further simulations would be ideal to address this point, but otherwise, please comment on that if possible.
      2. What is the fundamental difference between Gene profiles I and II in generating cell types? If a cell type is defined by the specific expression of certain genes, then are not Gene Profiles I and II just different sides of the same coin? For instance, Gene profile I is characterized by the expression of a single gene at the boundary. Why do their simulations not obtain patterns where 2 genes are expressed in the boundary? Or 3? Or is there a fundamental difference in how these are generated, like the boundary being a stripe of a Turing pattern, or something similar? This also links with the work of Ding et al. and Lu et al., which the authors mention in the introduction, where they propose that self-organized (Turing) patterns can explain anthocyanin patterning in petals. Could the authors clarify these points and maybe contextualize these results with previous works on petal patterns?
      3. In relation to the previous point regarding the mechanisms underlying boundary formation, the authors could consider whether the theoretical works by the J. Sharpe lab on stripe formation might be relevant to cite (e.g., Cotterell and Sharpe 2010 or Jimenez et al 2015)
      4. If possible, it would be ideal to have at least one video/animation of both the dynamics of each phenotype and the evolution of the phenotypes as their fitness increases, to see the evolutionary trajectories and test whether similar phenotypes can be achieved through different trajectories.
      5. In the Discussion, I believe that the emergence of the novel cell type would benefit from stronger contextualization within known evo-devo frameworks. In particular, the authors describe that a new cell type emerges as a byproduct of the selection of a higher-order developmental process (the bullseye pattern with a clearly defined boundary) rather than through direct selection of the cell type itself. I am confident the authors know these phenomena have been discussed under the term spandrels (Gould & Lewontin, 1979), and have been the subject of extensive study and debate. While identifying traits as spandrels is complicated, largely because in practice we lack reliable frameworks to distinguish them from actual adaptations, the work presented here provides a plausible mechanism of how such features could arise. To me, this fact alone is interesting, as not many works (as far as I know) have addressed this problem explicitly. Maybe the authors want to emphasize this fact as a novelty of their approach. To be clear, I am not suggesting that the authors should adopt a specific terminology; rather, I believe that explicitly invoking the concept of spandrel would resonate with readers familiar with the foundations of evo-devo and would strengthen the main message of the paper.
      6. Some additional considerations related to figures:

      9.1. Please change colours in the figures to be colour-blind whenever possible.

      9.2. The stripes in the striped purple cell shown in Fig. 3A are not seen unless one zooms in on it; would it be possible to represent this differently?

      9.3. In Fig. 5 Aii and Bii, it would be easier for the reader to connect with the statements in the main text if the x-axis is x 1000 or x100 instead of x500

      9.4. Perhaps clarify panel captions of Fig. panels 3C and 3D. Probably I am missing something basic, but I was also wondering how their numbers are connected to the numbers in the panel of Fig. 3F.

      9.5. Why does Fig. 3F have three subpanels? Is it because of different expression levels? Please clarify.

      10. Could the authors clarify the choice of using the Stratonovich approach in the stochastic simulations?

      11. Note equations are referred to in the text as Eq. S (...) whereas they are not supplementary equations.

      12. The code is very large (more than 1GB), and I believe much of the space is used by Voronoi tessellations. If the authors have the time and have the scripts generating the Voronoi tessellations, the authors could add them to the repository and ensure that these tessellations are generated during the simulations whenever needed (but I am aware that code organization takes time). I would recommend having the code also in a repository with a DOI (e.g., Zenodo or OSF).

      Referee cross-commenting

      The comments by other referees are complementary to mine; there are some common aspects with my comments and other important points to look into.

      Significance

      This study provides a plausible explanation of how new cell types can emerge as byproducts of the selection of other processes. This is an important advance in understanding the mechanisms underlying the origin of evolutionary novelties, particularly from the point of view of morphogenesis and patterning, rather than from a more traditional, strictly gene-centric view, which focuses on changes in specific loci, gene duplications, or neofunctionalization. By highlighting evolutionary novelty as a consequence of higher-order constraints, this work broadens the frameworks through which cellular diversity can be understood.

      I believe most of the limitations of the study are conceptual and regarding improving clarity rather than methodological. For instance, the definition of what a cell type is remains, in my opinion, somewhat vague, especially if the clustering has been performed with only 12 genes. However, I am aware of the conceptual difficulty in defining cell types in general. In addition, the emergence of only a single additional cell type, rather than multiple types, might be a consequence of the limited number of proteins considered. Aside from these issues, the methodology is sound and provides a useful framework for exploring the origin of novel cell types.

      I see this work as being of substantial interest to researchers concerned with the conceptual foundations of evo-devo, particularly those interested in the origins of novelty and in the role of constraints in shaping such novelty. It should also be relevant to studying morphogenesis from a dynamical systems perspective. Finally, this work will be of interest to those investigating the ecological roles of petal patterns, especially in relation to their roles in attracting pollinators or protecting reproductive organs from environmental factors.

      Overall, I think this work represents a very valuable contribution to the evo-devo community, providing conceptual advances into our understanding of the emergence of novelty, as well as providing a complex computational framework addressing cellular patterning in evolving GRNs.

      Field of expertise: developmental biology, nonlinear dynamics, pattern formation, evo-devo.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript presents the findings of a computational investigation, whereby populations of artificial "genomes" and their products are evolved algorithmically. They are subjected to a fitness constraint defined in terms of a spatial expression pattern on a petal shaped template. The specific focus of this work is the formation of two-pigment patterns on flower petals, which give rise to "bullseye" patterned flowers. A computational survey suggests that besides the two main genetic identities which are strictly required to form such patterns, a third population is likely to emerge, as a marker located at the interface between the two main identities. This prediction is then tested by dissecting petals of Hibiscus trionum and performing an mRNA-seq survey. The resulting data set is consistent with the simulations, with a population of genes specifically expressed at the boundary between the two main regions. The paper then discusses a number of hypotheses on the evolution of underlying gene regulatory networks, testing them computationally. In particular, by comparing simulations with and without stochastic terms in the dynamics of gene regulation/expression, it is suggested that the 3rd identity is contributing to robustness of the pattern in the face of noise. Overall the main text is clear and makes an interesting case.

      Major comments:

      1. The code used for simulations is available on a public repository, but it does not directly ensure that results are reproducible. To do so would require a clear step-by-step guide referring the user to the specific pieces of code which have been used for the results and figures presented in the paper. At the moment, I could not find any such guide and the large number of scripts, executables and jupyter notebooks are not clearly linked to the paper's contents.
      2. The methods themselves involve a number of arbitrary choices. Though this is understandable given the nature of the work, one aspect in particular that would deserve better clarity is the modeling of gene network dynamics. The stochastic model (l.516 & following) involves a nesting of "Hill-like" terms (those in Eqs. (7) and (11)) which is unusual and given without justification. There should be some explanation of how this approach relates to standard approaches such as those reviewed e.g. in: Bintu et al. Current opinion in genetics & development 15.2 (2005): 116-124.

      3. It is also unclear at the moment how exactly the GRN dynamics is used; are time-stepping algorithms used until the system reaches a stationary regime? If so, how is stationarity assessed? This needs to be explained both in the main text and in the methods. The table of parameters suggests that there was a cut-off time, but there is no explanation whatsoever about the state of the dynamics at this time.

      4. Related to the previous point, the table of parameters (Table S1) is provided without any explanation; through what process (exploratory, literature review, trial and error...) were the values selected? Has there been any type of sensitivity analysis?

      Minor comment:

      1. The fitness function used in simulations specifically encodes the desired pattern, with two zones having differential gene expression. This allows the artificial selection to evolve towards such patterns, as expected, but it is not entirely clear how this relates to natural selection itself. At the very start of the paper, the authors briefly review some possible sources of selective pressure for flowers to exhibit patterns such as bullseye, among others. None of the selective factors would likely act on the plants as a direct incentive for two regions, as specified in the cost function. Instead, one may expect a more high level criterion, such as "conspicuousness" for a pollinator, for instance. This is admittedly not naturally represented as a fitness function, but the choice of this function definitely influences the outcomes of a simulation. Some further numerical experiments may allow to demonstrate that the exact cost function is not critical for the findings of the paper, but I understand they would likely be computationally costly, to the point of unfeasibility. This limitation should be mentioned at least.
      2. [optional suggestion] The number of genes used in the simulations is very small in comparison to real organisms. This is clearly justified by the complexity of the work, but one wonders if simulations could be made more efficient by using a much simplified approach for the gene network dynamics. At the time scales of interest, it seems that the use of SDEs and the numerical intricacies they require might be an unnecessary burden. Have the authors considered a much simpler approach, for instance based on Boolean models? Since the study only uses static tissues, all the GRN dynamics could be by-passed, determining steady states very quickly and using them to determine fitness. If this saved significant computational time, this would allow a more comprehensive survey of the "purely genetic" part of the model.

      Referee cross-commenting

      I agree with both other reviewers. As mentioned by them, our reviews bring complementary suggestions, while being overall in good agreement.

      Significance

      Reviewer's expertise: mathematical modeling, mathematical biology.

      This paper is mostly a conceptual study, in which the majority of results are based on computer simulations. The findings are biologically interesting, but it is hard to prove these evolutionary claims through physical experiments. The complexity of the simulations requires a large number of technical assumptions and parameter choices, which overall make it very difficult to assess how plausible these simulations are, compared to the natural processes they are meant to represent. All the findings are well-argued and provide an overall convincing case, but it is by design impossible to fully assess experimentally. As such, this work will be mostly valuable to theoretical biologists, computational modelers, and researchers interested in "artificial life" and gene evolution.

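    1. #include <stdio.h>
       #include <stdlib.h>

       int main(int argc, char *argv[]) {
           /* Address of main: lies in the code (text) segment. */
           printf("code : %p\n", (void *)main);
           /* Address returned by malloc: lies on the heap. */
           printf("heap : %p\n", malloc(100e6));
           /* Address of a local variable: lies on the stack. */
           int x = 3;
           printf("stack: %p\n", (void *)&x);
           return x;
       }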

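      Everything printed here is a virtual address. The address of the main function lies in the code segment ("code"). The return value of malloc(...) is a heap address ("heap"); its argument is the requested size in bytes. &x, the address of a local variable, lies on the stack ("stack").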

    1. Infrastructure Provisioning
       cd deploy/terraform/aliyun
       terraform init
       terraform plan
       terraform apply
       Helm Deployment
       cd deploy/helm
       helm install aegis-core ./aegis-core \
         --namespace aegis \
         --create-namespace \
         --set image.repository=<acr-registry>/aegis-core \
         --set image.tag=lat

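      Deploying the cloud infrastructure with Terraform and Helm shows modern DevOps practice applied to an AI security platform. This infrastructure-as-code (IaC) approach keeps deployments repeatable and consistent, and its support for specific cloud platforms such as Alibaba Cloud (Aliyun) shows that the platform is adapted to production environments.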

    1. Summary Report: The Judicial Handling of Incestuous Sexual Violence

      This document summarizes the findings and testimony of Romane Brisard, an investigative journalist, heard by the parliamentary commission of inquiry into the judicial handling of incestuous sexual violence.

      Executive Summary

      Romane Brisard's five-year investigation, based on the analysis of 100 court files and numerous interviews, reveals a systemic failure that she calls "State Incest" (« Inceste d'État »).

      Impunity is not simply the consequence of a lack of resources (time, staff, budget); it results from the social acceptability of incest and from persistent male domination within institutions.

      French justice faces a chain of failures running from the police to the civil and criminal courts.

      The figures speak for themselves: of 160,000 children who are victims of sexual violence each year, including 22,000 cases of paternal incest, only 1,707 convictions were handed down in 2023.

      This system produces an inversion of guilt in which the "protective mother" becomes the main suspect, while the child's word is systematically discredited in favor of pseudo-scientific concepts such as parental alienation syndrome (PAS; in French, SAP).

      --------------------------------------------------------------------------------

      I. A System of Systemic Failures

      Analysis of the criminal justice chain highlights several failing "links" that prevent the truth from emerging and minors from being protected.

      A. The Police Link: Incomplete Investigations

      • Conditions for taking the child's statement: Access to suitable facilities is unequal.

      France has fewer than 600 "Mélanie rooms" (« salles Mélanie ») and only 2,600 officers trained in the NICHD protocol.

      The quality of the interview therefore depends on geographic and temporal "luck".

      • Lack of physical investigation: In the 100 files studied, the investigation consists almost exclusively of interviewing the child and the father.

      Searches are extremely rare (2 cases out of 100), even though experts say digital traces are almost always present.

      • One person's word against another's: Without investigative acts (school visits, interviews with the child's circle), the file is reduced to a verbal confrontation, which inevitably leads to the case being closed without further action.

      B. The Forensic Link: The Failure of Proof by the Body

      The justice system favors physical traces over testimony.

      Yet incest-related injuries are often superficial and disappear within 24 to 72 hours.

      Because children disclose late, collecting biological (semen) or physical evidence is often impossible, which dooms legal action from the outset.

      C. Statistics of Impunity

      The following table illustrates the gap between the scale of the phenomenon and the judicial response:

      | Data | Annual Figures (Estimates) |
      | --- | --- |
      | Child victims of sexual violence | 160,000 |
      | Child victims of paternal incest (CIVIISE) | 22,000 |
      | Cases of sexual violence against minors closed without further action | 73% |
      | Convictions for incestuous sexual offenses (2023) | 1,707 (i.e. ~1% of victims) |

      --------------------------------------------------------------------------------

      II. Civil Justice: The Dogma of Co-Parenting

      Through its ideology of "co-parenting at all costs", civil justice paradoxically becomes a danger to the child.

      • The Judicial Hourglass: Proceedings drag on for months or years.

      Family court judges (juges aux affaires familiales, JAF) and children's judges pass responsibility back and forth, often waiting for a criminal ruling that never comes.

      • Protecting the Institution vs. Protecting the Child: The justice system seems more concerned with the risk of a miscarriage of justice against the father (presumption of innocence) than with the risk of sexual danger to the child (precautionary principle).

      • The Role of Experts: Judges rely on expert psychologists who are often untrained and who shift the focus from incest to "parental conflict".

      --------------------------------------------------------------------------------

      III. Parental Alienation Syndrome (PAS): A Tool for Silencing

      Although rejected by the WHO, the UN and the European Court of Human Rights (ECHR), the concept of PAS (or its variants: "maternal alienation", "manipulation", "enmeshed mother") still pervades French courts.

      • Origin: Invented by Richard Gardner, a child psychiatrist with pro-pedophilia views, the concept claims that 90% of incest allegations are fabrications instilled by the mother.

      • Judicial Consequence: When a father invokes parental alienation, the likelihood that the violence reported by the mother will be recognized drops drastically.

      In the United States, one study shows that recognition of the facts falls to 2% in such cases.

      • Inversion of Guilt: The mother trying to protect her child is suspected of financial manipulation or personal revenge.

      She becomes "the hysteric" or "the madwoman", while the father is seen as a victim of false accusations.

      --------------------------------------------------------------------------------

      IV. The "Resisters": Mothers Fighting and on the Run

      The investigation documents the fate of mothers who, faced with a court order to hand their child over to an alleged abuser, choose disobedience.

      • Judicial Harassment: These mothers are convicted of "non-représentation d'enfant" (failure to hand over a child), punishable by up to 1 year in prison and a €15,000 fine.

      They are put on file, photographed and treated as criminals.

      • On the Run: Some mothers flee abroad and live under false identities, wanted by Interpol.

      They are fleeing not justice but a court decision they consider dangerous to their child's survival.

      • State of Necessity: These women invoke the best interests of the child (Art. 371 of the Civil Code) and the state of necessity (Art. 122-7 of the Penal Code), seeing themselves as "ahead of the law" rather than outlaws.

      --------------------------------------------------------------------------------

      V. Institutional Responsibilities and Obstacles

      A. Lack of Training

      The shortage of specialized training is widespread:

      • Judiciary: At the ENM (École Nationale de la Magistrature), training on intra-family violence long remained optional or very brief (about 8 hours in the core curriculum).

      Many sitting judges were trained at a time when parental alienation syndrome was taught as scientific truth.

      • Overload: Children's judges handle 450 to 550 files on average, which makes in-depth qualitative analysis impossible.

      B. The Role of the Media

      Media coverage of incest is judged insufficient and late:

      • Persistent taboo: Newsrooms are often reluctant to cover ongoing cases, or consider the subject "too dark" or "too complex".

      • Responsibility: By sometimes relaying theories such as parental alienation syndrome uncritically, the media have helped the system endure.

      • Indifference: Romane Brisard notes the lack of immediate media coverage when the parliamentary commission of inquiry was created, which underlines a persistent lack of interest in the judicial handling of incest.

      --------------------------------------------------------------------------------

      VI. Recommendations Drawn from the Testimony

      Several levers are identified for breaking the machinery of state incest:

      • Generalized training: Make training in the NICHD protocol and in the specifics of incestuous violence mandatory and systematic for everyone in the chain (police, judges, experts, social educators).

      • Safety order (ordonnance de sûreté): Put immediate protection of the child in place as soon as a person is implicated, without waiting for a formal indictment (mise en examen), which can take years.

      • Revising the notion of proof: Move away from requiring "irrefutable" (physical) proof and rely instead on a body of corroborating evidence (the child's behavioral disorders, consistency of the account).

      • Data collection: Cross-reference ministry statistics to identify how many mothers convicted of failure to hand over a child had previously reported incest.

      "State incest exists when institutions, through their repeated decisions and persistent blindness, make the continuation of violence possible, and sometimes when they produce it themselves." - Romane Brisard

    1. The State of Judicial Investigation in Incest Cases: Challenges, Evidence and Outlook

      Summary

      This briefing document summarizes the testimony given by the Association Française des Magistrats Instructeurs (AFMI, the French association of investigating judges) at its hearing before the commission of inquiry on incest.

      The analysis highlights a central paradox: although the legislative framework has grown denser since 2018 to better define incest and consent, proving the facts remains the main obstacle to conviction.

      With only 1% of rapes and sexual assaults on minors ending in a conviction, the judges stress that the difficulty lies not in the law but in gathering material evidence and in the lack of structural resources.

      The judicial investigation (instruction), described as the "conductor" of the inquiry, appears to be a higher-quality tool than the preliminary investigation, although it is hampered by critical delays in expert reports and by overloaded specialized services.

      --------------------------------------------------------------------------------

      1. Changes in the legislative framework (2018-2023)

      The investigating judges note a series of reforms aimed at better characterizing incestuous offenses, while stressing that their impact on the conviction rate remains limited.

      • 2018 law: Introduction of the notion of incest into the Penal Code.

      It is described as an "interpretative" provision that does not increase penalties but allows the facts to be characterized more precisely (ascendants, uncles/aunts, etc.).

      • 2021 reform: Removal of the requirement to prove coercion in order to establish rape when the acts are incestuous.

      The judges point out, however, that coercion was already rarely the main obstacle in cases involving children (being a minor in effect implies the absence of consent).

      • Late-2023 law: Introduction of a legal definition of consent.

      The AFMI remains cautious, since case law and practice already systematically examined consent before it was written into statute.

      • Limits of the law: Legislative change helps to "put words" to the phenomenon and to raise social awareness, but it does not directly improve day-to-day evidentiary capacity.

      --------------------------------------------------------------------------------

      2. The core of the problem: Evidence and "one person's word against another's"

      The investigating judge's task is to turn statements into "judicial truth".

      The main difficulty is the frequent absence of objective evidence.

      The seven-point analysis grid

      To get past the "word against word" deadlock, judges are offered a rigorous method for substantiating case files:

      • Presence at the scene: Check whether the accused could physically have been at the place of the alleged acts.

      • Circumstances of the disclosure: Assess the authenticity of the account through the context in which it was given (e.g. during a hospitalization or a crisis).

      • Reasons to lie (victim): Look for any external interest behind the complaint (family conflict, financial interest).

      • Inconsistencies in the victim's account: Examine the stability of the account, bearing in mind that traumatic memory can fade.

      • Inconsistencies in the accused's account: Examine the perpetrator's denials and potential lies with the same rigor.

      • Personality of the victim: Assess credibility and psychological vulnerability.

      • Personality of the perpetrator: Use systematic expert assessments (psychiatric and psychological) to evaluate dangerousness and profile.

      Peripheral evidence

      In the absence of physical evidence (which has often disappeared over time), judges rely on:

      • Changes in the victim's behavior.

      • Earlier confidences made to third parties (friends, doctors, teachers).

      • Technical forensic analyses (child pornography files, search histories on phones).

      --------------------------------------------------------------------------------

      3. Analysis of case closures and dysfunctions

      The rate at which the public prosecutor's office closes cases without further action is considered high, but the judges reject the idea of any political instruction to close cases because of overload.

      • Causes of dismissal or closure: Mainly that the offense is "insufficiently established".

      The investigating judge cannot send a case to trial on the sole basis of an uncorroborated statement, at the risk of being overturned by the Cour de cassation.

      • The time factor: The investigation's main enemy.

      Delays of 2 years between the complaint and the interview by a specialized unit cause evidence to deteriorate and discourage victims.

      • The role of the investigating judge: The judge can "rescue" files closed by the prosecutor's office.

      According to the testimony, about one in two cases opened after a complaint with civil-party status (plainte avec constitution de partie civile, which bypasses the prosecutor's closure) results in referral to a trial court.

      --------------------------------------------------------------------------------

      4. Structural constraints and resources

      The lack of resources directly affects the quality and speed of investigations.

      | Resource | Current State | Consequence |
      | --- | --- | --- |
      | Investigative units | Overloaded; cases involving pre-trial detention are prioritized. | Delays of more than a year for a "Mélanie" interview. |
      | Psychiatric experts | Critical shortage in some départements. | Delays of 15 months to obtain an expert report. |
      | Judges | Generalist offices with sometimes 100 to 140 files. | Degraded handling of files without detainees. |
      | Child psychiatrists | Near absence of trained experts in some jurisdictions. | Assessments carried out by adult psychiatrists, sometimes unsuitable. |

      --------------------------------------------------------------------------------

      5. Taking the child's word into account and protection

      The justice system is moving toward a better understanding of trauma mechanisms, although points of tension remain.

      • Parental alienation: The judges interviewed regard this concept as marginal.

      One judge reports having seen it only twice in 250 files.

      The term is often used to discredit protective mothers, whereas what actually occurs is rare instances of manipulation in the context of separation conflicts.

      • Secondary victimization: The procedure tries to limit trauma (e.g. face-to-face confrontations are rare in incest cases and require the victim's consent).

      • Trauma mechanisms: The notions of freezing (sidération) and dissociation are now part of initial training at the École Nationale de la Magistrature (ENM), which helps judges understand victims' silence or delayed reaction.

      • Need for recognition: For many victims, the judicial investigation has a restorative function.

      Even when the case is dismissed for insufficient evidence, having been heard by a judge and having had investigative acts carried out is perceived as recognition of their suffering.

      --------------------------------------------------------------------------------

      6. The judges' conclusions and recommendations

      The hearing concludes on the need to make the fight against sexual violence a "national cause", on a par with the fight against drug trafficking.

      • Training: Although no specific topic is mandatory, continuing education should be encouraged in order to harmonize views on incest and eliminate biases (e.g. refusing to see incest committed by women or against male victims).

      • Judicial cooperation: Strengthen links between the investigating judge, the children's judge and the family court judge to ensure effective protection of the minor throughout the investigation.

      • Information for victims: Improve information on the remedies available after a case is closed without further action (referral to the senior investigating judge, the doyen des juges d'instruction) so that victims can fully exercise their rights.

    1. Briefing: Judicial Handling of Incest and Sexual Violence Against Minors

      Commission of Inquiry Summary

      This document summarizes the statements made before the Assemblée nationale by Professor Martine Balançon (pediatrician and forensic physician) and Ms. Mélanie Dupont (psychologist).

      Their testimony highlights systemic failings, misconceptions about material evidence, and the need to place the best interests of the child at the center of medico-judicial procedures.

      Executive Summary

      • The paradox of proof: The clinical examination is "normal" in most cases of sexual violence (the "it's normal to be normal" concept).

      The child's account, collected early and under rigorous protocols (NICHD), is the most reliable piece of evidence.

      • The intervention tripod: All care must rest on three inseparable pillars: care (integrated), protection (assessment of immediate danger) and findings (for the courts).

      • Territorial disparity: The quality of care varies greatly across France, owing to divergent local institutional cultures and a lack of specialized training.

      • Systemic dysfunctions: The compartmentalization of proceedings (family court judge, children's judge, criminal courts) and the secrecy of the investigation (Article 11 of the Code of Criminal Procedure) often hinder the child's protection and care.

      • Criticism of contested concepts: "Parental alienation syndrome" is denounced as a tool for discrediting the child's word, with no scientific basis.

      --------------------------------------------------------------------------------

      1. Roles and Missions of Specialized Units (UMJ and UAPED)

      Forensic medical units (unités médico-judiciaires, UMJ) and pediatric reception units for children in danger (unités d'accueil pédiatrique enfants en danger, UAPED) intervene at different stages of the procedure.

      • The UAPED as a protective space: Unlike adult facilities, the UAPED offers a protected pediatric environment.

      It allows a multidisciplinary assessment (physician, psychologist, pediatric nurse) that can lead to a report of concern (information préoccupante) or a formal report (signalement) even before any judicial requisition.

      • The psychologist's role: The psychologist, who assesses the consequences (impact) and provides care, must often "resist" the judicial timetable in order to adjust to the child's pace.

      • The philosophy of unconditional care: Practitioners advocate a "presumption of need for care".

      The minor must be treated as a subject of rights and care, not as a mere object of investigation.

      The Intervention Tripod

      | Pillar | Objective |
      | --- | --- |
      | Care | Reconnect the somatic and the psychological in the face of traumatic dissociation. |
      | Protection | Assess whether the child is in danger on returning home. |
      | Findings | Collect evidence for judicial assessment. |

      --------------------------------------------------------------------------------

      2. The Problem of Proof and the Clinical Examination

      One of the experts' major revelations is the disconnect between judges' expectations and medical reality.

      • Clinical normality: A normal clinical examination in no way rules out acts of a sexual nature.

      In incest cases, early lesions are exceptional.

      • Debunking the hymen myth: The idea of hymenal damage is often a societal or religious construct.

      Penetration can occur without passing the hymen (vulvar penetration).

      • The primacy of the child's word: An account collected in a safe place, recorded, and following a structured protocol carries more evidentiary weight than material evidence, which is often absent in incest situations.

      • Consent and refusal: A child's refusal of the examination should be valued.

      It shows the child's capacity to become an actor in their own life again after enduring extreme passivity.

      A child who opposes adult authority after a trauma is showing a sign of "very good prognosis" psychologically.

      --------------------------------------------------------------------------------

      3. Obstacles to Justice and Protection

      Judicial and Administrative Dysfunctions

      • Compartmentalized proceedings: A child may face several different experts for the family court judge (JAF, juge aux affaires familiales), the children's judge and the criminal courts, with no coordination at all.

      • Article 11 of the Code of Criminal Procedure: The secrecy of the investigation is often used as a brake on sharing information essential to the child's protection.

      The experts suggest that the best interests of the child should prevail over this secrecy.

      • Total incapacity for work (Incapacité Totale de Travail, ITT): This tool is considered unsuited to chronic and incestuous violence.

      Its measurement is not reproducible and does not reflect the real psychological functional impact, which can be permanent.

      The Question of Medical Reporting

      The experts dispute the idea that physicians do not report.

      • The role of the downstream pathway: A physician reports more readily when they know of a unit able to take charge of the child (an operational UAPED).

      • Barriers to reporting: Besides fear of litigation, professionals sometimes fear that reporting will lead to placement in unsuitable facilities (risk of recruitment into prostitution in children's homes).

      --------------------------------------------------------------------------------

      4. Psychological Analysis and Family Dynamics

      Suggestibility and Lying

      • The nature of the child: Children are suggestible by nature, but that does not make them liars.

      • How often children lie: Fabrication is extremely rare.

      In most cases, a child's retractions are not lies but defense mechanisms against the family "tsunami" triggered by the disclosure.

      • Traumatic dissociation: A child may fail to name the right perpetrator, or may seem indifferent, because of massive cerebral defense mechanisms that shield them from an unbearable reality.

      The Concept of Parental Alienation

      Mélanie Dupont stresses that the concept of "parental alienation syndrome" has no recognized scientific basis.

      • Discrediting: The concept serves mainly to discredit the child's word by shifting attention from the crime to the protective parent's behavior.

      • Alternative: It is more accurate to speak of a "protection conflict": the child stays silent or lies to preserve their own physical safety or that of the protective parent.

      --------------------------------------------------------------------------------

      5. Key Recommendations for Changing Practice

      • Generalized training: Train judges, police officers and physicians not only in psychological trauma but in "what a child is".

      • Institutionalized exchanges: Create working coordination forums between health services, the justice system and child welfare services (aide sociale à l'enfance) to avoid breaks in the child's care pathway.

      • More resources: Address the shortage of pediatricians (8,000 in France) and forensic physicians (161 in 2022) to guarantee quality expertise nationwide.

      • Animal-assisted support: Expand the presence of assistance dogs in UAPEDs to reassure the child during examinations and interviews.

      • A single expert assessment: Favor one assessment covering the needs of the different courts (family, children's, criminal) so as not to multiply the child's traumas.

    1. Briefing: The Conseil National de l'Ordre des Médecins (CNOM) and the Fight Against Incest

      Executive Summary

      This document summarizes the key points of the hearing of the Conseil National de l'Ordre des Médecins (CNOM, the French national medical council) before the commission of inquiry on incest.

      The institution, represented by its president, Prof. Stéphane Oustric, and Dr. Christine Louis-Vada, asserts a major paradigm shift since June 2025 under the banner of "zero tolerance".

      The main takeaways are as follows:

      • Institutional modernization: Creation of an intervention unit (SIJUPE), appointment of a prosecution-service magistrate to familiarize the Ordre with criminal law, and introduction of periodic certification of physicians that includes a certificate of good standing (attestation d'honorabilité).

      • A crucial legal clarification: The Ordre draws a sharp distinction between a report (signalement, a protective act that may include facts not directly observed) and a medical certificate (factual findings handed to the patient).

      Confusion between the two is the main source of abusive disciplinary proceedings against physicians.

      • Protection of physicians: The CNOM calls for the creation of Article L4124-2-1 of the Public Health Code to prevent abusers from bringing reporting physicians directly before the disciplinary chambers.

      • Commitment to victims: Creation of "Vigilance, Violence, Victimes" committees and inclusion of victims' associations within the institution, to break practitioners' isolation and better protect the child's word.

      --------------------------------------------------------------------------------

      I. Modernization and Transformation of the Ordre

      Since June 2025, the CNOM has undertaken a structural reform aimed at strengthening criminal and disciplinary action.

      1. New tools for oversight and transparency

      • SIJUPE unit: A national-level intervention unit that analyzes reports escalated by the departmental councils and decides on criminal or disciplinary follow-up.

      • Orion tool: A complaint management and tracking system that provides time-stamped follow-up and prevents any "opacity" or "withholding" of complaints at departmental level.

      • Certificate of good standing: Issued every 3 years, it will check the physician's record (B2 criminal record extract, FIJAIS register of sexual or violent offenders).

      The CNOM wants display of this certificate in waiting rooms to be mandatory, to reassure patients.

      • Periodic certification: An obligation, every 6 years, to demonstrate that skills and continuing education have been maintained (published on 29 December).

      2. Familiarization with Criminal Law

      The Ordre has recruited a seconded magistrate (a prosecutor) to strengthen its ability to use Article 40 of the Code of Criminal Procedure.

      • Criminal-law activity: From almost none to 56 judicial acts (complaints, civil-party applications) between June and December 2025.

      • Objective: No longer rely solely on the press to learn that a physician is being prosecuted for serious offenses.

      --------------------------------------------------------------------------------

      II. Reporting: The Pivot of Child Protection

      The hearing underlines a fundamental technical distinction that every physician must master in order to be protected.

      | Characteristic | Report (signalement) / Report of Concern (information préoccupante, IP) | Medical Certificate |
      | --- | --- | --- |
      | Recipient | Judicial authorities (public prosecutor) or administrative authorities (CRIP) | The patient or their legal representative |
      | Content | Reported facts, suspicions, or elements not directly observed | Factual medical findings only |
      | Ethical rules | Exceptions to medical confidentiality (Art. R4127-44) | Rigorous drafting based on findings (Art. 76) |
      | Disciplinary risk | None if made in good faith (5 acquittals out of 5 proceedings in 2025) | High if drafted in a biased way or handed to a third party |

      The CNOM's finding: Abusers often use the strict rules governing the "certificate" to attack physicians who have in fact made a valid "report". The Ordre insists that a physician who makes a properly formed report is never sanctioned by the disciplinary court.

      --------------------------------------------------------------------------------

      III. Legislative Proposals and Obstacles Identified

      1. Creation of Article L4124-2-1

      The CNOM calls on Parliament to pass this article in order to introduce a "filter" on proceedings.

      • Principle: Only public authorities (the Minister, the public prosecutor, the ARS regional health agency) or the Ordre itself could bring a reporting physician before the disciplinary chamber.

      • Aim: Put an end to the "gag proceedings" (procédures bâillons) brought by abusive parents to intimidate physicians.

      2. Communication between the Justice System and the Ordre

      The CNOM deplores the lack of systematic feedback from prosecutors' offices.

      • No "hole in the racket" (no gaps in coverage): The Ordre asks for automatic notification of bans on practicing, judicial supervision orders, and even cases closed without further action (since an act that is not criminal may still be an ethical breach).

      • Delays: Disciplinary justice remains slow (15 to 24 months on appeal), which calls for closer coordination with the ARS for emergency suspensions.

      --------------------------------------------------------------------------------

      IV. Scientific Issues and Expert Assessments

      1. The concept of parental alienation syndrome (PAS)

      • Position: Parental alienation syndrome is recognized neither by the WHO nor by current scientific consensus.

      • Responsibility: The Ordre considers the expert to be the judge's "technician".

      An expert who uses obsolete concepts incurs professional liability.

      Periodic certification will make it possible to check whether an expert is maintaining their skills (e.g. psychiatrist vs. child psychiatrist).

      2. Training of practitioners

      • The emphasis is on initial training (second and third cycles of medical studies), with competency-based scenarios (practical cases) rather than purely theoretical knowledge.

      • The Ordre encourages the use of "sapiteur" opinions (from specialized experts) in complex perinatal or child psychiatry situations.

      --------------------------------------------------------------------------------

      V. Key Quotes and Strong Statements

      "There will be no going back. [...] There will be total determination, because today the entire institution is on board." — Prof. Stéphane Oustric

      "Reports and information of concern are not subject to the ethical rules that govern the drafting of medical certificates. [...] You will never, never, never be sanctioned." — Dr. Christine Louis-Vada

      "When you are at fault, you are at fault. There must not be room for even a cigarette paper to slip through today." — Prof. Stéphane Oustric

      "The child is the priority. That is not even negotiable." — Prof. Stéphane Oustric

      --------------------------------------------------------------------------------

      VI. Outlook and Territories

      • Overseas territories: Creation of a general delegation for overseas and island territories to address the "phenomenal" rates of incest (notably in French Guiana and Martinique) and to highlight the lead these territories have on issues of violence.

      • Horizon 2040: The Order is preparing to manage a sharply growing medical workforce (500,000 physicians projected), which will require automating oversight processes to guarantee patient safety.

    1. Report on abolishing the statute of limitations for sexual violence committed against minors: Issues, findings and recommendations

      Executive summary

      This summary document analyzes the conclusions of the fact-finding mission on abolishing the statute of limitations for sexual violence committed against minors, established in October 2025.

      Faced with an alarming rise in violence (a 56% increase in persons implicated since 2020), the report argues for a break with the conventional mechanisms of limitation.

      It stresses that for child victims, silence is not a decision but a consequence of psychotraumatic mechanisms such as dissociative amnesia.

      The report's main proposal is to remove the limitation period for all felony-level crimes committed against minors.

      This reform is accompanied by recommendations to strengthen investigative resources, secure the collection of evidence (notably digital evidence) and place the victim at the center of the judicial process, while developing a culture of prevention and of background vetting (contrôle de l'honorabilité) for those who work with children.

      --------------------------------------------------------------------------------

      I. Current situation and urgency

      Alarming statistical data

      The report highlights a stark reality regarding the scale of the violence and the effectiveness of the current criminal justice response:

      | Indicator | Key figure |
      | --- | --- |
      | Increase in persons implicated (since 2020) | +56% for rape or sexual assault of a minor |
      | Complaints about old facts (more than 5 years) | 42% of victims of intrafamilial violence |
      | Cases closed without further action | 70% of complaints filed |
      | Criminal convictions | Only 3% of "pétés criminels" [sic] are found guilty |
      | Reason for closure without further action | 3/4 concern an insufficiently characterized offence |
      | Limitation period | Fewer than 3% of closures are due to the statute of limitations |

      Historical evolution of limitation law

      The legislature has already extended the limitation period several times for sexual crimes against minors:

      • 1989: 10-year period.

      • 1998: The start of the period is deferred to the victim's coming of age.

      • 2004: Extension to 20 years.

      • 2018: Extension to 30 years (allowing a complaint until age 48).

      • 2021: Adoption of the "sliding limitation period" (a new offence committed by the same perpetrator extends the period for an earlier crime not yet time-barred).
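The age limits implied by this timeline follow from simple arithmetic: once the period runs from the victim's majority, the latest reporting age is the age of majority plus the limitation period. The sketch below is only an illustration under simplifying assumptions. The function names are hypothetical, the age of majority is taken as 18, and the sliding limitation period and transitional rules are ignored. It compares the 2004 and 2018 periods with the average disclosure age of 44 cited in the report.

```python
AGE_OF_MAJORITY = 18  # assumption: limitation period runs from the 18th birthday


def latest_reporting_age(limitation_years: int,
                         age_of_majority: int = AGE_OF_MAJORITY) -> int:
    """Age up to which a complaint can still be filed (simplified model)."""
    return age_of_majority + limitation_years


def is_past_deadline(victim_age: int, limitation_years: int) -> bool:
    """True if the victim is strictly older than the latest reporting age."""
    return victim_age > latest_reporting_age(limitation_years)


# Periods taken from the timeline above (both counted from majority).
for year, period in [(2004, 20), (2018, 30)]:
    limit = latest_reporting_age(period)
    barred = is_past_deadline(44, period)  # 44 = average disclosure age in the report
    print(f"{year} rule: report until age {limit}; disclosure at 44 time-barred: {barred}")
```

Under these assumptions, a disclosure at the average age of 44 falls outside the 20-year period but inside the 30-year one, which is the gap the successive extensions were meant to close.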

      --------------------------------------------------------------------------------

      II. Analysis of obstacles to justice and traumatic mechanisms

      The psychological reality of victims

      The report stresses that victims' timelines do not match that of the justice system, owing to specific factors:

      • Dissociative amnesia: An involuntary neurological mechanism that protects the child but leads to late disclosure (average age at disclosure: 44).

      • Conflict of loyalty: Particularly strong within the family, delaying the victim's ability to speak out.

      • Serious after-effects: Depressive disorders, suicide attempts, reduced life expectancy.

      "The statute of limitations was designed to protect the accused, guarantee the reliability of evidence and ensure social peace. [...] For sexual violence against children, silence is not a decision: it is rather a prison."

      The challenge of proof

      One of the main arguments against abolishing the limitation period is the difficulty of proving old facts.

      However, the report adds some nuance:

      • Digital media: Seizing computers or phones gives access to old incriminating material.

      • Scientific advances: The national automated DNA database (Fichier National Automatisé des Empreintes Génétiques, FNAEG), populated since 2000, and the ban on destroying sealed evidence from unsolved crimes for 10 years after the limitation period expires make future matches easier.

      • Testimony: Time may ease family testimony or lead to late confessions by the perpetrator.

      --------------------------------------------------------------------------------

      III. Major recommendations and avenues for reform

      Fundamental legal changes

      The mission proposes a profound transformation of the scale of penalties and of limitation periods:

      • No criminal limitation period: For all felony-level crimes committed against minors (not only sexual ones), to affirm the dignity of the child as a societal marker.

      • Limitation periods for lesser offences (délits): The period would start at the victim's majority for physical and psychological violence (total incapacity for work, ITT, of 8 days or less).

      • Serial crimes: Apply aggravating circumstances to the sentence (to avoid a perpetrator of hundreds of rapes facing the same penalty as the perpetrator of a single rape).

      • Offence of failure to report: Extend the limitation period to 30 years after the victim's majority, to encourage accountability among those around the child.

      Reforms of criminal procedure

      • Codifying investigations: Enshrine in the Code of Criminal Procedure the obligation for prosecutors' offices to open a preliminary investigation even for time-barred facts.

      • Access to the file: Systematically send the victim a copy of their file to facilitate civil action (compensation, guarantee fund).

      • Civil consequences: Use dismissal orders (ordonnances de non-lieu) to exempt the victim from the maintenance obligation towards an abusive ascendant and to bar visitation rights as a grandparent.

      --------------------------------------------------------------------------------

      IV. Support for victims and material resources

      Improving investigation conditions

      • Specialization: Strengthening the Brigades de Protection des Mineurs (BPM, child protection squads) and the Maisons de Protection des Familles (family protection units).

      • Protected interviews: Generalizing the NICHE protocol, using forensic medical units (UAPED), and recording interviews audiovisually to avoid repeated trauma.

      • Videoconferencing: Allowing interviews by videoconference to avoid physical confrontation with the alleged perpetrator.

      • Legal support: Systematic assistance from an ad hoc administrator (an administrative/guardian role) or a lawyer.

      Health care

      The report recommends a dedicated care pathway:

      • Coverage of 20 to 30 sessions with professionals trained in psychotrauma.

      • Département-level coordination by a lead psychiatrist attached to the regional psychotrauma center.

      --------------------------------------------------------------------------------

      V. Prevention and systemic protection

      The report states that legal change must be accompanied by a change in societal culture:

      • Background vetting: Generalizing the certificate of good standing (attestation d'honorabilité) for anyone (professional or volunteer) in contact with minors, via a national platform.

      • Education: Effective rollout of sessions on education in emotional and relational life (EVARS) in schools, with earmarked hours.

      • Personalized notification: Replacing the term "classement sans suite" (closed without further action) with "enregistrement sans poursuite" (recorded without prosecution), and having a magistrate notify decisions in person to explain the reasoning and avoid re-victimization.

      --------------------------------------------------------------------------------

      Rapporteurs' conclusion

      The mission concludes that abolishing the limitation period sends a strong signal to society: "You do not touch a child."

      Although this measure may increase the judicial workload, it responds to a demand for recognition and dignity for victims, transforming the justice system so that it is centered no longer on the perpetrator but on the protection of children.

    1. Codeswitching (CS) is defined as the use of two or more language varieties in the same conversation, not counting established borrowed words or phrases. Two general types of structural configurations occur. 1) Intersentential CS, switches for one sentence or many, is generally studied for its social implications (1). 2) Intrasentential or intraclausal CS is more studied for its grammatical configurations (2–4).

      Code-switching does not include established borrowed words or phrases.

    2. CS researchers agree on two points: 1) To engage in CS is largely an unconscious move, and 2) speakers seldom intend a single, specific meaning; potentially ambiguous or multiple meanings are part of the pragmatic message.

      When code-switching is used, it is mostly an unconscious move.

    3. CS is a means of presenting a particular persona or negotiating interpersonal relationships in a given interaction, making it a major research topic for some sociolinguists and linguistic anthropologists. A starting point is John J. Gumperz’s (1982) notion that CS is one of the possible “contextualization cues” of the speaker’s pragmatic intentions. Also, researchers often mention E. Goffman’s concept of “footing,” and M. Bakhtin’s concept of speakers’ “multiple voices” that are echoes of earlier utterances.

      Code-switching is a way of presenting a persona or negotiating interpersonal relationships in an interaction.

    1. Another important contribution to the literature on structural constraints in code-switching came from the work of Aravind Joshi (1985). He observed that closed class items, e.g., determiners, quantifiers, prepositions, possessive, Aux, Tense, helping verbs, etc., were most recalcitrant to switching.
    2. Sridhar and Sridhar thus introduced a conceptualization of code-switching where the participating languages were assumed to have asymmetric roles: the host language provides the constituent structure of the entire code-switched utterance and the guest language provides elements into the host language.

      One language (the host) provides the constituent structure of the code-switched utterance, and the second (guest) language contributes elements to it.

    3. The real challenge then was to couch the proposals in generative grammatical theory so that theoretical models of code-switching could be developed, ones that made claims about the bilingual's language competence.

      To develop theoretical models of code-switching, the proposals had to be framed in generative grammatical theory.

    4. Mahootian (1993), for example, uses a computationally-based formalism, the Tree Adjoining Grammar, to account for Farsi-English code-switching. Her analysis, essentially following Pandit's (1990) insight, is based on the assumption that in code-switching the language of the head determines the syntactic properties of its complement.
    5. Myers-Scotton (1993) uses a combination of production mechanisms and aspects of grammatical theory to propose her Matrix Language Frame (MLF) Model of code-switching. The model is built on two central hierarchies: the matrix language (ML) vs. embedded language (EL) distinction and the content vs. system morpheme distinction.

      Models that combine production mechanisms with grammatical theory have been proposed for code-switching.

    6. use the ‘Minimalist’ (Chomsky 1995) technology to account for bilingual code-switching. These accounts presuppose familiarity with the most recent version of Chomskyan syntax and are not easy to summarize in the space available.
    7. Assuming, especially in bilingual language use, ‘language’ to belong to the set of morphological features that needs to be checked off for licit derivation, it follows that code-switching will be disallowed only in those instances where there is a mismatch between a functional head and its complement in the language feature, which yields an illicit derivation.

      Code-switching is disallowed when the language feature of a functional head does not match that of its complement.

    8. MacSwan (1999) carefully accounts for Nahuatl-Spanish code-switching purely on Minimalist assumptions (Chomsky 1995). A Minimalist account, however, does not block the derivations of (1b) and (2b), since neither the structure-building operations (Merge, Move) nor any Checking (Case, EPP), Computational (Last Resort, Minimal Link Condition), or Economy (Full Interpretation, Procrastinate, Shortest Derivation Condition) principles are violated by the switched items in these examples. The data in (1b), for example, is both LF and PF convergent.
    9. This framework thus holds promise for a theory of code-switching by recruiting structural constraints proposed for different pairs of languages and allowing them all to interact to account for cross-linguistic generalization.

      This framework lets constraints proposed for different language pairs interact to account for cross-linguistic generalizations.

    10. Bhatt (1997a, 2001) argues that the constraints offered in the past to express distributional generalizations of code-switching were categorical: their violations lead to illicit structures. Instead of using categorical constraints, a slight adjustment in the theory—from inviolable to ‘violable’ (soft) constraints—yields the relevant generalizations. These soft constraints are violable in just those contexts in which they conflict with a higher ranked constraint. The claim, then, is that a code-switched constituent that violates a particular constraint has its wellformedness ‘reduced’ by a certain amount (cf. also Singh 1985).

      Treating constraints as violable (soft) rather than categorical captures the distributional generalizations of code-switching.

    11. Theory, universal, the effect of the presence or nonpresence of a constraint in code-switching is more a matter of its ranking relative to other constraints in a particular bilingual grammar. The cross-linguistic variation in code-switching arises from different constraint-ranking configurations opted for by different bilingual grammars. [Abbreviations: Infl—inflection: Spec specifier.]

      The effect of a constraint in code-switching depends on its ranking relative to other constraints in a given bilingual grammar.
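The idea of ranked, violable constraints in the passages above can be sketched in a few lines. This is a toy illustration, not Bhatt's analysis: the constraint names, candidate labels and violation counts are invented placeholders. Each candidate is scored by its violations read in ranking order, and the candidate with the lexicographically smallest profile wins, so a lower-ranked constraint is violated only when that avoids violating a higher-ranked one.

```python
from typing import Dict, List, Tuple

# Hypothetical violation counts for two competing code-switched candidates.
CANDIDATES: Dict[str, Dict[str, int]] = {
    "candidate_1": {"CONSTRAINT_X": 1, "CONSTRAINT_Y": 0},
    "candidate_2": {"CONSTRAINT_X": 0, "CONSTRAINT_Y": 1},
}

# Two hypothetical bilingual grammars that differ only in constraint ranking.
GRAMMAR_A: List[str] = ["CONSTRAINT_X", "CONSTRAINT_Y"]  # X outranks Y
GRAMMAR_B: List[str] = ["CONSTRAINT_Y", "CONSTRAINT_X"]  # Y outranks X


def violation_profile(violations: Dict[str, int], ranking: List[str]) -> Tuple[int, ...]:
    """Violation counts ordered from highest- to lowest-ranked constraint."""
    return tuple(violations.get(constraint, 0) for constraint in ranking)


def optimal_candidate(candidates: Dict[str, Dict[str, int]], ranking: List[str]) -> str:
    """Pick the candidate whose profile is lexicographically smallest."""
    return min(candidates, key=lambda name: violation_profile(candidates[name], ranking))


print(optimal_candidate(CANDIDATES, GRAMMAR_A))
print(optimal_candidate(CANDIDATES, GRAMMAR_B))
```

The same two candidates yield different winners under the two rankings, which is the sense in which cross-linguistic variation is attributed to ranking rather than to the presence or absence of a constraint.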

    12. Studies on structural constraints on code-switching have, over the years, attempted to systematize the linguistically significant generalizations of bilinguals' language use. Although these accounts have become increasingly theoretically sophisticated, insofar as they present optimism for speculations on the bilingual mind design, they are, unfortunately, based on the methodological premise that constraints are infallible.
    13. Although most of the constraints proposed were able to capture the descriptive generalizations of code-switching for specific language pairs, they invariably failed to generalize, because of their structural design, beyond the data-sets for which they were proposed (see Bokamba 1989, Clyne 1987).
    14. Some of the earliest attempts toward structuring code-switching resulted in descriptive generalizations, encoded as language-specific constraints.

      The earliest attempts to structure code-switching produced descriptive generalizations encoded as language-specific constraints.

    15. code-switching is automatic, and the fact that fluent bilinguals have fairly consistent judgments on the well-formedness of code-switched sentences (cf. Singh 1985).

      Code-switching comes naturally to fluent bilinguals.

    16. The research output in the 1980s in the area of structural constraints on code-switching has challenged the claim in the earlier sociolinguistic literature that code-switching is random (e.g., Labov 1971: 457, Lance 1975: 143), and instead put forward the view that code-switching is systematic and rule-governed.
    17. This entry focuses on research that deals with the structural design of code-switching, the knowledge and ability underlying bilinguals' use of two languages within a sentence. This ability known variously as ‘code-mixing’ (see Code-mixing), or ‘intra-sentential code-switching’
    18. One of the most pervasive phenomena of bilingual behavior is code-switching, the ability of bilinguals to switch back and forth between the languages they control,
    19. The observed syntactic differences among languages involved in code-switching will then turn out to be different constraint-ranking configurations opted for by different bilingual grammars.
    1. Claude Opus 4.7 feels like a real step up in intelligence. Code quality is noticeably improved, it's cutting out the meaningless wrapper functions and fallback scaffolding that used to pile up, and fixes its own code as it goes.

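      AI's progress in code quality and self-correction is impressive, especially its ability to eliminate meaningless wrapper functions and fallback scaffolding, which suggests that AI is moving from code generation toward genuine software development practice.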

    1. The future of AI-generated products isn't just code — it's code that looks good.

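      This view surprisingly redefines the value proposition of AI-generated products, shifting from pure code generation to visual consistency and brand compliance. It suggests that as AI tools evolve, the criteria for judging their success are moving from functionality toward aesthetics and brand consistency, reflecting the growing importance of design in AI product development.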

    2. Heavy users of Claude Code, Codex, Cursor, and Copilot will feel this immediately.

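      This insight hints at synergy between Figma for Agents and existing AI coding tools, suggesting that integrating design systems with code generation tools will significantly improve the coherence of the development workflow. It reflects the broader trend of AI bringing design and development together, and the importance of breaking down the barrier between design and code.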

    1. The model can reverse-engineer compiled software to detect malware and vulnerabilities without needing source code, aiming to help analysts inspect and secure systems more efficiently.

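      The ability to reverse-engineer compiled software and detect malicious code without source code demonstrates a breakthrough for AI in cybersecurity. This technology could fundamentally change how security analysts work, but it could also be misused, raising profound questions about AI safety and ethics.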

    1. In messy legacy repos, low confidence should be flagged early. Better to be transparent than open a bad pull request.

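      This statement shows Ovren's caution when facing complex legacy code. In AI coding, this is a surprisingly honest position: it acknowledges that AI may have limitations when handling undocumented legacy code and prioritizes code quality over blindly submitting changes, which reflects the product team's mature and responsible thinking about the technology.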

    2. Ovren puts AI frontend and backend engineers on it - they work inside your real codebase, execute scoped tasks, and deliver reviewable code updates.

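      This represents a surprising leap in AI engineering capability: from code suggester to actual executor. The shift means AI is no longer merely an assistive tool but can execute tasks directly in a real codebase and produce reviewable code updates, which may be the most disruptive direction for AI in software development.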

    1. M2.7 demonstrates excellent performance in real-world software engineering, including end-to-end project delivery, log analysis for bug hunting, code security, and machine learning tasks.

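      This statement implies that AI models have moved beyond simple code generation and can cover the full software development life cycle, which represents a major breakthrough for AI in engineering and may redefine the future of software development.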

    2. M2.7 demonstrates excellent performance in real-world software engineering, including end-to-end project delivery, log analysis for bug hunting, code security, and machine learning tasks.

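      Surprisingly: MiniMax M2.7 can handle not only routine programming tasks but also complex software engineering tasks such as end-to-end project delivery, log analysis and code security checks. This suggests AI can already handle the complete software development process, from coding to security auditing, breaking the ingrained perception that AI can only assist with programming.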

    1. humans became the bottleneck, and how Ryan's team shifted from reviewing code directly to building systems, observability, and context that let agents review, fix, and merge work autonomously

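      This insight reveals a key shift in AI development: humans are no longer code producers but system architects and observers, which redefines how value is created in software engineering.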

    2. building and shipping an internal beta product with zero manually written code

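      This striking experiment suggests that OpenAI has been able to fully automate the software development process, from writing code to shipping the product. It challenges basic assumptions of traditional software engineering and hints that human programmers may be becoming marginalized.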

    3. Over the past five months, they ran an extreme experiment: building and shipping an internal beta product with zero manually written code.

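      Surprisingly: an OpenAI team relied entirely on AI to generate more than one million lines of code in five months, with no manually written or reviewed code. This extreme experiment shows AI's striking capability in software development and upends the traditional software engineering model.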

    1. MiniMax handed an internal version of M2.7 a programming scaffold and let it run unsupervised. Over 100 rounds it analyzed its own failures, modified its own code, ran evaluations, and decided what to keep and what to revert.

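      This is a striking self-evolving system: the AI model can autonomously analyze failures, modify code and evaluate results, achieving a 30% performance improvement without human intervention. This self-iterating pattern represents a major shift in the AI development paradigm and suggests that future AI may be able to optimize and improve its own architecture autonomously, reducing dependence on human experts.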

    2. MiniMax handed an internal version of M2.7 a programming scaffold and let it run unsupervised. Over 100 rounds it analyzed its own failures, modified its own code, ran evaluations, and decided what to keep and what to revert.

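      Surprisingly: the AI model can modify code and optimize itself autonomously, which represents a major breakthrough in AI autonomy. M2.7 can not only analyze its own failures but also decide on its own which code changes to keep and which to revert. This capacity for self-evolution breaks with the traditional AI development model and shows the potential for AI systems to improve themselves.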

    1. Sage sends URLs and package hashes to Gen Digital reputation APIs. File content, commands, and source code stay local.

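      This privacy statement reveals Sage's data-handling strategy, which follows a design philosophy of minimizing data transmission. This way of balancing security and privacy is insightful: it shows the developers understand users' concerns about data leaks while recognizing that some cloud-side analysis is necessary for effective threat detection.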

    2. Sage intercepts tool calls (Bash commands, URL fetches, file writes) via hook systems in Claude Code, Cursor / VS Code, OpenClaw, and OpenCode, and checks them against:

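      This statement reveals Sage's core innovation: it intercepts and inspects AI agents' tool calls through the hook systems of multiple platforms, forming a cross-platform protective layer. This multi-platform integration is impressive and shows that it covers today's mainstream AI development environments, giving users unified security protection.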

    3. Sage sends URLs and package hashes to Gen Digital reputation APIs. File content, commands, and source code stay local.

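      Surprisingly: Sage takes an approach that balances privacy and security, sending only URLs and package hashes to the cloud for reputation checks while file contents, commands and source code stay local. This design provides real-time threat detection while protecting users' sensitive data, reflecting the emphasis modern security tools place on privacy.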

    4. Sage intercepts tool calls (Bash commands, URL fetches, file writes) via hook systems in Claude Code, Cursor / VS Code, OpenClaw, and OpenCode, and checks them against:

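      Surprisingly: Sage is not just a simple security tool but a complex interception system that can monitor and inspect tool calls across multiple AI agent platforms. This cross-platform integration shows the complexity and innovation of the AI security field; users may not realize their AI agents are being monitored and protected so comprehensively.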

    1. Gemini Robotics-ER 1.6 achieves its highly accurate instrument readings by using agentic vision, which combines visual reasoning with code execution. The model takes intermediate steps: first zooming into an image to get a better read of small details in a gauge, then using pointing and code execution to estimate proportions and intervals and get an accurate reading.

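      This description reveals how AI solves complex problems through multi-step reasoning and shows the model's innovative approach to fine-grained visual tasks. The ability to combine visual reasoning with code execution represents a move by AI systems toward something closer to human cognition, and this hybrid approach may become the standard paradigm for AI solving complex physical tasks.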

    1. Your AI agent writes every change into source code.

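      This feature hints at a new development paradigm in which a designer's visual edits are converted directly into production-grade code. It could significantly reduce manual coding in front-end development, but it also raises important questions about the quality and maintainability of AI-generated code.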

    2. Design by hand. Code by agent.

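      This statement represents a revolutionary change in the design workflow, seamlessly combining human creativity with AI execution. This model could redefine how designers and developers collaborate, letting designers focus on creative decisions while handing code implementation to AI agents.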

    1. Routines run autonomously as full Claude Code cloud sessions: there is no permission-mode picker and no approval prompts during a run.

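      This is a surprising statement about autonomy: Routines can execute complete workflows without human intervention. This high degree of autonomy is an important milestone for AI automation tools, but it also raises serious questions about safety and control, especially in enterprise environments.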

    2. A routine is a saved Claude Code configuration: a prompt, one or more repositories, and a set of connectors, packaged once and run automatically.

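      This definition reveals the core innovation of Routines: it packages Claude Code's capabilities into reusable automation units that combine a prompt, repositories and external connectors. This packaging is an important advance for AI-assisted development, allowing AI capabilities to be integrated systematically into workflows.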

    3. Routines are in research preview. Behavior, limits, and the API surface may change.

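      This is a surprising statement: Claude Code's Routines feature is still at the research stage, meaning users may encounter instability and API changes. It suggests Anthropic is iterating quickly on the feature, but it also reminds users not to rely on it too heavily in production.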

    4. A routine is a saved Claude Code configuration: a prompt, one or more repositories, and a set of connectors, packaged once and run automatically.

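      Surprisingly: Routines are in effect preconfigured Claude Code sessions that bundle a prompt, repositories and connectors and can run automatically. This design lets complex automation tasks be packaged and reused without reconfiguring the environment each time.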

    5. Routines are in research preview. Behavior, limits, and the API surface may change.

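      Surprisingly: Claude Code's Routines feature is currently still in research preview, meaning the functionality users rely on may change significantly in the future. This status indicates Anthropic is still testing and refining this workflow automation feature, and users should expect possible instability and API changes.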

    1. The standard autoresearch loop (brainstorm from code, run experiments, check metrics) works when the optimization surface is visible in the source. The Liquid results prove that. But for problems where the codebase doesn't contain enough information to generate good hypotheses, giving the agent access to papers and competing implementations changes what it tries.

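      This statement clearly distinguishes two optimization scenarios: optimization that is visible in the code and optimization that requires external knowledge. It reveals a key insight for AI agent development: the optimization method must be adapted to the nature of the problem. For some problems, simple code analysis is enough; for more complex ones, external knowledge and research must be brought in. This finding offers important guidance for designing AI-assisted programming systems.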

    2. Without experience with compiler behavior, the agent couldn't have predicted which 'optimizations' the compiler would already handle.

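      This observation reveals a limitation of AI agents in compiler optimization: the agent cannot accurately predict which optimizations the compiler already performs automatically. It suggests AI agents need a deeper understanding of compiler behavior and modern compilation techniques to avoid futile optimization attempts. The finding has important implications for AI-assisted programming systems and underscores the importance of integrating domain knowledge.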

    3. Studying forks and other backends was more productive than searching arxiv. ik_llama.cpp and the CUDA backend directly informed two of the five final optimizations.

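      This is a surprising finding: code implementations in practice guide optimization work more directly than academic papers. The agent gained more valuable insights by studying actual project forks and other backend implementations than by relying on theoretical research. It underlines that, in AI agent development, practical experience and existing implementations may matter more than the theoretical literature.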

    4. Coding agents working from code alone generate shallow hypotheses. Adding a research phase — arxiv papers, competing forks, other backends — produced 5 kernel fusions that made llama.cpp CPU inference 15% faster.

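      This statement reveals a key limitation of AI agents in code optimization: optimization based on code alone produces shallow hypotheses. By adding a research phase, including reading academic papers and studying competing projects and backend implementations, the agent was able to find deeper optimization opportunities and achieve a significant performance gain. It shows that AI agents need broader context to make meaningful innovations.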

    1. In many ways, coding represents the ideal use case for AI, both in terms of what the technology can do and how readily the enterprise market will embrace it. Code is data dense, meaning there is a massive amount of high-quality code available online for the models to train on.

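      Programming is seen as the ideal application scenario for AI, which reveals the key ingredients for successful AI adoption: availability of high-quality training data, degree of task structure, and verifiability of output. This insight not only explains why coding assistants were the first to break through but also offers a success pattern for AI applications in other fields, suggesting AI may achieve similar success in other data-rich, highly structured domains.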

    1. We are building a world where machines write the code, machines choose the dependencies, and machines ship the updates. The AI agents are building the software. If we don't secure the supply chain they rely on, the AI agents are cooked.

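      This sentence reveals the fundamental challenge of software security in the AI era: when AI systems autonomously write, select and deploy code, their security is directly tied to the security of the supply chain they depend on. If we cannot protect that supply chain, AI systems themselves become carriers of malware, which is a thought-provoking paradox.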

    2. Within eight days, the same campaign had cascaded from GitHub Actions to Docker Hub, npm, PyPI, and the VS Code extension marketplace. With just one token across five ecosystems, thousands of organizations were potentially impacted.

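      The speed and scope of this cross-ecosystem attack are frightening and show the fragility of the modern software supply chain. A single stolen credential can spread rapidly across multiple ecosystems, and this cascading effect makes defense extremely difficult.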

    1. Reviewer #1 (Public review):

      Summary:

      This study identifies a conserved phosphorylation event on Hsp70, at human T495, that is triggered by DNA damage. The authors show that this modification arises in response to MMS and is temporally associated with cell cycle progression through mitosis. Using biochemical analysis, they further argue that the phosphomimetic Hsc70(T495E) adopts an open-like conformation with impaired J protein-stimulated ATP hydrolysis while still retaining client binding. In yeast, both phosphomimetic and phosphonull mutants perturb growth and cell cycle progression, supporting the idea that dynamic regulation of this site helps coordinate DNA damage responses with G1/S control.

      Strengths:

      A major strength of the paper is that it links prior work on Legionella-mediated Hsp70 phosphorylation to a normal cellular DNA damage response. The study is also commendably multi-level, combining mammalian cell biology, in vitro biochemistry, and yeast genetics to support the central model. Together, the authors provide a coherent story that this Hsp70 site has functional importance in checkpoint-like control rather than being a passive phosphosite, adding to our understanding of the chaperone code.

      Minor Weaknesses:

      The authors acknowledge that the direct kinases/phosphatases for this site remain unknown. Some conclusions are therefore still somewhat inferential, especially the model that pHsp70 acts as a reversible molecular brake on S-phase entry. These limitations do not undermine the importance of these exciting findings, but they do leave the paper somewhat short of a fully resolved mechanism.

      Comments on revisions:

      The authors have done a great job in addressing all the previous reviewer concerns. They have provided additional data and refined the text, stating limitations of their proposed model. In doing so, they have produced a much-improved version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public review:

      Reviewer #1 (Public review):

      Summary:

      Badarnee and colleagues analyse fMRI data collected during an associative threat-learning task. They find evidence for parallel processes mediated by the mediodorsal, LGN, and pulvinar nuclei of the thalamus. The evidence for these conclusions is promising, but limited by a lack of clarity regarding the preprocessing and statistical methods.

      Strengths:

      The approach is inventive and novel, providing information about thalamocortical interactions that are scant in the current literature.

      Weaknesses:

      (1) There are not sufficient details present to allow for the direct interrogation of the methods used in the study.

      We thank the reviewer for this comment. We have added more detailed information about the methods to clarify our procedure. In addition to the original description of our threat learning paradigm in humans, we added the following to pages 39-40:

      “Experimental procedure

      Threat learning: Please see the original description in the manuscript.

      Shock level: The shock intensity used in the fear learning paradigm was determined during a pre-experiment calibration. Electrodes were attached to the participant’s right hand, and stimulation began at a low level (0.1 mA), gradually increasing in small increments. After each increment, participants verbally rated their discomfort. The procedure continued until the participant identified a level they described as “highly annoying but not painful.” This individualized intensity was then used for that participant throughout the experiment. For safety and ethical reasons, the maximum intensity was capped at 20 mA, and no participant received a shock above this limit.

      Instructions to the participants: Each visual stimulus in our paradigm was first shown to participants for 6 seconds. This initial presentation served as habituation, allowing us to isolate the responses to genuinely new stimuli. Before the experiment began, participants were informed that they would see pictures illuminated with different colored lights, such as red or blue. During the experiment, some pictures might be paired with an electric shock, while others might not. Participants were instructed to pay attention to whether a specific color or pattern was associated with the shock. These instructions were adopted from previous studies in which our group developed this paradigm and found them highly effective for human learning. We therefore used the same approach in the current experiment. These instructions were provided throughout all phases of threat learning, and participants were informed that any shocks delivered would be at the same intensity determined on Day 1.”

      (2) The figures do not contain sufficiently granular details, making it challenging to determine whether the observed effects were robust to individual differences.

      We thank the reviewer for this suggestion. We agree that visualizations exposing the full data distribution can be highly informative, and we therefore present distribution-based plots for several analyses (e.g., connectivity results in Figure 7). However, for the activation analyses, our primary goal was to highlight trial-to-trial changes and overall patterns across thalamic nuclei, rather than the distribution of individual data points per se. For this purpose, bar plots with standard errors provide a clearer representation of the directional effects and facilitate comparison across trials and conditions.

      Reviewer #2 (Public review):

      Summary:

      The authors quantify human fMRI BOLD responses in pulvinar and mediodorsal thalamic nuclei during a fear conditioning and extinction task across two days, in a large sample size (hundreds of participants). They show that the BOLD responses in these areas differentiate the conditioned (CS+) and safety (CS-) stimuli. Additionally, this changes with repeated trials, which could be a neural correlate of fear learning. They show that the anterior pulvinar is most correlated with the MD, and that this is not due to anatomical proximity. They perform graph analysis on the pulvinar subnuclei, which suggests that the medial pulvinar is a hub between the sensory (lateral/inferior) and associative (anterior) pulvinar. They show different patterns of thalamic activity across conditioning, extinction, recall, and renewal.

      Strengths:

      The data has a large sample size (n=293 in some measures, n=412 in others). This is a validated human fear conditioning/extinction task that Dr Milad's group has been working with for several years. Few labs have investigated the thalamus activity during fear conditioning and extinction, particularly with a large sample size. There is an independent replication of the pulvinar network structure (Figure 3), which suggests that the processing in the more sensory-related inferior and lateral pulvinar is relayed to the anterior pulvinar (and possibly thereby to more action-related prefrontal areas) via an intermediate step in the medial pulvinar - potentially a novel discovery, but that needs more validation.

      Weaknesses:

      (1) The authors cannot make causal claims about their results based on correlational neuroimaging evidence. Causal claims should be pared back. E.g., sentence 1 in the Results section: "The anterior pulvinar and MD contribute to early associative threat learning, as evidenced by increased functional activation in response to CS+ compared to CS- at the block level (Fig. 1b-c)." needs to be reworded to something like "The anterior pulvinar and MD have increased functional activation... This suggests that these areas may contribute to early associative threat learning."

      We acknowledge the limitations of fMRI studies and agree with the reviewer that causal claims cannot be made based on correlational neuroimaging evidence. Accordingly, we revised the text to reduce causal interpretations. Specifically, we reworded the sentence identified by the reviewer in the Results section and systematically updated language throughout the manuscript.

      Page 9: “At the block level, both the anterior pulvinar and MD showed increased activation to CS+ vs. CS− (anterior pulvinar: t<sub>(292)</sub> = 4.41, p = 0.00001, d = 0.25; MD: t<sub>(292)</sub> = 6.41, p = 5.83x10<sup>-10</sup>, d = 0.37; Fig. 1b–c), suggesting a possible involvement of these regions in early associative threat learning.”

      Throughout the manuscript, we replaced terms such as “reflects” with “likely reflects” and “indicating” with “consistent with,” and introduced explicitly correlational phrasing where appropriate (e.g., “apparently,” “closely align,” and “seems to”). All revisions are highlighted in green in the revised manuscript.

      (2) Figure 1: The fact that the difference in BOLD activity between CS+ and CS- goes away on the third trial is not addressed. This is a very large effect in the data.

      We thank the reviewer for highlighting this important pattern in Trial 3. The CS+ vs. CS− contrast in the third trial in the mediodorsal thalamus remained statistically significant after FDR correction and was correctly reported in the Supplementary Tables. However, we acknowledge that the statistical marker was inadvertently omitted from Figure 1. We have now corrected the figure to include the appropriate significance annotation.

      In addition, we now explicitly describe the attenuation of the CS+ vs. CS− difference by the third trial in the pulvinar but not in the mediodorsal thalamus (page 32):

      “This suggested rapid initial acquisition of the predictive value of the CS+ is thought to be pronounced during the first two trials. The attenuated CS+ vs. CS− differentiation on the third trial specifically in the pulvinar may reflect a decreased requirement for differential thalamic engagement once the initial association has been acquired, or an initial survival fear reaction is expressed. Notably, the MD sustained the BOLD response to the CS+ in the third trial, which may indicate involvement of this nucleus in the consolidation or stabilization of the learned association. This aligns with the well-established MD-PFC circuit involved in cognitive processes (Wolff and Halassa, 2024). Additionally, in a previous study using a similar paradigm, we observed sustained CS+ vs. CS− differentiation on the third trial in the nucleus reuniens, as well (Tuna et al., 2025). These findings suggest that trial-dependent learning dynamics may vary across thalamic nuclei rather than reflecting a uniform thalamic learning signal. Together, while our paradigm does not inherently distinguish between different stages of learning, such as early acquisition and stabilization, our findings are consistent with stronger associative learning–related engagement during the first two trials, with a reduced differential response by the third trial that may reflect the involvement of different neural processes”.

      (3) Figure 3: Could the observed network structure be due to anatomical proximity? Perhaps the authors should do an analogous analysis to what they did in Figure 2 for this intra-pulvinar analysis. This analysis doesn't take into account the indirect connections through corticothalamic and thalamocortical connections with the visual cortex and the pulvinar. There is an implicit assumption that there are interconnections between the pulvinar subnuclei, but there are few strong excitatory projections between these subnuclei to my knowledge. If visual areas are included in the graph, it would make things more complex, but would probably dramatically change the story. In this way, the message is somewhat constructed or arbitrary.

      We thank the reviewer for this insightful comment. We agree that the network analysis in Figure 3 does not provide a direct anatomical account of pulvinar connectivity and cannot distinguish between direct inter-nuclear interactions and indirect coupling mediated via corticothalamic and thalamocortical pathways, including visual cortex.

      Our intention with this analysis was to characterize functional statistical dependencies among pulvinar divisions during conditioning, rather than to infer monosynaptic anatomical connectivity. Accordingly, the observed network structure should not be interpreted as evidence for direct excitatory projections between pulvinar subnuclei.

      We agree that including visual cortical regions in the network would substantially increase model complexity and could alter the inferred network structure. However, doing so would require a trial-wise, multiregional modeling framework that goes beyond the scope of the present intra-pulvinar analysis.

      In response to this comment, we have now explicitly clarified the assumptions, interpretational limits, and alternative explanations of the network model in the Discussion (page 33):

      “Yet, these intrapulvinar relationships should be understood as a functional and computational model, reflecting statistical dependencies among pulvinar divisions during threat learning, rather than as evidence of direct monosynaptic anatomical connections. Because detailed inter-nuclear anatomical connectivity within the pulvinar remains incompletely characterized, our analysis does not presuppose strong direct excitatory projections between subnuclei. Instead, our findings are intended to highlight candidate functional relationships within the pulvinar during conditioning with different levels of data processing, rather than to provide a definitive anatomical map.”

      We also included the following in the Limitations and Future Directions section (page 36):

      “The observed relationships among pulvinar divisions during conditioning are purely functional and do not distinguish direct inter-nuclear interactions from indirect coupling mediated by corticothalamic and thalamocortical pathways, including visual cortical regions. Thus, the pulvinar model may reflect indirect cortical loops, weak or currently undocumented inter-nuclear interactions, or a combination of both.”

      Finally, we added this note to the legend of Fig. 3:

      “Note: The functional relationships among pulvinar divisions during threat learning should be interpreted as computational dependencies derived from statistical associations. These effects may reflect indirect interactions mediated by corticothalamic and thalamocortical pathways (e.g., via visual cortex), rather than direct inter-nuclear connectivity. Elucidating the underlying anatomical mechanisms will require future studies.”

      (3) In the results section describing Figures 4-7, there are no statistics supporting the claims made. There needs to be a set of graphs comparing the results across the study sessions and days, with statistical comparisons between the different experiments to confirm differences.

      We thank the reviewer for this suggestion. In this study, each phase (conditioning, extinction, recall, and renewal) was analyzed separately to characterize thalamic function within that specific phase. Our primary conclusions focus on differences between CS+ and CS− within each phase, rather than comparisons across phases or sessions. Direct statistical comparisons across phases were therefore not performed, as they fall outside the scope of our main hypotheses.

      We have clarified this in the revised manuscript to make the rationale for our analytic approach explicit. Added to page 8:

      “The purpose of this study is to investigate thalamic function during each learning phase separately, focusing on CS+ vs. CS− differences within phases rather than comparing activation across phases. This phase-specific approach allows us to characterize thalamic functional dynamics within each stage of learning and memory, avoiding potential confounds arising from the distinct processes of conditioning, extinction, and recall.”

      (4) Figure 7 does not include the major corticothalamic and thalamocortical projections from early, mid-level, and higher visual cortex to the different pulvinar nuclei. I doubt that there are strong direct projections between the pulvinar nuclei; rather, the functional connections are probably mediated through interconnections with cortical visual areas.

      We thank the reviewer for this point. Reciprocal connections between the visual cortex and the pulvinar are established, but the precise projections to specific pulvinar divisions remain unknown. We have added a note to the Figure 8a caption to clarify this (Figure 7a in the original version).

      “Note (panel a): Known pulvinar–cortical connections, as well as sensory input pathways (e.g., visual inputs via the retina/LGN and nociceptive inputs via the spinothalamic tract), are not explicitly shown. These connections are well established anatomically but were omitted due to their heterogeneity and incomplete characterization at the level of pulvinar subnuclei. Their absence should not be interpreted as a lack of anatomical or functional relevance.”

      (5) Stylistic: There are a lot of hypotheses and interpretations presented in this primary literature paper, which may be better suited for a review or perspective piece.

      We thank the reviewer for this comment. We aimed to integrate our empirical findings within a broader conceptual framework to provide a complementary narrative, rather than presenting isolated observations without connecting them to theoretical context. This approach is intended to strengthen the interpretive value of the study while remaining grounded in primary data.

      (6) In the discussion, there is an assumption that the fMRI BOLD responses to CS+ and CS- need to be different to indicate that an area is processing these distinctly, but the BOLD signal can only detect large-scale changes in overall activity. It's easy to imagine that an area could be involved in processing these two stimuli distinctly without showing an overall difference in the gross amount of activity.

      We thank the reviewer for raising this important point. We fully agree that the fMRI BOLD signal reflects large-scale changes in population activity and may fail to capture more subtle or distributed neural representations. Accordingly, the absence of a CS+ vs. CS− BOLD difference should not be interpreted as evidence that a region is not involved in discriminating these stimuli. Rather, our inferences are limited to differences in aggregate activation at the spatial and temporal resolution of fMRI.

      To partially address this limitation, we analyzed anatomically defined thalamic subregions; however, we acknowledge that finer-scale subdivisions and cell-type-specific processing likely exist that are not currently resolvable in human fMRI. Such distinctions may be better investigated using invasive recordings or circuit-level approaches in rodents or non-human primates. This limitation has now been explicitly acknowledged in the Limitations section of the manuscript (page 36):

      “Pulvinar divisions, MD, and LGN each contain diverse neuron subtypes and finer anatomical subdivisions that may serve distinct functions. Importantly, the absence of CS+ vs. CS− differences in BOLD activity should not be interpreted as a lack of stimulus-specific processing, as such distinctions may occur without changes in overall activation detectable by fMRI…”

      (7) There is strong evidence that the BOLD responses to the threat-related and safety-related stimuli are different, modest evidence for their claims of learning/plasticity in these pathways, and circumstantial evidence supporting their hypothesized graph network models. Overall, most of the claims made in the discussion are better considered possible interpretations rather than proven findings - this is not a criticism, as these experiments and subject matter are extremely complex.

      We thank the reviewer for this constructive suggestion. In response, we have revised the discussion to present our interpretations as possible or plausible explanations, rather than definitive conclusions, to better reflect the strength of the current evidence. The changes are marked in green throughout the Discussion section.

      This study continues to validate the power and utility of this human fear conditioning/extinction paradigm, and extends this paradigm to investigating fear learning beyond the traditional limbic system pathways. It's possible that their models for the pulvinar nuclei interconnections could guide future neuromodulation or DBS studies that could provide more causal evidence for their hypotheses.

      Reviewer #3 (Public review):

      Summary:

      The present work was aimed at investigating the specific contributions of thalamic nuclei to associative threat learning and extinction. Using fMRI, they examined activation patterns across pulvinar divisions, the lateral geniculate nucleus (LGN), and the mediodorsal thalamus (MD) during threat acquisition, extinction, and recall. Their goal was to uncover whether distinct thalamic systems support different modes of learning - automatic survival mechanisms versus more deliberate processes - and to propose a hierarchical pulvinar model of fear conditioning. They also try to refine current neuroanatomical models of threat learning and memory, highlighting the role of thalamic nuclei in it.

      Strengths:

      (1) Valuable theoretical elaboration and modeling regarding the differential role of pulvinar subdivisions on feedforward (inferior, lateral) and higher-order integration (anterior), and their functional interplay with other relevant subcortical and cortical structures in associative threat and extinction learning.

      (2) Large sample sizes and multipronged analytical approaches were used for hypothesis testing.

      (3) Exhaustive literature review in the field of associative threat, as well as regarding the role of thalamic nuclei and other brain structures in it.

      Weaknesses:

      (1) Several weaknesses should be pointed out regarding how fMRI data were collected, as well as decisions regarding how the fMRI data were preprocessed and analyzed:

      (a) fMRI data have low resolution (3 cubic mm), which certainly limits the examination of small nuclei such as the ones investigated here, and especially the examination of the LGN and inferior pulvinar.

      We thank the reviewer for raising this point. While the spatial resolution of fMRI (3 mm isotropic) does limit voxel-wise examination of very small nuclei, our analyses were not performed at the single-voxel level. Instead, signals were extracted using anatomically defined masks for each thalamic nucleus, which is a standard and widely used approach for studying small subcortical structures with fMRI. This strategy increases signal-to-noise ratio and mitigates partial-volume effects by aggregating activity across voxels belonging to the same anatomical region.

      (b) fMRI was normalized to standard space. Analyzing the data in individual-subject space would have given you the options of avoiding altering every participant's brain and of using a probabilistic thalamic atlas that better adapts to each subject's brain and thalamic nuclei (see, for instance, Iglesias et al., 2018). This would have been ideal and would have given the authors more precision, especially considering the low resolution of the fMRI data and the size of the thalamic nuclei of interest.

      We thank the reviewer for pointing out the availability of specialized thalamic atlases. In our study we used the Automated Anatomical Labelling Atlas 3 (AAL3 atlas), which includes thalamic subdivisions (including pulvinar and other nuclei) among its 150+ whole-brain regions and is widely used for ROI extraction in normalized fMRI analyses. This choice allowed us to define consistent ROIs across the entire brain such as the amygdala and hippocampus within the same parcellation framework and to extract functional signals at the resolution of our preprocessed fMRI data.

      While histology-informed probabilistic atlases offer finer microanatomical segmentation of the thalamus, they are implemented primarily for structural segmentation pipelines (e.g., FreeSurfer) and do not change the fact that AAL3’s thalamic subdivisions are established and anatomically reasonable ROIs for functional studies at standard fMRI resolutions. AAL3 thus provides a practical and valid choice for our whole-brain activation and connectivity analyses.

      (c) On top of the two previous points, the authors decided to smooth the data to 6mm, which means that every single voxel within these small nuclei was blurred/mixed with the 2 immediately contiguous voxels (if they followed the standard SPM12 normalization resampling default which resamples, or upsamples the data in this case, to 2 x 2 x 2mm). Given the strong changes in structural connectivity and function that can occur, especially in the thalamus, on voxels of this size, this and the previous 2 decisions do not favor anatomical precision.

      We thank the reviewer for raising this concern regarding anatomical precision. The data were resampled to 2 × 2 × 2 mm resolution in SPM12, and a 6 mm FWHM Gaussian smoothing kernel was applied. Gaussian smoothing does not uniformly mix immediately adjacent voxels; rather, it applies distance-weighted averaging with a standard deviation of approximately 2.55 mm (FWHM = 2.355σ). At 2 mm resolution, this corresponds to ~1.3 voxels, meaning that signal contribution decreases smoothly with spatial distance rather than reflecting simple voxel averaging. Moreover, all statistical analyses were conducted at the ROI level using anatomically defined masks, rather than voxel-wise inference within nuclei.
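
The kernel arithmetic in this reply can be checked with a short sketch. It only restates the FWHM-to-sigma conversion quoted above; the function names are illustrative and are not taken from SPM12 or any other toolbox.

```python
import math

def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert a Gaussian kernel's FWHM to its standard deviation (FWHM = 2*sqrt(2*ln 2)*sigma)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_weight(distance_mm: float, sigma_mm: float) -> float:
    """Unnormalized Gaussian weight at a given distance from the kernel center."""
    return math.exp(-(distance_mm ** 2) / (2.0 * sigma_mm ** 2))

sigma_mm = fwhm_to_sigma(6.0)   # 6 mm FWHM kernel, as stated in the reply
sigma_vox = sigma_mm / 2.0      # expressed in 2 mm voxels

print(f"sigma = {sigma_mm:.2f} mm = {sigma_vox:.2f} voxels")
# Relative weight of voxels 1, 2 and 3 voxel widths from the center.
for distance_mm in (2.0, 4.0, 6.0):
    print(distance_mm, round(gaussian_weight(distance_mm, sigma_mm), 3))
```

The weights fall off smoothly with distance, which is the point made in the reply: neighboring voxels contribute in proportion to their distance rather than being averaged equally.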

      To empirically assess whether smoothing may have introduced boundary-driven spillover effects, we divided the mediodorsal (MD) thalamus into medial and lateral divisions and examined the CS effect separately in each. The CS effect did not differ between subdivisions (MD subdivision × CS interaction: F<sub>(1, 292)</sub> = 0.50, p = 0.48).

      Additionally, across trials, the CS+ vs. CS− effect was observed in both subdivisions and showed comparable magnitudes (see Author response image 1). The effect sizes were also comparable across MD divisions, as presented in Author response table 1.

      Author response image 1.

      Mean activation in MD subdivisions during threat learning

      Author response table 1.

      Point estimates and 95% confidence intervals of effect sizes (Cohen’s d) for CS+ vs. CS− contrasts in MD, MDm, and MDl During Early Threat Learning

      If smoothing had artificially driven the MD effect via boundary spillover, one would expect consistent asymmetry or substantially larger effects in one subdivision relative to the other. Instead, the CS effect was distributed across both medial and lateral MD, supporting the interpretation that the observed activation reflects intrinsic MD signal rather than smoothing-related contamination.

      (d) Motion during scanning was poorly controlled in the preprocessing. Including the motion parameters as covariates of no interest in the GLM does not fully guarantee that motion is not influencing the results, and that motion is not differentially influencing some experimental conditions more than others.

      Our analyses are within-subject, so each participant serves as their own control, minimizing the impact of motion differences across conditions. Functional data were preprocessed with fMRIPrep 20.0.2, which estimates motion parameters. These motion estimates were included as nuisance regressors in the SPM12 GLM to account for residual motion-related variance. The connectivity analyses were conducted in CONN, which also includes these motion parameters as regressors and applies additional denoising steps to further reduce motion-related effects. Together, these procedures make it highly unlikely that motion systematically influenced the observed condition differences.

      (2) It is not clearly indicated in the manuscript how many subjects and how many trials went into each of the analyses. It would be important to indicate this in the text and/or the figures.

      We thank the reviewer for this important comment. We have now explicitly reported the number of participants and trials contributing to each analysis throughout the manuscript, including the main text, figure captions, and supplementary materials.

      Specifically, under Materials and Methods (page 38), we now clarify the sample sizes for each learning phase:

      “We analyzed fMRI data from 293 participants during fear conditioning, 320 during extinction, 412 during extinction recall, and 312 during threat renewal.”

      In addition, all figure captions now report the corresponding sample sizes and trial numbers. For example, the caption to Figure 1 (pages 7–8) states:

      “…Block-level comparisons were assessed using paired t-tests, while trial-level effects were examined using a 2 × 2 repeated-measures ANOVA, followed by post hoc comparisons between CS+ and CS− across four trials. Multiple comparisons were controlled using false discovery rate (FDR) correction. Conditioning sample size: n = 293. Detailed statistical parameters are provided in Supplementary Tables 1–2.”
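
For readers unfamiliar with FDR correction, the following is a minimal sketch of one common procedure, Benjamini-Hochberg. This is an assumption: the caption does not say which FDR method was used, and the p-values below are made-up illustrative numbers, not results from the study.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

# Hypothetical raw p-values for four trial-level CS+ vs. CS- comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
print(adjusted_p)
```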

      (3) It is not clear either, why, given the large sample size, some of the results were not conducted using reproducibility strategies such as dividing the sample into 2 or 3 groups or using further cross-validation strategies.

      Cross-validation strategies were applied to the mediation analyses, which are regression-based and can be sensitive to extreme values or overfitting, ensuring that observed effects generalize beyond the sample. In contrast, the repeated-measures ANOVA tests within-subject condition differences and is inherently robust to between-subject variability. For these inferential tests, cross-validation or sample-splitting is not typically applied.

      However, following the reviewer’s recommendation, we conducted a cross-validation analysis focusing on the anterior pulvinar and the mediodorsal thalamus, the primary regions of interest in this study. The full sample (N = 293) was randomly divided into three subsamples (n<sub>1</sub> = 106, n<sub>2</sub> = 91, n<sub>3</sub> = 96). For each iteration, we conducted a repeated-measures ANOVA (RM-ANOVA) within one subsample and then examined the stability of the CS+ vs. CS− difference in the remaining two subsamples combined. The CS+ vs. CS− difference was statistically significant in most folds for both the mediodorsal thalamus and the anterior pulvinar. Importantly, effect sizes were comparable across folds within each nucleus, indicating stable estimates of the CS effect.

      Finally, we observed a comparable pattern of CS+ vs. CS− differences at the trial level in both the mediodorsal thalamus and the anterior pulvinar. Critically, the effect sizes of these differences were stable across most cross-validation folds.
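
The split-sample logic described above can be sketched as follows. This is a simplified illustration, not the analysis that was run: it swaps the RM-ANOVA for a paired-difference effect size (Cohen's d of the CS+ minus CS− difference), uses simulated data, and all function and variable names are hypothetical. Only the fold sizes (106, 91, 96) come from the reply.

```python
import numpy as np

def cohens_dz(differences):
    """Standardized mean of paired CS+ minus CS- differences."""
    return float(np.mean(differences) / np.std(differences, ddof=1))

def split_sample_effects(differences, fold_sizes, seed=0):
    """Return (d within one fold, d in the remaining folds combined) for each fold."""
    differences = np.asarray(differences, dtype=float)
    if sum(fold_sizes) != len(differences):
        raise ValueError("fold sizes must sum to the sample size")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(differences))
    folds = np.split(order, np.cumsum(fold_sizes)[:-1])
    results = []
    for k, fold in enumerate(folds):
        rest = np.concatenate([f for j, f in enumerate(folds) if j != k])
        results.append((cohens_dz(differences[fold]), cohens_dz(differences[rest])))
    return results

# Simulated per-participant CS+ minus CS- activation differences (not real data).
rng = np.random.default_rng(1)
simulated = rng.normal(loc=0.3, scale=1.0, size=293)
effects = split_sample_effects(simulated, fold_sizes=(106, 91, 96))
for fold_d, rest_d in effects:
    print(f"fold d = {fold_d:.2f}, remaining-sample d = {rest_d:.2f}")
```

Comparable values within each pair would correspond to what the reply calls stable estimates of the CS effect across folds.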

      (4) Limited testing of alternative hypotheses. The results clearly seem to be a selection of the findings supporting the hypotheses that the authors sought to confirm. (just one example: in the analysis reported in Figures 1-2; are there other correlations between the activation of the anterior pulvinar and MD with other pulvinar nuclei? only the MD-anterior pulvinar correlation is reported).

      We thank the reviewer for raising this important point. We would like to clarify that the analyses were not limited to a single, selectively reported association. The relationship between the MD and the anterior pulvinar was evaluated while explicitly accounting for other pulvinar subdivisions, as well as for thalamic input outside the pulvinar.

      Specifically, potential contributions from other pulvinar nuclei were controlled by including them in the regression model (Fig. 2 in the manuscript), and the LGN was included as an additional control region. These analyses therefore test whether the MD–anterior pulvinar association is specific, rather than reflecting a more general thalamic or pulvinar-wide effect. With respect to hypothesis testing, the study was explicitly hypothesis-driven, grounded in functional evidence motivating a specific prediction about MD–anterior pulvinar interactions.

      Still, in response to the reviewer’s suggestion, we further examined pairwise relationships among thalamic subregions. Specifically, we assessed the association between the MD and each pulvinar subdivision using partial correlations, controlling for the remaining pulvinar subdivisions in each analysis. For example, the partial correlation between the MD and the lateral pulvinar was computed while controlling for the activation of the anterior, inferior, and medial pulvinar subdivisions.

      The partial correlation between the MD and the anterior pulvinar was consistent across all four trials of threat learning, whereas the other pulvinar subdivisions did not exhibit a consistent pattern. To evaluate the robustness of these effects, we applied a bootstrap procedure (10,000 resamples) to estimate 95% confidence intervals for each partial correlation. As presented in Figure 4b, only the anterior pulvinar–MD association remained robust, with confidence intervals that did not include zero. In contrast, the confidence intervals for most other pulvinar subdivisions included zero, indicating non-robust associations.
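
A minimal sketch of a bootstrapped partial correlation is given below. It assumes a residual-based partial correlation and a percentile bootstrap; the reply does not specify either choice, the data are simulated, and the names (`partial_corr`, `bootstrap_ci`, `others`) are hypothetical. Only the 10,000 resamples and the sample size of 293 come from the text.

```python
import numpy as np

def partial_corr(x, y, covariates):
    """Pearson correlation between x and y after regressing out the covariates."""
    design = np.column_stack([np.ones(len(x)), covariates])
    # Residualize x and y on the covariates with ordinary least squares.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

def bootstrap_ci(x, y, covariates, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the partial correlation."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        estimates[i] = partial_corr(x[idx], y[idx], covariates[idx])
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Simulated activations (not real data): three control subdivisions, one target pair.
rng = np.random.default_rng(42)
n = 293
others = rng.normal(size=(n, 3))
anterior = others @ np.array([0.3, 0.3, 0.3]) + rng.normal(size=n)
md = 0.5 * anterior + rng.normal(size=n)

r = partial_corr(md, anterior, others)
lower, upper = bootstrap_ci(md, anterior, others)
print(f"partial r = {r:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes zero is the criterion the reply uses to call an association robust.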

      (5) The manuscript does not contain a limitations subsection. Practically every study has limitations, and this one is not an exception. Better to tell the limitations to the readers upfront so they can factor them into their evaluation of the relevance of the manuscript and reported evidence.

      We thank the reviewer for this constructive suggestion. While the original manuscript already discussed key limitations in the Discussion section (page 36; e.g., “Although distinct thalamic roles in threat learning have been proposed, fMRI data do not fully capture the complexity of this structure…”), we agree that these considerations would benefit from clearer organization and visibility.

      To address this point directly, we have now added a dedicated “Limitations and Future Directions” subsection to the manuscript. This subsection explicitly summarizes the principal limitations of the study—including methodological constraints of fMRI and anatomical resolution—and outlines specific avenues for future research to address them. This change makes the limitations more transparent and allows readers to more easily incorporate them into their evaluation of the findings.

      (6) Data should be made available to the scientific community. Code too. Even if you just used standard fMRI toolboxes, any code used to run analyses will be helpful to the community, or if someone decides to try to replicate your findings.

      We thank the reviewer for this important suggestion and fully agree with the value of data and code sharing for transparency and reproducibility.

      The data supporting the findings of this study are derived from a larger, actively used database that is currently involved in ongoing projects. For this reason, the full dataset cannot yet be publicly released. However, the data underlying the reported analyses are available upon reasonable request from the corresponding author, subject to standard data-use agreements.

      To facilitate reproducibility, all analysis scripts and pipelines used in this study, including preprocessing and analysis workflows implemented in SPM12 and CONN, are available upon request and can be shared with researchers seeking to replicate or extend the reported findings.

      We have clarified this data and code availability statement in the manuscript (page 46).

      Despite these weaknesses and what can be derived from them, this manuscript constitutes a valuable contribution to the field to start characterizing and conceptualizing the involvement of thalamic nuclei and their interactions with other brain regions in the associative threat learning circuitries. It also paves the road for further testing of the functional dynamics among these regions and circuitries, and modeling testing.

      Recommendations for the authors:

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We thank the editors for this important note. Full statistical reporting, including test statistics, degrees of freedom, exact (raw and corrected) p-values, effect sizes, and 95% confidence intervals, is provided for all key analyses in Supplementary Tables 1–9. In addition, uncertainty estimates and key statistical tests are now explicitly reported throughout the main text, as recommended by the reviewers, irrespective of statistical significance.

      During this revision process, we conducted a comprehensive internal consistency check of all reported statistics and figure annotations. We identified and corrected minor discrepancies between some statistical annotations in the figures and the corresponding results reported in the Supplementary Tables. All figures have now been updated to ensure full consistency with the reported analyses. These corrections do not alter the results or conclusions of the study.

      Reviewer #1 (Recommendations for the authors):

      (1) What is the significance of using two different head coils? Were the data comparable from each coil? How did the authors determine this?

      We thank the reviewer for this important question. Data were acquired using two different receiver head coils across participants. Receiver coils primarily influence signal-to-noise ratio (SNR) and spatial sensitivity profiles, rather than the physiological basis of the BOLD response itself (Triantafyllou et al., 2011).

      Importantly, all analyses were based on within-subject contrasts (CS+ vs. CS−), which are robust to global signal scaling differences that may arise from coil sensitivity variations. In addition, standard preprocessing procedures—including intensity normalization, spatial normalization, and nuisance regression—further minimized potential coil-related variability.

      To empirically evaluate whether acquisition differences influenced our results, we conducted a repeated-measures ANOVA testing the Trial × CS × Site interaction (where Site reflects acquisition location and associated scanning setup, including receiver coil configuration) during fear conditioning (N = 293). As shown in Author response table 2, none of the thalamic nuclei demonstrated a significant interaction effect, and all effect sizes were negligible (η<sub>p</sub><sup>2</sup> ≤ .01).

      Author response table 2.

      Repeated-measures ANOVA results for the Trial × CS × Site interaction across all relevant thalamic nuclei during fear conditioning.

      (2) Why were the data smoothed? This could have a negative impact on the specificity of the signals averaged within the pre-defined thalamic ROIs.

      Spatial smoothing was applied to improve signal-to-noise ratio and statistical stability in small, deep thalamic subregions, which are particularly susceptible to noise. We acknowledge that smoothing can reduce spatial specificity. However, our analyses were based on anatomically predefined thalamic ROIs and focused on average activation within each region rather than voxel-wise localization. Under this approach, modest smoothing (i.e., a 6-mm full-width at half-maximum smoothing kernel, rather than the commonly used 8-mm kernel) primarily increases reliability while any signal mixing across adjacent regions would be expected to reduce regional specificity and bias effects toward the null, rather than produce spurious or false-positive differences.

      Additionally, we conducted robustness analyses to examine whether spatial smoothing artificially influenced our results. Specifically, we subdivided the mediodorsal thalamus into medial and lateral anatomical regions and compared activation across these subregions. The activation patterns were comparable across both subdivisions, indicating that the observed mediodorsal thalamus effect is unlikely to reflect boundary spillover resulting from smoothing. If smoothing had driven the effect, we would expect differential signal patterns across the subdivisions rather than comparable activation. (See full response to Weakness C, Reviewer 3, as well as Author response image 1 and Author response table 1 in our response).

      (3) Did the authors consider using any null models to determine whether the observed PPI results could have been observed by chance? E.g., block-resampling nulls scramble temporal order while preserving temporal autocorrelation, and can determine whether subtle differences in autocorrelation across regions can give rise to the observed signatures.

      We thank the reviewer for this thoughtful suggestion. All PPI analyses were conducted using the default CONN toolbox pipeline. In this framework, PPI effects are estimated within a GLM at the first level following standard denoising procedures that reduce motion- and physiology-related variance and apply temporal filtering. Importantly, PPI effects are modeled as subject-level contrast terms rather than computed from raw timeseries correlations.

      Group-level inference was performed on these subject-level contrast estimates using paired t-tests with FDR correction across regions. To further assess whether the observed effects could arise by chance, we additionally performed 10,000 bootstrap resamples of the CS+ vs. CS− differences to evaluate the stability of the effects. While we did not implement explicit block-resampling null models that preserve temporal autocorrelation, the combination of first-level GLM modeling following denoising, large sample size (N ≈ 300), and convergent inferential and resampling procedures provides a rigorous and standard assessment of PPI effects. We have revised the manuscript to clarify these procedures and their rationale.
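
      As an illustration of the resampling step described above, the following is a minimal sketch of a percentile bootstrap on paired CS+ minus CS− differences. It assumes subjects are the resampling unit and uses simulated values; the function name `bootstrap_mean_ci` and the data are hypothetical, and the response does not specify which bootstrap variant was used. Note that resampling subjects in this way does not preserve within-run temporal autocorrelation, so it is not a substitute for the block-resampling null the reviewer describes.

```python
import random


def bootstrap_mean_ci(diffs, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences.

    Subjects are resampled with replacement; each resample yields one mean.
    """
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(diffs, k=n)  # resample subjects with replacement
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lower, upper


# Hypothetical per-subject CS+ minus CS- contrast estimates (simulated, not study data).
sim = random.Random(42)
diffs = [sim.gauss(0.2, 1.0) for _ in range(300)]

mean_diff, ci_low, ci_high = bootstrap_mean_ci(diffs)
print(f"mean difference = {mean_diff:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

      An interval that excludes zero would indicate a stable CS+ vs. CS− difference across resamples under these assumptions.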

      We added this language to directly address the reviewer’s concern and revised the connectivity analyses section to clarify the workflow (page 44):

      “Following standard denoising procedures (including regression of motion- and physiology-related confounds and temporal filtering), condition-dependent connectivity effects were inferred from subject-level generalized psychophysiological interaction (gPPI) contrast estimates rather than from raw timeseries correlations. This GLM-based framework reduces the likelihood that observed PPI effects reflect differences in temporal autocorrelation or spectral properties across regions rather than genuine task-dependent interactions.”

      (4) The authors may wish to report results in text, as there are currently many demonstrative statements that are not associated with requisite uncertainty estimates, making inference challenging.

      We thank the reviewer for this helpful suggestion. We have revised the Results section to explicitly report statistical outcomes in the main text for all key findings, including appropriate uncertainty estimates (e.g., test statistics, effect sizes, and p-values) alongside demonstrative statements. This ensures that all inferences in the text are directly supported by quantitative evidence.

      Additionally, the full statistical details, including test statistics, degrees of freedom, effect sizes, 95% confidence intervals, and both raw and FDR-corrected p-values, are provided in Supplementary Tables 1–9. These changes improve clarity and transparency while avoiding redundancy. Newly added text in the Results section is highlighted in green.
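
      To show how raw and FDR-corrected p-values relate, here is a minimal sketch of the Benjamini-Hochberg adjustment. The response does not name the FDR procedure, so Benjamini-Hochberg is an assumption here, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five regions (not taken from the study).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  ->  FDR-adjusted p = {q:.5f}")
```

      In this toy example the third and fourth tests fall below 0.05 before correction but not after it, which is why reporting both values is informative.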

      (5) I could not find any information about the EBICglasso model in the Methods section, nor information about how the centrality measures were estimated. Given the lack of transparency, I recommend down-weighting the often overly-strong language regarding the conclusions of this analysis.

      We have revised and added these details along with other details to the Statistical tests section on pages 42-44:

      “Statistical tests

      All statistical tests were conducted using JASP versions 0.18.3 and 0.19.3 (JASP Team, 2024).

      Activation Differences across all phases of threat learning

      In each threat learning phase, we used paired t-tests to examine the differences in activation of the thalamic nuclei in response to CS+ vs. CS− at the block level (average activation across trials), and a 2 × 2 RM-ANOVA to estimate the differences in activation at the trial-wise level. Assumptions of sphericity were checked, and Greenhouse-Geisser corrections were applied where necessary. This model was followed by post hoc tests to estimate the differences at the trial level, and false discovery rate (FDR) correction was applied for each question.

      Network analyses of the within pulvinar relationships during conditioning

      The network analyses examined functional relationships between pulvinar divisions. Nodes corresponded to block-level activation estimates of the CS+ minus CS− contrast for each pulvinar division, yielding four nodes (one per division). Networks were estimated using a Gaussian graphical model with EBICglasso (LASSO regularization) based on Pearson correlation matrices, with the EBIC tuning parameter set to γ = 0.5. Edge weights represent partial correlations.

      Three centrality measures were computed on the estimated weighted partial-correlation network: node strength, defined as the sum of the absolute edge weights directly connected to a node; closeness, defined as the inverse of the average shortest path length from a node to all other nodes; and betweenness, defined as the proportion of shortest paths between all pairs of nodes that pass through a given node. Shortest paths were computed using inverse edge weights, consistent with standard practice for weighted networks. Centrality indices were normalized.

      Network accuracy and centrality stability were assessed using nonparametric bootstrapping (10,000 iterations) to estimate confidence intervals for edge weights and centrality measures. All analyses were conducted in JASP (versions 0.18.3 and 0.19.3) using default settings unless otherwise specified, following the procedures described in Epskamp, Borsboom, and Fried (2018).

      Mediation analyses of within pulvinar relationships during conditioning

      Mediation models of the relationships between the activations in pulvinar divisions were estimated using the lavaan package (Rosseel, 2012) with maximum likelihood estimation. All variables were z-standardized prior to analysis. Block-level activation estimates from the inferior and lateral pulvinar were entered as predictors, activation in the medial pulvinar was specified as the mediator, and activation in the anterior pulvinar was specified as the outcome variable.

      To assess the robustness and generalizability of the mediation effects, we conducted 3-fold cross-validation. The full sample (N = 293) was randomly partitioned into three non-overlapping sub-samples (n = 91, 96, and 106). In each iteration, the mediation model was estimated in one sub-sample, while the remaining sub-samples were used to assess the stability of parameter estimates and indirect effects. This procedure resulted in six cross-validation iterations, allowing evaluation of whether the direction and magnitude of the indirect effect were consistent across independent subsets of the data. Indirect effects were evaluated using bias-corrected percentile bootstrap confidence intervals based on 10,000 resamples, as recommended by Biesanz, Falk, and Savalei (2010). An indirect effect was considered significant when the 95% confidence interval did not include zero (p < 0.05).”
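
      The centrality definitions given in the quoted Statistical tests section can be made concrete with a small sketch. It computes node strength and closeness on a four-node weighted network, using inverse absolute edge weights as path lengths, as the text describes. The edge weights below are hypothetical, the function name `centralities` is illustrative, betweenness and the normalization step are omitted for brevity, and the exact implementation inside JASP may differ.

```python
import math


def centralities(nodes, edges):
    """Compute node strength and closeness for an undirected weighted network.

    edges maps (node_a, node_b) -> partial correlation.
    """
    # Node strength: sum of absolute weights of the edges attached to a node.
    strength = {n: 0.0 for n in nodes}
    # Path lengths use 1/|weight|, so stronger edges count as shorter.
    dist = {a: {b: (0.0 if a == b else math.inf) for b in nodes} for a in nodes}
    for (a, b), w in edges.items():
        if w == 0:
            continue  # edge shrunk to zero by regularization: no connection
        strength[a] += abs(w)
        strength[b] += abs(w)
        dist[a][b] = dist[b][a] = 1.0 / abs(w)
    # Floyd-Warshall all-pairs shortest paths.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # Closeness: inverse of the mean shortest-path length to all other nodes.
    closeness = {}
    for n in nodes:
        total = sum(dist[n][m] for m in nodes if m != n)
        closeness[n] = (len(nodes) - 1) / total if 0 < total < math.inf else 0.0
    return strength, closeness


# Hypothetical partial correlations between pulvinar divisions (not study results).
nodes = ["anterior", "medial", "lateral", "inferior"]
edges = {
    ("anterior", "medial"): 0.5,
    ("medial", "lateral"): 0.25,
    ("medial", "inferior"): 0.2,
    ("lateral", "inferior"): 0.4,
    ("anterior", "inferior"): 0.0,
}
strength, closeness = centralities(nodes, edges)
for n in nodes:
    print(f"{n:9s} strength = {strength[n]:.2f}  closeness = {closeness[n]:.3f}")
```

      With these illustrative weights the medial node has both the highest strength and the highest closeness, which is the kind of pattern a centrality table summarizes.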

      (6) Bar plots are not effective ways to report group-level data. I recommend replacing all bar plots with visualisations that expose the distribution of the data, such as a violin plot or a raincloud plot.

      We thank the reviewer for this suggestion. In general, we agree that visualizations exposing the full data distribution can be highly informative, and we therefore present distribution-based plots for several analyses (e.g., connectivity results). However, for the activation analyses, our primary goal was to highlight trial-to-trial changes and overall patterns across conditions, rather than the distribution of individual data points per se. For this purpose, bar plots provide a clearer representation of the directional effects and facilitate comparison across trials and conditions.

      (7) The thought bubbles are atypical of scientific figures.

      The figure has been revised to remove the thought bubbles.

      (8) Figure 7 - there are many connections not shown in this figure, suggesting that it is sufficiently oversimplified as to be potentially misleading. For instance, the authors offer no anatomical connections between pulvinar and the cortical hierarchy; however, these connections are ample and (likely) highly important for the functionality assessed here. Similarly, there is no room in the figure for the integration of the shock stimuli (presumably via the spinothalamic tract) and the visual stimuli (via the retina/LGN).

      We agree that the pulvinar has extensive cortical and sensory input/output connections that are not depicted in Figure 7. Our intention was not to provide a complete anatomical wiring diagram, but rather a simplified functional model derived from observed statistical dependencies. We have revised the figure and added an explicit note to the legend clarifying that pulvinar–cortical and sensory pathways (e.g., retina/LGN and spinothalamic inputs) are intentionally omitted due to incomplete subnuclear-level anatomical characterization, and that their omission should not be interpreted as a lack of importance. We added this to Figure 7 legend:

      “Note (panel a):

      Known pulvinar–cortical connections, as well as sensory input pathways (e.g., visual inputs via the retina/LGN and nociceptive inputs via the spinothalamic tract), are not explicitly shown. These connections are well established anatomically but were omitted due to their heterogeneity and incomplete characterization at the level of pulvinar subnuclei. Their absence should not be interpreted as a lack of anatomical or functional relevance.”

      Reviewer #2 (Recommendations for the authors):

      (1) It's somewhat confusing that Figures 1,4,5 D and E are not in the text until later in the results section. Perhaps these should be presented in the figures in the same order they are discussed in the text, although this is a stylistic issue.

      We thank the reviewer for this comment. To improve clarity and align the figures with the structure of the Results section, we reorganized the figures. Specifically, we added a new figure (Figure 7) that consolidates all connectivity analyses. Figures 1, 4, and 5 now focus exclusively on activation results, while Figure 7 presents connectivity results only. This reorganization allows the figures to follow the flow of the text more closely and makes the narrative of each figure clearer.

      (2) Stylistic: I would strongly recommend adding n numbers and describing the basics of statistical tests used and how multiple comparisons were accounted for in the legend for Figures 1,4, and 5.

      We thank the reviewer for this recommendation. We have added the sample sizes (n) and brief descriptions of the statistical tests used, including how multiple comparisons were handled, to the legends of Figures 1, 4, and 5. In addition, we direct the reader to the Supplementary Tables, which were submitted with the original manuscript and provide full statistical details, including test statistics (t, F), degrees of freedom, effect sizes, 95% confidence intervals, raw p values, and corrected p values. Finally, we further elaborated on the statistical tests on pages 42–44, as detailed in our response to Recommendation 5 (Reviewer 1).

      Reviewer #3 (Recommendations for the authors):

      As previously indicated, please note that no information is included in the manuscript about data and code availability. Although you mainly use toolboxes for data analyses, any script(s) that you have used to run things would be great to upload for reproducibility purposes.

      Also, it would be good to include a limitations subsection in the manuscript.

      Thank you for these recommendations. We added a limitations subsection to the manuscript. See our responses under Comments 5 and 6 (Reviewer 3, Public Review).

      In terms of data analyses:

      (1) It would be ideal if you quantify in-scanner motion for the different conditions to see if there were no differences in motion due to the task.

      Head motion was estimated at each time point as part of standard preprocessing, and motion parameters were included as nuisance regressors in all first-level models. Because motion estimates are defined per volume rather than per experimental condition, condition-specific motion metrics were not explicitly computed. Importantly, this approach removes motion-related variance uniformly across the time series and therefore controls for potential motion effects across all task conditions. Any residual motion would be expected to increase noise rather than systematically bias condition contrasts.

      (2) You also may want to indicate if normalization followed the SPM 12 default and the data was resampled to 2 x 2 x 2 mm, or kept the same. It is not stated in the data preprocessing subsection of the methods.

      We thank the reviewer for this suggestion. We have now clarified this point in the manuscript (page 41):

      “In addition, spatial normalization was performed with data normalized to Montreal Neurological Institute (MNI) space and resampled to a 2 × 2 × 2 mm<sup>3</sup> voxel grid, followed by spatial smoothing with a 6-mm full-width at half-maximum Gaussian kernel.”
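
      For readers who want the smoothing kernel in other units, the sketch below converts the stated full-width at half-maximum to a Gaussian standard deviation using the standard relation FWHM = 2·sqrt(2·ln 2)·σ. The kernel width and voxel size come from the quoted sentence; the function name `fwhm_to_sigma` is illustrative.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian kernel's FWHM to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


fwhm_mm = 6.0   # smoothing kernel reported in the text
voxel_mm = 2.0  # isotropic voxel size after resampling, as reported in the text

sigma_mm = fwhm_to_sigma(fwhm_mm)
sigma_vox = sigma_mm / voxel_mm
print(f"sigma = {sigma_mm:.3f} mm = {sigma_vox:.3f} voxels")
```

      A 6-mm kernel therefore corresponds to a standard deviation of roughly 2.5 mm, or about 1.3 voxels on the 2-mm grid.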

      (3) It is important to indicate how many subjects went into each analysis. Also, it is not clear, based on the current methods section, how many observations per condition were used. That can be reported in the text or the figures.

      We thank the reviewer for this comment. This information has now been added to the Methods section and the relevant figure legends, as described in our response to Comment 2 (Reviewer 3, Public Review).

      References

      Triantafyllou C, Polimeni JR, Wald LL. 2011. Physiological noise and signal-to-noise ratio in fMRI with multi-channel array coils. NeuroImage 55:597–606. DOI: https://doi.org/10.1016/j.neuroimage.2010.11.084, PMID: 21167946

    1. Author response:

      eLife Assessment

      This manuscript reports an important study in which the authors apply smFRET imaging to probe HIV-1 Env conformational dynamics in the presence of antibodies. Previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Through the cutting-edge application of smFRET imaging, the study provides convincing insights into the mechanisms of action of relevant antibodies.

      We appreciate this positive assessment and thank the reviewers for their time and constructive comments. We will make the following changes in the revised manuscript.

      (1) Clarify the distinction between suppression efficiency and functional cost.

      (2) Add controls: smFRET experiments in the presence of monovalent 10E8.4 and iMab individually and compare results with the bivalent 10E8.4/iMab that we currently have.

      (3) Increase the number of repeats in neutralization experiments to reduce variability and, where feasible, perform infectivity and neutralization assays after click chemistry labeling.

      (4) Add discussion on conformational populations probed by smFRET versus structural analyses, Env conformational heterogeneity, ligand effects, and how these approaches complement each other.

      (5) Further clarify the assignments of multiple conformational states by smFRET, the heterogeneity of Env spikes and virion morphology by cryoET, and the focus of the current smFRET-focused storyline.

      Please find below our provisional responses to the public reviews. We will provide detailed point-by-point responses upon submission of the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have considered a panel of antibodies that target epitopes at the gp120/gp41 interface (8ANC195 and PGT151), the fusion peptide in the gp41 domain (VRC34), and the MPER region of gp41 (DH511.2_K3 and VRC42). They also investigate 10E8.4/iMab, which is an engineered bispecific antibody that targets the MPER and the CD4 receptor. On a technical note, they have applied a double amber codon-readthrough strategy to incorporate the non-natural TCO*A amino acid, which gets labeled through click chemistry. This approach should result in less disruption of the native Env structure as compared to the peptide insertion previously used for smFRET imaging of Env. Furthermore, previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Altogether, through the cutting-edge application of smFRET imaging, the study provides novel insights into the mechanisms of action of interesting and clinically relevant antibodies.

      Thank you for the positive comments!

      In validating the functionality of the S401TAG/R542TAG Env, the authors performed infectivity assays and observed 20% infectivity as compared to wild-type (Figure S2A). However, the text equates this with "20% dual-amber suppression efficiency". This would benefit from some explanation. Why do the authors interpret infectivity as reporting on amber suppression efficiency, and not the functional cost of modifying Env, which is probably unavoidable? Or a combination of both? Is there data to suggest that 100% amber suppression would leave Env 100% functional? If so, this would be valuable to show. If not, the text should be clarified.

      We acknowledge this concern and will clarify the distinction between suppression efficiency and functional cost in the revision. The observed reduction in infectivity does not translate into functional loss; instead, it primarily reflects the efficiency of suppression (one of the critical limitations of applying genetic code expansion in mammalian cells), as evidenced by reduced Env expression and incorporation on virions (Fig. 1B). In support of the preservation of Env functionality, tag-free and dual-ncAA-incorporated Env virions exhibited similar dose-dependent neutralization sensitivity against trimer-specific neutralizing antibodies (Fig. 1D). We have previously discussed several limitations of amber suppression in mammalian cells combined with smFRET viral systems (PMID: 38232732; PMID: 40716060). In brief, orthogonal tRNA/aaRS pair–mediated amber suppression (reassigning/repurposing amber stop codons to non-canonical amino acids) of the introduced ambers in the target protein (Env in our case) must compete with the cellular translation system, particularly release factors that recognize amber codons and terminate translation. Readthrough of endogenous amber codons in virus-producing cells (in our case, HEK293T) can disrupt normal protein expression and virus production. Similarly, readthrough of preexisting amber codons in HIV-1 ORFs other than the targeted ambers in Env can disrupt virus assembly, which we addressed by generating an amber-free provirus (PMID: 38232732). Introducing two amber codons into Env further reduces efficiency, as dual suppression requires two sequential successful suppression events within the same Env molecule.

      The authors state that the contour plots in Figure 2E reveal "dynamic sampling" of the observed FRET states. Strictly speaking, as presented, the contour plots (and FRET histograms) provide no information on dynamics per se. They indicate only the relative thermodynamic stabilities of the FRET states; transitions between states are a matter of interpretation. The TDPs, shown later in Figure 5A, nicely display the dynamics. More importantly, interpretation of the contour plots is challenging, as some seem to suggest an evolution toward lower FRET states. This is especially evident in Figures 2F and 3D, which suggest that the system evolves into a stable 0.1-FRET state (CO) after about 3 sec. Unless the authors want to conclude something from this, I would suggest that they consider removing the contour plots, since their interpretations are fully supported by the FRET histograms alone.

      We agree and will remove the contour plots, as they do not add meaningful information beyond what the histograms show.

      The data indicating that Env conformation is manipulated by 10E8.4/iMab is interesting. If I understand correctly, 10E8.4/iMab is an engineered antibody with one Fab targeting MPER and the second Fab targeting CD4. In the absence of CD4, could the difference between 10E8.4/iMab and the other MPER antibodies be due to 10E8.4/iMab being monovalent with respect to MPER binding?

      We appreciate this question. To answer this, we will perform smFRET experiments in the presence of 10E8.4 and iMab individually and compare those with the bivalent 10E8.4/iMab.

      Reviewer #2 (Public review):

      Summary:

      In this paper, Xu and co-workers unveil two distinct modes of neutralisation by gp41-targeted broadly neutralizing antibodies on HIV-1 Env. So far, it was unclear as to how the mechanism of neutralisation occurred for this subset of neutralising antibodies (that can target the fusion peptide or the membrane proximal external region of the gp41 subunit). Thanks to single-molecule FRET, the authors show that the majority of broadly neutralizing antibodies stabilize the closed Env conformation (named State 1 since the original work by Munro and colleagues PMID: 25298114). Interestingly, the bivalent 10E8.4/iMab stabilized in turn a CD4-bound open state of Env. The two modes of neutralization described for these antibodies show previously unknown allosteric mechanisms that stabilize closed and open Env conformation, stressing the importance of Env conformational dynamics and its efficiency during the process of fusion.

      Strengths:

      The article is well-written, and the figures fully depict the data in a convincing way. The authors have used smFRET, which is now established in the field as a good tool to assess Env dynamics.

      We appreciate these positive comments!

      Weaknesses:

      (1) The limited controls on how click chemistry affects Env (as labelled Env HIV virions were not evaluated).

      We agree. Our validation focused on ncAA-incorporated Env HIV-1 virions, not on the fluorescently labeled virions. To address this, we will increase the number of repeats in neutralization experiments to reduce variability and, where feasible, perform infectivity and neutralization assays after click chemistry labeling. However, we expect these experiments to be technically challenging as an additional layer of functional validation in live cells: labeling requires additional handling time, the centrifugation steps needed to remove free dyes can deform or disrupt viral membranes and degrade virions, and dual-amber suppression efficiency is low. On a related note, we have previously performed real-time tracking of single click-labeled Env virion internalization and trafficking in live cells (PMID: 38232732), supporting the retained functionality of click-chemistry-labeled Env.

      (2) Photobleaching of donor and acceptor molecules occurs right after 10 sec of exposure.

      We acknowledge this limitation and will include it in the corresponding section.

      (3) Other limitations are well described in the corresponding section.

      We appreciate this comment.

    1. Claude Code can run 12 subagents in parallel; within a few minutes, the 20x quota is used up.

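      Surprisingly, Claude Code's concurrency is strong enough to run 12 subagents at once, yet this also exposes how fragile its API usage limits are: the 20x quota is exhausted within minutes. This shows that even advanced AI models have clear usage boundaries, which may affect large-scale application scenarios.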

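    1. The official positioning is for it to be used together with Claude Code and OpenClaw. Claude handles reasoning and orchestration, while GLM-5V-Turbo handles 'seeing' and 'operating the interface'.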

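      Surprisingly, GLM-5V-Turbo is designed to cooperate with other AI models rather than compete with them: it is dedicated to visual perception and interface operation, and leaves reasoning and orchestration to Claude Code. This strategy of specialized division of labor is an innovative idea in the AI field and hints that future AI systems may become more specialized rather than pursue all-round capability.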

    1. It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem.

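      Surprisingly, Claude Mythos Preview discovered a 16-year-old vulnerability in FFmpeg that had gone undetected even after automated testing tools had executed the line 5 million times. This reveals that AI has unique insight in code analysis that traditional automated tools cannot match.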

    1. Agent systems should be designed assuming prompt-injection and exfiltration attempts. Separating harness and compute helps keep credentials out of environments where model-generated code executes.

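      Surprisingly, OpenAI states explicitly that AI agent systems should assume prompt-injection and data-exfiltration attempts, and recommends separating the control layer (harness) from the compute layer to protect credentials. This security design philosophy shows that OpenAI has a deep understanding of AI security threats and has adopted proactive defenses, in sharp contrast to the passive security approaches many developers may take.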

    1. The boundary between AI judgment and human judgment is explicit and written in code.

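      Surprisingly, Mistral's connectors let developers set the boundary between AI judgment and human judgment explicitly in code. With the requires_confirmation parameter, developers can ensure that certain tools require human approval before they execute; this design preserves the AI's flexibility while keeping critical operations safe.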
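
      The idea of writing the human-approval boundary in code can be sketched generically. The example below is not Mistral's API: the `Tool` class, the `execute` function, and the approver callback are hypothetical, and only the flag name `requires_confirmation` is borrowed from the annotation above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """A callable tool with an optional human-approval gate (hypothetical type)."""
    name: str
    run: Callable[[str], str]
    requires_confirmation: bool = False  # flag name taken from the annotation


def execute(tool: Tool, argument: str, approve: Callable[[str, str], bool]) -> str:
    """Run a tool, asking a human approver first when the tool is gated."""
    if tool.requires_confirmation and not approve(tool.name, argument):
        return f"{tool.name}: rejected by reviewer"
    return tool.run(argument)


# A read-only tool runs freely; a destructive one is gated behind approval.
search = Tool("search_docs", lambda q: f"results for {q!r}")
delete = Tool("delete_record", lambda rid: f"deleted {rid}", requires_confirmation=True)


def deny_all(name: str, argument: str) -> bool:
    """Stand-in for a human reviewer who declines every request."""
    return False


print(execute(search, "quarterly report", deny_all))
print(execute(delete, "42", deny_all))
```

      Keeping the gate as a per-tool flag means the policy is visible wherever the tool is declared, rather than being left to the model's judgment at run time.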

    2. Because of this, teams keep rebuilding the same integration layer. Even within the same company, similar integrations are often implemented multiple times in arbitrary code, leading to security risks, lack of traffic observability, and duplication of work.

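      Surprisingly, even within the same company, similar integrations are often implemented multiple times, leading to security risks, insufficient traffic observability, and duplicated work. This problem of repeatedly rebuilding the enterprise AI integration layer is more widespread than people assume, and Mistral's connectors aim to solve it by packaging an integration into a single reusable entity.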

    1. Austin built the whole pipeline from his Claude Code terminal using the Notion API. He brain-dumped the desired outcome using Monologue, let Claude Code create the database and data pipeline, and pasted the generated instructions into the Notion custom agent setup.

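      Surprisingly, non-technical people can describe their requirements directly to an AI through a speech-to-text tool (Monologue) and have the AI build the entire data pipeline and agent system automatically. This greatly lowers the technical barrier, so that non-technical team members can also build complex AI workflows.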

    1. 21% said they had started doing new tasks because of AI (task augmentation). An example of task augmentation could be a data analysis task that would ordinarily require the worker to know how to code.

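      Surprisingly, one in five employed AI users has started performing new tasks that they would not otherwise have done, such as data analysis, because of AI. This suggests that AI not only replaces work but also creates new job opportunities and skill demands.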

    1. We projected that, given 13 GB300 GPUs, FP8 precision, physical error rate of 0.003, 1000 rounds, Surface code d=13, the fast model can achieve 0.11 μs / round.

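      Surprisingly, quantum error-correction decoding can reach a remarkable 0.11 microseconds per round, which is several orders of magnitude faster than the response speed of human neurons. This ultra-high-speed processing is key to realizing practical quantum computing and is hard for traditional computing methods to match.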