  1. Jul 2025
    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Xie and colleagues presents transcriptomic experiments that measure gene expression in eight different tissues taken from adult female and male mice from four species. These data are used to make inferences regarding the evolution of sex-biased gene expression across these taxa.

      Strengths:

      The experimental methods and data analysis appear appropriate. The authors promote their study as unprecedented in its size and technical precision.

      We do not understand the statement "the authors promote" as if there was a doubt about this. If there is a doubt, we would welcome seeing it specified.

      Weaknesses:

      The manuscript does not present a clear set of novel evolutionary conclusions. The major findings recapitulate many previous comparative transcriptomics studies - gene expression variation is prevalent between individuals, sexes, and species; and genes with sex-biased expression evolve more rapidly than genes with unbiased expression - but it is not clear how the study extends our understanding of gene expression or its evolution.

      There have been no "previous comparative transcriptomics studies" at a micro-evolutionary scale in animals, hence, we do not "replicate" these. And our contrast between somatic and gonadal patterns reveals insights that have not been recognized before, namely that gonadal sex-specific expression turnover is actually not faster than the corresponding non-sex-specific turnover. We have now further clarified this distinction throughout the text and have also adapted the title of the paper accordingly.

      We agree with the overall statement that "gene expression variation is prevalent between individuals, sexes, and species" but the aspect of "sex-biased gene expression between individuals" has not been systematically analysed before in such a context.

      Concerning the statement that "genes with sex-biased expression evolve more rapidly than genes with unbiased expression", we note that this is mostly derived from gonadal data and that there is no study that has quantified this so far at a population level and between subspecies in comparison to somatic data.

      Our results show further that previous assumptions of a substantial set of genes with sex-biased expression conserved between mice and humans are due to underestimating the convergence issues when there is an extremely fast turnover of sex-biased gene expression. This has a major implication for using mice as a model for gender-specific medicine questions in humans.

      Many gene expression differences between individual animals are selectively neutral, because these differences in mRNA concentration are buffered at the level of translation, or differences in protein abundance have no effect on cellular or organismal function. The hypothesis that sex-biased genes are enriched for selectively neutral expression differences is supported by the excess of inter-individual expression variance and inter-specific expression differences in sex-biased genes.

      This statement repeats a statement from the first round of reviews. We had added new data and extensive discussion on this topic. We do not understand why this has not been taken into account. In fact, a major strength of our paper is that it shows that most sex-biased gene expression differences are not neutral!

      There are two major issues here. First, to identify sex-biased gene expression in the first place, we (and all other papers in the field) use the neutral model as the null hypothesis. Genes that are not compatible with this null hypothesis are considered sex-biased. In contrast to most previous papers, we have the possibility to take into account the variances between individuals to add an additional significance test. Hence, we can apply a much more rigorous two-step process to identify significant deviations from the null hypothesis: a ratio cutoff plus a Wilcoxon rank sum test with correction for multiple testing. We have added some additional statements in the Results and Discussion sections to emphasize this. Second, by focusing on the genes that are not following a neutral model, the variance and divergence data support the action of selection, rather than neutral drift.
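
      The two-step procedure described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R code. Several details are assumptions made only for the example: the ratio is taken between group medians, the 1.25 cutoff is the value quoted elsewhere in this review thread, the multiple-testing correction is Benjamini-Hochberg, the significance threshold is 0.05, and the function names are hypothetical. The rank sum p-value is computed exactly by enumerating group labelings, which is only practical for small groups (nine individuals per sex gives 48,620 labelings).

```python
from itertools import combinations
from statistics import median


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Exact two-sided rank sum p-value by enumerating all group labelings."""
    ranks = midranks(list(x) + list(y))
    n = len(x)
    expected = n * (len(ranks) + 1) / 2
    observed = abs(sum(ranks[:n]) - expected)
    sums = [sum(ranks[i] for i in idx)
            for idx in combinations(range(len(ranks)), n)]
    return sum(abs(s - expected) >= observed - 1e-12 for s in sums) / len(sums)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # rank 1 is the smallest p-value
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def call_sex_biased(female, male, cutoff=1.25, alpha=0.05):
    """Label each gene using both the ratio cutoff and the adjusted test."""
    genes = sorted(female)
    qvals = bh_adjust([rank_sum_p(female[g], male[g]) for g in genes])
    calls = {}
    for gene, q in zip(genes, qvals):
        f, m = median(female[gene]), median(male[gene])
        if q < alpha and f >= cutoff * m:
            calls[gene] = "female-biased"
        elif q < alpha and m >= cutoff * f:
            calls[gene] = "male-biased"
        else:
            calls[gene] = "unbiased"
    return calls


# Hypothetical expression values for three genes, five individuals per sex.
female = {"g1": [10, 11, 12, 13, 14], "g2": [5, 6, 7, 8, 9],
          "g3": [10.0, 10.1, 10.2, 10.3, 10.4]}
male = {"g1": [1, 2, 3, 4, 5], "g2": [5.5, 6.5, 7.5, 8.5, 9.5],
        "g3": [9.0, 9.1, 9.2, 9.3, 9.4]}
calls = call_sex_biased(female, male)
```

      In this toy input, `g3` differs significantly between the sexes but falls below the ratio cutoff, so it is not called sex-biased. This shows why the two criteria are applied together.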

      A higher rate of adaptive coding evolution is inferred among sex-biased genes as a group, but it is not clear whether this signal is driven by many sex-biased genes experiencing a little positive selection, or a few sex-biased genes experiencing a lot of positive selection, so the relationship between expression and protein-coding evolution remains unclear.

      Again, there are two major issues here. First, the distribution of alpha-values shown in Figure 3B is rather homogeneous, i.e. there is no support for a scenario in which the average is driven by only a few genes.

      Second, it seems that the referee wants to see an analysis where dn/ds ratios are broken down for every single gene. This has been done in previous papers, but it is now understood that this procedure is fraught with error because of the demographic contingencies inherent to natural populations that can yield wrong results for individual loci. We have added some statements to the text to clarify this further.

      It is likely that only a subset of the gene expression differences detected here will have phenotypic effects relevant for fitness or medicine, but without some idea of how many or which genes comprise this subset, it is difficult to interpret the results in this context.

      It is the basic underlying assumption for the whole research field that significantly sex-biased genes are phenotypically relevant for fitness, since they would otherwise not be sex-biased in the first place.

      Throughout the paper the concepts of sexual selection and sexually antagonistic selection are conflated; while both modes of selection can drive the evolution of sexually dimorphic gene expression, the conditions promoting and consequence of both kinds of selection are different, and the manuscript is not clear about the significance of the results for either mode of selection.

      We had explained in our previous response that our data collection was not designed to distinguish between these two processes. But given that the issue is being brought up again, we have now added some discussion on this issue.

      The manuscript's conclusion that "most of the genetic underpinnings of sex-differences show no long-term evolutionary stability" is not supported by the data, which measured gene expression phenotypes but did not investigate the underlying genetic variation causing these differences between individuals, sexes, or species.

      We agree that - under a strict definition - our use of the term "genetic underpinning" in this conclusion sentence can be criticized. The most correct term would be "transcriptional underpinnings", but of course, given that it is the current practice of the whole field to assume that "transcriptional" is part of the overall genetics, we do not consider our initial statement as incorrect. Still, we have changed the term accordingly.

      Furthermore, most of the gene expression differences are observed between sex-specific organs such as testes and ovaries, which are downstream of the sex-determination pathway that is conserved in these four mouse species, so these conclusions are limited to gene expression phenotypes in somatic organs shared by the sexes.

      Yes - correct. But the whole focus of the paper is on somatic expression, i.e. organs that share the same cell compositions. Of course, the comparison between gonadal organs is conflated by being composed of different cell types. We have extended the discussion of this point.

      The differences between sex-biased expression in mice and humans are attributed to differences in the two species effective population sizes; but the human samples have significantly more environmental variation than the mouse samples taken from age-matched animals reared in controlled conditions, which could also explain the observed pattern.

      These are indeed the two alternative explanations that we had discussed (last paragraph of the discussion section, now the penultimate paragraph).

      The smoothed density plots in Figure 5 are confusing and misleading. Examining the individual SBI values in Table S9 reveals that all of the female and male SBI values for each species and organ are non-overlapping, with the exception of the heart in domesticus and mammary gland in musculus, where one male and one female individual fall within the range of the other sex. The smoothed plots therefore exaggerate the overlap between the sexes;

      Smoothing across discrete values is an entirely standard procedure for continuous variables. It allows one to visualize the inherent data trends that cannot easily be gleaned from simple inspection of the actual values. This is a mathematical procedure, not an "exaggeration". We used the same smoothing procedure for all the comparisons, and it is clear that the distributions between females and males of the sex organs and a few somatic organs are well separated (non-overlapping), which serves as a control.

      in particular, the extreme variation shown in the SBI in the mammary glands in spretus females and spicilegus males is hard to understand given the normalized values in Table S3. The R code used to generate the smoothed plots is not included in the Github repository, so it is not possible to independently recreate those plots from the underlying data.

      We apologize that there was indeed an error in the Figure - the columns for SPR and SPI were accidentally interchanged. We have corrected this figure. Generally, the smoothed patterns we show are easily verified by looking up the respective primary values. We apologize that the code lines for the plots were accidentally omitted. We have used a standard function from ggplot2: geom_density, with "adjust=3, alpha=0.5" for all plots and included this description in the Methods. We have now added this to the R code in the GitHub repository.
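
      The effect of the bandwidth multiplier mentioned above can be explored with a small sketch. It is written in Python rather than R and is not the code from the repository. It assumes a Gaussian kernel and R's `nrd0` rule-of-thumb bandwidth, which are the `geom_density` defaults when no other kernel or bandwidth is specified. The nine values per sex are hypothetical and are not taken from Table S9. R evaluates the density on a binned grid, so exact values may differ slightly from this direct evaluation.

```python
import math
from statistics import quantiles, stdev


def nrd0_bandwidth(x):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    sd = stdev(x)
    # method="inclusive" matches the default quantile definition in R.
    q1, _, q3 = quantiles(x, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34) or sd or abs(x[0]) or 1.0
    return 0.9 * spread * len(x) ** -0.2


def gaussian_kde(x, grid, adjust=1.0):
    """Evaluate a Gaussian kernel density estimate of x at each grid point."""
    h = adjust * nrd0_bandwidth(x)
    norm = len(x) * h * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x) / norm
            for g in grid]


def overlap_area(x, y, adjust, lo=-4.0, hi=4.0, step=0.01):
    """Approximate the shared area under the two smoothed curves."""
    grid = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    fx = gaussian_kde(x, grid, adjust)
    fy = gaussian_kde(y, grid, adjust)
    return sum(min(a, b) for a, b in zip(fx, fy)) * step


# Hypothetical, non-overlapping index values for nine females and nine males.
females = [0.2 + 0.1 * i for i in range(9)]
males = [-v for v in females]

overlap_default = overlap_area(females, males, adjust=1.0)
overlap_smooth = overlap_area(females, males, adjust=3.0)
```

      Comparing `overlap_default` with `overlap_smooth` shows how much of the visual overlap between two curves depends on the chosen multiplier rather than on the underlying values. This is why it helps to report the smoothing parameters alongside the plots.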

      The correlations provided in Table S9 are confusing - most of the reported correlations are 1.0, which are not recovered when using the SBI values in Table S9, and which does not support the manuscript's assertion that sex-biased gene expression can vary between organs within an individual. Indeed, using the SBI values in Table S9, many correlations across organs are negative, which is expected given the description of the result in the text.

      There is a misunderstanding here. The tables do not report correlations, but only p-values for correlations, the raw ones and the ones after corrections for multiple testing. P = 1.0 means no significant correlation. We have adjusted the caption of this table to clarify this further.

      Reviewer #3 (Public review):

      This manuscript reports interesting data on sex differences in expression across several somatic and reproductive tissues among 4 mice species or subspecies. The focus is on sex-biased expression in the somatic tissues, where the authors report high rates of turnover such that the majority of sex-biased genes are only sex-biased in one or two taxa. The authors show sex-biased genes have higher expression variance than unbiased genes but also provide some evidence that sex-bias is likely to evolve from genes with higher expression variance. The authors find that sex-biased genes (both female- and male-biased) experience more adaptive evolution (i.e., higher alpha values) than unbiased genes. The authors develop a summary statistic (Sex-Bias Index, SBI) of each individual's degree of sex-bias for a given tissue. They show that the distribution of SBI values often overlap considerably for somatic (but not reproductive) tissues and that SBI values are not correlated across tissues, which they interpret as indicating an individual can be relatively "male-like" in one tissue and relatively "female-like" in another tissue.

      This is a good summary of the data, but we are puzzled that it does not include the completely new module analysis and the finding of extremely fast evolution of sex-biased somatic gene expression compared to the gonadal one.

      Though the data are interesting, there are some disappointing aspects to how the authors have chosen to present the work. For example, their criteria for sex-bias requires an expression ratio of one sex to the other of 1.25. A reasonably large fraction of the "sex-biased genes" have ratios just beyond this cut-off (Fig. S1). A gene which has a ratio of 1.27 in taxa 1 can be declared as "sex-biased" but which has a ratio of 1.23 in taxa 2 will not be declared as "sex-biased". It is impossible to know from how the data are presented in the main text the extent to which the supposed very high turnover represents substantial changes in dimorphic expression. A simple plot of the expression sex ratio of taxa 1 vs taxa 2 would be illuminating but the authors declined this suggestion.

      Choosing a cutoff is the standard practice when dealing with continuously distributed data. As we have pointed out, we looked at various cutoff options and decided to use the present one, based on the observed data distributions. Note that some studies have used even lower ones (e.g. 1.1). To visualize the data distribution, we had provided the overall distribution of ratios, because one would have to look at many more plots otherwise. But we have now also added individual plots as Figure 1, Figure supplement 2, as requested. They confirm what is also evident from the overall plots, namely that most ratio changes are larger than the incremental values suggested by the reviewer. Note that the original data are of course also available for inspection.

      I was particularly intrigued by the authors' inference of the proportion of adaptive substitutions ("alpha") in different gene sets. They show alpha is higher for sex-biased than unbiased genes and nicely show that the genes that are unbiased in focal taxa but sex-biased in the sister taxa also have low alpha. It would be even stronger that sex-bias is associated with adaptive evolution to estimate alpha for only those genes that are sex-biased in the focal taxa but not in the sister taxa (the current version estimates alpha on all sex-biased genes within the focal taxa, both those that are sex-biased and those that are unbiased in the sister taxa).

      We have added the respective values in the results section, but since fewer genes are involved, they are less comparable to the other sets of genes. Still, the tendencies remain.

      The author's Sex Bias Index is measured in an individual sample as: SBI = median(TPM of female-biased genes) - median(TPM of male-biased genes). This index has some strange properties when one works through some toy examples (though any summary statistic will have limitations). The authors do little to jointly discuss the merits and limitations of this metric. It would have been interesting to examine their two key points (degree of overlapping distributions between sexes and correlation across tissues) using other individual measures of sex-bias.

      We had responded to this comment before (including the explanation that it has no strange properties when one applies the normalization that is now implemented) and we have added a whole section devoted to the discussion of the merits of the SBI. We do not know which other "individual measures of sex-bias" this should be compared to. Still, we have now added a paragraph in the discussion about using PCA as an alternative to show that this would result in similar conclusions, but is technically less suitable for this purpose.
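
      For readers who want to work through toy examples of the index, the formula as quoted in the reviewer's comment can be written as a short Python sketch. The normalization mentioned in the response is not specified in this thread and is therefore omitted here. The function name, gene names, and values are hypothetical.

```python
from statistics import median


def sex_bias_index(tpm, female_biased, male_biased):
    """SBI for one sample: median TPM of female-biased genes minus
    median TPM of male-biased genes (un-normalized form, as quoted)."""
    female_median = median(tpm[g] for g in female_biased)
    male_median = median(tpm[g] for g in male_biased)
    return female_median - male_median


# Hypothetical expression values (TPM) for one individual and one tissue.
sample = {"f1": 50.0, "f2": 30.0, "f3": 10.0,
          "m1": 5.0, "m2": 20.0, "m3": 40.0}
sbi = sex_bias_index(sample, ["f1", "f2", "f3"], ["m1", "m2", "m3"])
```

      A positive value indicates that the female-biased gene set is expressed more strongly in this sample than the male-biased set, and a negative value indicates the reverse.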

      Figure 5 shows symmetric gaussian-looking distributions of SBI but it makes me wonder to what extent this is the magic of model fitting software as there are only 9 data points underlying each distribution. Whereas Figure 5 shows many broadly overlapping distributions for SBI, Figure 6 seems to suggest the sexes are quite well separated for SBI (e.g., brain in MUS, heart in DOM).

      We use a standard fitting function in R (see above), which tries to fit a normalized distribution, but this function can also add an additional peak when the data are too heterogeneous (e.g. Mammary in Figure 7).

      Fig. S1 should be shown as the log(F/M) ratio so it is easier to see the symmetry, or lack thereof, of female and male-biased genes.

      The log will work differently for values <1, compared to values >1 when used in a single plot. We have now generated combined plots with symmetric values to allow a better comparability.

      It is important to note that for the variance analysis that IQR/median was calculated for each gene within each sex for each tissue. This is a key piece of information that should be in the methods or legend of the main figure (not buried in Supplemental Table 17).

      We have now moved these descriptions into the Methods section.
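
      The variance measure named in the reviewer's comment, IQR/median computed for each gene within each sex for each tissue, can be sketched as follows. This is an illustrative Python version, not the authors' code. The quartile convention (matching R's default) and the handling of a zero median are assumptions, and the data layout is hypothetical.

```python
from statistics import median, quantiles


def iqr_over_median(values):
    """Interquartile range divided by the median for one gene's values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    if med == 0:
        return float("nan")  # the ratio is undefined for a zero median
    return (q3 - q1) / med


def dispersion_table(expression):
    """Apply the measure to every (tissue, sex, gene) group separately."""
    return {key: iqr_over_median(values) for key, values in expression.items()}


# Hypothetical values for one gene in one tissue, nine individuals per sex.
expression = {
    ("heart", "F", "geneA"): [1, 2, 3, 4, 5, 6, 7, 8, 9],
    ("heart", "M", "geneA"): [4, 4, 5, 5, 5, 5, 5, 6, 6],
}
table = dispersion_table(expression)
```

      Because the measure is scaled by the median, it can be compared between genes with very different expression levels.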

    1. eLife Assessment

      This study investigated the role of insulin receptor (IR) and insulin-like growth factor 1 receptor (IGF1R) in the renal glomerular podocytes by characterizing the mice with dual deletion of both receptors in vivo as well as the cultured murine podocytes with induced deletion of both receptors in vitro. The solid data presented in this paper demonstrated the critical requirement of both IR and IGF1R signaling in normal podocyte physiology in mice, albeit a more detailed characterization of the mouse model is desired. Interestingly, long-range sequencing revealed significant retention of introns in mRNAs, due to an altered spliceosome level resulting from the loss of IR and IGF1R signaling in cultured podocytes. This new finding suggests an essential role of IR and IGF1R signaling in regulating RNA metabolism in podocytes, which provides useful information for the understanding of physiology and metabolism of podocytes. However, the underlying molecular mechanism for such a regulation is still unclear and awaits further studies.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the roles of the insulin receptor and the insulin growth factor receptor were investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction and developed proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated podocyte cell lines where both receptors were deleted. Loss of both receptors was highly deleterious with greater than 50% cell death. To elucidate the mechanism, the authors performed global proteomics and find that spliceosome proteins are down-regulated. They confirm this by using long-range sequencing. These results suggest a novel role for these pathways in podocytes.

      This is primarily a descriptive study. The mechanism of how insulin and IGF1 signaling are linked to the spliceosome is not addressed and the phenotype of the mice is only superficially explored. The main issues are that the completeness of the mouse KO is never assessed nor is the completeness of the KO in cell lines. The absence of this data is a significant weakness. The mouse experiments would be improved if the serum creatinines were measured to provide some idea about the severity of the kidney injury. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful. If this didn't rescue the phenotype, an explanation in the text would suffice. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on. Lastly, in the cell line experiments, the authors should discuss the caveats associated with studying the 50% of the cells that survive vs the ones that died.

      Significance:

      With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney, therefore, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism, the major limitations are the lack of information regarding the completeness of the KO's. If, for example, they can determine that in the mice, the KO is complete, that the GFR is relatively normal, then the phenotype they describe is relatively mild.

      Comments on revision plan:

      I agree with the suggested experiments especially, the experiments to examine whether insulin/IGF1 signaling have effects on splicing proteins. An alternative experiment would be to ask whether rescue of IR or IGF1R would ameliorate the splicing effects.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mice. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared with wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The four figures are generally well-designed, with bars/superimposed dot-plots.

      Methods are generally well described. It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity.

      Specific comments:

      (1) Data are presented as mean/SEM. In general, mean/SD or median/IQR are preferred to allow the reader to evaluate the spread of the data. There may be exceptions where only SEM is reasonable.

      (2) It would be useful for the reader to be told the number of over-lapping genes (with similar expression between mouse groups) and the results of a statistical test comparing WT and KO mice. The overlap of intron retention events between experimental repeats was about 30% in both knock-out podocytes. This seems low and I am curious to know whether this is typical for this method; a reference could be helpful.

      (3) Please explain "adjusted p value of 0.01." It is not clear how it was adjusted. The number of differentially-expressed proteins between the two cell types was 4842.

      Comments on revision plan:

      The authors suggest additional experiments that should address my concerns and probably the other reviewers' concerns.

      I encourage the authors to proceed with their proposed experiments and revisions.

    4. Reviewer #3 (Public review):

      Summary:

      These investigators have previously shown important roles for either insulin receptor (IR) or insulin-like growth factor receptor (IGF1R) in glomerular podocyte function. They now have studied mice with deletion of both receptors and find significant podocyte dysfunction. They then made a podocyte cell line with inducible deletion of both receptors and find abnormalities in transcriptional efficiency with decreased expression of spliceosome proteins and increased transcripts with impaired splicing or premature termination.

      The studies appear to be performed well and the manuscript is clearly written.

      There are a number of potential issues and questions with these studies.

      (1) For the in vivo studies, the only information given is for mice at 24 weeks of age. There needs to be a full time course of when the albuminuria was first seen and the rate of development. Also, GFR was not measured. Since the podocin-Cre utilized was not inducible, there should be a determination of whether there was a developmental defect in glomeruli or podocytes. Were there any differences in either prenatal or postnatal development or number of glomeruli?

      (2) Although the in vitro studies are of interest, there are no studies to determine if this is the underlying mechanism for the in vivo abnormalities seen in the mice. Cultured podocytes may not necessarily reflect what is occurring in podocytes in vivo.

      (3) Given that both receptors are deleted in the podocyte cell line, it is not clear if the spliceosome defect requires deletion of both receptors or if there is redundancy in the effect. The studies need to be repeated in podocyte cell lines with either IR or IGFR single deletions.

      (4) There are no studies investigating signaling mechanisms mediating the spliceosome abnormalities.

      Comments on revision plan:

      I do not have any changes from my prior review. I applaud the authors for developing a plan to address the questions and concerns raised in my prior review.

    5. Author response:

      Evidence, reproducibility and clarity

      Reviewer 1:

      In this manuscript, the role of the insulin receptor and the insulin growth factor receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction and developed proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated podocyte cell lines where both receptors were deleted. Loss of both receptors was highly deleterious with greater than 50% cell death. To elucidate the mechanism, the authors performed global proteomics and find that spliceosome proteins are downregulated. They confirm this by using long-range sequencing. These results suggest a novel role for these pathways in podocytes.

      Thank you

      This is primarily a descriptive study and no technical concerns are raised. The mechanism of how insulin and IGF1 signaling are linked to the spliceosome is not addressed.

      We do not think the paper is descriptive as we used non-biased phospho and total proteomics in the DKO cells to uncover the alterations in the spliceosome (that have not been previously described) that were detrimental. However, we are happy to look further into the underlying mechanism.

      We would propose:

      (1) Stimulating/inhibiting insulin/IGF signalling pathways in the wild-type and DKO cells and checking expression levels and/or phosphorylation status of splice factors (including those in Figure 3E) and those revealed by phospho-proteomic data; a variety of inhibitors of insulin/IGF1 pathways could also be used along the pathways that are shown in Fig 2.

      (2) Looking at the RNA-seq data bioinformatically in more detail: the introns/exons that move up or down are targets of the splice factors involved. Most splice factor binding sequences are known, so it should be possible to ask bioinformatically which splice factor binding sites are seen most frequently in the sequences around the splice sites of the exons and introns that move in the DKO. To uncover splice factors/RNA-binding proteins (RBPs) that are involved in insulin signaling, we will use a software tool named MATT, which was specifically designed to look for RNA-binding motifs (PMID 30010778). In brief, using the long-read sequencing data, we will test 250 nt sequences flanking the splice sites of all regulated splicing events (intronic and exonic) against all RNA-binding proteins in the CISBP-RNA database (PMID 23846655) using MATT. This will result in a list of RBPs potentially involved in insulin signaling. We will validate these by activating insulin signaling (similar to Figures 2 B,C) and probing whether the RBPs are activated (e.g. phosphorylated or changed in expression), or we will manipulate expression of the candidate RBPs and measure how they affect insulin signaling.

      (3) Examining the phospho and total proteomic data for IGF1R and Insulin receptor knockout alone podocytes (which we have already generated) and analysing these in more detail and include this data set to elucidate the relative importance of both receptors to spliceosome function.
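
      The idea behind proposal (2), scanning the sequence around regulated splice sites for known binding motifs, can be illustrated with a deliberately simple Python sketch. It does not reproduce MATT or its statistics and does not use the CISBP-RNA database. The function names, the motif, and the sequences are hypothetical, and taking 250 nt on each side of the site is an assumption about how the flanks are defined.

```python
def flanking_window(sequence, site, width=250):
    """Return up to `width` nt on each side of a splice-site position."""
    start = max(0, site - width)
    end = min(len(sequence), site + width)
    return sequence[start:end]


def count_motif(window, motif):
    """Count occurrences of `motif` in `window`, allowing overlaps."""
    size = len(motif)
    return sum(1 for i in range(len(window) - size + 1)
               if window[i:i + size] == motif)


def fraction_with_motif(windows, motif):
    """Fraction of windows that contain the motif at least once."""
    return sum(1 for w in windows if motif in w) / len(windows)


# Hypothetical windows around regulated and unregulated (background) events.
regulated = ["TTGCATGAA", "CCGCATGTT", "AAAAAAAAA"]
background = ["CCCCCCCCC", "TTGCATGAA", "GGGGGGGGG", "ACACACACA"]
motif = "GCATG"  # hypothetical motif, for illustration only

regulated_fraction = fraction_with_motif(regulated, motif)
background_fraction = fraction_with_motif(background, motif)
```

      A motif that occurs in a clearly higher fraction of regulated windows than background windows would be a candidate for follow-up. A real analysis would use position weight matrices and an enrichment test rather than exact string matches.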

      The phenotype of the mouse is only superficially addressed. The main issues are that the completeness of the mouse KO is never assessed nor is the completeness of the KO in cell lines. The absence of this data is a significant weakness.

      We apologise for not making this clear, but we did assess the level of receptor knockdown in the animal and cell models. The in vivo model showed variable and incomplete levels of insulin receptor and IGF1 receptor podocyte knockdown (shown in supplementary figure 1B). This is why we made the in vitro floxed podocyte cell lines, in which we could robustly knock down both the insulin receptor and IGF1 receptor (shown in Figure 2A).

      The mouse experiments would be improved if the serum creatinines were measured to provide some idea how severe the kidney injury is.

      We can address this:

      We have further urinary albumin:creatinine ratio (uACR) data at 12, 16 and 20 weeks. We also have more blood tests of renal function that can be added. There is variability in creatinine levels, which is not uncommon in transgenic mouse models (probably partly due to variability in receptor knockdown with the Cre-lox system). This is part of the rationale for developing the robust double receptor knockout cell models, in which we knocked out both receptors by >80%.

      An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful. If this didn't work, an explanation in the text would suffice.

      We would consider overexpressing SF3B4 in the wild-type and DKO cells and assessing the effects on the spliceosome if deemed necessary. However, we think it is unlikely to rescue the phenotype, as so many other spliceosome components are downregulated in the DKO cells.

      As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      We have some detail on this and can add it to the manuscript. However, it is not extensive, as this was not a major driver of this work.

      Lastly, the authors should caveat the cell experiments by discussing the ramifications of studying the 50% of the cells that survive vs the ones that died.

      Thank you, we appreciate this. This was the rationale behind studying the cells after 2 days of differentiation, before significant cell loss, in order to avoid the issue of studying only the 50% of cells that survive.

      Reviewer 2:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin and IGFR1 mice. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared wild-type controls (18%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      Specific comments.

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The six figures are generally well-designed, bars/superimposed dot-plots.

      Thank you

      Evaluation.

      Methods are generally well described. It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity.

      We did this and will add this information to the methods/figure legend.

      Specific comments.

      (1) Data are presented as mean/SEM. In general, mean/SD or median/IQR are preferred to allow the reader to evaluate the spread of the data. There may be exceptions where only SEM is reasonable.

      Graphs can be changed to SD rather than SEM.
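
      For reference, the two summaries are related by SEM = SD / sqrt(n): SD describes the spread of the observations, whereas SEM shrinks as the sample size grows. The following is a minimal Python sketch of the summaries mentioned above; the function name and the example values are hypothetical and are not data from the manuscript.

```python
import math
import statistics


def summarize(values):
    """Return mean, SD, SEM, median and IQR bounds for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample standard deviation (n - 1)
    sem = sd / math.sqrt(n)              # standard error of the mean
    # Quartiles with the default 'exclusive' method of statistics.quantiles
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"n": n, "mean": mean, "sd": sd, "sem": sem,
            "median": median, "q1": q1, "q3": q3}


# Hypothetical measurements, for illustration only
example_values = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(example_values)
print(summary)
```

      Because SEM is always smaller than SD for n > 1, error bars drawn as SEM can make groups look more tightly clustered than the underlying observations are.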

      (2) It would be useful for the reader to be told the number of overlapping genes (with similar expression between mouse groups) and the results of a statistical test comparing WT and KO mice. The overlap of intron retention events between experimental repeats was about 30% in both knock-out podocytes. This seems low and I am curious to know whether this is typical for this method; a reference could be helpful.

      This is an excellent question. We had 30% overlap because the parameters used for analysis were very stringent. We suspect we could obtain more than 30% overlap with less stringent parameters, under which the events would still be considered similar, and can do so if requested. Our methods were based on FLAIR analysis (PMID: 32188845).

      (3) Please explain "adjusted p value of 0.01." It is not clear how was it adjusted. The number of differentially-expressed proteins between the two cell types was 4842.

      We used the Benjamini-Hochberg method to adjust our data. We think the reviewer is referring to the transcriptomic data and not the proteomic data.
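
      For readers unfamiliar with the adjustment, the Benjamini-Hochberg procedure ranks the m raw p-values, scales the p-value at rank k by m/k, and then enforces monotonicity from the largest rank downwards. The following is a minimal Python sketch of that procedure, not the analysis code used for the manuscript; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the adjusted value of any larger raw p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
# Features passing an adjusted p-value threshold of 0.01
significant = [i for i, p in enumerate(adj_p) if p < 0.01]
print(adj_p, significant)
```

      An "adjusted p value of 0.01" in this sense is a threshold on the adjusted values, which controls the expected proportion of false discoveries among the features called significant.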

      Minor comments

      Page numbers in the text would help the reviewer communicate more effectively with the author.

      We will do this

      Reviewer 3:

      These investigators have previously shown important roles for either insulin receptor (IR) or insulin-like growth factor receptor (IGF1R) in glomerular podocyte function. They now have studied mice with deletion of both receptors and find significant podocyte dysfunction. They then made a podocyte cell line with inducible deletion of both receptors and find abnormalities in transcriptional efficiency with decreased expression of spliceosome proteins and increased transcripts with impaired splicing or premature termination.

      The studies appear to be performed well and the manuscript is clearly written.

      Thank you

      Referees cross-commenting

      I am in agreement with Reviewer 1 that the studies are overly descriptive and do not provide sufficient mechanism and the lack of more investigation of the in vivo model is a significant weakness.

      Please see our responses to reviewer 1 above.

      Significance

      Reviewer 1:

      With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism, the major limitations are the lack of information regarding the completeness of the KO's. If, for example, they can determine that in the mice, the KO is complete, that the GFR is relatively normal, then the phenotype they describe is relatively mild.

      Thank you. The receptor KO in the mice is unlikely to be complete (please see comments above and Supplementary Figure 1b). There are many examples of KO models targeting other tissues showing that complete KO of these receptors seems difficult to achieve, particularly for the IGF1 receptor. In the brain (which also consists of terminally differentiated cells; PMID: 28595357), barely 50% IGF1R knockdown was achieved in the target cells. In ovarian granulosa cells (PMID: 28407051), several tissue-specific drivers were tried but could not achieve better than 80%. The paper states that 10% of IGF1R is sufficient for function in these cells, so the authors conclude that their knockdown animals are probably still responding to IGF1. Finally, in our recent IGF1R podocyte knockdown model we found that Cre levels were important for excision of a single floxed gene (PMID: 38706850); hence we were not surprised that trying to excise two floxed genes (insulin receptor and IGF1 receptor) was challenging. This is the rationale for making the double receptor knockout cell lines to understand the process and biology in more detail.

      Reviewer 2:

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The figures are generally well-designed, bars/superimposed dot-plots.

      Evaluation.

      Methods are generally well described. It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity.

      Thank you, we will do this.

      Reviewer 3:

      There are a number of potential issues and questions with these studies.

      (1) For the in vivo studies, the only information given is for mice at 24 weeks of age. There needs to be a full time course of when the albuminuria was first seen and the rate of development. Also, GFR was not measured. Since the podocin-Cre utilized was not inducible, there should be a determination of whether there was a developmental defect in glomeruli or podocytes. Were there any differences in either prenatal or postnatal development or number of glomeruli?

      Thank you, we will add further phenotyping data. We do not think there was a major developmental phenotype, as albuminuria did not become significantly different until several months of age. We could have used a doxycycline-inducible model, but we know its excision efficiency is much lower than that of the podocin-Cre driven model (Supplementary Figure 1). This would likely give a very mild (if any) phenotype and not reveal the biology adequately.

      (2) Although the in vitro studies are of interest, there are no studies to determine if this is the underlying mechanism for the in vivo abnormalities seen in the mice. Cultured podocytes may not necessarily reflect what is occurring in podocytes in vivo.

      Thank you for this. We are happy to employ immunohistochemistry (IHC) and immunofluorescence (IF) using spliceosome antibodies on tissue sections from DKO and control mice to examine spliceosome changes. However, as the DKO results in podocyte loss, there may not be many DKO podocytes still present in the tissue sections. This will be taken into consideration.

      (3) Given that both receptors are deleted in the podocyte cell line, it is not clear if the spliceosome defect requires deletion of both receptors or if there is redundancy in the effect. The studies need to be repeated in podocyte cell lines with either IR or IGFR single deletions.

      Thank you. We have full total and phospho-proteomic data sets from single insulin receptor and IGF1 receptor knockout cell lines that we will investigate for this point.

      (4) There are no studies investigating signaling mechanisms mediating the spliceosome abnormalities.

      Thank you. As outlined above in our response to reviewer 1, point 1, we are very happy to investigate insulin/IGF signalling pathways in more detail.

    1. eLife Assessment

      This paper performs a valuable critical reassessment of anatomical and functional data, proposing a reclassification of the mouse visual cortex in which almost all the higher visual areas are consolidated into a single area V2. However, the evidence supporting this unification is incomplete, as the fundamental assumptions of the model conflict with key experimental observations. This study will likely be of interest to neuroscientists focused on the mouse visual cortex and the evolution of cortical organization.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors argue that defining higher visual areas (HVAs) based on reversals of retinotopic tuning has led to an over-parcellation of secondary visual cortices. Using retinotopic models, they propose that the HVAs are more parsimoniously mapped as a single area V2, which encircles V1 and exhibits complex retinotopy. They reanalyze functional data to argue that functional differences between HVAs can be explained by retinotopic coverage. Finally, they compare the classification of mouse visual cortex to that of other species to argue that our current classification is inconsistent with those used in other model species.

      Strengths:

      This manuscript is bold and thought-provoking, and is a must-read for mouse visual neuroscientists. The authors take a strong stance on combining all HVAs, with the possible exception of area POR, into a single V2 region. Although I suspect many in the field will find that their proposal goes too far, many will agree that we need to closely examine the assumptions of previous classifications to derive a more accurate areal map. The authors' supporting analyses are clear and bolster their argument. Finally, they make a compelling argument for why the classification is not just semantic, but has ramifications for the design of experiments and analysis of data.

      Weaknesses:

      Although I enjoyed the polemic nature of the manuscript, there are a few issues that weaken their argument.

      (1) Although the authors make a compelling argument that retinotopic reversals are insufficient to define distinct regions, they are less clear about what would constitute convincing evidence for distinct visual regions. They mention that a distinct area V3 has been (correctly) defined in ferrets based on "cytoarchitecture, anatomy, and functional properties", but elsewhere argue that none of these factors are sufficient to parcellate any of the HVAs in mouse cortex, despite some striking differences between HVAs in each of these factors. It would be helpful to clearly define a set of criteria that could be used for classifying distinct regions.

      (2) On a related note, although the authors carry out impressive analyses to show that differences in functional properties between HVAs could be explained by retinotopy, they glossed over some contrary evidence that there are functional differences independent of retinotopy. For example, axon projections to different HVAs originating from a single V1 injection - presumably including neurons with similar retinotopy - exhibit distinct functional properties (Glickfeld LL et al, Nat Neuro, 2013). As another example, interdigitated M2+/M2- patches in V1 show very different HVA connectivity and response properties, again independent of V1 location/retinotopy (Meier AM et al., bioRxiv). One consideration is that the secondary regions might be considered a single V2 with distinct functional modules based on retinotopy and connectivity (e.g., V2LM, V2PM, etc).

      (3) Some of the HVAs-such as AL, AM, and LI-appear to have redundant retinotopic coverage with other HVAs, such as LM and PM. Moreover, these regions have typically been found to have higher "hierarchy scores" based on connectivity (Harris JA et al., Nature, 2019; D'Souza RD et al., Nat Comm, 2022), though unfortunately, the hierarchy levels are not completely consistent between studies. Based on existing evidence, there is a reasonable argument to be made for a hybrid classification, in which some regions (e.g., LM, P, PM, and RL) are combined into a single V2 (though see point #2 above) while other HVAs are maintained as independent visual regions, distinct from V2. I don't expect the authors to revise their viewpoint in any way, but a more nuanced discussion of alternative classifications is warranted.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Rowley and Sedigh-Sarvestani presents modeling data suggesting that map reversals in mouse lateral extrastriate visual cortex do not coincide with areal borders, but instead represent borders between subregions within a single area V2. The authors propose that such an organization explains the partial coverage in higher-order areas reported by Zhuang et al., (2017). The scheme revisits an organization proposed by Kaas et al., (1989), who interpreted the multiple projection patches traced from V1 in the squirrel lateral extrastriate cortex as subregions within a single area V2. Kaas et al's interpretation was challenged by Wang and Burkhalter (2007), who used a combination of topographic mapping of V1 connections and receptive field recordings in mice. Their findings supported a different partitioning scheme in which each projection patch mapped a specific topographic location within single areas, each containing a complete representation of the visual field. The area map of mouse visual cortex by Wang and Burkhalter (2007) has been reproduced by hundreds of studies and has been widely accepted as ground truth (CCF) (Wang et al., 2020) of the layout of rodent cortex. In the meantime, topographic mappings in marmoset and tree shrew visual cortex made a strong case for map reversals in lateral extrastriate cortex, which represent borders between functionally diverse subregions within a single area V2. These findings from non-rodent species raised doubts about whether during evolution, different mammalian branches have developed diverse partitioning schemes of the cerebral cortex. Rowley and Sedigh-Sarvestani favor a single master plan in which, across evolution, all mammalian species have used a similar blueprint for subdividing the cortex.

      Strengths:

      The story illustrates the enduring strength of science in search of definitive answers.

      Weaknesses:

      To me, it remains an open question whether Rowley and Sedigh-Sarvestani have written the final chapter of the saga. A key reason for my reservation is that the maps used in their model are cherry-picked. The article disregards published complementary maps, which show that the entire visual field is represented in multiple areas (i.e. LM, AL) of lateral extrastriate cortex and that the map reversal between LM and AL coincides precisely with the transition in m2AChR expression and cytoarchitecture (Wang and Burkhalter, 2007; Wang et al., 2011). Evidence from experiments in rats supports the gist of the findings in the mouse visual cortex (Coogan and Burkhalter, 1993).

      (1) The selective use of published evidence, such as the complete visual field representation in higher visual areas of lateral extrastriate cortex (Wang and Burkhalter, 2007; Wang et al., 2011) makes the report more of an opinion piece than an original research article that systematically analyzes the area map of mouse visual cortex we have proposed. No direct evidence is presented for a single area V2 with functionally distinct subregions.

      (2) The article misrepresents evidence by commenting that m2AChR expression is mainly associated with the lower field. This is counter to published findings showing that m2AChR spans across the entire visual field (Gamanut et al., 2018; Meier et al., 2021). The utility of markers for delineating areal boundaries is discounted, without any evidence, in disregard of evidence for distinct areal patterns in early development (Wang et al., 2011). Pointing out that markers can be distributed non-uniformly within an area is well-familiar. m2AChR is non-uniformly expressed in mouse V1, LM and LI (Ji et al., 2015; D'Souza et al., 2019; Meier et al., 2021). Recently, it has been found that the patchy organization within V1 plays a role in the organization of thalamocortical and intracortical networks (Meier et al., 2025). m2AChR-positive patches and m2AChR-negative interpatches organize the functionally distinct ventral and dorsal networks, notably without obvious bias for upper and lower parts of the visual field.

      (3) The study has adopted an area partitioning scheme, which is said to be based on anatomically defined boundaries of V2 (Zhuang et al., 2017). The only anatomical borders used by Zhuang et al. (2017) are those of V1 and barrel cortex, identified by cytochrome oxidase staining. In reality, the partitioning of the visual cortex was based on field sign maps, which are reproduced from Zhuang et al., (2017) in Figure 1A. It is unclear why the maps shown in Figures 2E and 2F differ from those in Figure 1A. It is possible that this is an oversight. But maintaining consistent areal boundaries across experimental conditions that are referenced to the underlying brain structure is critical for assigning modeled projections to areas or sub-regions. This problem is evident in Figure 2F, which is presented as evidence that the modeling approach recapitulates the tracings shown in Figure 3 of Wang and Burkhalter (2007). The dissimilarities between the modeling and tracing results are striking, unlike what is stated in the legend of Figure 2F.

      (4) Rowley and Sedigh-Sarvestani find that the partial coverage of the visual field in higher order areas shown by Zhuang et al (2017) is recreated by the model. It is important to caution that Zhuang et al's (2017) maps were derived from incomplete mappings of the visual field, which was confined to -25-35 deg of elevation. This underestimates the coverage we have found in LM and AL. Receptive field mappings show that LM covers 0-90 deg of azimuth and -30-80 elevation (Wang and Burkhalter, 2007). AL covers at least 0-90 deg of azimuth and -30-50 deg of elevation (Wang and Burkhalter, 2007; Wang et al., 2011). These are important differences. Partial coverage in LM and AL underestimates the size of these areas and may map two projection patches as inputs to subregions of a single area rather than inputs to two separate areas. Complete, or nearly complete, visual representations in LM and AL support that each is a single area. Importantly, both areas are included in a callosal-free zone (Wang and Burkhalter, 2007). The surrounding callosal connections align with the vertical meridian representation. The single map reversal is marked by a transition in m2AChR expression and cytoarchitecture (Wang et al., 2011).

      (5) The statement that the "lack of visual field overlap across areas is suggestive of a lack of hierarchical processing" is predicated on the full acceptance of the mappings by Zhuang et al (2017). Based on the evidence reviewed above, the reclassification of visual areas proposed in Figure 1C seems premature.

      (6) The existence of lateral connections is not unique to rodent cortex and has been described in primates (Felleman and Van Essen, 1991).

      (7) Why the mouse and rat extrastriate visual cortex differ from those of many other mammals is unclear. One reason may be that mammals with V2 subregions are strongly binocular.

    4. Reviewer #3 (Public review):

      Summary:

      The authors review published literature and propose that a visual cortical region in the mouse that is widely considered to contain multiple visual areas should be considered a single visual area.

      Strengths:

      The authors point out that relatively new data showing reversals of visual-field sign within known, single visual areas of some species require that a visual field sign change by itself should not be considered evidence for a border between visual areas.

      Weaknesses:

      The existing data are not consistent with the authors' proposal to consolidate multiple mouse areas into a single "V2". This is because the existing definition of a single area is that it cannot have redundant representations of the visual field. The authors ignore this requirement, as well as the data and definitions found in published manuscripts, and make an inaccurate claim that "higher order visual areas in the mouse do not have overlapping representations of the visual field". For quantification of the extent of overlap of representations between 11 mouse visual areas, see Figure 6G of Garrett et al. 2014. [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014). Topography and areal organization of mouse visual cortex. The Journal of neuroscience 34, 12587-12600. 10.1523/JNEUROSCI.1124-14.2014.]

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors argue that defining higher visual areas (HVAs) based on reversals of retinotopic tuning has led to an over-parcellation of secondary visual cortices. Using retinotopic models, they propose that the HVAs are more parsimoniously mapped as a single area V2, which encircles V1 and exhibits complex retinotopy. They reanalyze functional data to argue that functional differences between HVAs can be explained by retinotopic coverage. Finally, they compare the classification of mouse visual cortex to that of other species to argue that our current classification is inconsistent with those used in other model species.

      Strengths:

      This manuscript is bold and thought-provoking, and is a must-read for mouse visual neuroscientists. The authors take a strong stance on combining all HVAs, with the possible exception of area POR, into a single V2 region. Although I suspect many in the field will find that their proposal goes too far, many will agree that we need to closely examine the assumptions of previous classifications to derive a more accurate areal map. The authors' supporting analyses are clear and bolster their argument. Finally, they make a compelling argument for why the classification is not just semantic, but has ramifications for the design of experiments and analysis of data.

      Weaknesses:

      Although I enjoyed the polemic nature of the manuscript, there are a few issues that weaken their argument.

      (1) Although the authors make a compelling argument that retinotopic reversals are insufficient to define distinct regions, they are less clear about what would constitute convincing evidence for distinct visual regions. They mention that a distinct area V3 has been (correctly) defined in ferrets based on "cytoarchitecture, anatomy, and functional properties", but elsewhere argue that none of these factors are sufficient to parcellate any of the HVAs in mouse cortex, despite some striking differences between HVAs in each of these factors. It would be helpful to clearly define a set of criteria that could be used for classifying distinct regions.

      We agree the revised manuscript would benefit from a clear discussion of updated rules of area delineation in the mouse. In brief, we argue that retinotopy alone should not be used to delineate area boundaries in mice, or any other species. Although there is some evidence for functional property, architecture, and connectivity changes across mouse HVAs, area boundaries continue to be defined primarily, and sometimes solely (Garrett et al., 2014; Juavinett et al., 2018; Zhuang et al., 2017), based on retinotopy. We acknowledge that earlier work (Wang and Burkhalter, 2007; Wang et al., 2011) did consider cytoarchitecture and connectivity alongside retinotopy, but more recent work has shifted to a focus on retinotopy as indicated by the currently accepted criterion for area delineation.  

      As reviewer #2 points out, the present criteria for mouse visual area delineation can be found in the Methods section of: [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014)].

      Criterion 1: Each area must contain the same visual field sign at all locations within the area.

      Criterion 2: Each visual area cannot have a redundant representation of visual space.

      Criterion 3: Adjacent areas of the same visual field sign must have a redundant representation.

      Criterion 4: An area's location must be consistently identifiable across experiments.

      As discussed in the manuscript, recent evidence in higher order visual cortex of tree shrews and rats led us to question the universality of these criteria across species. Specifically, tree shrew V2, macaque V2, and marmoset DM exhibit reversals in visual field sign within what are defined as single visual areas. This suggests that Criterion 1 should be updated. It also suggests that Criteria 2 and 3 should be updated, because visual field sign reversals often co-occur with retinotopic redundancies: reversing course in the direction of progression along the visual field can easily lead to coverage of visual field regions already traveled.
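
      To make the notion of a field-sign reversal concrete, visual field sign is commonly computed as the sine of the angle between the cortical gradients of the azimuth and elevation maps, so that mirror-image and non-mirror-image representations take opposite signs. The following is a minimal Python sketch on a toy map; the function names, the grid, and the sign convention are illustrative assumptions and do not reproduce the analysis pipeline of any cited study.

```python
import math


def gradient_at(grid, r, c):
    """Central-difference gradient (d/dx, d/dy) at an interior pixel."""
    dx = (grid[r][c + 1] - grid[r][c - 1]) / 2.0
    dy = (grid[r + 1][c] - grid[r - 1][c]) / 2.0
    return dx, dy


def field_sign_at(azimuth, elevation, r, c):
    """Sine of the angle from the azimuth gradient to the elevation gradient."""
    ax, ay = gradient_at(azimuth, r, c)
    ex, ey = gradient_at(elevation, r, c)
    angle = math.atan2(ey, ex) - math.atan2(ay, ax)
    return math.sin(angle)


# Toy maps (hypothetical): azimuth rises and then falls along the columns,
# i.e. the map reverses course, while elevation rises steadily along the rows.
azimuth = [[0, 1, 2, 3, 2, 1, 0] for _ in range(3)]
elevation = [[row] * 7 for row in range(3)]

sign_before = field_sign_at(azimuth, elevation, 1, 1)  # before the reversal
sign_after = field_sign_at(azimuth, elevation, 1, 5)   # after the reversal
print(sign_before, sign_after)
```

      In this toy map the sign flips across the column where azimuth reverses, and the azimuth values on the far side repeat those already covered, which illustrates why reversals and redundant coverage tend to appear together.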

      More broadly, we argue that topography is just one of several criteria that should be considered in area delineation. We understand that few visual areas in any species meet all criteria, but we emphasize that topography cannot consistently be the sole satisfied criterion, as it currently appears to be for many mouse HVAs. Inspired by a recent perspective on cortical area delineation (Petersen et al., 2024), we suggest the following rules, which will be worked into the revised version of the manuscript. Topography is a criterion, but it comes after considerations of function, architectonics and connectivity.

      (1) Function—Cortical areas differ from neighboring areas in their functional properties  

      (2) Architectonics—Cortical areas often exhibit distinctions from neighboring areas in multiple cyto- and myeloarchitectonic markers

      (3) Connectivity—Cortical areas are characterized by a specific set of connectional inputs and outputs from and to other areas

      (4) Topography—Cortical areas often exhibit a distinct topography that balances maximal coverage of the sensory field with minimal redundancy of coverage within an area.

      As we discuss in the manuscript, although there are functional, architectonic, and connectivity differences across mouse HVAs, they typically vary smoothly across multiple areas, such that neighboring areas share the same properties and there are no sharp borders. For instance, sharp borders in cytoarchitecture are generally lacking in the mouse HVAs. A notable exception to this is the clear and sharp change in m2AChR expression that occurs between LM and AL (Wang et al., 2011).

      (2) On a related note, although the authors carry out impressive analyses to show that differences in functional properties between HVAs could be explained by retinotopy, they glossed over some contrary evidence that there are functional differences independent of retinotopy. For example, axon projections to different HVAs originating from a single V1 injection - presumably including neurons with similar retinotopy - exhibit distinct functional properties (Glickfeld LL et al, Nat Neuro, 2013). As another example, interdigitated M2+/M2- patches in V1 show very different HVA connectivity and response properties, again independent of V1 location/retinotopy (Meier AM et al., bioRxiv). One consideration is that the secondary regions might be considered a single V2 with distinct functional modules based on retinotopy and connectivity (e.g., V2LM, V2PM, etc).

      Thank you for the correction. We will revise the text to discuss (Glickfeld et al., 2013), as it remains some of the strongest evidence in favor of retinotopy-independent functional specialization of mouse HVAs. However, one caveat of this study is the size of the V1 injection that is the source of axons studied in the HVAs. As apparent in Figure 1B, the large injection covers nearly a quarter of V1. It is worth noting that (Han et al., 2018) found, using single-cell reconstructions and MAPseq, that the majority of V1 neurons project to multiple nearby HVA targets. In this experiment the tracing does not suffer from the problem of spreading over V1’s retinotopic map, and it suggests that presumably retinotopically matched locations in each area receive shared inputs from the V1 population rather than from a distinct but spatially interspersed subset. In fact, the authors conclude “Interestingly, the location of the cell body within V1 was predictive of projection target for some recipient areas (Extended Data Fig. 8). Given the retinotopic organization of V1, this suggests that visual information from different parts of visual field may be preferentially distributed to specific target areas, which is consistent with recent findings (Zhuang et al., 2017)”. Given an injection covering a large portion of the retinotopic map, and the fact that feed-forward projections from V1 to HVAs carry coarse retinotopy, it is difficult to prove that functional specializations noted in the HVA axons are retinotopy-independent. This would require measurement of receptive field location in the axonal boutons, which the authors did not perform (possibly because the SNR of calcium indicators prevented such measurements at the time).

      Another option would be to show that adjacent neurons in V1 that project to far-apart HVAs exhibit distinct functional properties on par with the differences exhibited by neurons in very different parts of V1 due to retinotopy. In other words, the functional specificity of V1 inputs to HVAs at retinotopically identical locations would need to be of the same order as the specificity that might be gained from retinotopic biases. To our knowledge, such a study has not been conducted, so we have decided to collect the data in collaboration with the Allen Institute. As part of the Allen Institute’s pioneering OpenScope project, we will make careful two-photon and electrophysiology measurements of functional properties, including receptive field location, SF, and TF in different parts of the V1 retinotopic map. Pairing this data with existing Allen Institute datasets on functional properties of neurons in the HVAs will allow us to rule in, or rule out, our hypotheses regarding retinotopy as the source of functional specialization in mouse HVAs. We will update the discussion in the revised manuscript to better reflect the need for additional evidence to support or refute our proposal.

      Meier AM et al., bioRxiv 2025 (Meier et al., 2025) was published after our submission, but we are thankful to the reviewers for guiding our attention to this timely paper. Given the recent findings on the influence of locomotion on rodent and primate visual cortex, it is very exciting to see clearly specialized circuits for processing self-generated visual motion in V1. However, it is difficult to rule out the role of retinotopy, as the HVAs (LM, AL, RL) participating in the M2+ network, which is less responsive to self-generated visual motion, exhibit a bias for the medial portion of the visual field, and the HVA (PM) involved in the M2- network, which is responsive to self-generated visual motion, exhibits a bias for the lateral (or peripheral) parts of the visual field. For instance, a peripheral bias in area PM has been shown using retrograde tracing as in Figure 6 of (Morimoto et al., 2021), single-cell anterograde tracing as in Extended Data Figure 8 of (Han et al., 2018), and functional imaging studies (Zhuang et al., 2017). Recent findings in the marmoset also point to visual circuits in the peripheral, but not central, visual field being significantly modulated by self-generated movements (Rowley et al., 2024).

      However, a visual field bias in area PM, which selectively receives M2- inputs, is at odds with the clear presence of modular M2+/M2- patches across the entire map of V1 (Ji et al., 2015). One possibility supported by existing data is that neurons in M2- patches, as well as those in M2+ patches, in the central representation of V1 make fewer or significantly weaker connections with area PM compared to areas LM, AL and RL. Evidence to the contrary would support retinotopy-independent and functionally specialized inputs from V1 to HVAs.

      (3) Some of the HVAs, such as AL, AM, and LI, appear to have redundant retinotopic coverage with other HVAs, such as LM and PM. Moreover, these regions have typically been found to have higher "hierarchy scores" based on connectivity (Harris JA et al., Nature, 2019; D'Souza RD et al., Nat Comm, 2022), though unfortunately, the hierarchy levels are not completely consistent between studies. Based on existing evidence, there is a reasonable argument to be made for a hybrid classification, in which some regions (e.g., LM, P, PM, and RL) are combined into a single V2 (though see point #2 above) while other HVAs are maintained as independent visual regions, distinct from V2. I don't expect the authors to revise their viewpoint in any way, but a more nuanced discussion of alternative classifications is warranted.

      We understand that such a proposal, which would combine a subset of areas with matched field sign (LM, P, PM, and RL), would be less extreme and better received by the community. This would create a V2 with a smooth map without reversals or significant redundant retinotopic coverage. However, the intuition we have built from our modeling studies suggests that both these areas and the other smaller areas with negative field sign (AL, AM, LI) are a byproduct of a complex single map of the visual field that exhibits reversals as it contorts around the triangular and tear-shaped boundaries of V1. In other words, we believe the redundant coverage and field-sign changes/reversals are a byproduct of a single secondary visual field in V2 constrained by the cortical dimensions of V1. That being said, we understand that area delineations are in part based on a consensus by the community. Therefore, we will continue to discuss our proposal with community members, and we will incorporate new evidence supporting or refuting our hypothesis before we submit our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study by Rowley and Sedigh-Sarvestani presents modeling data suggesting that map reversals in mouse lateral extrastriate visual cortex do not coincide with areal borders, but instead represent borders between subregions within a single area V2. The authors propose that such an organization explains the partial coverage in higher-order areas reported by Zhuang et al. (2017). The scheme revisits an organization proposed by Kaas et al. (1989), who interpreted the multiple projection patches traced from V1 in the squirrel lateral extrastriate cortex as subregions within a single area V2. Kaas et al.'s interpretation was challenged by Wang and Burkhalter (2007), who used a combination of topographic mapping of V1 connections and receptive field recordings in mice. Their findings supported a different partitioning scheme in which each projection patch mapped a specific topographic location within single areas, each containing a complete representation of the visual field. The area map of mouse visual cortex by Wang and Burkhalter (2007) has been reproduced by hundreds of studies and has been widely accepted as ground truth (CCF) (Wang et al., 2020) of the layout of rodent cortex. In the meantime, topographic mappings in marmoset and tree shrew visual cortex made a strong case for map reversals in lateral extrastriate cortex, which represent borders between functionally diverse subregions within a single area V2. These findings from non-rodent species raised doubts about whether, during evolution, different mammalian branches have developed diverse partitioning schemes of the cerebral cortex. Rowley and Sedigh-Sarvestani favor a single master plan in which, across evolution, all mammalian species have used a similar blueprint for subdividing the cortex.

      Strengths:

      The story illustrates the enduring strength of science in search of definitive answers.

      Weaknesses:

      To me, it remains an open question whether Rowley and Sedigh-Sarvestani have written the final chapter of the saga. A key reason for my reservation is that the maps used in their model are cherry-picked. The article disregards published complementary maps, which show that the entire visual field is represented in multiple areas (i.e. LM, AL) of lateral extrastriate cortex and that the map reversal between LM and AL coincides precisely with the transition in m2AChR expression and cytoarchitecture (Wang and Burkhalter, 2007; Wang et al., 2011). Evidence from experiments in rats supports the gist of the findings in the mouse visual cortex (Coogan and Burkhalter, 1993).

      We would not claim to have written the final chapter of the saga. Our goal was to add an important piece of new evidence to the discussion of area delineations across species. We believe this new evidence supports our unification hypothesis.  We also believe that there are several missing pieces of data that could support or refute our hypothesis. We have begun a collaboration to collect some of this data.  

      (1) The selective use of published evidence, such as the complete visual field representation in higher visual areas of lateral extrastriate cortex (Wang and Burkhalter, 2007; Wang et al., 2011) makes the report more of an opinion piece than an original research article that systematically analyzes the area map of mouse visual cortex we have proposed. No direct evidence is presented for a single area V2 with functionally distinct subregions.

      This brings up a nuanced issue regarding visual field coverage. Figure 6 of Wang & Burkhalter (2007) shows the receptive fields of sample neurons in area LM that cover the full range between 0 and 90 degrees of azimuth and -40 to 80 degrees of elevation, which essentially matches the visual field coverage in V1. However, we do not know whether these neurons are representative of most neurons in area LM. In other words, while these single-cell recordings along selected contours in cortex show the span of the visual field coverage, they may not capture crucial information about its shape, missing regions of the visual field, or potential bias. To mitigate this, visual field maps measured with electrophysiology are commonly produced by even sampling across the two dimensions of the visual area, either by moving a single electrode along a grid pattern (e.g. (Manger et al., 2002)), or by using a grid-like multi-electrode probe (e.g. (Yu et al., 2020)). This was not carried out in either Wang & Burkhalter 2007 or Wang et al. 2011. Even sampling of cortical space is time consuming and difficult with electrophysiology, but efficient with functional imaging. Therefore, despite the likely underestimation of visual field coverage, imaging techniques are valuable in that they can efficiently reveal not only the span of the visual field of a cortical region, but also its shape and bias.

      Multiple functional imaging studies that simultaneously measure visual field coverage in V1 and HVAs report a bias in the coverage of HVAs, relative to that in V1 (Garrett et al., 2014; Juavinett et al., 2018; Zhuang et al., 2017). While functional imaging will likely underestimate receptive fields compared to electrophysiology, the consistent observation of an orderly bias for distinct parts of the visual field across the HVAs suggests that at least some of the HVAs do not have full and uniform coverage of the visual field comparable to that in V1. For instance, (Garrett et al., 2014) show that the total coverage in HVAs, when compared to V1, is typically less than half (Figure 6D) and often irregularly shaped.

      Careful measurements of single-cell receptive fields, using mesoscopic two-photon imaging across the HVAs would settle this question. As reviewer #1 points out, this is technically feasible, though no dataset of this kind exists to our knowledge.

      (2) The article misrepresents evidence by commenting that m2AChR expression is mainly associated with the lower field. This is counter to published findings showing that m2AChR spans across the entire visual field (Gamanut et al., 2018; Meier et al., 2021). The utility of markers for delineating areal boundaries is discounted, without any evidence, in disregard of evidence for distinct areal patterns in early development (Wang et al., 2011). Pointing out that markers can be distributed non-uniformly within an area is well-familiar. m2AChR is non-uniformly expressed in mouse V1, LM and LI (Ji et al., 2015; D'Souza et al., 2019; Meier et al., 2021). Recently, it has been found that the patchy organization within V1 plays a role in the organization of thalamocortical and intracortical networks (Meier et al., 2025). m2AChR-positive patches and m2AChR-negative interpatches organize the functionally distinct ventral and dorsal networks, notably without obvious bias for upper and lower parts of the visual field.

      We wrote that “Future work showed boundaries in labeling of histological markers such as SMI-32 and m2ChR labeling, but such changes mostly delineated area LM/AL (Wang et al., 2011) and seemed to be correlated with the representation of the lower visual field.” The latter statement regarding the representation of the lower visual field directly references the data in Figure 1 of (Wang et al., 2011), which is titled “Figure 1: LM/AL border identified by the transition of m2AChR expression coincides with receptive field recordings from lower visual field.” Similar to Wang et al., we were simply referring to the fact that the LM/AL border coincides with both a change in m2AChR expression and the lower visual field representation.

      (3) The study has adopted an area partitioning scheme, which is said to be based on anatomically defined boundaries of V2 (Zhuang et al., 2017). The only anatomical borders used by Zhuang et al. (2017) are those of V1 and barrel cortex, identified by cytochrome oxidase staining. In reality, the partitioning of the visual cortex was based on field sign maps, which are reproduced from Zhuang et al., (2017) in Figure 1A. It is unclear why the maps shown in Figures 2E and 2F differ from those in Figure 1A. It is possible that this is an oversight. But maintaining consistent areal boundaries across experimental conditions that are referenced to the underlying brain structure is critical for assigning modeled projections to areas or sub-regions. This problem is evident in Figure 2F, which is presented as evidence that the modeling approach recapitulates the tracings shown in Figure 3 of Wang and Burkhalter (2007). The dissimilarities between the modeling and tracing results are striking, unlike what is stated in the legend of Figure 2F.

      Thanks for this correction. By “anatomical boundaries of higher visual cortex”, we meant the cortical boundary between V1 and higher order visual areas on one end, and the outer edge of the envelope that defines the functional boundaries of the HVAs in cortical space (Zhuang et al., 2017). The reviewer is correct that we should have referred to these as functional boundaries. The word ‘anatomical’ was meant to refer to cortical space, rather than visual field space.

      More generally though, there is no disagreement between the partitioning of visual cortex in Figures 1 and 2. Rather, the partitioning in Figure 1 is taken directly from Zhuang et al. (2017), whereas that in Figure 2 is produced by mathematical model simulation. As such, one would not expect identical areal boundaries between Figure 2 and Figure 1. What we aimed to communicate with our modeling results is that a single area can exhibit multiple visual field reversals and retinotopic redundancies if it is constrained to fit around V1 and cover a visual field approximately matched to the visual field coverage in V1. We defined this area explicitly as a single area with a single visual field (boundaries shown in Figure 2A). So the point of our simulation is to show that even an explicitly defined single area can appear as multiple areas if it is constrained by the shape of mouse V1, and if visual field reversals are used to indicate areal boundaries. As in most models, different initial conditions and parameters produce a complex visual field which will appear as multiple HVAs when delineated by areal boundaries. What is consistent, however, is the existence of a complex single visual field that appears as multiple HVAs with partially overlapping coverage.

      Similarly, we would not expect a simple model to exactly reproduce the multi-color tracer injections in Wang and Burkhalter (2007). However, we find it quite compelling that the model can produce multiple groups of multi-colored axonal projections beyond V1 that can appear as multiple areas each with their own map of the visual field using current criteria, when the model is explicitly designed to map a single visual field. We will explain the results of the model, and their implications, better in the revised manuscript.

      (4) Rowley and Sedigh-Sarvestani find that the partial coverage of the visual field in higher order areas shown by Zhuang et al. (2017) is recreated by the model. It is important to caution that Zhuang et al.'s (2017) maps were derived from incomplete mappings of the visual field, which was confined to -25 to 35 deg of elevation. This underestimates the coverage we have found in LM and AL. Receptive field mappings show that LM covers 0 to 90 deg of azimuth and -30 to 80 deg of elevation (Wang and Burkhalter, 2007). AL covers at least 0 to 90 deg of azimuth and -30 to 50 deg of elevation (Wang and Burkhalter, 2007; Wang et al., 2011). These are important differences. Partial coverage in LM and AL underestimates the size of these areas and may map two projection patches as inputs to subregions of a single area rather than inputs to two separate areas. Complete, or nearly complete, visual representations in LM and AL support that each is a single area. Importantly, both areas are included in a callosal-free zone (Wang and Burkhalter, 2007). The surrounding callosal connections align with the vertical meridian representation. The single map reversal is marked by a transition in m2AChR expression and cytoarchitecture (Wang et al., 2011).

      This is a good point. We do not expect that expanding the coverage of V1 will change the results of the model significantly. However, for the revised manuscript, we will update V1 coverage to be accurate, repeat our simulations, and report the results.  

      (5) The statement that the "lack of visual field overlap across areas is suggestive of a lack of hierarchical processing" is predicated on the full acceptance of the mappings by Zhuang et al (2017). Based on the evidence reviewed above, the reclassification of visual areas proposed in Figure 1C seems premature.

      The reviewer is correct. In the revised manuscript, we will be careful to distinguish bias in visual field coverage across areas from presence or lack of visual field overlap.  

      (6) The existence of lateral connections is not unique to rodent cortex and has been described in primates (Felleman and Van Essen, 1991).

      (7) Why the mouse and rat extrastriate visual cortex differ from those of many other mammals is unclear. One reason may be that mammals with V2 subregions are strongly binocular.

      This is an interesting suggestion, and careful visual topography data from rabbits and other lateral eyed animals would help to evaluate it. For what it’s worth, tree shrews are lateral eyed animals with only 50 degrees of binocular visual field and also show V2 subregions.

      Reviewer #3 (Public review):

      Summary:

      The authors review published literature and propose that a visual cortical region in the mouse that is widely considered to contain multiple visual areas should be considered a single visual area.

      Strengths:

      The authors point out that relatively new data showing reversals of visual-field sign within known, single visual areas of some species require that a visual field sign change by itself should not be considered evidence for a border between visual areas.

      Weaknesses:

      The existing data are not consistent with the authors' proposal to consolidate multiple mouse areas into a single "V2". This is because the existing definition of a single area is that it cannot have redundant representations of the visual field. The authors ignore this requirement, as well as the data and definitions found in published manuscripts, and make an inaccurate claim that "higher order visual areas in the mouse do not have overlapping representations of the visual field". For quantification of the extent of overlap of representations between 11 mouse visual areas, see Figure 6G of Garrett et al. 2014. [Garrett, M.E., Nauhaus, I., Marshel, J.H., and Callaway, E.M. (2014). Topography and areal organization of mouse visual cortex. The Journal of Neuroscience 34, 12587-12600. 10.1523/JNEUROSCI.1124-14.2014.]

      Thank you for this correction; we admit we should have chosen our words more carefully. In the revised manuscript, we will emphasize that higher order visual areas in the mouse do have some overlap in their representations but also exhibit bias in their coverage. This is consistent with our proposal, and in fact our model simulations in Figure 2E also show overlapping representations along with differential bias in coverage. However, we also note that Figure 6 of Garrett et al. 2014 provides several pieces of evidence in support of our proposal that higher order areas are sub-regions of a single area V2. First, the visual field coverage of each area is significantly less than that in V1 (Garrett et al. 2014, Figure 6D). While the imaging methods used in Garrett et al. likely underestimate receptive fields, one would assume they would similarly impact measurements of coverage in V1 and HVAs. Second, each area exhibits a bias towards a different part of the visual field (Figure 6C and E). This bias is distinct for different areas but proceeds in a retinotopic manner around V1, with adjacent areas exhibiting biases for nearby regions of the visual field (Figure 6E). Thus, the biases in the visual field coverage across HVAs appear to be related and not independent of each other. As we show in our modeling and in Figure 2, such orderly and inter-related biases can be created from a single visual field constrained to share a border with mouse V1.

      With regards to the existing definition of a single area: we did not ignore the requirement that single areas cannot have redundant representations of the visual field. Rather, we believe that this requirement should be relaxed considering new evidence collected from other species, where multiple visual field reversals exist within the same visual area. We understand this issue is nuanced and was not made clear in the original submission.  

      In the revised manuscript, we will clarify that visual field reversals often exhibit redundant retinotopic representation on either side of the reversal. We will also clarify that our argument that multiple reversals can exist within a single visual area in the mouse is an argument that some retinotopic redundancy can exist within single visual areas. Such a re-classification would align how we define visual areas in mice with the existing classification in tree shrews, ferrets, cats, and primates, all of which have secondary visual areas with complex retinotopic maps exhibiting multiple reversals and redundant retinotopic coverage.

    1. eLife Assessment

      In their important manuscript, Gangadharan, Kober and Rice focus on how Stu2/XMAP215-family microtubule polymerases use their TOG domains to catalytically promote microtubule growth, testing whether their mechanism follows an enzyme-like kinetic model similar to that of actin polymerases. The authors integrate measurements including microtubule polymerization rates and TOG-tubulin binding kinetics to convincingly show that Stu2 follows an enzyme-like model where tight tubulin binding enables efficient polymerization, revealing a shared mechanism with actin polymerases despite their evolutionary divergence. This work will be of general interest to the cell biology and biophysics communities.

    2. Reviewer #1 (Public review):

      This study by Gangadharan and colleagues provides significant progress towards a quantitative biochemical mechanism for Stu2 polymerase activity. A key conceptual advance is the novel application of an enzyme-like model, initially developed for the actin polymerase Ena/VASP, to Stu2.

      New refined affinity measurements for a Stu2 TOG domain using Bio-layer interferometry show more than an order of magnitude higher affinity of TOG domains to tubulin compared to previously published reports.

      The findings reinforce the "concentrating reactants" or, more specifically, for TOG-domain proteins, the "tubulin-shuttling antenna" model, compared to the "polarized unfurling" model, a more speculative structural hypothesis.

      The manuscript builds upon a series of previous manuscripts that showcase the profound intellectual engagement with microtubule polymerization mechanisms by TOG-domain proteins from the Rice lab, a thought leader in microtubule polymerization for over a decade.

      Minor remarks:

      (1) A major new experimental finding of this paper is the affinity of TOG domains for tubulin: the measured dissociation constant (10 nM) is more than an order of magnitude lower than in previous measurements from the same lab (~200 nM). The authors attribute this change to ionic strength differences between buffer conditions, citing the lab's previous work (Ayaz et al., 2014). This argument left me contemplating what the buffer conditions are in both experiments, and I wonder if other readers would feel the same. After going down the rabbit hole, I believe the difference in ionic strength is ~2.3 fold, and at least on the back of my envelope, this works out beautifully with the measured differences in affinities. A short version of this argument may strengthen the manuscript.

      (2) I am wondering if there may be an alternative explanation to tubulin binding by TOG being the kinetically rate-limiting step for polymerase function:

      TOG + Tubulin ⇌ TOG:Tubulin (fast binding rate, high-affinity binding)
      TOG:Tubulin + MT_end → TOG:MT (tubulin is incorporated into MT, fast transfer rate)
      The binding rate is 3/s, and the transfer rate is 5/s.

      I was wondering if the following step should be considered, which involves a conformational change of tubulin (e.g., straightening) TOG:MT → TOG + MT (rate-limiting straightening and unbinding of TOG from the lattice).

      Presumably, the affinity of TOGs for straight tubulin is practically zero for the purpose of this discussion, as there is no lattice binding, which means unbinding is likely very rapid; however, straightening may be the rate-limiting factor here.

      In theory, straightening should also be rapid; however, we lack measurements of how fast or slow this step occurs within the context of a TOG domain, which presumably skews the process towards curved tubulin.

      A hypothetical Stu2, when bound to the microtubule end and with the TOG domain not disengaged from tubulin, would not permit the processivity of that molecule or the binding of a new molecule.
      To emphasize the importance of unbinding, when it is not efficient, as reported for the T238 mutant that results in Stu2 lattice binding (Geyer et al., 2018), the polymerase becomes inefficient.
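
      The kinetic argument in point (2) can be illustrated with a short calculation. The sketch below is only a minimal illustration, assuming that the steps are irreversible and occur strictly in series, so that the mean cycle time is the sum of the mean dwell times of the individual steps. The 3/s and 5/s values are those quoted above; the release rates and the function name `sequential_rate` are hypothetical placeholders, since no measurement of the straightening and unbinding step is reported here.

```python
def sequential_rate(*step_rates):
    """Effective turnover rate (per second) for irreversible steps in series.

    Assumes each step has a mean dwell time of 1/k, so the mean cycle time
    is the sum of the dwell times and the turnover rate is its reciprocal.
    """
    if not step_rates or any(k <= 0 for k in step_rates):
        raise ValueError("all step rates must be positive")
    return 1.0 / sum(1.0 / k for k in step_rates)


K_BIND = 3.0      # per second, binding rate quoted in the review
K_TRANSFER = 5.0  # per second, transfer rate quoted in the review

# Two-step scheme: binding followed by transfer.
two_step = sequential_rate(K_BIND, K_TRANSFER)
print(f"binding + transfer only: {two_step:.3f} per second")

# Hypothetical third step (straightening and unbinding from the lattice).
# These release rates are illustrative assumptions, not measured values.
for k_release in (100.0, 5.0, 1.0):
    three_step = sequential_rate(K_BIND, K_TRANSFER, k_release)
    print(f"with release at {k_release:5.1f}/s: {three_step:.3f} per second")
```

      Under these assumptions, a release step that is much faster than binding and transfer barely changes the overall rate, whereas a release step that is slower than both would dominate it, which is the scenario raised above.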

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript from the Rice lab by Gangadharan et al. investigates the polymerization mechanism of the yeast microtubule polymerase Stu2. The lab has published a number of articles demonstrating the structural basis by which the two TOG domains of Stu2 each bind free tubulin heterodimers, and has developed a tethered polymerization model by which the TOG domains drive polymerization by shuttling those tubulin subunits onto the microtubule plus end. A second model was proposed by Nithianantham et al. (eLife, 2018) based on a closed-to-open transitional state in which Stu2 unfurls and loads two longitudinally associated tubulin heterodimers onto the microtubule plus end. While the second model is not directly tested, the current work aims to further characterize the tethered polymerization model using an enzyme-like kinetic framework developed by Breitsprecher et al. (EMBO J., 2011) for Ena/VASP actin polymerization activity. The general architecture and function of Ena/VASP in actin polymerization and of Stu2 in microtubule polymerization make this a reasonable comparison, which hits upon, as the authors note, potential convergent mechanistic evolution across distinct cytoskeletal networks. The model effectively treats tubulin as the substrate and the polymerized microtubule plus end as the product. If Stu2 is "enzymatic" in this framework, the model predicts that it would behave with Michaelis-Menten kinetics, that there would be a Vmax, and that polymerase activity would be "affinity limited" by TOG:tubulin affinity (KD) and/or "kinetically limited" by TOG:tubulin association (Kon) and transfer of tubulin to the microtubule plus end (Kt). The authors find that the Breitsprecher model works well for Stu2 activity, and that Stu2 best aligns with a "kinetically limited" model. The work is interesting and adds to the growing elucidation of the Stu2 microtubule polymerase model. While yeast microtubule polymerases are somewhat distinct in their architecture, there is sufficient overlap that findings from the manuscript can be used to inform the mechanisms of larger, more complex microtubule polymerases such as human ch-TOG.
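
      For readers less familiar with the framework, the saturating behaviour implied by Michaelis-Menten kinetics can be sketched in a few lines. This is only a generic illustration of the textbook rate law v = Vmax·[S]/(Km + [S]), with tubulin treated as the substrate; it is not the authors' fitted model, and the values of `V_MAX` and `K_M_UM` below are arbitrary assumptions chosen for illustration rather than numbers from the manuscript.

```python
def michaelis_menten_rate(substrate_uM, v_max, k_m_uM):
    """Textbook Michaelis-Menten rate: v = v_max * [S] / (K_m + [S])."""
    if substrate_uM < 0 or k_m_uM <= 0:
        raise ValueError("substrate must be >= 0 and K_m must be > 0")
    return v_max * substrate_uM / (k_m_uM + substrate_uM)


# Illustrative placeholder parameters (assumptions, not measured values).
V_MAX = 10.0   # maximal rate, arbitrary units
K_M_UM = 0.5   # substrate concentration giving half-maximal rate, in uM

concentrations_uM = (0.1, 0.5, 2.0, 50.0)
rates = [michaelis_menten_rate(c, V_MAX, K_M_UM) for c in concentrations_uM]

for conc, rate in zip(concentrations_uM, rates):
    print(f"[tubulin] = {conc:5.1f} uM -> rate = {rate:.2f}")
```

      The rate rises roughly linearly at low substrate concentration, reaches half of `V_MAX` when the concentration equals `K_M_UM`, and approaches but never exceeds `V_MAX` at saturating concentrations.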

      Strengths:

      The manuscript invokes the enzymatic model of Breitsprecher et al. used for Ena/VASP and conducts an elegant series of (mostly established) experiments to determine whether Stu2 microtubule polymerase activity aligns with the model, which they conclude does align, supported by the data/results obtained.

      Weaknesses:

      The authors used biolayer interferometry to measure TOG:tubulin affinity. The affinities obtained were significantly higher than the lab obtained in an earlier publication using analytical ultracentrifugation. While differences in buffer and salt conditions may underlie these differences, additional runs using comparable buffer systems, or the use of a third independent assay to measure affinities, would have added rigor.

      The discussion could be expanded to better compare and contrast the results with both existing polymerase models introduced in the introduction, as well as expanded to look at reversible enzymatic activity (microtubule depolymerization at low to zero tubulin concentrations) and microtubule plus versus minus end activity.

    4. Reviewer #3 (Public review):

      Summary:

      This study by Gangadharan and colleagues seeks to establish a quantitative biochemical model for the microtubule polymerase activity of Stu2. Stu2 is the budding yeast member of the XMAP215 protein family, which is broadly conserved across eukaryotes. XMAP215 proteins play a wide variety of important roles in cells, and these are attributed to effects on microtubule dynamics. Many studies over the last ~20 years have shown that XMAP215 proteins selectively associate with microtubule ends, where they increase rates of microtubule assembly and disassembly. More recently, structural biology and biochemical studies by the authors and other groups have shown that the multiple TOG domains on XMAP215 proteins are tubulin-binding domains that selectively bind to curved tubulin, which is present in solution and at microtubule ends, but not to straight tubulin, which is present in the walls of the microtubule lattice. This has led to the general model that XMAP215 proteins promote polymerization by delivering soluble tubulin to the growing plus end, and two distinct models have been proposed to explain the mechanism. The 'concentrating reactants' model proposed previously by the authors suggests that TOG domains grab hold of tubulin in solution and concentrate it at the microtubule end. The 'polarized unfurling' model proposed by the Al Bassam lab suggests that XMAP215 delivers multiple tubulins to the end, using a step-wise mechanism involving different roles for each TOG domain. The current study seeks to improve our understanding of the mechanism by developing a quantitative model to explain the binding and release of tubulins, the number of Stu2 molecules at the end, and the overall rate of tubulin addition. The authors accomplish this goal using new experimental data. The final model fills in new details of the mechanism. The authors draw a comparison between Stu2 and the actin polymerase Ena/VASP, to which it bears similarity, and suggest a convergent strategy for cytoskeletal polymerases.

      Strengths:

      This is a focused and clearly written study that incorporates prior knowledge of XMAP215 and draws inspiration from the actin field. The data are clear and convincing, and the study accomplishes its goal of generating a new, quantitative model for Stu2. The model will be important for microtubule researchers to predict and test key points for altering XMAP215 activity across different organisms and potentially for different tubulin substrates. The comparison to Ena/VASP may also inspire similar comparisons across other microtubule and actin regulators, which could lead to new insights across the cytoskeletal fields.

      Weaknesses:

      The study is without major weaknesses, but there are several minor weaknesses worth noting. One is that the final model provides new details regarding the Stu2 mechanism, but does not provide a major new advance in our understanding of how the polymerase works. For example, the discussion does not clearly argue for whether the new results and model rule out either of the prior models. This appears consistent with the 'concentrating reactants' model, but does it clearly rule out the 'polarized unfurling' model? A second minor weakness is that the comparison to Ena/VASP is not developed at a deep level based on the final model. I found these ideas exciting and want more critical consideration here, but perhaps it is better suited for a commentary piece to follow.

    1. eLife Assessment

      This valuable study provides new insights into the movement of ions through the bacterial pump KdpFABC, which regulates intracellular potassium concentration, by solving a 2.1 Å cryo-EM structure of the nanodisc-embedded active wild-type protein, and carrying out mutagenesis and activity assays. Although the structural data and analysis are solid, additional information about other structural classes identified in the EM data, as well as a discussion of relevant work done by others, would further strengthen these findings. The description of the activity assays is currently incomplete because more information is required to rigorously assess these experiments. This work will be of interest to the membrane transporter and channel communities and to microbiologists interested in osmoregulation and potassium homeostasis.

    2. Reviewer #1 (Public review):

      Summary:

      This study on potassium ion transport by the protein complex KdpFABC from E. coli reveals a 2.1 Å cryo-EM structure of the nanodisc-embedded transporter under turnover conditions. The results confirm that K+ ions pass through a previously identified tunnel that connects the channel-like subunit with the P-type ATPase-type subunit.

      Strengths:

      The excellent resolution of the structure and the thorough analysis of mutants using ATPase and ion transport measurements help to strengthen new and previous interpretations. The evidence supporting the conclusions is solid, including biochemical assays and analysis of mutants. The work will be of interest to the membrane transporter and channel communities and to microbiologists interested in osmoregulation and potassium homeostasis.

      Weaknesses:

      There is insufficient credit and citation of previous work.

    3. Reviewer #2 (Public review):

      Summary:

      The paper describes the high-resolution structure of KdpFABC, a bacterial pump regulating intracellular potassium concentrations. The pump consists of a subunit with an overall structure similar to that of a canonical potassium channel and a subunit with a structure similar to a canonical ATP-driven ion pump. The ions enter through the channel subunit and then traverse the subunit interface via a long channel that lies parallel to the membrane to enter the pump, followed by their release into the cytoplasm.

      Strengths:

      The work builds on the previous structural and mechanistic studies from the authors' and other labs. While the overall architecture and mechanism have already been established, a detailed understanding was lacking. The study provides a 2.1 Å resolution structure of the E1-P state of the transport cycle, which precedes the transition to the E2 state, assumed to be the rate-limiting step. It clearly shows a single K+ ion in the selectivity filter of the channel and in the canonical ion binding site in the pump, resolving how ions bind to these key regions of the transporter. It also resolves the details of water molecules filling the tunnel that connects the subunits, suggesting that K+ ions move through the tunnel transiently without occupying well-defined binding sites. The authors further propose how the ions are released into the cytoplasm in the E2 state. The authors support the structural findings through mutagenesis and measurements of ATPase activity and ion transport by solid-supported membrane (SSM) electrophysiology.

      Weaknesses:

      While the results are overall compelling, several aspects of the work raised questions. First, the authors determined the structure of the pump in nanodiscs under turnover conditions and observed several structural classes, including E1-P, which is detailed in the paper. Two other structural classes were identified, including one corresponding to E2. It is unclear why they are not described in the paper. Notably, the paper considers in some detail what might occur during the E1-P to E2 state transition, but does not describe the 3.1 Å resolution map for the E2 state that has already been obtained. Does the map support the proposed structural changes?

      The paper relies on the quantitative activity comparisons between mutants measured using SSM electrophysiology. Such comparisons are notoriously tricky due to variability between SSM chips and reconstitution efficiencies. The authors should include raw traces for all experiments in the supplementary materials, explain how the replicates were performed, and describe the reproducibility of the results. Related to this point above, size exclusion chromatography profiles and reconstitution efficiencies for mutants should be shown to facilitate comparison between measured activities. For example, could it be that the inactive V496R mutant is misfolded and unstable?

      Similarly, are the reduced activities of V496W and V496H (and many other mutants) due to changes in the tunnel or poor biochemical properties of these variants? Without these data, the validity of the ion transport measurements is difficult to assess.

      The authors propose that the tunnel connecting the subunits is filled with water and lacks potassium ions. This is an important mechanistic point that has been debated in the field. It would be interesting to calculate the volume of the tunnel and estimate the number of ions that might be expected in it, given their concentration in bulk. It may also be helpful to provide additional discussion on whether some of the observed densities correspond to bound ions with low occupancy.
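
      As a rough illustration of the estimate suggested above, the expected number of ions in a cavity at bulk concentration is N = c × N_A × V. The sketch below is not part of the review or the manuscript. The tunnel volume and K+ concentration are hypothetical placeholders, chosen only to show the order of magnitude such a calculation would give.

```python
# Expected number of ions in a cavity at bulk concentration: N = c * N_A * V.
AVOGADRO = 6.02214076e23            # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27   # 1 Å^3 = 1e-30 m^3 = 1e-27 L


def expected_ions(volume_A3, conc_mM):
    """Mean number of ions expected in volume_A3 (Å^3) at conc_mM (mmol/L)."""
    conc_molar = conc_mM * 1e-3
    return conc_molar * AVOGADRO * volume_A3 * LITERS_PER_CUBIC_ANGSTROM


# Hypothetical inputs (NOT taken from the manuscript): a 1000 Å^3 tunnel
# and a 150 mM bulk K+ concentration.
n = expected_ions(volume_A3=1000.0, conc_mM=150.0)
print(f"{n:.3f} ions expected on average")
```

      With inputs of this magnitude the expected occupancy is well below one ion, which is why an estimate from the measured tunnel volume would help in judging whether the absence of ion density is surprising.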

    4. Reviewer #3 (Public review):

      Summary:

      By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wild-type protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal revolves around the proposed binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway.

      Strengths:

      Although this structure is not so different from previous structures, its high resolution (2.1 Å) is impressive and allows the resolution of many new densities in the potassium transport pathway. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal.

      Weaknesses:

      The structures are supported by solid-supported membrane electrophysiology. These data exhibit some weaknesses, including a lack of information to assess the rigor and reproducibility (i.e., the number of replicates, the number of sensors used, controls to assess proteoliposome reconstitution efficiency, and the stability of proteoliposome adsorption to the sensor).

    1. eLife Assessment

      This important study reports that two distinct waves of ovarian follicles contribute to oocyte production in mice. The paper provides large amounts of data that will benefit future studies, although the methods and analysis are considered incomplete at present. Justification for the criteria of wave 1 follicles would benefit from further explanation and discussion. This work will be of interest to ovarian biologists and physicians working on female infertility.

    2. Reviewer #1 (Public review):

      Multiple waves of follicles have been proven to exist in multiple species, and different waves of follicles contribute differently to oogenesis and fertility. This work characterizes the wave 1 follicles in mouse comprehensively and compares different waves of follicles regarding their cellular and molecular features. Elegant mouse genetics methods are applied to provide lineage tracing of the wave 1 folliculogenesis, together with sophisticated microscopic imaging and analyses. Single-cell RNA-seq is further applied to profile the molecular features of cells in mouse ovaries from week 2 until week 6. While extensive details about the wave 1 follicles, especially the atresia process, are provided, the authors also identified another group of follicles located in the medullary-cortical boundary, which could also be labeled by the FoxL2-mediated lineage tracing method. The "boundary" or "wave 1.5" follicles are proposed by the authors to be the earliest wave 2 follicles, which contribute to the early fertility of pubertal mice, instead of the wave 1 follicles, which undergo atresia with very few oocytes generated. The wave 1 follicle atresia, which degrades most oocytes, on the other hand, expands the number of theca cells and generates the interstitial gland cells in the medulla, where the wave 1 follicles are located. These gland cells likely contribute to the generation of androgen and estrogen, which shape oogenesis and animal development. By comparing scRNA-seq data from cells collected from week 2 until week 6 ovaries, the authors profiled the changes in numbers of different cells and identified key genes that differ between wave 1 and wave 2 follicles, which could potentially be another driver of different waves of folliculogenesis.

      In summary, the authors provide a large number of high-quality new results that illustrate the molecular and cellular features of different waves of mouse follicles, which could be further reused by other researchers in related fields. The findings related to the boundary follicles could potentially bring many new findings related to oogenesis.

      This paper is overall well-written with solid and intriguing conclusions that are well supported. The reviewer only has some minor comments for the authors' consideration that could potentially help with the readability of the paper.

      (1) The authors identify the wave 1.5 follicles at the medullary-cortex boundary, which begin to develop shortly after 2 weeks. Since the authors already collected scRNA-seq data from week 2 until week 6, could any special gene expression patterns be identified that make wave 1.5 follicle cells different from wave 1 and wave 2?

      (2) Are Figures 1C and 1E Z projections from multiple IF slices? If so, please provide representative IF slice(s) from Figures 1C and 1E and clearly label wave 1 and wave 2 follicles to better illustrate how the wave 1 follicles are clarified and quantified.

      (3) For Figure 3D, please also provide an image showing the whole ovary section, like in Figures 3A and 3C, to better illustrate the localization and abundance of different cells.

      (4) In Figure 4H, expressions of HSD3B1 and PLIN1 seem to be detected in almost all medulla cells. Does this mean all medulla cells gain gland cell features? Or there is only a subset of the medulla cells that are actively expressing these 2 proteins. Please provide image(s) with higher magnification to show more clearly how the expression of these 2 proteins differs among different cells.

      (5) Figure 5: The authors discussed cell number changes for different types of cells from week 2 to week 6. A table, or some plots, visualizing numbers of different cell types, instead of just providing original clusters in Dataset S6, at different time points, would make the changes easier to observe.

      (6) Figure S7: It would be more helpful to directly show the number of wave 1 follicles.

      (7) Did the fluorescence cryosection staining (Line 587 - 595) use the same buffers as in the whole-mount staining (Line 575 - 586)? Please clarify.

      (8) In Line 618, what tissue samples were collected? Please point out clearly.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores an important question concerning the developmental trajectory of wave 1 ovarian follicles, leveraging valuable tools such as lineage tracing and single-cell RNA sequencing. These approaches position the authors well to dissect early follicle dynamics. The study would benefit from more in-depth analysis, including quantification using the lineage-traced ovaries, and comparison of wave 1 and 2 follicular cells per stage within the single cell dataset.

      Strengths:

      This study aims to address an important question regarding the developmental trajectories of wave 1 ovarian follicles and how they differ from wave 2 follicles that contribute to long-term fertility. This is an important topic, as many studies on ovarian follicle development rely on samples collected at perinatal timepoints in the mouse, which primarily represent wave 1 follicles, to infer later fertility. The research group has the tools and expertise necessary to tackle these questions.

      Weaknesses:

      Wave 1 follicles are quantified based on the criteria of oocytes larger than 20 µm located within the medullary region, using whole-mount staining. However, the boundary between the medulla and cortex appears somewhat arbitrary. Quantification using FOXL2-lineage-traced ovaries provides a more reliable method for identifying wave 1 follicles. As the developmental trajectory of wave 1 follicles has been well described in Zhang et al. 2013, it would be valuable to provide a more detailed quantification of both labeled and unlabeled follicles by specific follicle stages. In fact, in Zhang et al. 2013, the authors demonstrated that lineage-labeled primordial follicles can be found at the cortex-medulla boundary, suggesting that the observation of labeled "border follicles" is not unexpected. Quantification by follicle stage would provide greater insight into the timing and development of these follicles.

      Similarly, the analysis of wave 1 follicle loss should be performed on lineage-traced ovaries using cell death markers to demonstrate the loss of oocytes and granulosa cells, while confirming the preservation of theca and interstitial cells. In particular, granulosa cell loss should be assessed directly with cell death markers in lineage-traced ovaries, rather than from the loss of tamoxifen-labeled cells, as labeling efficiency varies between follicles (Figure 2G).

      Single-cell RNA sequencing presents a valuable dataset capturing the development of first-wave follicles. The use of a 40µm cell strainer during cell collection for the 10x platform may explain the exclusion of larger oocytes. However, it is still surprising that no oocytes were captured at all. The central question, how wave 1 follicular cells differ from wave 2 cells, should be investigated in more depth, with results validated on FOXL2-lineage-traced ovaries (i.e., Wnt4 staining in wave 1 antral follicles versus wave 2 using lineage-traced ovaries). This analysis should span all stages of follicle development. It also appears to be a missed opportunity that the single-cell sequencing analysis was not performed on lineage-traced ovaries, which would have enabled more definitive identification of wave 1-derived cells.

      Finally, this study does not directly assess fertility outcomes and should therefore refrain from drawing conclusions about the fertility potential of wave 1 follicles.

    1. eLife Assessment

      This study presents an important computational framework, FLiSimBA (Fluorescence Lifetime Simulation for Biological Applications), for modeling experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM). FLiSimBA is readily available in MATLAB and Python, enables users to simulate effects of noise and varying sensor expression levels, and provides practical guidance for both lifetime imaging experiments and biosensor development. The analyses are robust, and the evidence supporting the tool's utility in distinguishing between multiple lifetime signals is compelling, indicating strong potential for multiplexed dynamic imaging. However, users should also consider that the tool's effectiveness depends on the suitability of a two-component discrete exponential model.

    2. Reviewer #1 (Public review):

      In this study, Ma et al. aimed to determine previously uncharacterized contributions of tissue autofluorescence, detector afterpulse, and background noise on fluorescence lifetime measurement interpretations. They introduce a computational framework they named "Fluorescence Lifetime Simulation for Biological Applications (FLiSimBA)" to model experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM) and determine parameters for achieving multiplexed imaging of dynamic biosensors using lifetime and intensity. By quantitatively defining sensor photon effects on signal to noise in either fitting or averaging methods of determining lifetime, the authors contradict any claims of FLIM sensor expression insensitivity to fluorescence lifetime and highlight how these artifacts occur differently depending on analysis method. Finally, the authors quantify how statistically meaningful experiments using multiplexed imaging could be achieved.

      A major strength of the study is the effort to present results in a clear and understandable way given that most researchers do not think about these factors on a day-to-day basis. Additionally, the model code is readily available in MATLAB and Python, which should allow for open access to a larger community.

      Overall, the authors achieved their aims of demonstrating how common factors (autofluorescence, background, and sensor expression) will affect lifetime measurements and they present a clear strategy for understanding how sensor expression may confound results if not properly considered. This work should bring to awareness an issue that new users of lifetime biosensors may not be aware of and that experts, while aware, have not quantitatively determined the conditions where these issues arise. This work will also point to future directions for improving experiments using fluorescence lifetime biosensors and the development of new sensors with more favorable properties.

    3. Reviewer #3 (Public review):

      Summary:

      This study presents a useful computational tool, termed FLiSimBA. The MATLAB-based FLiSimBA simulations allow users to examine the effects of various noise factors (such as autofluorescence, afterpulse of the photomultiplier tube detector, and other background signals) and varying sensor expression levels. Under the conditions explored, the simulations unveiled how these factors affect the observed lifetime measurements, thereby providing useful guidelines for experimental designs. Further simulations with two distinct fluorophores uncovered conditions in which two different lifetime signals could be distinguished, indicating multiplexed dynamic imaging may be possible.

      Strengths:

      The simulations and their analyses were done systematically and rigorously. FLiSimBA can be useful for guiding and validating fluorescence lifetime imaging studies. The simulations could define useful parameters such as the minimum number of photons required to detect a specific lifetime, how sensor protein expression level may affect the lifetime data, the conditions under which the lifetime would be insensitive to the sensor expression levels, and whether certain multiplexing could be feasible.

      Weaknesses:

      The analyses have relied on a key premise that the fluorescence lifetime in the system can be described as a two-component discrete exponential decay. This means that the experimenter should ensure that this is the right model for their fluorophores a priori.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this study, Ma et al. aimed to determine previously uncharacterized contributions of tissue autofluorescence, detector afterpulse, and background noise on fluorescence lifetime measurement interpretations. They introduce a computational framework they named "Fluorescence Lifetime Simulation for Biological Applications (FLiSimBA)" to model experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM) and determine parameters for achieving multiplexed imaging of dynamic biosensors using lifetime and intensity. By quantitatively defining sensor photon effects on signal-to-noise in either fitting or averaging methods of determining lifetime, the authors contradict any claims of FLIM sensor expression insensitivity to fluorescence lifetime and highlight how these artifacts occur differently depending on the analysis method. Finally, the authors quantify how statistically meaningful experiments using multiplexed imaging could be achieved. 

      A major strength of the study is the effort to present results in a clear and understandable way given that most researchers do not think about these factors on a day-to-day basis. The model code is available and written in Matlab, which should make it readily accessible, although a version in other common languages such as Python might help with dissemination in the community. One potential weakness is that the model uses parameters that are determined in a specific way by the authors, and it is not clear how vastly other biological tissue and microscope setups may differ from the values used by the authors.

      Overall, the authors achieved their aims of demonstrating how common factors (autofluorescence, background, and sensor expression) will affect lifetime measurements and they present a clear strategy for understanding how sensor expression may confound results if not properly considered. This work should bring to awareness an issue that new users of lifetime biosensors may not be aware of and that experts, while aware, have not quantitatively determined the conditions where these issues arise. This work will also point to future directions for improving experiments using fluorescence lifetime biosensors and the development of new sensors with more favorable properties.

      We appreciate the comments and helpful suggestions. We now also include FLiSimBA simulation code in Python in addition to Matlab to make it more accessible to the community.

      One advantage of FLiSimBA is that the simulation package is flexible and adaptable, allowing users to input parameters based on the specific sensors, hardware, and autofluorescence measurements for their biological and optical systems. We used parameters based on a FRET-based sensor, measured autofluorescence from mouse tissue, and measured dark count/afterpulse of our specific GaAsP PMT in this manuscript as examples. In Discussion and Materials and methods, we now emphasize this advantage and further clarify how these parameters can be adapted to diverse tissues, imaging systems, and sensors based on individual experiments. We further explain that these input parameters will not affect the conclusions of our study, but the specific input parameters would alter the quantitative thresholds.

      Reviewer #2 (Public review): 

      Summary: 

      By using simulations of common signal artefacts introduced by acquisition hardware and the sample itself, the authors are able to demonstrate methods to estimate their influence on the estimated lifetime, and lifetime proportions, when using signal fitting for fluorescence lifetime imaging. 

      Strengths: 

      They consider a range of effects such as after-pulsing and background signal, and present a range of situations that are relevant to many experimental situations. 

      Weaknesses: 

      A weakness is that they do not present enough detail on the fitting method that they used to estimate lifetimes and proportions. The method used will influence the results significantly. They seem to only use the "empirical lifetime" which is not a state of the art algorithm. The method used to deconvolve two multiplexed exponential signals is not given. 

      We appreciate the comments and constructive feedback. Our revision based on the reviewer’s suggestions has made our manuscript clearer and more user friendly. We originally described the detail of the fitting methods in Materials and methods. Given the importance of these methodological details for evaluating the conclusions of this study, we have moved the description of the fitting method from Materials and methods to Results. In addition, we provide further clarification and more details of the rationale of using these different methods of lifetime estimates in Discussion to aid users in choosing the best metric for evaluating fluorescence lifetime data.

      More specifically, we modified our writing to highlight the following.

      (1) In Results, we describe that lifetime histograms were fitted to Equation 3 with the Gauss-Newton nonlinear least-squares fitting algorithm and the fitted P<sub>1</sub> was used as lifetime estimation.

      (2) In Results, we clarify that our simulation of multiplexed imaging was modeled with two sensors, each displaying a single exponential decay, but the two sensors have different decay constants. We also describe that Equation 3 with the Gauss-Newton nonlinear least-squares fitting algorithm was used to deconvolve the two multiplexed exponential signals (Fig. 8).
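
      For readers unfamiliar with this kind of fit, the following is a minimal sketch of Gauss-Newton least squares applied to a double exponential decay with fixed time constants. It is not the authors' code and does not reproduce Equation 3, which is not shown in this response. The model below omits the instrument response and background terms, the function names are hypothetical, and the fixed values of 2.14 ns and 0.69 ns are assumed from the lifetimes quoted elsewhere in this exchange.

```python
import numpy as np

# Assumed fixed decay constants (ns); illustrative, see the lead-in above.
TAU1_NS, TAU2_NS = 2.14, 0.69


def double_exp(t, f0, p1):
    """Simplified decay model: f0 * (p1*exp(-t/tau1) + (1-p1)*exp(-t/tau2))."""
    return f0 * (p1 * np.exp(-t / TAU1_NS) + (1.0 - p1) * np.exp(-t / TAU2_NS))


def fit_p1_gauss_newton(t, y, n_iter=20):
    """Estimate (f0, p1) by plain Gauss-Newton iterations (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    e1 = np.exp(-t / TAU1_NS)
    e2 = np.exp(-t / TAU2_NS)
    f0, p1 = float(y.max()), 0.5  # simple starting guess
    for _ in range(n_iter):
        residual = y - f0 * (p1 * e1 + (1.0 - p1) * e2)
        # Jacobian of the model with respect to (f0, p1).
        jac = np.column_stack([p1 * e1 + (1.0 - p1) * e2, f0 * (e1 - e2)])
        # Gauss-Newton step: least-squares solution of jac @ step = residual.
        step, *_ = np.linalg.lstsq(jac, residual, rcond=None)
        f0 += step[0]
        p1 += step[1]
    return f0, p1


# Noise-free toy histogram on 256 channels spanning a 12.5 ns laser period.
t = (np.arange(256) + 1) * 12.5 / 256
y = double_exp(t, f0=1000.0, p1=0.6)
f0_hat, p1_hat = fit_p1_gauss_newton(t, y)
print(f"f0 = {f0_hat:.2f}, P1 = {p1_hat:.4f}")
```

      Because the time constants are held fixed, only the amplitude and the fraction P1 are free, which keeps the problem well conditioned. Real data would additionally need the photon noise, instrument response and background handled as described in the manuscript.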

      Reviewer #3 (Public review): 

      Summary: 

      This study presents a useful computational tool, termed FLiSimBA. The MATLAB-based FLiSimBA simulations allow users to examine the effects of various noise factors (such as autofluorescence, afterpulse of the photomultiplier tube detector, and other background signals) and varying sensor expression levels. Under the conditions explored, the simulations unveiled how these factors affect the observed lifetime measurements, thereby providing useful guidelines for experimental designs. Further simulations with two distinct fluorophores uncovered conditions in which two different lifetime signals could be distinguished, indicating multiplexed dynamic imaging may be possible. 

      Strengths: 

      The simulations and their analyses were done systematically and rigorously. FLiSimBA can be useful for guiding and validating fluorescence lifetime imaging studies. The simulations could define useful parameters such as the minimum number of photons required to detect a specific lifetime, how sensor protein expression level may affect the lifetime data, the conditions under which the lifetime would be insensitive to the sensor expression levels, and whether certain multiplexing could be feasible.

      Weaknesses: 

      The analyses have relied on a key premise that the fluorescence lifetime in the system can be described as two-component discrete exponential decay. This means that the experimenter should ensure that this is the right model for their fluorophores a priori and should keep in mind that the fluorescence lifetime of the fluorophores may not be perfectly described by a two-component discrete exponential (for which alternative algorithms have been implemented: e.g., Steinbach, P. J. Anal. Biochem. 427, 102-105, (2012)). In this regard, I also couldn't find how good the fits were for each simulation and experimental data to the given fitting equation (Equation 2, for example, for Figure 2C data).

      We thank the reviewer for the constructive feedback. We agree that the FLiSimBA users should ensure that the right decay equations are used to describe the fluorescent sensors. In this study, we used a FRET-based PKA sensor FLIM-AKAR to provide proof-of-principle demonstration of the capability of FLiSimBA. The donor fluorophore of FLIM-AKAR, truncated monomeric enhanced GFP, displays a single exponential decay. FLIM-AKAR, a FRET-based sensor, displays a double exponential decay. The time constants of the two exponential components were determined and reported previously (Chen, et al, Neuron (2017)). Thus, a double exponential decay equation with known τ<sub>1</sub> and τ<sub>2</sub> was used for both simulation and fitting. The goodness of fit is now provided in Supplementary Fig. 1 for both simulated and experimental data. In addition to referencing our prior study characterizing the double exponential decay model of FLIM-AKAR in Materials and methods, we have emphasized in Discussion the versatility of FLiSimBA to adapt to different sensors, tissues, and analysis methods, and the importance of using the right mathematical models to describe the fluorescence decay of specific sensors.

      Also, in Figure 2C, the 'sensor only' simulation without accounting for autofluorescence (as seen in Sensor + autoF) or afterpulse and background fluorescence (as seen in Final simulated data) seems to recapitulate the experimental data reasonably well. So, at least in this particular case where experimental data is limited by its broad spread with limited data points, being able to incorporate the additional noise factors into the simulation tool didn't seem to matter too much.  

      In the original Fig 2C, the sensor fluorescence was much higher than the contributions from autofluorescence, afterpulse, and background signals, resulting in minimal effects of these other factors, as the reviewer noted. This original figure was based on photon counts from single neurons expressing FLIM-AKAR. For the rest of the manuscript, photon counts were based on whole fields of view (FOV). Since the FOV includes cells that do not express fluorescent sensors, the influence of autofluorescence, dark currents, and background is much more pronounced, as shown in Fig. 2B. 

      Both approaches – using photon counts from the whole FOV or from individual neurons – have their justifications. Photon counts from the whole FOV simulate data from fluorescence lifetime photometry (FLiP), whereas photon counts from individual neurons simulate data from fluorescence lifetime imaging microscopy (FLIM). However, the choice of approach does not affect the conclusions of the manuscript, as a range of photon count values are simulated. To maintain consistency throughout the manuscript, we have revised the photon counts in this figure (now Supplementary Fig. 1C) to match those from the whole FOV.

      Additionally, we have made some modifications in our analyses of Supplementary Fig. 1C and Fig. 2B, detailed in the “FLIM analysis” section of Materials and methods. For instance, to minimize system artifact interference at the histogram edges, we now use a narrower time range (1.8 to 11.5 ns) for fitting and empirical lifetime calculation.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors report how autofluorescence was measured from "imaged brain slices from mice at postnatal 15 to 19 days of age without sensor expression." However, it remains unclear how many acute slices and animals were used (for example, were all 15 µm × 15 µm FOV from a single slice) and if mouse age affects autofluorescence quantification. Furthermore, would in vivo measurements have different autofluorescence conditions given that blood flow would be active? It would help if the authors more clearly explained how reliable their autofluorescence measurement is by clarifying how they obtained it, whether this would vary across brain areas, and whether in vitro vs in vivo conditions would affect autofluorescence.

      We have added description in Materials and methods that for autofluorescence ‘Fluorescence decay histograms from 19 images of two brain slices from a single mouse were averaged.’ We have added in Discussion that users should carefully ‘measure autofluorescence that matches the age, brain region, and data collection conditions (e.g., ex vivo or in vivo) of their tissue…’, and emphasize that FLiSimBA offers customization of inputs, and it is important for users to adapt the inputs such as autofluorescence to their experimental conditions. We also clarify in Discussion that the change of input parameters such as autofluorescence across age and brain region would not affect the general insights from this study, but will affect quantitative values.

      (2) Do sensor expression level issues arise more with in-utero electroporation compared to AAV-based delivery of biosensors? A brief comment on this in the discussion may help as most users in the field today may be using AAV strategies to deliver biosensors.

      In our experience, in-utero electroporation results in higher sensor expression than AAV-based delivery, and so poses less concern for expression-level dependence. However, both delivery methods can result in expression level dependence, especially with a sensor that is not bright. We have added in Discussion ‘For a sensor with medium brightness delivered via in utero electroporation, adeno-associated virus, or as a knock-in gene, the brightness may not always fall within the expression level-independent regime.’

      (3) Figure 1. Should the x-axis on the top figures be "Time (ns)" instead of "Lifetime (ns)"?

      Similarly in Figure 8A&B, wouldn't it make more sense to have the x-axis be Time not Lifetime?

      The x-axis labels in Fig. 1 and Fig. 8A-8B have been changed to ‘Time (ns)’.   

      (4) Figure 2b: why is the empirical lifetime close to 3.5ns? Shouldn't it be somewhere between 2.14 and 0.69?

      In our empirical lifetime calculation, we did not set the peak channel to have a time of 0.0488 ns (i.e. the laser cycle of 12.5 ns divided by 256 time channels). Rather, we set the first time channel within a defined calculation range (i.e. the channel at 1.8 ns in Supplementary Fig. 1B) to have a time of 0.0488 ns. Thus, the empirical lifetime exceeds 2.14 ns and depends on the time range of the histogram used for calculation.

      For Fig. 2B and Supplementary Fig. 1C, we have now adjusted the range to 1.8-11.5 ns to eliminate FLIM artifacts at the histogram edges in our experimental data, resulting in an empirical lifetime around 2.255 ns. In contrast, the range for calculating the empirical lifetime of simulated data in the rest of the study (e.g. Fig. 4D) is 0.489-11.5 ns, yielding a larger lifetime of ~3.35 ns. 

      We have clarified these details and our rationale in Materials and methods.
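
      The calculation described in this response can be sketched as follows. This is an illustration, not the authors' code: it assumes the empirical lifetime is the photon-weighted mean arrival time within the chosen range, and the function name `empirical_lifetime` and the channel-edge convention are hypothetical. Only the 12.5 ns laser cycle, the 256 time channels, and the rule that the first channel in the range is assigned 0.0488 ns come from the response.

```python
LASER_CYCLE_NS = 12.5
N_CHANNELS = 256
BIN_NS = LASER_CYCLE_NS / N_CHANNELS  # about 0.0488 ns per time channel


def empirical_lifetime(hist, start_ns, end_ns):
    """Photon-weighted mean time within [start_ns, end_ns).

    Assumption: channel k starts at k * BIN_NS, and the first channel
    inside the range is assigned a time of one bin width (0.0488 ns).
    """
    selected = [
        count for k, count in enumerate(hist)
        if start_ns <= k * BIN_NS < end_ns
    ]
    total = sum(selected)
    if total == 0:
        raise ValueError("no photons in the selected range")
    # Times restart at one bin width for the first selected channel.
    weighted = sum(count * BIN_NS * (i + 1) for i, count in enumerate(selected))
    return weighted / total


# Hypothetical flat histogram, standing in for a time-independent background.
flat = [1.0] * N_CHANNELS
flat_lifetime = empirical_lifetime(flat, 1.8, 11.5)
```

      Under these assumptions, a flat histogram evaluated over 1.8-11.5 ns gives about 4.88 ns, which shows how strongly the value depends on the chosen range rather than on any decay constant.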

      (5) Figure 2b: how come the afterpulse+background contributes more to the empirical lifetime than the autofluorescence (shorter lifetime). This was unclear in the results text why autofluorescence photons did not alter empirical lifetime as much as did the afterpulse/background.

      With the histogram range of 1.8 ns to 11.5 ns used in Fig. 2B, the empirical lifetimes of FLIM-AKAR sensor fluorescence, autofluorescence, and background/afterpulse are 2-2.3 ns, around 1.69 ns, and around 4.90 ns, respectively. Because the background/afterpulse lifetime differs more from the FLIM-AKAR sensor fluorescence lifetime than the autofluorescence lifetime does, afterpulse + background has a larger influence than autofluorescence. We have added an explanation of this in Results.
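
      The arithmetic behind this answer can be illustrated with a small sketch. It assumes the empirical lifetime is a photon-weighted mean time, so that the value for a summed histogram is the photon-count-weighted average of the component values. The photon counts and the 2.2 ns sensor value are hypothetical (the latter chosen from within the quoted 2-2.3 ns range); 1.69 ns and 4.90 ns are the values quoted above.

```python
def mixed_empirical_lifetime(components):
    """Photon-count-weighted average of component empirical lifetimes.

    components: list of (photon_count, empirical_lifetime_ns) pairs.
    """
    total_photons = sum(count for count, _ in components)
    return sum(count * tau for count, tau in components) / total_photons


# Hypothetical photon counts; equal contamination from each source.
sensor = (100_000, 2.2)
autofluorescence = (5_000, 1.69)
background_afterpulse = (5_000, 4.90)

shift_autof = mixed_empirical_lifetime([sensor, autofluorescence]) - sensor[1]
shift_bg = mixed_empirical_lifetime([sensor, background_afterpulse]) - sensor[1]

print(f"autofluorescence shift:      {shift_autof:+.3f} ns")
print(f"background/afterpulse shift: {shift_bg:+.3f} ns")
```

      With equal photon counts, the shift scales with the distance between each component's lifetime and the sensor's lifetime, so the background/afterpulse term moves the result several times further than autofluorescence does.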

      (6) One overall suggestion for an improvement that could help active users of lifetime biosensors understand the consequences would be to show either a real or simulated example of a "typical experiment" conducted using FLIM-AKAR and how an incorrect interpretation could be drawn as a consequence of these artifacts. For example, do these confounds affect experiments involving comparisons across animals more than within-subject experiments such as washing a drug onto the brain slice, and the baseline period is used to normalize the change in signal? I think this type of direct discussion will help biosensor users more deeply grasp how these factors play out in common experiments being conducted.

      We have added the following in Discussion, ‘…While this issue is less problematic when the same sample is compared over short periods (e.g. minutes), it can lead to misinterpretation when comparison is made across chronic time periods or between samples with different sensor expression levels. For example, apparent changes in fluorescence lifetime observed over days, across cell types, or subcellular compartments may actually reflect variations in sensor expression levels rather than true differences in biological signals (Fig. 6). Therefore, considering biologically realistic factors in FLiSimBA is essential, as it qualitatively impacts the conclusions.’

      Reviewer #2 (Recommendations for the authors): 

      The paper would be improved with more detail on the fitting methods, and the use of state-of-the-art methods. Consult for example the introduction of this paper where many methods are listed: https://www.mdpi.com/1424-8220/22/19/7293

      We have moved the description of the Gauss-Newton nonlinear least-squares fitting algorithm from Materials and methods to Results to enhance clarity. We appreciate the reviewer’s suggestion to combine FLiSimBA with various analysis methods. However, the primary focus of our manuscript is to draw attention to how specific contributing factors in biological experiments influence FLIM data, and to provide a tool that rigorously considers these factors to simulate FLIM data, which can then be used for fitting. Therefore, we did not expand the scope of our manuscript. Instead, we have added in the Discussion that ‘FLiSimBA can be used to test multiple fitting methods and lifetime metrics as an exciting future direction for identifying the best analysis method for specific experimental conditions’, citing relevant references.

      I would also improve the content of the GitHub repository as it is very hard to identify the source code used for simulation and fitting.

      We have reorganized and relabeled our GitHub repository and now have three folders labeled as ‘Simulation_inMatlab’, ‘DataAnalysis_inMatlab’, and ‘SimulationAnalysis_inPython’. We also updated the clarification of the contents of each folder in the README file.

      Reviewer #3 (Recommendations for the authors): 

      (1) P. 10 "For example, to detect a P1 change of 0.006 or a lifetime change of 5 ps with one sample measurement in each comparison group, approximately 300,000 photons are needed." If I am reading the graphs in Figures 3B and C, this sentence is talking about the red line. However, the intersection of 0.006 in the MDD of P1 in 3B and red is not 3E5 photons. And the intersection of 0.005 ns and red in 3C is not 3E5 photons either. Are you sure you are talking about n=1? Maybe the values are correct for the blue curve with n=5.

      Thank you for catching our error. We have corrected the text to ‘with five sample measurements’.

      (2) Figure 2 (B) legend: It would be helpful to specify what is being compared in the legend. For example, consider revising "* p < 0.05 vs sensor only; n.s. not significant vs sensor + autoF; # p < 0.05 vs sensor + autoF. Two-way ANOVA with Šídák's multiple comparisons test" to "* p < 0.05 for sensor + autoF (cyan) vs sensor only; n.s. not significant for final simulated data (purple) vs sensor + autoF; # p < 0.05 for final simulated data (purple) vs sensor + autoF. Two-way ANOVA with Šídák's multiple comparisons test".

      We have made the change and thank the reviewer for the suggestion, which makes the legend clearer.

      (3) Figure 2 (c) Can you please show the same Two-way ANOVA test values for Experimental vs. Sensor only and for Experimental vs. Sensor + autoF? Currently, the value (n.s.) is marked only for Experimental vs. Final simulation. Given that the experimental data are sparse (compared to the simulations), it seems likely that there may be no significant difference among the 3 different simulations regarding how well they match the experimental data. Also, can you specify the P1 and P2 of the experimental data  used to generate the simulated data on this panel? Also, what is the reason why P1=0.5 was used for panels A and B, instead of the value matching the experimental value?

      As the reviewer suggested, we have included statistical tests in the figure (now Supplementary Fig. 1C). For the other changes to this figure and their rationale, please see our response to Reviewer 3’s Public Review comments and our changes in Materials and Methods. We have now specified the P1 value of the experimental data used to generate the simulated data on this panel in both Figure Legends and Materials and Methods. Based on the suggestion, we have now used the same P1 value in Fig. 2B.

    1. eLife Assessment

      This study presents important findings on increased ground beetle diversity in strip cropping compared with crop monocultures. Solid methods are used to analyze data from multiple sites with heterogeneous systems of mixed crops, allowing broad conclusions, albeit at the expense of lacking taxonomic specificity. The work will be of interest to all those applying plant diversity treatments to improve the diversity of associated animals in agricultural fields.

    2. Joint Public Review:

      Summary:

      In this paper the authors examined the effects of strip cropping, a relatively new agricultural technique of alternating crops in small strips of several meters wide, on ground beetle diversity. The results show an increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought carefully about how to handle heterogeneous, unbalanced and taxonomically unspecific yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch. Moreover, after the first round of reviews, the authors have done a great job at rewriting the paper to make it less overstated, more relevant to the data at hand and more solid in the findings. Many of the weaknesses noted in the first review have been dealt with. The overall structure of the paper is good, with a clear introduction, hypotheses, results section and discussion.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #3 (Public Review)

      Summary:

      In this paper the authors examined the effects of strip cropping, a relatively new agricultural technique of alternating crops in small strips of several meters wide, on ground beetle diversity. The results show an increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought carefully about how to handle heterogeneous, unbalanced and taxonomically unspecific yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch. Moreover, after the first round of reviews, the authors have done a great job at rewriting the paper to make it less overstated, more relevant to the data at hand and more solid in the findings. Many of the weaknesses noted in the first review have been dealt with. The overall structure of the paper is good, with a clear introduction, hypotheses, results section and discussion.

      We are grateful for this positive feedback. We are glad that our extensive revision, following thorough review by three reviewers, has paid off in addressing the earlier weaknesses of our manuscript.

      Weaknesses:

      The weaknesses that remain are mainly due to a difficult dataset and choices that could have stressed certain aspects more, like the relationship between strip cropping and intercropping. The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique which has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness?

      Unfortunately, the authors do not go into this in the introduction or otherwise and simply state that they consider strip cropping a form of intercropping.

      We agree with the reviewer that a mechanistic understanding of how intercropping and strip cropping differ would be very interesting. However, we also feel that this topic is somewhat beyond the scope of the current manuscript. We are already planning work to elucidate mechanisms that may explain the pest-suppressive effects of strip cropping.

      I also do not like the exclusive focus on percentages, as these are dimensionless. I think more could have been done to show underlying structure in the data, even after rarefaction.

      While we generally agree with this point raised by the reviewer, for our heterogeneous dataset it was difficult to come up with meaningful units with dimensions. Therefore, we believe that percentages are the most suitable approach to present readers a fair comparison of the treatments.

      A further weakness is a limited embedding into the larger scientific discourses other than providing references. But this may be a matter of style and/or taste.

      We believe our manuscript to be well-embedded within the relevant scientific discourse, but as indicated by reviewer 3 this might indeed be a matter of style/taste. Without exact examples it is difficult for us to judge this point.

      Reviewer #3 (Recommendations for the authors): 

      Suggestion for title: "Strip cropping shows promising preliminary increases in ground beetle community diversity compared to monocultures"

      We agree that the title could indeed be nuanced. We incorporated the suggested title, except for the word “preliminary”, as we felt that this is slightly misplaced for a 4-year study conducted at 4 locations.

      line 26: the word previous may be confusing to readers, as it suggests previous research on beetles or insects. I think it would be better to use for instance "related" or "productivity focused research"

      We agree that this wording might be confusing, and changed it to “other studies showed”.

      Line 84-85: this is vague. can you make explicit what you are trying to answer here?

      We made “biodiversity metric changes” more explicit, and changed the sentence accordingly.

      Line 88-89: I think this would fit better with the first question in line 83-84, so I suggest placing it upwards. Also, I think you mean abundant instead of common. Common suggests commonness in the entire population. Abundant suggests found often in this study. While these definitions may very much overlap, they are distinctly different.

      We have moved this sentence up and changed “common” to “abundant”. To make the result section more in line with this section, we also moved the section on the relationship between crop configuration and abundant genera up.  

      Line 146: defining rareness of species should be in the methods section. Also "following" would be better than "according"

      We have now added a sentence on how we examine habitat preferences and rarity in the methods section (lines 316-317). We also changed “according to” to “following”.

      Line 291: it is called being "flush" with the soil surface. This expression is not much used by non-native speakers, but is regularly encountered in studies on pitfalls, so the authors could decide to change the sentence using the proper English vernacular.

      Suggestion incorporated.

      Line 322-327, this method could do with a reference

      This method is a relatively standard way to calculate relative changes and to center variation around zero. Nevertheless, we added a reference to a paper that used the same method.

      Line: 333-335. I would still like to see a reference for this method.

      This methodology has not been described in literature to the best of our knowledge. As we compared two crops within strip cropping with their respective monoculture references, we compare one strip cropping field with two monocultural fields. Here we took a conservative approach by comparing the strip crop field with the monoculture with the highest richness and activity density, to see if strip cropped fields outperformed monocultures with diverse ground beetle communities.
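
      The conservative comparison described here can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis: the relative-change formula `(strip - reference) / reference` is an assumed stand-in for the metric defined in their methods, the function name is hypothetical, and the reference is chosen separately for each metric. The only step taken from the response is using the better-performing of the two monocultures as the reference.

```python
def conservative_relative_change(strip_value, mono_a, mono_b):
    """Compare a strip-cropped field with the better of two monocultures.

    Assumption: relative change is (strip - reference) / reference, where
    the reference is the higher of the two monoculture values.
    """
    reference = max(mono_a, mono_b)
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return (strip_value - reference) / reference


# Hypothetical species richness values for one site and year.
change = conservative_relative_change(strip_value=18, mono_a=12, mono_b=15)
print(f"relative change vs. best monoculture: {change:+.0%}")
```

      Taking the maximum as the reference means a positive value indicates that the strip-cropped field outperformed both monocultures for that metric, under the assumptions above.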

      Line 364-366. references?

      We have added references for these R packages.

    1. eLife Assessment

      This study shows, for the first time, the structure and snapshots of the dynamics of the full-length soluble Angiotensin-I converting enzyme dimer. The combination of structural and computational analyses provides compelling evidence that reveals the conformational dynamics of the complex and key regions mediating the conformational change. This fundamental work illustrates how conformational heterogeneity can be used to gain insights into protein function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors report four cryoEM structures (2.99 to 3.65 Å resolution) of the 180 kDa, full-length, glycosylated, soluble Angiotensin-I converting enzyme (sACE) dimer, with two homologous catalytic domains at the N- and C-terminal ends (ACE-N and ACE-C). ACE is a protease capable of effectively degrading Aβ. The four structures are C2 pseudo-symmetric homodimers and provide insight into sACE dimerization. These structures were obtained using discrete classification in cryoSPARC and show different combinations of open, intermediate, and closed states of the catalytic domains, resulting in varying degrees of solvent accessibility to the active sites.

      To deepen the understanding of the gradient of heterogeneity (from closed to open states) observed with discrete classification, the authors performed all-atom MD simulations and continuous conformational analysis of cryo-EM data using cryoSPARC 3DVA, cryoDRGN, and RECOVAR. cryoDRGN and cryoSPARC 3DVA revealed coordinated open-closed transitions across four catalytic domains, whereas RECOVAR revealed independent motion of two ACE-N domains, also observed with cryoSPARC focused classification. The authors suggest that the discrepancy in the results of the different methods for continuous conformational analysis in cryo-EM could result from the different approaches used for dimensionality reduction and trajectory generation in these methods.

      Strengths:

      This is an important study that shows, for the first time, the structure and the snapshots of the dynamics of the full-length sACE dimer. Moreover, the study highlights the importance of combining insights from different cryo-EM methods that address questions difficult or impossible to tackle experimentally, while lacking ground truth for validation.

      Weaknesses (from the last round of review):

      The open, closed, and intermediate states of ACE-N and ACE-C in the four cryo-EM structures from discrete classification were designated quantitatively (based on measured atomic distances on the models fitted into cryo-EM maps). Unfortunately, atomic models were not fitted into cryo-EM maps obtained with cryoSPARC 3DVA, cryoDRGN, and RECOVAR, and the open/closed states in these cases were designated based on a qualitative analysis.

    3. Reviewer #2 (Public review):

      The manuscript presents a valuable contribution to the field of ACE structural biology and dynamics by providing the first complete full-length dimeric ACE structure in four distinct states. The study integrates cryo-EM and molecular dynamics simulations to offer important insights into ACE dynamics. The depth of analysis is commendable, and the combination of structural and computational approaches enhances our understanding of the protein's conformational landscape.

    4. Reviewer #3 (Public review):

      Summary:

      Mancl et al. report four Cryo-EM structures of glycosylated and soluble Angiotensin-I converting enzyme (sACE) dimer. This moves forward the structural understanding of ACE, as previous analysis yielded partially denatured or individual ACE domains. By performing a heterogeneity analysis, the authors identify three structural conformations (open, intermediate open, and closed) that define the openness of the catalytic chamber and structural features governing the dimerization interface. They show that the dimer interface of soluble ACE consists of an N-terminal glycan and protein-protein interaction regions, as well as C-terminal protein-protein interactions. Further heterogeneity mining and all-atom molecular dynamic simulations show structural rearrangements that lead to the opening and closing of the catalytic pocket, which could explain how ACE binds its substrate. These studies could contribute to future drug design targeting the active site or dimerization interface of ACE.

      Strengths:

      The authors make significant efforts to address ACE denaturation on cryo-EM grids, testing various buffers and grid preparation techniques. These strategies successfully reduce denaturation and greatly enhance the quality of the structural analysis. The integration of cryoDRGN, 3DVA, RECOVAR, and all-atom simulations for heterogeneity analysis proves to be a powerful approach, further strengthening the overall experimental methodology.

      Weaknesses:

      No weaknesses noted.

    5. Author response:

      The following is the authors’ response to the previous reviews

      We would like to thank you and your chosen reviewers for the diligent work and insightful comments. Following the latest round of feedback, we have made the following changes to the manuscript:

      (1) We have added details regarding the specific versions of cryoSPARC and cryoDRGN used in our analysis.

      (2) We have addressed Reviewer 2’s comment concerning the negative RMSF values in Figure S12. The negative values occur because this display shows the difference in RMSF values from the MD simulations of glycosylated versus non-glycosylated ACE. To avoid similar confusion, we have split Figure S12 into three panels. Panels A and B show the RMSF values for each residue in the glycosylated and non-glycosylated sACE MD simulations, respectively, and all values here are positive. Panel C (the original Figure S12) now includes expanded labeling to clarify that it depicts the difference in RMSF values between the presence and absence of glycans. In this panel, a negative value indicates that the residues exhibit higher RMSF in simulations where glycans are present. The figure legend has been revised to accurately describe the updated figure.
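
      The sign convention for Panel C can be made concrete with a short sketch. It is an illustration, not the authors' script: the subtraction order (non-glycosylated minus glycosylated) is inferred from the statement that negative values mean higher RMSF when glycans are present, and the function name and the per-residue values are hypothetical.

```python
def rmsf_difference(rmsf_without_glycans, rmsf_with_glycans):
    """Per-residue RMSF difference: non-glycosylated minus glycosylated.

    Negative values mean the residue fluctuates more when glycans are
    present; both input lists are themselves non-negative.
    """
    if len(rmsf_without_glycans) != len(rmsf_with_glycans):
        raise ValueError("both simulations must cover the same residues")
    return [
        without - with_glycan
        for without, with_glycan in zip(rmsf_without_glycans, rmsf_with_glycans)
    ]


# Hypothetical RMSF values (in angstroms) for three residues.
with_glycans = [1.2, 0.8, 2.5]
without_glycans = [1.0, 0.9, 2.5]
difference = rmsf_difference(without_glycans, with_glycans)
```

      Here the first residue yields a negative difference even though both of its RMSF values are positive, which is the situation that prompted the reviewer's question.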

    1. eLife Assessment

      This important study provides a potential framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. While FgDML1 remains under-explored with an unclear role in the biology of filamentous fungi, the supporting evidence in this study is incomplete. Providing details on methods and adding controls will strengthen the work.

    2. Reviewer #1 (Public review):

      Summary:

      In their study, the authors investigated the F. graminearum homologue of the Drosophila Misato-Like Protein DML1 for a function in secondary metabolism and sensitivity to fungicides.

      Strengths:

      Generally, the topic of the study is interesting and timely, and the manuscript is well written, albeit in some cases, details on methods or controls are missing.

      Weaknesses:

      However, a major problem I see is with the core result of the study, the decrease in the DON content associated with the deletion of FgDML1. Although some growth data are shown in Figure 6, indicating a severe growth defect, the DON production presented in Figure 3 is not related to biomass. Also, the method and conditions for measuring DON are not described. Consequently, it could well be concluded that the decreased amount of DON detected is simply due to decreased growth, and the specific DON production of the mutant remains more or less the same.

      To alleviate this concern, it is crucial to show the details on the DON measurement and growth conditions and to relate the biomass formation under the same conditions to the DON amount detected. Only then can a conclusion as to an altered production in the mutant strains be drawn.
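
      The normalization requested here can be illustrated with a minimal sketch. All numbers, units, and the function name are hypothetical; the example only shows how a lower total DON amount can coincide with unchanged DON production per unit biomass.

```python
def specific_don_production(don_amount_ug, biomass_g):
    """DON produced per unit biomass (hypothetical units: ug per g dry weight)."""
    if biomass_g <= 0:
        raise ValueError("biomass must be positive")
    return don_amount_ug / biomass_g


# Hypothetical measurements taken under identical culture conditions.
wild_type = specific_don_production(don_amount_ug=100.0, biomass_g=2.0)
mutant = specific_don_production(don_amount_ug=25.0, biomass_g=0.5)

print(f"wild type: {wild_type:.1f} ug/g, mutant: {mutant:.1f} ug/g")
```

      In this made-up case the mutant produces a quarter of the total DON but the same amount per gram of biomass, so the drop in total DON would reflect reduced growth rather than altered biosynthesis.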

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. The paper is innovative, but there are issues in the writing that need to be addressed and corrected.

      Weaknesses:

      (1) The authors speculate that cyazofamid treatment caused upregulation of the assembly factors, leading to a change in the conformation of the Qi protein, thus restoring the enzyme activity of complex III. But no speculation was given in the discussion as to why this would lead to the upregulation of assembly factors, and how the upregulation of assembly factors would change the protein conformation, and is there any literature reporting a similar phenomenon? I would suggest adding this to the discussion.

      (2) Would increased sensitivity of the mutant to cell wall stress be responsible for the excessive curvature of the mycelium?

      (3) The vertical coordinates of Figure 7B need to be modified with positive inhibition rates for the mutants.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript "Mitochondrial protein FgDML1 regulates DON toxin biosynthesis and cyazofamid sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" describes the construction of a null mutant for the FgDML1 gene in F. graminearum and assays characterising the effects of this mutation on the pathogen's infection process and lifecycle. While FgDML1 remains underexplored with an unclear role in the biology of filamentous fungi, and although the authors performed several experiments, there are fundamental issues with the experimental design and execution, and interpretation of the results.

      Strengths:

      FgDML1 is an interesting target, and there are novel aspects in this manuscript. Studies in other organisms have shown that this protein plays important roles in mitochondrial DNA (mtDNA) inheritance, mitochondrial compartmentalisation, chromosome segregation, mitochondrial distribution, mitochondrial fusion, and overall mitochondrial dynamics. Indeed, in Saccharomyces cerevisiae, the mutation is lethal. The authors have carried out multi-faceted experiments to characterise the mutants.

      Weaknesses:

      However, I have concerns about how the study was conceived. Given the fundamental importance of mitochondrial function in eukaryotic cells and how the absence of this protein impacts these processes, it is unsurprising that deletion of this gene in F. graminearum profoundly affects fungal biology. Therefore, it is misleading to claim a direct link between FgDML1 and DON toxin biosynthesis (and virulence), as the observed effects are likely indirect consequences of compromised mitochondrial function. In fact, it is reasonable to assume that the production of all secondary metabolites is affected to some extent in the mutant strains and that such a strain would not be competitive at all under non-laboratory conditions. The order in which the authors present the results can be misleading, too. The results on vegetative growth rate appeared much later in the manuscript, which should have come first, as the FgDML1 mutant exhibited significant growth defects, and subsequent results should be discussed in that context. Moreover, the methodologies are not described properly, making the manuscript hard to follow and difficult to replicate.

    1. eLife Assessment

      This work presents potentially important findings suggesting that a combination of transcranial stimulation approaches applied for a short period could improve memory performance. However, the evidence supporting the conclusions is currently incomplete. In particular, the claims relating to the specific neural mechanisms and anatomical sites of action underlying effects were viewed as overstated in the current version. The results potentially have implications for non-invasive enhancement of cognitive functions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors make a bold claim that a combination of repetitive transcranial magnetic stimulation (intermittent theta burst-iTBS) and transcranial alternating current stimulation (gamma tACS) causes slight improvements in memory in a face/name/profession task.

      Strengths:

      The idea of stimulating the human brain non-invasively is very attractive because, if it worked, it could lead to a host of interesting applications. The current study aims to evaluate one such exciting application.

      Weaknesses:

      (1) The title refers to the "precuneus-hippocampus" network. A clear definition of what is meant by this terminology is lacking. More importantly, mechanistic evidence that the precuneus and the hippocampus are involved in the potential effects of stimulation remains unconvincing.

      (2) The question of the extent to which the stimulation approach and the stimulation parameters used in these experiments cause specific and functionally relevant neural effects remains open. Invasive recordings that could address this question remain out of the scope of this non-invasive study. The authors conducted scalp EEG experiments in an attempt to address this question using non-invasive methods. However, the results shown in Fig. 3 are unclear. The results are inconsistently reported in units of microvolts squared in some panels (3A, 3B) and in units of microvolts in other panels (3C). Also, there is insufficient consideration of potential contamination by signal components reflecting eye movements, other muscle artifacts, or another volume-conducted signal reflecting aggregate activity inside the brain.

      (3) Figure 3 indicates "Precuneus oscillatory activity ...", but evidence that the activity presented reflects precuneus activity is lacking. The maps shown at the bottom of Figure 3C suggest that the EEG signals recorded with scalp EEG reflect activity generated across a wide spatial range, with a peak encompassing at least tens of centimeters. Thus, evidence that effects specifically reflect precuneus activity, as the paper's title and text throughout the manuscript suggest, is lacking.

      (4) The paper as currently presented (e.g., Figure 3) also lacks rigorous evidence of relevant oscillatory activity. Prior to filtering EEG signals in a particular frequency band, clear evidence of oscillations in the frequency band of interest should be shown (e.g., demonstration of a clear peak that emerges naturally in the frequency range of interest when spectral analysis is applied to "raw" signals). The authors claim that gamma oscillations change because of the stimulation, but a clear peak in the gamma range prior to stimulation is not apparent in the data as currently presented. Thus, the extent to which spectral measurements during stimulation reflect physiological gamma oscillations remains unclear.

      (5) Concerns remain regarding the rigor of statistical analyses in the revised manuscript (see also point 8 below). Figure 3B shows an undefined statistical test with p<0.05. The statistical test that was used is not explained. Also, a description of how corrections for multiple comparisons were made is missing. Figures 3A and 3C are not accompanied by statistics, making the results difficult to interpret. For Figure 4C, a claim was made based on a significant p-value for one statistical test and a non-significant p-value in another test. This is a common statistical mistake (see Figure 1 and accompanying discussion in Makin and Orban de Xivry (2019) Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8:e48175).

      (6) In the second question posed in the original review, I highlighted that it was unclear how such stimulation would produce memory enhancement. The authors replied that, in the absence of mechanisms, there are many other studies that suffer from the same problem. This raises the question of placebo effects. The paper does not sufficiently address or discuss the possibility that any potential stimulation effects may reflect placebo effects.

      (7) The third major concern in the original review was the lack of evidence for a mechanism that is specific to the precuneus. Evidence for specific involvement of the precuneus remains lacking in the revised manuscript. The authors state: "the non-invasive stimulation protocol was applied to an individually identified precuneus for each participant". However, the meaning of this statement is unclear. Specifically, it is unclear how the authors know that they are specifically targeting the precuneus. Without directly recording from the precuneus and directly demonstrating effects, which is outside of the scope of the study, specific involvement of the precuneus seems speculative. Also, it does not seem as though a figure was included in the paper to show how the stimulation protocol specifically targets the precuneus. In their response to the original reviews, the authors state that posterior medial parietal areas are the only regions that show significant differences following the stimulation, but they did not cite a specific figure, or statistics reported in the text, that show this. In any event, posterior medial parietal areas encompass a wide area of the brain, so this would still not provide evidence for an effect specifically involving the precuneus.

      (8) Regarding chance levels, it is unfortunate that the authors cannot quantify what chance levels are in the immediate and delayed recall conditions. This makes interpretation of the results challenging. In the immediate and delayed conditions, the authors state that the chance level is 33%. It would be useful to mark this in the figures. If I understand correctly, chance is 33% in Fig. 2A. If this is the case and if I am interpreting the figure correctly:

      Gray bars for the sham condition appear to be below chance (~20-25%). Why is this condition associated with an accuracy level that is lower than chance?

      Cyan bars and red bars do not appear to be significantly different from chance (i.e., 33%), with red slightly higher than cyan. What statistic was performed to obtain the level of significance indicated in the figure? The highest average value for the red condition appears to be around 35%. More details are needed to fully explain this figure and to support the claims associated with this figure.

      (9) In the revised version of the paper, the authors did not address concerns associated with the block design (please see question 4d in the original review).

      In sum, this study presents an admirable aspirational goal, the notion that a non-invasive stimulation protocol could modulate activity in specific brain regions to enhance memory. However, the evidence presented at the behavioral level and at the mechanistic level (e.g. the putative involvement of specific brain regions) remains unconvincing.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Borghi and colleagues provides evidence that the combination of intermittent theta burst TMS stimulation and gamma transcranial alternating current stimulation (γtACS) targeting the precuneus increases long-term associative memory in healthy subjects compared to iTBS alone and sham conditions. Using a rich dataset of TMS-EEG and resting-state functional connectivity (rs-FC) maps and structural MRI data, the authors also provide evidence that dual stimulation increased gamma oscillations and functional connectivity between the precuneus and hippocampus. Enhanced memory performance was linked to increased gamma oscillatory activity and connectivity through white matter tracts.

      Strengths:

      The combination of personalized repetitive TMS (iTBS) and gamma tACS is a novel approach to targeting the precuneus, and thereby, connected memory-related regions to enhance long-term associative memory. The authors leverage an existing neural mechanism engaged in memory binding, theta-gamma coupling, by applying TMS at theta burst patterns and tACS at gamma frequencies to enhance gamma oscillations. The authors conducted a thorough study that suggests that simultaneous iTBS and gamma tACS could be a powerful approach for enhancing long-term associative memory. The paper was well-written, clear, and concise.

      Comments on Revision:

      I thank the authors for their thoughtful responses to my first review and their inclusion of more detailed methodological discussion of their rationale for the stimulation protocol conditions and timing. Regarding the apparent difference in connectivity at baseline between conditions, the explanation that this is due to intrinsic dynamics, state, or noise implies the baseline is reflecting transient changes in dynamics rather than a true or stable baseline. Based on this, it looks like iTBS alone is significantly greater than the baseline before the iTBS and <sub>γ</sub>tACS condition but maybe not that much lower than the post-stimulation period for iTBS and <sub>γ</sub>tACS. A longer baseline period should be used to ensure transient states are not driving baseline levels such that these endogenous fluctuations would average out. This also raises questions about whether the effects of iTBS and <sub>γ</sub>tACS or iTBS alone are dependent on the intrinsic state at the time when stimulation begins. Their additional clarification of memory scoring is helpful but also reveals that the effect of dual iTBS+<sub>γ</sub>tACS specifically on the association between faces and names is only just significant. This modest increase in associative memory should be taken into consideration when interpreting these findings.

    4. Reviewer #3 (Public review):

      Summary:

      Borghi and colleagues present results from 4 experiments aimed at investigating the effects of dual <sub>γ</sub>tACS and iTBS stimulation of the precuneus on behavioral and neural markers of memory formation. In their first experiment (n = 20), they find that a 3-minute offline (i.e., prior to task completion) stimulation that combines both techniques leads to superior memory recall performance in an associative memory task immediately after learning associations between pictures of faces, names, and occupation, as well as after a 15-minute delay, compared to iTBS alone (+ tACS sham) or no stimulation (sham for both iTBS and tACS). Performance in a second task probing short-term memory was unaffected by the stimulation condition. In a second experiment (n = 10), they show that these effects persist over 24 hours and up to a full week after initial stimulation. A third (n = 14) and fourth (n = 16) experiment were conducted to investigate neural effects of the stimulation protocol. The authors report that, once again, only combined iTBS and <sub>γ</sub>tACS increases gamma oscillatory activity and neural excitability (as measured by concurrent TMS-EEG) specific to the stimulated area at the precuneus compared to a control region, as well as precuneus-hippocampus functional connectivity (measured by resting state MRI), which seemed to be associated with structural white matter integrity of the bilateral middle longitudinal fasciculus (measured by DTI).

      Strengths:

      Combining non-invasive brain stimulation techniques is a novel, potentially very powerful method to maximize the effects of these kinds of interventions that are usually well-tolerated and thus accepted by patients and healthy participants. It is also very impressive that the stimulation-induced improvements in memory performance resulted from a short (3 min) intervention protocol. If the effects reported here turn out to be as clinically meaningful and generalizable across populations as implied, this approach could represent a promising avenue for treatment of impaired memory functions in many conditions.

      Methodologically, this study is expertly done! I don't see any serious issues with the technical setup in any of the experiments. It is also very commendable that the authors conceptually replicated the behavioral effects of experiment 1 in experiment 2 and then conducted two additional experiments to probe the neural mechanisms associated with these effects. This certainly increases the value of the study and the confidence in the results considerably.

      The authors used a within-subject approach in their experiments, which increases statistical power and allows for stronger inferences about the tested effects. They also individualized stimulation locations and intensities, which should further optimize the signal-to-noise ratio.

      Weaknesses:

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. In the revised manuscript, the authors provide post-hoc sensitivity analyses that help contextualize the strength of the findings.

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but readers should keep in mind that this limits inferences about how exactly dual iTBS and <sub>γ</sub>tACS of the precuneus modulate learning and memory.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors claim that they can use a combination of repetitive transcranial magnetic stimulation (intermittent theta burst-iTBS) and transcranial alternating current stimulation (gamma tACS) to cause slight improvements in memory in a face/name/profession task.

      Strengths:

      The idea of stimulating the human brain non-invasively is very attractive because, if it worked, it could lead to a host of interesting applications. The current study aims to evaluate one such exciting application.

      Weaknesses:

      (1) It is highly unclear what, if anything, transpires in the brain with non-invasive stimulation. To cite one example of many, a rigorous study in rats and human cadavers compellingly showed that traditional parameters of transcranial electrical stimulation lead to no change in brain activity due to the attenuation by the soft tissue and skull (Mihály Vöröslakos et al Nature Communications 2018): https://www.nature.com/articles/s41467-018-02928-3. It would be very useful to demonstrate via invasive neurophysiological recordings that the parameters used in the current study do indeed lead to any kind of change in brain activity. Of course, this particular study uses a different non-invasive stimulation protocol.

      Thank you for raising the important issue regarding the actual neurophysiological effects of non-invasive brain stimulation. Unfortunately, invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints, while studies on cadavers or rodents would not fully resolve our question. Indeed, the authors of the cited study (Mihály Vöröslakos et al., Nature Communications, 2018) highlight the impossibility of drawing definitive conclusions about the exact voltage required in the in-vivo human brain due to significant differences between rats and humans, as well as the in-vivo human brain and cadavers due to alterations in electrical conductivity that occur in postmortem tissue. Huang and colleagues addressed the difficulties in reaching direct evidence of non-invasive brain stimulation (NIBS) effects in a review published in Clinical Neurophysiology in 2017. They conclude that the use of EEG to assess brain response to TMS has great potential for a less indirect demonstration of plasticity mechanisms induced by NIBS in humans.

      To address this challenge, we conducted Experiments 3 and 4, which respectively examined the neurophysiological and connectivity changes induced by the stimulation in a non-invasive manner using TMS-EEG and fMRI. The observed changes in brain oscillatory activity (increased gamma oscillatory activity), cortical excitability (enhanced posteromedial parietal cortex reactivity), and brain connectivity (strengthened connections between the precuneus and hippocampi) provided evidence of the effects of our non-invasive brain stimulation protocol, further supporting the behavioral data.

      Additionally, we carefully considered the issue of stimulation distribution and, in response, performed a biophysical modeling analysis and E-field calculation using the parameters employed in our study (see Supplementary Materials).

      We acknowledge that further exploration of this aspect would be highly valuable, and we agree that it is worth discussing both as a technical limitation and as a potential direction for future research. We therefore modified the discussion accordingly (main text, lines 280-289).

      “Although we studied TMS and tACS propagation through the E-field modeling and observed an increase in the precuneus gamma oscillatory activity, excitability and connectivity with the hippocampi, we cannot exclude that our results might reflect the consequences of stimulating more superficial parietal regions other than the precuneus, nor can we report direct evidence of microscopic changes in the brain after the stimulation. Invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints. Studies on cadavers or rodents would not fully resolve our question due to significant differences between them and the living human brain (i.e. rodents do not have an anatomical correspondence, while cadavers show alterations in electrical conductivity occurring in postmortem tissue). However, further exploration of this aspect in future studies would help in the understanding of γtACS+iTBS effects.”

      (2) If there is any brain activity triggered by the current stimulation parameters, then it is extremely difficult to understand how this activity can lead to enhancing memory. The brain is complex. There are hundreds of neuronal types. Each neuron receives precise input from about 10,000 other neurons with highly tuned synaptic strengths. Let us assume that the current protocol does lead to enhancing (or inhibiting) simultaneously the activity of millions of neurons. It is unclear whether there is any activity at all in the brain triggered by this protocol, it is also unclear whether such activity would be excitatory, or inhibitory. It is also unclear how many neurons, let alone what types of neurons would change their activity. How is it possible that this can lead to memory enhancement? This seems like using a hammer to knock on my laptop and hope that the laptop will output a new Mozart-like sonata.

      Thank you for your comment. As you correctly point out, we still do not have precise knowledge of which neurons—and to what extent—are activated during non-invasive brain stimulation in humans. However, this challenge is not limited to brain stimulation but applies to many other therapeutic interventions, including psychiatric medications, without limiting their use.

      Nevertheless, a substantial body of research has investigated the mechanisms underlying the efficacy of TMS and tACS in producing behavioral after-effects, primarily through their ability to induce long-term potentiation (Bliss & Collingridge, The Journal of Physiology, 1993a; Ridding & Rothwell, Nature Reviews Neuroscience, 2007; Huang et al., Clinical Neurophysiology, 2017; Koch et al., Neuroimage 2018; Koch et al., Brain 2022; Jannati et al., Neuropsychopharmacology, 2023; Wischnewski et al., Trends in Cognitive Science, 2023; Griffiths et al., Trends in Neuroscience, 2023).

      We acknowledge that we took this important aspect for granted. We consequently expanded the introduction accordingly (main text, lines 48-60).

      “Repetitive transcranial magnetic stimulation (rTMS) and transcranial alternating current stimulation (tACS) are two forms of NIBS widely used to enhance memory performances (Grover et al., 2022; Koch et al., 2018; Wang et al., 2014). rTMS, based on the principle of Faraday, induces depolarization of cortical neuronal assemblies and leads to after-effects that have been linked to changes in synaptic plasticity involving mechanisms of long-term potentiation (LTP) (Huang et al., 2017; Jannati et al., 2023). On the other hand, tACS causes rhythmic fluctuations in neuronal membrane potentials, which can bias spike timing, leading to an entrainment of the neural activity (Wischnewski et al., 2023). In particular, the induction of gamma oscillatory activity has been proposed to play an important role in a type of LTP known as spike timing-dependent plasticity, which depends on a precise temporal delay between the firing of a presynaptic and a postsynaptic neuron (Griffiths and Jensen, 2023). Both LTP and gamma oscillations have a strong link with memory processes such as encoding (Bliss and Collingridge, 1993; Griffiths and Jensen, 2023; Rossi et al., 2001), pointing to rTMS and tACS as good candidates for memory enhancement.”

      (3) Even if there is any kind of brain activation, it is unclear why the authors seem to be so sure that the precuneus is responsible. Are there neurophysiological data demonstrating that the current protocol only activates neurons in the precuneus? Of note, the non-invasive measurements shown in Figure 3 are very weak (Figure 3A top and bottom look very similar, and Figure 3C left and right look almost identical). Even if one were to accept the weak alleged differences in Figure 3, there is no indication in this figure that there is anything specific to the precuneus, rather a whole brain pattern. This would be the kind of minimally rigorous type of evidence required to make such claims. In a less convincing fashion, one could look at different positions of the stimulation apparatus. This would not be particularly compelling in terms of making a statement about the precuneus. But at least it would show that the position does matter, and over what range of distances it matters, if it matters.

      Thank you for your feedback. Our assumption that the precuneus plays a key role in the observed effects is based on several factors:

      (1) The non-invasive stimulation protocol was applied to an individually identified precuneus for each participant. Given existing evidence on TMS propagation, we can reasonably assume that the precuneus was at least a mediator of the observed effects (Ridding & Rothwell, Nature Reviews Neuroscience 2007). For further details about target identification and TMS and tACS propagation, please refer to the MRI data acquisition section in the main text and Biophysical modeling and E-field calculation section in the supplementary materials.

      (2) To investigate the effects of the neuromodulation protocol on cortical responses, we conducted a whole-brain analysis using multiple paired t-tests comparing each data point between different experimental conditions. To minimize the type I error rate, data were permuted with the Monte Carlo approach and significant p-values were corrected with the false discovery rate method (see the Methods section for details). The results identified the posterior-medial parietal areas as the only regions showing significant differences across conditions.

      (3) To control for potential generalized effects, we included a control condition in which TMS-EEG recordings were performed over the left parietal cortex (adjacent to the precuneus). This condition did not yield any significant results, reinforcing the cortical specificity of the observed effects.
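For readers unfamiliar with the procedure named in point (2), the following is a minimal sketch of a Monte Carlo permutation test on paired data followed by a Benjamini-Hochberg false discovery rate correction. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the use of the mean paired difference as the test statistic (rather than a t statistic), the number of permutations, and the choice of the Benjamini-Hochberg variant of FDR are all hypothetical.

```python
import random
from statistics import mean


def paired_permutation_p(a, b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo sign-flip permutation test for paired samples.

    Hypothetical helper: uses the mean paired difference as the statistic.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each paired difference
        # is exchangeable, so flip each sign with probability 0.5.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value is <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected
```

In a whole-brain analysis of this kind, `paired_permutation_p` would be called once per data point (for example per electrode and time sample), and the resulting list of p-values passed to `benjamini_hochberg`.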

      However, as stated in the Discussion, we do not claim that precuneus activity alone accounts for the observed effects. As shown in Experiment 4, stimulation led to connectivity changes between the precuneus and hippocampus, a network widely recognized as a key contributor to long-term memory formation (Bliss & Collingridge, Nature 1993). These connectivity changes suggest that precuneus stimulation triggered a ripple effect extending beyond the stimulation site, engaging the broader precuneus-hippocampus network.

      Regarding Figure 3A, it represents the overall expression of oscillatory activity detected by TMS-EEG. Since each frequency band has a different optimal scaling, the figure reflects a graphical compromise. A more detailed representation of the significant results is provided in Figure 3B. The effect sizes for gamma oscillatory activity in the delta T1 and T2 conditions were 0.52 and 0.50, respectively, which correspond to a medium effect based on Cohen’s d interpretation.

      We added a paragraph to the discussion to improve the clarity of the manuscript regarding this important aspect (lines 193-198).

      “Given the existing evidence on TMS propagation and the computation of the biophysical model with the E-field, we can reasonably assume that the individually identified PC was a mediator of the observed effects (Ridding and Rothwell, 2007). Moreover, we observed specific cortical changes in the posteromedial parietal areas, as evidenced by the whole-brain analysis conducted on TMS-EEG data and the absence of effect on the lateral posterior parietal cortex used as a control condition.”

      (4) In the absence of any neurophysiological documentation of a direct impact on the brain, an argument in this type of study is that the behavioral results show that there must be some kind of effect. I agree with this argument. This is also the argument for placebo effects, which can be extremely powerful and useful even if the mechanism is unrelated to what is studied. Then let us dig into the behavioral results.

      Hoping to have already addressed your concern regarding the neurophysiological impact of the stimulation on the brain, we would like to emphasize that the behavioral results were obtained controlling for placebo effects. This was achieved by having participants perform the task under different stimulation conditions, including a sham condition.

      4a. There does not seem to be any effect on the STMB task, therefore we can ignore this.

      4b. The FNAT task is minimally described in the supplementary material. There are no experimental details to understand what was done. What was the size of the images? How long were the images presented for? Were there any repetitions of the images? For how long did the participants study the images? Presumably, all the names and occupations are different? What were the genders of the faces? What is chance level performance? Presumably, the same participant saw different faces across the different stimulation conditions. If not, then there can be memory effects across different conditions that are even more complex to study. If yes, then it would be useful to show that the difficulty is the same across the different stimuli.

      Thank you for pointing out the gaps in the description of the FNAT task. We added the requested information to the supplementary information (lines 93-101).

      “Each picture's face size was 19x15cm. In the learning phase, faces were shown along with names and occupations for 8 seconds each (totaling approximately 2 minutes). During immediate recall, the faces were displayed alone for 8 seconds. In the delayed recall and recognition phase, pictures were presented until the subject provided answers. We used a different set of stimuli for each stimulation condition, resulting in a total of 3 parallel task forms balanced across conditions and session order. All parallel forms comprised 6 male and 6 female faces; for each sex, there were 2 young adults (around 30 years old), 2 middle-aged adults (around 50 years old), and 2 elderly adults (around 70 years old). Before the experiments, we conducted a pilot study to ensure no differences existed between the parallel forms of the task.”

      The chance level in the immediate and delayed recall is not quantifiable since the participants had to freely recall the name and the occupation without a multiple choice. In the recognition, the chance level was around 33% (since the possible answers were 3).

      4c. Although not stated clearly, if I understand FNAT correctly, the task is based on just 12 presentations. Each point in Figure 2A represents a different participant. Unfortunately, there is no way of linking the performance of individual participants across the conditions with the information provided. Lines joining performance for each participant would be useful in this regard. Because there are only 12 faces, the results are quantized in multiples of 100/12 % in Figure 3A. While I do not doubt that the authors did their homework in terms of the statistical analyses, it is difficult to get too excited about these 12 measurements. For example, take Figure 3A immediate condition TOTAL, arguably the largest effect in the whole paper. It seems that on average, the participants may remember one more face/name/occupation.

      Thank you for the suggestion. To improve clarity, we added graphs with lines linking the performance of individual participants across conditions (please see the revised Fig. 2). We apologize for the lack of clarity in the description of the FNAT. As you correctly pointed out, we computed the percentage based on the single association between face, name and occupation (12 in total). However, each association consisted of three items, resulting in a total of 36 items to learn and associate. We added a paragraph to make this more explicit in the manuscript (lines 425-430).

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      In the example you mentioned, participants were, on average, able to correctly recall and associate three more items compared to the other conditions. While this difference may not seem striking at first glance, it is important to consider that we assessed memory performance after a single, three-minute stimulation session. Similar effects are typically observed only after multiple stimulation sessions (Koch et al., NeuroImage, 2018; Grover et al., Nature Neuroscience, 2022). Moreover, memory performance changes are often measured with a limited set of stimuli due to methodological constraints related to memory capacity. For example, the Rey Auditory Verbal Learning Test, which requires learning and recalling 15 words, is a typical test used to detect memory changes (Koch et al., Neuroimage, 2018; Benussi et al., Brain stimulation 2021; Benussi et al., Annals of Neurology, 2022).
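The arithmetic behind this exchange can be made explicit with a short sketch. It assumes only what is stated above: 12 face-name-occupation associations per task form and three items per association. The constant and function names are hypothetical and do not come from the authors' scoring scripts.

```python
N_ASSOCIATIONS = 12          # face-name-occupation associations per task form
ITEMS_PER_ASSOCIATION = 3    # face, name, occupation


def accuracy_percent(n_correct, n_total=N_ASSOCIATIONS):
    """Convert a count of fully correct associations into a percentage."""
    return 100.0 * n_correct / n_total


# Smallest possible change in FNAT accuracy: one association out of 12.
step = accuracy_percent(1)

# Total number of single items to learn across all associations.
total_items = N_ASSOCIATIONS * ITEMS_PER_ASSOCIATION

print(f"accuracy step: {step:.2f}% per association")
print(f"items to learn: {total_items}")
```

Under these assumptions every participant's accuracy falls on a grid of 100/12 (about 8.33) percentage points, which is the quantization the reviewer describes, and one additional fully correct association corresponds to three additional items out of 36.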

      4d. Block effects. If I understand correctly, the experiments were conducted in blocks. This is always problematic. Here is one example study that articulated the big problems in block designs (Li et al TPAMI 2021): https://ieeexplore.ieee.org/document/9264220

      Thank you for the interesting reference. According to this paper, in a block design, EEG or fMRI recordings are performed in response to different stimuli of a given class presented in succession. If this is the case, it does not correspond to our experimental design where both TMS-EEG and fMRI were conducted in resting state on different days according to the different stimulation conditions.

      4e. Even if we ignore the lack of experimental descriptions, problems with lack of evidence of brain activity, the minimalistic study of 12 faces, problems with the block design, etc. at the end of the day, the results are extremely weak. In FNAT, some results are statistically significant, some are not. The interpretation of all of this is extremely complex. Continuing with Figure 3A, it seems that the author claims that iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham. I am struggling to interpret such a result. When separating results by name and occupation, the results are even more perplexing. There is only one condition that is statistically significant in Figure 3A NAME and none in the occupation condition.

      Thank you again for your feedback. Hoping to have thoroughly addressed your initial concerns in our previous responses, we now move on to your observations regarding the behavioral results, assuming you were referring to Figure 2A. The main finding of this study is the improvement in long-term memory performance, specifically the ability to correctly recall the association between face, name, and occupation (total FNAT), which was significantly enhanced in both Experiments 1 and 2. However, we also aimed to explore the individual contributions of name and occupation separately to gain a deeper understanding of the results. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall. We understand that this may have caused some confusion. We consequently modified the manuscript (lines 97-99; 107-111; 425-430) to make it clearer and moved the graphs for FNAT NAME and OCCUPATION from Fig. 2 in the main text to Fig. S4 in the supplementary information.

      “Dual iTBS+γtACS increased the performances in recalling the association between face, name and occupation (FNAT accuracy) both for the immediate (F<sub>2,38</sub>=7.18; p =0.002; η<sup>2</sup><sub>p</sub>=0.274) and the delayed (F<sub>2,38</sub>=5.86; p =0.006; η<sup>2</sup><sub>p</sub>=0.236) recall performances (Fig. 2, panel A).”

      “The in-depth analysis of the FNAT accuracy investigating the specific contribution of face-name and face-occupation recall revealed that dual iTBS+γtACS increased the performances in the association between face and name (FNAT NAME) delayed recall (F<sub>2,38</sub> =3.46; p =0.042; η<sup>2</sup><sub>p</sub> =0.154; iTBS+γtACS vs. sham-iTBS+sham-tACS: 42.9±21.5 % vs. 33.8±19 %; p=0.048 Bonferroni corrected) (Fig. S4, supplementary information).”

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”
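The partial eta squared values quoted in the excerpts above can be checked against the reported F statistics using the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the three reported tests; the function and variable names are illustrative, and the check assumes the degrees of freedom are 2 and 38 as given in the subscripts.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, reported F, reported partial eta squared), all with df = (2, 38)
reported = [
    ("FNAT immediate recall", 7.18, 0.274),
    ("FNAT delayed recall", 5.86, 0.236),
    ("FNAT NAME delayed recall", 3.46, 0.154),
]

for label, f_value, eta_reported in reported:
    eta = partial_eta_squared(f_value, df_effect=2, df_error=38)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported}")
```

All three computed values agree with the reported effect sizes to three decimal places, so the F statistics, degrees of freedom, and effect sizes in the quoted text are internally consistent.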

      Regarding the stimulation conditions, your concerns about the performance pattern (iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham) are understandable. However, this new protocol was developed precisely in response to the variability observed in behavioral outcomes following non-invasive brain stimulation, particularly when used to modulate memory functions (Corp et al., 2020; Pabst et al., 2022). As discussed in the manuscript, it is intended as a boost to conventional non-invasive brain stimulation protocols, leveraging the mechanisms outlined in the Discussion section.

      (5) In sum, it would be amazing to be able to use non-invasive stimulation for any kind of therapeutic purpose as the authors imagine. More work needs to be done to convince ourselves that this kind of approach is viable. The evidence provided in this study is weak.

      We hope our response will be carefully considered, fostering a constructive exchange and leading to a reassessment of your evaluation.

      Reviewer #2 (Public review):

      Summary:

      The manuscript "Dual transcranial electromagnetic stimulation of the precuneus-hippocampus network boosts human long-term memory" by Borghi and colleagues provides evidence that the combination of intermittent theta burst TMS stimulation and gamma transcranial alternating current stimulation (γtACS) targeting the precuneus increases long-term associative memory in healthy subjects compared to iTBS alone and sham conditions. Using a rich dataset of TMS-EEG and resting-state functional connectivity (rs-FC) maps and structural MRI data, the authors also provide evidence that dual stimulation increased gamma oscillations and functional connectivity between the precuneus and hippocampus. Enhanced memory performance was linked to increased gamma oscillatory activity and connectivity through white matter tracts.

      Strengths:

      The combination of personalized repetitive TMS (iTBS) and gamma tACS is a novel approach to targeting the precuneus, and thereby, connected memory-related regions to enhance long-term associative memory. The authors leverage an existing neural mechanism engaged in memory binding, theta-gamma coupling, by applying TMS at theta burst patterns and tACS at gamma frequencies to enhance gamma oscillations. The authors conducted a thorough study that suggests that simultaneous iTBS and gamma tACS could be a powerful approach for enhancing long-term associative memory. The paper was well-written, clear, and concise.

      Weaknesses:

      (1) The study did not include a condition where γtACS was applied alone. This was likely because a previous work indicated that a single 3-minute γtACS did not produce significant effects, but this limits the ability to isolate the specific contribution of γtACS in the context of this target and memory function

      Thank you for your comments. As you pointed out, we did not include a condition where γtACS was applied alone. This decision was based on the findings of Guerra et al. (Brain Stimulation 2018), who investigated the same protocol and reported no aftereffects. Given the substantial burden of the experimental design on patients and our primary goal of demonstrating an enhancement of effects compared to the standalone iTBS protocol, we decided to leave out this condition. However, you raise an important aspect that should be further discussed, so we modified the limitations section accordingly (lines 290-297).

      “We did not assess the effects of γtACS alone. This decision was based on the findings of Guerra et al. (Guerra et al., 2018), who investigated the same protocol and reported no aftereffects. Given the substantial burden of the experimental design on patients and our primary goal of demonstrating an enhancement of effects compared to the standalone iTBS protocol, we decided to leave out this condition. While examining the effects of γtACS alone could help isolate its specific contribution to this target and memory function, extensive research has shown that achieving a cognitive enhancement aftereffect with tACS alone typically requires around 20–25 minutes of stimulation (Grover et al., 2023).”

      (2) The authors applied stimulation for 3 minutes, which seems to be based on prior tACS protocols. It would be helpful to present some rationale for both the duration and timing relative to the learning phase of the memory task. Would you expect additional stimulation prior to recall to benefit long-term associative memory?

      Thank you for your comment and for raising this interesting point. As you correctly noted, the protocol we used has a duration of three minutes, a choice based on previous studies demonstrating its greater efficacy with respect to single stimulation from a neurophysiological point of view. Specifically, these studies have shown that the combined stimulation enhanced gamma-band oscillations and increased cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) are all associated with memory formation and encoding processes, we decided to apply the co-stimulation immediately before the encoding phase to enhance its efficacy. We added this paragraph to the manuscript rationale (lines 48-60).

      “Repetitive transcranial magnetic stimulation (rTMS) and transcranial alternating current stimulation (tACS) are two forms of NIBS widely used to enhance memory performances (Grover et al., 2022; Koch et al., 2018; Wang et al., 2014). rTMS, based on the principle of Faraday, induces depolarization of cortical neuronal assemblies and leads to after-effects that have been linked to changes in synaptic plasticity involving mechanisms of long-term potentiation (LTP) (Huang et al., 2017; Jannati et al., 2023). On the other hand, tACS causes rhythmic fluctuations in neuronal membrane potentials, which can bias spike timing, leading to an entrainment of the neural activity (Wischnewski et al., 2023). In particular, the induction of gamma oscillatory activity has been proposed to play an important role in a type of LTP known as spike timing-dependent plasticity, which depends on a precise temporal delay between the firing of a presynaptic and a postsynaptic neuron (Griffiths and Jensen, 2023). Both LTP and gamma oscillations have a strong link with memory processes such as encoding (Bliss and Collingridge, 1993; Griffiths and Jensen, 2023; Rossi et al., 2001), pointing to rTMS and tACS as good candidates for memory enhancement.”

      Regarding the question of whether stimulation could also benefit recall, the answer is yes. We can speculate that repeating the stimulation before recall might provide an additional boost. This is supported by evidence showing that both the precuneus and gamma oscillations are involved in recall processes (Flanagin et al., Cerebral Cortex 2023; Griffiths et al., Trends in Neurosciences 2023). Furthermore, previous research suggests that reinstating the same brain state as during encoding can enhance recall performance (Javadi et al., The Journal of Neuroscience 2017). We added this consideration to the discussion (lines 305-311).

      “Future studies should further investigate the effects of stimulation on distinct memory processes. In particular, stimulation could be applied before retrieval (Rossi et al., 2001), to better elucidate its specific contribution to the observed enhancements in memory performance. Additionally, it would be worth examining whether repeated stimulation - administered both before encoding and before retrieval - could produce a boosting effect. This is especially relevant in light of findings showing that matching the brain state between retrieval and encoding can significantly enhance memory performance (Javadi et al., 2017).”

      (3) How was the burst frequency of theta iTBS and gamma frequency of tACS chosen? Were these also personalized to subjects' endogenous theta and gamma oscillations? If not, were increases in gamma oscillations specific to patients' endogenous gamma oscillation frequencies or the tACS frequency?

      The stimulation protocol was chosen based on previous studies (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). The γtACS sinusoidal waveform was set at 70 Hz, while iTBS consisted of ten bursts of three pulses at 50 Hz lasting 2 s, repeated every 10 s with an 8 s pause between consecutive trains, for a total of 600 pulses lasting 190 s (see iTBS+γtACS neuromodulation protocol section). In particular, the theta iTBS was inspired by protocols used in animal models to elicit LTP in the hippocampus (Huang et al., Neuron 2005). Consequently, neither the theta-burst frequency of iTBS nor the gamma frequency of tACS was personalized. The increase in gamma oscillations was measured relative to each participant’s baseline and did not correspond to the administered tACS frequency.
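
      As a quick arithmetic check of the protocol parameters quoted above, the sketch below recomputes the number of trains and the timing from the stated values. The function name `itbs_summary` and its dictionary keys are illustrative and not taken from the manuscript. Reading the quoted 190 s as the onset of the last train (with that train ending 2 s later) is an assumption.

```python
def itbs_summary(pulses_per_burst=3, bursts_per_train=10,
                 train_s=2.0, pause_s=8.0, total_pulses=600):
    """Derive train count and timing from the iTBS parameters stated in the response."""
    pulses_per_train = pulses_per_burst * bursts_per_train
    n_trains, remainder = divmod(total_pulses, pulses_per_train)
    if remainder:
        raise ValueError("total_pulses is not a whole number of trains")
    cycle_s = train_s + pause_s  # one train plus the pause that follows it
    last_onset_s = (n_trains - 1) * cycle_s
    return {
        "pulses_per_train": pulses_per_train,
        "n_trains": n_trains,
        # Rate at which bursts repeat within a train (ten bursts in 2 s).
        "burst_rate_hz": bursts_per_train / train_s,
        # Assumption: the quoted duration runs up to the onset of the last train.
        "last_train_onset_s": last_onset_s,
        "last_train_end_s": last_onset_s + train_s,
    }


if __name__ == "__main__":
    for key, value in itbs_summary().items():
        print(f"{key}: {value}")
```

      With the default values this gives 30 pulses per train, 20 trains, bursts repeating at 5 Hz (within the theta range), and a last-train onset at 190 s.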

      (4) The authors do a thorough job of analyzing the increase in gamma oscillations in the precuneus through TMS-EEG; however, the authors may also analyze whether theta oscillations were also enhanced through this protocol due to the iTBS potentially targeting theta oscillations. This may also be more robust than gamma oscillations increases since gamma oscillations detected on the scalp are very low amplitude and susceptible to noise and may reflect activity from multiple overlapping sources, making precise localization difficult without advanced techniques.

      Thank you for the suggestion. We analyzed theta oscillations, finding no changes.

      (5) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      We acknowledge that the pre-stimulation connectivity values for the iTBS and sham tACS conditions appear higher than those for the dual stimulation condition. However, as noted in our statistical analyses, there were no significant differences at baseline between conditions (p-FDR= 0.3514), suggesting that any apparent discrepancy is due to natural variability rather than systematic bias. One potential explanation for these differences is individual variability in baseline connectivity measures, which can fluctuate due to factors such as intrinsic neural dynamics, participant state, or measurement noise. Despite these variations, our statistical approach ensures that any observed post-stimulation effects are not confounded by pre-existing differences.

      (6) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

      We apologize for any lack of clarity. The total FNAT score reflects the ability to correctly recall all the information associated with a person—specifically, the correct pairing of the face, name, and occupation. Participants received one point for each triplet they accurately recalled. The scores were then converted into percentages, as detailed in the Face-Name Associative Task Construction and Scoring section in the supplementary materials.

      Total FNAT was the primary outcome measure. However, we also analyzed name and occupation recall separately to better understand their partial contributions. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall.

      We acknowledge that this distinction may have caused some confusion. To improve clarity, we revised the manuscript accordingly (lines 97-98; 107-111; 425-430).

      “Dual iTBS+γtACS increased the performances in recalling the association between face, name and occupation (FNAT accuracy) both for the immediate (F<sub>2,38</sub>=7.18; p=0.002; η<sup>2</sup><sub>p</sub>=0.274) and the delayed (F<sub>2,38</sub>=5.86; p=0.006; η<sup>2</sup><sub>p</sub>=0.236) recall performances (Fig. 2, panel A).”

      “The in-depth analysis of the FNAT accuracy investigating the specific contribution of face-name and face-occupation recall revealed that dual iTBS+γtACS increased the performances in the association between face and name (FNAT NAME) delayed recall (F<sub>2,38</sub>=3.46; p=0.042; η<sup>2</sup><sub>p</sub>=0.154; iTBS+γtACS vs. sham-iTBS+sham-tACS: 42.9±21.5 % vs. 33.8±19 %; p=0.048 Bonferroni corrected) (Fig. S4, supplementary information).”

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      We also moved the data regarding the specific contribution of name and occupation recall to the supplementary information (Fig. S4) and further specified how we computed the score (lines 102-104).

      “The score was computed by deriving an accuracy percentage index dividing by 12 and multiplying by 100 the correct association sum. The partial recall scores were computed in the same way only considering the sum of face-name (NAME) and face-occupation (OCCUPATION) correctly recollected.”
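
      The scoring rule quoted above can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the function name `fnat_scores` and the input layout (one pair of booleans per face) are hypothetical, and the partial scores are assumed to count every correct face-name or face-occupation pairing, whether or not the other element was also recalled.

```python
N_FACES = 12  # each face is paired with one name and one occupation


def fnat_scores(recalled):
    """Return total and partial FNAT accuracy percentages.

    `recalled` holds one (name_correct, occupation_correct) pair of booleans
    per face (hypothetical layout, used here only for illustration).
    """
    if len(recalled) != N_FACES:
        raise ValueError(f"expected {N_FACES} entries, got {len(recalled)}")

    def to_percent(n_correct):
        # Divide the sum of correct associations by 12 and multiply by 100.
        return n_correct / N_FACES * 100

    total = sum(1 for name_ok, occ_ok in recalled if name_ok and occ_ok)
    # Assumption: partial scores include fully recalled triplets as well.
    name = sum(1 for name_ok, _ in recalled if name_ok)
    occupation = sum(1 for _, occ_ok in recalled if occ_ok)
    return {
        "total": to_percent(total),
        "name": to_percent(name),
        "occupation": to_percent(occupation),
    }


if __name__ == "__main__":
    # 5 full triplets, 2 name-only, 1 occupation-only, 4 with nothing recalled.
    example = ([(True, True)] * 5 + [(True, False)] * 2
               + [(False, True)] + [(False, False)] * 4)
    print(fnat_scores(example))
```

      In this made-up example, the total score is 5/12 of 100, while the name and occupation scores are 7/12 and 6/12 of 100, respectively.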

      Reviewer #3 (Public review):

      Summary:

      Borghi and colleagues present results from 4 experiments aimed at investigating the effects of dual γtACS and iTBS stimulation of the precuneus on behavioral and neural markers of memory formation. In their first experiment (n = 20), they found that a 3-minute offline (i.e., prior to task completion) stimulation that combines both techniques leads to superior memory recall performance in an associative memory task immediately after learning associations between pictures of faces, names, and occupation, as well as after a 15-minute delay, compared to iTBS alone (+ tACS sham) or no stimulation (sham for both iTBS and tACS). Performance in a second task probing short-term memory was unaffected by the stimulation condition. In a second experiment (n = 10), they show that these effects persist over 24 hours and up to a full week after initial stimulation. A third (n = 14) and fourth (n = 16) experiment were conducted to investigate the neural effects of the stimulation protocol. The authors report that, once again, only combined iTBS and γtACS increase gamma oscillatory activity and neural excitability (as measured by concurrent TMS-EEG) specific to the stimulated area at the precuneus compared to a control region, as well as precuneus-hippocampus functional connectivity (measured by resting-state MRI), which seemed to be associated with structural white matter integrity of the bilateral middle longitudinal fasciculus (measured by DTI).

      Strengths:

      Combining non-invasive brain stimulation techniques is a novel, potentially very powerful method to maximize the effects of these kinds of interventions that are usually well-tolerated and thus accepted by patients and healthy participants. It is also very impressive that the stimulation-induced improvements in memory performance resulted from a short (3 min) intervention protocol. If the effects reported here turn out to be as clinically meaningful and generalizable across populations as implied, this approach could represent a promising avenue for the treatment of impaired memory functions in many conditions.

      Methodologically, this study is expertly done! I don't see any serious issues with the technical setup in any of the experiments (with the only caveat that I am not an expert in fMRI functional connectivity measures and DTI). It is also very commendable that the authors conceptually replicated the behavioral effects of experiment 1 in experiment 2 and then conducted two additional experiments to probe the neural mechanisms associated with these effects. This certainly increases the value of the study and the confidence in the results considerably.

      The authors used a within-subject approach in their experiments, which increases statistical power and allows for stronger inferences about the tested effects. They also individualized stimulation locations and intensities, which should further optimize the signal-to-noise ratio.

      Weaknesses:

      I want to state clearly that I think the strengths of this study far outweigh the concerns I have. I still list some points that I think should be clarified by the authors or taken into account by readers when interpreting the presented findings.

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. That being said, the authors do report significant effects, so they were per definition powered to find those. However, the effect sizes reported for their main findings are all relatively large and it is known that significant findings from small samples may represent inflated effect sizes, which may hamper the generalizability of the current results. Ideally, the authors would replicate their main findings in a larger sample. Alternatively, I think running a sensitivity analysis to estimate the smallest effect the authors could have detected with a power of 80% could be very informative for readers to contextualize the findings. At the very least, however, I think it would be necessary to address this point as a potential limitation in the discussion of the paper.

      Thank you for the observation. As you mentioned, our power analysis was based on our previous study investigating the same neuromodulation protocol with a corresponding experimental design. The relatively small sample could be considered a possible limitation of the study, which we will add to the discussion. A fundamental future step will be to replicate these results in a larger population; however, to strengthen our results, we performed the sensitivity analysis you suggested.

      In detail, we performed a sensitivity analysis for repeated-measures ANOVA with α=0.05 and power (1-β)=0.80 with no sphericity correction. For experiment 1, a sensitivity analysis with 1 group and 3 measurements showed a minimal detectable effect size of f=0.524 with 20 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η<sup>2</sup>=0.274 corresponding to f=0.614; the ANOVA on FNAT delayed performance revealed an effect size of η<sup>2</sup>=0.236 corresponding to f=0.556. For experiment 2, a sensitivity analysis for total FNAT immediate performance (1 group and 3 measurements) showed a minimal detectable effect size of f=0.797 with 10 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η<sup>2</sup>=0.448 corresponding to f=0.901. The sensitivity analysis for total FNAT delayed performance (1 group and 6 measurements) showed a minimal detectable effect size of f=0.378 with 10 participants. In our paper, the ANOVA on total FNAT delayed performance revealed an effect size of η<sup>2</sup>=0.484 corresponding to f=0.968. Thus, in both experiments the observed effect sizes exceeded the minimal detectable effect sizes computed in the sensitivity analysis, indicating that the experiments were adequately powered to detect them. We have now added this information to the statistical analysis and results sections of the manuscript (lines 99-100; 127-128; 130-131; 543-545), and we thank the reviewer for the suggestion.

      “The sensitivity analysis showed a minimal detectable effect size of  η<sup>2</sup>=0.215 with 20 participants.”

      “The sensitivity analysis showed a minimal detectable effect size of  η<sup>2</sup>=0.388 with 10 participants.”

      “The sensitivity analysis showed a minimal detectable effect size of η<sup>2</sup>=0.125 with 10 participants.”

      “Since we do not have an a priori effect size for experiments 1 and 2, we performed a sensitivity power analysis to ensure that these experiments were able to detect the minimum effect size with 80% power and alpha level of 0.05.”
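
      The f and η<sup>2</sup> values reported in this response are related through Cohen's standard conversion, f = sqrt(η<sup>2</sup> / (1 - η<sup>2</sup>)), and its inverse, η<sup>2</sup> = f<sup>2</sup> / (1 + f<sup>2</sup>). The sketch below only illustrates that conversion. It does not reproduce the sensitivity analysis itself, and the function names are illustrative.

```python
import math


def eta2_to_f(eta2):
    """Convert (partial) eta squared to Cohen's f."""
    if not 0.0 <= eta2 < 1.0:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta2 / (1.0 - eta2))


def f_to_eta2(f):
    """Convert Cohen's f back to (partial) eta squared."""
    if f < 0.0:
        raise ValueError("f must be non-negative")
    return f ** 2 / (1.0 + f ** 2)


if __name__ == "__main__":
    # Observed effect sizes quoted in the response (eta squared -> f).
    for eta2 in (0.274, 0.236, 0.448, 0.484):
        print(f"eta2 = {eta2:.3f} -> f = {eta2_to_f(eta2):.3f}")
    # Minimal detectable effect sizes quoted in the response (f -> eta squared).
    for f in (0.524, 0.797, 0.378):
        print(f"f = {f:.3f} -> eta2 = {f_to_eta2(f):.3f}")
```

      Applied to the figures above, the conversion recovers the quoted pairs to three decimals: η<sup>2</sup>=0.274, 0.236, 0.448 and 0.484 map to f=0.614, 0.556, 0.901 and 0.968, and the minimal detectable f=0.524, 0.797 and 0.378 map to η<sup>2</sup>=0.215, 0.388 and 0.125.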

      It seems that the statistical analysis approach differed slightly between studies. In experiment 1, the authors followed up significant effects of their ANOVAs by Bonferroni-adjusted post-hoc tests whereas it seems that in experiment 2, those post-hoc tests were "exploratory", which may suggest those were uncorrected. In experiment 3, the authors use one-tailed t-tests to follow up their ANOVAs. Given some of the reported p-values, these choices suggest that some of the comparisons might have failed to reach significance if properly corrected. This is not a critical issue per se, as the important test in all these cases is the initial ANOVA but non-significant (corrected) post-hoc tests might be another indicator of an underpowered experiment. My assumptions here might be wrong, but even then, I would ask the authors to be more transparent about the reasons for their choices or provide additional justification. Finally, the authors sometimes report exact p-values whereas other times they simply say p < .05. I would ask them to be consistent and recommend using exact p-values for every result where p >= .001.

      Thank you again for the suggestions. Your observations are correct: we used slightly different statistical approaches depending on our hypotheses. Here are the details:

      In experiment 1, we used a repeated-measures ANOVA with one factor, “stimulation condition” (iTBS+γtACS; iTBS+sham-tACS; sham-iTBS+sham-tACS). Following the significant effect of this factor, we performed post-hoc analyses with Bonferroni correction.

      In experiment 2, we used a repeated-measures ANOVA with two factors, “stimulation condition” and “time”. As expected, we observed a significant effect of condition, confirming the result of experiment 1, but not of time. This means that the neuromodulatory effect was present regardless of the time point. However, to explore whether the effect of stimulation condition was present at each time point, we performed exploratory t-tests without correction for multiple comparisons.

      In experiment 3, we used the same approach as in experiment 1. However, since we had a specific hypothesis on the direction of the effect already observed in our previous study, i.e. an increase in spectral power (Maiella et al., Scientific Reports 2022), our tests were one-tailed.

      For the p-values, we corrected the manuscript reporting the exact values for every result.

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but it is still worth pointing out to readers that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      Thank you for your comment. We fully agree with your observation, which is why this aspect has been considered in the study's limitations. To address your concern, we added this sentence to the limitations section of the discussion (lines 299-301).

      “Consequently, these findings do not allow precise inferences regarding the specific mechanisms by which dual iTBS and γtACS of the precuneus modulate learning and memory.”

      There were no stimulation-related performance differences in the short-term memory task used in experiments 1 and 2. The authors argue that this demonstrates that the intervention specifically targeted long-term associative memory formation. While this is certainly possible, the STM task was a spatial memory task, whereas the LTM task relied (primarily) on verbal material. It is thus also possible that the stimulation effects were specific to a stimulus domain instead of memory type. In other words, could it be possible that the stimulation might have affected STM performance if the task taxed verbal STM instead? This is of course impossible to know without an additional experiment, but the authors could mention this possibility when discussing their findings regarding the lack of change in the STM task.

      Thank you for your interesting observation. We argue that the intervention primarily targeted long-term associative memory formation, as our findings demonstrated effects only on FNAT. However, as you correctly pointed out, we cannot exclude the possibility that the stimulation may also influence short-term verbal associative memory. We added this aspect when discussing the absence of significant findings in the STM task (lines 205-210).

      “Visual short-term associative memory, measured by STBM performance, was not modulated by any experimental condition. Even if we cannot exclude the possibility that the stimulation could have influenced short-term verbal associative memory, we expected this result since short-term associative memory is known to rely on a distinct frontoparietal network while FNAT, used to investigate long-term associative memory, has already been associated with the neural activity of the PC and the hippocampus (Parra et al., 2014; Rentz et al., 2011).”

      While the authors discuss the potential neural mechanisms by which the combined stimulation conditions might have helped memory formation, the psychological processes are somewhat neglected. For example, do the authors think the stimulation primarily improves the encoding of new information or does it also improve consolidation processes? Interestingly, the beneficial effect of dual iTBS and γtACS on recall performance was very stable across all time points tested in experiments 1 and 2, as was the performance in the other conditions. Do the authors have any explanation as to why there seems to be no further forgetting of information over time in either condition when even at immediate recall, accuracy is below 50%? Further, participants started learning the associations of the FNAT immediately after the stimulation protocol was administered. What would happen if learning started with a delay? In other words, do the authors think there is an ideal time window post-stimulation in which memory formation is enhanced? If so, this might limit the usability of this procedure in real-life applications.

      Thank you for your comment and for raising these important points.

      We hypothesized that co-stimulation would enhance encoding processes. Previous studies have shown that co-stimulation can enhance gamma-band oscillations and increase cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) have all been associated with encoding processes, we decided to apply co-stimulation before the encoding phase in order to boost it. We expanded the introduction to specify the link between the neural mechanisms and the psychological process of encoding (lines 55-60).

      “In particular, the induction of gamma oscillatory activity has been proposed to play an important role in a type of LTP known as spike timing-dependent plasticity, which depends on a precise temporal delay between the firing of a presynaptic and a postsynaptic neuron (Griffiths and Jensen, 2023). Both LTP and gamma oscillations have a strong link with memory processes such as encoding (Bliss and Collingridge, 1993; Griffiths and Jensen, 2023; Rossi et al., 2001), pointing to rTMS and tACS as good candidates for memory enhancement.”

      We applied the co-stimulation immediately before the learning phase to maximize its potential effects. While we observed a significant increase in gamma oscillatory activity lasting up to 20 minutes, we cannot determine whether the behavioral effects we observed would have been the same with a co-stimulation applied 20 minutes before learning. Based on existing literature, a reduction in the efficacy of co-stimulation over time could be expected (Huang et al., Neuron 2005; Thut et al., Brain Topography 2009). However, we hypothesize that multiple stimulation sessions might provide an additional boost, helping to sustain the effects over time (Thut et al., Brain Topography 2009; Koch et al., Neuroimage 2018; Koch et al., Brain 2022).

      Regarding the absence of further forgetting in both stimulation conditions, we think that the clinical and demographic characteristics of the sample (i.e. young and healthy subjects) explain the near absence of forgetting after one week.

      Reviewer #1 (Recommendations for the authors):

      To address the concerns, the authors should:

      (1) Include invasive neuronal recordings (e.g., in rats or monkeys if not possible in humans) demonstrating that the current stimulation protocol leads to direct changes in brain activity.

      We understand the first reviewer's interest in the neurophysiological correlates of the stimulation protocol; however, we are skeptical about this request, as we think it goes beyond the aims of the study. As already mentioned in our response to the reviewer, invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints. At the same time, studies on cadavers or rodents would not fully resolve the question. Indeed, the authors of the study cited by the reviewer (Mihály Vöröslakos et al., Nature Communications, 2018) highlight the impossibility of drawing definitive conclusions about the exact voltage required in the in-vivo human brain, due to significant differences between rats and humans and to the alterations in electrical conductivity that occur in postmortem human tissue. Huang and colleagues addressed the difficulties in reaching direct evidence of non-invasive brain stimulation (NIBS) effects in a review published in Clinical Neurophysiology in 2017. They conclude that the use of EEG to assess brain response to TMS has great potential for a less indirect demonstration of plasticity mechanisms induced by NIBS in humans.

      It is exactly to meet the need to investigate the changes in brain activity after the stimulation protocol that we conducted Experiments 3 and 4. These experiments respectively examined the neurophysiological and connectivity changes induced by the stimulation in a non-invasive manner using TMS-EEG and fMRI. The observed changes in brain oscillatory activity (increased gamma oscillatory activity), cortical excitability (enhanced posteromedial parietal cortex reactivity), and brain connectivity (strengthened connections between the precuneus and hippocampi) provided evidence of the effects of our non-invasive brain stimulation protocol, further supporting the behavioral data.

      Additionally, we carefully considered the issue of stimulation distribution and, in response, performed a biophysical modeling analysis and E-field calculation using the parameters employed in our study (see Supplementary Materials).

      Acknowledging the reviewer's point of view, we modified the manuscript accordingly, discussing this aspect both as a technical limitation and as a potential direction for future research (main text, lines 280-289).

      “Although we studied TMS and tACS propagation through the E-field modeling and observed an increase in the precuneus gamma oscillatory activity, excitability and connectivity with the hippocampi, we cannot exclude that our results might reflect the consequences of stimulating more superficial parietal regions other than the precuneus nor report direct evidence of microscopic changes in the brain after the stimulation. Invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints. Studies on cadavers or rodents would not fully resolve our question due to significant differences between them (i.e. rodents do not have an anatomical correspondence, while cadavers show alterations in electrical conductivity occurring in postmortem tissue). However, further exploration of this aspect in future studies would help in the understanding of γtACS+iTBS effects.”

      (2) Address all the technical questions about the experimental design.

      We addressed all the technical questions about the experimental design.

      (3) Repeat the experiments with randomized trial order and without a block design.

      The experiments were conducted with randomized trial order and we did not use a block design.

      (4) Add many more faces to the study. It is extremely difficult to draw any conclusion from merely 12 faces. Ideally, there would be lots of other relevant memory experiments where the authors show compelling positive results.

      We understand your perplexity about drawing conclusions from 12 faces; however, this is not the case. As we explained in the response to the reviewer, the task we implemented did not rely on the recall of merely 12 faces. Instead, participants had to correctly learn, associate and recall 12 faces, 12 names and 12 occupations, for a total of 36 items. To improve the clarity of the manuscript, we added a paragraph to make this aspect more explicit (lines 425-430).

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      The behavioral changes we observed are similar to those typically observed after multiple stimulation sessions (Koch et al., NeuroImage, 2018; Grover et al., Nature Neuroscience, 2022; Benussi et al., Annals of Neurology, 2022). Moreover, memory performance changes are often measured with a limited set of stimuli due to methodological constraints related to memory capacity. For example, the Rey Auditory Verbal Learning Test, which requires learning and recalling 15 words, is a typical test used to detect memory changes (Koch et al., NeuroImage, 2018; Benussi et al., Brain Stimulation, 2021; Benussi et al., Annals of Neurology, 2022).

      (5) Provide a clear explanation of the apparent randomness of which results are statistically significant or not in Figure 3. But perhaps with many more experiments, a lot more memory evaluations, many more stimuli, and addressing all the other technical concerns, either the results will disappear or there will be a more interpretable pattern of results.

      We provided explanations for all the concerns shown by the reviewer.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      We acknowledge that the pre-stimulation connectivity values for the iTBS and sham tACS conditions appear higher than those for the dual stimulation condition. However, as noted in our statistical analyses, there were no significant differences at baseline between conditions (p-FDR = 0.3514), suggesting that any apparent discrepancy is due to natural variability rather than systematic bias. One potential explanation for these differences is individual variability in baseline connectivity measures, which can fluctuate due to factors such as intrinsic neural dynamics, participant state, or measurement noise. Because the conditions did not differ significantly at baseline, the observed post-stimulation effects are unlikely to be confounded by pre-existing differences.

      (2) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

      We apologize for any lack of clarity. The total FNAT score reflects the ability to correctly recall all the information associated with a person—specifically, the correct pairing of the face, name, and occupation. Participants received one point for each triplet they accurately recalled. The scores were then converted into percentages, as detailed in the Face-Name Associative Task Construction and Scoring section in the supplementary materials.

      Total FNAT was the primary outcome measure. However, we also analyzed name and occupation recall separately to better understand their partial contributions. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall.

      We acknowledge that this distinction may have caused some confusion. To improve clarity, we revised the manuscript accordingly (lines 97-98; 107-111; 425-430).

      “Dual iTBS+γtACS increased the performances in recalling the association between face, name and occupation (FNAT accuracy) both for the immediate (F<sub>2,38</sub>=7.18; p=0.002; η<sup>2</sup><sub>p</sub>=0.274) and the delayed (F<sub>2,38</sub>=5.86; p=0.006; η<sup>2</sup><sub>p</sub>=0.236) recall performances (Fig. 2, panel A).”

      “The in-depth analysis of the FNAT accuracy investigating the specific contribution of face-name and face-occupation recall revealed that dual iTBS+γtACS increased the performances in the association between face and name (FNAT NAME) delayed recall (F<sub>2,38</sub>=3.46; p=0.042; η<sup>2</sup><sub>p</sub>=0.154; iTBS+γtACS vs. sham-iTBS+sham-tACS: 42.9±21.5 % vs. 33.8±19 %; p=0.048 Bonferroni corrected) (Fig. S4, supplementary information).”

      “We considered a correct association when a subject was able to recall all the information for each item (i.e. face, name and occupation), resulting in a total of 36 items to learn and associate. To further investigate the effect on FNAT we also computed a partial recall score accounting for those items where subjects correctly matched only names with faces (FNAT NAME) and only occupations with faces (FNAT OCCUPATION). See supplementary information for score details.”

      We also moved the data regarding the specific contribution of name and occupation recall to the supplementary information (Fig. S4) and further specified how we computed the score (lines 102-104).

      “The score was computed as an accuracy percentage index: the sum of correct associations was divided by 12 and multiplied by 100. The partial recall scores were computed in the same way, considering only the sum of face-name (NAME) and face-occupation (OCCUPATION) associations correctly recollected.”
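      The scoring rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input layout (one pair of booleans per face), and the choice to count a name or occupation as correct regardless of the other element are not taken from the manuscript.

```python
from typing import Dict, List, Tuple

N_FACES = 12  # number of face-name-occupation triplets in the task


def fnat_scores(trials: List[Tuple[bool, bool]]) -> Dict[str, float]:
    """Return FNAT accuracy percentages.

    `trials` holds one (name_correct, occupation_correct) pair per face.
    Assumption: partial scores count every correct name (or occupation),
    whether or not the other element of the triplet was also recalled.
    """
    if len(trials) != N_FACES:
        raise ValueError(f"expected {N_FACES} trials, got {len(trials)}")

    # Total score: one point only when both name and occupation are correct.
    total = sum(name_ok and job_ok for name_ok, job_ok in trials)
    # Partial scores: face-name and face-occupation associations.
    name = sum(name_ok for name_ok, _ in trials)
    occupation = sum(job_ok for _, job_ok in trials)

    def to_percent(count: int) -> float:
        # Divide the sum of correct associations by 12, multiply by 100.
        return 100.0 * count / N_FACES

    return {
        "total": to_percent(total),
        "name": to_percent(name),
        "occupation": to_percent(occupation),
    }
```

      For a hypothetical participant who recalls 6 full triplets, 3 names only and 1 occupation only, this gives a total score of 50% and a NAME score of 75%, which shows how a partial score can exceed the total score.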

      Reviewer #3 (Recommendations for the authors):

      A very small detail, in the caption for Figure 2A, OCCUPATION is described as being shown on the 'left' but it should be 'right'.

      We corrected this error.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Figure 1: It might be simpler to streamline acronyms for different test cases, e.g., EO1contra, EO1ipsi (rather than EO1IPS), EO2, and control. Thus, it would be possible to label each of the three schematic panels as EO1, EO2, control.

      Please describe what the dots in the brain mean and move the V1 label so it does not occlude dots.

      Please make clear that the "track reconstructions" are the bright spheres in the micrographs (there are track-like elements in some micrographs which may be tears or?)

      Thank you. We relabeled the groups as control, EO1contra, EO1ipsi, and EO2. These were changed in all figures and in the document at several places.

      We indicated in the new caption that “Dots schematize ocular dominance columns”.

      We indicated that electrode track penetrations were the “(bright spots at right/posterior)”.

      (2) Figure 2: Should "horizontal" be "vertical" (line 556 of the caption)? When describing the scale bar for firing rate, please explain the meaning of italicized vs regular font.

      Please make the purple lines in Figures I and J easier to see (invisible in my PDF).

      Not quite clear what is significantly different from what when viewing the figure at a glance. Would it be possible to clarify using standard methods?

      Yes, it should say vertical, thank you. We explained the italics (they denote the standard scale bar size if no number is provided).

      We changed the purple lines to yellow in all figures.

      We added comparison bars that help indicate significance.

      (3) Figures 3-5. Please make corrections like those noted above.

      Yes, we applied the previous changes to Figures 3 - 5.

      (4) Minor. Sometimes the authors spell out temporal frequency and sometimes abbreviate it. Perhaps adopt a consistent style.

      Fixed, thanks.

      Reviewer #2 (Public Review):

      (1) The assessment of the tuning properties is based on fits to the data. Presumably, neurons for which the fits were poor were excluded? It would be useful to know what the criteria were, how many neurons were excluded, and whether there was a significant difference between the groups in the numbers of neurons excluded (which could further point to differences between the groups).

      Yes, this is an important omission, thank you for catching it. We now write in methods (line 213): “Inclusion/exclusion: For each stimulus type, we examined the set of all responses to visual stimuli and blanks with an ANOVA test to evaluate the null hypothesis that the mean responses to all of these stimuli were the same; cells with p<0.05 on this visual responsiveness test were included in fits and analyses, and cells with p>0.05 were excluded.”
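      As a rough illustration of this inclusion rule, the sketch below tests whether a cell's mean responses differ across stimuli (including blanks) and keeps the cell when p<0.05. To stay dependency-free it swaps in a permutation version of the one-way ANOVA F-test, which is not necessarily the test the authors ran; all function names, the number of permutations and the example data are hypothetical.

```python
import random
from typing import List, Sequence


def f_statistic(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic; each group holds one stimulus's trial responses."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        # Degenerate case: no trial-to-trial variability within any stimulus.
        return float("inf") if ss_between > 0 else 0.0
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_anova_p(groups: Sequence[Sequence[float]],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Permutation p-value: how often shuffled stimulus labels give F >= observed."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled: List[List[float]] = []
        start = 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_visually_responsive(groups: Sequence[Sequence[float]],
                           alpha: float = 0.05) -> bool:
    """Include the cell in fits and analyses only if p < alpha."""
    return permutation_anova_p(groups) < alpha
```

      Here each inner sequence would hold the trial-by-trial responses to one stimulus (or to the blank), so a cell that fires identically on average to every stimulus is excluded.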

      (2) For the temporal frequency data, low- and high-frequency cut-offs are defined, but then only used for the computation of the bandwidth. Given that the responses to low temporal frequencies change profoundly with premature eye opening, it would be useful to directly compare the low- and high-frequency cut-offs between groups, in addition to the index that is currently used.

      We now provide this data in Figure 3 - figure supplement 1.

      (3) In addition to the tuning functions and firing rates that have been analyzed so far, are there any differences in the temporal profiles of neural responses between the groups (sustained versus transient responses, rates of adaptation, latency)? If the temporal dynamics of the responses are altered significantly, that could be part of an explanation for the altered temporal tuning.

      This is a great topic for future studies. Unfortunately, with drifting gratings, it is difficult to establish these properties, which could be better assessed with standing or square-wave-modulated gratings or other stimuli. We did not run standing gratings in our battery of stimuli for this initial study.

      (4) It would be beneficial for the general interpretation of the results to extend the discussion. First, it would be useful to provide a more detailed discussion of what type of visual information might make it through the closed eyelids (the natural state), in contrast to the structured information available through open eyes. Second, it would be useful to highlight more clearly that these data were collected in peripheral V1 by discussing what might be expected in binocular, more central V1 regions. Third, it would be interesting to discuss the observed changes in firing rates in the context of the development of inhibitory neurons in V1 (which still undergo significant changes through the time period of premature visual experience chosen here).

      Thank you, good ideas. Let’s take these three suggestions in turn.

      First, in the discussion, we added a subsection “Biology of early development in mustelids” that focuses on the developmental conditions of wild and laboratory animals:

      In the wild, mustelids raise their young in nests in the ground, in cavities such as holes in trees or caves, or in areas of dense vegetation (Ruggiero et al. 1994). They may move the young from one nest to another as they grow, but otherwise the young are primarily in the relatively dark nest. It is highly likely that some light penetrates and that information about the 24-hour cycle is available, but the light is likely to be dim and unlikely to provide a basis for high luminance, high contrast stimulation through the closed lids. The animals begin to spend substantial time outside the nest after eye opening.

      The ferret is a domesticated strain of the European polecat. In laboratory settings, ferret jills give birth and keep their kits in a nest box. A laboratory typically maintains a 24-hour cycle with 12 or 14 hours of light, and the light reaching the closed lids must first pass through the cage, the nest box, and the nesting material. Therefore, developing ferrets have an obvious circadian light signal but the light available for image formation is likely dim and of low contrast.

      Although the light that reaches the closed lids in developing ferrets is likely to be relatively dim, and any image-forming signal passing through the closed lids would be highly filtered in luminance, spatial frequency, and contrast, it is important to remember that visual input before natural eye opening (through the closed lids) can drive activity in retina, LGN, and cortex (Huttenlocher 1967, Chapman and Stryker 1993, Krug et al., 2001, Akerman et al., 2002, Akerman et al., 2004). Further, orientation selectivity can be observed through the closed lids (Krug et al., 2001), indicating that some coarse image-forming information does make it through the closed lids.

      Second, we added text speculating about binocular cortex (lines 492 - 500): … our recordings were performed in monocular cortex so that we could be sure of the developmental condition of the eye that drove the classic responses. It is interesting to speculate about what might occur more centrally in binocular visual cortex. Ocular dominance shifts are not induced when one eye is opened prematurely (Issa et al 1999), indicating that ocular dominance plasticity is not engaged at this early stage, but one might imagine that the impacts on temporal frequency and spontaneous firing rates would still be present.

      Third, on inhibition, we added a paragraph (lines 502 - 509):

      We introduced premature patterned vision at a time when cortical inhibition is undergoing substantial changes. GABAergic signaling has already undergone its switch (Ben-Ari, 2002) from providing primarily depolarizing input to hyperpolarizing input by P21-23 (Mulholland et al., 2021). In the days prior to eye opening, inhibitory cells exhibit activity that is closely associated with the emerging functional modules that will reflect orientation columns (Mulholland et al., 2021), but do not yet exhibit selectivity to orientation, in contrast to excitatory neurons, which do exhibit selectivity to orientation at that time (Chang and Fitzpatrick, 2022).

      (5) In the methods section, the statement 'actively kept in nesting box' is unclear. Presumably this means that the jill prevents the kits from leaving the nesting box? It also would be worth at least mentioning in this context that there obviously are still visual events in the nesting box too.

      Thanks. We improved this description (lines 118 - 121): Ferret kits in laboratory housing receive limited visual stimulation through their closed lids, as the mother actively keeps the kits in their relatively dark nest. In order to ensure that animals with early-opened eyes actually had patterned visual experience (and animals with closed lids had the same stimulation filtered through the lids), animals were brought to the lab for 2 hours a day for 4 consecutive days beginning at P25.

      (6) The stimulus presentation could be more clearly described. Is every stimulus presented in an individual trial (surrounded by periods with a blank screen), or are all stimuli shown as a continuous sequence? The description of the parameter screening is also potentially confusing ('orientation was co-varied with stimuli consisting of drifting gratings at different spatial frequencies' sounds as if there are separate stimuli for orientation; might be better to say something like 'in the first set, orientation, spatial frequency, ... were covaried...')

      Yes, thank you, we fixed this (lines 184 - 201). We deleted the text indicated and added a sentence “Each individual grating stimulus was full screen and had a single set of parameters (direction, spatial frequency, temporal frequency), and was separated from the other stimuli by a gray screen interstimulus interval.”. We also deleted a repetition of 100% contrast in the description of the second set.

      (7) Description of low-pass index is unclear. What is the 'largest temporal frequency response observed'? The maximum response or the response to the largest temporal frequency tested?

      Thanks. We added a paragraph at line 236:

      We defined a low pass index as the ratio of the response to the lowest temporal frequency tested (in this case 0.5 Hz) to the maximum response obtained to the set of temporal frequencies shown. LPI = R(TF=0.5 Hz)/max(R(TF=0.5 Hz), R(TF=1 Hz), …, R(TF=32 Hz)). If a cell exhibited the highest firing for a temporal frequency of 0.5 Hz, then it would have a low pass index of 1. If it exhibited a similar firing rate in response to a temporal frequency of 0.5 Hz even if the preferred temporal frequency were higher, then the low pass index would still be near 1. If the cell responded poorly at a temporal frequency of 0.5 Hz, then it would have a low pass index near 0.
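      The definition above translates directly into a short computation. The following is a minimal sketch, assuming that responses are stored as a mapping from temporal frequency (Hz) to mean response and that the peak response is positive; the function name and the example tuning curves are hypothetical.

```python
from typing import Dict


def low_pass_index(responses: Dict[float, float]) -> float:
    """LPI = response at the lowest temporal frequency / maximum response.

    `responses` maps each tested temporal frequency in Hz to the cell's
    mean response. Assumption: the maximum response is positive.
    """
    if not responses:
        raise ValueError("no responses given")
    lowest_tf = min(responses)        # e.g. 0.5 Hz in the study
    peak = max(responses.values())    # max over all tested frequencies
    if peak <= 0:
        raise ValueError("maximum response must be positive")
    return responses[lowest_tf] / peak


# Hypothetical tuning curves illustrating the three cases in the text.
low_pass_cell = {0.5: 10.0, 1: 8.0, 2: 5.0, 4: 3.0, 8: 2.0, 16: 1.0, 32: 0.5}
broad_cell = {0.5: 19.0, 1: 19.5, 2: 19.0, 4: 20.0, 8: 15.0, 16: 8.0, 32: 2.0}
band_pass_cell = {0.5: 2.0, 1: 4.0, 2: 8.0, 4: 20.0, 8: 10.0, 16: 4.0, 32: 1.0}
```

      With these made-up curves, the low-pass cell has an index of 1, the broadly tuned cell an index near 1 even though it prefers 4 Hz, and the band-pass cell an index near 0.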

      (8) The discussion should also cite the results of strobe-reared cats by Pasternak et al (1981 and 1985).

      Thank you for pointing out the omission. We now write (lines 430-435): Cats raised in a strobe-light environment (mostly after eye opening) exhibited strong changes in subsequent direction selectivity (Kennedy and Orban 1983; Humphrey and Saul 1998) and behavioral sensitivity to motion (Pasternak et al., 1981; Pasternak et al., 1985) that partially recovers with motion detection training. However, temporal frequency tuning of these animals has not been reported in detail. Pasternak et al (1981) reported that strobe-reared ferrets exhibited greater difficulty in distinguishing slow moving stimuli from static stimuli compared to controls, an ability that slightly improved with practice, suggesting possible temporal frequency deficits.

      (9) Finally, it would be useful to include a mention of the early development of MT in marmosets in the discussion of impacts of prematurity on motion vision (Bourne & Rosa 2006).

      Yes, thank you. We cited Bourne & Rosa and also Lempel and Nielsen (for ferret PSS). (Lines 492-501):

      Several other basic mechanistic questions remain unanswered. It is unclear where in the visual circuit cascade these deficits first arise. Does the lateral geniculate nucleus or retina exhibit altered temporal frequency tuning? Is the influence of the patterned visual stimulation instructive, so that if one provided premature stimulation with only certain temporal frequencies, one would see selectivity for those temporal frequencies, or would tuning always be broad? Other questions remain concerning the top-down influence on V1 from “higher” motion areas such as MT (monkeys) or PSS (ferret); MT exhibits mature neural markers earlier than V1 (Bourne and Rosa, 2006), and suppression of PSS impacts motion selectivity in V1 (Lempel and Nielsen, 2021). Future studies will be needed to address these questions.

    2. eLife Assessment

      This carefully conducted study aims to understand how the early visual experience of premature infants induces lasting deficits, including compromised motion processing. The authors address this important question in a ferret animal model, exposing the developing visual system prematurely to patterned visual input by opening one or both eyes at a time when both retinal waves and light traveling through closed lids can drive sensory responses. Convincing evidence is presented, suggesting that eye opening at this time impacts temporal frequency tuning and elevates spontaneous firing rates. These findings will have great relevance for neuroscientists studying visual system development, particularly in the context of premature birth.

    3. Reviewer #1 (Public review):

      The authors note that very premature infants experience the visual world early and, as a consequence, sustain lasting deficits including compromised motion processing. Here they investigate the effects of early eye opening in ferret, choosing a time point after birth when both retinal waves and light traveling through closed lids drive sensory responses. The laboratory has long experience in quantitative studies of visual response properties across development and this study reflects their expertise.

      The investigators find little or no difference in mean orientation and direction selectivity, or in spatial frequency tuning, as a result of early eye opening but marked differences in temporal frequency tuning. These changes are especially interesting as they relate to deficits seen in prematurely delivered children. Temporal frequency bandwidth for responses evoked from early-opened contralateral eyes were broader than for controls; this is the case for animals in which either one or both eyes were opened prematurely. Further, when only one eye was opened early, responses to low temporal frequencies were relatively stronger.

      The investigators also found changes in firing rate and sign of response to visual stimuli. Premature eye-opening increased spontaneous rates in all test configurations. When only one eye was opened early, firing rates recorded from the ipsilateral cortex were strongly suppressed, with more modest effects in other test cases.

      As the authors' discussion notes, these observations are just a starting point for studies of the underlying mechanisms. The experiments are so difficult to perform and so carefully described that the results will be foundational for future studies of how premature birth influences cortical development.

    4. Reviewer #2 (Public review):

      In this paper, Griswold and Van Hooser investigate what happens if animals are exposed to patterned visual experience too early, before its natural onset. To this end, they make use of the benefits of the ferret as a well-established animal model for visual development. Ferrets naturally open their eyes around postnatal day 30; here, Griswold and Van Hooser opened either one or both eyes prematurely. Subsequent recordings in the mature primary visual cortex show that while some tuning properties like orientation and direction selectivity developed normally, the premature visual exposure triggered changes in temporal frequency tuning and overall firing rates. These changes were widespread, in that they occurred even for neurons responding to the eye that was not opened prematurely. These results demonstrate that the nature of the visual input well before eye opening can have profound consequences on the developing visual system.

      The conclusions of this paper are well supported by the data, but in the initially submitted version of the paper, there were a few questions regarding the data processing and suggestions for the discussion:

      (1) The assessment of the tuning properties is based on fits to the data. Presumably, neurons for which the fits were poor were excluded? It would be useful to know what the criteria were, how many neurons were excluded, and whether there was a significant difference between the groups in the numbers of neurons excluded (which could further point to differences between the groups).

      (2) For the temporal frequency data, low- and high-frequency cut-offs are defined, but then only used for the computation of the bandwidth. Given that the responses to low temporal frequencies change profoundly with premature eye opening, it would be useful to directly compare the low- and high-frequency cut-offs between groups, in addition to the index that is currently used.

      (3) In addition to the tuning functions and firing rates that have been analyzed so far, are there any differences in the temporal profiles of neural responses between the groups (sustained versus transient responses, rates of adaptation, latency)? If the temporal dynamics of the responses are altered significantly, that could be part of an explanation for the altered temporal tuning.

      (4) It would be beneficial for the general interpretation of the results to extend the discussion. First, it would be useful to provide a more detailed discussion of what type of visual information might make it through the closed eyelids (the natural state), in contrast to the structured information available through open eyes. Second, it would be useful to highlight more clearly that these data were collected in peripheral V1 by discussing what might be expected in binocular, more central V1 regions. Third, it would be interesting to discuss the observed changes in firing rates in the context of the development of inhibitory neurons in V1 (which still undergo significant changes through the time period of premature visual experience chosen here).

    1. eLife Assessment

      In this valuable study, the authors show the physiological response and molecular pathway mediating the effect of quinofumelin, a developed fungicide with an unknown mechanism. The authors present convincing data suggesting the involvement of the uridine/uracil biosynthesis pathway, by combining in vivo microbiology characterization as well as in vitro biochemical binding results.

    2. Reviewer #2 (Public review):

      Summary:

      In the current study, the authors aim to identify the mode of action/molecular mechanism of a characterized fungicide, quinofumelin, and its biological impact on transcriptomics and metabolomics in Fusarium graminearum and other Fusarium species. Two sets of data were generated comparing the quinofumelin-treated and untreated groups, and differentially abundant transcripts and metabolites were identified, suggesting a potential role of pyrimidine biosynthesis. By studying genetic mutants of the uridine/uracil biosynthesis pathway under quinofumelin treatment and metabolite supplementation, combined with an in vitro biochemical assay of quinofumelin and the F. graminearum dihydroorotate dehydrogenase protein, the authors identified that quinofumelin inhibits dihydroorotate dehydrogenase and blocks downstream metabolite biosynthesis, limiting fungal metabolism and growth.

      Strengths:

      Omics datasets were leveraged to understand the physiological impact of quinofumelin, showing the intracellular impact of the fungicide. The characterization of FgDHODHII deletion strains with supplemented metabolites clearly showed the impact of the enzyme on fungal growth. Corroborating in vitro and in vivo data revealed the direct interaction of quinofumelin with Fusarium protein target.

      Potential Impact:

      Understanding this new mechanism could facilitate rational design or screen for molecules targeting the same pathway, or improve binding affinity and inhibitor potency. Confirming the target of quinofumelin may also help understand its resistance mechanism, and further development of other inhibitory molecules against the target.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript shows the mechanism of action of quinofumelin, a novel fungicide, against the fungus Fusarium graminearum. Through omics analysis, phenotypic analysis and in silico approaches, the role of quinofumelin in targeting DHODH is uncovered.

      Strengths:

      The phenotypic analysis and mutant generation are nice data and add to the role of metabolites in bypassing pyrimidine biosynthesis.

      Weaknesses:

      The role of DHODH in this class of fungicides has been known and this data does not add any further significance to the field.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Phytopathogens, including fungal pathogens such as F. graminearum, remain a major threat to agriculture and food security. Several agriculturally relevant fungicides, including the potent Quinofumelin, have been discovered to date, yet the mechanisms of their action and specific targets within the cell remain unclear. This paper sets out to contribute to addressing these outstanding questions.

      We appreciate the reviewer's accurate summary of our manuscript.

      Strengths:

      The paper is generally well-written and provides convincing data to support their claims for the impact of Quinofumelin on fungal growth, the target of the drug, and the potential mechanism. Critically the authors identify an important pyrimidine pathway dihydroorotate dehydrogenase (DHODH) gene FgDHODHII in the pathway or mechanism of the drug from the prominent plant pathogen F. graminearum, confirming it as the target for Quinofumelin. The evidence is supported by transcriptomic, metabolomic as well as MST, SPR, molecular docking/structural biology analyses.

      We appreciate the reviewer's recognition of the strengths of our manuscript.

      Weaknesses:

      Whilst the study adds to our knowledge about this drug, it is worth stating that a previous report (although in different organisms) by Higashimura et al., 2022 https://pmc.ncbi.nlm.nih.gov/articles/PMC9716045/ had already identified DHODH as the target for Quinofumelin, and hence this knowledge is not new. The authors may therefore want to tone down the claim that they discovered this mechanism and also give sufficient credit to the previous authors' work at the start of the write-up, in the introduction section, rather than in passing as they did with reference 25. Other specific recommendations to improve the text are provided in the recommendations for authors section below.

      We appreciate the reviewer's suggestion. In the revised manuscript, we have incorporated the reference in the introduction section and expanded the discussion of previous work on quinofumelin by Higashimura et al., 2022 in the discussion section to more effectively contextualize their contributions. Moreover, we have made revisions and provided responses in accordance with the recommendations.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors aim to identify the mode of action/molecular mechanism of a characterized fungicide, quinofumelin, and its biological impact on transcriptomics and metabolomics in Fusarium graminearum and other Fusarium species. Two sets of data were generated comparing the quinofumelin-treated and untreated groups, and differentially abundant transcripts and metabolites were identified. The authors further focused on the uridine/uracil biosynthesis pathway, considering the significant up- and down-regulation observed in final metabolites and some of the genes in the pathway. Using a deletion mutant of one of the genes and in vitro biochemical assays, the authors concluded that quinofumelin binds to the dihydroorotate dehydrogenase.

      We appreciate the reviewer's accurate summary of our manuscript.

      Strengths:

      Omics datasets were leveraged to understand the physiological impact of quinofumelin, showing the intracellular impact of the fungicide. The characterization of FgDHODHII deletion strains with supplemented metabolites clearly showed the impact of the enzyme on fungal growth.

      We appreciate the reviewer's recognition of the strengths of our manuscript.

      Weaknesses:

      Some interpretation of results is not accurate and some experiments lack controls. The comparison between quinofumelin-treated deletion strains, in the presence of different metabolites didn't suggest the fungicide is FgDHODHII specific. A wild type is required in this experiment.

      Potential Impact: Confirming the target of quinofumelin may help in understanding its resistance mechanism and in the further development of other inhibitory molecules against the target.

      The manuscript would benefit more in explaining the study rationale if more background on previous characterization of this fungicide on Fusarium is given.

      We appreciate the reviewer's suggestion. Without quinofumelin treatment, mycelial growth remains normal and does not require restoration. Under quinofumelin treatment, supplementation with downstream metabolites of the de novo pyrimidine biosynthesis pathway can restore the mycelial growth that is inhibited by quinofumelin. The wild-type control group is illustrated in Figure 4. Figure 5b depicts the phenotypes of the deletion mutants. With respect to the relationship among quinofumelin, FgDHODHII, and other metabolites, quinofumelin specifically targets the key enzyme FgDHODHII in the de novo pyrimidine biosynthesis pathway, disrupting the conversion of dihydroorotate to orotate, which consequently inhibits the synthesis of downstream metabolites including uracil. In our previous study, quinofumelin not only exhibited excellent antifungal activity against the mycelial growth and spore germination of F. graminearum, but also inhibited the biosynthesis of deoxynivalenol (DON). We have added this part to the introduction section.

      Reviewer #3 (Public review):

      Summary:

      The manuscript shows the mechanism of action of quinofumelin, a novel fungicide, against the fungus Fusarium graminearum. Through omics analysis, phenotypic analysis, and in silico approaches, the role of quinofumelin in targeting DHODH is uncovered.

      We appreciate the reviewer's accurate summary of our manuscript.

      Strengths:

      The phenotypic analysis and mutant generation are nice data and add to the role of metabolites in bypassing pyrimidine biosynthesis.

      We appreciate the reviewer's recognition of the strengths of our manuscript.

      Weaknesses:

      The role of DHODH in this class of fungicides has been known and this data does not add any further significance to the field. The work of Higashimura et al is not appreciated well enough as they already showed the role of quinofumelin upon DHODH II.

      There is no mention of the other fungicide within this class ipflufenoquin, as there is ample data on this molecule.

      We sincerely appreciate the reviewer's insightful comment regarding the work of Higashimura et al. We agree that their investigation into the role of quinofumelin in DHODH II inhibition provides critical foundational insights for this field. In the revised manuscript, we have incorporated the reference in the introduction section and expanded the discussion of their work in the discussion section to more effectively contextualize their contributions. Information on the mechanism of action of ipflufenoquin against filamentous fungi has been added to the discussion section.

      Reviewer #1 (Recommendations for the authors):

      (1) Given that the DHODH gene had been identified as a target earlier, could the authors perform blast experiments with this gene instead and let us know the percentage similarity between the FgDHODHII gene and the Pyricularia oryzae class II DHODH gene in the report by Higashimura et al., 2022.

      A BLAST search revealed that the percentage similarity between the FgDHODHII gene and the class II DHODH gene of P. oryzae was 55.41%. We have added the description ‘Additionally, the amino acid sequence of the FgDHODHII exhibits 55.41% similarity to that of DHODHII from Pyricularia oryzae, as previously reported (Higashimura et al., 2022)’ in the Results section.

      (2) Abstract:

      The authors started abbreviating new terms e.g. DEG, DMP, etc but then all of a sudden stopped and introduced UMP with no full meaning of the abbreviation. Please give the full meaning of all abbreviations in the text, UMP, STC, RM, etc.

      We have provided the full meaning for all abbreviations as requested.

      (3) Introduction section:

      The introduction says very little about the work of other groups on quinofumelin. Perhaps add this information and reference it, including the work of Higashimura et al., 2022, which is quite significant for this topic but is not even mentioned in the background.

      We have added the work of other groups on quinofumelin to the Introduction section.

      (4) General statements:

      Please show a model of the pyrimidine pathway that quinofumelin attacks to make it easier for the reader to understand the context. The authors could adapt this from KEGG.

      We have added the model (Fig. 7).

      (5) Line 186:

      The authors did a great job of demonstrating interactions with quinofumelin and went to lengths to perform MST, SPR, molecular docking, and structural biology analyses, yet in the end provide no details about the specific amino acid residues involved in the interaction. I would suggest that site-directed mutagenesis studies be performed on FgDHODHII to identify specific amino acid residues that interact with quinofumelin and to show that their disruption weakens the interaction of quinofumelin with FgDHODHII.

      Thank you for this insightful suggestion. We fully agree with the importance of elucidating the interaction mechanism. At present, we are conducting site-directed mutagenesis studies based on interaction sites from docking results and the mutation sites of FgDHODHII from the resistant mutants; however, due to the limitations in the accuracy of existing predictive models, this work remains ongoing. Additionally, we are undertaking co-crystallization experiments of FgDHODHII with quinofumelin to directly and precisely reveal their interaction pattern.

      (6) Line 76:

      What is the reference or evidence for the statement 'In addition, quinofumelin exhibits no cross-resistance to currently extensively used fungicides, indicating its unique action target against phytopathogenic fungi.'?

      If two fungicides share the same mechanism of action, they will exhibit cross resistance. Previous studies have demonstrated that quinofumelin retains effective antifungal activity against fungal strains resistant to commercial fungicides, indicating that quinofumelin does not exhibit cross-resistance with other commercially available fungicides and possesses a novel mechanism of action. Additionally, we have added the relevant inference.

      (7) Line 80-82:

      Again, considering the work of previous authors, this target is not newly discovered. Please consider toning down this statement 'This newly discovered selective target for antimicrobial agents provides a valuable resource for the design and development of targeted pesticides.'

      We have rewritten the description of this sentence.

      (8) Line 138: If the authors have identified DHODH in experimental groups (I assume in F. graminearum), what was the exact locus tag or gene name in F. graminearum? Why not simply continue with the gene you identified? What is the point of running BLAST again to find the DHODH gene if it already came up in your transcriptomic or metabolomic studies? As written this does not make sense, but it could be explained better.

      The information of FgDHODHII (gene ID: FGSG_09678) has been added. We have revised this part.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 40:

      Please add a reference.

      We have added the reference.

      (2) Line 47:

      Please add a reference.

      We have added the reference.

      (3) Line 50:

      The lack of target diversity in existing fungicides doesn't necessarily serve as a reason for discovering new targets being more challenging than identifying new fungicides within existing categories, please consider adjusting the argument here. Instead, the authors can consider reasons for the lack of new targets in the field.

      We have revised the description.

      (4) Line 63:

      Please cite your source with the new technology.

      We have added the reference.

      (5) Line 68:

      What are you referring to for "targeted medicine", do you have a reference?

      We have revised the description and the reference.

      (6) Line 74:

      One of the papers referred to "quinoxyfen", what are the similarities and differences between the two? Please elaborate for the readership.

      Quinoxyfen, like quinofumelin, contains a quinoline ring structure. It inhibits mycelial growth by disrupting the MAP kinase signaling pathway in fungi (https://www.frac.info). In addition, quinoxyfen still exhibits excellent antifungal activity against the quinofumelin-resistant mutants (findings from our group), indicating that the mechanisms of action of quinofumelin and quinoxyfen differ.

      (7) Line 84:

      Please introduce why RNA-Seq was designed in the study first. What were the groups compared? How was the experiment set up? Without this background, it is hard to know why and how you did the experiment.

      According to your suggestions, we have added the description in the Results section. In addition, the experimental process was described in the Materials and methods section as follows: A total of 20 mL of YEPD medium containing 1 mL of conidia suspension (1×10⁵ conidia/mL) was incubated with shaking (175 rpm) at 25°C. After 24 h, quinofumelin was added to the medium at a concentration of 1 μg/mL, while an equal amount of dimethyl sulfoxide was added as the control (CK). The incubation continued for another 48 h, followed by filtration and collection of hyphae. Gene expression was then quantified, and the differences between groups were analyzed with DESeq2 based on the quantified expression.

      (8) Figures:

      The figure labeling is missing (Figures 1, 2, 3, etc.). Please re-order your figures to match the text.

      The figures have been inserted.

      (9) Line 97:

      "Volcano plot" is a common plot to visualize DEGs, you can directly refer to the name.

      We have revised the description.

      (10) Figure 1d, 1e:

      Can you separate down- and up-regulated genes here? Does the count refer to gene number?

      The expression information for down- and up-regulated genes is presented in Figure 1a and 1b. However, these bubble plots do not distinguish down- and up-regulated genes. Instead, they only display the significant enrichment of differentially expressed genes in specific metabolic pathways. To more clearly represent the data, we have added the detailed counts of down- and up-regulated genes for each metabolic pathway in Supplementary Table S1 and S2. Here, the term "count" refers to differentially expressed genes that fall within a certain pathway.

      (11) Line 111:

      Again, no reasoning or description of why and how the experiment was done here.

      Based on the results of KEGG enrichment analysis, DEMs are associated with pathways such as thiamine metabolism, tryptophan metabolism, nitrogen metabolism, amino acid sugar and nucleotide sugar metabolism, pantothenic acid and CoA biosynthesis, and nucleotide sugar production compounds synthesis. To specifically investigate the metabolic pathways involved in the mechanism of action of quinofumelin, we performed further metabolomic experiments. We have added this description according to the reviewer’s suggestions.

      (12) Figure 2a:

      It seems many more metabolites were reduced than increased. Is this expected? Due to the antifungal activity of this compound, how sick is the fungus upon treatment? A physiological study on F. graminearum (in a dose-dependent manner) should be done prior to the omics study. Why do you think there's a stark difference between positive and negative modes in terms of number of metabolites down- and up-regulated?

      Quinofumelin demonstrates exceptional antifungal activity against Fusarium graminearum. The results indicate that the number of reduced metabolites significantly exceeds the number of increased metabolites upon quinofumelin treatment. Mycelial growth is markedly inhibited under quinofumelin exposure. Prior to conducting omics studies, we performed a series of physiological and biochemical experiments (refer to Qian Xiu's dissertation https://paper.njau.edu.cn/openfile?dbid=72&objid=50_49_57_56_49_49&flag=free). Upon quinofumelin treatment, the number of down-regulated metabolites notably surpasses that of up-regulated metabolites compared to the control group. Based on the findings from the down-regulated metabolites, we conducted experiments by exogenously supplementing these metabolites under quinofumelin treatment to investigate whether mycelial growth could be restored. The results revealed that only the exogenous addition of uracil can restore mycelial growth impaired by quinofumelin.

      Quinofumelin exhibits excellent antifungal activity against F. graminearum. At a concentration of 1 μg/mL, quinofumelin inhibits mycelial growth by up to 90%. This inhibitory effect indicates that the vital activities of F. graminearum are significantly disrupted by quinofumelin. Consequently, there is a marked difference in down- and up-regulated metabolites between the quinofumelin-treated group and the untreated control group. The detailed results are presented in Figures 1 and 2.

      (13) Figure 2e:

      This is a good analysis. To help represent the data more clearly, the authors can consider representing the expression using fold change with a p-value for each gene.

      To more clearly represent the data, we have incorporated the information on significant differences in metabolites in the de novo pyrimidine biosynthesis pathway, as affected by quinofumelin, in accordance with the reviewer’s suggestions.

      (14) Line 142:

      Please indicate fold change and p-value for statistical significance. Did you validate this by RT-qPCR?

      We validated the expression level of the DHODH gene under quinofumelin treatment using RT-qPCR. The results indicated that, upon treatment with the EC50 and EC90 concentrations of quinofumelin, the expression of the DHODH gene was significantly reduced by 11.91% and 33.77%, respectively (P<0.05). The corresponding results have been shown in Figure S4.
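      Since the reviewer asked for fold changes, the percentage reductions quoted in this response can be restated as fold changes and log2 fold changes. The sketch below is plain arithmetic on the two reported percentages; it assumes they are expressed relative to the untreated control, which the response does not state explicitly, and it does not reproduce the authors' RT-qPCR quantification. The function names are illustrative only.

```python
import math

def percent_reduction_to_fold_change(percent_reduction):
    """Convert a percentage reduction (relative to control) into a fold change."""
    return 1.0 - percent_reduction / 100.0

def log2_fold_change(fold_change):
    """log2 of the fold change; negative values mean down-regulation."""
    return math.log2(fold_change)

# Percentage reductions quoted in the response for the two quinofumelin doses.
reported = {"EC50": 11.91, "EC90": 33.77}

for dose, pct in reported.items():
    fc = percent_reduction_to_fold_change(pct)
    print(f"{dose}: fold change = {fc:.4f}, log2FC = {log2_fold_change(fc):.3f}")
```

      Both log2 fold changes are negative and smaller than 1 in magnitude, which is one way of seeing that the reported down-regulation is statistically significant but modest in size.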

      (15) Line 145:

      It looks like uracil is the only metabolite differentially abundant in the samples - how did you conclude this whole pathway was impacted by the treatment?

      The experiments involving the exogenous supplementation of uracil revealed that the addition of uracil could restore mycelial growth inhibited by quinofumelin. Consequently, we infer that quinofumelin disrupts the de novo pyrimidine biosynthesis pathway. In addition, as uracil is the end product of the de novo pyrimidine biosynthesis pathway, the disruption of this pathway results in a reduction in uracil levels.

      (16) Figure 3:

      What sequence was used as the root of the tree? Why were the species chosen? Since the BLAST query was Homo sapiens sequence, would it be good to use that as the root?

      The FgDHODHII sequence was used as the root of the tree. The selected fungal species represent significant plant-pathogenic fungi in agricultural production. According to your suggestion, we have removed the Homo sapiens BLAST query sequence from Figure 3.

      (17) Figure 4:

      How were the concentrations used to test chosen?

      Prior to this experiment, we carried out concentration-dependent exogenous supplementation experiments. The results indicated that 50 μg/mL of uracil can fully restore mycelial growth inhibited by quinofumelin. Consequently, we chose 50 μg/mL as the testing concentration.

      (18) Line 164:

      Why do you hypothesize supplementing dihydroorotate would restore resistance? The metabolite seemed accumulated in the treatment condition, whereas downstream metabolites were comparable or even depleted. The DHODH gene expression was suppressed. Would accumulation of dihydroorotate be associated with growth inhibition by quinofumelin? Please include the hypothesis and rationale for the experimental setup.

      DHODH catalyzes the conversion of dihydroorotate to orotate in the de novo pyrimidine biosynthesis pathway. The inhibition of DHODH by quinofumelin results in the accumulation of dihydroorotate and the depletion of the downstream metabolites, including UMP, uridine and uracil. Consequently, downstream metabolites were used as positive controls, while the upstream metabolite dihydroorotate served as a negative control. This design further demonstrates that DHODH is the target of quinofumelin in F. graminearum. In addition, the accumulation of dihydroorotate is not associated with growth inhibition by quinofumelin; rather, the depletion of downstream metabolites in the de novo pyrimidine biosynthesis pathway is closely associated with growth inhibition by quinofumelin.

      (19) Line 168:

      I'm not sure if this conclusion is valid from your results in Figure 4 showing which metabolites restore growth.

      To minimize the potential influence of strain-specific effects, five strains were tested in the experiments shown in Figure 4. For each strain, the first row (first column) corresponds to the control condition, while the second row (first column) represents treatment with 1 μg/mL of quinofumelin, which completely inhibits mycelial growth. The second row (second column) for each strain shows that supplementation with 50 μg/mL of dihydroorotate fails to restore mycelial growth inhibited by quinofumelin. In contrast, the second row (third, fourth, and fifth columns) for each strain demonstrates that supplementation with 50 μg/mL of UMP, uridine, and uracil, respectively, effectively restores mycelial growth inhibited by quinofumelin.

      (20) Figure 5a:

      The fact you saw growth of the deletion mutant means it's not lethal. However, the growth was severely inhibited.

      Our experimental results indicate that the growth of the deletion mutant is lethal. The mycelial growth observed originates from mycelial plugs that were not exposed to quinofumelin, rather than from the plates amended with quinofumelin.

      (21) Figure 5b:

      Would you expect different restoration of growth in the presence of quinofumelin vs. no treatment? The wild type control is missing here. Any conclusions about the relationship between quinofumelin, FgDHODHII, and other metabolites in the pathway?

      Without quinofumelin treatment, mycelial growth remains normal and does not require restoration. Under quinofumelin treatment, supplementation with downstream metabolites of the de novo pyrimidine biosynthesis pathway can restore the mycelial growth that is inhibited by quinofumelin. The wild-type control group is illustrated in Figure 4. Figure 5b depicts the phenotypes of the deletion mutants. With respect to the relationship among quinofumelin, FgDHODHII, and other metabolites, quinofumelin specifically targets the key enzyme FgDHODHII in the de novo pyrimidine biosynthesis pathway, disrupting the conversion of dihydroorotate to orotate, which consequently inhibits the synthesis of downstream metabolites including uracil.

      (22) Figure 6b:

      Lacking positive and negative controls (known binder and non-binder). What does the Kd (in comparison to other interactions) indicate in terms of binding strength?

      We tested the antifungal activities of publicly reported DHODH inhibitors (such as leflunomide and teriflunomide) against F. graminearum. The results showed that these inhibitors exhibited no significant inhibitory effects against the strain PH-1. Therefore, we lacked an effective chemical for use as a positive control in subsequent experiments. Biacore experiments offer detailed insights into the molecular interactions between quinofumelin and DHODHII. As shown in Figure 6b, the left panel illustrates the time-dependent kinetic curve of quinofumelin binding to DHODHII. Within the first 60 s after quinofumelin was introduced onto the DHODHII surface, it bound to the immobilized DHODHII on the chip surface, with the response value increasing with the quinofumelin concentration. Following cessation of the injection at 60 s, quinofumelin spontaneously dissociated from the DHODHII surface, leading to a corresponding decrease in the response value. The data fitting curve presented in the right panel indicates that the affinity constant KD of quinofumelin for DHODHII is 6.606×10⁻⁶ M, which falls within the typical range of KD values (10⁻³ to 10⁻⁶ M) for protein-small molecule interactions. A lower KD value indicates a stronger affinity; thus, quinofumelin exhibits strong binding affinity towards DHODHII.
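      To make the reviewer's question about binding strength more concrete, the reported KD can be converted into an equilibrium occupancy and a binding free energy. The following is a minimal sketch, assuming a simple 1:1 binding model and a temperature of 25 °C; neither assumption is stated in the response, and the ligand concentrations in the loop are illustrative rather than taken from the experiment.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def fraction_bound(ligand_molar, kd_molar):
    """Equilibrium fraction of protein occupied under an assumed 1:1 binding model."""
    return ligand_molar / (ligand_molar + kd_molar)

def binding_free_energy_kj(kd_molar, temperature_k=298.15):
    """Standard binding free energy dG = RT ln(KD) in kJ/mol (1 M standard state)."""
    return R * temperature_k * math.log(kd_molar) / 1000.0

KD = 6.606e-6  # M, the value reported in the response for quinofumelin and DHODHII

# Illustrative free-ligand concentrations (assumed, not from the source).
for ligand in (1e-6, KD, 1e-4):
    print(f"[L] = {ligand:.3e} M -> fraction bound = {fraction_bound(ligand, KD):.2f}")

print(f"dG = {binding_free_energy_kj(KD):.1f} kJ/mol")
```

      Under these assumptions, half of the protein is occupied when the free ligand concentration equals KD, so a micromolar KD places half-saturation at micromolar ligand concentrations.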

      Reviewer #3 (Recommendations for the authors):

      The authors should add information about the other molecule within this class, ipflufenoquin, and what is known about it. There are already published data on its mode of action on DHODH and the role of pyrimidine biosynthesis.

      We have added information on the mechanism of action of ipflufenoquin against filamentous fungi to the discussion section.

      The work of Higashimura et al is not appreciated well enough as they already showed the role of quinofumelin upon DHODH II.

      We sincerely appreciate the reviewer's insightful comment regarding the work of Higashimura et al. We agree that their investigation into the role of quinofumelin in DHODH II inhibition provides critical foundational insights for this field. In the revised manuscript, we have incorporated the reference in the introduction section and expanded the discussion of their work in the discussion section to more effectively contextualize their contributions.

      It is unclear how the protein model was established and this should be included. What species is the molecule from and how was it obtained? How are they different from Fusarium?

      The three-dimensional structural model of F. graminearum DHODHII protein, as predicted by AlphaFold, was obtained from the UniProt database. Additionally, a detailed description along with appropriate citations has been incorporated in the ‘Manuscript’ file.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      We thank the reviewer for the positive feedback on the work. The reviewer has raised two weaknesses and in the following we discuss how those can be addressed.  

      Weaknesses:

      The impact of the article is limited by using a network with discrete time steps, and only a small number of time steps from stimulus to reward. They assume that each time step is on the order of hundreds of ms. They justify this by pointing to some slow intrinsic mechanisms, but they do not implement these slow mechanisms in a network with short time steps; instead they assume without demonstration that these could work as suggested. This is a reasonable first approximation, but its validity should be explicitly tested.

      Our goal here was to give a proof of concept that online random feedback is sufficient to train an RNN to estimate value. Indeed, it is important to show that the idea works in a model where the slow mechanisms are explicitly implemented. However, this is a non-trivial task, and we leave it for future work.

      As the delay between cue and reward increases, the performance decreases. This is not surprising given the proposed mechanism, but it is still a limitation, especially given that we do not really know what a reasonable value of a single time step is.

      In reply to this comment and the other reviewer's related comment, we have conducted two sets of additional simulations, one examining the incorporation of eligibility traces, and the other considering (though not mechanistically implementing) behavioral time-scale synaptic plasticity (BTSP). We have added their results to the revised manuscript as an Appendix. We think that the results address this point to some extent, while how longer cue-reward delays can be learnt by elaborating the model remains an issue for future work.

      Reviewer #2 (Public Review):

      We thank the reviewer for the positive feedback on the work. The reviewer gave comments on our revisions, and here we discuss how those can be addressed.

      Comments on revisions: I would still want to see how well the network learns tasks with longer time delays (on the order of 100 or even 1000 timesteps). Previous work has shown that random feedback struggles to encode longer timescales (see Murray 2019, Figure 2), so I would be interested to see how that translates to the RL context in your model.

      We would like to note that in Murray 2019 the random feedback per se appeared not to be primarily responsible for the difficulty in encoding longer timescales. In Figure 2d (Murray 2019), the author compared his RFLO (random feedback local online) and BPTT with two intermediate algorithms, each of which incorporated one of the two approximations made in RFLO: i) random feedback instead of symmetric feedback, and ii) omission of the non-local effect (i.e., the dependence of the derivative of the loss with respect to a given weight on the other weights). The performance difference between RFLO and BPTT was actually mostly explained by ii), as the author mentioned: "The results show that the local approximation is essentially fully responsible for the performance difference between RFLO and BPTT, while there is no significant loss in performance due to the random feedback alone." (Lines 6-8, page 7 of Murray, 2019, eLife).

      Meanwhile, in our settings, a difference in performance between the model with random feedback and the model with symmetric feedback already appeared in the cases with 6 time steps or fewer (the biologically constrained model with random feedback performed worse: Fig. 6J, left).

      In practice, our model, either with random or symmetric feedback, would not be able to learn the cases with very long delays. This is indeed a limitation of our model. However, our model is critically different from the model of Murray 2019 in that we use RL rather than supervised learning and we use a scalar bootstrapped (TD) reward-prediction-error rather than the true output error. We would think that these differences may be major reasons for the limited learning ability of our model.

      Regarding the feasibility of the model when tasks involve longer time delays: indeed this is a problem, and the other reviewers have also raised the same point. Our model can be extended by incorporating either a kind of eligibility trace (similar to those contained in RFLO and e-prop) or behavioral time-scale synaptic plasticity (BTSP), and we have added the results of simulations incorporating each to the revised manuscript as an Appendix. However, how longer cue-reward delays can be learnt by elaborating the model remains an issue for future work.

      Reviewer #3 (Public Review):

      Comments on revisions: Thank you for addressing all my comments in your reply.

      We are happy to learn that all concerns raised by the reviewer in the previous round were addressed adequately. We agree with the reviewer that there are several ways the work can be improved.

      We hope to take up the various points raised by the reviewers as weaknesses in future work.

    2. eLife Assessment

      In this important study, the authors model reinforcement-learning experiments using a recurrent neural network. The work examines if the detailed credit assignment necessary for back-propagation through time can be replaced with random feedback. The authors provide solid evidence that the solution is adequate within relatively simple tasks.

    3. Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local, as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. In addition, these weights themselves change over learning. The main idea in this paper is to examine whether replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce similar results to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.

      This work shows that this random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are consistent with previous results on random feedback.
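      The contrast drawn in this summary can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it assumes a logistic-sigmoid RNN, a one-step update of the form rate x RPE x feedback x f'(post-synaptic activity) x pre-synaptic activity, and arbitrary values for the network size, learning rate, discount factor and reward. It computes the recurrent weight update once with the readout weights w as the feedback signal (the back-prop case) and once with fixed random weights c (the random-feedback case).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def td_error(reward, v_now, v_next, gamma=0.9):
    # Scalar reward-prediction error (RPE): delta = r + gamma * v(t+1) - v(t)
    return reward + gamma * v_next - v_now

def recurrent_update(delta, feedback, x_now, x_prev, lr=0.1):
    # dA[i][j] = lr * delta * feedback[i] * f'(x_i) * x_prev[j],
    # where f'(x_i) = x_i * (1 - x_i) for a logistic-sigmoid unit.
    return [[lr * delta * feedback[i] * x_now[i] * (1.0 - x_now[i]) * x_prev[j]
             for j in range(len(x_prev))]
            for i in range(len(x_now))]

def random_activity(n):
    # Hypothetical unit activities in (0, 1), for illustration only.
    return [sigmoid(random.gauss(0.0, 1.0)) for _ in range(n)]

random.seed(0)
n = 4  # hypothetical network size
w = [random.uniform(0.1, 1.0) for _ in range(n)]  # plastic value (readout) weights
c = [random.uniform(0.1, 1.0) for _ in range(n)]  # fixed random feedback weights
x_prev, x_now, x_next = random_activity(n), random_activity(n), random_activity(n)

delta = td_error(1.0, dot(w, x_now), dot(w, x_next))
dA_backprop = recurrent_update(delta, w, x_now, x_prev)  # feedback = readout weights
dA_random = recurrent_update(delta, c, x_now, x_prev)    # feedback = random weights

# With non-negative w and c, every entry of the two updates shares its sign.
same_sign = all(a * b >= 0.0
                for row_bp, row_rf in zip(dA_backprop, dA_random)
                for a, b in zip(row_bp, row_rf))
print("updates agree in sign:", same_sign)
```

      Because w and c are both drawn as positive numbers here, the two updates differ only in per-unit magnitude and never in sign, which gives some intuition for why non-negative constraints can help random feedback approximate the gradient-based update.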

      Strengths:

      The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

    4. Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent networks in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain non-negative constraints can enforce a "loose alignment" of feedback weights. The authors' results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to larger networks or more complicated tasks with long temporal delays (>100 timesteps), so it remains unclear to what degree these methods can scale or can be used more generally.

    5. Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) the value weights (readout) are non-negative, (2) the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) the feedback weights are non-negative, (4) the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to fully impair learning.
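
      To make this setting concrete, the following is a minimal sketch of one temporal-difference update in a small sigmoidal recurrent network with fixed random feedback. It is an illustration under stated assumptions, not the authors' code: the names (`A`, `w`, `c`), the learning rate, the discount factor, and the clipping used to keep the readout non-negative are all hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def td_update(A, w, c, x_prev, x, x_next, reward, lr=0.1, gamma=0.9):
    """One TD update with feedback alignment (all names are illustrative).

    A: recurrent weights (n, n); w: value readout (n,);
    c: fixed random feedback (n,), used in place of w when updating A.
    x_prev, x, x_next: unit activities at t-1, t, t+1.
    """
    # Reward prediction error from the linear value readout.
    rpe = reward + gamma * (w @ x_next) - (w @ x)
    # Readout update; clipping keeps the value weights non-negative (assumption).
    w_new = np.maximum(w + lr * rpe * x, 0.0)
    # Recurrent update: RPE x feedback x post-synaptic term x pre-synaptic activity.
    post = c * x * (1.0 - x)  # sigmoid derivative expressed via the activity
    A_new = A + lr * rpe * np.outer(post, x_prev)
    return A_new, w_new, rpe

rng = np.random.default_rng(0)
n = 20                                  # within the 10 to 40 unit range mentioned above
A = rng.normal(0.0, 0.1, (n, n))
w = rng.uniform(0.0, 1.0, n)            # non-negative readout
c = rng.uniform(0.0, 1.0, n)            # non-negative random feedback
x_prev = sigmoid(rng.normal(size=n))
x = sigmoid(A @ x_prev)
x_next = sigmoid(A @ x)
A_new, w_new, rpe = td_update(A, w, c, x_prev, x, x_next, reward=1.0)
```

      Backpropagation would use `w` itself where this sketch uses `c`; replacing it with a fixed random vector is the only change that feedback alignment makes to the recurrent update.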

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE, which is therefore interpretable in terms of quantities measurable in the wet lab.

      (2) I find that non-negative FA (FA with non-negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant, since it connects with experimental settings of reward conditioning with possible plasticity measurements.
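
      The intuition in point (2) can be checked numerically with a small sketch. Assuming, purely for illustration, that the readout `w` and the feedback `c` are drawn independently (uniform on [0, 1] for the non-negative case, standard normal for the signed case), the cosine between two non-negative vectors is well above zero at initialization, whereas signed vectors are close to orthogonal. The distributions and the dimension are assumptions, not values taken from the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
n = 1000  # large dimension chosen only to make the averages stable

# Signed case: independent zero-mean vectors.
w_signed, c_signed = rng.normal(size=n), rng.normal(size=n)
# Non-negative case: independent vectors with entries in [0, 1].
w_nonneg, c_nonneg = rng.uniform(0, 1, n), rng.uniform(0, 1, n)

cos_signed = cosine(w_signed, c_signed)
cos_nonneg = cosine(w_nonneg, c_nonneg)
print(f"signed: {cos_signed:.2f}, non-negative: {cos_nonneg:.2f}")
```

      For uniform entries the expected cosine approaches (1/4)/(1/3) = 0.75 as the dimension grows, because each product has mean 1/4 and each squared entry has mean 1/3.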

    1. eLife Assessment

      This important study reveals how Drosophila may be used to investigate the role of missense variants in the PLCG1 phospholipase gene in human diseases. The experimental evidence is compelling and brings together rigorous analysis of clinical and model organism phenotypes with a structural analysis of the PLCG1 protein.

    2. Reviewer #2 (Public review):

      The manuscript by Ma et al. reports the identification of three unrelated people who are heterozygous for de novo missense variants in PLCG1, which encodes phospholipase C-gamma 1, a key signaling protein. These individuals present with partially overlapping phenotypes, including hearing loss, ocular pathology, cardiac defects, abnormal brain imaging results, and immune defects. None of the patients present with all of the above phenotypes. PLCG1 has also been implicated as a possible driver for cell proliferation in cancer.

      The three missense variants found in the patients result in the following amino acid substitutions: His380Arg, Asp1019Gly, and Asp1165Gly. PLCG1 (and the closely related PLCG2) have a single Drosophila ortholog called small wing (sl). sl-null flies are viable but have small wings with ectopic wing veins and supernumerary photoreceptors in the eye. As all three amino acids affected in the patients are conserved in the fly protein, in this work Ma et al. tested whether they are pathogenic by expressing either reference or patient variant fly or human genes in Drosophila and determining the phenotypes produced by doing so.

      Expression in Drosophila of the variant forms of PLCG1 found in these three patients is toxic; highly so for Asp1019Gly and Asp1165Gly, much more modestly for His380Arg. Another variant, Asp1165His which was identified in lymphoma samples and shown by others to be hyperactive, was also found to be toxic in the Drosophila assays. However, a final variant, Ser1021Phe, identified by others in an individual with severe immune dysregulation, produced no phenotype upon expression in flies.

      Based on these results, the authors conclude that the PLCG1 variants found in patients are pathogenic, producing gain-of-function phenotypes through hyperactivity. In my view, the data supporting this conclusion are robust, despite the lack of a detectable phenotype with Ser1021Phe, and I have no concerns about the core experiments that comprise the paper.

      Fig. 6, the last in the paper, provides information about PLCG1 structure and how the different variants would affect it. It shows that His380, Asp1019 and Asp1165 all lie within catalytic domains or intramolecular interfaces, and that variants in the latter two affect residues essential for autoinhibition. It also shows that Ser1021 falls outside the key interface occupied by Asp1019, but more could have been said about the potential effects of Ser1021Phe.

      Overall, I believe the authors fully achieved the aims of their study. The work will have a substantial impact because it reports the identification of novel disease-linked genes, and because it further demonstrates the high value of the Drosophila model for finding and understanding gene-disease linkages.

      Comments on revisions:

      The single recommendation I made on the original version, which was to further examine H380 mutants, has been satisfactorily addressed in the revised version.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript provides an initial characterization of three new missense variants of the PLCG1 gene associated with diverse disease phenotypes, utilizing a Drosophila model to investigate their molecular effects in vivo. Through the meticulous creation of genetic tools, the study assesses the small wing (sl) phenotype - the fly's ortholog of PLCG1 - across an array of phenotypes from longevity to behavior in both sl null mutants and variants. The findings indicate that the Drosophila PLCG1 ortholog displays aberrant functions. Notably, it is demonstrated that overexpression of both human and Drosophila PLCG1 variants in fly tissue leads to toxicity, underscoring their pathogenic potential in vivo.

      Strengths:

      The research effectively highlights the physiological significance of sl in Drosophila. In addition, the study establishes the in vivo toxicity of disease-associated variants of both human PLCG1 and Drosophila sl.

      Weaknesses:

      The study's limitations include the human PLCG1 transgene's inability to compensate for the Drosophila sl null mutant phenotype, suggesting potential functional divergence between the species. This discrepancy signals the need for additional exploration into the mechanistic nuances of PLCG1 variant pathogenesis, especially regarding their gain-of-function effects in vivo.

      Overall:

      The study offers compelling evidence for the pathogenicity of newly discovered disease-related PLCG1 variants, manifesting as toxicity in a Drosophila in vivo model, which substantiates the main claim by the authors. Nevertheless, a deeper inquiry into the specific in vivo mechanisms driving the toxicity caused by these variants in Drosophila could significantly enhance the study's impact.

      Reviewer #2 (Public Review):

      The manuscript by Ma et al. reports the identification of three unrelated people who are heterozygous for de novo missense variants in PLCG1, which encodes phospholipase C-gamma 1, a key signaling protein. These individuals present with partially overlapping phenotypes including hearing loss, ocular pathology, cardiac defects, abnormal brain imaging results, and immune defects. None of the patients present with all of the above phenotypes. PLCG1 has also been implicated as a possible driver for cell proliferation in cancer.

      The three missense variants found in the patients result in the following amino acid substitutions: His380Arg, Asp1019Gly, and Asp1165Gly. PLCG1 (and the closely related PLCG2) have a single Drosophila ortholog called small wing (sl). sl-null flies are viable but have small wings with ectopic wing veins and supernumerary photoreceptors in the eye. As all three amino acids affected in the patients are conserved in the fly protein, in this work Ma et al. tested whether they are pathogenic by expressing either reference or patient variant fly or human genes in Drosophila and determining the phenotypes produced by doing so.

      Expression in Drosophila of the variant forms of PLCG1 found in these three patients is toxic; highly so for Asp1019Gly and Asp1165Gly, much more modestly for His380Arg. Another variant, Asp1165His which was identified in lymphoma samples and shown by others to be hyperactive, was also found to be toxic in the Drosophila assays. However, a final variant, Ser1021Phe, identified by others in an individual with severe immune dysregulation, produced no phenotype upon expression in flies.

      Based on these results, the authors conclude that the PLCG1 variants found in patients are pathogenic, producing gain-of-function phenotypes through hyperactivity. In my view, the data supporting this conclusion are robust, despite the lack of a detectable phenotype with Ser1021Phe, and I have no concerns about the core experiments that comprise the paper.

      Figure 6, the last in the paper, provides information about PLCG1 structure and how the different variants would affect it. It shows that the His380, Asp1019, and Asp1165 all lie within catalytic domains or intramolecular interfaces and that variants in the latter two affect residues essential for autoinhibition. It also shows that Ser1021 falls outside the key interface occupied by Asp1019, but more could have been said about the potential effects of Ser1021Phe.

      Overall, I believe the authors fully achieved the aims of their study. The work will have a substantial impact because it reports the identification of novel disease-linked genes, and because it further demonstrates the high value of the Drosophila model for finding and understanding gene-disease linkages.

      Reviewer #3 (Public Review):

      Summary:

      The paper attempts to model the functional significance of variants of PLCG2 in a set of patients with variable clinical manifestations.

      Strengths:

      A study attempting to use the Drosophila system to test the function of variants reported from human patients.

      Weaknesses:

      Additional experiments are needed to shore up the claims in the paper. These are listed below.

      Major Comments:

      (1) Does the pLI/missense constraint Z score prediction algorithm take into consideration whether the gene exhibits monoallelic or biallelic expression?

      To our knowledge, pLI and missense Z don't consider monoallelic or biallelic expression. Instead, they reflect sequence constraint and are calculated based on the observed versus expected variant frequencies in population databases.
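
      As a rough illustration of what "observed versus expected" means here, the sketch below computes an observed/expected ratio and a simplified signed Z score for one gene. The counts are made-up placeholders, and the formula is a simplification for intuition only (the published constraint metrics apply additional modeling and corrections), so it should not be read as the actual implementation behind pLI or the missense Z score.

```python
import math

def constraint_summary(observed, expected):
    """Return (o/e ratio, simplified signed Z).

    A positive Z means fewer variants were observed than expected,
    which is read as evidence of constraint. Illustrative only.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    oe = observed / expected
    z = (expected - observed) / math.sqrt(expected)
    return oe, z

# Hypothetical counts for illustration only; not taken from any database.
oe, z = constraint_summary(observed=400, expected=625.0)
print(f"o/e = {oe:.2f}, Z = {z:.1f}")
```

      Neither quantity takes allelic expression as an input, which is consistent with the response above.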

      (2) Figure 1B: Include human PLCG2 in the alignment that displays the species-wide conserved variant residues.

      We have updated Figure 1B and incorporated the alignment of PLCG2.

      (3) Figure 4A:

      Given that

      (i) sl is predicted to be the fly ortholog for both mammalian PLCγ isozymes: PLCG1 and PLCG2 [Line 62]

      (ii) they are shown to have non-redundant roles in mammals [Line 71]

      (iii) reconstituting PLCG1 is highly toxic in flies, leading to increased lethality.

      This raises questions about whether sl mutant phenotypes are specifically caused by the absence of PLCG1 or PLCG2 functions in flies. Can hPLCG2 reconstitution in sl mutants be used as a negative control to rule out the possibility of the same?

      The studies about the non-redundant roles of PLCG1 and PLCG2 mainly concern the immune system.

      We have assessed the phenotypes in the sl<sup>T2A</sup>/Y; UAS-hPLCG2 flies. Expression of human PLCG2 in flies is also toxic and leads to a severely reduced eclosion rate.

      We have updated the manuscript with these results, and included the eclosion rate of sl<sup>T2A</sup>/Y; UAS-hPLCG2 flies in the new Figure 4B.

      (4) Do sl<sup>T2A</sup>/Y; UAS-PLCG1<sup>Reference</sup> flies survive when grown at 22°C? Since transgenic flies expressing PLCG1 cDNA, when driven under the ubiquitous gal4s Tubulin and Da, can result in viable progeny at 22°C, the survival of sl<sup>T2A</sup>/Y; UAS-PLCG1<sup>Reference</sup> should be possible.

      The eclosion rate of sl<sup>T2A</sup>/Y >PLCG1<sup>Reference</sup> flies at 22°C is slightly higher than at 25°C, but remains severely reduced compared to the UAS-Empty control. We have presented these results in the updated Figure S3.

      and similarly

      Do sl<sup>T2A</sup> flies exhibit the phenotypes of (i) reduced eclosion rate, (ii) reduced wing size and ectopic wing veins, and (iii) extra R7 photoreceptor in the fly eye at 22°C?

      The mutant phenotypes are still observed at 22 °C.

      If so, will it be possible to get a complete rescue of the sl<sup>T2A</sup> mutant phenotypes with the hPLCG1 cDNA at 22°C? This dataset is essential to establish Drosophila as an ideal model to study the PLCG1 de novo variants.

      Thank you for the suggestion. It is difficult to directly assess the rescue ability of the PLCG1 cDNAs due to the toxicity. However, our ectopic expression assays show that the variants are more toxic than the reference with variable severities, suggesting that the variants are deleterious.

      The ectopic expression strategy has been used to evaluate the consequence of genetic variants and has significantly contributed to the interpretation of their pathogenicity in many cases (reviewed in Her et al., Genome, 2024, PMID: 38412472).

      (5) Localisation and western blot assays to check if the introduction of the de novo mutations can have an impact on the sub-cellular targeting of the protein or protein stability respectively.

      Thank you for the suggestion.

      We expressed PLCG1 cDNAs in the larval salivary glands and performed antibody staining (rabbit anti-Human PLCG1; 1:100, Cell Signaling Technology, #5690). The larval salivary glands are composed of large columnar epithelial cells that are ideal for analyzing subcellular localization of proteins. The PLCG1 proteins are cytoplasmic and localize near the cell surface, with some enrichment in the plasma membrane region. The variant proteins were detected and did not show significant differences in expression level or subcellular distribution compared to the reference. We did not include these data.

      (6) Analysing the nature of the reported gain of function (experimental proof for the same is missing in the manuscript) variants:

      Instead of directly showing the effect of introducing the de novo variant transgenes in the Drosophila model especially when the full-length PLCG1 is not able to completely rescue the slT2A phenotype;

      (i) Show that the gain-of-function variants can have an impact on the protein function or signalling via one of the three signalling outputs in the mammalian cell culture system: (i) inositol-1,4,5-trisphosphate production, (ii) intracellular Ca2+ release or (iii) increased phosphorylation of extracellular signal-related kinase, p65, and p38.

      We appreciate the reviewer’s suggestion. We utilized the CaLexA (calcium-dependent nuclear import of LexA) system (Masuyama et al., J Neurogenet, 2012, PMID: 22236090) to assess the intracellular Ca<sup>2+</sup> change associated with the expression of PLCG1 cDNAs in fly wing discs. The results show that, compared to the reference, expression of the D1019G or D1165G variant leads to elevated intracellular Ca<sup>2+</sup> levels, similar to the hyperactive S1021F and D1165H variants. However, the H380R and L597F variants did not show a detectable phenotype in this assay. These results suggest that D1019G and D1165G are hyperactive variants, whereas the H380R and L597F variants are not, or their effect is too mild to be detected in this assay. We have updated the related sections in the manuscript and Figures 5A and S5.

      OR

      (ii) Run a molecular simulation to demonstrate how the protein's auto-inhibited state can be disrupted and basal lipase activity increased by introducing D1019G and D1165G, which destabilise the association between the C2 and cSH2 domains. The H380R variant may also exhibit characteristics similar to the previously documented H335A mutation which leaves the protein catalytically inactive as the residue is important to coordinate the incoming water molecule required for PIP2 hydrolysis.

      We utilized the DDMut platform, which predicts changes in the Gibbs Free Energy (ΔΔG) upon single and multiple point mutations (Zhou et al., Nucleic Acids Res, 2023, PMID: 37283042), to gain insight into the molecular dynamics changes of variants. The results are now presented in Figure S7.

      Additionally, we performed molecular dynamics (MD) simulations. The results show that, similar to the hyperactive D1165H variant, the D1019G and D1165G variants exhibit increased disorganization, with a higher root mean square deviation (RMSD) compared to the reference PLCG1. The data are also presented in the updated Figure S7.
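
      For readers unfamiliar with the metric, the RMSD between two conformations is the square root of the mean squared distance between corresponding atoms. The sketch below assumes that the two coordinate arrays are already superimposed and uses random placeholder coordinates; the function name and the data are illustrative and are not the analysis pipeline used for Figure S7.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """RMSD between two (n_atoms, 3) coordinate arrays, without any alignment step."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    # Squared distance per atom, averaged over atoms, then square-rooted.
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 3))             # placeholder coordinates
shifted = reference + np.array([1.0, 0.0, 0.0])   # every atom moved by 1 unit along x

rmsd_same = rmsd(reference, reference)
rmsd_shifted = rmsd(reference, shifted)
```

      In practice each simulation frame is usually superimposed on the reference structure before the RMSD is computed, so that rigid-body motion of the whole protein does not inflate the value.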

      (7) Clarify the reason for carrying out the wing-specific and eye-specific experiments using nub-gal4 and eyless-gal4 at 29°C despite the high gal4 toxicity at this temperature.

      We used high temperature and high expression level to see if the mild H380R and L597F variants could show phenotypes in this condition.

      The toxicity of the two strong variants (D1019G and D1165G) has been consistently confirmed in multiple assays at different temperatures.

      (8) For the sake of completeness the authors should also report other variants identified in the genomes of these patients that could also contribute to the clinical features.

      Thank you!

      The additional variants and their potential contributions to the clinical features are listed and discussed in Table 1 and its legend.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript's significant contribution is tempered by a lack of comprehensive analysis using the generated genetic reagents in Drosophila. To enhance our understanding of the PLCG1 orthologs, I suggest the following:

      (1) A more detailed molecular analysis to distinguish the actions of sl variants from the wild-type could be very informative. For example, utilizing the HA-epitope tag within the current UAS-transgenes could reveal more about the cellular dynamics and abundance of these variants, potentially elucidating mechanisms beyond gain-of-function.

      We appreciate the reviewer’s suggestion. The UAS-sl cDNA constructs contain a stop codon and do not express an HA-epitope tag. Alternatively, we utilized commercially available antibodies against human PLCG1 to assess the subcellular localization and protein stability by expressing the reference and variant PLCG1 cDNAs in Drosophila larval salivary glands. The reference proteins are cytoplasmic with some enrichment along the plasma membrane. However, we did not observe significant differences between the reference and variant proteins in this assay. We did not include these data.

      (2) I suggest further investigating the relative contributions of developmental processes and acute (Adult) effects on the sl-variant phenotypes observed. For example, employing systems that allow for precise temporal control of gene expression, such as the temperature-sensitive Gal80, could differentiate between these effects, shedding light on the mechanisms that affect longevity and locomotion. This knowledge would be vital for a deeper understanding of the corresponding human disorders and for developing therapeutic interventions.

      We appreciate the reviewer’s suggestion. We utilized Tub-GAL4, Tub-GAL80<sup>ts</sup> to drive the expression of sl wild-type or variant cDNAs, and performed temperature shifts after eclosion to induce expression of the cDNAs only in adult flies. The sl<sup>D1184G</sup> variant (corresponding to PLCG1<sup>D1165G</sup>) caused a severely reduced lifespan, with most flies dying within 10 days. The sl<sup>D1041G</sup> variant (corresponding to PLCG1<sup>D1019G</sup>) led to reduced longevity and locomotion. The sl<sup>H384R</sup> variant (corresponding to PLCG1<sup>H380R</sup>) showed only a mild effect on longevity and no significant effect on climbing ability. These results suggest that the two strong variants (sl<sup>D1041G</sup> and sl<sup>D1184G</sup>) contribute to both developmental and acute effects, while the H384R variant mainly exerts its effect during developmental stages.

      I also suggest a more refined analysis of overexpression toxicity. Rather than solely focusing on ubiquitous transgene expression, overexpressing transgene in endogenous pattern using sl-t2a-Gal4 may yield a more nuanced understanding of the pathogenic mechanisms of gain-of-function mutations, particularly in the pathogenesis associated with these variants exclusively located in the coding regions.

      We appreciate the reviewer’s suggestion. We therefore performed the experiments using sl<sup>T2A</sup> to drive overexpression of PLCG1 cDNAs in heterozygous female progeny with one copy of wild-type sl+ (sl<sup>T2A</sup>/ yw > UAS-cDNAs). In this context, expression of PLCG1<sup>Reference</sup>, PLCG1<sup>H380R</sup> or PLCG1<sup>L597F</sup> is viable whereas expression of PLCG1<sup>D1019G</sup> or PLCG1<sup>D1165G</sup> is lethal, suggesting that the PLCG1<sup>D1019G</sup> and PLCG1<sup>D1165G</sup> variants exert a strong dominant toxic effect while the PLCG1<sup>H380R</sup> and PLCG1<sup>L597F</sup> variants are comparatively milder. Similar patterns have been consistently observed in other ectopic expression assays with varying degrees of severity. These results are updated in the manuscript and figures.

      Reviewer #2 (Recommendations For The Authors):

      The work in the paper could be usefully extended by determining the effects of expressing His380Phe and His380Ala in flies. These variants suppress PLCG1 activity, so their phenotype, if any, would be predicted not to be the same as His380Arg. Determining this would add further strength to the conclusions of the paper.

      We thank the reviewer for the constructive suggestions! We have tested the enzymatically dead H380A variant, which still exhibits toxicity when expressed in sl<sup>T2A</sup>/Y hemizygous flies but is not toxic in heterozygous females, suggesting that the reduced eclosion rate is likely not directly associated with enzymatic activity. We have updated the manuscript and figures accordingly.

    1. eLife Assessment

      This useful study characterizes the evolution of medial prefrontal cortex activity during the learning of an odor-based choice task. While the evidence for an increase in task-informative cells with learning, the emergence of population sequences, and the presence of replay events is intriguing, it remains incomplete; notably, the study does not adequately consider the extensive literature on the role of olfactory and hippocampal networks in similar odor-guided tasks. Furthermore, the experimental design appears insufficient to support strong conclusions regarding pre-existing representations or the functional relevance of neural sequences. The study will be of interest to neuroscientists investigating learning and decision-making processes.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use longitudinal in vivo 1-photon calcium recordings in mouse prefrontal cortex throughout the learning of an odor-guided spatial memory task, with the goal of examining the development of task-related prefrontal representations over the course of learning in different task stages and during sleep sessions. They report replication of their previous results (Muysers et al., 2025) that task representations in prefrontal cortex arise de novo after learning, comprising goal-selective cells that fire selectively for left or right goals during the spatial working memory component of the task, and generalized task-phase-selective cells that fire equivalently in the same place irrespective of goal, together forming the task-informative cells. The number of task-informative cells increases over learning, and the covariance structure changes, resulting in increased sequential activation in the learned condition, but with limited functional relevance to task representation. Finally, the authors report that, similar to hippocampal trajectory replay, prefrontal sequences are replayed at reward locations.

      Strengths:

      The major strength of the study is the use of longitudinal recordings, allowing identification of task-related activity in the prefrontal cortex that emerges de novo after learning, and identification of sub-second sequences at reward wells.

      Weaknesses:

      (1) The study mainly replicates the authors' previously reported results about generalized and trajectory-specific coding of task structure by prefrontal neurons, and stable and changing representations over learning (Muysers et al., 2024, PMID: 38459033; Muysers et al., 2025, PMID: 40057953), although there are useful results about changes in goal-selective and task-phase selective cells over learning. There are basic shortcomings in the scientific premise of two new points in this manuscript, that of the contribution of pre-existing spatial representations, and the role of replay sequences in the prefrontal cortex, both of which cannot be adequately tested in this experimental design.

      (2) The study denotes neurons that show precise spatial firing equivalently irrespective of goal as generalized task representations, and uses this as a means of testing whether pre-existing spatial representations can contribute to task coding and learning. A previous study using this data has already shown that these neurons preferentially emerge during task learning (Muysers et al., 2025, PMID: 40057953). Furthermore, in order to establish generalization for abstract task rules or cognitive flexibility, as motivated in the manuscript, there is a need to show that these neurons "generalize" not just to firing in the same position during learning of a given task, but that they can generalize across similar tasks, e.g., different mazes with similar rules, different rules with similar mazes, new odor-space associations, etc. For an adequate test of pre-existing spatial structure, either a comparison task, as in the examples above, is needed, or at least a control task in which animals can run similar trajectories without the task contingencies. An unambiguous conclusion about pre-existing spatial structure is not possible without these controls.

      (3) The scientific premise for the test of replay sequences is motivated using hippocampal activity in internally guided spatial working memory rule tasks (Fernandez-Ruiz et al., 2019, PMID: 31197012; Kay et al., PMID: 32004462; Tang et al., 2021, PMID: 33683201), and applied here to prefrontal activity in a sensory-cue guided spatial memory task (Muysers et al., 2024, PMID: 38459033; Symanski et al., 2022, PMID: 36480255; Taxidis et al., 2020, PMID: 32949502). There are several issues with the conclusion in the manuscript that prefrontal replay sequences are involved in evaluating behavioral outcomes rather than planning future outcomes.

      (4) First, odor sampling in odor-guided memory tasks is an active sensory processing state that leads to beta and other oscillations in olfactory regions, hippocampus, prefrontal cortex, and many other downstream networks, as documented in a vast literature of studies (Martin et al., 2007, PMID: 17699692; Kay, 2014, PMID: 24767485; Martin et al., 2014; Ramirez-Gordillo, 2022, PMID: 36127136; Symanski et al., 2022, PMID: 36480255). This is an active sensory state, not conducive to internal replay sequences, unlike references used in this manuscript to motivate this analysis, which are hippocampal spatial memory studies with internally guided rather than sensory-cue guided decisions, where internal replay is seen during immobility at reward wells. These two states cannot be compared with the expectation of finding similar replay sequences, so it is trivially expected that internal replay sequences will not be seen during odor sampling.

      (5) Second, sequence replay is not the only signature of reactivation. Many studies have quantified prefrontal replay using template matching and reactivation strength metrics that do not involve sequences (Peyrache et al., 2009, PMID: 19483687; Sun et al., 2024, PMID: 38872470). Third, previous studies have explicitly shown that prefrontal activity can be decoded during odor sampling to predict future spatial choices - this uses sensory-driven ensemble activity in prefrontal cortex and not replay, as odor sampling leads to sensory driven processing and recall rather than a reactivation state (Symanski et al., 2022, PMID: 36480255). It is possible that 1-photon recordings do not have the temporal resolution and information about oscillatory activity to enable these kinds of analyses. Therefore, an unambiguous conclusion about the existence and role of prefrontal reactivation is not possible in this experimental and analytical design.

    3. Reviewer #2 (Public review):

      Summary:

      The first part of the manuscript quantifies the proportion of goal-arm specific and task-phase specific cells during the learning and learned conditions and, similar to the previously published Muysers et al. (2025) paper, finds that the task-phase coding cells (Muysers et al. call them path equivalent cells) increase in the learned condition. However, unlike the Muysers et al. (2025) paper, this work quantifies the proportion of cells that change coding type across learning and learned conditions. The second part of the paper reports firing sequences using a sequence similarity clustering-based method that the group developed previously and applied to hippocampal data in the past.

      Strengths:

      Identifying sequences by a clustering method in which sequence patterns of individual events are compared is an interesting idea.

      Weaknesses:

      Further controls are needed to validate the results.

    4. Reviewer #3 (Public review):

      In the study, the authors performed longitudinal 1P calcium imaging of mouse mPFC across 8 weeks during learning of an olfactory-guided task, including habituation, training, and sleep periods. The task had 3 arms. Odor was sampled at the end of the middle arm (named the "Sample" period). The animal then needed to run to one of the two other arms (R or L) based on the odor. The whole period until they reached the end of one of the choice arms was the "Outward" period. The time at the reward end was the "Reward" period. They noted several changes from the learning condition to the learned condition (there are some questions for the authors interspersed):

      (1) They classified cells in a few ways. First, each cell was classified as SI (spatially informative) if it had significantly more spatial information than shuffled activity, and ~50% of cells ended up being SI cells. Then, among the SI cells, they classified a cell as a TC (task cell) if it had statistically similar activity maps for R versus L arms, and a GC (goal arm cell) otherwise. Note that there are 4 kinds of these cells: outer arm TCs and GCs, and middle arm TCs and GCs (with middle arm GCs essentially being like "splitter cells" since they are not similarly active in the middle arm for R versus L trials). There was an increase in TCs from the learning to the learned condition sessions.

      (2) They analyze activity sequences across cells. They extracted 500 ms duration bursts (defined as periods of activity > 0.5 standard deviations over what I assume is the mean - if so, the authors can add "over the mean" to the burst definition in the methods). They first noted that the resulting "Burst rates were significantly larger during behavioral epochs than during sleep and during periods of habituation to the arena", and "Moreover, burst rates during correct trials were significantly lower than during error trials". For the sequence analysis, they only considered bursts consisting of at least 5 active cells. A cell's activity within the burst was set to the center of mass of calcium activity. Then they took all the sequences from all learned and learning sessions together and hierarchically clustered them based on Spearman's rank correlation between the order of activity in each pair of sequences (among the cells active in both). The iterative hierarchical clustering process produces groups (clusters) of sequences such that there are multiple repeats of sequences within a cluster. Different sequences are expressed across all the longitudinally recorded sessions. They noted "large differences of sequence activation between learning and learned condition, both in the spatial patterns (example animal in Figure 3D) and the distribution of the sequences (Figures 3D, E). Rastermap plots (Figure 3D) also reveal little similarity of sequence expression between task and habituation or sleep condition." They also note that the difference in the sequences between learning and learned conditions was larger than the difference between correct and error trials within each condition. They conclude that during task learning, new representations are established, as measured by the burst sequence content. They do additional analyses of the sequence clusters by assessing the spatial informativeness (SI) of each sequence cluster. 
Over learning, they find an increase in clusters that are spatially informative (clusters that tend to occur in specific locations). Finally, they analyzed the SI clusters in a similar manner to SI cells and classified them as task phase selective sequences (TSs) and goal arm selective sequences (GSs), and did some further analysis. However, they themselves conclude that the frequency of TSs and GSs is limited (I believe because most sequence clusters were non-SI - the authors can verify this and write it in the text?). In the discussion, they say, "In addition to GSs and TSs, we found that most of the recurring sequences are not related to behavior".
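
      The pairwise similarity described in (2) can be sketched as follows: Spearman's rank correlation computed over the activation order of only those cells that are active in both bursts. This is a minimal illustration, not the authors' code. The function names, the dictionary representation (cell id mapped to centre-of-mass time) and the `min_shared` cut-off are assumptions made for the example.

```python
# Illustrative sketch only: names, data layout and min_shared are hypothetical.

def ranks(values):
    """Return 1-based average ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length lists (nan if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def sequence_similarity(seq_a, seq_b, min_shared=3):
    """Spearman correlation of activation order among cells active in both bursts.

    seq_a, seq_b: dict mapping cell id -> centre-of-mass time within the burst.
    Returns None when fewer than min_shared cells are shared (assumed cut-off).
    """
    shared = sorted(set(seq_a) & set(seq_b))
    if len(shared) < min_shared:
        return None
    ranks_a = ranks([seq_a[c] for c in shared])
    ranks_b = ranks([seq_b[c] for c in shared])
    return pearson(ranks_a, ranks_b)
```

      A matrix of such pairwise values could then be passed to a hierarchical clustering routine; the clustering step itself is not sketched here.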

      (3) As an alternative to analyzing individual cells and sequences of individual cells, they then look for trajectory replay using Bayesian population decoding of location during bursts. They analyze TS bursts, GS bursts, and non-SI bursts. They say "we found correlations of decoded position with time bin (within a 500 ms burst) strongly exceeding chance level only during outward and reward phase, for both GSs and TSs (Fig 4H)." Figure 4H shows distributions indicating statistically significant bias in the forward direction (using correlations of decoded location versus time bin across 10 bins of 50 ms each within each 500-ms burst). They find that the Outward trajectories appear to reflect the actual trajectory during running itself, so they are likely not replay. But the sequences at the Reward are replay as they do not reflect the current location. Furthermore, replay at the Reward is in the forward direction (unlike the reverse replay at Reward seen in the hippocampus), and this replay is only seen in the learned and not the learning condition. At the same time, they find that replay is not seen during odor Sampling, from which they conclude there is no evidence of replay used for planning. Instead, they say the replay at the Reward could possibly be for evaluation during the Reward phase, though this would only be for the learned condition. They conclude "Together with our finding of strong changes in sequence expression after learning (Figure 3E) these findings suggest that a representation of task develops during learning, however, it does not reflect previous network structure." I am not sure what is meant here by the second part of this sentence (after "however ..."). Is it the idea that the replay represents network structure, and the lack of Reward replay in the learning condition means that the network structure must have been changed to get to the learned condition? Please clarify.

      This study provides valuable new information about the evolution of mPFC activity during the learning of an odor-based 2AFC T-maze-like task. They show convincing evidence of changes in single-cell tuning, population sequences, and replay events. They also find novel forward replay at the Reward, and find that this is present only after the animal has learned the task. In the discussion, the authors note "To our knowledge, this study identified for the first time fast recurring neural sequence activity from 1-p calcium data, based on correlation analysis."

      (1) There are some statements that are not clear, such as at the end of the introduction, where the authors write, "Both findings suggest that the mPFC task code is locally established during learning." What is the reasoning behind the "locally established" statement? Couldn't the learning be happening in other areas and be inherited by the mPFC? Or are the authors assuming that newly appearing sequences within a 500-ms burst period must be due to local plasticity? I have also pointed out a question about the statement "however, it does not reflect previous network structure" in (3) above.

      (2) The threshold for extracting burst events (0.5 standard deviations, presumably above the mean, but the authors should verify this) seems lower than what one usually sees as a threshold for population burst detection. What fraction of all data is covered by 500 ms periods around each such burst? However, it is potentially a strength of this work that their results are found by using this more permissive threshold.
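
      The coverage question in (2) could be answered with a calculation along the following lines. This is a sketch under stated assumptions: the threshold is taken to be the mean plus 0.5 standard deviations, a burst is taken to start at each upward threshold crossing, and the 500 ms window is centred on that crossing. None of these choices is confirmed by the manuscript, and all names are hypothetical.

```python
from statistics import mean, pstdev


def burst_coverage(trace, fs_hz, k_sd=0.5, window_s=0.5):
    """Fraction of samples lying inside a window around a threshold crossing.

    trace:    population activity, one value per sample
    fs_hz:    sampling rate in Hz
    k_sd:     threshold in standard deviations above the mean (assumed)
    window_s: window length in seconds, centred on each crossing (assumed)
    """
    threshold = mean(trace) + k_sd * pstdev(trace)
    half = int(round(window_s * fs_hz / 2))
    covered = [False] * len(trace)
    onsets = []
    for i, value in enumerate(trace):
        previous = trace[i - 1] if i > 0 else float("-inf")
        if value > threshold and previous <= threshold:  # upward crossing
            onsets.append(i)
            for j in range(max(0, i - half), min(len(trace), i + half)):
                covered[j] = True
    return threshold, onsets, sum(covered) / len(trace)
```

      Reporting this fraction for several values of `k_sd` would show how strongly the results depend on the permissive threshold.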

    1. eLife Assessment

      This important study presents JABS, an open-source platform that integrates hardware and user-friendly software for standardized mouse behavioral phenotyping. The work has practical implications for improving reproducibility and accessibility in behavioral neuroscience, especially for linking behavior to genetics across diverse mouse strains. The strength of evidence is convincing, with validation of key platform components, although incomplete methodological details and limited documentation, particularly around pose estimation and classifier generalizability, currently limit its interpretability and broader adoption.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript provides an open-source tool including hardware and software, and a dataset to facilitate and standardize behavioral classification in laboratory mice. The hardware for behavioral phenotyping was extensively tested for safety. The software is GUI-based, facilitating the usage of this tool across the community of investigators who do not have a programming background. The behavioral classification tool is highly accurate, and the authors deposited a large dataset of annotations and pose tracking for many strains of mice. This tool has great potential for behavioral scientists who use mice across many fields; however, there are many missing details that currently limit the impact of this tool and publication.

      Strengths:

      (1) There is software-hardware integration for facilitating cross-lab adaptation of the tool and minimizing the need to annotate new data for behavioral classification.

      (2) Data from many strains of mice were included in the classification and genetic analyses in this manuscript.

      (3) A large dataset was annotated and deposited for the use of the community.

      (4) The GUI-based software tool decreases barriers to usage across users with limited coding experience.

      Weaknesses:

      (1) The authors only report the quality of the classification considering the number of videos used for training, but not considering the number of mice represented or the mouse strain. Therefore, it is unclear if the classification model works equally well in data from all the mouse strains tested, and how many mice are represented in the classifier dataset and validation.

      (2) The GUI requires pose tracking for classification, but the software provided in JABS does not do pose tracking, so users must do pose tracking using a separate tool. Currently, there is no guidance on the pose tracking recommendations and requirements for usage in JABS. The pose tracking quality directly impacts the classification quality, given that it is used for the feature calculation; therefore, this aspect of the data processing should be more carefully considered and described.

      (3) Many statistical and methodological details are not described in the manuscript, limiting the interpretability of the data presented in Figures 4,7-8. There is no clear methods section describing many of the methods used and equations for the metrics used. As an example, there are no details of the CNN used to benchmark the JABS classifier in Figure 4, and no details of the methods used for the metrics reported in Figure 8.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents the JAX Animal Behavior System (JABS), an integrated mouse phenotyping platform that includes modules for data acquisition, behavior annotation, and behavior classifier training and sharing. The manuscript provides details and validation for each module, demonstrating JABS as a useful open-source behavior analysis tool that removes barriers to adopting these analysis techniques by the community. In particular, with the JABS-AI module, users can download and deploy previously trained classifiers on their own data, or annotate their own data and train their own classifiers. The JABS-AI module also allows users to deploy their classifiers on the JAX strain survey dataset and receive an automated behavior and genetic report.

      Strengths:

      (1) The JABS platform addresses the critical issue of reproducibility in mouse behavior studies by providing an end-to-end system from rig setup to downstream behavioral and genetic analyses. Each step has clear guidelines, and the GUIs are an excellent way to encourage best practices for data storage, annotation, and model training. Such a platform is especially helpful for labs without prior experience in this type of analysis.

      (2) A notable strength of the JABS platform is its reuse of large amounts of previously collected data at JAX Labs, condensing this into pretrained pose estimation models and behavioral classifiers. JABS-AI also provides access to the strain survey dataset through automated classifier analyses, allowing large-scale genetic screening based on simple behavioral classifiers. This has the potential to accelerate research for many labs by identifying particular strains of interest.

      (3) The ethograph analysis will be a useful way to compare annotators/classifiers beyond the JABS platform.

      Weaknesses:

      (1) The manuscript as written lacks much-needed context in multiple areas: what are the commercially available solutions, and how do they compare to JABS (at least in terms of features offered, not necessarily performance)? What are other open-source options? How does the supervised behavioral classification approach relate to the burgeoning field of unsupervised behavioral clustering (e.g., Keypoint-MoSeq, VAME, B-SOiD)? What kind of studies will this combination of open field + pose estimation + supervised classifier be suitable for? What kind of studies is it unsuited for? These are all relevant questions that potential users of this platform will be interested in.

      (2) Throughout the manuscript, I often find it unclear what is supported by the software/GUI and what is not. For example, does the GUI support uploading videos and running pose estimation, or does this need to be done separately? How many of the analyses in Figures 4-6 are accessible within the GUI?

      (3) While the manuscript does a good job of laying out best practices, there is an opportunity to further improve reproducibility for users of the platform. The software seems likely to perform well with perfect setups that adhere to the JABS criteria, but it is very likely that there will be users with suboptimal setups - poorly constructed rigs, insufficient camera quality, etc. It is important, in these cases, to give users feedback at each stage of the pipeline so they can understand if they have succeeded or not. Quality control (QC) metrics should be computed for raw video data (is the video too dark/bright? are there the expected number of frames? etc.), pose estimation outputs (do the tracked points maintain a reasonable skeleton structure; do they actually move around the arena?), and classifier outputs (what is the incidence rate of 1-3 frame behaviors? a high value could indicate issues). In cases where QC metrics are difficult to define (they are basically always difficult to define), diagnostic figures showing snippets of raw data or simple summary statistics (heatmaps of mouse location in the open field) could be utilized to allow users to catch glaring errors before proceeding to the next stage of the pipeline, or to remove data from their analyses if they observe critical issues.
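
      One of the classifier-level QC metrics suggested in (3), the incidence of 1-3 frame behaviour bouts, can be computed from per-frame predictions as sketched below. This is an illustration only and is not part of JABS; the function names and the binary per-frame input format are assumptions.

```python
def bout_lengths(labels):
    """Lengths of consecutive runs of positive frames in per-frame labels."""
    lengths, run = [], 0
    for is_behaviour in labels:
        if is_behaviour:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # close a bout that reaches the final frame
        lengths.append(run)
    return lengths


def short_bout_fraction(labels, max_len=3):
    """Fraction of predicted bouts lasting at most max_len frames."""
    lengths = bout_lengths(labels)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= max_len) / len(lengths)
```

      A high value would flag flickering predictions that merit inspection before downstream analysis; what counts as "high" would have to be calibrated per behaviour.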

    1. eLife Assessment

      This important study uses long-term behavioural observations to understand the factors that influence female-on-female aggression in gorilla social groups. The evidence supporting the claims is convincing, as it includes novel methods of assessing aggression and considers other potential factors. The work will be of interest to a broad range of biologists working on the social interactions of animals.

    2. Joint Public Review:

      Summary:

      This work aims to improve our understanding of the factors that influence female-on-female aggressive interactions in gorilla social hierarchies, using 25 years of behavioural data from five wild groups of two gorilla species. Researchers analysed aggressive interactions between 31 adult females, using behavioural observations and dominance hierarchies inferred through Elo-rating methods. Aggression intensity (mild, moderate, severe) and direction (measured as the rank difference between aggressor and recipient) were used as key variables. A linear mixed-effects model was applied to evaluate how aggression direction varied with reproductive state (cycling, trimester-specific pregnancy, or lactation) and sex composition of the group. This study highlights the direction of aggressive interactions between females, with most interactions being directed from higher- to lower-ranking adult females close in social rank. However, the results show that 42% of these interactions are directed from lower- to higher-ranking females. Particularly, lactating and pregnant females targeted higher-ranking individuals, which the authors suggest might be due to higher energetic needs, which increase risk-taking in lactating and pregnant females. Sex composition within the group also influenced which individuals were targeted. The authors suggest that male presence buffers female-on-female aggression, allowing females to target higher-ranking females than themselves. In contrast, females targeted lower-ranking females than themselves in groups with a larger ratio of females, which supposes a lower risk for the females since the pool of competitors is larger. 
The findings provide an important insight into aggression heuristics in primate social systems and the social and individual factors that influence these interactions, providing a deeper understanding of the evolutionary pressures that shape risk-taking, dominance maintenance, and the flexibility of social strategies in group-living species.
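
      For readers unfamiliar with Elo rating, the general update rule behind the dominance hierarchies mentioned in the summary can be sketched as follows. This shows the textbook Elo formula only. The starting ratings, the constant `k` and the function names are assumptions for illustration and are not taken from the study.

```python
def expected_win_probability(rating_a, rating_b):
    """Standard Elo expectation that A wins an interaction against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=100.0):
    """Update ratings in place after one decided interaction.

    ratings: dict mapping individual id -> current rating
    winner:  e.g. the female that displaces the other
    loser:   e.g. the female that avoids or is displaced
    k:       update constant (value assumed for illustration)
    """
    p_win = expected_win_probability(ratings[winner], ratings[loser])
    change = k * (1.0 - p_win)  # unexpected wins move ratings more
    ratings[winner] += change
    ratings[loser] -= change
    return ratings
```

      Two females starting at the same rating each move by `k / 2` after their first decided interaction, and later outcomes that contradict the current ranking produce larger shifts.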

      The authors achieved their aim by demonstrating that aggression direction in female gorillas is influenced by factors such as reproductive condition and social context, and their results support the broader claim that aggression heuristics are flexible. However, some specific interpretations require further support. Despite this, the study makes a valuable contribution to the field of behavioural ecology by reframing how we think about intra-sexual competition and social rank maintenance in primates.

      Strengths:

      One of the study's major strengths is the use of an extensive dataset that compiles 25 years of behavioural data and 6871 aggressive interactions between 31 adult females in five social groups, which allows for a robust statistical analysis. This study uses a novel approach to the study of aggression in social groups by including factors such as the direction and intensity of aggressive interactions, which offers a comprehensive understanding of these complex social dynamics. In addition, this study incorporates ecological and physiological factors such as the reproductive state of the females and the sex composition of the group, which allows an integrative perspective on aggression within the broader context of body condition and social environment. The authors successfully integrate their results into broader evolutionary and ecological frameworks, enriching discussions around social hierarchies and risk sensitivity in primates and other animals.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Suggestions:

      Although this study has an impressive dataset, I felt that some parts of the discussion would benefit from further explanation, specifically when discussing the differences in female aggression direction between groups with different sex compositions. In the discussion, it is suggested that males buffer female-on-female aggression and that they 'support' lower-ranking females (see line 212); however, the study only tested the sex composition of the group and does not provide any evidence of this buffering. Thus, I would suggest adding more information on how this buffering or protection from males might manifest (for example, listing male behaviours that might showcase this protection) or referencing other studies that support this claim. Another example of this can be found in lines 223-224, which suggest that females choose lower-ranking individuals when they are presented with a larger pool of competitors; however, in lines 227-228, it's stated that this result contradicts previous work in baboons, which makes the previous claim seem unjustified. I recommend adding other examples from studies that support the results of this paper and adding a line that addresses why these differences between gorillas and baboons might arise (for example, different social dynamics or ecological constraints). In addition, I suggest the inclusion of physiological data such as direct measures of energy expenditure, caloric intake, or hormone levels, as it would strengthen the claims made in the second paragraph of the discussion. However, I understand this might not be possible due to data or time constraints, so I suggest adding more robust justification on why lactation and pregnancy were used as a proxy for energetic need. In the methods (lines 127-128), it is unclear which phase of the pregnancy or lactation is more energetically demanding. I would also suggest adding a comment on the limitations of using reproductive state to infer energetic need. Lastly, if the data is available, I believe it would be interesting to add body size and age of the females or the size difference between aggressor and target as explanatory variables in the models to test if physiological characteristics influence female-on-female aggression.

      Male support:

      We have now added more references (Watts 1994, 1997) and enriched our arguments regarding male presence buffering aggression. Previous research suggests that male gorillas may support lower-ranking females and may intervene in female-female conflicts (Sicotte 2002). Unfortunately, our dataset did not allow us to test for male protection. We conduct proximity scans every 10 minutes and these scans are not associated with each interaction, meaning that we cannot reliably test whether proximity to a male influences the likelihood of receiving aggression.

      Number of competitors and choice of weaker competitors:

      We added a highly relevant reference on humans, showing that people choose weaker competitors when they can choose. We removed the baboon example because it used sex ratio and its relevance to our study was not straightforward.

      Reproductive state as a proxy for energetic needs:

      We now mention clearly that reproductive state is an indirect measure of energetic needs.

      We rephrased our methods to: “Lactation is often considered more energetically demanding than pregnancy as a whole but the latest stages of pregnancy are highly energetically demanding, potentially even more than lactation”

      Unfortunately, we do not have access to physiological and body size data. Regarding female age, for many females, ages are estimates with errors of up to a decade, and thus we chose not to use them as a reliable predictor. Having accurate values for all these variables would indeed be very valuable and would improve the predictive power of our study.

      Recommendations for writing and presentation:

      Overall, the manuscript is well-organised and well-written, but there are certain areas that could improve in clarity. In the introduction, I believe that the term 'aggression heuristic' should be introduced earlier and properly defined in order to accommodate a broader audience. The main question and aims of the study are not stated clearly in the last paragraph of the introduction. In the methods, I think it would improve the clarity to add a table for the classification of each type of agonistic interaction instead of naming them in the text. For example, a table could present the three intensity categories (severe, mild and moderate), list the behaviours in each (e.g. hit, bite, attack, etc.) and give a short description of each behaviour. I think this would be helpful since some of the behaviours mentioned can be confusing (what's the difference between attack, hit and fight?). In addition, in line 104, it states that all interactions were assigned equal intensity, which needs to be explained.

      We now define aggression heuristics in both the abstract and the first paragraph of the introduction. We have also explained those aggressive interactions whose nature was not obvious from their names. Hopefully, these explanations make clear the differences among the recorded behaviours.

      We have now specified that the “equal intensity” refers to avoidances and displacements used to infer power relationships: “We assigned to all avoidance/displacement interactions equal intensity, that is, equal influence to the power relationship of the interacting individuals”

      Minor corrections:

      (1) In line 41, there is a 1 after 'similar'. I am unsure if it's a mistake or a reference.

      We corrected the typo.

      (2) In lines 68-69, there is mention of other studies, but no references are provided.

      We added citations as suggested.

      (3) Remove the reference to Figure 1 (line 82) from the introduction; the figure should be referenced in the text just before the image, however, your figure is in a different section.

      We removed the reference as suggested.

      (4) Line 98 and 136, it's written 'ad libtum' but the correct spelling is 'ad libitum'.

      We corrected the typo.

      (5) Figure 3, remove the underscores between the words in the axis titles.

      We removed the underscores.

      Reviewer #2 (Recommendations for the authors):

      Here, I have outlined some specific suggestions that require attention. Addressing these comments will improve the readability and quality of the manuscript.

      (1) L69. Add citation here, indicating the studies focusing on aggression rates.

      We added citations as suggested.

      (2) L88. The study periods used in this study and the authors' previous study (Reference 11) are different. So please add one table as Table 1 showing detailed information on the sampling efforts and data included in the analysis of this study. For example, the study period, the numbers of females and males, sampling hours, the number of avoidance/displacement behaviors used to calculate individual Elo-ratings, and the number of mild/moderate/severe aggressive interactions, etc.

      We have now added another table, as suggested (new Table 1) and we have also made clear that we used the hierarchies presented in detail in (Smit & Robbins 2025).

      (3) L103. If readers do not look over Reference 25 on purpose, they do not know what the authors want to talk about and why they mention the optimized Elo-rating method. Clarify this statement and add more content explaining the differences between the two methods, or just remove it.

      We rephrased the text and, in response to the previous comment, we clearly state that there are more details about our approach in Smit & Robbins 2025. At the end of the relevant sentence, we added the following parenthetical “(see “traditional Elo rating method”; we do not use the “optimized Elo-rating method” as it yields similar results and it is not widely used)” and we removed the sentence referring to the optimized Elo-rating method.

      (4) L110. Here, the authors stated that the individual with the standardized Elo-score 1 was the highest-ranking. L117, the "aggression direction" score of each aggressive interaction was the standardized Elo-score of the aggressor, subtracting that of the recipient. So, when the "aggression direction" score was 1, it should mean that the aggressor was the highest-ranking and the recipient was the lowest-ranking female. This is not as the authors stated in L117-120 (where the description was incorrectly reversed). Please clarify.

      The highest-ranking individual indeed has an Elo_score equal to 1, and we calculated the interaction score (or "aggression direction score") of each aggressive interaction by subtracting the standardized Elo-score of the aggressor from that of the recipient (Elo_recipient – Elo_aggressor). So, when the aggressor is the lowest-ranking female (Elo_score=0) and the recipient is the highest-ranking female (Elo_score=1), the "aggression direction score" is 1 - 0 = 1.
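
      The sign convention in this response can be made concrete with a short sketch. It assumes that standardization is a min-max rescaling of raw Elo scores to the range 0 to 1 within a group; this is consistent with the values quoted above but is not confirmed here. The function names and the example ratings are hypothetical.

```python
def standardize(elo_scores):
    """Rescale raw Elo scores to [0, 1]: lowest-ranking -> 0, highest -> 1.

    Min-max scaling is an assumption made for this illustration.
    """
    lowest, highest = min(elo_scores.values()), max(elo_scores.values())
    if highest == lowest:
        raise ValueError("all individuals have the same Elo score")
    return {
        name: (score - lowest) / (highest - lowest)
        for name, score in elo_scores.items()
    }


def direction_score(standardized, aggressor, recipient):
    """Aggression direction score: recipient's score minus aggressor's score.

    Positive values mean aggression directed up the hierarchy,
    negative values mean aggression directed down the hierarchy.
    """
    return standardized[recipient] - standardized[aggressor]
```

      Under this convention, the lowest-ranking female attacking the highest-ranking female scores +1, and the reverse interaction scores -1.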

      (5) Regarding point 3 of the Public Review, please also revise/expand the paragraph L193-208 in the Discussion section accordingly.

      Please see our response to the public review. We have enriched the results section, added pairwise comparisons in a new table (Table 2) and modified the discussion accordingly.

      (6) Table 1. It's not clear why authors added the column 'Aggression Rate' but did not provide any explanation in the Methods/Results section. How did they calculate the correlation between each tested variable and the "overall adult female aggression rates"? Correlating the number of females in the first trimester of female pregnancy with the female aggression rates in each study group? What did the correlation coefficients mean? L202-204 may provide some hints as to why the authors introduced the Aggression Rate. But it should be made clear in the previous text.

      We have now added more details in the legend of the table to make our point clear: “To highlight that aggression rates can increase due to increase in interactions of different score, we also include the effect of some of the tested variables on overall adult female aggression rates, based on results of linear mixed effects models from (Smit & Robbins 2024).” We did not include detailed methods to calculate those results because they are detailed in (Smit & Robbins 2024). We find it valuable to show the results of both aggression rates and aggression directionality according to the same predictor variables as a means to clarify that aggression rates and aggression directionality are not always coordinated with one another (they do not always change in a consistent manner relative to one another).

      (7) L166. This is not rigorous. Please rephrase. There is only one western gorilla group containing only one resident male included in the analysis.

      We have toned down our text: “Our results did not show any significant difference between female-female aggression patterns within the one western and four mountain gorilla groups”

      (8) L167. I don't think the interaction scores in the third trimester of female pregnancy were significantly higher than those in the first trimester. The same concern applies in L194-195.

      We have now added a new table with post hoc pairwise comparisons among the different reproductive states, which clarifies this point.

      (9) L202. There is no column 'Aggression rates' in Table 1 of Reference 11.

      We have rephrased to make clear that we refer to Table 1 of the present study.

      (10) L204-205. Reference 49. Maybe not a proper citation here. This claim requires stronger evidence or further justification. Additionally, please rephrase and clarify the arguments in L204-208 for better readability and precision.

      We have added three more references and rephrased to clarify our argument.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 41: The word "similar" is misspelled.

      We corrected the typo.

    1. eLife Assessment

      This work investigates ZC3H11A as a cause of high myopia through the analysis of human data and experiments with genetic knockout of Zc3h11a in mouse, providing a useful model of myopia. The evidence supporting the conclusion is still incomplete in the revised manuscript as the concerns raised in the previous review were not fully addressed. The article would benefit from a more robust genetic analysis and a comprehensive presentation of human phenotypic data to clarify the modes of inheritance in the families (currently limited by loss of patient follow-up), and from addressing whether there is a reduction in bipolar cell number or decreased marker protein expression, through cell counts or quantifiable, less saturated Western blots. The work will be of interest to ophthalmologists and researchers working on myopia.

    2. Reviewer #1 (Public Review):

      The authors reported that mutations were identified in the ZC3H11A gene in four adolescents from 1015 high myopia subjects in their myopia cohort. They further generated Zc3h11a knockout mice utilizing the CRISPR/Cas9 technology.

      The main claims are only partially supported. The reviewers still have the following concerns: 1) the modes of inheritance for the families need to be shown; 2) the phenotype of heterozygous mutant mice is too weak; 3) the authors still have not addressed the biological question of whether there are fewer bipolar cells or decreased expression of the marker protein. This would involve counting cells, which they have not done. The blots they show do not appear to support their quantifications. Considering the sensitivity of quantifying nearly saturated blots, the authors should show blots that are not exposed to that level of saturation.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public review):

      Summary:

      The authors reported that mutations were identified in the ZC3H11A gene in four adolescents from 1015 high myopia subjects in their myopia cohort. They further generated Zc3h11a knockout mice utilizing the CRISPR/Cas9 technology.

      Comments on revisions:

      Chong Chen and colleagues revised the manuscript; however, none of my suggestions from the initial review have been sufficiently addressed.

      (1) I indicated that the pathogenicity and novelty of the mutation need to be determined according to established guidelines and databases. However, the conclusion was still drawn without sufficient justification.

      Thank you for your valuable feedback on the assessment of mutation pathogenicity and novelty. We regret to inform you that the complete familial genetic information required for segregation analysis is currently unavailable in this study. Despite our exhaustive efforts to contact the four mutation carriers and their relatives, we encountered the following limitations beyond our control: two patients could not be traced further because of invalid contact information; one patient had relocated to another region, making sample collection logistically unfeasible; and the remaining patient explicitly declined family participation in genetic testing because of privacy concerns.

      We fully acknowledge that the lack of pedigree data may affect the certainty of pathogenicity evaluation. To address this limitation, we systematically analyzed the four ZC3H11A missense mutations (c.412G>A p.V138I, c.128G>A p.G43E, c.461C>T p.P154L, and c.2239T>A p.S747T) based on ACMG guidelines and database evidence. The key findings are summarized below. All of the identified mutations had very low frequencies in, or were absent from, the Genome Aggregation Database (gnomAD) and ClinVar, and most of them were predicted to be highly pathogenic by the prediction tools SIFT, PolyPhen2, and CADD. Among them, c.412G>A, c.128G>A and c.461C>T were located in or around a domain named zf-CCCH_3 (Figure 1A and B). Furthermore, all of the mutation sites were located in highly conserved amino acids across different species (Figure 1C). The four mutations induced higher structural flexibility and altered the negative charge at corresponding sites, potentially disrupting protein-RNA interactions (Figure 1D and E). Concurrently, overexpression of mutant constructs (ZC3H11A-V138I, ZC3H11A-G43E, ZC3H11A-P154L, and ZC3H11A-S747T) revealed significantly reduced nuclear IκBα mRNA levels compared to the wild-type, suggesting impaired NF-κB pathway regulation (Supplementary Figure 4). Zc3h11a knockout mice also exhibited a myopic phenotype, with alterations in the PI3K-AKT and NF-κB signaling pathways. Integrating this evidence, the mutations meet the following ACMG criteria: PM1 (domain-located mutations), PM2 (extremely low population frequency), PP3 (computational predictions supporting pathogenicity), and PS3 (functional validation via experimental assays). Under the ACMG framework, these mutations are classified as "Likely Pathogenic".
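
      How the listed criteria combine into a classification can be illustrated with a simplified sketch of the combining rules for pathogenic evidence in the 2015 ACMG/AMP guideline. The sketch ignores benign criteria and does not assess whether each criterion is validly applied to these variants, which is the point at issue in the review. The function name and input format are hypothetical.

```python
def classify_pathogenic_evidence(criteria):
    """Combine ACMG/AMP pathogenic criteria codes into a classification.

    criteria: iterable of codes such as "PS3", "PM1", "PM2", "PP3".
    Simplified: benign evidence (BA/BS/BP) is not handled here.
    """
    codes = list(criteria)
    pvs = sum(c.startswith("PVS") for c in codes)  # very strong
    ps = sum(c.startswith("PS") for c in codes)    # strong
    pm = sum(c.startswith("PM") for c in codes)    # moderate
    pp = sum(c.startswith("PP") for c in codes)    # supporting

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    if pathogenic:
        return "Pathogenic"

    likely_pathogenic = (
        (pvs >= 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    if likely_pathogenic:
        return "Likely pathogenic"
    return "Uncertain significance"
```

      With one strong, two moderate and one supporting criterion, the rule "one strong plus one or two moderate" applies. Without PS3, the remaining PM1, PM2 and PP3 would not reach "Likely pathogenic" under these rules, so the classification rests on the functional evidence.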

      Regarding the novelty of this mutation, comprehensive searches in ClinVar, dbSNP, and HGMD databases revealed no prior reports associating this variant with myopia. Similarly, a PubMed literature search identified no direct evidence linking this mutation to myopia. Based on this evidence, we classify this variant as a likely pathogenic and novel mutation.

      On the other hand, we acknowledge that the absence of family segregation data may reduce the confidence in pathogenicity assessment. Nevertheless, functional experiments and converging multi-level evidence strongly support the reliability of our conclusion. Future studies will prioritize family-based validation to strengthen the evidence chain. We sincerely appreciate your attention to this matter and kindly request your understanding of the practical limitations inherent to this research.

      (2) The phenotype of heterozygous mutant mice is too weak to support the gene's contribution to high myopia. The revised manuscript does not adequately address these discrepancies. Furthermore, no explanation was provided for why conditional gene deletion was not used to avoid embryonic lethality, nor was there any discussion on tissue- or cell-specific mechanistic investigations.

      We sincerely appreciate your insightful comments regarding the relationship between murine phenotypes and human disease. We fully acknowledge your concerns about the phenotypic strength of Zc3h11a heterozygous mutant mice and their association with high myopia (HM) pathogenesis. Our responses are as follows. Our study demonstrates that Zc3h11a heterozygous mutant mice exhibit myopic refractive phenotypes with upregulated myopia-associated factors (TGF-β1, MMP2, and IL6), although axial elongation did not reach statistical significance. Notably, at 4 and 6 weeks of age, Het mice did display longer axial lengths and vitreous chamber depths compared to WT mice. While these differences did not reach statistical significance at other time points, an increasing trend was still observed. Two technical considerations may explain these findings: the small size of the murine eye (where a 1 D refractive change corresponds to only 5-6 μm of axial length change) and the theoretical resolution limit of 6 μm for the SD-OCT device used in this study. These factors likely contributed to the marginal statistical significance observed in the subtle changes of vitreous chamber depth and axial length measurements. Additionally, existing research indicates that axial length measurements from frozen sections in age-matched mice tend to be longer than those obtained through in vivo measurements. This phenomenon may reflect species differences between humans and mice: while both show significant refractive power changes, the axial length differences are less pronounced in mice. These results align with previous reports of phenotypic differences between mouse models and human myopia.

      To address these issues comprehensively, we have added a dedicated discussion section in the revised manuscript specifically examining these axial length measurement considerations, following your valuable suggestion.

      Additionally, we regret to inform you that the currently available floxed ZC3H11A mouse strain requires a minimum of 12-18 months for custom construction, which exceeds our research timeline due to current resource limitations in our team. To address this gap, we have supplemented the discussion section with additional content regarding tissue- and cell-specific mechanisms. Based on your constructive suggestions, we will prioritize the following in our subsequent work: Collaborate with transgenic animal centers to generate Zc3h11a conditional knockout mice. Evaluate the impact of specific knockouts on myopia progression using form-deprivation (FDM) models. While we recognize the limitations of our current study, we believe that by integrating clinical cohort data, phenotypic evidence, and functional experiments, this research provides valuable directional evidence for ZC3H11A's potential role in myopia pathogenesis. Your comments will significantly contribute to improving our future research design, and we sincerely hope you can recognize the exploratory significance of our current findings.

      (3) The title, abstract, and main text continue to misrepresent the role of the inflammatory intracellular PI3K-AKT and NF-κB signaling cascade in inducing high myopia. No specific cell types have been identified as contributors to the phenotype. The mice did not develop high myopia, and no relationship between intracellular signaling and myopia progression has been demonstrated in this study.

      Thank you for your valuable comments regarding the interpretation of signaling pathways in our study. We fully acknowledge your rigorous concerns about the role of PI3K-AKT and NF-κB signaling cascades in high myopia and recognize that we did not identify specific cell types contributing to the observed phenotype. In response to your feedback, we have removed the hypothetical statement linking genetic changes within inflammatory cells to the development of myopia. The current interpretation is strictly based on experimental evidence of pathway relevance and is supported by the theoretical basis presented in the reference, specifically that loss of Zc3h11a leads to activation of the PI3K-AKT and NF-κB pathways in retinal cells, contributing to the myopic phenotype.

      Author response image 1.

      Model of the association between inflammation and myopia progression. Activated mAChR3 (M3R) activates the phosphoinositide 3-kinase (PI3K)–AKT and mitogen-activated protein kinase (MAPK) signaling pathways, in turn activating NF-κB and AP1 (i.e., the Jun-Fos heterodimer) and stimulating the expression of the target genes NF-κB, MMP2, TGFβ, IL-1β, IL-6, and TNF-α. MMP2 and TGF-β promote tissue remodeling, and TNF-α may act in a paracrine feedback loop in the retina or sclera to activate NF-κB during myopia progression.

      To address the limitations raised, we will prioritize the following in future studies: Cell-type-specific knockout models to identify key cellular contributors. Mechanistic investigations to establish causal relationships between signaling pathways and myopia progression. We sincerely appreciate your rigorous review, which has significantly improved the scientific accuracy and clarity of our manuscript. We believe the revised version better reflects both the novelty and limitations of our findings. We kindly request your recognition of the study’s contributions while acknowledging its current constraints.

      Reviewer #3 (Public review):

      Chen et al have identified a new candidate gene for high myopia, ZC3H11A, and using a knock-out mouse model, have attempted to validate it as a myopia gene and explain a potential mechanism. They identified 4 heterozygous missense variants in highly myopic teenagers. These variants are in conserved regions of the protein, and predicted to be damaging, but the only evidence the authors provide that these specific variants affect protein function is a supplement figure showing decreased levels of IκBα after transfection with overexpression plasmids (not specified what type of cells were transfected). This does not prove that these mutations cause loss of function, in fact it implies they have a gain-of-function mechanism. They then created a knock-out mouse. Heterozygotes show myopia at all ages examined but increased axial length only at very early ages. Unfortunately, the authors do not address this point or examine corneal structure in these animals. They show that the mice have decreased B-wave amplitude on electroretinogram (a sign of retinal dysfunction associated with bipolar cells), and decreased expression of a bipolar cell marker, PKCα. On electron microscopy, there are morphologic differences in the outer nuclear layer (where bipolar, amacrine, and horizontal cell bodies reside). Transcriptome analysis identified over 700 differentially expressed genes. The authors chose to focus on the PI3K-AKT and NF-κB signaling pathways and show changes in expression of genes and proteins in those pathways, including PI3K, AKT, IκBα, NF-κB, TGF-β1, MMP-2 and IL-6, although there is very high variability between animals. They propose that myopia may develop in these animals either as a result of visual abnormality (decreased bipolar cell function in the retina) or by alteration of NF-κB signaling. These data provide an interesting new candidate variant for development of high myopia, and provide additional data that MMP2 and IL6 have a role in myopia development. 
For this revision, none of my previous suggestions have been addressed.

      Reviewer #3 (Recommendations for the authors):

      None of these suggestions were addressed in the revision:

      Major issues:

      (1) Figure 2: refraction is more myopic but axial length is not longer - why is this not discussed and explored? The text claims the axial length is longer, but that is not supported by the figure. If this is a measurement issue, that needs to be discussed in the text.

      We sincerely appreciate your valuable comments regarding the relationship between refractive status and axial length in our study. In response to your concerns, we have conducted an in-depth analysis and would like to address the issues as follows:

      Our data demonstrate significant differences in refractive error between heterozygous (Het) and wild-type (WT) mice during the 4-10 weeks. Notably, at 4 and 6 weeks of age, Het mice did exhibit longer axial lengths and greater vitreous chamber depth compared to WT mice, although these differences did not reach statistical significance at other time points while still showing an increasing trend. Additional measurements of corneal curvature revealed no significant differences between groups. Considering the small size of mouse eyes (where a 1D refractive change corresponds to only 5-6μm axial length change) and the theoretical resolution limit of 6μm for the SD-OCT device used in this study, these technical factors may account for the marginal statistical significance of the observed small changes in vitreous chamber depth and axial length measurements. Furthermore, existing studies have shown that axial length measurements from frozen sections tend to be longer than those obtained from in vivo measurements in age-matched mice. These considerations provide plausible explanations for the apparent discrepancy between refractive changes and axial length parameters. Following your suggestion, we have added a dedicated discussion section addressing these axial length measurement issues in the revised manuscript. We fully understand your concerns regarding data consistency, and your comments have prompted us to conduct more comprehensive and thorough analysis of our results. We believe the revised manuscript now more accurately reflects our findings while providing important technical references for future studies.
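
      The resolution argument above can be made concrete with a short calculation. This is a minimal sketch using only the two figures quoted in the response (5-6 μm of axial change per diopter and a 6 μm theoretical resolution limit); the function names are hypothetical, and a linear relation between refraction and axial length is assumed.

```python
# Values quoted in the response; everything else is illustrative.
UM_PER_DIOPTER = (5.0, 6.0)  # axial length change per 1 D of refraction (micrometers)
OCT_RESOLUTION_UM = 6.0      # theoretical resolution limit of the SD-OCT device


def expected_axial_change_um(delta_diopters, um_per_diopter):
    """Axial length change implied by a refractive shift (linear assumption)."""
    return abs(delta_diopters) * um_per_diopter


def min_resolvable_shift_d(resolution_um, um_per_diopter):
    """Smallest refractive shift whose implied axial change reaches the resolution limit."""
    return resolution_um / um_per_diopter


for k in UM_PER_DIOPTER:
    limit = min_resolvable_shift_d(OCT_RESOLUTION_UM, k)
    print(f"{k} um/D: shifts below {limit:.1f} D imply an axial change under the resolution limit")
```

      On these assumptions, a refractive difference of roughly 1 D or less maps to an axial difference at or below the stated resolution limit, which is the sense in which small axial changes could go undetected.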

      (2)  Slipped into the methods is a statement that mice with small eyes or ocular lesions were excluded. How many mice were excluded? Are the authors ignoring another phenotype of these mice?

      We appreciate your attention to the exclusion criteria and their implications. Below we provide a detailed clarification: A total of 7 mice (4 Het-KO and 3 WT) with small eyes or ocular lesions were excluded from the observation cohort. These anomalies were consistent with the baseline incidence of spontaneous malformations observed in historical colony data of wild-type C57BL/6J mice (approximately 11%), and were not attributed to the Zc3h11a heterozygous knockout. We have added the above content in the methods section. Your insightful comment has significantly strengthened our reporting rigor. We hope this clarification alleviates your concerns regarding potential selection bias or overlooked phenotypes.

      Minor/Word choice issues:

      All the figure legends need to be improved so that each figure can be interpreted without having to refer to the text.

      Thank you for your valuable comments. We have revised the legend of each figure, as detailed in the main text.

      Abstract: line 24: use refraction, not "vision"

      Thank you for your valuable comments. “Vision” has been changed to “refraction”.

      Line 28: re-word "density of bipolar cell-labeled proteins" Do the authors mean density of bipolar cells? Or certain proteins were less abundant in bipolar cells?

      Thank you for your rigorous review of this terminology. We acknowledge the need to clarify the precise meaning of the phrase "density of bipolar cell-labeled proteins." In the original text, this term specifically refers to the expression abundance of the bipolar cell-specific marker protein PKCα, which was identified using immunofluorescence labeling techniques. Specifically: We utilized PKCα (a bipolar cell marker) to label bipolar cell populations. The "density" was quantified by measuring the fluorescence signal intensity per unit area in confocal microscopy images, rather than direct cell counting. This metric reflects changes in the expression of the specific marker protein (PKCα) within bipolar cells, which indirectly correlates with alterations in bipolar cell populations. To address ambiguity, we have revised the terminology throughout the manuscript to "bipolar cell-labelled protein PKCα immunofluorescence abundance".

      Additionally, since fluorescence intensity quantification is inherently semi-quantitative, we have included Western blot results for PKCα in the revised manuscript (Figure 3I, J) to validate the expression changes observed via immunofluorescence. We sincerely appreciate your feedback, which has significantly improved the precision of our manuscript.

      Line 45: axial length, not ocular axis

      Thank you for your valuable comments. “Ocular axis” has been changed to “axial length”.

      Lines73-75: confusing

      Thank you for your valuable comments. The relevant content has been modified to “Multiple zinc finger protein genes (e.g., ZNF644, ZC3H11B, ZFP161, ZENK) are associated with myopia or HM. Of these, ZC3H11B (a human homolog of ZC3H11A) and five GWAS loci (Schippert et al., 2007; Shi et al., 2011; Szczerkowska et al., 2019; Tang et al., 2020; Wang et al., 2004) correlate with AL elongation or HM severity. Proteomic studies further suggest ZC3H11A involvement in the TREX complex, implicating RNA export mechanisms in myopia pathogenesis”

      Line 138: what is dark 3.0 and dark 10.0

      Thank you for your valuable comments. The relevant content has been modified to “Upon dark adaptation, b-wave amplitudes in seven-week-old Het-KO mice were significantly lower at dark 3.0 (0.48 log cd·s/m²) and dark 10.0 (0.98 log cd·s/m²) compared to WT mice.” A detailed description has been added to the main text methods.

      Line 171-175: the GO terms of "biological processes" and "molecular functions" are so broad as to be meaningless.

      Thank you for your valuable comments. The relevant content has been modified to “GO enrichment analysis revealed significant enrichment of differentially expressed genes in the following functions: Zinc ion transmembrane transport (GO:0071577) within metal ion homeostasis, associated with retinal photoreceptor maintenance (Ugarte and Osborne, 2001), RNA biosynthesis and metabolism (GO:0006366) in transcriptional regulation, potentially influencing ocular development, negative regulation of NF-κB signaling (GO:0043124) in inflammatory modulation, a pathway involved in scleral remodelling (Xiao et al., 2025), calcium ion binding (GO:0005509), critical for phototransduction (Krizaj and Copenhagen, 2002), zinc ion transmembrane transporter activity (GO:0005385), participating in retinal zinc homeostasis (Figure 5C and D).”

      Line 257-259: which results indicated loss of Zc3h11a inhibited translocation of IκBα from nucleus to cytoplasm? Results of this study, or the previously referenced study?

      We sincerely appreciate your critical inquiry regarding the mechanistic relationship between Zc3h11a deficiency and IκBα translocation. We are grateful for this opportunity to clarify this important point. The findings regarding Zc3h11a-mediated regulation of IκBα mRNA nuclear export and its impact on NF-κB signaling originate from the study by Darweesh et al. The key experimental evidence demonstrates that depletion of Zc3h11a leads to nuclear retention of IκBα mRNA, resulting in failure to maintain normal levels of cytoplasmic IκBα mRNA and protein. This defect in IκBα mRNA export disrupts the essential inhibitory feedback loop on NF-κB activity, causing hyperactivation of this pathway. This manifests as upregulation of numerous innate immune-related mRNAs, including IL-6 and a large group of interferon-stimulated genes. While our study references this mechanism to explain the observed NF-κB dysregulation in Zc3h11a Het-KO mice, the specific nuclear export mechanism was indeed elucidated by Darweesh et al. The reference has been inserted into the corresponding position in the main text. Importantly, our research extends these previous molecular insights into the phenotypic context of myopia.

      We sincerely regret any ambiguity in the original text and deeply appreciate your rigorous approach in ensuring proper attribution of these fundamental findings. Your comment has significantly improved the clarity and accuracy of our manuscript.

      Figure 6 shows decrease of both mRNA and protein expression, but nothing about translocation.

      Thank you for your valuable comments. The results of Darweesh et al. showed that the Zc3h11a protein plays a role in the regulation of NF-κB signal transduction. Depletion of Zc3h11a resulted in enhanced NF-κB-mediated signaling, with upregulation of numerous innate immune-related mRNAs, including IL-6 and a large group of interferon-stimulated genes. IL-6 upregulation in the absence of the Zc3h11a protein correlated with increased NF-κB transcription factor binding to the IL-6 promoter and decreased IL-6 mRNA decay. The enhanced NF-κB signaling in Zc3h11a-deficient cells correlated with a defect in the accumulation of the inhibitory IκBα mRNA and protein. Upon Zc3h11a depletion, the IκBα mRNA was retained in the cell nucleus, resulting in failure to maintain normal levels of the cytoplasmic IκBα mRNA and protein that are essential for its inhibitory feedback loop on NF-κB activity. These findings demonstrate that ZC3H11A can regulate the NF-κB pathway by controlling the translocation of IκBα mRNA, a mechanism that was indeed elucidated by Darweesh et al. We sincerely apologize for any lack of clarity in our original description and have now inserted the appropriate reference in the relevant section of the main text.

      We deeply appreciate your valuable comments in identifying this ambiguity in our manuscript, which have significantly improved the accuracy and clarity of our work.

      Line 283: what do you mean "may confer embryonic lethality"? Were they embryonic lethal or not?

      We sincerely appreciate your critical request for clarification. Our experimental data from intercrosses of Zc3h11a Het-KO mice (n = 15 litters) confirmed the absence of homozygous knockout (Homo-KO) pups at birth. These findings align with the embryonic lethality of Zc3h11a homozygous deletion as reported by Younis et al. We fully acknowledge the ambiguity in our original phrasing and have revised the text to: “Second, Zc3h11a homozygous KO (Homo-KO) mice were not obtained in our study because homozygous deletion of exons confer embryonic lethality.” Your vigilance in ensuring terminological precision has greatly strengthened the rigor of our manuscript. We hope this clarification fully resolves your concerns.

      Line 338: What is meant that Het-KO mice were constructed at 4 weeks of age? Do these mice not have a germline mutation?

      Thank you for your valuable comments. We have revised the following content: “The germline heterozygous Zc3h11a knockout (Het-KO) mice were generated by CRISPR/Cas9-mediated gene editing at the embryonic stage on a C57BL/6J background, provided by GemPharmatech Co., Ltd (Nanjing, China). Phenotypic analyses were initiated when the mice reached four weeks of age.”

      Line 346-347: how many mice were excluded due to having small eyes or ocular lesions? The methods section should state how refraction and ocular biometrics were measured.

      Thank you for your valuable comments. We have added or revised the following content: “To exclude potential confounding effects of spontaneous ocular developmental abnormalities, a total of 7 mice (4 Het-KO and 3 WT) with small eyes or ocular lesions were excluded from the observation cohort. These anomalies were consistent with the baseline incidence of spontaneous malformations observed in historical colony data of wild-type C57BL/6J mice (approximately 11%), and were not attributed to the Zc3h11a heterozygous knockout.

      The methods for measuring refraction and ocular biometrics are as follows and have been added to the original method. Refractive measurements were performed by a researcher blinded to the genotypes. Briefly, in a darkroom, mice were gently restrained by tail-holding on a platform facing an eccentric infrared retinoscope (EIR) (Schaeffel et al., 2004; Zhou et al., 2008a). The operator swiftly aligned the mouse position to obtain crisp Purkinje images centered on the pupil using detection software (Schaeffel et al., 2004), enabling axial measurements of refractive state and pupil size. Three repeated measurements per eye were averaged for analysis. The anterior chamber (AC) depth, lens thickness, vitreous chamber (VC) depth, and axial length (AL) of the eye were measured by real-time optical coherence tomography (a custom built OCT) (Zhou et al., 2008b). In simple terms, after anesthesia, each mouse was placed in a cylindrical holder on a positioning stage in front of the optical scanning probe. A video monitoring system was used to observe the eyes during the process. Additionally, by detecting the specular reflection on the corneal apex and the posterior lens apex in the two dimensional OCT image, the optical axis of the mouse eye was aligned with the axis of the probe. Eye dimensions were determined by moving the focal plane with a stepper motor and recording the distance between the interfaces of the eyes. Then, using the designed MATLAB software and appropriate refractive indices, the recorded optical path length was converted into geometric path length. Each eye was scanned three times, and the average value was taken.”
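
      The final conversion step described above (optical path length to geometric length) amounts to dividing each segment's optical path by the refractive index of that medium. The sketch below illustrates this under stated assumptions: the refractive indices and the scan values are made-up placeholders, the function names are hypothetical, and nothing here reproduces the authors' MATLAB software.

```python
from statistics import mean

# Hypothetical refractive indices, for illustration only. The response does not
# state the values used in the authors' MATLAB software.
REFRACTIVE_INDEX = {
    "anterior_chamber": 1.33,
    "lens": 1.43,
    "vitreous_chamber": 1.33,
}


def geometric_length_um(optical_path_um, refractive_index):
    """Convert an optical path length to a geometric length: d = OPL / n."""
    return optical_path_um / refractive_index


def segment_lengths_um(scans):
    """Average the repeated scans per segment, then convert to geometric length."""
    return {
        segment: geometric_length_um(mean(values), REFRACTIVE_INDEX[segment])
        for segment, values in scans.items()
    }


# Made-up optical path lengths (micrometers) from three scans of one eye.
example_scans = {
    "anterior_chamber": [532.0, 530.0, 534.0],
    "lens": [2860.0, 2858.0, 2862.0],
    "vitreous_chamber": [931.0, 929.0, 933.0],
}

for segment, length in segment_lengths_um(example_scans).items():
    print(f"{segment}: {length:.1f} um")
```

      Because the result scales inversely with the assumed index, the choice of refractive indices directly affects the reported segment lengths.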

      Line 428: what age retinas

      Thank you for your meticulous attention to the experimental design details. Regarding the age of retinal samples, we have clarified the following in the revised manuscript: "Retinas were harvested from four-week-old mice for RNA sequencing." This revision enhances the transparency and reproducibility of our methodology. We deeply appreciate your rigorous review.

      Figure 3 D-F: these images are too small to adequately assess, please show at higher magnification. Are there fewer bipolar cells, or just decreased expression of PKC? From these images, expression of ZC3H11A does not appear decreased, but the retina appears thinner. Is that true, or are these poorly matched sections?

      Thank you for your professional insights regarding image quality and data interpretation. Your rigorous review has significantly enhanced the scientific rigor of our study. We hereby address your concerns point by point: The images in Figures 3D-F were acquired using a Zeiss LSM880 confocal microscope with a 10x eyepiece and 20x objective lens, a standard magnification for retinal section imaging that balances cellular resolution with full-thickness structural preservation. We quantified PKCα immunofluorescence intensity (a bipolar cell-specific marker) to assess changes in bipolar cell populations, rather than direct cell counting. This metric reflects PKCα expression abundance as a proxy for bipolar cell alterations (Figure 3H). To clarify terminology, we have revised the text to "bipolar cell-labelled protein PKCα immunofluorescence abundance" and detailed the methodology in the revised Methods section. Recognizing the semi-quantitative nature of fluorescence intensity analysis, we supplemented these data with Western blot results confirming reduced PKCα protein levels (Figure 3I). Zc3h11a expression was validated both by immunofluorescence intensity (Figure 3G) and Western blot (Figures 6F, H) quantification, confirming reduced expression in Zc3h11a Het-KO retinas. The apparent "retinal thinning" observed in histology sections stems from technical artifacts during tissue processing (fixation, dehydration, sectioning), not biological differences. HE staining, which better preserves sample morphology, showed no structural or thickness differences between Zc3h11a Het-KO mice and wild-type mice (Supplementary Figure 2).

      Your expert feedback has driven us to establish a more robust validation framework. We believe the revised data now more accurately reflect the biological reality and sincerely hope these improvements meet your approval.

      Figure 3G-J: Relative fluorescence intensity of immunohistochemistry is not a valid measure of protein expression.

      We sincerely appreciate your thorough review and valuable comments regarding the immunofluorescence quantification method in Figures 3G-J. In response to your concern that "relative fluorescence intensity is not an effective quantitative measure of protein expression," we have implemented the following improvements to our analysis and validation. To ensure the reliability of the results, all immunofluorescence experiments followed strict protocols: experimental and control samples were fixed, stained, and imaged in the same batch to eliminate inter-batch variability. Imaging was performed using a Zeiss LSM 880 confocal microscope with identical parameters, and the relative fluorescence intensity of specific signals per unit area was measured and statistically analyzed using ZEN software. We fully acknowledge the semi-quantitative nature of relative fluorescence intensity measurements. Therefore, we validated key differentially expressed proteins using Western blot analysis. The Western blot results for Zc3h11a (Figures 6F, H) were consistent with the relative fluorescence intensity trends (Figure 3G). Additionally, the newly included Western blot data for PKCα (Figure 3I) further supported the reliability of our relative fluorescence intensity quantification. Your expert advice has significantly enhanced the rigor of our study. Should any additional data or clarification be required, we would be pleased to provide further support.

      Figure 4: what are the arrows pointing at? This should be in the Figure legend. What is MB? Why are there no scale bars? What is difference between E and F, not clear from legend.

      We sincerely appreciate your thorough review of Figure 4 and your valuable suggestions. In response to your concerns, we have carefully examined and improved the relevant content with the following modifications and clarifications: We sincerely apologize for not clearly indicating the arrow annotations in the original figure legend. In the revised version, we have provided detailed explanations for the arrow indicators: black arrows indicate perinuclear space dilation, blue arrows indicate cytoplasmic edema, and red arrows indicate disorganized and loosely arranged membrane discs. The updated legend has been clearly marked below Figure 4 in the main text. MB represents membrane discs, which are critical subcellular structures in the outer segments of retinal photoreceptor cells (rods and cones). They are responsible for light signal capture and transduction (containing visual pigments such as rhodopsin). The structural integrity of MB is essential for normal visual function. The scale bars in the original figures were located in the lower right corner of each subpanel, with specific parameters as follows: Figures 4A and B: magnification ×1000, scale bar 10 μm; Figures 4C and D: magnification ×700, scale bar 20 μm; Figures 4E and G: magnification ×2000, scale bar 5 μm; Figures 4F and H: magnification ×7000, scale bar 2 μm. Both Figures 4E and 4F show electron microscopy images of membrane discs (MB) in wild-type mouse photoreceptor cells. The only difference lies in the magnification: Figure 4E (×2000) demonstrates the overall arrangement pattern of membrane discs, while Figure 4F (×7000) focuses on ultrastructural details of the membrane discs (such as structural integrity). We have thoroughly checked the consistency between the figures and text, and have supplemented detailed legend descriptions in the main text. Once again, we sincerely appreciate your rigorous review, which has significantly enhanced the scientific rigor and readability of our study. 
Should you have any further suggestions, we would be happy to incorporate them.

      Figure 5A: Why such a large y-axis? Figure legend does not match figure

      We sincerely appreciate your careful review of Figure 5A and your valuable suggestions regarding the figure details. In response to your concerns, we have thoroughly examined and improved the relevant content as follows: The Y-axis of the volcano plot represents -log₁₀(p-value), where the magnitude of the values reflects statistical significance. Our RNA-seq data underwent rigorous multiple testing correction, and the adjusted p-values for some genes were extremely small, resulting in large values after -log₁₀ transformation. We have re-examined the data distribution and confirmed that the expanded Y-axis range is solely due to a small number of highly significant genes (as shown in the figure, the majority of genes remain clustered in the lower half of the Y-axis). This result accurately reflects the true data characteristics.

      We sincerely apologize for the inadvertent error in the original labeling of "Up/Down" in the figure legend. This has now been corrected, and we strictly adhere to the following threshold criteria: Significantly upregulated (Up): adjusted p-value < 0.05 and log₂(FC) ≥ 1. Significantly downregulated (Down): adjusted p-value < 0.05 and log₂(FC) ≤ -1. To ensure the reliability of our conclusions, we have rechecked the raw data, statistical analysis, and visualization process. We confirmed that all significant genes strictly meet the above threshold criteria and that the visualization accurately reflects the true results. The revised figure has been updated in the manuscript as Figure 5A. We deeply appreciate your valuable feedback, which has helped us correct the errors in the figure and improve its accuracy and readability.
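
      The thresholds stated above translate directly into a classification rule. The following is a minimal sketch assuming only the two cutoffs quoted in the response (adjusted p-value < 0.05 and |log₂(FC)| ≥ 1); the function names and the "NS" label for non-significant genes are hypothetical.

```python
import math

P_ADJ_CUTOFF = 0.05   # adjusted p-value threshold quoted in the response
LOG2FC_CUTOFF = 1.0   # absolute log2 fold-change threshold quoted in the response


def classify_gene(p_adj, log2_fc):
    """Label a gene as 'Up', 'Down', or 'NS' (not significant)."""
    if p_adj < P_ADJ_CUTOFF and log2_fc >= LOG2FC_CUTOFF:
        return "Up"
    if p_adj < P_ADJ_CUTOFF and log2_fc <= -LOG2FC_CUTOFF:
        return "Down"
    return "NS"


def volcano_y(p_value):
    """Y-axis value of a volcano plot: -log10 of the p-value."""
    return -math.log10(p_value)


print(classify_gene(0.001, 1.5), classify_gene(0.001, -2.0), classify_gene(0.2, 3.0))
print(volcano_y(1e-50))
```

      The second call shows why a handful of genes with extremely small p-values stretch the y-axis: each factor of ten in the p-value adds one unit to the plotted value.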

      Figure 6F: Based on the western blot, only Zc3h11a appears different.

      Thank you for your careful evaluation of the Western blot data in Figure 6F. We fully understand your concerns regarding the visual differences in PI3K and p-AKT/AKT bands and appreciate the opportunity to clarify the quantitative methodology and biological significance of these findings. Below we provide a detailed explanation of the experimental design and data analysis.

      First, the data for each group were derived from retinal samples of three independent mice, with all experiments performed in parallel to control for technical variability. Image analysis was conducted using ImageJ software with standardized settings for grayscale quantification. Zc3h11a and PI3K levels were normalized to GAPDH as an internal reference, while p-AKT levels were calculated as a ratio to total AKT. The results showed that Zc3h11a protein levels were significantly reduced (p < 0.01, Figures 6F and H), consistent with the expected effects of heterozygous knockout, with good agreement between visual and statistical results. For PI3K and p-AKT/AKT, the bands appeared visually similar for two reasons: the nonlinear nature of Western blot chemiluminescence signals in the saturation range, which compresses subtle quantitative differences in the images; and the fact that p-AKT represents only 5-15% of the total AKT pool, making small proportional changes difficult to discern visually. However, it is important to note that both PI3K and p-AKT/AKT showed statistically significant differences between groups (p < 0.001 and p < 0.01, respectively; Figures 6G and I). Furthermore, signal transduction pathways exhibit cascade amplification effects: in the PI3K-AKT pathway, even small changes in upstream proteins can produce significant downstream effects (e.g., NF-κB activation) through kinase cascades (Figure 6J). Additionally, our RNA-Seq results revealed activation of the PI3K-AKT signaling pathway in Zc3h11a Het-KO mice (Figure 5D), and the qRT-PCR results were consistent with the Western blot results (Figure 6A-C). Your expert comments have prompted us to present these data differences with greater biological rigor. Although the visual differences are subtle, the statistical significance, pathway characteristics, RNA sequencing data, and qRT-PCR data lead us to believe that these changes are biologically relevant.
We sincerely appreciate your commitment to data rigor and respectfully request your recognition of both the experimental results and the scientific logic of this study.
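
      The normalization scheme described above (target band divided by the GAPDH band of the same lane, and p-AKT divided by total AKT) can be sketched as follows. This is only a minimal illustration: the intensity values are hypothetical and are not the measurements from the study, and the function names `normalize` and `fold_change` are invented for this example rather than taken from ImageJ or from the authors' analysis.

```python
from statistics import mean


def normalize(target, reference):
    """Divide each target band intensity by the reference band of the same lane."""
    if len(target) != len(reference):
        raise ValueError("each target band needs a matching reference band")
    return [t / r for t, r in zip(target, reference)]


def fold_change(values, control_values):
    """Express values relative to the mean of the control group."""
    baseline = mean(control_values)
    return [v / baseline for v in values]


# Hypothetical grayscale intensities (arbitrary units), three mice per group.
# These numbers are illustrative assumptions, NOT data from the study.
wt = {"PI3K": [1000.0, 1100.0, 950.0], "GAPDH": [2000.0, 2100.0, 1900.0],
      "pAKT": [120.0, 130.0, 110.0], "AKT": [1200.0, 1250.0, 1150.0]}
het = {"PI3K": [1300.0, 1350.0, 1250.0], "GAPDH": [2000.0, 2050.0, 1950.0],
       "pAKT": [170.0, 180.0, 165.0], "AKT": [1200.0, 1230.0, 1180.0]}

# PI3K is normalized to GAPDH; p-AKT is expressed as a ratio to total AKT.
pi3k_wt = normalize(wt["PI3K"], wt["GAPDH"])
pi3k_het = normalize(het["PI3K"], het["GAPDH"])
pakt_wt = normalize(wt["pAKT"], wt["AKT"])
pakt_het = normalize(het["pAKT"], het["AKT"])

print("PI3K/GAPDH relative to WT:", fold_change(pi3k_het, pi3k_wt))
print("p-AKT/AKT relative to WT:", fold_change(pakt_het, pakt_wt))
```

      Normalizing within each lane before comparing groups means that lane-to-lane loading differences cancel out, which is why a modest change in a ratio can reach statistical significance even when the raw bands look alike.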

      Figure 8: What is the role of ZC3H11A in this figure? Are the authors proposing that ZC3H11A regulates the translation of IκBα? They have not shown any evidence of that.

      Thank you for your insightful exploration of the role of ZC3H11A in Figure 8. We appreciate your critical review and hope to elucidate the mechanistic framework behind our findings. In Figure 8, Zc3h11a is depicted as a regulator of IκBα mRNA nucleocytoplasmic transport, a mechanism originally elucidated by Darweesh et al. Their studies demonstrated that Zc3h11a binds to IκBα mRNA and promotes its nuclear export. Loss of Zc3h11a results in nuclear retention of IκBα mRNA, leading to reduced cytoplasmic IκBα protein levels and subsequent hyperactivation of the NF-κB pathway. While the specific nuclear export mechanism has been elucidated by Darweesh et al., our study demonstrates that Zc3h11a haploinsufficiency results in decreased IκBα mRNA and protein levels in the retina (Figure 7), linking Zc3h11a haploinsufficiency to NF-κB pathway dysregulation in myopia and highlighting that these molecular insights can be extended to a new pathological context (myopia). Your critical comments have enhanced the clarity of our mechanistic concepts and we hope that these descriptions will demonstrate the importance of ZC3H11A as a new candidate gene for myopia.

    1. eLife Assessment

      This important study provides convincing evidence that the Kinesin protein family member KIF7 regulates the development of the cerebral cortex and its connectivity and the specificity of Sonic Hedgehog signaling by controlling the details of Gli repressor vs activator functions. This study provides new insights into general aspects of cortical development.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting follow-up to a paper published in Human Molecular Genetics reporting novel roles in corticogenesis of the Kif7 motor protein that can regulate the activator as well as the repressor functions of the Gli transcription factors in Shh signalling. This new work investigates how a null mutation in the Kif7 gene affects the formation of corticofugal and thalamocortical axon tracts and the migration of cortical interneurons. It demonstrates that Kif7 null mutant embryos present with ventriculomegaly and heterotopias as observed in patients carrying KIF7 mutations. The Kif7 mutation also disrupts the connectivity between cortex and thalamus and leads to an abnormal projection of thalamocortical axons. Moreover, cortical interneurons show migratory defects that are mirrored in cortical slices treated with the Shh inhibitor cyclopamine suggesting that the Kif7 mutation results in a down-regulation of Shh signalling. Interestingly, these defects are much less severe at later stages of corticogenesis.

      Strengths/weaknesses:

      The findings of this manuscript are clearly presented and are based on detailed analyses. Using a compelling set of experiments, especially the live imaging to monitor interneuron migration, the authors convincingly investigate Kif7's roles and their results support their major claims. The migratory defects in interneurons and the potential role of Shh signalling present novel findings and provide some mechanistic insights but rescue experiments would further support Kif7's role in interneuron migration. Similarly, the mechanism underlying the misprojection which has previously been reported in other cilia mutants remains unexplored. Taken together, this manuscript makes novel contributions to our understanding of the role of primary cilia in forebrain development and to the aetiology of the neural symptoms in ciliopathy patients.

      Comments on revisions:

      The authors addressed most of the points I raised in my original review.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the role of KIF7, a ciliary kinesin involved in the Sonic Hedgehog (SHH) signaling pathway, in cortical development using Kif7 knockout mice. The researchers examined embryonic cortex development (mainly at E14.5), focusing on structural changes and neuronal migration abnormalities.

      Strengths:

      (1) The phenotype observed is interesting, and the findings provide neurodevelopmental insight into some of the symptoms and malformations seen in patients with KIF7 mutations.

      (2) The authors assess several features of cortical development, including structural changes in layers of the developing cortex, connectivity of the cortex with thalamus, as well as migration of cINs from CGE and MGE to cortex.

      Comments on revisions:

      The authors have made significant and thoughtful responses as well as experimental additions to the reviewers' comments. Their efforts are appreciated and the manuscript is much improved.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      (1) I am not convinced by the figures the authors present on Shh protein expression. The "bright tiny dots" of Shh protein in the cortex are not visible on the images in Figure 7. I wonder whether the authors could present higher magnification and/or black and white images with increased contrast.

      We have modified Figure 7: we now present a higher magnification and a black and white image with increased contrast to better visualize SHH (+) bright tiny dots in the lateral cortex.

      (2) The manuscript also contains several typos.

      We apologize for these mistakes which have all been corrected.

    1. eLife Assessment

      This study presents useful findings on the application of HPV cfDNA as a marker for monitoring treatment response and prognosis in patients with recurrent or metastatic cervical cancer. The evidence supporting the claims of the authors is solid, although inclusion of a larger number of patient samples would have strengthened the study. The work will be of interest to medics and biologists working on cervical cancer.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Zhuomin Yin and colleagues focuses on the relationship between cell-free HPV (cfHPV) DNA and metastatic or recurrent cervical cancer patients. It expands the application of cfHPV DNA in tracking disease progression and evaluating treatment response in cervical cancer patients. The study is overall well-designed, including appropriate analyses.

      Strengths:

      The findings provide valuable reference points for monitoring drug efficacy and guiding treatment strategies in patients with recurrent and metastatic cervical cancer. The concordance between HPV cfDNA fluctuations and changes in disease status suggests that cfDNA could play a crucial role in precision oncology, allowing for more timely interventions. As with similar studies, the authors used Droplet Digital PCR to measure cfDNA copy numbers, a technique that offers ultrasensitive nucleic acid detection and absolute quantification, lending credibility to the conclusions.

      Weaknesses:

      Despite including 28 clinical cases, only 7 involved recurrent cervical cancer, which may not be sufficient to support some of the authors' conclusions fully. Future studies on larger cohorts could solidify HPV cfDNA's role as a standard in the personalized treatment of recurrent cervical cancer patients.

      Comments on revisions:

      Thanks for your additional efforts and for addressing my concerns.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The study "Monitoring of Cell-free Human Papillomavirus DNA in Metastatic or Recurrent Cervical Cancer: Clinical Significance and Treatment Implications" by Zhuomin Yin and colleagues focuses on the relationship between cell-free HPV (cfHPV) DNA and metastatic or recurrent cervical cancer patients. It expands the application of cfHPV DNA in tracking disease progression and evaluating treatment response in cervical cancer patients. The study is overall well-designed, including appropriate analyses.

      Strengths:

      The findings provide valuable reference points for monitoring drug efficacy and guiding treatment strategies in patients with recurrent and metastatic cervical cancer. The concordance between HPV cfDNA fluctuations and changes in disease status suggests that cfDNA could play a crucial role in precision oncology, allowing for more timely interventions. As with similar studies, the authors used Droplet Digital PCR to measure cfDNA copy numbers, a technique that offers ultrasensitive nucleic acid detection and absolute quantification, lending credibility to the conclusions.

      Weaknesses:

      Despite including 28 clinical cases, only 7 involved recurrent cervical cancer, which may not be sufficient to support some of the authors' conclusions fully. Future studies on larger cohorts could solidify HPV cfDNA's role as a standard in the personalized treatment of recurrent cervical cancer patients.

      (1) The authors should provide source data for Figures 2, 3, and 4 as supplementary material.

      We greatly appreciate your evaluation of our study and fully agree with the limitations you have pointed out. We appreciate your constructive feedback. Based on your suggestions, we have made the following additions to the article. We have realized that the information provided in Figures 2, 3, and 4 is limited. Therefore, we have presented the original data from Figures 2, 3, and 4 in tabular form in Supplementary Table 2.

      (2) Description of results in Figure 2: Figure 2 would benefit from clearer annotations regarding HPV virus subtypes. For example, does the color-coding in Figure 2B imply that all samples in the LR subgroup are of type HPV16? If that is the case, is it possible that detection variations are due to differences in subtype detection efficiency rather than cfDNA levels? The authors should clarify these aspects. Annotation of Figure 2B suggests that the p-value comes from comparing the LR and LN + H + DSM groups. This should be clarified in the legend. If this p-value comes from comparing HPV cfDNA copies for the (LR, LNM, HM) and (LN + HM, LN + HM + DSM) groups, did the authors carry out post-hoc pairwise comparisons? It would be helpful to include acronyms for these groups in the legend also.

      We fully agree with your point regarding the need for clearer labeling of HPV genotypes in Figures 2B and 2C. If each data point could be color-coded to represent the HPV genotype, Figures 2B and 2C would be clearer and provide more information. However, we must acknowledge that due to the limitations of our current graphing software and our graphical expertise, we were unable to fully represent each HPV genotype in the figures. To address this, we have presented the data in Supplementary Table 2. This table shows the HPV genotype for each patient, the corresponding metastasis patterns, and the baseline HPV copy numbers. We hope this will address the limitation of insufficient information in Figure 2.

      The point you raised regarding whether the differences in detection results might stem from variations in subtype detection efficiency rather than cfDNA levels is a valid limitation of this study. Due to the limited sample size, we did not perform subgroup analyses based on different HPV genotypes, which may have introduced bias in the results presented in Figures 2B and 2C. In response, we have added the following clarification in the discussion section (lines 416-422) and addressed this limitation in the limitations section (lines 499-502). Based on your suggestion, we believe that it is essential to expand the sample size and perform subgroup analysis of the baseline copy numbers for each HPV genotype before treatment. We hope to achieve this goal in future studies.

      Thank you for your thoughtful comments regarding the statistical analyses in the study. The p-value in Figure 2B comes from the comparison among five groups, using a two-sided Kruskal-Wallis test. Your suggestion to perform post-hoc pairwise comparisons is excellent and has made the data presentation in the article more rigorous. Following your advice, we conducted pairwise comparisons between the groups. We used the Mann-Whitney U test to compare HPV cfDNA copy numbers between two groups. Since the LR group only had one value, it could not be included in the pairwise comparisons. Significant differences were observed in two comparisons: LNM vs. LN + H + DSM (P = 0.006) and HM vs. LN + H + DSM (P = 0.036). No significant differences were found between the other groups: LNM vs. HM (P = 0.768), LNM vs. LN + HM (P = 0.079), HM vs. LN + HM (P = 0.112), and LN + HM vs. LN + H + DSM (P = 0.145), as determined by the Mann-Whitney U test (Figure 2B). (Lines 258-263).
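
      The pairwise step described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the copy-number values are hypothetical, the helper names (`midranks`, `mann_whitney_exact`, `pairwise_tests`) are invented for this example, and the two-sided p-value is computed exactly by enumerating every possible split of the pooled values, which is an assumption about the method that is only practical for small groups such as these.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U statistic for x, exact two-sided p-value)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    offset = n1 * (n1 + 1) / 2
    centre = n1 * n2 / 2  # expected U under the null hypothesis
    u_obs = sum(ranks[:n1]) - offset
    extreme = total = 0
    # Enumerate every way of choosing which pooled values belong to group x.
    for idx in combinations(range(n1 + n2), n1):
        u = sum(ranks[i] for i in idx) - offset
        total += 1
        if abs(u - centre) >= abs(u_obs - centre) - 1e-12:
            extreme += 1
    return u_obs, extreme / total


def pairwise_tests(groups, min_n=2):
    """Run the test on every pair of groups that has at least min_n values."""
    eligible = [name for name, vals in groups.items() if len(vals) >= min_n]
    return {(a, b): mann_whitney_exact(groups[a], groups[b])
            for a, b in combinations(eligible, 2)}


# Hypothetical baseline copy numbers per group (NOT data from the study).
# The single-value LR group is skipped, as in the response above.
example = {
    "LR": [0.8],
    "LNM": [0.4, 1.2, 2.5, 3.1],
    "HM": [0.9, 1.8, 4.0],
    "LN + HM": [2.2, 5.6, 7.3],
    "LN + H + DSM": [6.1, 9.8, 12.4, 16.0],
}
for pair, (u, p) in pairwise_tests(example).items():
    print(pair, "U =", u, "p =", round(p, 3))
```

      With five groups of which one is excluded, this yields six pairwise comparisons, the same number reported in the response.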

      Thank you for your thoughtful suggestion regarding the inclusion of group acronyms in the legends of Figures 2B and 2C. Including the full names corresponding to the abbreviations would indeed enhance clarity. While we attempted to add both acronyms and full names to the figure legend, the full names were too lengthy and impacted the figure's presentation. Therefore, we have provided the full names corresponding to the abbreviations in the figure caption below, to help readers easily understand the abbreviations used in the figure.

      (3) Interpretation of results in Figure 2 and elsewhere: Significant differences detected in Figure 2B could imply potential associations between HPV cfDNA levels (or subtypes) and recurrence/metastasis patterns. Figure 2C shows that there is a difference in cfDNA levels between the groups compared, suggesting an association but this would not necessarily be a direct "correlation". Overall, interpretation of statistical findings would benefit from more precise language throughout the text and overstatement should be avoided.

      Thank you for your insightful comments regarding the interpretation of results in Figure 2 and elsewhere. We acknowledge that there are several limitations in this study, and the interpretation of the results should be more careful and cautious. Indeed, in the results section, there were issues with inaccurate wording and exaggeration. We have made revisions in the discussion section, which are presented as follows: Preliminary results indicate that baseline HPV cfDNA levels may be linked to recurrence/metastasis patterns, potentially reflecting tumor burden and spread (Lines 411-413). Additionally, we have also made changes in the conclusion section, which are presented as follows: The baseline copy number of HPV cfDNA may be associated with metastatic patterns, thereby reflecting tumor burden and the extent of spread to some extent (Lines 511-513).

      (4) The authors state that six patients showed cfDNA elevation with clinically progressive disease, yet only three are represented in Figure 3B1 under "Patients whose disease progressed during treatment." What is the expected baseline variability in cfDNA for patients? If we look at data from patients with early-stage cancer would we see similar fluctuations? And does the degree of variability vary for different HPV subtypes? Without understanding the normal fluctuations in cfDNA levels, interpreting these changes as progression indicators may be premature.

      Thank you for your feedback. We appreciate your thorough review and attention to detail. Six cervical squamous cell carcinoma (SCC) patients exhibited elevated HPV cfDNA levels as their clinical condition progressed. In the previous Figures 3A1 and 3A2, we only presented data from three patients, as we initially believed that displaying the cfDNA curves from three patients would offer a clearer view, while including six patients might lead to overlap and reduce clarity. However, this may have caused confusion for readers. Based on your suggestion, we have revised Figure 3A1 to include the cfDNA curves for all six patients with squamous cell carcinoma who experienced clinical disease progression during treatment (Figure 3A1), along with the corresponding SCC-Ag curves (Figure 3A2).

      Thank you for highlighting the issue of baseline variability in HPV cfDNA. This is indeed a limitation of our study, which did not address this aspect. If baseline variability is defined as changes in HPV cfDNA levels measured at different time points before treatment in the same patient, fluctuations at different time points are inevitable and objective. Following your suggestion, we have added a discussion on baseline variability in the limitations section of the manuscript to provide readers with a more objective understanding of our study's findings (Lines 501-502). In future studies, we will incorporate baseline variability into the research design to better understand pre-treatment HPV cfDNA fluctuations and provide support for clinical decision-making.

      (5) It would be helpful if where p-values are given, the test used to derive these values was also stated within parentheses e.g. (P < 0.05, permutation test with Benjamini-Hochberg procedure).

      Thank you for your valuable suggestions and examples. Following your advice, we have included the statistical test methods used to obtain the p-values in parentheses wherever they appear in the results section. Additionally, we have specified the statistical test methods for the p-values below the figures in the results section.

      Reviewer #2 (Public review):

      Summary:

      The authors conducted a study to evaluate the potential of circulating HPV cell-free DNA (cfDNA) as a biomarker for monitoring recurrent or metastatic HPV+ cervical cancer. They analyzed serum samples from 28 patients, measuring HPV cfDNA levels via digital droplet PCR and comparing these to squamous cell carcinoma antigen (SCC-Ag) levels in 26 SCC patients, while also testing the association between HPV cfDNA levels and clinical outcomes. The main hypothesis that the authors set out to test was whether circulating HPV cfDNA levels correlated with metastatic patterns and/or treatment response in HPV+ CC.

      The main claims put forward by the paper are that:

      (1) HPV cfDNA was detected in all 28 CC patients enrolled in the study and levels of HPV cfDNA varied over a median 2-month monitoring period.

      (2) 'Median baseline' HPV cfDNA varied according to 'metastatic pattern' in individual patients.

      (3) Positivity rate for HPV cfDNA was more consistent than SCC-Ag.

      (4) In 20 SCC patients monitored longitudinally, concordance with changes in disease status was 90% for HPV cfDNA.

      This study highlights HPV cfDNA as a promising biomarker with advantages over SCC-Ag, underscoring its potential for real-time disease surveillance and individualized treatment guidance in HPV-associated cervical cancer.

      Strengths:

      This study presents valuable insights into HPV+ cervical cancer with potential translational significance for management and guiding therapeutic strategies. The focus on a non-invasive approach is particularly relevant for women's cancers, and the study exemplifies the promising role of HPV cfDNA as a biomarker that could aid personalized treatment strategies.

      Weaknesses:

      While the authors acknowledge the study's small cohort and variability in sequential sampling protocols as a limitation, several revisions should be made to ensure that (1) the findings are presented in a way that aligns more closely with the data without overstatement and (2) that the statistical support for these findings is made more clear. Specific suggestions are outlined below.

      (1) Line 54 in the abstract refers to 'combined multiple-metastasis pattern' but it is not clear what this refers to at this point in the text.

      Thank you for your detailed feedback. You are correct that the "combined multi-metastatic pattern" was not adequately explained in the abstract, which may have caused confusion. To address this, we have clarified the definitions of the combined multi-metastatic pattern and single-metastatic pattern in lines 53-55 of the manuscript. Patients with a combined multi-metastatic pattern (lymph node + hematogenous ± diffuse serosal metastasis) exhibited a higher median baseline HPV cfDNA level compared to those with a single-metastasis pattern (local recurrence, lymph node metastasis, or hematogenous metastasis) (P = 0.003).

      (2) Line 90 The reference to 'prospective clinical study (NCT03175848) in primary stage IVB CC to investigate the role of radiotherapy (RT) in combination therapy' seems not to be at all relevant at this point in the text. I would limit the description of this study to the methods.

      Thank you for your thoughtful and thorough review. Your suggestions are highly relevant. Upon further reflection, we recognized that this sentence was redundant in its original placement. Following your recommendation, we have removed it from this section and moved it to the methods section (Lines 109-111). The revised statement is as follows: "Notably, 19 cases from the primary CC group participated in our prospective clinical study (NCT03175848), focused on stage IVB cervical cancer."

      (3) Line 56 refers to HPV cfDNA levels (range 0.3-16.9) but what units?

      Thank you for your feedback regarding the manuscript format. While you highlighted this specific issue, we have since identified several other instances of omitted units in parentheses throughout the manuscript. We acknowledge that such formatting oversights can create ambiguity for readers. Following your suggestions, we have corrected all such issues in the manuscript. We greatly appreciate your careful and thorough review.

      (4) Lines 247-248 claim that higher baseline HPV cfDNA levels correlated with a more substantial post-chemotherapy decrease. This correlation should be statistically validated, and the p-value should be included.

      Thank you for your insightful comments, which highlighted an issue with this sentence. Upon review, I have made the necessary revisions. Since no statistical analysis was conducted and the P-value was not provided, the original sentence was imprecise. Given the small sample size, statistical analysis is not feasible. I have revised the sentence as follows: “For patients in whom systemic cytotoxic chemotherapy was effective, a significant decrease in HPV cfDNA levels could be detected after chemotherapy” (Lines 297-298).

      (5) The authors mention that baseline samples were collected "between Day -14 and Day +30 preceding initial treatment." If Day -14 indicates two weeks before treatment, then this would imply some samples were taken up to 30 days post-treatment. This notation should be clarified. To what extent might outliers or more extreme values in Figure 2 be driven by variability in how baseline sampling was carried out?

      Thank you for your insightful comments. Undoubtedly, this is indeed a major limitation of our study. These factors could lead to a certain degree of bias in the detection data. The primary reason is that the study was conducted during the COVID-19 pandemic, making it sometimes difficult to conduct sampling regularly. In accordance with your suggestion, I have already added this part of the content to the results section of the article (Lines 266-275). We have also included the variation in baseline sampling as a limitation in the discussion section (Lines 497-499). In future studies, we will strive to improve the study design by ensuring baseline samples are collected prior to treatment, thereby enhancing the reliability of statistical and analytical results.

      (6) Would be useful to amend Figure 1 to show a subset of patients with SCC and a subset of patients who underwent longitudinal monitoring.

      Thank you for your detailed suggestion. Including a subset of pathological types could indeed add more information to Figure 1. However, regarding the pathological types of the patients in this group, we have listed them in Table 1 and Supplementary Table 2. Among the 28 patients, 26 are diagnosed with squamous cell carcinoma, so 92.9% of the patients in this study have squamous cell carcinoma. To avoid making Figure 1 too complex, we decided not to include the pathological type in the figure.

      (7) Line 120 "a time point matching or closely following HPV cfDNA sampling" - what is the time range for 'closely following' here? A couple of hours or days after sampling?

      Thank you for your detailed feedback. Based on your suggestion, we have revised the sentence as follows:

      "For patients with squamous cell CC in the sequential sampling group, concurrent SCC-Ag testing was performed at a time point that matched, or was within 7 days before or after, the HPV cfDNA sampling." (Lines 123-125)

      (8) Lines 178-190 and lines 179-180 seem to make exactly the same point.

      Thank you very much for your careful review. Indeed, these two sentences were repetitive and conveyed the same point. I have removed the previous sentence here (lines 206-207).

      (9) In Figure 4, please indicate the number of patients in each group in the legend e.g. HPV16+ (n=x number of patients).

      Thank you for your feedback on the details of Figure 4 and the examples provided. We have updated Figure 4 according to your suggestions and included the number of patients in each group in the figure legend.

      (10) Lines 322-3 'HPV cfDNA predicted treatment response or disease progression at an earlier time point than imaging assessments' - based on the data available and the numbers of patients, I would argue that this is too bold a claim.

      Thank you very much for pointing out this issue. We fully agree with your view. We have modified this sentence as follows: "Secondly, dynamically monitored HPV cfDNA levels appeared to predict treatment response and disease progression." (Lines 391-392).

    1. eLife Assessment

      This valuable study introduces a modern and accessible PyTorch reimplementation of the widely used SpliceAI model for splice site prediction. The authors provide solid evidence that their OpenSpliceAI implementation matches the performance of the original while improving usability and enabling flexible retraining across species. These advances are likely to be of broad interest to the computational genomics community.

    2. Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "non-local effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.
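
      The point raised in (3) can be illustrated with a small sketch of temperature scaling applied to a model that is deliberately miscalibrated. Everything here is an assumption made for illustration: the three-class toy logits, the synthetic "overconfident" model (true logits multiplied by a hypothetical `sharpen` factor), and all function names. It is not the OpenSpliceAI calibration code, and it uses only the Python standard library.

```python
import math
import random


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def mean_nll(rows, labels, temperature):
    """Average negative log-likelihood of the true labels."""
    total = sum(log_softmax(r, temperature)[y] for r, y in zip(rows, labels))
    return -total / len(labels)


def fit_temperature(rows, labels, lo=0.05, hi=20.0, iters=60):
    """Golden-section search over log(T) for the temperature minimising NLL."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc = mean_nll(rows, labels, math.exp(c))
    fd = mean_nll(rows, labels, math.exp(d))
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = mean_nll(rows, labels, math.exp(c))
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = mean_nll(rows, labels, math.exp(d))
    return math.exp((a + b) / 2)


def expected_calibration_error(rows, labels, temperature=1.0, n_bins=10):
    """ECE: weighted gap between confidence and accuracy across bins."""
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for row, y in zip(rows, labels):
        probs = [math.exp(lp) for lp in log_softmax(row, temperature)]
        conf = max(probs)
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        hit_sum[b] += float(probs.index(conf) == y)
    return sum(abs(c - h) for c, h in zip(conf_sum, hit_sum)) / len(labels)


def make_overconfident_data(n=2000, sharpen=3.0, seed=0):
    """Synthetic data: labels follow the true logits, the model sharpens them."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        true_logits = [rng.gauss(0, 1) for _ in range(3)]
        probs = [math.exp(lp) for lp in log_softmax(true_logits)]
        labels.append(rng.choices(range(3), weights=probs)[0])
        rows.append([sharpen * z for z in true_logits])
    return rows, labels


rows, labels = make_overconfident_data()
t = fit_temperature(rows, labels)
ece_before = expected_calibration_error(rows, labels)
ece_after = expected_calibration_error(rows, labels, temperature=t)
print("fitted temperature:", round(t, 2))
print("ECE before / after:", round(ece_before, 3), round(ece_after, 3))
```

      On a model built to be overconfident, the fitted temperature moves well away from 1 and the calibration error drops, so the effect of the method becomes observable. In a real evaluation the temperature would be fitted on held-out data and, in line with point (2), the result would depend on which set of splice sites is treated as the positive class.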

    3. Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pre-training on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Chao et al. produced an updated version of the SpliceAI package using modern deep learning frameworks. This includes data preprocessing, model training, direct prediction, and variant effect prediction scripts. They also added functionality for model fine-tuning and model calibration. They convincingly evaluate their newly trained models against those from the original SpliceAI package and investigate how to extend SpliceAI to make predictions in new species. While their comparisons to the original SpliceAI models are convincing on the grounds of model performance, their evaluation of how well the new models match the original's understanding of non-local mutation effects is incomplete. Further, their evaluation of the new calibration functionality would benefit from a more nuanced discussion of what set of splice sites their calibration is expected to hold for, and tests in a context for which calibration is needed.

      Strengths:

      (1) They provide convincing evidence that their new implementation of SpliceAI matches the performance of the original model on a similar dataset while benefiting from improved computational efficiencies. This will enable faster prediction and retraining of splicing models for new species as well as easier integration with other modern deep learning tools.

      (2) They produce models with strong performance on non-human model species and a simple, well-documented pipeline for producing models tuned for any species of interest. This will be a boon for researchers working on splicing in these species and make it easy for researchers working on new species to generate their own models.

      (3) Their documentation is clear and abundant. This will greatly aid the ability of others to work with their code base.

      We thank the reviewer for these positive comments.  

      Weaknesses:

      (1) The authors' assessment of how much their model retains SpliceAI's understanding of "nonlocal effects of genomic mutations on splice site location and strength" (Figure 6) is not sufficiently supported. Demonstrating this would require showing that for a large number of (non-local) mutations, their model shows the same change in predictions as SpliceAI or that attribution maps for their model and SpliceAI are concordant even at distances from the splice site. Figure 6A comes close to demonstrating this, but only provides anecdotal evidence as it is limited to 2 loci. This could be overcome by summarizing the concordance between ISM maps for the two models and then comparing across many loci. Figure 6B also comes close, but falls short because instead of comparing splicing prediction differences between the models as a function of variants, it compares the average prediction difference as a function of the distance from the splice site. This limits it to only detecting differences in the model's understanding of the local splice site motif sequences. This could be overcome by looking at comparisons between differences in predictions with mutants directly and considering non-local mutants that cause differences in splicing predictions.

      We agree that two loci are insufficient to demonstrate preservation of non-local effects. To address this, we have extended our analysis to a larger set of sites: we randomly sampled 100 donor and 100 acceptor sites, applied our ISM procedure over a 5,001 nt window centered at each site for both models, and computed the ISM map as before. We then calculated the Pearson correlation between the collection of OSAI<sub>MANE</sub> and SpliceAI ISM importance scores. We also created 10 additional ISM maps similar to those in Figure 6A, which are now provided in Figure S23.

      The revised paragraph in the manuscript’s Results section follows:

      First, we recreated the experiment from Jaganathan et al. in which they mutated every base in a window around exon 9 of the U2SURP gene and calculated its impact on the predicted probability of the acceptor site. We repeated this experiment on exon 2 of the DST gene, again using both SpliceAI and OSAI<sub>MANE</sub>. In both cases, we found a strong similarity in the resultant patterns between SpliceAI and OSAI<sub>MANE</sub>, as shown in Figure 6A. To evaluate concordance more broadly, we randomly selected 100 donor and 100 acceptor sites and performed the same ISM experiment on each site. The per-site Pearson correlations between the SpliceAI and OSAI<sub>MANE</sub> importance scores had an overall median of 0.857 (see Methods; additional DNA logos in Figure S23).

      To characterize the local sequence features that both models focus on, we computed the average decrease in predicted splice-site probability resulting from each of the three possible single-nucleotide substitutions at every position within 80 bp for 100 donor and 100 acceptor sites randomly sampled from the test set (Chromosomes 1, 3, 5, 7, and 9). Figure 6B shows the average decrease in splice site strength for each mutation in the format of a DNA logo, for both tools.

      We added the following text to the Methods section:

      Concordance evaluation of ISM importance scores between OSAI<sub>MANE</sub> and SpliceAI

      To assess agreement between OSAI<sub>MANE</sub> and SpliceAI across a broad set of splice sites, we applied our ISM procedure to 100 randomly chosen donor sites and 100 randomly chosen acceptor sites. For each site, we extracted a 5,001 nt window centered on the annotated splice junction and, at every coordinate within that window, substituted the reference base with each of the three alternative nucleotides. We recorded the change in predicted splice-site probability for each mutation and then averaged these Δ-scores at each position to produce a 5,001-score ISM importance profile per site.

      Next, for each splice site we computed the Pearson correlation coefficient between the paired importance profiles from ensembled OSAI<sub>MANE</sub> and ensembled SpliceAI. The median correlation across all splice sites was 0.857. Ten additional representative zoomed-in DNA logo comparisons are provided in Supplementary Figure S23.
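      The concordance procedure described in this Methods text can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the two `predict` functions are hypothetical toy scorers standing in for SpliceAI and OSAI<sub>MANE</sub>, the window is 101 nt rather than 5,001 nt, and the Δ-score is assumed to be the reference probability minus the mutant probability.

```python
import math

BASES = "ACGT"

def ism_profile(seq, site, predict):
    """Per-position mean change in predicted site probability over the 3 substitutions."""
    ref_score = predict(seq, site)
    profile = []
    for pos, ref_base in enumerate(seq):
        deltas = []
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            # Assumed sign convention: positive delta means the mutation weakens the site.
            deltas.append(ref_score - predict(mutant, site))
        profile.append(sum(deltas) / len(deltas))
    return profile

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def make_toy_model(weights):
    """Hypothetical scorer: logistic function of matches at offsets from the site."""
    def predict(seq, site):
        z = -2.0
        for offset, (base, weight) in weights.items():
            i = site + offset
            if 0 <= i < len(seq) and seq[i] == base:
                z += weight
        return 1.0 / (1.0 + math.exp(-z))
    return predict

# Two toy "models" that agree on the motif and on one distal position (offset -20).
model_a = make_toy_model({0: ("G", 3.0), 1: ("T", 3.0), -20: ("A", 0.5)})
model_b = make_toy_model({0: ("G", 2.5), 1: ("T", 3.2), -20: ("A", 0.4)})

# Build a 101 nt window centered on a toy site at index 50.
bases = list("ACGT" * 25 + "A")
bases[50], bases[51], bases[30] = "G", "T", "A"
window = "".join(bases)
site = 50

profile_a = ism_profile(window, site, model_a)
profile_b = ism_profile(window, site, model_b)
r = pearson(profile_a, profile_b)
print(f"Pearson r between ISM profiles: {r:.3f}")
```

      In the full analysis this per-site correlation would be computed for each of the 200 sampled sites and summarized by its median; a constant profile would make the correlation undefined, which this sketch does not guard against.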

      (2) The utility of the calibration method described is unclear. When thinking about a calibrated model for splicing, the expectation would be that the models' predicted splicing probabilities would match the true probabilities that positions with that level of prediction confidence are splice sites. However, the actual calibration that they perform only considers positions as splice sites if they are splice sites in the longest isoform of the gene included in the MANE annotation. In other words, they calibrate the model such that the model's predicted splicing probabilities match the probability that a position with that level of confidence is a splice site in one particular isoform for each gene, not the probability that it is a splice site more broadly. Their level of calibration on this set of splice sites may very well not hold to broader sets of splice sites, such as sites from all annotated isoforms, sites that are commonly used in cryptic splicing, or poised sites that can be activated by a variant. This is a particularly important point as much of the utility of SpliceAI comes from its ability to issue variant effect predictions, and they have not demonstrated that this calibration holds in the context of variants. This section could be improved by expanding and clarifying the discussion of what set of splice sites they have demonstrated calibration on, what it means to calibrate against this set of splice sites, and how this calibration is expected to hold or not for other interesting sets of splice sites. Alternatively, or in addition, they could demonstrate how well their calibration holds on different sets of splice sites or show the effect of calibrating their models against different potentially interesting sets of splice sites and discuss how the results do or do not differ.

      We thank the reviewer for highlighting the need to clarify our calibration procedure. Both SpliceAI and OpenSpliceAI are trained on a single “canonical” transcript per gene: SpliceAI on the hg19 Ensembl/Gencode canonical set and OpenSpliceAI on the MANE transcript set. To calibrate each model, we applied post-hoc temperature scaling, i.e. a single learnable parameter that rescales the logits before the softmax. This adjustment does not alter the model’s ranking or discrimination (AUC/precision–recall) but simply aligns the predicted probabilities for donor, acceptor, and non-splice classes with their observed frequencies. As shown in our reliability diagrams (Fig. S16-S22), temperature scaling yields negligible changes in performance, confirming that both SpliceAI and OpenSpliceAI were already well-calibrated. However, we acknowledge that we didn’t measure how calibration might affect predictions on non-canonical splice sites or on cryptic splicing. It is possible that calibration might have a detrimental effect on those, but because this is not a key claim of our paper, we decided not to do further experiments. We have updated the manuscript to acknowledge this potential shortcoming; please see the revised paragraph in our next response.
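      For readers unfamiliar with temperature scaling, a minimal sketch is given below. It assumes the temperature is fit by minimizing the negative log-likelihood on held-out labels, which is the usual formulation but is not spelled out in this response. The toy logits, the class ordering (0 = non-splice, 1 = acceptor, 2 = donor), and the golden-section search used as the optimizer are all illustrative choices, not the OpenSpliceAI implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over one position's logits after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true classes at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, lo=0.05, hi=20.0, iters=100):
    """Golden-section search for the temperature that minimizes mean NLL."""
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if mean_nll(logit_rows, labels, left) < mean_nll(logit_rows, labels, right):
            hi = right
        else:
            lo = left
    return (lo + hi) / 2

def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)

# Toy, deliberately overconfident logits: 4 of 6 positions are predicted correctly.
logit_rows = [[6, 0, 0], [6, 0, 0], [0, 6, 0], [0, 0, 6], [6, 0, 0], [0, 6, 0]]
labels = [0, 0, 1, 2, 1, 0]

T = fit_temperature(logit_rows, labels)
nll_before = mean_nll(logit_rows, labels, 1.0)
nll_after = mean_nll(logit_rows, labels, T)

# Dividing all logits by one positive constant cannot change which class ranks first.
same_ranking = all(argmax(softmax(row, T)) == argmax(softmax(row, 1.0))
                   for row in logit_rows)
print(f"T = {T:.3f}, NLL {nll_before:.3f} -> {nll_after:.3f}, ranking kept: {same_ranking}")
```

      A fitted temperature above 1 softens overconfident probabilities, while a value close to 1, as reported for these models, leaves them essentially unchanged.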

      (3) It is difficult to assess how well their calibration method works in general because their original models are already well calibrated, so their calibration method finds temperatures very close to 1 and only produces very small and hard to assess changes in calibration metrics. This makes it very hard to distinguish if the calibration method works, as it doesn't really produce any changes. It would be helpful to demonstrate the calibration method on a model that requires calibration or on a dataset for which the current model is not well calibrated, so that the impact of the calibration method could be observed.

      It’s true that the models we calibrated didn’t need many changes. It is possible that the calibration methods we used (which were not ours, but which were described in earlier publications) can’t improve the models much. We toned down our comments about this procedure, as follows.

      Original:

      “Collectively, these results demonstrate that OSAIs were already well-calibrated, and this consistency across species underscores the robustness of OpenSpliceAI’s training approach in diverse genomic contexts.”

      Revised:

      “We observed very small changes after calibration across phylogenetically diverse species, suggesting that OpenSpliceAI’s training regimen yielded well‐calibrated models, although it is possible that a different calibration algorithm might produce further improvements in performance.”

      Reviewer #2 (Public review):

      Summary:

      The paper by Chao et al offers a reimplementation of the SpliceAI algorithm in PyTorch so that the model can more easily/efficiently be retrained. They apply their new implementation of the SpliceAI algorithm, which they call OpenSpliceAI, to several species and compare it against the original model, showing that the results are very similar and that in some small species, pretraining on other species helps improve performance.

      Strengths:

      On the upside, the code runs fine, and it is well documented.

      Weaknesses:

      The paper itself does not offer much beyond reimplementing SpliceAI. There is no new algorithm, new analysis, new data, or new insights into RNA splicing. There is no comparison to many of the alternative methods that have since been published to surpass SpliceAI. Given that some of the authors are well-known with a long history of important contributions, our expectations were admittedly different. Still, we hope some readers will find the new implementation useful.

      We thank the reviewer for the feedback. We have clarified that OpenSpliceAI is an open-source PyTorch reimplementation optimized for efficient retraining and transfer learning, designed to analyze cross-species performance gains, and supported by a thorough benchmark and the release of several pretrained models to clearly position our contribution.

      Reviewer #3 (Public review):

      Summary:

      The authors present OpenSpliceAI, a PyTorch-based reimplementation of the well-known SpliceAI deep learning model for splicing prediction. The core architecture remains unchanged, but the reimplementation demonstrates convincing improvements in usability, runtime performance, and potential for cross-species application.

      Strengths:

      The improvements are well-supported by comparative benchmarks, and the work is valuable given its strong potential to broaden the adoption of splicing prediction tools across computational and experimental biology communities.

      Major comments:

      Can fine-tuning also be used to improve prediction for human splicing? Specifically, are models trained on other species and then fine-tuned with human data able to perform better on human splicing prediction? This would enhance the model's utility for more users, and ideally, such fine-tuned models should be made available.

      We evaluated transfer learning by fine-tuning models pretrained on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), Arabidopsis (OSAI<sub>Arabidopsis</sub>), and zebrafish (OSAI<sub>Zebrafish</sub>) on human data. While transfer learning accelerated convergence compared to training from scratch, the final human splicing prediction accuracy was comparable between fine-tuned and scratch-trained models, suggesting that performance on our current human dataset is nearing saturation under this architecture.

      We added the following paragraph to the Discussion section:

      We also evaluated pretraining on mouse (OSAI<sub>Mouse</sub>), honeybee (OSAI<sub>Honeybee</sub>), zebrafish (OSAI<sub>Zebrafish</sub>), and Arabidopsis (OSAI<sub>Arabidopsis</sub>) followed by fine-tuning on the human MANE dataset. While cross-species pretraining substantially accelerated convergence during fine-tuning, the final human splicing-prediction accuracy was comparable to that of a model trained from scratch on human data. This result suggests that our architecture captures the relevant splicing features from human training data alone, and thus gains little or no benefit from cross-species transfer learning in this context (see Figure S24).

      Reviewer #1 (Recommendations for the authors):

      We thank the editor for summarizing the points raised by each reviewer. Below is our point-by-point response to each comment:

      (1) In Figure 3 (and generally in the other figures) OpenSpliceAI should be replaced with OSAI_{Training dataset} because otherwise it is hard to tell which precise model is being compared. And in Figure 3 it is especially important to emphasize that you are comparing a SpliceAI model trained on Human data to an OSAI model trained and evaluated on a different species.

      We have updated the labels in Figure 3, replacing “OpenSpliceAI” with “OSAI_{training dataset}” to more clearly specify which model is being compared.

      (2) Are genes paralogous to training set genes removed from the validation set as well as the test set? If you are worried about data leakage in the test set, it makes sense to also consider validation set leakage.

      Thank you for this helpful suggestion. We fully agree, and to avoid any data leakage we implemented the identical filtering pipeline for both validation and test sets: we excluded all sequences paralogous or homologous to sequences in the training set, and further removed any sequence sharing > 80 % length overlap and > 80 % sequence identity with training sequences. The effect of this filtering on the validation set is summarized in Supplementary Figure S7C.

      Figure S7. (C) Scatter plots of DNA sequence alignments between validation and training sets for Human-MANE, mouse, honeybee, zebrafish, and Arabidopsis. Each dot represents an alignment, with the x-axis showing alignment identity and the y-axis showing alignment coverage. Alignments exceeding 80% for both identity and coverage are highlighted in the red-shaded region and were excluded from the test sets.
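      The threshold step of this leakage filter might look like the sketch below. It covers only the identity and coverage cut-offs; the separate removal of paralogous or homologous sequences is not shown. The `Alignment` record, its field names, and the example identifiers are hypothetical, and values are assumed to be fractions between 0 and 1 taken from whatever aligner produced them.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    query_id: str     # held-out (validation or test) sequence
    target_id: str    # training sequence it aligned to
    identity: float   # fraction of identical bases in the alignment
    coverage: float   # fraction of the held-out sequence covered by the alignment

def leaking_ids(alignments, min_identity=0.8, min_coverage=0.8):
    """Held-out sequences with any alignment above BOTH thresholds (strictly greater)."""
    return {a.query_id for a in alignments
            if a.identity > min_identity and a.coverage > min_coverage}

def filter_heldout(sequence_ids, alignments, min_identity=0.8, min_coverage=0.8):
    """Drop held-out sequences that are too similar to any training sequence."""
    flagged = leaking_ids(alignments, min_identity, min_coverage)
    return [seq_id for seq_id in sequence_ids if seq_id not in flagged]

# Hypothetical example: only g1 exceeds both thresholds and is removed.
heldout = ["g1", "g2", "g3", "g4"]
hits = [
    Alignment("g1", "t7", identity=0.95, coverage=0.90),  # removed
    Alignment("g2", "t3", identity=0.95, coverage=0.50),  # kept: low coverage
    Alignment("g3", "t9", identity=0.80, coverage=0.80),  # kept: not strictly above 80%
]
kept = filter_heldout(heldout, hits)
print(kept)
```

      Applying the same function to the validation and the test set keeps the two held-out sets filtered identically, which is the point of the reviewer's question.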

      Reviewer #3 (Recommendations for the authors):

      (1) The legend in Figure 3 is somewhat confusing. The labels like "SpliceAI-Keras (species name)" may imply that the model was retrained using data from that species, but that's not the case, correct?

      Yes, “SpliceAI-Keras (species name)” was not retrained; it refers to the released SpliceAI model evaluated on the specified species dataset. We have revised the Figure 3 legends, changing “SpliceAI-Keras (species name)” to “SpliceAI-Keras” to clarify this.

      (2) Please address the minor issues with the code, including ensuring the conda install works across various systems.

      We have addressed the issues you mentioned. OpenSpliceAI is now available on Conda and can be installed with:  conda install openspliceai. 

      The conda package homepage is at: https://anaconda.org/khchao/openspliceai We’ve also corrected all broken links in the documentation.

      (3) Utility:

      I followed all the steps in the Quick Start Guide, and aside from the issues mentioned below, everything worked as expected.

      I attempted installation using conda as described in the instructions, but it was unsuccessful. I assume this method is not yet supported.

      In Quick Start Guide: predict, the link labeled "GitHub (models/spliceai-mane/10000nt/)" appears to be incorrect. The correct path is likely "GitHub (models/openspliceai-mane/10000nt/)".

      In Quick Start Guide: variant (https://ccb.jhu.edu/openspliceai/content/quick_start_guide/quickstart_variant.html#quick-startvariant), some of the download links for input files were broken. While I was able to find some files in the GitHub repository, I think the -A option should point to data/grch37.txt, not examples/data/input.vcf, and the -I option should be examples/data/input.vcf, not data/vcf/input.vcf.

      Thank you for catching these issues. We’ve now addressed all issues concerning Conda installation and file links. We thank the editor for thoroughly testing our code and reviewing the documentation.

    1. eLife Assessment

      The manuscript by Hawes et al. provides important findings on how striatal projection neurons regulate spontaneous locomotion speed in the context of implicit motivation and distinct contextual valence. The supporting evidence for the findings is convincing. This work will be of broad interest to neuroscientists in the fields of basal ganglia, movement control, and cognition.

    2. Reviewer #1 (Public review):

      Summary:

      This fundamental work employed multidisciplinary approaches and conducted rigorous experiments to study how a specific subset of neurons in the dorsal striatum (i.e., "patchy" striatal neurons) modulates locomotion speed depending on the valence of naturalistic contexts.

      Strengths:

      The scientific findings are novel and original and significantly advance our understanding of how the striatal circuit regulates spontaneous movement in various contexts.

      Weaknesses:

      This is extensive research involving various circuit manipulation approaches. Some of these circuit manipulations are not physiological. A balanced discussion of the technical strengths and limitations of the present work would be helpful and beneficial to the field.

    3. Reviewer #2 (Public review):

      Hawes et al. investigated the role of striatal neurons in the patch compartment of the dorsal striatum. Using Sepw1-Cre line, the authors combined a modified version of the light/dark transition box test that allows them to examine locomotor activity in different environmental valence with a variety of approaches, including cell-type-specific ablation, miniscope calcium imaging, fiber photometry, and opto-/chemogenetics. First, they found ablation of patchy striatal neurons resulted in an increase in movement vigor when mice stayed in a safe area or when they moved back from more anxiogenic to safe environments. The following miniscope imaging experiment revealed that a larger fraction of striatal patchy neurons was negatively correlated with movement speed, particularly in an anxiogenic area. Next, the authors investigated differential activity patterns of patchy neurons' axon terminals, focusing on those in GPe, GPi, and SNr, showing that the patchy axons in SNr reflect movement speed/vigor. Chemogenetic and optogenetic activation of these patchy striatal neurons suppressed the locomotor vigor, thus demonstrating their causal role in the modulation of locomotor vigor when exposed to valence differentials. Unlike the activation of striatal patches, such a suppressive effect on locomotion was absent when optogenetically activating matrix neurons by using the Calb1-Cre line, indicating distinctive roles in the control of locomotor vigor by striatal patch and matrix neurons. Together, they have concluded that nigrostriatal neurons within striatal patches negatively regulate movement vigor, dependent on behavioral contexts where motivational valence differs.

      The strengths of this work include the use of multiple experimental approaches, including genetic/viral ablation of patch neurons, miniscope single-cell imaging, as well as projection-specific recording of axonal activity by fiber photometry, and causal manipulation of the neurons by chemogenetic and optogenetics. Although similar findings were reported previously, the authors' results will be of value owing to multiple levels of investigation. In my view, this study will add to the important literature by demonstrating how patch (striosomal) neurons in the striatum control movement vigor.

    4. Reviewer #3 (Public review):

      Hawes et al. combined behavioral, optical imaging, and activity manipulation techniques to investigate the role of striatal patch SPNs in locomotion regulation. Using Sepw1-Cre transgenic mice, they found that patch SPNs encode locomotion deceleration in a light-dark box procedure through optical imaging techniques. Moreover, genetic ablation of patch SPNs increased locomotion speed, while chemogenetic activation of these neurons decreased it. The authors concluded that a subtype of patch striatonigral neurons modulates locomotion speed based on external environmental cues.

      In the revision, the authors have largely addressed my concerns with additional explanation and discussion, although some of the key experiments to strengthen the authors' claim by identifying the function of specific cell populations remain to be conducted due to technical challenges. Nevertheless, the current results remain valuable and interesting to a wide audience in the field.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This fundamental work employed multidisciplinary approaches and conducted rigorous experiments to study how a specific subset of neurons in the dorsal striatum (i.e., "patchy" striatal neurons) modulates locomotion speed depending on the valence of the naturalistic context.

      Strengths:

      The scientific findings are novel and original and significantly advance our understanding of how the striatal circuit regulates spontaneous movement in various contexts.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses:

      This is extensive research involving various circuit manipulation approaches. Some of these circuit manipulations are not physiological. A balanced discussion of the technical strengths and limitations of the present work would be helpful and beneficial to the field. Minor issues in data presentation were also noted.

      We have incorporated the recommended discussion of technical limitations and addressed the physiological plausibility of our manipulations on Page 33 of the revised Discussion section. Specifically, we wrote:

      “Judicious interpretation of the present data must consider the technical limitations of the various methods and circuit-level manipulations applied. Patchy neurons are distributed unevenly across the extensive structure of the striatum, and their targeted manipulation is constrained by viral spread in the dorsal striatum. Somatic calcium imaging using single-photon microscopy captures activity from only a subset of patchy neurons within a narrow focal plane beneath each implanted GRIN lens. Similarly, limitations in light diffusion from optical fibers may reduce the effective population of targeted fibers in both photometry and optogenetic experiments. For example, the more modest locomotor slowing observed with optogenetic activation of striatonigral fibers in the SNr compared to the stronger effects seen with Gq-DREADD activation across the dorsal striatum could reflect limited fiber optic coverage in the SNr. Alternatively, it may suggest that non-striatonigral mechanisms also contribute to generalized slowing. Our photometry data does not support a role for striatopallidal projections from patchy neurons in movement suppression. The potential contribution of intrastriatal mechanisms, discussed earlier, remains to be empirically tested. Although the behavioral assays used were naturalistic, many of the circuit-level interventions were not. Broad ablation or widespread activation of patchy neurons and their efferent projections represent non-physiological manipulations. Nonetheless, these perturbation results are interpreted alongside more naturalistic observations, such as in vivo imaging of patchy neuron somata and axon terminals, to form a coherent understanding of their functional role”.

      Reviewer #2 (Public review):

      Hawes et al. investigated the role of striatal neurons in the patch compartment of the dorsal striatum. Using Sepw1-Cre line, the authors combined a modified version of the light/dark transition box test that allows them to examine locomotor activity in different environmental valence with a variety of approaches, including cell-type-specific ablation, miniscope calcium imaging, fiber photometry, and opto-/chemogenetics. First, they found ablation of patchy striatal neurons resulted in an increase in movement vigor when mice stayed in a safe area or when they moved back from more anxiogenic to safe environments. The following miniscope imaging experiment revealed that a larger fraction of striatal patchy neurons was negatively correlated with movement speed, particularly in an anxiogenic area. Next, the authors investigated differential activity patterns of patchy neurons' axon terminals, focusing on those in GPe, GPi, and SNr, showing that the patchy axons in SNr reflect movement speed/vigor. Chemogenetic and optogenetic activation of these patchy striatal neurons suppressed the locomotor vigor, thus demonstrating their causal role in the modulation of locomotor vigor when exposed to valence differentials. Unlike the activation of striatal patches, such a suppressive effect on locomotion was absent when optogenetically activating matrix neurons by using the Calb1-Cre line, indicating distinctive roles in the control of locomotor vigor by striatal patch and matrix neurons. Together, they have concluded that nigrostriatal neurons within striatal patches negatively regulate movement vigor, dependent on behavioral contexts where motivational valence differs.

      We are grateful for the reviewer’s thorough summary of our main findings.

      In my view, this study will add to the important literature by demonstrating how patch (striosomal) neurons in the striatum control movement vigor. This study has applied multiple approaches to investigate their functionality in locomotor behavior, and the obtained data largely support their conclusions. Nevertheless, I have some suggestions for improvements in the manuscript and figures regarding their data interpretation, accuracy, and efficacy of data presentation.

      We appreciate the reviewer’s overall positive assessment and have made substantial improvements to the revised manuscript in response to reviewers’ constructive suggestions. 

      (1) The authors found that the activation of the striatonigral pathway in the patch compartment suppresses locomotor speed, which contradicts the canonical role of the direct pathway. It would be great if the authors could provide mechanistic explanations in the Discussion section. One possibility is that striatal D1R patch neurons directly inhibit dopaminergic cells that regulate movement vigor (Nadel et al., Sci. Rep., 2021; Okunomiya et al., J Neurosci., 2025). Providing plausible explanations will help readers infer possible physiological processes and give them ideas for future follow-up studies.

      We have added the recommended data interpretation and future perspectives on Page 30 of the revised Discussion section. Specifically, we wrote:

      “Potential mechanisms by which striatal patchy neurons reduce locomotion involve the suppression of dopamine availability within the striatum. Dopamine, primarily supplied by neurons in the SNc and VTA, broadly facilitates locomotion (Gerfen and Surmeier 2011, Dudman and Krakauer 2016). Recent studies have shown that direct activation of patchy neurons leads to a reduction in striatal dopamine levels, accompanied by decreased walking speed (Nadel, Pawelko et al. 2021, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Patchy neuron projections terminate in structures known as “dendron bouquets”, which enwrap SNc dendrites within the SNr and can pause tonic dopamine neuron firing (Crittenden, Tillberg et al. 2016, Evans, Twedell et al. 2020). The present work highlights a role for patchy striatonigral inputs within the SN in decelerating movement, potentially through GABAergic dendron bouquets that limit dopamine release back to the striatum (Dong, Wang et al. 2025). Additionally, intrastriatal collaterals of patch spiny projection neurons (SPNs) have been shown to suppress dopamine release and associated synaptic plasticity via dynorphin-mediated activation of kappa opioid receptors on dopamine terminals (Hawes, Salinas et al. 2017). This intrastriatal mechanism may further contribute to the reduction in striatal dopamine levels and the observed decrease in locomotor speed, representing a compelling avenue for future investigation.”

      (2) On page 14, Line 301, the authors stated that "Cre-dependent mCherry signals were colocalized with the patch marker (MOR1) in the dorsal striatum (Fig. 1B)". But I could not find any mCherry on that panel, so please modify it.

      We have included representative images of mCherry and MOR1 staining in Supplementary Fig. S1 of the revised manuscript.

      (3) From data shown in Figure 1, I've got the impression that mice ablated with striatal patch neurons were generally hyperactive, but this is probably not the case, as two separate experiments using LLbox and DDbox showed no difference in locomotor vigor between control and ablated mice. For the sake of better interpretation, it may be good to add a statement in Lines 365-366 that these experiments suggest the absence of hyperactive locomotion in general by ablating these specific neurons.

      As suggested by the reviewer, we have added the following statement on Page 17 of the revised manuscript: “These data also indicate that PA elevates valence-specific speed without inducing general hyperactivity”.

      (4) In Line 536, where Figure 5A was cited, the authors mentioned that they used inhibitory DREADDs (AAV-DIO-hM4Di-mCherry), but I could not find associated data in Figure 5. Please cite Figure S3 accordingly.

      We have added the citation for what is now Fig. S4 on Page 25 of the revised manuscript.

      (5) Personally, the Figure panel labels of "Hi" and "ii" were confusing at first glance. It would be better to have alternatives.

      As suggested by the reviewer, we have now labeled each figure panel with a distinct single alphabetical letter.

      (6) There is a typo on Figure 4A: tdTomata → tdTomato

      We have made the correction on the figure.

      Reviewer #3 (Public review):

      Hawes et al. combined behavioral, optical imaging, and activity manipulation techniques to investigate the role of striatal patch SPNs in locomotion regulation. Using Sepw1-Cre transgenic mice, they found that patch SPNs encode locomotion deceleration in a light-dark box procedure through optical imaging techniques. Moreover, genetic ablation of patch SPNs increased locomotion speed, while chemogenetic activation of these neurons decreased it. The authors concluded that a subtype of patch striatonigral neurons modulates locomotion speed based on external environmental cues. Below are some major concerns:

      The study concludes that patch striatonigral neurons regulate locomotion speed. However, unless I missed something, very little evidence is presented to support the idea that it is specifically striatonigral neurons, rather than striatopallidal neurons, that mediate these effects. In fact, the optogenetic experiments shown in Fig. 6 suggest otherwise. What about the behavioral effects of optogenetic stimulation of striatonigral versus striatopallidal neuron somas in Sepw1-Cre mice?

      Our photometry data implicate striatonigral neurons in locomotor slowing, as evidenced by a negative cross-correlation with acceleration and a negative lag, indicating that their activity reliably precedes—and may therefore contribute to—deceleration. In contrast, photometry results from striatopallidal neurons showed no clear correlation with speed or acceleration.
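
      The lag argument above can be illustrated with a minimal sketch. It is not the analysis code used in the manuscript: the function name `lagged_xcorr`, the synthetic traces, and the sign convention (a negative lag means the neural signal leads the behavioral signal) are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def lagged_xcorr(neural, behavior, max_lag):
    """Return {lag: r}, where r correlates neural[t + lag] with behavior[t].

    Assumed convention: at a negative lag, earlier neural samples are
    paired with later behavioral samples, so a peak at a negative lag
    means the neural signal leads.
    """
    n = len(neural)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            x, y = neural[: n + lag], behavior[-lag:]
        else:
            x, y = neural[lag:], behavior[: n - lag]
        out[lag] = pearson(x, y)
    return out

# Synthetic (hypothetical) traces: acceleration is the inverted neural
# signal delayed by 3 samples, i.e. activity precedes deceleration.
neural = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]
accel = [0.0, 0.0, 0.0] + [-v for v in neural[:-3]]

xc = lagged_xcorr(neural, accel, max_lag=10)
best_lag = min(xc, key=xc.get)  # lag with the most negative correlation
print(best_lag, round(xc[best_lag], 3))
```

      In this toy case the most negative correlation falls at a lag of -3 samples, which is the pattern described for striatonigral fibers: a negative correlation with acceleration whose peak sits at a negative lag. A flat correlation profile across lags would correspond to the striatopallidal result.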

      Figure 6 demonstrates that optogenetic manipulation within the SNr of Sepw1-Cre<sup>+</sup> striatonigral axons recapitulated context-dependent locomotor changes seen with Gq-DREADD activation of both striatonigral and striatopallidal Sepw1-Cre<sup>+</sup> cells in the dorsal striatum but failed to produce the broader locomotor speed change observed when targeting all Sepw1-Cre<sup>+</sup> cells in the dorsal striatum using either ablation or Gq-DREADD activation. The more subtle speed-restrictive phenotype resulting from ChR activation in the SNr could, as the reviewer suggests, implicate striatopallidal neurons in broad locomotor speed regulation. However, our photometry data indicate that this scenario is unlikely, as activity of striatopallidal Sepw1-Cre<sup>+</sup> fibers is not correlated with locomotor speed. Another plausible explanation is that the optogenetic approach may have affected fewer striatonigral fibers, potentially due to the limited spatial spread of light from the optical fiber within the SNr. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with optogenetics. We have added discussion of these technical limitations to the revised manuscript. Additionally, we now discuss the possibility that intrastriatal collaterals may contribute to reduced local dopamine levels by releasing dynorphin, which acts on kappa opioid receptors located on dopamine fibers (Hawes, Salinas et al. 2017), thereby suppressing dopamine release.

      The reviewer also suggests an interesting experiment involving optogenetic stimulation of striatonigral versus striatopallidal somata in Sepw1-Cre mice. While we agree that this approach would yield valuable insights, we have thus far been unable to achieve reliable results using retroviral vectors. Moreover, selectively targeting striatopallidal terminals optogenetically remains technically challenging, as striatonigral fibers also traverse the pallidum, and the broad anatomical distribution of the pallidum complicates precise targeting. This proposed work will need to be pursued in a future study, either with improved retrograde viral tools or the development of additional mouse lines that offer more selective access to these neuronal populations as we documented recently (Dong, Wang et al. 2025).

      In the abstract, the authors state that patch SPNs control speed without affecting valence. This claim seems to lack sufficient data to support it. Additionally, speed, velocity, and acceleration are very distinct qualities. It is necessary to clarify precisely what patch neurons encode and control in the current study.

      We believe the reviewer’s interpretation pertains to a statement in the Introduction rather than the Abstract: “Our findings reveal that patchy SPNs control the speed at which mice navigate the valence differential between high- and low-anxiety zones, without affecting valence perception itself.” Throughout our study, mice consistently preferred the dark zone in the Light/Dark box, indicating intact perception of the valence differential between illuminated areas. While our manipulations altered locomotor speed, they did not affect time spent in the dark zone, supporting the conclusion that valence perception remained unaltered. We appreciate the reviewer’s insight and agree it is an intriguing possibility that locomotor responses could, over time, influence internal states such as anxiety. We addressed this in the Discussion, noting that while dark preference was robust to our manipulations, future studies are warranted to explore the relationship between anxious locomotor vigor and anxiety itself.

      We report changes in scalar measures of animal speed across Light/Dark box conditions and under various experimental manipulations. Separately, we show that activity in both patchy neuron somata and striatonigral fibers is negatively correlated with acceleration—indicating a positive correlation with deceleration. Notably, the direction of the cross-correlational lag between striatonigral fiber activity and acceleration suggests that this activity precedes and may causally contribute to mouse deceleration, thereby influencing reductions in speed. To clarify this, we revised a sentence in the Results section: “Moreover, patchy neuron efferent activity at the SNr may causally contribute to deceleration, as indicated by the negative cross-correlational lag, thereby reducing animal speed.” We also updated the Discussion to read: “Together, these data specifically implicate patchy striatonigral neurons in slowing locomotion by acting within the SNr to drive deceleration.”

      One of the major results relies on chemogenetic manipulation (Figure 5). It would be helpful to demonstrate through slice electrophysiology that hM3Dq and hM4Di indeed cause changes in the activity of dorsal striatal SPNs, as intended by the DREADD system. This would support both the positive (Gq) and negative (Gi) findings, where no effects on behavior were observed.

      We were unable to perform this experiment; however, hM3Dq has previously been shown to be effective in striatal neurons (Alcacer, Andreoli et al. 2017). The lack of effect observed in Gi-DREADD mice serves as an unintended but valuable control, helping to rule out off-target effects of the DREADD agonist JHU37160 and thereby reinforcing the specificity of hM3Dq-mediated activation in our study. We have now included an important caveat regarding the Gi-DREADD results, acknowledging the possibility that the Gi-DREADDs may not have worked effectively in our target cells: “Potential explanations for the negative results in Gi-DREADD mice include inherently low basal activity among patchy neurons or insufficient expression of GIRK channels in striatal neurons, which may limit the effectiveness of Gi-coupling in suppressing neuronal activity (Shan, Fang et al. 2022).”

      Finally, could the behavioral effects observed in the current study, resulting from various manipulations of patch SPNs, be due to alterations in nigrostriatal dopamine release within the dorsal striatum?

      We agree that this is an important potential implication of our work, especially given that we and others have shown that patchy striatonigral neurons provide strong inhibitory input to dopaminergic neurons involved in locomotor control (Nadel, Pawelko et al. 2021, Lazaridis, Crittenden et al. 2024, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Accordingly, we have expanded the discussion section to include potential mechanistic explanations that support and contextualize our main findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are some minor issues for the authors' reference:

      (1) This work supports the motor-suppressing effect of patchy SPNs, and >80% of them are direct pathway SPNs. This conclusion is not expected from the traditional basal ganglia direct/indirect pathway model. Most experiments were performed using nonphysiological approaches to suppress (i.e., ablation) or activate (i.e., continuous chemo-optogenetic stimulation). It remains uncertain if the reported observations are relevant to the normal biological function of patchy SPNs under physiological conditions. Particularly, under what circumstances an imbalanced patch/matrix activity may be induced, as proposed in the sections related to the data presented in Figure 6. A thorough discussion and clarification remain needed. Or it should be discussed as a limitation of the present work.

      We have added discussion and clarification of physiological limitations in response to reviewer feedback. Additionally, we revised the opening sentence of an original paragraph in the discussion section to emphasize that it interprets our findings in the context of more physiological studies reporting natural shifts in patchy SPN activity due to cognitive conflict, stress, or training. The revised opening sentence now reads: “Together with previous studies of naturally occurring shifts in patchy neuron activation, these data illustrate ethologically relevant roles for a subgroup of genetically defined patchy neurons in behavior.”

      (2) Lines 499-500: How striatonigral cells encode speed and deceleration deserves a thorough discussion and clarification. These striatonigral cells can target both SNr GABAergic neurons and dendrites of the dopaminergic neurons. A discussion of microcircuits formed by the patchy SPN axons with SNr GABAergic and SNc DAergic neurons should be presented.

      We have added this point at lines 499–500, including a reference to a relevant review of microcircuitry. Additionally, we expanded the discussion section to address microcircuit mechanisms that may underlie our main findings.

      (3) Line 70: "BNST" should be spelled out at the first time it is mentioned.

      This has been done.

      (4) Line 133: only GCaMP6 was listed in the method, but GCaMP8 was also used (Figure 4). Clarification or details are needed.

      Thank you for your careful attention to detail. We have corrected the typographical errors in the Methods section. Specifically, in the Stereotaxic Injections section, we corrected “GCaMP83” to “GCaMP8s.” In the Fiber Implant section, we removed the incorrect reference to “GCaMP6s” and clarified that GCaMP8s was used for photometry, and hChR2 was used for optogenetics.

      (5) Line 183: Can the authors describe more precisely what "a moment" means in terms of seconds or minutes?

      This has been done.

      (6) Line 288: typo: missing / in ΔF.

      Thank you; this has been fixed.

      (7) Line 301-302: the statement of "mCherry and MOR1 colocalization" does not match the images in Figure 1B.

      This has been corrected by providing a new Supplementary Figure S1.

      (8) Related to the statement between Lines 303-304: Figure 1c data may reflect changes in MOR1 protein or cell loss. Quantification of NeuN+ neurons within the MOR1 area would strengthen the conclusion of 60% of patchy cell loss in Figure 1C.

      Since the efficacy of AAV-FLEX-taCasp3 in cell ablation has been well established in our previous publications and those of others (Yang, Chiang et al. 2013, Wu, Kung et al. 2019), we do not believe the observed loss of MOR1 staining in Fig. 1C merely reflects reduced MOR1 expression. Moreover, a general neuronal marker such as NeuN may not reliably detect the specific loss of patchy neurons in our ablation model, given the technical limitations of conventional cell-counting methods like MBF’s StereoInvestigator, which typically exhibit a variability margin of 15–20%.

      (9) Lines 313-314: "Similarly, PA mice demonstrated greater stay-time in the dark zone (Figure 1E)." Revision is needed to better reflect what is shown in Figure 1E and avoid misunderstandings.

      Thank you; this has been addressed.

      (10) The color code in Figure 2Gi seems inconsistent with the others? Clarifications are needed.

      Color coding in Figure 2Gi differs from that in 2Eii out of necessity. For example, the "Light" cells depicted in light blue in 2Eii are represented by both light gray and light red dots in 2Gi. Importantly, Figure 2G does not encode specific speed relationships; instead, any association with speed is indicated by a red hue.

      (11) Lines 538-539: the statement of "Over half of the patch was covered" was not supported by Figure 5C. Clarification is needed.

      Thank you. For clarity, we updated the x-axis labels in Figures 1C and 5C from “% area covered” to “% DS area covered,” and defined “DS” as “dorsal striatal” in the corresponding figure legends. Additionally, we revised the sentence in question to read: “As with ablation, histological examination indicated that a substantial fraction of dorsal patch territories, identified through MOR1 staining, were impacted (Fig. 5C).”

      (12) Figure 3: statistical significance in Figure 3 should be labeled in various panels.

      We believe the reviewer's concern pertains to the scatter plot in panel F—specifically, whether the data points are significantly different from zero. In panel 3F, the 95% confidence interval clearly overlaps with zero, indicating that the results are not statistically significant.
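
      The reasoning that an interval overlapping zero indicates non-significance can be sketched as follows. This is an illustration only: the response does not state which quantity the interval in panel 3F describes, so the sketch uses the mean of hypothetical values, and it uses a normal approximation (a t-based interval would be more appropriate for small samples). The names `mean_ci95` and `excludes_zero` are invented for this example.

```python
import math
import statistics

def mean_ci95(values):
    """Normal-approximation 95% confidence interval for the mean."""
    n = len(values)
    m = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.975)     # roughly 1.96
    return m - z * sem, m + z * sem

def excludes_zero(ci):
    """True when the whole interval lies on one side of zero."""
    lo, hi = ci
    return lo > 0 or hi < 0

# Hypothetical values scattered around zero (not data from the study).
values = [0.12, -0.08, 0.05, -0.15, 0.10, -0.02, 0.03, -0.06]
ci = mean_ci95(values)
print(ci, excludes_zero(ci))
```

      For these made-up values the interval spans zero, so `excludes_zero` returns `False`, which corresponds to a result that is not significant at the 0.05 level under this approximation.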

      (13) Figures 6D-E: the absence of a difference in speed between control mice and ChR2 mice under continuous optical stimulation was not expected. It differs from the Gq-DREADD study in Figure 5E-F. Clarifications are needed.

      For mice undergoing constant ChR2 activation of Sepw1-Cre<sup>+</sup> SNr efferents, overall locomotor speed does not differ from controls. However, the BIL (bright-to-illuminated) effect on zone transitions is disrupted: activating Sepw1-Cre<sup>+</sup> fibers in the SNr blunts the typical increase in speed observed when mice flee from the light zone toward the dark zone. This impaired BIL-related speed increase upon exiting the light was similarly observed in the Gq-DREADD cohort. The reviewer is correct that this optogenetic manipulation within the SNr did not produce the more generalized speed reductions seen with broader Gq-DREADD activation of all Sepw1-Cre<sup>+</sup> cells in the dorsal striatum. A likely explanation is the difference in targeting—ChR2 specifically activates SNr-bound terminals, whereas Gq-DREADD broadly activates entire Sepw1-Cre<sup>+</sup> cells. Notably, many of the generalized speed profile changes observed with chemogenetic activation are opposite to those resulting from broad ablation of Sepw1-Cre<sup>+</sup> cells.

      The more subtle speed-restrictive phenotype observed with ChR2 activation targeted to the SNr may suggest that fewer striatonigral fibers were affected by this technique, possibly due to the limited spread of light from the fiber optic. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with an optogenetic approach. Alternatively, it could indicate that non-striatonigral Sepw1-Cre+ projections—such as striatopallidal or intrastriatal pathways—play a role in more generalized slowing. If striatopallidal fibers contributed to locomotor slowing, we would expect to see non-zero cross-correlations between neural activity and speed or acceleration, along with negative lag indicating that neural activity precedes the behavioral change. However, our fiber photometry data do not support such a role for Sepw1-Cre+ striatopallidal fibers.

      We have also referenced the possibility that intrastriatal collaterals could suppress striatal dopamine levels, potentially explaining the stronger slowing phenotype observed when the entire striatal population is affected, as opposed to selectively targeting striatonigral terminals.

      These technical considerations and interpretive nuances have been incorporated and clarified in the revised discussion section.

      (14) Line 632: "compliment": a typo?

      Yes, it should be “complement”.

      (15) Figure 4 legend: descriptions of panels A and B were swapped.

      Thank you. This has been corrected.

      (16) Friedman (2020) was listed twice in the bibliography (Lines 920-929).

      Thank you. This has been corrected.

      Reviewer #3 (Recommendations for the authors):

      It will be helpful to label and add figure legends below each figure.

      Thank you for the suggestion.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript. We noted some instances where only p values are reported.

      Readers would also benefit from coding individual data points by sex and noting N/sex.

      We have included detailed statistical information in the revised manuscript. Both male and female mice were used in all experiments in approximately equal numbers. Since no sex-related differences were observed, we did not report the number of animals by sex.

      References

      Alcacer, C., L. Andreoli, I. Sebastianutto, J. Jakobsson, T. Fieblinger and M. A. Cenci (2017). "Chemogenetic stimulation of striatal projection neurons modulates responses to Parkinson's disease therapy." J Clin Invest 127(2): 720-734.

      Crittenden, J. R., P. W. Tillberg, M. H. Riad, Y. Shima, C. R. Gerfen, J. Curry, D. E. Housman, S. B. Nelson, E. S. Boyden and A. M. Graybiel (2016). "Striosome-dendron bouquets highlight a unique striatonigral circuit targeting dopamine-containing neurons." Proc Natl Acad Sci U S A 113(40): 11318-11323.

      Dong, J., L. Wang, B. T. Sullivan, L. Sun, V. M. Martinez Smith, L. Chang, J. Ding, W. Le, C. R. Gerfen and H. Cai (2025). "Molecularly distinct striatonigral neuron subtypes differentially regulate locomotion." Nat Commun 16(1): 2710.

      Dudman, J. T. and J. W. Krakauer (2016). "The basal ganglia: from motor commands to the control of vigor." Curr Opin Neurobiol 37: 158-166.

      Evans, R. C., E. L. Twedell, M. Zhu, J. Ascencio, R. Zhang and Z. M. Khaliq (2020). "Functional Dissection of Basal Ganglia Inhibitory Inputs onto Substantia Nigra Dopaminergic Neurons." Cell Rep 32(11): 108156.

      Gerfen, C. R. and D. J. Surmeier (2011). "Modulation of striatal projection systems by dopamine." Annu Rev Neurosci 34: 441-466.

      Hawes, S. L., A. G. Salinas, D. M. Lovinger and K. T. Blackwell (2017). "Long-term plasticity of corticostriatal synapses is modulated by pathway-specific co-release of opioids through kappa-opioid receptors." J Physiol 595(16): 5637-5652.

      Lazaridis, I., J. R. Crittenden, G. Ahn, K. Hirokane, T. Yoshida, A. Mahar, V. Skara, K. Meletis, K. Parvataneni, J. T. Ting, E. Hueske, A. Matsushima and A. M. Graybiel (2024). "Striosomes Target Nigral Dopamine-Containing Neurons via Direct-D1 and Indirect-D2 Pathways Paralleling Classic Direct-Indirect Basal Ganglia Systems." bioRxiv.

      Nadel, J. A., S. S. Pawelko, J. R. Scott, R. McLaughlin, M. Fox, M. Ghanem, R. van der Merwe, N. G. Hollon, E. S. Ramsson and C. D. Howard (2021). "Optogenetic stimulation of striatal patches modifies habit formation and inhibits dopamine release." Sci Rep 11(1): 19847.

      Okunomiya, T., D. Watanabe, H. Banno, T. Kondo, K. Imamura, R. Takahashi and H. Inoue (2025). "Striosome Circuitry Stimulation Inhibits Striatal Dopamine Release and Locomotion." J Neurosci 45(4).

      Shan, Q., Q. Fang and Y. Tian (2022). "Evidence that GIRK Channels Mediate the DREADD-hM4Di Receptor Activation-Induced Reduction in Membrane Excitability of Striatal Medium Spiny Neurons." ACS Chem Neurosci 13(14): 2084-2091.

      Wu, J., J. Kung, J. Dong, L. Chang, C. Xie, A. Habib, S. Hawes, N. Yang, V. Chen, Z. Liu, R. Evans, B. Liang, L. Sun, J. Ding, J. Yu, S. Saez-Atienzar, B. Tang, Z. Khaliq, D. T. Lin, W. Le and H. Cai (2019). "Distinct Connectivity and Functionality of Aldehyde Dehydrogenase 1a1-Positive Nigrostriatal Dopaminergic Neurons in Motor Learning." Cell Rep 28(5): 1167-1181 e1167.

      Yang, C. F., M. C. Chiang, D. C. Gray, M. Prabhakaran, M. Alvarado, S. A. Juntti, E. K. Unger, J. A. Wells and N. M. Shah (2013). "Sexually dimorphic neurons in the ventromedial hypothalamus govern mating in both sexes and aggression in males." Cell 153(4): 896-909.

    1. eLife Assessment

      This important study is the first characterization of the phenotype caused by a lack of Eml3 expression in mice. Mutant animals present a disrupted pial basement membrane, leading to focal extrusions from the cerebral cortex, called ectopias. The methodology is convincing and the conclusions are solid, although further investigations on the mechanisms and inclusion of the experiments performed, but not presented, will improve the manuscript. This work would be of interest to neural development biologists and human geneticists working on brain disorders.

    2. Reviewer #1 (Public review):

      Summary:

      The paper describes the initial characterization of Eml3 knockout mice. Eml3 global inactivation leads to delayed embryonic development, perinatal lethality apparently due to failure to inflate lungs, and a cobblestone brain-like phenotype represented by focal neuronal ectopias in the marginal zone or subarachnoid space of dorsal telencephalon. The neural ectopias are associated with interruptions in the pial basement membrane (PBM), which appear around E11.5. The authors also confirmed previously described protein interactions, using coIP-MS experiments of placenta and embryonic tissues (TUBB3, several 14-3-3 proteins, and DYNLL). The authors generated mice carrying a TQT86AAA homozygous mutation in EML3 (a motif required for EML3-DYNLL interactions) that were normal and showed no focal neuronal ectopias, indicating that this particular protein interaction is dispensable. The authors propose Eml3 knockout mice as a model of cobblestone brain malformation.

      Strengths:

      The brain phenotype described in this work is relevant for the neural development field and has potential clinical relevance. The initial phenotyping is appropriate but will require additional experiments to establish the cause of the failure to inflate the lungs. The study shows convincing data regarding the main characteristics of the brain phenotype and data supporting the timing when these abnormalities arise during development.

      Weaknesses:

      The study would benefit from clearer evidence and additional experiments that would help to establish the molecular and cellular mechanisms underlying the brain phenotype, the central topic of the work.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of the microtubule-binding protein EML3 during cortical development through the generation and characterization of an Eml3 mouse mutant. The authors focus mainly on the effects of EML3 loss on brain development, although Eml3 mouse mutants also present with developmental delay and growth restriction, and die perinatally due to respiratory distress caused by delayed maturation of the lungs. The main finding in the developing cortex is the presence of focal neuronal ectopias, which contain neurons from all cortical layers, as revealed by immunostaining. The authors use electron microscopy to show that ectopias seem to be caused by disruption to the pial basement membrane at early stages of development, which allows neurons to breach through it. To find a functional link between EML3 and the observed phenotype, studies are conducted that demonstrate expression of EML3 in radial glia cells and mesenchymal cells, both cell types involved in the formation and maintenance of the pial basement membrane. Furthermore, interaction partners for EML3 are identified through coIP-MS analysis, including tubulin beta-3, 14-3-3 proteins, and cytoplasmic dynein light chain. However, mice carrying a mutant EML3 allele engineered to abolish the interaction between EML3 and cytoplasmic dynein light chain do not recapitulate any of the symptoms of complete EML3 loss.

      Strengths:

      The manuscript offers several important strengths that contribute significantly to the field. This study presents the first characterization of Eml3 knockout animals, providing novel insights into the role of Eml3 in vivo. Information on Eml3 function so far was restricted to cell culture data, so the results in this manuscript start to fill an important gap in our knowledge about this microtubule-binding protein. The experimental approach is carefully designed, with appropriate controls that ensure the reliability of the data. Moreover, the authors have addressed a key challenge in the analysis, namely the developmental delay of the knockout animals. By implementing a strategy to match developmental stages between wild-type and knockout groups, they allow for meaningful and valid comparisons between the two genotypes. Importantly, the authors have successfully generated three different Eml3 mutant mouse lines (knockout, floxed, and with disrupted binding to cytoplasmic dynein light chain), which are very valuable tools for the broader scientific community to further study the roles of this gene in development and disease in the future.

      Weaknesses:

      While the manuscript presents valuable data, there are also several weaknesses that limit the overall impact of the study. Most notably, there is no clear mechanistic link established between the loss of Eml3 function and the observed phenotype, leaving the biological significance of the findings somewhat speculative, as it is not straightforward how a microtubule-associated protein can have an impact on the stability of the pial basement membrane. In this respect, but also in general for the whole manuscript, there seems to be a considerable amount of experimental work that has been conducted but is not presented, possibly due to the negative nature of the results. At least some of those results could be shown, particularly (but not only) the stainings for the composition of the ECM components. Additionally, the phenotype reported appears to be dependent on the genetic background, as it is absent in the CD1 strain. This observation raises concerns as to how robust the results are and how much they can be generalized to other mouse strains, but, more importantly, to humans. There is no data included in the manuscript about the generation and analysis of the Eml3AAA/AAA mouse line. This is an important omission, especially as no details on the validation or phenotypic characterization of this additional mouse line are provided. Including these elements would greatly strengthen the rigor and interpretability of the work, especially if that mouse line is to be shared with the scientific community.

    4. Reviewer #3 (Public review):

      Summary:

      This work aims to understand the role of Echinoderm Microtubule-associated Protein-like 3 (EML3) in embryogenesis and neocortical development. Importantly, this work shows that depletion of EML3 causes focal neuronal ectopias by disrupting the structural integrity of the pial basement membrane, describing a new model of cobblestone brain malformation. Another member of the EML family, EML1, has already been shown to trigger neuronal migration disorders, particularly subcortical band heterotopia, by affecting cell polarity. The results presented here point to a different mechanism of action. The authors show that EML3 is expressed in radial glia cells and mesenchymal cells in the pial region, and upon EML3 depletion (i.e., Eml3 mutant mice), the pial basement membrane is structurally damaged, allowing migrating neuroblasts to ectopically migrate through. In this case, the results indicate that weakening of the pial basement membrane is a prerequisite for focal neuronal ectopias. The authors provide a meticulous characterization of the Eml3 mutant mice, strengthening the conclusions of the results.

      Strengths:

      The authors provide a very detailed analysis of the defects observed in Eml3 mutant mice, by providing not only results by inferred day of conception but also by classifying embryos by their number of somite pairs.

      Weaknesses:

      (1) Besides the data provided in the figures, the authors report a significant amount of experiments/results as "Data not shown". Negative data is still important data to report, and the authors may want to choose some crucial "not shown data" to report in the manuscript.

      (2) Results in Figure 3A apparently contradict results in 3B. A better explanation of the results should improve understanding of the data. Even though the conclusion that the "onset and progression of neurogenesis is normal in Eml3 null mice" seems logical based on the data, the final numbers are not (Figure 3A) and this should be acknowledged, as well.

      (3) The authors should define which cell types are identified by SOX1 and PAX6.

    5. Author response:

      Reviewer #1 (Public Review):

      The study would benefit from clearer evidence and additional experiments that would help to establish the molecular and cellular mechanisms underlying the brain phenotype, the central topic of the work.

      We agree that additional experiments are necessary to elucidate the mechanism(s) by which EML3 deficiency causes the observed developmental phenotypes. However, as no further experimentation is possible due to the closure of our laboratory, we are committed to sharing available materials—including custom antibodies and cryopreserved sperm from our mouse lines. We will include previously generated experimental data not presented in the original submission. While these additional data do not reveal the mechanisms, we believe that sharing hypotheses that were experimentally ruled out will benefit the scientific community.

      Reviewer #2 (Public Review):

      While the manuscript presents valuable data, there are also several weaknesses that limit the overall impact of the study. Most notably, there is no clear mechanistic link established between the loss of Eml3 function and the observed phenotype, leaving the biological significance of the findings somewhat speculative, as it is not straightforward how a microtubule-associated protein can have an impact on the stability of the pial basement membrane. In this respect, but also in general for the whole manuscript, there seems to be a considerable amount of experimental work that has been conducted but is not presented, possibly due to the negative nature of the results. At least some of those results could be shown, particularly (but not only) the stainings for the composition of the ECM components.

      We agree that additional experiments are necessary to elucidate the mechanisms at play. While we cannot conduct further experiments, we will include additional existing data, including supplemental ECM component staining, in a new figure or panel. As this reviewer rightly anticipated, these results might not clarify the mechanism but sharing the hypotheses that were already experimentally tested will be helpful.

      Additionally, the phenotype reported appears to be dependent on the genetic background, as it is absent in the CD1 strain. This observation raises concerns as to how robust the results are and how much they can be generalized to other mouse strains, but, more importantly, to humans.

      Indeed, we have determined that genetic background greatly influences the manifestation of developmental defects caused by absence or mutation of the EML3 protein in mice. Modifier genes appear to play a significant role in phenotypic expression. In humans, the presence or absence of such modifiers may result in a broad spectrum of outcomes—from no clinical relevance, as seen in CD1 mice, to potential intrauterine mortality. We agree that this underscores the challenge of translating mouse model findings to human implications. Future studies could include a search for EML3 non-coding regulatory mutations and expanded analysis of neuronal development defects, such as COB, as well as cases of intrauterine growth restriction (IUGR).

      There is no data included in the manuscript about the generation and analysis of the Eml3AAA/AAA mouse line. This is an important omission, especially as no details on the validation or phenotypic characterization of this additional mouse line are provided. Including these elements would greatly strengthen the rigor and interpretability of the work, especially if that mouse line is to be shared with the scientific community.

      We acknowledge this oversight and will add a Materials and Methods section describing the generation of Eml3 TQT86AAA mice as well as validation and phenotypic characterizations that were done for that mouse line.

      Reviewer #3 (Public Review):

      Besides the data provided in the figures, the authors report a significant number of experiments/results as "Data not shown". Negative data is still important data to report, and the authors may want to choose some crucial "not shown data" to report in the manuscript.

      We will incorporate key datasets previously omitted, with priority given to those requested by Reviewer #2.

      Results in Figure 3A apparently contradict results in 3B. A better explanation of the results should improve understanding of the data. Even though the conclusion that the "onset and progression of neurogenesis is normal in Eml3 null mice" seems logical based on the data, the final numbers are not (Figure 3A) and this should be acknowledged, as well.

      We will provide further explanations for the data presented in Figures 3A and 3B to better convey the fact that the two datasets are not contradictory. In essence, since Eml3 null mice are developmentally delayed (as determined by the number of somites at a specific age, Fig. 1C), the milestones in neurogenesis are reached at a later age in Eml3 null mice (Fig. 3A). However, Eml3 null mice have reached the same neurogenesis milestones as their WT counterparts when they have the same number of somites (Fig. 3B).

      The authors should define which cell types are identified by SOX1 and PAX6.

      We will expand our manuscript to define the expression timing and cell identity marked by SOX1 and PAX6 in neural progenitors during cortical development.

    1. eLife Assessment

      This paper discusses the cognitive implications of potential intentional burial, creation of wall engravings, and use of fire as a light source by relatively small-brained Homo naledi hominins. The discussion presented in the paper is valuable theoretically in its healthy questioning of prior assumptions concerning the socio-biological constraints of hominin meaning-making behavior. The discussion also contributes practically given that these behaviors have been ascribed to Homo naledi in two associated papers. Still, the present paper does not fully engage with the extent to which the strength of evidence supporting the H. naledi behavior conclusions across the two associated papers remains actively questioned, and thus the inferences here may be considered incomplete. The ultimate assessment of this work will vary among individual readers depending on how they view this debate, at least until further evidence leading to a broader consensus is published.

    2. Author response:

      The following is the authors’ response to the current reviews.

      We thank the editors at eLife and the one reviewer for engaging with our revised manuscript. As we noted in our previous response to reviewers, which we wrote in October 2024 when we submitted our initial revision, the majority of the critique we received was targeted not so much at the argument of this manuscript but at the debate regarding the evidence in the two other manuscripts that this one accompanied: “Evidence for deliberate burial of the dead by Homo naledi” and “241,000 to 335,000 Years Old Rock Engravings Made by Homo naledi in the Rising Star Cave system, South Africa.” Because of that critique, we revised this manuscript to emphasize that the key element in constructing our argument is that H. naledi engaged in mortuary behavior (the movement of dead H. naledi by living H. naledi into the Rising Star cave system) and to place that in the context of a) the increasingly complex later Pleistocene record of meaning-making activity and b) the assumed correlations between brain size and cognitive capacities in Pliocene and Pleistocene hominins. This framing, as noted in the eLife editorial comment, is the main thrust of our manuscript. There is a growing convergence of evidence that the totality of the currently available data and analyses for H. naledi in the Rising Star cave system supports mortuary behavior: that is, the agential and intentional action by H. naledi individuals in the transport of bodies to the Lesedi Chamber and Dinaledi Subsystem (see Berger et al. 2025 plus the 2nd round reviews and the eLife editorial comment associated with it, and also Van Rooyen et al. 2025). We acknowledge the serious debates around the assertion of funerary behavior (cultural burial) and seek to illustrate that while we believe the data support the funerary behavior hypothesis, it is not a necessary requirement for our main argument.

      A few specific responses to the reviewer in this revised manuscript:

      The reviewer states: “Claims for a positive correlation between absolute and/or relative brain size and cognitive ability are not common in discussions surrounding the evolution of Middle- and Late Pleistocene hominin behavior.” We are not making the argument that absolute brain size in the later Pleistocene is a point of focus, but rather that there are many arguments and assertions about EQ and cognitive capacity that are central in the proposals for the evolution of hominins in general, and the genus Homo in particular, across the Plio-Pleistocene period. We offer a brief review of this in the text and suggest, as noted by this reviewer, that “exploration of the specific/potential socio-cultural, neuro-structural, ecological and other factors will be more informative than the emphasis on absolute/relative brain size”. This (in their words) is exactly our main point. However, we contend that such a framing should not be exclusive to later Pleistocene contexts, but rather that the examination of earlier hominins might also be better served by moving away from the traditional assumptions of cognitive complexity associated with absolute/relative brain size.

      The reviewer states: “The authors use, in a number of instances throughout the paper, secondary sources of information such as review papers (e.g., McBrearty & Brooks 2000; Scerri & Will 2023; Galway-Witham et al. 2019) instead of the original works that are the basis for making the desired case.” We do indeed use review papers in the main body of the text for clarity, brevity, and to acknowledge robust previous review work in these areas; however, in the supplemental text and with the figures and table we offer substantive bibliographies of the original citations and studies. We encourage readers to spend time with those materials as well.

      Finally, the reviewer states: “Given the inadequate analyses in the accompanying papers, and the lack of evidence for stone tools in the naledi sites, the present claims for the expression of culturally and symbolically mediated behaviors by this small-brained hominin must be adequately established.” We are quite specific in this manuscript, and in other publications, that we are not arguing for “symbolically mediated” behavior, but we do stand by our non-controversial suggestions of meaning-making and cultural behavior as relevant in Pleistocene hominins (e.g. Kissel and Fuentes 2017, 2018). We do not argue that stone tools are mandatory indicators of such possibilities, and we lay out the H. naledi information in the context of the broader and increasing datasets and analyses for meaning-making behavior in Pleistocene hominins (see Figure 1 and Table 1, and in the text).

      Our point with this manuscript, which we reiterate here, is that “The increasing data for complex behavior and meaning-making across the Pleistocene should play a major element in structuring how we investigate, explain, and model the origins and patterns of hominin and human evolution”, and we feel that the current evidence for H. naledi behavior contributes to the broader suites of data, hypotheses, analyses, and theory building in this endeavor.


      The following is the authors’ response to the original reviews.

      Before laying out how we addressed the specific comments on this manuscript we want to clarify the goal and intent of this paper to maximize effective critical reading of its contents. We appreciate and look forward to continued critique and enhanced discussion of this topic and argument.

      Our starting point for constructing the argument in this manuscript is that H. naledi engaged in mortuary behavior. This emerges from the totality of the currently available data and analyses for Homo naledi in the Rising Star cave system, which support agential and intentional action by Homo naledi individuals in the transport of bodies to the Lesedi Chamber and Dinaledi Subsystem. We do feel that the data support the cultural burial hypothesis as well as the likelihood that at least some of the markings reported as engravings are non-naturally occurring (see Martinón-Torres et al. 2024) and made by Homo naledi. But these two elements are not necessary for the validity of the argument we pursue in this manuscript.

      Our second key point is that gross brain size does not necessarily correlate with particular patterns of complex behavior in Pleistocene hominins. On this there is wide agreement, yet both scholarly and public arguments for the success of the genus Homo and the success of Homo sapiens have incorporated an assumption of a Rubicon of cerebral size. From this we propose a third point: that smaller brained Pleistocene hominins, including Homo naledi, are part of a Pleistocene hominin niche that includes patterns of complex social and cognitive behavior. Such behavior was historically considered to be exclusive to Homo sapiens but is now documented to occur earlier, across a range of hominin taxa in the latter half of the Pleistocene. We offer the case of H. naledi behavior in the Rising Star system as an example of this. This case contributes to the development of a broader approach to the cognitive, physiological, and behavioral framings of, and explanations for, Pleistocene hominin behavior.

      Responses to specific critiques in the eLife reviews centered on this manuscript:

      Reviewer #1:

      All inferences regarding hominin behaviour and biology of Homo naledi, discussed by Fuentes and colleagues, are wholly dependent on the evidence presented in the archaeology preprints being true.

      Reviewer #2:

      Fuentes et al. provide a detailed and thoughtful commentary on the evolutionary and behavioral implications of complex behaviors associated with a small-brained hominin, Homo naledi… While the review by Fuentes et al. highlights important assumptions about the relationship between hominin brain size, cognition, and complex behaviors, the evidence presented by Berger et al. 2023a,b does not support the claim that Homo naledi engaged in burial practices or symbolic expression through wall engravings.

      Reviewer #3:

      This paper presents the cognitive implications of claims made in two accompanying papers (Berger et al. 2023a, 2023b) about the creation of rock engravings, the intentional disposal of the dead, and fire use by Homo naledi. The importance of the paper, therefore, relies on the validity of the claims for the presence of socio-culturally complex and cognitively demanding behaviors that are presented in the associated papers. Given the archaeological, hominin, and taphonomic analyses in the associated papers are not adequate to enable the exceptional claims for naledi-associated complex behaviors, the inferences made in this paper are currently inadequate and incomplete.

      We have clarified in the manuscript text and above why we argue that the inferences we are setting as core to our argument do not require that cultural burial or engravings by H. naledi be demonstrated. However, we do clarify in the revision that the current evidence for the transport of dead conspecifics by H. naledi into difficult-to-reach areas deep in the cave system is well supported by the archeological and paleoanthropological data currently available (e.g. Berger et al. 2024, Elliott et al. 2021, Robbins et al. 2021, Hawks et al. 2017) and that this is the basis for our argument.

      Reviewer #3:

      The claimed behaviors are widely recognized as complex and even quintessential to Homo sapiens. The implications of their unequivocal association with such a small-brained Middle Pleistocene hominin are thus far reaching. Accordingly, the main thrust of the paper is to highlight that greater cognition and complex socio-cultural behaviors were not necessarily associated with a positively encephalized brain. This argument begs the obvious question of whether absolute brain size and/or encephalization quotient (i.e., the actual brain volume of a given species relative to the expected brain size for a species of the same average body size) can measure cognitive capacity and the complexity of socio-cultural behaviors among late Middle Pleistocene hominins… Claims for a positive correlation between absolute and/or relative brain size and cognitive ability are not common in discussions surrounding the evolution of Middle- and Late Pleistocene hominin behavior.

      We assert that claims for a positive correlation between absolute and/or relative brain size and cognitive ability are central, either explicitly or implicitly, in most arguments concerning cognitively complex behavior in the genus Homo. This is especially true for ideas about the success of Pleistocene Homo relative to other hominins. We clarify this in the text, offering various citations in support of this position (e.g. Meneganzin and Currie 2022, Galway-Witham, Cole, and Stringer 2019, DeCasien, Barton, and Higham 2022, Dunbar 2003, Kissel and Fuentes 2021, Muthukrishna et al. 2018, Püschel et al. 2021, Tattersall 2023).

      Reviewer #3:

      Currently, the bulk of the evidence for early complex technological and social behaviors derives from multiple sites across South Africa and postdates the emergence of H. sapiens by more than 100,000 years. Such lag in the expression of complex technologies and behaviors within our species renders the brain size-implies-cognitive capacity argument moot. Instead, a rich body of research over the past several decades has focused on aspects related to sociocultural, environmental, and even the wiring of the brain in order to understand factors underlying the expression of the capacity for greater behavioral variability. In this regard, even if the claimed evidence for complex behaviors among the small-brained naledi populations proves valid, the exploration of the specific/potential socio-cultural, neuro-structural, ecological and other factors will be more informative than the emphasis on absolute/relative brain size.

      While not at all denying the critically important and rich record of cultural complexity in the Late Pleistocene South African archeological record, we disagree that “the bulk of the evidence for early complex technological and social behaviors derives from multiple sites across South Africa and postdates the emergence of H. sapiens by more than 100,000 years”. We offer a range of examples and citations in support of our assertion in the text (esp. pp. 12-14, Table 1, and Figure 1).

      We lay out the currently available data for such cultural complexity in Figure 1 with extensive documentation and citations for each case in the Supplementary material (both as a table and a bibliography). We wholly agree with Reviewer 3 that “the exploration of the specific/potential socio-cultural, neuro-structural, ecological and other factors will be more informative than the emphasis on absolute/relative brain size” and are attempting to do just that in the manuscript.

      Reviewer #3:

      The paper presents as supporting evidence previous claims for the appearance of similar complex behaviors predating the emergence of our species, H. sapiens, although it does acknowledge their controversial nature. It then uses the current claims for the association of such behaviors with H. naledi as decisive. Given the inadequate analyses in the accompanying papers and the lack of evidence for stone tools in the naledi sites, the present claims for the expression of culturally and symbolically mediated behaviors by this small-brained hominin must be adequately established.

      We respond to the first part of this critique above (regarding the other papers). But again, we emphasize that although we do feel that the argument for cultural burial is supported (see Berger et al. 2024 preprint), what we are arguing for in this paper is that the agential and intentional transportation of the dead (mortuary behavior) is the sufficient factor undergirding our proposal. We do not agree that the absence of recognizable stone tools at the site negates our proposal, and we assert that the context provided by Figure 1 and the data in the table for Figure 1 in the SOM, in concert with the supported mortuary behavior (transport and emplacement of the dead), offer sufficient support for the argument we make in the text regarding brain size and the role of emotional cognition and complex behavior in the Pleistocene hominin niche and H. naledi’s participation in it.

    1. eLife Assessment

      This paper presents the important finding that BNIP3/NIX, a mitophagy receptor, and its binding to ATG18 are required for mitophagy during muscle cell reorganization in Drosophila. Although the involvement of the BNIP3-ATG18/WIPI axis in mitophagy induction has been reported in mammalian cell culture systems, this study provides the first compelling evidence for this pathway in vivo in animals. The physiological significance of this BNIP3-dependent mitophagy will require further investigation.

    2. Reviewer #1 (Public review):

      During early Drosophila pupal development, a subset of larval abdominal muscles (DIOMs) is remodelled using an autophagy-dependent mechanism.

      To better understand this not very well studied process, the authors have generated a systematic transcriptomics time course using dissected larval abdominal muscles of various stages from wild type and autophagy deficient mutants. The authors have further identified a function for BNIP3 for executing mitophagy during DIOM remodelling.

      Strengths:

      The paper does provide a detailed mRNA time course resource for the DIOM remodelling.

      The paper does find an interesting BNIP3 loss of function phenotype, a block of mitophagy during muscle remodelling and hence identifies a specific linker between mitochondria and the core autophagy machinery. This adds to the mechanism of how mitochondria are degraded.

      Sophisticated fly genetics demonstrates that the larval muscle mitochondria are, to a large extent, degraded by autophagy during DIOM remodelling.

      Quantitative electron microscopy data show that BNIP3 is required for initiating mito-phagosomes. It needs either its LIR and MER domain for function.

      Weakness:

      Mitophagy during DIOM remodelling is not novel (earlier papers from Fujita et al.).

      Other weaknesses have been eliminated during the revision.

    3. Reviewer #2 (Public review):

      Summary:

      Autophagy (macroautophagy) is known to be essential for muscle function in flies and mammals. To date, many mitophagy (selective mitochondrial autophagy) receptors have been identified in mammals and other species. While loss of mitophagy receptors has been shown to impair mitochondrial degradation (e.g., OPTN and NDP52 in Parkin-mediated mitophagy and NIX and BNIP3 in hypoxia-induced mitophagy) at the level of cultured cells, it remains unclear, especially under physiological conditions in vivo. In this study, the authors revealed that one of the receptors BNIP3 plays a critical role in mitochondrial degradation during muscle remodeling in vivo.

      Overall, the manuscript provides solid evidence that BNIP3 is involved in mitophagy during muscle remodeling with in vivo analyses performed. In particular, all experiments in this study are well designed. The text is well written and the figures are very clear.

      Strengths:

      (1) In each experiment, appropriate positive and negative controls are used to indicate what is responsible for the phenomenon observed by the authors: e.g. FIP200, Atg18, Stx17 siRNAs during DIOM remodeling in Figure 2 and Full, del-LIR, del-MER in Figure 5.

      (2) Although the transcriptional dynamics of DIOM remodeling during metamorphosis is autophagy-independent, the transcriptome data obtained by the authors would be valuable for future studies.

      (3) In addition to the simple observation that loss of BNIP3 causes mitochondrial accumulation, the authors further observed that, by combining siRNA against STX17, which is required for fusion of autophagosomes with lysosomes, BNIP3 KO abolishes mitophagosome formation, which will provide solid evidence for BNIP3-mediated mitophagy. Furthermore, using a Gal80 temperature-sensitive approach, the authors showed that mitochondria derived from larval muscle, but not those synthesized during hypertrophy, remain in BNIP3 KO fly muscles.

      Weaknesses:

      (1) Because BNIP3 KO causes mitochondrial accumulation, it is expected that adult flies will have some physiological defects, but this has not been fully analyzed or sufficiently mentioned in the manuscript.

      (2) In Figure 5, the authors showed that BNIP3 binds to Atg18a by co-IP, but no data are provided on whether MER-mut or del-MER attenuates the affinity for Atg18a.

      Comments on revisions: The authors answered all the reviewer's concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Fujita et al. build on their earlier, 2017 eLife paper that showed the role of autophagy in the developmental remodeling of a group of muscles (DIOM) in the abdomen of Drosophila. Most larval muscles undergo histolysis during metamorphosis, while DIOMs are programmed to regrow after initial atrophy to give rise to temporary adult muscles, which survive for only 1 day after eclosion of the adult flies (J Neurosci. 1990;10:403-1. and BMC Dev Biol 16, 12, 2016). The authors carry out transcriptomics profiling of these muscles during metamorphosis, which is in agreement with the atrophy and regrowth phases of these muscles. Expression of the known mitophagy receptor BNIP3/NIX is high during atrophy, so the authors start to delve more into the role of this protein/mitophagy in their model. BNIP3 KO indeed impairs mitophagy and muscle atrophy, which they convincingly demonstrate via nice microscopy images. They also show that the already known Atg8a-binding LIR and Atg18a-binding MER motifs of human NIX are conserved in the Drosophila protein, although the LIR turned out to be less critical for in vivo protein function than the MER motif.

      Strengths:

      Established methodology, convincing data, in vivo model

      Weaknesses:

      Significance for Drosophila physiology and for human muscles remains to be established

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      During early Drosophila pupal development, a subset of larval abdominal muscles (DIOMs) is remodelled using an autophagy-dependent mechanism. 

      To better understand this not very well studied process, the authors have generated a transcriptomics time course using dissected abdominal muscles of various stages from wild-type and autophagy-deficient mutants. The authors have further identified a function for BNIP3 in muscle mitophagy using this system. 

      Strengths: 

      (1) The paper does provide a detailed mRNA time course resource for DIOM remodeling. 

      (2) The paper does find an interesting BNIP3 loss of function phenotype, a block of mitophagy during muscle remodeling, and hence identifies a specific linker between mitochondria and the core autophagy machinery. This adds to the mechanism of how mitochondria are degraded. 

      (3) Sophisticated fly genetics demonstrates that the larval muscle mitochondria are, to a large extent, degraded by autophagy during DIOM remodeling. 

      Weaknesses: 

      (1) Mitophagy during DIOM remodeling is not novel (earlier papers from Fujita et al.). 

      (2) The transcriptomics time course data are not well connected to the autophagy part. Both could be separated into 2 independent manuscripts. 

      (3) The muscle phenotypes need better quantifications, both for the EM and light microscopy data in various figures. 

      (4) The transcriptomics data are hard to browse in the provided PDF format. 

      Thank you for reviewing our manuscript and for your feedback. While we understand and appreciate the suggestion to divide the manuscript into two separate studies, we believe that presenting the work as a single manuscript is more appropriate. This is because the time-course RNA-seq of DIOMs provides critical insight into BNIP3-mediated mitophagy during DIOM remodeling, which ties together the two components of our study. In response to Reviewer #1’s recommendations, we have quantified data from both EM and confocal images, and we have revised the RNA counts table in Supplementary File 1 accordingly. Please see our detailed responses and revisions on the following pages.

      Reviewer #2 (Public review): 

      Summary: 

      Autophagy (macroautophagy) is known to be essential for muscle function in flies and mammals. To date, many mitophagy (selective mitochondrial autophagy) receptors have been identified in mammals and other species. While the loss of mitophagy receptors has been shown to impair mitochondrial degradation (e.g., OPTN and NDP52 in Parkin-mediated mitophagy and NIX and BNIP3 in hypoxia-induced mitophagy) at the level of cultured cells, it remains unclear, especially under physiological conditions in vivo. In this study, the authors revealed that one of the receptors BNIP3 plays a critical role in mitochondrial degradation during muscle remodeling in vivo. 

      Overall, the manuscript provides solid evidence that BNIP3 is involved in mitophagy during muscle remodeling with in vivo analyses performed. In particular, all experiments in this study are well-designed. The text is well written and the figures are very clear. 

      Strengths: 

      (1) In each experiment, appropriate positive and negative controls are used to indicate what is responsible for the phenomenon observed by the authors: e.g. FIP200, Atg18, Stx17 siRNAs during DIOM remodeling in Figure 2 and Full, del-LIR, del-MER in Figure 5. 

      (2) Although the transcriptional dynamics of DIOM remodeling during metamorphosis is autophagy-independent, the transcriptome data obtained by the authors would be valuable for future studies. 

      (3) In addition to the simple observation that loss of BNIP3 causes mitochondrial accumulation, the authors further observed that, by combining siRNA against STX17, which is required for fusion of autophagosomes with lysosomes, BNIP3 KO abolishes mitophagosome formation, which will provide solid evidence for BNIP3-mediated mitophagy. Furthermore, using a Gal80 temperature-sensitive approach, the authors showed that mitochondria derived from larval muscle, but not those synthesized during hypertrophy, remain in BNIP3 KO fly muscles. 

      Weaknesses: 

      (1) Because BNIP3 KO causes mitochondrial accumulation, it is expected that adult flies will have some physiological defects, but this has not been fully analyzed or sufficiently mentioned in the manuscript. 

      (2) In Figure 5, the authors showed that BNIP3 binds to Atg18a by co-IP, but no data are provided on whether MER-mut or del-MER attenuates the affinity for Atg18a. 

      Thank you for pointing out the critical issues in the previous version of our manuscript. In this revision, we have conducted several physiological assays using BNIP3 KO flies, as well as co-IP experiments to confirm that the ΔMER mutation weakens the interaction with Atg18a. We have also addressed all the recommendations provided. Please see our detailed point-by-point responses below.

      Reviewer #3 (Public review): 

      Summary: 

      Fujita et al build on their earlier, 2017 eLife paper that showed the role of autophagy in the developmental remodeling of a group of muscles (DIOM) in the abdomen of Drosophila. Most larval muscles undergo histolysis during metamorphosis, while DIOMs are programmed to regrow after initial atrophy to give rise to temporary adult muscles, which survive for only 1 day after eclosion of the adult flies (J Neurosci. 1990;10:403-1. and BMC Dev Biol 16, 12, 2016). The authors carry out transcriptomics profiling of these muscles during metamorphosis, which is in agreement with the atrophy and regrowth phases of these muscles. Expression of the known mitophagy receptor BNIP3/NIX is high during atrophy, so the authors have started to delve more into the role of this protein/mitophagy in their model. BNIP3 KO indeed impairs mitophagy and muscle atrophy, which they convincingly demonstrate via nice microscopy images. They also show that the already known Atg8a-binding LIR and Atg18a-binding MER motifs of human NIX are conserved in the Drosophila protein, although the LIR turned out to be less critical for in vivo protein function than the MER motif. 

      Strengths: 

      Established methodology, convincing data, in vivo model. 

      Weaknesses: 

      The significance for Drosophila physiology and for human muscles remains to be established. 

      Thank you for reviewing our manuscript. In response to the comment, we have performed lifespan, adult locomotion, and eclosion assays in BNIP3 KO flies. Although we observed substantial mitochondrial accumulation in the DIOMs of BNIP3 KO flies, no significant differences were detected in these physiological assays under our experimental conditions. We plan to further investigate the physiological role of BNIP3 in flies and extend our studies to human muscle in future work. Please see our detailed responses below.

      Reviewer #1 (Recommendations for the authors): 

      Major points: 

      (1) Unfortunately, the RNA counts table in Supplementary file 1 is a PDF and not an Excel sheet. The labelling makes it unclear which time points and genotypes the values listed in the 650-page file come from. 

      We have now corrected the labelling of time points and genotypes in Supplementary File 1 to improve clarity and have provided the updated Excel file.

      Looking at these counts, it seems that sarcomere genes (Mhc, bt, sls, wupA, TpnC) are 10x to 100x lower in sample "ctrl_1" compared to the three other control samples. Which time point is that? It is essential to have access to the full dataset, wild type and autophagy-deficient, to be able to assess the quality of the RNA-seq data. These data need to be deposited in a public database or provided in a useful format. 

      Thank you for pointing that out. In the previous version, “Ctrl_1” referred to the Control sample at 1 day APF, when atrophy occurs. We have corrected the labeling in Supplementary File 1 accordingly and have deposited the RNA-seq data in GEO, where they are now publicly available (GSE293359).

      (2) Which statistical test was used to assess the differences in muscle volumes in Figure 2E? I was not able to find a table with the measured data.

      In Figure 2E, we used the Mann-Whitney test for statistical analysis. The raw data used for quantification have also been provided (Supplementary File 2).
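
      For readers less familiar with the test named above, the following is a minimal sketch of a two-sided Mann-Whitney U test. It is not the authors' analysis code: the function names and the example volumes are hypothetical, and the p-value uses a normal approximation without tie or continuity correction, which is only a rough guide for small samples.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]

def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p) for two independent samples."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    # Normal approximation (assumption: no tie or continuity correction).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Hypothetical muscle volumes (arbitrary units) for two genotypes.
control = [1.0, 2.0, 3.0]
mutant = [4.0, 5.0, 6.0]
u_stat, p_value = mann_whitney_u(control, mutant)
```

      In practice an exact p-value from a statistics package would be preferable for sample sizes this small.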

      The shown volumes do not correlate with the scheme shown in Figure 2A, in particular at the larval stage the muscle seems much larger.

      We have revised the schematic models of muscle cells in Figures 1C and 2A in accordance with the reviewer’s suggestion.

      (3) It is important to remember that adult Drosophila muscles are not homogeneous, at least not the adult leg and abdominal muscles, as they are organised as tubes with myofibrils closer to the surface, and nuclei as well as mitochondria largely in the centre (see PMID 33828099). Hence, showing only a single plane in the muscle images can be very misleading. The authors should at least provide virtual XZ cross-section views in Figure 3G to ensure that similar muscle planes are compared. This applies to the interpretation of both the mitochondria and the myofibril phenotypes in wildtype vs BNIP3-KO. 

      Thank you for your comment. As suggested, we have added XZ-cross-sectional views in Figure 3G. The XY plane corresponds to a central section of the Z-stack, as indicated in the figure.

      (4) The EM images are nice, however only 2 of the 4 conditions shown were quantified. As the section plane can be misleading, at least several planes should be analysed also for wild type and BNIP3-KO, and not only for stx17 RNAi and the double mutant. 

      In response to the comment, we quantified the TEM images of wild-type and BNIP3-KO DIOMs and added the resulting graph to Figure 4C. The corresponding raw data have also been provided (Supplementary File 2).

      (5) How were Figures 5D and 5D' quantified? What corresponds to "regular", "medium", "high"? A statistical test is missing. I would rather conclude that MER and LIR are redundant, as the double mutant appears to be stronger than both singles. This is also concluded in some sections of the text, so the authors seem to contradict themselves. Why not measure the mitochondria areas as done in Figure 6A' instead? 

      In the previous version, we manually categorized pooled, blinded images from different genotypes. However, as the reviewer pointed out, this approach was not quantitative. In the revised version, we analyzed the images using ImageJ to quantify the mitochondrial area per cell. Statistical significance was assessed using the Kruskal-Wallis test. Accordingly, we have revised Figure 5D, the Methods section, and the figure legend.
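
      As an illustration of the test named above, here is a minimal sketch of the Kruskal-Wallis H statistic for comparing more than two genotypes. It is not the authors' analysis: the function names and the area values are hypothetical, and the sketch omits the correction for ties. The p-value would be read from a chi-square distribution with (number of groups minus 1) degrees of freedom.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]

def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of samples (no tie correction)."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)
    return h, len(groups) - 1

# Hypothetical mitochondrial areas per cell for three genotypes.
areas = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_stat, dof = kruskal_wallis_h(areas)
```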

      (6) Figure 6B data seem to come from a single image per genotype only. At least 3 or 4 animals should be measured and the values reported. 

      We analyzed Pearson’s correlation coefficients (R values) from at least five images per genotype and performed statistical analysis. The resulting quantification is presented in Figure 6B’, and the corresponding text has been revised accordingly.
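
      For reference, Pearson's R between two image channels can be sketched as below. This is an illustrative implementation only; the function name and the pixel intensities are hypothetical, and the authors' actual measurement was presumably made with imaging software rather than this code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-pixel intensities from two fluorescence channels.
channel_a = [10.0, 20.0, 30.0, 40.0]
channel_b = [12.0, 22.0, 29.0, 41.0]
r_value = pearson_r(channel_a, channel_b)
```

      Computing one R value per image and then comparing genotypes across at least five images, as described above, treats each image as the unit of replication.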

      (7) As BNIP3 mutants are viable, it would be interesting to report if they can fly and how long they live. 

      Additional data on adult lifespan, climbing ability, and elapsed time for eclosion in BNIP3 KO flies have been included as supplemental information (Figure 3-figure supplement 2). No significant differences were observed in those assays under our experimental conditions.

      (8) The transcriptomics data are not well linked to the autophagy mechanism. In particular, the mutant transcriptomics data are confusing, as the abstract seems to suggest that blocking autophagy impacts transcriptomics, which is not (strongly) the case. I would at least rewrite this part, as it is currently misleading and raises wrong expectations in the reader. Also, throughout the text, the authors need to make clear whether there are transcriptomic changes or not and, if there are, how these are linked to autophagy. 

      In the abstract, we described the findings as “transcriptional dynamics independent of autophagy” (line 49) because the loss of autophagy had only a minimal effect on transcriptional changes. This conclusion is supported by the data presented in our manuscript. In the Results section, we state: “In contrast to our prediction, the knockdown of Atg18a, FIP200, or Stx17 only had a slight impact on transcriptomic dynamics in DIOM remodeling (Fig. 2C), with only minor changes detected (Fig. 2-figure supplement 2G)” (lines 199-201). In the Discussion section, we further note: “The transcriptional dynamics associated with DIOM remodeling are largely independent of autophagy (Fig.2). Instead, our RNA-seq data suggest that it is regulated primarily by ecdysone signaling, with minimal influence from autophagy inhibition” (lines 326-328).

      (9) No table with the measured data is provided. 

      We have provided the raw data files corresponding to all quantified results as Supplementary File 2.

      Minor points: 

      (1) To my knowledge, it is standard to indicate the time after puparium formation in hours, instead of days, (e.g. 24h, 48h etc.). 

      Thank you for the comments. In our previous publications on DIOM remodeling during metamorphosis (PMID: 28063257 and 33077556), we used days rather than hours to indicate developmental time points. To maintain consistency across our studies, we have chosen to continue using days in the present manuscript.

      (2) "Myofibrils typically form beneath the sarcolemma (Mao et al., 2022; Sanger et al., 2010); therefore, when mitochondria accumulate, myofibrils are restricted to the cell periphery." This is quite a general statement that does not always hold, in particular not in Drosophila flight muscles and likely also not in abdominal muscles (see PMIDs 29846170, 28174246). 

      Thank you for pointing that out. We rewrote the sentence as follows: In the absence of BNIP3, mitochondria derived from the larval muscle accumulate and cluster in the cell center, physically obstructing myofibril formation during hypertrophy and restricting myofibrils to the cell periphery (Fig. 6E) (lines 392-394).

      Reviewer #2 (Recommendations for the authors): 

      Suggestions for improved or additional experiments, data or analyses. 

      The authors should test, by a co-IP experiment, whether BNIP3 mutants lose the interaction with HA-Atg18a. 

      As requested, we tested the effect of MER deletion on the interaction between BNIP3 and Atg18a in a co-IP experiment. As shown in the new Fig. 5C, the deletion of MER weakened the interaction. This result was confirmed in three independent experiments. The corresponding text has also been revised as follows: “We confirmed that HA-tagged Drosophila Atg18a co-immunoprecipitated with GFP-tagged full-length Drosophila BNIP3, and that this interaction was attenuated by the deletion of the MER (residues 42-53) (Fig. 5C)” (lines 270-273).

      Minor corrections to the text and figures 

      (1) In the list of authors, Kawaguchi Kohei could be Kohei Kawaguchi. 

      Thank you very much. It has been corrected.

      (2) In Fig3D, other receptors (Zonda, CG12511, Key, Ref2P) should be mentioned briefly. 

      Thank you for the suggestion. We have revised the sentences as follows: “The time course RNA-seq data (Fig. 1 and 2) indicated that, among the known mitophagy regulators, only BNIP3 was robustly expressed in 1 d APF DIOMs. In contrast, Zonda, CG12511, Pink1, Park, Key, Ref(2)P, and IKKe—the Drosophila orthologs of FKBP8, FUNDC1, PINK1, Parkin, Optineurin, p62, and TBK1, respectively—showed little or undetectable expression at this stage (Fig. 3D).” (lines 230-234).

      Reviewer #3 (Recommendations for the authors): 

      Remarks: 

      (1) What is the consequence of impaired muscle remodeling on the organismal level? Is the eclosion of adult flies impaired? One could think of assays for this, such as quantifying failed eclosions and/or video microscopy of the eclosion process. Is muscle function impaired? One could measure the contractile force of isolated fibers during electrical stimulation as well, etc. I believe that showing the physiological importance of muscle remodeling would be the biggest advantage that could arise from using a complete animal model.

      We appreciate the comments. We have added data on adult lifespan, climbing ability, and the elapsed time for eclosion in BNIP3 KO flies as supplemental information (Figure 3-figure supplement 2). In BNIP3 KO DIOMs, despite the massive accumulation of mitochondria, an organized peripheral myofibril layer with contractile function is retained. However, we have not measured the contractile force of isolated muscle cells due to technical limitations. We plan to address this in future studies.

      A related note is that I missed the proper discussion of the function and fate of these short-lived adult muscles (please see references in my summary). 

      We have added a sentence regarding the function and fate of DIOMs in the introduction (lines 80-82) as follows: “The remodeled adult DIOMs function during eclosion, persist for approximately 12 hours, and are subsequently eliminated via programmed cell death (Kimura and Truman, 1990; J Neurosci. 1990;10:403-1)”.

      (2) I don't think that "data not shown" should be used these days, when supplemental data allow the inclusion of not-so-critical results. 

      We have added the data as Figure 5-figure supplement 2. As shown in the figure, overexpression of GFP-BNIP3 in 3IL BWMs did not induce the formation of tdTomato-positive autolysosomes, which are abundantly accumulated in DIOMs at 1 and 2 d APF.

      (3) The term "naked mitochondria" does not sound scientific enough to this reviewer. I suggest "cytosolic mitochondria" or "unengulfed mitochondria". 

      In accordance with the reviewer’s suggestion, we have replaced “naked mitochondria” with “unengulfed mitochondria” (lines 251 and 670).

    1. eLife Assessment

      Understanding how neural circuits mediate decision-making is a core problem in neuroscience. In this interesting and important work, the authors use detailed behavioral analysis and rigorous quantitative modeling to convincingly support the idea that the nematode C. elegans uses an "accept-reject" behavioral strategy, based on learned features of its environment, to make decisions upon encountering food patches. The work expands our understanding of the behavioral repertoire of this species, providing a foundation for future mechanistic studies in this powerful model system.

    2. Reviewer #1 (Public review):

      Summary:

      This work uses a novel, ethologically relevant behavioral task to explore decision-making paradigms in C. elegans foraging behavior. By rigorously quantifying multiple features of animal behavior as they navigate in a patch food environment, the authors provide strong evidence that worms exhibit one of three qualitatively distinct behavioral responses upon encountering a patch: (1) "search", in which the encountered patch is below the detection threshold; (2) "sample", in which animals detect a patch encounter and reduce their motor speed, but do not stay to exploit the resource and are therefore considered to have "rejected" it; and (3) "exploit", in which animals "accept" the patch and exploit the resource for tens of minutes. Interestingly, the probability of these outcomes varies with the density of the patch as well as the prior experience of the animal. Together, these experiments provide an interesting new framework for understanding the ability of the C. elegans nervous system to use sensory information and internal state to implement behavioral state decisions.

      Strengths:

      The work uses a novel, neuroethologically-inspired approach to studying foraging behavior

      The studies are carried out with an exceptional level of quantitative rigor and attention to detail

      Powerful quantitative modeling approaches including GLMs are used to study the behavioral states that worms enter upon encountering food, and the parameters that govern the decision about which state to enter

      The work provides strong evidence that C. elegans can make 'accept-reject' decisions upon encountering a food resource

      Accept-reject decisions depend on the quality of the food resource encountered as well as on internally represented features that provide measurements of multiple dimensions of internal state, including feeding status and time.

    3. Reviewer #2 (Public review):

      This study provides an experimental and computational framework to examine and understand how C. elegans make decisions while foraging in environments with patches of food. The authors show that C. elegans reject or accept food patches depending on a number of internal and external factors.

      The key novelty of this paper is the explicit demonstration of behavior analysis and quantitative modeling to elucidate decision-making processes. In particular, the paper describes the exploring vs. exploiting phases and the sensing vs. non-sensing categories of foraging behavior, based on the clustering of behavioral states defined in a multi-dimensional behavior-metrics space, and implements a generalized linear model (GLM) whose parameters can provide quantitative biological interpretations.

      The work builds on the literature of C. elegans foraging by adding the reject/accept framework.

    4. Reviewer #3 (Public review):

      Summary:

      In this study by Haley et al., the authors investigated explore-exploit foraging using C. elegans as a model system. Through an elegant set of patchy environment assays, the authors built a GLM based on past experience that predicts whether an animal will decide to stay on a patch to feed and exploit that resource, instead of choosing to leave and explore other patches.

      Strengths:

      I really enjoyed reading this paper. The experiments are simple and elegant, and address fundamental questions of foraging theory in a well-defined system. The experimental design is thoroughly vetted, and the authors provide a considerable volume of data to prove their points.

      Weakness:

      History-dependence of the GLM. The logistic GLM seems like a logical way to model a binary choice, and I think the parameters you chose are certainly important. However, the framing of them seems odd to me. I do not doubt the animals are assessing the current state of the patch with an assessment of past experience; that makes perfect logical sense. However, it seems odd to reduce past experience to the categories of recently exploited patch, recently encountered patch, and time since last exploitation. This implies the animals have some way of discriminating these past patch experiences and committing them to memory. Also, it seems logical that the time on these patches, not just their density, should also matter, just as the time without food matters. Time is inherent to memory. This model also imposes a prior categorization in trying to distinguish between sensed vs. not-sensed patches, which I criticized earlier. Only "sensed" patches are used in the model, but it is questionable whether worms genuinely do not "sense" these patches.

      It seems more likely the worm simply has some memory of chemosensation and relative satiety, both of which increase on patches, and decrease while off of patches. The magnitudes are likely a function of patch density. That being said, I leave it up to the reader to decide how best to interpret the data.

      Impact:

      I think this work will have a solid impact on the field, as it provides tangible variables to test how animals assess their environment and decide to exploit resources. I think this research could be strengthened by a reassessment of the model that would both simplify it and provide testable timescales of satiety/starvation memory.

    5. Author Response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This work uses a novel, ethologically relevant behavioral task to explore decision-making paradigms in C. elegans foraging behavior. By rigorously quantifying multiple features of animal behavior as they navigate in a patch food environment, the authors provide strong evidence that worms exhibit one of three qualitatively distinct behavioral responses upon encountering a patch: (1) "search", in which the encountered patch is below the detection threshold; (2) "sample", in which animals detect a patch encounter and reduce their motor speed, but do not stay to exploit the resource and are therefore considered to have "rejected" it; and (3) "exploit", in which animals "accept" the patch and exploit the resource for tens of minutes. Interestingly, the probability of these outcomes varies with the density of the patch as well as the prior experience of the animal. Together, these experiments provide an interesting new framework for understanding the ability of the C. elegans nervous system to use sensory information and internal state to implement behavioral state decisions.

      Strengths:

      The work uses a novel, neuroethologically-inspired approach to studying foraging behavior

      The studies are carried out with an exceptional level of quantitative rigor and attention to detail

      Powerful quantitative modeling approaches including GLMs are used to study the behavioral states that worms enter upon encountering food, and the parameters that govern the decision about which state to enter

      The work provides strong evidence that C. elegans can make 'accept-reject' decisions upon encountering a food resource

      Accept-reject decisions depend on the quality of the food resource encountered as well as on internally represented features that provide measurements of multiple dimensions of internal state, including feeding status and time

      Reviewer #2 (Public review):

      This study provides an experimental and computational framework to examine and understand how C. elegans make decisions while foraging in environments with patches of food. The authors show that C. elegans reject or accept food patches depending on a number of internal and external factors.

      The key novelty of this paper is the explicit demonstration of behavior analysis and quantitative modeling to elucidate decision-making processes. In particular, the paper describes the exploring vs. exploiting phases and the sensing vs. non-sensing categories of foraging behavior, based on the clustering of behavioral states defined in a multi-dimensional behavior-metrics space, and implements a generalized linear model (GLM) whose parameters can provide quantitative biological interpretations.

      The work builds on the literature of C. elegans foraging by adding the reject/accept framework.

      Reviewer #3 (Public review):

      Summary:

      In this study by Haley et al., the authors investigated explore-exploit foraging using C. elegans as a model system. Through an elegant set of patchy environment assays, the authors built a GLM based on past experience that predicts whether an animal will decide to stay on a patch to feed and exploit that resource, instead of choosing to leave and explore other patches.

      Strengths:

      I really enjoyed reading this paper. The experiments are simple and elegant, and address fundamental questions of foraging theory in a well-defined system. The experimental design is thoroughly vetted, and the authors provide a considerable volume of data to prove their points. My only criticisms have to do with the data interpretation, which I think is easily addressable.

      Weaknesses:

      History-dependence of the GLM

      The logistic GLM seems like a logical way to model a binary choice, and I think the parameters you chose are certainly important. However, the framing of them seems odd to me. I do not doubt the animals are assessing the current state of the patch with an assessment of past experience; that makes perfect logical sense. However, it seems odd to reduce past experience to the categories of recently exploited patch, recently encountered patch, and time since last exploitation. This implies the animals have some way of discriminating these past patch experiences and committing them to memory. Also, it seems logical that the time on these patches, not just their density, should also matter, just as the time without food matters. Time is inherent to memory. This model also imposes a prior categorization in trying to distinguish between sensed vs. not-sensed patches, which I criticized earlier. Only "sensed" patches are used in the model, but it is questionable whether worms genuinely do not "sense" these patches.

      It seems more likely that the worm simply has some memory of chemosensation and relative satiety, both of which increase on patches and decrease while off of patches. The magnitudes are likely a function of patch density. That being said, I leave it up to the reader to decide how best to interpret the data.

      Model design: We agree with the reviewer that past experience is not likely to be discretized into the exact parameters of our model. We have added to our manuscript to further clarify this point (lines 645-647). Investigating the mechanisms behind this behavior is beyond the scope of this project but is certainly an exciting trajectory for future C. elegans research.
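
      To make the discussion above concrete, the following is a minimal sketch of how a logistic GLM maps covariates to the probability of a binary accept/reject decision. The covariate names follow the categories the reviewer lists, but the coefficient values, the intercept, the units, and the function name are all hypothetical placeholders and are not the fitted parameters of the authors' model.

```python
import math

def exploit_probability(covariates, coefficients, intercept):
    """Logistic GLM: P(exploit) = 1 / (1 + exp(-(intercept + sum(beta_i * x_i))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in covariates.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients (illustrative signs and magnitudes only).
betas = {
    "current_patch_density": 1.5,
    "recently_exploited_density": -0.8,
    "recently_encountered_density": -0.3,
    "time_since_exploitation": 0.4,
}

# One hypothetical patch encounter, in arbitrary units.
encounter = {
    "current_patch_density": 2.0,
    "recently_exploited_density": 1.0,
    "recently_encountered_density": 0.5,
    "time_since_exploitation": 1.0,
}
p_exploit = exploit_probability(encounter, betas, intercept=-1.0)
```

      In a model of this form, each coefficient is the change in the log-odds of exploitation per unit change in its covariate, which is what allows the parameters to be interpreted individually.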

      osm-6

      The argument is that osm-6 animals can't sense food very well, so when they sense it, they enter the exploitation state by default. That is what they appear to do, but why? Clearly they are sensing the food in some other way, correct? Are ciliated neurons the only way worms can sense food? Don't they also actively pump on food, and can therefore sense the food entering their pharynx? I think you could provide further insight by commenting on this. Perhaps your decision model is dependent on comparing environmental sensing with pharyngeal sensing? Food intake certainly influences their decision, no? Perhaps food intake triggers exploitation behavior, which can be over-run by chemo/mechanosensory information?

      osm-6 behavior: We thank the reviewer for pointing out the need to further elaborate on a mechanistic hypothesis to explain the behavior of osm-6 sensory mutants. We agree with the reviewer’s speculation that post-ingestive and other non-ciliary sensory cues likely drive detection of food. We have added additional commentary to our manuscript to state this (lines 529-538).

      Impact

      I think this work will have a solid impact on the field, as it provides tangible variables to test how animals assess their environment and decide to exploit resources. I think this research could be strengthened by a reassessment of the model that would both simplify it and provide testable timescales of satiety/starvation memory.

      Reviewer #2 (Recommendations for the authors):

      The authors have addressed most of my concerns.

      Reviewer #3 (Recommendations for the authors):

      The authors provide a considerable amount of processed data (great, thank you!), but it would be even better if they provided the raw data of the worm coordinates, and when and where these coordinates overlapped with patches. This is the raw data that was ultimately used for all the quantifications in the paper, and would be incredibly useful to readers who are interested in modeling the data themselves.

      This should not be prohibitive.

      Data Availability: We thank the reviewer for pointing out this need. We are uploading all processed data (e.g. worm coordinates relative to the arena and patches) to a curated data storage server. We have updated our data availability statement to state this (lines 684-688).

      Search vs. sample & sensing vs. non-sensing.

      The different definitions of behaviors in Figures 2H-K are a bit confusing. I think the confusion stems in part from the changing terms and color associations in Figures 2H-K. Essentially the explore density in Figure 2H is split into two densities based on the two densities (sensing vs. non-responding) observed in Figure 2I. In turn, the sensing density in Figure 2I is split into two densities (explore vs exploit) based on the two densities observed in Figure 2H. But the way the figures are colored, yellow means search (Figure 2H) and non-responding (Figure 2I), green means exploit (Figure 2H) which includes sensing and non-responding, but also exclusively sensing (Figure 2I), and blue consistently means exploit in both figures. It might help to use two different color codes for Figures 2H and 2I, and then in 2J you define search as explore AND non-responding, sample as explore AND sensing, and exploit as exploit.

      Color schema: While we understand the confusion, we believe that introducing additional colors may itself cause misunderstanding. We have decided to leave the figure as it is.

    1. eLife Assessment

      This is a well-written study that presents a solid genetic screen to identify regulators of adipose morphology and remodeling in zebrafish. The authors generated a rigorous screening platform based on live, whole animal imaging and statistical methods that revealed both novel and known genes critical for adipose regulation. This work is valuable because it provides several candidate genes relevant to metabolic health and a quantitative screening pipeline that will be beneficial for future studies. A limitation of the study is that it precludes a definitive distinction between developmental and remodeling effects.

    2. Joint Public Review:

      In this manuscript, Wafer and Tandon et al. present a thoughtful and well-designed genetic screen for regulators of adipose remodeling using zebrafish as a model system. The authors cross-referenced several human adipocyte-related transcriptomic and genetic association datasets to identify candidate genes, which they then tested in zebrafish. Importantly, the authors devised an unbiased microscopy-based screening platform to document quantitative adipose phenotypes with whole animal imaging, while also employing rigorous statistical methods. From their screen, the authors identified 6 genes that resulted in robust adipose phenotypes out of a total of 25 that were tested. Overall, this work will be a useful resource for the field because of both the genes identified and the quantitative, rigorous screening pipeline. However, there are limitations that preclude a definitive distinction between developmental and remodeling effects that should be acknowledged and discussed, or addressed with new experiments.

      Strengths:

      (1) This work combines multiple omic datasets to identify candidate genes that informed a CRISPR-based screen to identify genes underlying adipose tissue development and adaptation. This approach offers a new avenue to improve our understanding and testing of new genetic mechanisms underlying the development of obesity.

      (2) Using a clever screening approach, this study identifies new genes that are associated with adipose tissue lipid droplet size change. Importantly, the study provides further validation using a stable CRISPR line to show the phenotype in basal and high-fat diet conditions.

      (3) The experiments are well-designed and rigorous. Sample sizes are large. Statistical analyses are highly rigorous, contributing to a high-quality study.

      Weaknesses:

      (1) The image quantification established in Figures 3 and 4 and used in CRISPR screening showed the relationship among zebrafish development, adipose tissue size, and lipid droplet size. Although adipose tissue developmental patterning is linked with adipose tissue adaptation, as shown by the evidence provided in this paper, the work would be more powerful if the imaging method and pipeline were established to directly assess adipose tissue plasticity rather than just developmental patterning. Furthermore, the authors should perform additional analysis of their existing data to more accurately determine lipid droplet size along the AP axis in response to HFD.

      (2) In the absence of tissue-specific manipulations, definitively establishing the mechanisms underlying the genetic regulation of adipose tissue physiology presents limitations.

    1. eLife Assessment

      This paper makes a valuable contribution to our understanding of the tradeoffs in eye design - specifically between improvements in optics and in photoreceptor performance. The authors successfully build a formal theory that enables comparisons across a wide range of species and eye types. One notable example is that how space should be allocated to optics and photoreceptors depends on eye type - with particularly notable differences between compound and simple eyes. The framework introduced to compare different design properties is convincing and provides a nice example of how to study tradeoffs in seemingly disparate design properties.

    2. Reviewer #1 (Public review):

      Summary:

      Two important factors in visual performance are the resolving power of the lens and the signal-to-noise ratio of the photoreceptors. These both compete for space: a larger lens has improved resolving power over a smaller one, and longer photoreceptors capture more photons and hence generate responses with lower noise. The current paper explores the tradeoff of these two factors, asking how space should be allocated to maximize eye performance (measured as encoded information).

      The revisions, to my read, have greatly improved the paper. Most of this was due to setting clear expectations from the start of the paper. Nice work!

    3. Reviewer #2 (Public review):

      Summary:

      In short, the paper presents a theoretical framework that predicts how resources should be optimally distributed between receptors and optics in eyes.

      After revision of an already excellent contribution, the manuscript is now even better. The authors have responded carefully to all reviewer comments.

      Strengths:

      The authors build on the principle of resource allocation within an organism and develop a formal theory for optimal distribution of resources within an eye between the receptor array and the optics. Because the two parts of eyes, receptor arrays and optics, share the same role of providing visual information to the animal it is possible to isolate these from resource allocation in the rest of the animal. This allows for a novel and powerful way of exploring the principles that govern eye design. By clever and thoughtful assumptions/constraints, the authors have built a formal theory of resource allocation between the receptor array and the optics for two major types of compound eye as well as for camera-type eyes. The theory is formalized with variables that are well characterized in a number of different animal eyes, resulting in testable predictions.

      The authors use the theory to explain a number of design features that depend on different optimal distribution of resources between the receptor array and the optics in different types of eye. As an example, they successfully explain why eye regions with different spatial resolution should be built in different ways. They also explain differences between different types of eye, such as long photoreceptors in apposition compound eyes and much shorter receptors in camera type eyes. The predictive power in the theory is impressive.

      To keep the number of parameters at a minimum, the theory was developed for two types of compound eye (neural superposition, and apposition) and for camera-type eyes. It is possible to extend the theory to other types of eye, although it would likely require more variables and assumptions/constraints to the theory. It is thus good to introduce the conceptual ideas without overdoing the applications of the theory.

      The paper extends a previous theory, developed by the senior author, that develops performance surfaces for optimal cost/benefit design of eyes. By combining this with resource allocation between receptors and optics, the theoretical understanding of eye design takes a major leap and provides entirely new sets of predictions and explanations for why eyes are built the way they are.

      The paper is well written and even though the theory development in the Results may be difficult to take in for many biologists, the Discussion very nicely lists all the major predictions under separate headings, and here the text is more tuned for readers that are not entirely comfortable with the formalism of the Results section. I must point out though that the Results section is kept exemplary concise. The figures are excellent and help explain concepts that otherwise may go above the head of many biologists.

    4. Reviewer #3 (Public review):

      Summary:

      This is a proposal for a new theory for the geometry of insect eyes. The novel cost-benefit function combines the cost of the optical portion with the photoreceptor portion of the eye. These quantities are put on the same footing using a specific (normalized) volume measure, plus an energy factor for the photoreceptor compartment. An optimal information transmission rate then specifies each parameter and resource allocation ratio for a variable total cost. The elegant treatment allows for comparison across a wide range of species and eye types. Simple eyes are found to be several times more efficient across a range of eye parameters than neural superposition eyes. Some trends in eye parameters can be explained by optimal allocation of resources between the optics and photoreceptors compartments of the eye.

      Strengths:

      Data from a variety of species roughly align with rough trends in the cost analysis, e.g. as a function of expanding the length of the photoreceptor compartment.

      New data could be added to the framework once collected, and many species can be compared.

      Eyes of different shapes are compared.

      Weaknesses:

      Detailed quantitative conclusions are not possible given the approximations and simplifying assumptions in the models and weak accounting for trends in the data across eye types.

      Comments on revisions:

      I have no additional comments for the authors and appreciate the revisions and corrections implemented - I think those changes have improved the clarity of the manuscript and expanded the potential readership for the paper.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Two important factors in visual performance are the resolving power of the lens and the signal-to-noise ratio of the photoreceptors. These both compete for space: a larger lens has improved resolving power over a smaller one, and longer photoreceptors capture more photons and hence generate responses with lower noise. The current paper explores the tradeoff of these two factors, asking how space should be allocated to maximize eye performance (measured as encoded information).

      Your summary is clear, concise and elegant. The competition is not just for space, it is for space, materials and energy. We  now emphasise that we are considering these three costs in our rewrites of the Abstract and the first paragraph of the Discussion.  

      Strengths:

      The topic of the paper is interesting and not well studied. The approach is clearly described and seems appropriate (with a few exceptions - see weaknesses below). In most cases, the parameter space of the models is well explored and tradeoffs are clear.

      Weaknesses:

      Light level

      The calculations in the paper assume high light levels (which reduces the number of parameters that need to be considered). The impact of this assumption is not clear. A concern is that the optimization may be quite different at lower light levels. Such a dependence on light level could explain why the model predictions and experiment are not in particularly good agreement. The paper would benefit from exploring this issue.

      Thank you for raising this point. We briefly explained in our original Discussion, under Understanding the adaptive radiation of eyes (Version 1, lines 756 – 762), how our method can be modified to investigate eyes adapted for lower light levels. We have some thoughts on how eyes might be adapted. In general, transduction rates are increased by increasing D, reducing f, increasing d<sub>rh</sub> and increasing L. In addition, d<sub>rh</sub> is increased to allow for a larger D within the constraint of eye radius/corneal surface area, and to avoid wasteful oversampling (the changes in D, f and d<sub>rh</sub> increase acceptance angle ∆ρ). We suspect that in eyes optimised for the efficient use of space, materials and energy the increases in L will be relatively small. First, increasing D, reducing f and increasing d<sub>rh</sub> are much more effective at increasing transduction rate than increasing L. Second, increasing sensitivity by reducing f decreases the cost V<sub>o</sub> whereas increasing sensitivity by increasing L increases the cost V<sub>ph</sub>. This disadvantage, together with exponential absorption, might explain why L is only 10% - 20% longer in the apposition eyes of nocturnal bees (Somanathan et al., J. Comp. Physiol. A 195, 571–583, 2009). Because this line of argument is speculative and enters new territory, we have not included it in our revised version. We already present a lot of new material for readers to digest, and we agree with referee 2 that “It is possible to extend the theory to other types of eyes, although it would likely require more variables and assumptions/constraints to the theory. It is thus good to introduce the conceptual ideas without overdoing the applications of the theory”. Nonetheless, we take your point that some of the eyes in our data set might be adapted for lower light levels, and we have rewritten the Discussion section, How efficiently do insects allocate resources within their apposition eyes?, accordingly.

      On lines 827 – 843 we address the assumption that eyes are adapted for full daylight, and also take the opportunity to mention two more reasons for increasing the eye parameter p, namely increasing image velocity (Snyder, 1979), and constructing bright zones that increase the detectability of small targets (van Hateren et al., 1989; Straw et al., 2006).

      Discontinuities

      The discontinuities and non-monotonicity of the optimal parameters plotted in Figure 4 are concerning. Are these a numerical artifact? Some discussion of their origin would be quite helpful.

      Good points. We now address the discontinuities in the Results, where they are first observed (lines 311 - 319).

      Discrepancies between predictions and experiment

      As the authors clearly describe, experimental measurements of eye parameters differ systematically from those predicted. This makes it difficult to know what to take away from the paper. The qualitative arguments about how resources should be allocated are pretty general, and the full model seems a complex way to arrive at those arguments. Could this reflect a failure of one of the assumptions that the model rests on - e.g. high light levels, or that the cost of space for photoreceptors and optics is similar? Given these discrepancies between model and experiment, it is also hard to evaluate conclusions about the competition between optics and photoreceptors (e.g. at the end of the abstract) and about the importance for evolution (end of introduction).

      Your misgivings boil down to two issues: what use is a model that fails to fit the data, and do we need a complicated model to show something that seems to be intuitively obvious? Our study is useful because it introduces new approaches, methods, factors and explanations which advance our analysis and understanding of eye design and evolution. Your comments make it clear that we failed to get this message across and we have revised the manuscript accordingly. We have rewritten the Abstract and the first paragraph of the Discussion to emphasise the value of our new measure of cost, specific volume, by including more of its practical advantages. In particular, our use of specific volume 1) opens the door to the morphospace of all eyes of a given type and cost; 2) allows one to construct performance surfaces across morphospace that not only identify optima but, by evaluating the sub-optimal, cast light on efficiency and adaptability; 3) shows that photoreceptor energy costs have a major impact on design and efficiency; and 4) allows us to calculate and compare the capacities and efficiencies of compound eyes and simple eyes using a superior measure of cost. It is also possible that your dissatisfaction was deepened by disappointment. The first sentence of our original Abstract said that the goal of design is to maximize performance, so you might have expected to see that eyes are optimised. Given that optimization provides cast-iron proof that a system is designed to be efficient, and previous studies of coding by fly LMCs (Laughlin, 1981; Srinivasan et al., 1982; van Hateren, 1992) validated Barlow’s Efficient Coding Hypothesis by showing that coding is optimised, your expectation is reasonable. However, our investigation of how the allocation of resources to optics and photoreceptors affects an eye’s performance, efficiency and design does not depend a priori on finding optima, therefore we have removed the “maximized”. Our revised Abstract now says, “to improve performance”.

      In short, our study illustrates an old adage in statistics, “All models fail to fit, but some are useful”. As is often the case, the way in which our model fails is useful. In the original version of the Results and Discussion, we argued that the allocation of resources is efficient, and identified factors that can, in principle, explain the scattering of data points. Indeed, our modelling identifies two of these deficiencies: a lack of data on species-specific energy usage, and the need for models that quantify the relationship between the quality of the captured image and the behavioural tasks for which an eye might be specialised. Thus, by examining the model’s failings we identify critical factors and pose new questions for future research. We have rewritten the Discussion section How efficiently do insects allocate resources…. to make these points. We hope that these revisions will convince you that we have established a starting point for definitive studies, invented a vehicle that has travelled far enough to discover new territory, and shown that it can be modified to cope with difficult terrain.

      Turning to the need for a complicated model, because the costs and benefits depend on elementary optics and geometry, we too thought that there ought to be a simple model. However, when we tried to formulate a simple set of equations that approximate the definitive findings of our more complicated model we discovered that this is not as straightforward as we thought.  Many of the parameters in our model interact to determine costs and benefits, and many of these interactions are non-linear (e.g. the volumes of shells in spheres involve quadratic and cubic terms, and information depends on the log of a square root). So, rather than hold back publication of our complicated model, we decided to explain how it works as clearly as we can and demonstrate its value.
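
      A minimal sketch of the two non-linearities named above, under assumptions that are not taken from the manuscript: the shell is treated as a complete spherical shell, and information is given the generic Gaussian-channel form log2 of the square root of (1 + SNR). The function names and numbers are hypothetical and serve only to show why costs and benefits do not scale proportionally.

```python
import math


def shell_volume(outer_radius, thickness):
    """Volume of a full spherical shell lying just inside outer_radius.

    Expanding R**3 - (R - L)**3 gives 3*R**2*L - 3*R*L**2 + L**3, so the
    volume mixes quadratic and cubic terms in R and L.
    """
    inner_radius = outer_radius - thickness
    return 4.0 / 3.0 * math.pi * (outer_radius**3 - inner_radius**3)


def information_bits(snr):
    """Bits per sample, assuming the form log2(sqrt(1 + SNR))."""
    return math.log2(math.sqrt(1.0 + snr))


# Hypothetical numbers: doubling the shell thickness or the SNR does not
# double the resulting volume or information.
volume_ratio = shell_volume(1.0, 0.4) / shell_volume(1.0, 0.2)
information_ratio = information_bits(200.0) / information_bits(100.0)
print(f"volume ratio: {volume_ratio:.3f}")
print(f"information ratio: {information_ratio:.3f}")
```

      Both ratios fall below 2, which illustrates in miniature why a proportional "rule of thumb" cannot stand in for the full model.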

      In response to your final comment, “it is hard to evaluate conclusions about the competition between optics and photoreceptors (e.g. at the end of the abstract) and about the importance for evolution (end of introduction)”, we stand by our original argument. There must be competition in an eye of fixed cost, and because competition favours a heavy investment in photoreceptors, both in theory and in practice, it  is a significant factor in eye design. A match between investments in optics and photoreceptors is predicted by theory and observed in fly NS eyes, therefore this is a design principle. As for evolution, no one would deny that it is important to view the adaptive radiation of eyes through a cost-benefit lens. Our lens is the first to view the whole eye, optics and photoreceptor array, and the first to treat the costs of space, materials and energy. Although the view through our lens is a bit fuzzy, it reveals that costs, benefits and trade-offs are important. Thus we have established a promising starting point for a new and more comprehensive cost-benefit approach to understanding eye design and evolution.  As for the involvement of genes, when there are heritable changes in phenotype genes must be involved and if, as we suggest, efficient resource allocation is beneficial, the developmental mechanisms responsible for allocating resources to optics and photoreceptor array will be playing a formative role in eye evolution.

      Reviewer #2 (Public Review):

      Summary:

      In short, the paper presents a theoretical framework that predicts how resources should be optimally distributed between receptors and optics in eyes.

      Strengths:

      The authors build on the principle of resource allocation within an organism and develop a formal theory for optimal distribution of resources within an eye between the receptor array and the optics. Because the two parts of eyes, receptor arrays and optics, share the same role of providing visual information to the animal it is possible to isolate these from resource allocation in the rest of the animal. This allows for a novel and powerful way of exploring the principles that govern eye design. By clever and thoughtful assumptions/constraints, the authors have built a formal theory of resource allocation between the receptor array and the optics for two major types of compound eye as well as for camera-type eyes. The theory is formalized with variables that are well characterized in a number of different animal eyes, resulting in testable predictions.

      The authors use the theory to explain a number of design features that depend on different optimal distribution of resources between the receptor array and the optics in different types of eyes. As an example, they successfully explain why eye regions with different spatial resolution should be built in different ways. They also explain differences between different types of eyes, such as long photoreceptors in apposition compound eyes and much shorter receptors in camera type eyes. The predictive power in the theory is impressive.

      To keep the number of parameters at a minimum, the theory was developed for two types of compound eye (neural superposition, and apposition) and for camera-type eyes. It is possible to extend the theory to other types of eyes, although it would likely require more variables and assumptions/constraints to the theory. It is thus good to introduce the conceptual ideas without overdoing the applications of the theory.

      The paper extends a previous theory, developed by the senior author, that develops performance surfaces for optimal cost/benefit design of eyes. By combining this with resource allocation between receptors and optics, the theoretical understanding of eye design takes a major leap and provides entirely new sets of predictions and explanations for why eyes are built the way they are.

      The paper is well written and even though the theory development in the Results may be difficult to take in for many biologists, the Discussion very nicely lists all the major predictions under separate headings, and here the text is more tuned for readers that are not entirely comfortable with the formalism of the Results section. I must point out though that the Results section is kept exemplary concise. The figures are excellent and help explain concepts that otherwise may go above the head of many biologists.

      We are heartened by your appreciation of our manuscript - it persuaded us not to undertake extensive revisions – thank you.

      Reviewer #3 (Public Review):

      Summary:

      This is a proposal for a new theory for the geometry of insect eyes. The novel cost-benefit function combines the cost of the optical portion with the photoreceptor portion of the eye. These quantities are put on the same footing using a specific (normalized) volume measure, plus an energy factor for the photoreceptor compartment. An optimal information transmission rate then specifies each parameter and resource allocation ratio for a variable total cost. The elegant treatment allows for comparison across a wide range of species and eye types. Simple eyes are found to be several times more efficient across a range of eye parameters than neural superposition eyes. Some trends in eye parameters can be explained by optimal allocation of resources between the optics and photoreceptors compartments of the eye.

      Strengths:

      Data from a variety of species roughly align with rough trends in the cost analysis, e.g. as a function of expanding the length of the photoreceptor compartment.

      New data could be added to the framework once collected, and many species can be compared.

      Eyes of different shapes are compared.

      Weaknesses:

      Detailed quantitative conclusions are not possible given the approximations and simplifying assumptions in the models and poor accounting for trends in the data across eye types.

      Reviewer #1 (Recommendations For The Authors):

      Figure 1: Panel E defines the parameters described in panel d. Consider swapping the order of those panels (or defining D and Delta Phi in the figure legend for d). Order follows narrative, eye types then match 

      We think that you are referring to Figure 1. We modified the legend.

      Lines 143-145: How does a different relative cost impact your results?

      Thank you for raising this question. Because our assumption that relative costs are the same is our starting point, and for optics it is not an obvious mistake, we do not raise your question here. We address your question where you next raise it because, for photoreceptors, the assumption is obviously wrong. We now emphasise that our method for accounting for photoreceptor energy costs can be applied to other costs.

      Lines 187-190: Same as above - how do your results change if this assumption is not accurate?

      We have revised our manuscript to emphasise that we are dealing with the situation in which our initial assumption (costs per unit volume are equal) breaks down. On lines 203 - 208 we write: “However, this assumption breaks down when we consider specific metabolic rates. To enable and power phototransduction, photoreceptors have an exceptionally high specific metabolic rate (energy consumed per gram, and hence unit volume, per second) (Laughlin et al., 1998; Niven et al., 2007; Pangršič et al., 2005). We account for this extra cost by applying an energy surcharge, S<sub>E</sub>. To equate….”

      We also revised part of the Discussion section, Specific volume is a useful measure of cost, to make it clear that we are able to account for situations in which the costs per unit volume are not equal, and we give our treatment of photoreceptor energy costs as an example of how this is done. On lines 626 - 640 we say

      “Cost estimates can be adjusted for situations in which costs per unit volume are not equal, as illustrated by our treatment of photoreceptor energy consumption. To support transduction the photoreceptor array has an exceptionally high metabolic rate (Laughlin et al., 1998; Niven et al., 2007; Pangršič et al., 2005). We account for this higher energy cost by using the animal’s specific metabolic rate (power per unit mass and hence power per unit volume) to convert an array’s power consumption into an equivalent volume (Methods). Photoreceptor ion pumps are the major consumers of energy and the smaller contribution of pigmented glia (Coles, 1989) is included in our calculation of the energy tariff K<sub>E</sub> (Methods). The higher costs of materials and their turnover in the photoreceptor array can be added to the energy tariff K<sub>E</sub> but given the magnitude of the light-gated current (Laughlin et al., 1998) the relative increase will be very small. Thus for our intents and purposes the effects of these additional costs are covered by our models. For want of sufficient data…”.
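
      The conversion described in this passage can be sketched as follows. This is one reading of the quoted text, not the manuscript's Methods: power per unit volume is taken as specific metabolic rate times tissue density, the default density of 1000 kg per cubic metre is an assumption, and every function name and number is hypothetical.

```python
def energy_equivalent_volume(array_power_w,
                             specific_metabolic_rate_w_per_kg,
                             tissue_density_kg_per_m3=1000.0):
    """Convert an array's power consumption (W) into an equivalent volume (m^3).

    Assumes the animal's power per unit volume equals its specific metabolic
    rate (W/kg) multiplied by an assumed tissue density (kg/m^3).
    """
    if specific_metabolic_rate_w_per_kg <= 0 or tissue_density_kg_per_m3 <= 0:
        raise ValueError("metabolic rate and density must be positive")
    power_per_unit_volume = (specific_metabolic_rate_w_per_kg
                             * tissue_density_kg_per_m3)  # W per m^3
    return array_power_w / power_per_unit_volume


def total_photoreceptor_cost(array_volume_m3, array_power_w,
                             specific_metabolic_rate_w_per_kg,
                             tissue_density_kg_per_m3=1000.0):
    """Physical volume plus the energy cost expressed in the same units."""
    return array_volume_m3 + energy_equivalent_volume(
        array_power_w, specific_metabolic_rate_w_per_kg,
        tissue_density_kg_per_m3)


# Hypothetical values chosen only to show the unit conversion.
extra_volume = energy_equivalent_volume(2e-6, 10.0)
total_cost = total_photoreceptor_cost(1e-10, 2e-6, 10.0)
print(f"energy-equivalent volume: {extra_volume:.3e} m^3")
print(f"total photoreceptor cost: {total_cost:.3e} m^3")
```

      Expressing energy in volume units is what lets the photoreceptor and optics costs be summed into a single total cost.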

      Reviewer #2 (Recommendations For The Authors):

      A few comments for consideration by the authors:

      (1) In the abstract, Maybe give another example explaining why other eyes should be different to those of fast diurnal insects.

      This worthwhile extrapolation is best kept to the Discussion.

      (2) Would it be worthwhile mentioning that the photopigment density is low in rhabdoms compared to vertebrate outer segments? This will have major effects on the relative size of retina and optics.

      Thank you, we now make this good point in the Discussion (lines 698-702).

      (3) It took me a while to understand what you mean by an energy tariff. For the less initiated reader many other variables may be difficult to comprehend. A possible remedy would be to make a table with all variables explained first very briefly in a formal way and then explained again with a few more words for readers less fluent in the formalism.

      A very useful suggestion. We have taken your advice (p.4).

      (4) The "easy explanation" on lines 356-357 needs a few more words to be understandable.

      We have expanded this argument and corrected a mistake: the width of the head front to back is not 250 μm, it is 600 μm (lines 402-407).

      (5) Maybe devote a short paragraph in the Discussion to other types of eye, such as optical superposition eyes and pinhole eyes. This could be done very shortly and without formalism. I'm sure the authors already have a good idea of the optimal ratio of receptor arrays and optics in these eye types.

      We do not discuss this because we have not found a full account of the trade-offs and their  effects on costs and benefits. We hope that our analysis of apposition and simple eyes will encourage people to analyse the relationships between costs and benefits in other eye types. To this end we pointed out in the Discussion that recent advances in imaging and modelling could be helpful.

      (6)  Could the sentence on lines 668-671 be made a little clearer?

      “Efficiency is also depressed by increasing the photoreceptor energy tariff K<sub>E</sub>, and in line with the greater impact of photoreceptor energy costs in simple eyes, the reduction in efficiency is much greater in simple eyes (Figure 8b).”

      We replaced this sentence with “In both simple and apposition eyes efficiency is reduced by increasing the photoreceptor energy tariff K<sub>E</sub>. This effect is much greater in simple eyes, thus as found for reductions in photoreceptor length (Figure 7b), K<sub>E</sub> has more impact on the design of simple eyes” (lines 749 – 752).

      (7)  I have some reservations about the text on lines 789-796. The problem is that optics can do very little to improve the performance of a directional photoreceptor where delrho should optimally be very wide. Here, membrane folding is the only efficient way to improve performance (SNR). The option to reduce delrho for better performance comes later when simultaneous spatial resolution (multiple pixels) is introduced.

      Yes, we have been careless. We have rewritten this paragraph to say (lines 920-931)

      “Two key steps in the evolution of eyes were the stacking of photoreceptive membranes to absorb more photons, and the formation of optics to intercept more photons and concentrate them according to angle of incidence to form an image (Nilsson, 2013, 2021). Our modelling of well-developed image-forming eyes shows that to improve performance stacked membranes (rhabdomeres) compete with optics for the resources invested in an eye, and this competition profoundly influences both form and function. It is likely that competition between optics and photoreceptors was shaping eyes as lenses evolved to support low resolution spatial vision. Thus the developmental mechanisms that allocate resources within modern high resolution eyes (Casares & MacGregor, 2021), by controlling cell size and shape, and as our study emphasises, gradients in size and shape across an eye, will have analogues or homologues in more ancient eyes. Their discovery….”

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for major revisions:

      While the approach is novel and elegant, the results from the analysis of insect morphology do not broadly support the optimization argument and hardly constrain parameters, like the energy tariff value, at all. The most striking result of the paper is the flat plateau in information across a broad range of shape parameters and the length, and resolution trend in Figure 5.

      At no point in the Results and Discussion do we argue that resource allocation is optimized. Indeed, we frequently observe that it is not. Our mistake was to start the Abstract by observing that animals evolve to minimise costs. We have rewritten the Abstract accordingly.

      The information peaks are quite shallow. This might actually be a very important and interesting result in the paper - the fact that the information plateaus could give the insect eye quite a wide range of parameters to slide between while achieving relatively efficient sensing of the environment. Instead of attempting to use a rather ad hoc and poorly supported measure of energetics in PR cost, perhaps the pitch could focus on this flexibility. K<sub>E</sub> does not seem to constrain eye parameters and does not add much to the paper.

      We agree, being able to construct performance surfaces across morphospace is an important advance in the field of eye design and evolution, and the performance surface’s flat top has interesting implications for the evolution of adaptations. Encouraged by your remarks, we have rewritten the Abstract and the introductory paragraph of the Discussion to draw attention to these points. 

      We are disappointed that we failed to convince you that our energy tariff, K<sub>E</sub>, is more than a poorly supported ad hoc parameter that does not add much to the paper. In our opinion a resource allocation model that ignores photoreceptor energy consumption is obviously inadequate because the high energy cost of phototransduction is both well known and considered to be a formative factor in eye evolution (Niven and Laughlin, 2008). One of the advantages of modelling is that one can assess the impact of factors that are known to be present, are thought to be important, but have not been quantified. We followed standard modelling practice by introducing a cost that has the same units as the other costs and, for good physiological reasons, increases linearly with the number of microvilli, according to K<sub>E</sub>. We then vary this unknown cost parameter to discover when and why it is significant. We were pleased to discover that we could combine data on photoreceptor energy demands and whole animal metabolic rates to establish the likely range of K<sub>E</sub>. This procedure enabled us to unify the cost-benefit analyses of optics and photoreceptors, and to discover that realistic values of K<sub>E</sub> have a profound impact on the structure and performance of an efficient eye. We hope that this advance will encourage people to collect the data needed to evaluate K<sub>E</sub>. To emphasise the importance of K<sub>E</sub> and dispel doubts associated with the failure of the model to fit the data, we have revised two sections: Flies invest efficiently in costly photoreceptor arrays in the Results, and How efficiently do insects allocate resources within their apposition eyes? in the Discussion. These rewrites also explain why it is impossible for us to infer K<sub>E</sub> by adjusting its value so that the model’s predictions fit the data.

      The graphics after Figure 3 are quite dense and hard to follow. None of the plateau extent shown in Fig 3 is carried through to the subsequent plots, which makes the conclusions drawn from these figures very hard to parse. If the peak information occurs on a flat plateau, it would be more helpful to see those ranges of parameters displayed in the figures.

      Ideally one should do as you suggest and plot the extent of the plateau, but in our situation this is not very helpful. In the best data set, flies, optimised models predict D well, get close to ∆φ in larger eyes, and demonstrate that these optimum values are not very sensitive to K<sub>E</sub>. L is a different matter: it is very sensitive to K<sub>E</sub> which, as we show (and frequently remind readers), is poorly constrained by experimental data. The best we can do is estimate the envelope of L vs C<sub>tot</sub> curves, as defined by a plausible range of K<sub>E</sub>. Because most of the plateau boundaries you ask for will fall within this envelope, plotting them does little to clear the fog of uncertainty. We note that all three referees agree that our model can account for two robust trends: i) in apposition eyes L increases with optical resolving power and acuity, both within individual eyes and among eyes of different sizes, and ii) L is much longer in apposition eyes than in simple eyes. Nonetheless, the scatter of data points and their failure to fit creates a bad impression. We gave a number of reasons why the model does not fit the data points, but these were scattered throughout the Results and Discussion and, as referees 1 and 3 point out, this makes it difficult to draw convincing conclusions. To rectify this failing, we have rewritten two sections, in the Results Flies invest efficiently in costly photoreceptor arrays and in the Discussion, How efficiently do insects allocate resources within their apposition eyes?, to discuss these reasons en bloc, draw conclusions and suggest how better data and refinements to modelling could resolve these issues.

      Throughout the figures, the discontinuities in the optimal cuts through parameter space are not sufficiently explained.

      We added a couple of sentences that address the “jumps” (lines 313–318).

      None of the data seems to hug any of the optimal lines and only weakly follow the trends shown in the plots. This makes interpretation difficult for the reader and should be better explained. The text can be a little telegraphic in the Results after roughly page 10, and requires several readings to glean insight into the manuscript's conclusions.

      We revised the Results section in which we compare the best data set, flies’ NS eyes, with theoretical predictions ("Flies invest efficiently in costly photoreceptor arrays") to expand our interpretation of the data and clarify our arguments. The remaining sections have not been expanded. In the next section, which is on fused rhabdom apposition eyes, our interpretation of the scattering of data points follows the same line of argument. The remaining Results sections are entirely theoretical.

      Overall, the rough conclusions outlined in the Results seem moderately supported by the matches of the data to the optimal information transmission cuts through parameter space, but only weakly.

      We agree, more data is required to test and refine our theoretical predictions.

      The Discussion is long and well-argued, and contains the most cogent writing in the manuscript.

      Thank you: this is most pleasing. We submitted our study to eLife because it allows longer Discussions, but we worried that ours was too long. However, we felt that our extensive Discussion was necessary for two reasons. First, we are introducing a new approach to understanding eye design and evolution. Second, because the data on eye morphology and costs are limited, we had to make a number of assumptions, and by discussing these, warts and all, we hoped to encourage experimentalists to gather more data and focus their efforts on the most revealing material.

      Minor comments:

      We have acted upon most of your minor comments and we confine our remarks to our disagreements. We are grateful for your attention to details that we should have picked up on.

      It's a more standard convention to say "cost-benefit" rather than with a colon. 

      "equation" should be abbreviated "eq" or "eqn", never with a "t"

      when referring to the work of van Hateren, quote the paper and the database using "van Hateren" not just "Hateren"

      small latex note: use "\textit{SNR}" to get the proper formatting for those letters when in the math environment

      Line 100-110: "f" is introduced, but only f' is referenced in the figure. This should be explained in order. d_rh is not included in the figure. Also in this section, d_rh/f is also referenced before \Delta \rho_rf, which is the same quantity, without explanation.  

      Figure 1 shows eye structure and geometry. f’ is a lineal dimension of the eye but f is not, so f is not shown in Fig 1e. We eliminated the confusion surrounding ∆ρ<sub>rh</sub>  by deleting “and changing the acceptance angle of the photoreceptive waveguide ∆ρ<sub>rh</sub> (Snyder, 1979)”.  

      Fig 1 caption: this says "From dorsal to ventral," then describes trends that run ventral to dorsal, which is a confusing typo.

      Fig 3 - adding some data points to these plots might help the reader understand how (or if) K_E is constrained by the data.

      It is not possible to add data points because the total cost, C<sub>tot</sub>, is unknown.

      Fig 4c (and in other subplots): the jumps in L with C_tot could be explained better in the text - it wasn't clear to this reviewer why there are these discontinuities.

      Dealt with in the revised text (lines  310-318).

      Fig 4d: The caption for this subplot could be more clearly written.

      We have rewritten the caption for subplot 4d.

      Fig 5 and other plots with data: please indicate which symbols are samples from the same species. This info is hard to reconstruct from the tables.

      We have revised Figure 5 accordingly. Species were already indicated in Figure 6.

      Line 328: missing equation number

    1. eLife Assessment

      This work is an important resource for hypothesis testing of candidate upstream transcriptional regulatory factors that control the spatiotemporal expression of selector genes and their targets for GABAergic vs glutamatergic neuron fate in the anterior brainstem. Extensive high-quality datasets were generated and state-of-the-art computational methods were convincingly implemented to identify candidate regulatory elements. The work will be of interest to biologists working to understand neuronal gene regulatory networks.

    2. Reviewer #1 (Public review):

      The objectives of this research are to understand how key selector transcription factors, Tal1, Gata2, Gata3, determine GABAergic vs glutamatergic neuron fate from the rhombencephalic V2 precursor domain and how their spatiotemporal expression is controlled by upstream regulators. Toward these goals, the authors have generated an impressive array of scRNA, scATAC-seq, and CUT&Tag datasets obtained from dissociated E12.5 ventral R1 dissections. The rV2 was subsetted with well-known markers. The authors use an extensive set of computational approaches to identify temporal patterns of chromatin accessibility, TF motif binding activities (footprints), gene expression and regulatory motifs at the different selector gene loci. These analyses are used to predict upstream regulators, candidate accessible CREs, and DNA binding motifs through which the selectors may be controlled in rV2 by upstream regulators. Further analyses predict auto- and cross-regulatory interactions for maintenance of selector expression and the downstream effectors of alternative transmitter identities controlled by the selectors. The authors have achieved their aim of making predictions about upstream and downstream selector TF regulatory networks; their conclusions and predictions are largely well supported. The work clearly illustrates the daunting gene regulatory complexity likely at play in controlling rV2 transmitter fate.

      This is a data-rich study and a valuable resource for future hypothesis testing, through perturbation approaches, of the many putative regulators and motifs identified in the study. The strengths of this work are the overall high quality of the datasets and the in-depth analyses. Through its comprehensive data and predictions, it is likely to have impact in advancing the understanding of GABAergic vs glutamatergic neuron fate decisions. The authors present a "simplified" gene regulatory model. However, the model does not illustrate the complexity of potential stage-specific upstream TF interactions with Tal1 and Vsx2 selector genes uncovered in TF footprinting analyses. While this seems nearly impossible to achieve given the plethora of potential functional TF inputs, the authors should consider assembling a focussed model by selectively illustrating the most robust, evidence-backed upstream TF input predictions, which are considered the strongest candidates for future hypothesis-driven perturbation experiments. It seems Insm1, Sox4, E2f1, Ebf1 and Tead2 TFs might be the strongest upstream candidates for future testing of Tal1 activation given the extensive analyses of their spatiotemporal expression patterns relative to Tal1, presented in Fig 4.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors seek to discover putative gene regulatory interactions underlying the lineage bifurcation process of neural progenitor cells in the embryonic mouse anterior brainstem into GABAergic and glutamatergic neuronal subtypes. The authors analyze single-cell RNA-seq and single-cell ATAC-seq datasets derived from the ventral rhombomere 1 of embryonic mouse brainstems to annotate cell types and make predictions of where TFs bind upstream and downstream of the effector TFs using computational methods. They add data on the genomic distributions of some of the key transcription factors, and layer these onto the single-cell data to develop a model of the transcription factor interactions that define this fate choice.

      Strengths:

      The authors use a well-defined fate decision point from brainstem progenitors that can make two very different kinds of neurons. They already know the key TFs for selecting the neuronal type from genetic studies, so they focus their gene regulatory analysis on the mechanisms that are immediately upstream and downstream of these key factors. The authors use a combination of single-cell and bulk sequencing data, prediction and validation, and computation.

      Weaknesses:

      The study does not go as far as to experimentally test the transcription factor network from their model.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The objective of this research is to understand how the expression of key selector transcription factors, Tal1, Gata2, Gata3, involved in GABAergic vs glutamatergic neuron fate from a single anterior hindbrain progenitor domain is transcriptionally controlled. With suitable scRNAseq, scATAC-seq, CUT&TAG, and footprinting datasets, the authors use an extensive set of computational approaches to identify putative regulatory elements and upstream transcription factors that may control selector TF expression. This data-rich study will be a valuable resource for future hypothesis testing, through perturbation approaches, of the many putative regulators identified in the study. The data are displayed in some of the main and supplemental figures in a way that makes it difficult to appreciate and understand the authors' presentation and interpretation of the data in the Results narrative. Primary images used for studying the timing and coexpression of putative upstream regulators, Insm1, E2f1, Ebf1, and Tead2 with Tal1 are difficult to interpret and do not convincingly support the authors' conclusions. There appears to be little overlap in the fluorescent labeling, and it is not clear whether the signals are located in the cell soma nucleus.

      Strengths:

      The main strength is that it is a data-rich compilation of putative upstream regulators of selector TFs that control GABAergic vs glutamatergic neuron fates in the brainstem. This resource now enables future perturbation-based hypothesis testing of the gene regulatory networks that help to build brain circuitry.

      We thank Reviewer #1 for the thoughtful assessment and recognition of the extensive datasets and computational approaches employed in our study. We appreciate the acknowledgment that our efforts in compiling data-rich resources for identifying putative regulators of key selector transcription factors (TFs)—Tal1, Gata2, and Gata3—are valuable for future hypothesis-driven research.

      Weaknesses:

      Some of the findings could be better displayed and discussed.

      We acknowledge the concerns raised regarding the clarity and interpretability of certain figures, particularly those related to expression analyses of candidate upstream regulators such as Insm1, E2f1, Ebf1, and Tead2 in relation to Tal1. We agree that clearer visualization and improved annotation of fluorescence signals are crucial to accurately support our conclusions. In our revised manuscript, we will enhance image clarity and clearly indicate sites of co-expression for Tal1 and its putative regulators, ensuring the results are more readily interpretable. Additionally, we will expand explanatory narratives within the figure legends to better align the figures with the results section.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript, the authors seek to discover putative gene regulatory interactions underlying the lineage bifurcation process of neural progenitor cells in the embryonic mouse anterior brainstem into GABAergic and glutamatergic neuronal subtypes. The authors analyze single-cell RNA-seq and single-cell ATAC-seq datasets derived from the ventral rhombomere 1 of embryonic mouse brainstems to annotate cell types and make predictions of where TFs bind upstream and downstream of the effector TFs using computational methods. They add data on the genomic distributions of some of the key transcription factors and layer these onto the single-cell data to get a sense of the transcriptional dynamics.

      Strengths:

      The authors use a well-defined fate decision point from brainstem progenitors that can make two very different kinds of neurons. They already know the key TFs for selecting the neuronal type from genetic studies, so they focus their gene regulatory analysis squarely on the mechanisms that are immediately upstream and downstream of these key factors. The authors use a combination of single-cell and bulk sequencing data, prediction and validation, and computation.

      We also appreciate the thoughtful comments from Reviewer #2, highlighting the strengths of our approach in elucidating gene regulatory interactions that govern neuronal fate decisions in the embryonic mouse brainstem. We are pleased that our focus on a critical cell-fate decision point and the integration of diverse data modalities, combined with computational analyses, has been recognized as a key strength.

      Weaknesses:

      The study generates a lot of data about transcription factor binding sites, both predicted and validated, but the data are substantially descriptive. It remains challenging to understand how the integration of all these different TFs works together to switch terminal programs on and off.

      Reviewer #2 correctly points out that while our study provides extensive data on predicted and validated transcription factor binding sites, clearly illustrating how these factors collectively interact to regulate terminal neuronal differentiation programs remains challenging. We acknowledge the inherently descriptive nature of the current interpretation of our combined datasets.

      In our revision, we will clarify how the different data types support and corroborate one another, highlighting what we consider the most reliable observations of TF activity. Additionally, we will revise the discussion to address the challenges associated with interpreting the highly complex networks of interactions within the gene regulatory landscape.

      We sincerely thank both reviewers for their constructive feedback, which we believe will significantly enhance the quality and accessibility of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The results in Figure 3 and several associated supplements are mainly a description/inventory of putative CREs some of which are backed to some extent by previous transgenic studies. But given the way the authors chose to display the transgenic data in the Supplements, it is difficult to fully appreciate how well the transgenic data provide functional support. Take, for example, the Tal +40kb feature that maps to a midbrain enhancer: where exactly does +40kb map to the enhancer region? Is Tal +40kb really about 1kb long? The legend in Supplemental Figure 6 makes it difficult to interpret the bar charts; what is the meaning of: features not linked to gene -Enh? Some of the authors' claims are not readily evident or are inscrutable. For example, Tal locus features accessible in all cell groups are not evident (Fig 2A,B). Other cCREs are said to closely correlate with selector expression for example, Tal +.7kb and +40kb. However, inspection of the data seems to indicate that the two cCREs have very different dynamics and only +40kb seems to correlate with the expression track above it. Some features are described redundantly such as the Gata2 +22 kb, +25.3 kb, and +32.8 kb cCREs above and below the Gata3 cCRE. What is meant by: The feature is accessible at 3' position early, and gains accessibility at 5' positions ... Detailed feature analysis later indicated the binding of Nkx6-1 and Ascl1 that are expressed in the rV2 neuronal progenitors, at 3' positions, and binding of Insm1 and Tal1 TFs that are activated in early precursors, at 5' positions (Figure 3C).

      To allow easier assessment of the overlap of the features described in this study in reference to the transgenic studies, we have added further information about the scATAC features, cCREs and previously published enhancers, as well as visual schematics of the feature-enhancer overlaps in the Supplementary table 4. The Supplementary Table 4 column contents are also now explained in detail in the table legend (under the table). We hope those changes make the feature descriptions clearer. To answer the reviewer's question about the Tal1+40kb enhancer, the length of the published enhancer element is 685 bp and the overlapping scATAC feature length is 2067 bp (Supplementary Table 3, sheet Tal1, row 103).

      The legend and the chart labelling in the Supplementary Figure 5 (formerly Supplementary figure 6) have been elaborated, and the shown categories explained more clearly.

      Regarding the features at the Tal1 locus, the text has been revised and the references to the features accessible in all cell groups were removed. These features showed differences in the intensity of signal but were accessible in all cell groups. As the accessibility of these features does not correlate with Tal1 expression, they are of less interest in the context of this paper.

      The gain in accessibility of the +0.7 kb and +40 kb features correlates with the onset of Tal1 RNA expression. This is now more clearly stated in the text, as "For example, the gain in the accessibility of Tal1 cCREs at +0.7 and +40 kb correlated temporally with the expression of Tal1 mRNA (Figure 2B), strongly increasing in the earliest GABAergic precursors (GA1) and maintained at a lower level in the more mature GABAergic precursor groups (GA2-GA6)," (Results, page 4). The reviewer is right that the later dynamics of the +0.7 and +40 cCREs differ and this is now stated more clearly in the text (Results, page 5, last chapter).

      The repetition in the description of the Gata2 +22 kb, +25.3 kb, and +32.8 kb cCREs has been removed.

      The Tal1 +23 kb cCRE showed within-feature differences in accessibility signal. This is explained in the text on page 5, referring to the relevant figure 2A, showing the accessibility or scATAC signal in cell groups and the features labelled below, and 3C, showing the location of the Nkx6-1 and Ascl1 binding sites in this feature: "The Tal1 +23 kb cCRE contained two scATAC-seq peaks, having temporally different patterns of accessibility. The feature is accessible at 3' position early, and gains accessibility at 5' positions concomitant with GABAergic differentiation (Figure 2A, accessibility). Detailed feature analysis later indicated that the 3' end of this feature contains binding sites of Nkx6-1 and Ascl1 that are expressed in the rV2 neuronal progenitors, while the 5' end contains TF binding sites of Insm1 and Tal1 TFs that are activated in early precursors (described below, see Figure 3C)."

      (2) Supplementary Figure 3 is not presented in the Results.

      Essential parts of previous Supplementary Figure 3 have been incorporated into the Figure 4 and the previous Supplementary Figure omitted.

      (3) The significance of Figure 3 and the many related supplements is difficult to understand. A large number of footprints with wide-ranging scores, many very weak or unbound, are displayed in the various temporal cell groups in different epigenomic regions of Tal1 and Vsx2. The footprints for GA1 and Ga2 are combined despite Tal1 showing stronger expression in GA1 and stronger accessibility (Figure 2). Many possibilities are outlined in the Results for how the many different kinds of motifs in the cCREs might bind particular TFs to control downstream TF expression, but no experiments are performed to test any of the possibilities. How well do the TOBIAS footprints align with C&T peaks? How was C&T used to validate footprints? Are Gata2, 3, and Vsx2 known to control Tal1 expression from perturbation experiments?

      Figure 3 and related supplements present examples of the primary data and summarise the results of comprehensive analysis. The methods of identifying the selector TF regulatory features and the regulators are described in the Methods (Materials and Methods page 16). Briefly, the correlation between feature accessibility and selector TF RNA expression (assessed by the LinkPeaks score and p-value) was used to select the features shown in Figure 3.

      We are aware of differences in Tal1 expression and accessibility between GA1 and GA2. However, the number of cells in GA2 was not high enough for reliable footprint calculations and therefore we opted for combining related groups throughout the rV2 lineage for footprinting.

      As suggested, CUT&Tag could be used to validate the footprinting results, with some restrictions. In the revised manuscript, we included an analysis of CUT&Tag peak locations and footprints, similar to that in an earlier study (Eastman et al. 2025). In summary, we analysed whether CUT&Tag peaks overlap locations in which a footprint was also recognized, and vice versa. For each TF with CUT&Tag data we calculated a) the total number of CUT&Tag consensus peaks, b) the total number of bound TFBS (footprints), c) the percentage of CUT&Tag peaks overlapping bound TFBS, and d) the percentage of bound TFBS overlapping CUT&Tag peaks. These results are shown in Supplementary Table 6 and in Supplementary Figure 11, with the analysis described in the Methods (Materials and Methods, page 19). There is considerable overlap between CUT&Tag peaks and bound footprints, comparable to that shown in Eastman et al. 2025. However, these two methods are not assumed to match completely, for several reasons: binding by related/redundant TFs, antigen masking in the TF complex, chromatin association without DNA binding, etc. In addition, some CUT&Tag peaks with unbound footprints could arise from non-rV2 cells that were part of the bulk CUT&Tag analysis but not of the scATAC footprint analysis.
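
      The four quantities a) to d) listed above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the half-open (chromosome, start, end) interval convention, the "at least one shared base" overlap rule and the toy coordinates are all assumptions made for illustration.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_overlapping(queries, targets):
    """Percentage of query intervals that overlap at least one target interval."""
    if not queries:
        return 0.0
    hits = sum(any(overlaps(q, t) for t in targets) for q in queries)
    return 100.0 * hits / len(queries)


def overlap_summary(cut_tag_peaks, bound_footprints):
    """Hypothetical helper returning the four quantities a) to d) for one TF."""
    return {
        "n_peaks": len(cut_tag_peaks),                      # a)
        "n_bound_tfbs": len(bound_footprints),              # b)
        "pct_peaks_with_footprint": percent_overlapping(    # c)
            cut_tag_peaks, bound_footprints),
        "pct_footprints_in_peak": percent_overlapping(      # d)
            bound_footprints, cut_tag_peaks),
    }


# Toy coordinates, invented for illustration only (not real data).
peaks = [("chr4", 100, 500), ("chr4", 1000, 1400), ("chr6", 200, 600)]
footprints = [("chr4", 150, 169), ("chr4", 450, 469),
              ("chr4", 2000, 2019), ("chr6", 700, 719)]
summary = overlap_summary(peaks, footprints)
print(summary)
```

      The pairwise comparison is quadratic in the number of intervals, which is acceptable for a toy example; for genome-scale peak sets an interval-tree or sorted-sweep approach would be the more usual choice.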

      The evidence for cross-regulation of selector genes and the regulation of Tal1 by Gata2, Gata3 and Vsx2 is now discussed (Discussion, chapter Selector TFs directly autoregulate themselves and cross-regulate each other, page 12-13). The regulation of Tal1 expression by Vsx2 has, to our knowledge, not been earlier studied.

      (4) Figure 4 findings are problematic as the primary images seem uninterpretable and unconvincing in supporting the authors' claims. There is a lack of clear evidence in support of TF coexpression and that their expression precedes Tal1.

      Figure 4 has been entirely redrawn with higher resolution images and a more logical layout. In the revised Figure 4, only the most relevant ISH images are shown and arrowheads are added showing the colocalization of the mRNA in the cell cytoplasm. Next to the plots of RNA expression along the apical-basal axis of r1, an explanatory image of the quantification process is added (Figure 4D).

      (5) What was gained from also performing ChromVAR other than finding more potential regulators and do the results of the two kinds of analyses corroborate one another? What is a dual GATA:TAL BS?

      Our motivation for ChromVAR analysis is now more clearly stated in the text (Results, page 9): “In addition to the regulatory elements of GABAergic fate selectors, we wanted to understand the genome-wide TF activity during rV2 neuron differentiation. To this aim we applied ChromVAR (Schep et al., 2017)”. Also, further explanation about the Tal1 and Gata binding sites has been added in this chapter (Results, page 9).

      The dual GATA:Tal BS (TAL1.H12CORE.0.P.B) is a 19-bp motif that consists of an E-box and a GATA sequence, and is likely bound by a heteromeric Gata2-Tal1 TF complex, but may also be bound by Gata2, Gata3 or Tal1 TFs separately. The other TFBSs of Tal1 contain a strong E-box motif and showed either a lower activity (TAL1.H12CORE.1.P.B) or an earlier peak of activity in common precursors with a decline after differentiation (TAL1.H12CORE.2.P.B) (Results, page 9).

      (6) The way the data are displayed it is difficult to see how the C&T confirmed the binding of Ebf1 and Insm1, Tal1, Gata2, and Gata3 (Supplementary Figures 9-11). Are there strong footprints (scores) centered at these peaks? One can't assess this with the way the displays are organized in Figure 3. What is the importance of the H3K4me3 C&T? Replicate consistency, while very strong for some TFs, seems low for other TFs, e.g. Vsx2 C&T on Tal1 and Gata2. The overlaps do not appear very strong in Supplementary Figure 10. Panels are not letter labeled.

      We have added an analysis of footprint locations within the CUT&Tag peaks (Supplementary Figure 11). The Figure shows that the footprints are enriched at the middle regions of the CUT&Tag peaks, which is expected if TF binding at the footprinted TFBS site was causative for the CUT&Tag peaks.

      The aim of the Supplementary Figures 9-11 (Supplementary Figures 8-10 in the revised manuscript) was to show the quality and replicability of the CUT&Tag.

      The anti-H3K4me3 antibody, as well as the anti-IgG antibody, was used in CUT&Tag as a technical control. A strong CUT&Tag signal was detected in all our CUT&Tag experiments with H3K4me3. The H3K4me3 signal was not used in downstream analyses.

      We have now labelled the H3K4me3 data more clearly as "positive controls" in Supplementary Figure 8. The control samples are shown only in Supplementary Figure 8 and not in the revised Supplementary Figure 10, to avoid repetition. The corresponding figure legends have been modified accordingly.

      To show replicate consistency, the genome view showing the Vsx2 CUT&Tag signal at the Gata2 gene has been replaced by a more representative region (Supplementary Figure 8, Vsx2). The Vsx2 CUT&Tag signal at the Gata2 locus is weak, explaining why the replicability may have seemed low based on that example.

      Panel labelling is added on Supplementary Figures S8, S9, S10.  

      (7) It would be illuminating to present 1-2 detailed examples of specific target genes fulfilling the multiple criteria outlined in Methods and Figure 6A.

      We now present examples of the supporting evidence used in the definition of selector gene target features and target genes. The new Supplementary Figure 12 shows an example gene Lmo1 that was identified as a target gene of Tal1, Gata2 and Gata3.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors perform CUT&Tag to ask whether Tal1 and other TFs indeed bind putative CREs computed. However, it is unclear whether some of the antibodies (such as Gata3, Vsx2, Insm1, Tead2, Ebf1) used are knock-out validated for CUT&Tag or a similar type of assay such as ChIP-seq and therefore whether the peaks called are specific. The authors should either provide specificity data for these or a reference that has these data. The Vsx2 signal in Figure S9 looks particularly unconvincing.

      Information about the target specificity of the antibodies can be found in previous studies or in the product information. The references to the studies have now been added in the Methods (Materials and Methods, CUT&Tag, pages 18-19). Some of the antibodies are indeed not yet validated for ChIP-seq, Cut-and-Run or CUT&Tag. This is now clearly stated in the Materials and Methods (page 19): "The anti-Ebf1, anti-Tal1, anti-IgG and anti-H3K4me3 antibodies were tested on Cut-and-Run or ChIP-seq previously (Boller et al., 2016b; Courtial et al., 2012; Cell Signalling product information). The anti-Gata2 and anti-Gata3 antibodies are ChIP-validated (Ahluwalia et al., 2020a; Abcam product information). There are no previous results on ChIP, ChIP-seq or CUT&Tag with the anti-Insm1, anti-Tead2 and anti-Vsx2 antibodies used here. The specificity and nuclear localization have been demonstrated in immunohistochemistry with anti-Vsx2 (Ahluwalia et al., 2020b) and anti-Tead2 (Biorbyt product information). We observed good correlation between replicates with anti-Insm1, similar to all antibodies used here, but its specificity to target was not specifically tested". We admit that specificity testing with knockout samples would increase confidence in our data. However, we have observed robust signals and good replicability in the CUT&Tag for the antibodies shown here.

      The Vsx2 CUT&Tag signal at the loci previously shown in Supplementary Figure S9 (now Supplementary Figure 8) is weak, explaining why the replicability may seem low based on those examples. The genome view showing the Vsx2 CUT&Tag signal at the Gata2 gene locus in Supplementary Figure 8 (previously Supplementary Figure 9) has now been replaced by a view of the Vsx2 locus that is more representative of the signal.

      (2) It is unclear why the authors chose to focus on the transcription factor genes described in line 626 as opposed to the many other putative TFs described in Figure 3/Supplementary Figure 8. This is the major challenge of the paper - the authors are trying to tell a very targeted story but they show a lot of different names of TFs and it is hard to follow which are most important.

      We agree with the reviewer that the process of selection of the genes of interest is not always transparent. We are aware that interpretations of a paper are based on the known functions of the putative regulatory TFs; however, additional aspects of regulation could be revealed even if the biological functions of all the TFs were known. This is now stated in the Discussion “Caveats of the study” chapter. It would be relevant to study all identified candidate genes, but as is often the case, our possibilities were limited by the availability of materials (probes, antibodies), time, and financial resources. In the revised manuscript, we now briefly describe the biological processes related to the selected candidate regulatory TFs of the Tal1 gene (Results, page 8, "Pattern of expression of the putative regulators of Tal1 in the r1"). We hope this justifies the focus on them in our RNA co-expression analysis. The TFs analysed by RNAscope ISH are examples, which demonstrate alignment of the tissue expression patterns with the scRNA-seq data, suggesting that the dynamics of gene expression detected by scRNA-seq generally reflect the pattern of expression in the developing brainstem.

      (3) How is the RNA expression level in Figure 5B and 4D-L computed? These are the clusters defined by scATAC-seq. Is this an inferred RNA expression? This should be made more clear in the text.

      The charts in Figures 5B and 4G,H,I show inferred RNA expression. The Y-axis labels have now been corrected and include the term ‘inferred’. RNA expression in the scATAC-seq cell clusters is inferred from the scRNA-seq cells after the integration of the datasets.

      (4) The convergence of the GABA TFs on a common set of target genes reminds me of a nice study from the Rubenstein lab PMID: 34921112 that looked at a set of TFs in cortical progenitors. This might be a good comparison study for the authors to use as a model to discuss the convergence data.

      We thank the reviewer for bringing this article to our attention. The article is now discussed in the manuscript (Discussion, page 11).

      (5) The data in Figure 4, the in-situ figure, needs significant work. First, the images especially B, F, and J appear to be of quite low resolution, so they are hard to see. It is unclear exactly what is being graphed in C, G, and K and it does not seem to match the text of the results section. Perhaps better labeling of the figure and a more thorough description will make it clear. It is not clear how D, H, and L were supposed to relate to the images - presumably, this is a case where cell type is spatially organized, but this was unclear in the text if this is known and it needs to be more clearly described. Overall, as currently presented this figure does not support the descriptions and conclusions in the text.

      Figure 4 has been entirely redrawn with higher-resolution images and a more logical layout. In the revised Figure 4, the ISH data and the quantification plots are better presented; arrows showing the colocalization of the mRNA in the cell cytoplasm have been added; and an explanatory image of the quantification process has been added in (D).

      Minor points

      (1) Helpful if the authors include scATAC-seq coverage plots for neuronal subtype markers in Figure 1/S1.

      We are unfortunately uncertain what is meant by this request. Subtype markers in the Figure 1/S1 scATAC-seq-based clusters are shown from inferred RNA expression, and therefore these marker expression plots do not have any coverage information available.

      (2) The authors in line 429 mention the testing of features within TADs. They should make it clear in the main text (although tadmap is mentioned in the methods) that this is a prediction made by aggregating HiC datasets.

      Good point; this detail has been added to both pages 3 and 16.

      (3) The authors should include a table with the phastcons output described between lines 511 and 521 in the main or supplementary figures.

      We have now clarified in the text that we did not recalculate any phastCons results; we merely used the already published and available per-nucleotide conservation scores as provided by the original authors (Siepel et al. 2005). (Results, page 5: revised text is "To that aim, we used nucleotide conservation scores from UCSC (Siepel et al., 2005). We overlaid conservation information and scATAC-seq features to both validate feature definition as well as to provide corroborating evidence to recognize cCRE elements.")

      (4) It is very difficult to read the names of the transcription factor genes described in Figure 3B-D and Supplementary Figure 8 - it would be helpful to resize the text.

      Figures 3B-D and Supplementary Figure 7 (formerly Supplementary Figure 8) have been modified, removing unnecessary elements and increasing the size of the text.

      (5) It is unclear what strain of mouse is used in the study - this should be mentioned in the methods.

      The outbred NMRI mouse strain was used in this study. Information about the mouse strain has been added to Materials and Methods: scRNA-seq samples (page 14), scATAC-seq samples (page 15), RNAscope in situ hybridization (page 17) and CUT&Tag (page 18).

      (6) Text size in Figure 6 should be larger. R-T could be moved to a Supplementary Figure.

      Figure 6 has been revised, making the charts clearer and the chart labels larger. Figures 6R-S have been replaced by Supplementary Table 8, and Figure 6T is now shown as a new figure (Figure 7).

      Additional corrections in figures

      Figure 6 D,I,N had the wrong y-axis scale. It has been corrected, though this does not affect the interpretation of the data, as the Pos.link and Neg.link counts were compared to each other (as a ratio).

      On Figure 2B, the heatmap labels were shifted making it difficult to identify the feature name per row. This is now corrected.

    1. eLife Assessment

      This valuable study reports the physiological function of a putative transmembrane UDP-N-acetylglucosamine transporter called SLC35G3 in spermatogenesis. The conclusion that SLC35G3 is a new and essential factor for male fertility in mice and probably in humans is supported by convincing data. This study will be of interest to reproductive biologists and physicians working on male infertility.

    2. Reviewer #1 (Public review):

      Summary:

      In the present manuscript, Mashiko and colleagues describe a novel phenotype associated with deficient SLC35G3, a testis-specific sugar transporter that is important in glycosylation of key proteins in sperm function. The study characterizes a knockout mouse for this gene and the multifaceted male infertility that ensues. The manuscript is well-written and describes novel physiology through a broad set of appropriate assays.

      Strengths:

      Robust analysis with detailed functional and molecular assays

      Weaknesses:

      (1) The abstract references reported mutations in human SLC35G3, but this is not discussed or correlated to the murine findings to a sufficient degree in the manuscript. The HEK293T experiments are reasonable and add value, but a more detailed discussion of the clinical phenotype of the known mutations in this gene and whether they are recapitulated in this study (or not) would be beneficial.

      (2) Can the authors expand on how this mutation causes such a wide array of phenotypic defects? I am surprised there is a morphological defect, a fertilization defect, and a transit defect. Do the authors believe all of these are present in humans as well?

    3. Reviewer #2 (Public review):

      Summary:

      This study characterized the function of SLC35G3, a putative transmembrane UDP-N-acetylglucosamine transporter, in spermatogenesis. They showed that SLC35G3 is testis-specific and expressed in round spermatids. Slc35g3-null males were sterile, but females were fertile. Slc35g3-null males produced a normal sperm count, but sperm showed subtle abnormalities in head morphology. Sperm from Slc35g3-null males have defects in uterotubal junction passage, ZP binding, and oocyte fusion. Loss of SLC35G3 causes abnormal processing and glycosylation of a number of sperm proteins in the testis and sperm. They demonstrated that SLC35G3 functions as a UDP-GlcNAc transporter in cell lines. Two human SLC35G3 variants showed impaired transporter activity, implicating these variants in human infertility.

      Strengths:

      This study is thorough. The mutant phenotype is strong and interesting. The major conclusions are supported by the data. This study demonstrated SLC35G3 as a new and essential factor for male fertility in mice, which is likely conserved in humans.

      Weaknesses:

      Some data interpretations need to be revised.

    1. eLife Assessment

      Qiu et al. present multiple dimeric structures of GPR3, which reveal the binding mode of the inverse agonist AF64394. The findings provide important insights into the regulation of GPR3 and potentially other related orphan GPCRs. The authors present convincing evidence of their claims through thoughtful analysis of their cryo-EM structures, mutagenesis, and cell-based assays. This work will be of interest to GPCR investigators, especially those studying the signaling of orphan receptors.

    2. Reviewer #1 (Public review):

      Summary:

      GPR3 is an orphan receptor that plays a crucial role in central nervous system development and cold-induced thermogenesis, with potential implications for treating neurodegenerative and metabolic diseases. Although previous structural studies of GPR3 have been reported, Qiu et al. presented both active and inactive structures of GPR3 in its dimeric form. Notably, they identified AF64394 as a negative allosteric modulator that binds at the dimerization interface. This interface, primarily formed by transmembrane helices TM5 and TM6, is significantly larger than the dimerization interfaces previously reported for class A GPCRs. The authors further elucidate GPR3's activation mechanism and propose that dimerization may serve as a regulatory feature of GPR3 function. Overall, the study is well-executed, and the conclusions are sound.

      Strengths:

      Reported a unique dimerization interface of GPR3 and identified AF64394 as a negative allosteric modulator that binds at the dimerization interface.

      Weaknesses:

      There are some minor issues in the figure presentation.

    3. Reviewer #2 (Public review):

      Qiu et al. present active and inactive state dimeric structures of GPR3 with and without the previously identified inverse agonist AF64394. The manuscript combines cryo-EM processing, mutagenesis studies, and live-cell cAMP measurements to provide insights into the mechanism of action of AF64394 as a negative allosteric modulator of GPR3. All resolved structures show the density of a presumably hydrophobic endogenous, co-purified ligand in the orthosteric receptor binding pocket, supporting previous publications by this and other groups that endogenous lipids are endogenous ligands of GPR3. However, the authors also show that none of the proposed endogenous lipids (e.g., oleoylethanolamide) are able to further increase cAMP in living cells in a GPR3-dependent manner when applied exogenously. These data are in contrast to previous studies, but are of interest to the field as they may suggest that GPR3 expressed in different cell types is already saturated by endogenous lipids.

      The overall findings are novel and exciting. GPR3 has not previously been proposed to assemble into a homodimeric complex, and no information has been published on where AF64394 binds to the receptor. Several comparative analyses between GPR3 and its close relatives, GPR6 and GPR12, including live cell experiments with GPR3/6 chimera, provide intriguing mechanistic explanations for the different dimerisation behaviour and activity of AF64394 at this GPCR cluster.

      The only weakness of the study is that the population shift towards homodimer induced by AF64394, as suggested by 2D classifications of purified GPR3, is not supported by live cell experiments. The fact that AF64394 reduces GPR3-mediated cAMP production in a concentration-dependent manner may also be due to mechanisms independent of homodimerisation. Therefore, a live cell assay that directly detects dimer formation and/or dissociation upon different stimuli would significantly strengthen the findings of Qiu et al.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Qiu and co-workers describes the single-particle cryo-electron microscopy structures of various oligomeric states of the orphan GPCR, GPR3. It describes the monomeric and dimeric structure of a mutant of GPR3 with a modified G-protein complex (miniGs) and then builds on this work to attempt an inactive 'apo' dimer and an allosteric modulator (AF)-bound dimer structure, by using an ICL3 insertion and stabilizing Fab fragments.

      In general, I'm supportive of the work done in this study, and it does indeed provide valuable insight into GPR3 function. It may be that dimerization of certain class A GPCRs may be a means of signalling regulation or perhaps even amplification. However, some of the interpretation of the single particle data needs some extra attention to strengthen the hypothesis presented in the manuscript.

      Firstly, I want to thank the authors for providing the unfiltered half-maps and PDB models for careful assessment. During this review, I did my own post-processing of the half-maps and used the resultant maps for careful analysis of models.

      So to begin, I understand that the authors didn't model any lipid in the orthosteric binding site in any of the maps, but it may be worthwhile to model something in there, as many readers only download coordinates and not the maps.

      A more general point about all the maps. In no case were any focussed refinements carried out. As the point of this paper is some of the finer details between active and intermediate states and the effect of an allosteric modulator, masking out hypervariable portions of the structure and doing local Euler searches would most certainly provide richer insights into the details of GPR3 (especially as the BRIL:Fab structures are not of interest). Also, no 3D-variability studies were performed to see if minor differences in, say, TM4/5/6 positions were due to large variation in the single particles or were a stable consensus position.

      As for the PFK dimeric structure. It appears to be refined with C2 point group symmetry (which is not mentioned anywhere except in a tiny bit of text in a supplemental figure). Was this also calculated in C1 to assess if there is any difference in either GPR3 protomer? Also, how certain are the authors of the cholesterol positions at the bottom of TM4/5? At lower map thresholds in the PFK dimer structure, one of them appears to be continuous with the orthosteric lipid. It also appears that there are many unmodelled lipids in this structure, and only two were assigned as cholesterol. It appears that many of the unmodelled lipids are forming bridging connections between the GPR3 protomers. Also, it may be worthwhile to provide a table of the key interactions between the protomers (although I note that there was a figure highlighting them).

      With the PFK monomer structure, there was weak density for the same cholesterol, which was not modelled in this one; perhaps some commentary on the authors' approach for deciding how to assign density would be helpful. It also appears that the refinement mask was probably a bit tight in this one (something that cryoSPARC is notorious for), and rerefining with a much looser mask around the TM domain may be helpful in resolving the inner lipid leaflet positions.

      The Apo structure, I think, I have the most issues with. Firstly, it is not 'apo'. There is definitely unaccounted for density in the orthosteric site. Also, the structure definitely needs a bit more attention. Firstly, masking out the BRIL and FABs would be a good start in helping better resolve the TMD regions, and then even focussing on a single monomer to increase the map interpretability. My major problem here is that, if this is being called 'apo' and inactive, the map doesn't reflect this; also, the TM5/6 does not look to be in a fully inactive position. The map density (at least around one of the protomers) in this region looks to be poorly resolved, most likely due to averaging due to internal motion. I think some 3DVA is certainly warranted here to strengthen the hypothesis that they have solved an 'apo' inactive.

      The AF (allosteric modulator) bound structure is of significantly better quality. But again, only AF is modelled, and no lipids are. How are the authors sure? Perhaps some focussed refinements would help (changing the Euler origin to centre it on the AF molecule could be a good start). To this reviewer, at least in one of the protomers, adjacent to the AF position, there is a density that looks very much like the allosteric modulator, so it could even be forming a bridging dimer. Also, some potential assignments of the lipids may shed light on the structure-activity relationship of this modulator, as it seems to make as many contacts with surrounding lipids as it does with TM4/5. It may also be worthwhile to carefully explore the 3DVA of this data. In our studies (Russel et al.), we noted that the orthosteric lipid appears to ratchet back-and-forth in concert with TM4/5 twisting. In the AF bound structure, as the modulator binds at the 'exit' site of the lipid, perhaps it is locking in a specific conformation.

    1. eLife Assessment

      This study provides valuable insights into the crosstalk between ATG2A and components of the early secretory pathway, namely RAB1A and ARFGAP1. The evidence supporting the claims is convincing. However, the manuscript would benefit from a more in-depth exploration of the details of the role of RAB1A in autophagy and the functional implications of its interaction with ATG2A. In addition, the molecular details of the role of ARFGAP1 in this complex need further clarification.

    2. Reviewer #1 (Public review):

      Summary:

      D. Fuller et al. set out to study the molecular partners that cooperate with ATG2A, a lipid transfer protein essential for phagophore elongation, during the process of autophagy. Through a series of experiments combining microscopy and biochemistry, the authors identify ARFGAP1 and Rab1A as components of early autophagic membranes, which accumulate at the periphery of aberrant pre-autophagosomal structures induced by loss of ATG2. While ARFGAP1 has no apparent function in autophagy, the authors show that RAB1A is implicated in autophagy, although the mechanisms are not explored in the manuscript.

      Strengths:

      The work presented by Fuller et al. provides new insights into the composition of early autophagic membranes. The authors provide a series of MS experiments identifying proteins in close proximity to ATG2A, which is a valuable dataset for the field. Furthermore, they show for the first time the interaction between ATG2A and RAB1A, both in fed and starved conditions, which extends the characterisation of the pre-autophagosomal structures observed in ATG2 DKO cells.

      Weaknesses:

      The authors claim that this study elucidates the role of early secretory membranes in phagophore formation. However, this work is largely observational: it presents compelling evidence for the association between the RAB1A GTPase and ATG2A without providing mechanistic insights into the functional relevance of this interaction. It remains unclear whether Rab1A depletion phenocopies ATG2A depletion in terms of autophagy progression or accumulation of pre-autophagosomal structures.

      Furthermore, this research is conducted exclusively in HEK293 cells. Including at least one additional cell line would significantly strengthen the main findings (i.e., effects on LC3-II accumulation observed for RAB1A/B knockdown, given the previously published data on this topic).

      A notable weakness of this manuscript, in this reviewer's opinion, lies in the discussion of the data in the context of existing literature. The discussion is rather short, mostly focused on the phenotype observed in ATG2 DKO cells. While this phenotype is certainly intriguing, it feels the discussion overlooks some important aspects, as outlined in the comments to the authors.

    3. Reviewer #2 (Public review):

      The mechanisms governing autophagic membrane expansion remain incompletely understood. ATG2 is known to function as a lipid transfer protein critical for this process; however, how ATG2 is coordinated with the broader autophagic machinery and endomembrane systems has remained elusive. In this study, the authors employ an elegant proximity labeling approach and identify two ER-Golgi intermediate compartment (ERGIC)-localized proteins-Rab1 and ARFGAP1-as novel regulators of ATG2 during autophagic membrane expansion.

      Their findings support a model in which autophagosome formation occurs within a specialized subdomain of the ER that is enriched in both ER exit sites (ERES) and ERGIC, providing valuable mechanistic insight. The overall study is well-executed and offers an important contribution to our understanding of autophagy.

      Specific Comments

      (1) Integration with Prior Literature: The data convincingly implicate the ERES-ERGIC interface in autophagosome biogenesis. It would strengthen the manuscript to discuss previous studies reporting ERES and ERGIC remodeling and formation of ERES-ERGIC contact sites (PMID: 34561617; PMID: 28754694) in the context of the current findings.

      (2) Experimental Conditions: In Figures 2A-C and Figure 4, it is unclear how the cells were treated. Were they starved in EBSS? This information should be included in the corresponding figure legends.

      (3) LC3 Lipidation vs. Cleavage: In Figure 2A, ARFGAP1 knockdown appears to reduce LC3 lipidation without affecting Halo-LC3 cleavage. Clarifying this observation would help readers better understand the functional specificity of ARFGAP1 in the pathway.

      (4) Use of HT-mGFP in Figure 2C: It should be clarified whether the assay in Figure 2C was performed in the presence of HT-mGFP. Explaining the rationale would aid the interpretation of the results.

      (5) COPII Inhibition Strategy: The authors used the dominant-active SAR1(H79G) mutant to inhibit COPII function. While this is effective in in vitro budding assays, the GDP-locked mutant SAR1(T39N) has been shown to be more effective in blocking COPII-mediated trafficking in cells. Including SAR1(T39N) in the analysis would provide stronger support for the conclusions.

    4. Reviewer #3 (Public review):

      The manuscript by Fuller et al. describes a crosstalk between ATG2A and components of the early secretory pathway, namely RAB1A and ARFGAP1. They show that ATG2A is recruited to membranes positive for RAB1A, which they also show to interact with ATG2A. In agreement with earlier findings by other groups, silencing RAB1A negatively affects autophagy. While ARFGAP1 was also found on ATG2A-positive membranes, silencing ARFGAP1 had no impact on autophagy. Notably, these ARFGAP1-positive membranes are not Golgi membranes.

      The findings are interesting, and in general, the data are of good quality; however, I have outstanding questions. An answer to any of these questions might strengthen the manuscript:

      (1) Are the membranes to which ATG2A is recruited a form of ERGIC?

      (2) Figure 3A/B: Is it possible to show a better example? The difference is barely detectable by eye. Since immunoblotting is not really a quantitative method, I think that such a weak effect is prone to be wrong. Is there another tool/assay to validate this result?

      (3) Is the curvature-sensitive region of ARFGAP1 required for its co-localization with ATG2A?

      (4) What does Rab1A do? What is its effector? Or does the GTPase itself remodel the membrane?

      (5) What about Arf1? It appears that the role of ARFGAP1 is unrelated to Arf1 and COPI? Thus, one would predict that Arf1 does not localize to these structures and does not affect ATG2A function.

      (6) Does ARFGAP1 promote fission of the membrane from its donor compartment?

      (7) What are ARFGAP1 and Rab1A recruited to? What is the lipid composition or protein that recruits these two players to regulate autophagy?

    1. eLife Assessment

      This important study is of relevance for the fields of predictive processing, perception and learning, with a well-designed paradigm allowing the authors to avoid several common confounds in investigating predictions, such as adaptation. Using a state-of-the-art multivariate EEG approach, the authors test the opposing process theory and find evidence in support of it, namely the persuasive within-trial effects. However, the interactions across blocks are not well motivated and much less persuasive, such that the support for the conclusions is only incomplete at present.

    2. Reviewer #1 (Public review):

      Summary:

      In this lovely paper, McDermott and colleagues tackle an enduring puzzle in the cognitive neuroscience of perceptual prediction. Though many scientists agree that top-down predictions shape perception, previous studies have yielded incompatible results - with studies showing 'sharpened' representations of expected signals, and others showing a 'dampening' of predictable signals to relatively enhance surprising prediction errors. To deepen the paradox further, it seems like there are good reasons that we would want to see both influences on perception in different contexts.

      Here, the authors aim to test one possible resolution to this 'paradox' - the opposing process theory (OPT). This theory makes distinct predictions about how the timecourse of 'sharpening' and 'dampening' effects should unfold. The researchers present a clever twist on a leading-trailing perceptual prediction paradigm, using AI to generate a large dataset of test and training stimuli, so that it is possible to form expectations about certain categories without repeating any particular stimuli. This provides a powerful way of distinguishing expectation effects from repetition effects - a perennial problem in this line of work.

      Using EEG decoding, the researchers find evidence to support the OPT. Namely, they find that neural encoding of expected events is superior in earlier time ranges (sharpening-like) followed by a relative advantage for unexpected events in later time ranges (dampening-like). On top of this, the authors also show that these two separate influences may emerge differently in different phases of learning - with superior decoding of surprising prediction errors being found more in early phases of the task, and enhanced decoding of predicted events being found in the later phases of the experiment.

      Strengths:

      As noted above, a major strength of this work lies in important experimental design choices. Alongside removing any possible influence of repetition suppression mechanisms in this task, the experiment also allows us to see how effects emerge in 'real time' as agents learn to make predictions. This contrasts with many other studies in this area - where researchers 'over-train' expectations into observers to create the strongest possible effects, or rely on prior knowledge that was likely to be crystallised outside the lab.

      Weaknesses:

      This study reveals a great deal about how certain neural representations are altered by expectation and learning on shorter and longer timescales, so I am loath to describe certain limitations as 'weaknesses'. But one limitation inherent in this experimental design is that, by focusing on implicit, task-irrelevant predictions, there is not much opportunity to connect the predictive influences seen at the neural level to perceptual performance itself (e.g., how participants make perceptual decisions about expected or unexpected events, or how these events are detected or appear).

    3. Reviewer #2 (Public review):

      Summary:

      There are two accounts in the literature that propose that expectations suppress activity of neurons that are (a) not tuned to the expected stimulus to increase the signal-to-noise ratio for expected stimuli (sharpening model) or (b) tuned to the expected stimulus to highlight novel information (dampening model). One recent account, the opposing process theory, brings the two models together and suggests that both processes occur, but at different time points: initial sharpening is followed by later dampening of the neural activity of the expected stimulus. In this study, the authors aim to test the opposing process theory in a statistical learning task by applying multivariate EEG analyses and find evidence for the opposing process theory based on the within-trial dynamics.

      Strengths:

      This study addresses a very timely research question about the underlying mechanisms of expectation suppression. The applied EEG decoding approach offers an elegant way to investigate the temporal characteristics of expectation effects. A strength of the study lies in the experimental design that aims to control for repetition effects, one of the common confounds in prediction suppression studies. The reported results are novel in the field and have the potential to improve our understanding of expectation suppression in visual perception.

      Weaknesses:

      Although some of the findings are in line with the opposing process theory, the EEG results in particular only partly support the hypothesis. While the initial dampening effect occurs in the grand average ERP and in image memory decoding, the expected later sharpening effect is lacking. Moreover, some methodological decisions still remain arbitrary. One of the interesting aspects of the study, prediction decoding, had to be removed because it could not be disentangled from category decoding. This weakens the overall scope and impact of the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      In their study McDermott et al. investigate the neurocomputational mechanism underlying sensory prediction errors. They contrast two accounts: representational sharpening and dampening. Representational sharpening suggests that predictions increase the fidelity of the neural representations of expected inputs, while representational dampening suggests the opposite (decreased fidelity for expected stimuli). The authors performed decoding analyses on EEG data, showing that first expected stimuli could be better decoded (sharpening), followed by a reversal during later response windows where unexpected inputs could be better decoded (dampening). These results are interpreted in the context of opposing process theory (OPT), which suggests that such a reversal would support perception to be both veridical (i.e., initial sharpening to increase the accuracy of perception) and informative (i.e., later dampening to highlight surprising, but informative inputs).

      Strengths:

      The topic of the present study is of significant relevance for the field of predictive processing. The experimental paradigm used by McDermott et al. is well designed, allowing the authors to avoid several common confounds in investigating predictions, such as stimulus familiarity and adaptation. The introduction of the manuscript provides a well-written summary of the main arguments for the two accounts of interest (sharpening and dampening), as well as OPT. Overall, the manuscript serves as a good overview of the current state of the field.

      Weaknesses:

      In my opinion some details of the methods, results and manuscript raise some doubts about the reliability of the reported findings. Key concerns are:

      (1) In the previous round of comments, I noted that: "I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. The inference made requires, in my opinion, not only a significant effect in one time bin and the absence of an effect in other bins. Instead to reliably make this inference one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, but a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, hence making any interpretation that depends on the absence of an effect in some bins yet more problematic and implausible". The authors responded: "we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1%. Given the results of this analysis and to ensure a sufficient number of trials, we focused our further analyses on bins 1-2". However, I do not see how this new analysis addresses the concern that the conclusion highlights differences in decoding performance between bins 1 and 2, yet no contrast between these bins is performed. While I appreciate the addition of the new model, in my current understanding it does not solve the problem I raised. I still believe that if the authors wish to conclude that an effect differs between two bins they must contrast these directly and/or use a different appropriate analysis approach.

      Relatedly, the logarithmic model fitting and how it justifies the focus on analysis bin 1-2 needs to be explained better, especially the rationale of the analysis, the choice of parameters (e.g., why logarithmic, why change of logarithmic fit < 0.1% as criterion, etc), and why certain inferences follow from this analysis. Also, the reporting of the associated results seems rather sparse in the current iteration of the manuscript.
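      The two analyses discussed in point (1), a direct contrast between bins and a slope over trials, can be sketched as follows. This is a minimal illustration only, not the authors' or the reviewer's pipeline: the function names, the sign-flip permutation test, and the least-squares fit on log trial index are all assumptions chosen for brevity, and the inputs are hypothetical per-participant decoding benefits.

```python
import math
import random
from statistics import mean


def paired_permutation_test(bin1, bin2, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired differences.

    bin1, bin2: hypothetical per-participant decoding benefits in two
    trial bins (same participants, same order).
    Returns (observed mean difference, p-value).
    """
    if len(bin1) != len(bin2):
        raise ValueError("bins must contain the same participants")
    diffs = [a - b for a, b in zip(bin1, bin2)]
    observed = mean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Under the null, the sign of each paired difference is arbitrary.
        permuted = mean([d if rng.random() < 0.5 else -d for d in diffs])
        if abs(permuted) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


def log_slope(trial_index, benefit):
    """Least-squares slope of benefit against log(trial index)."""
    x = [math.log(t) for t in trial_index]
    mx, my = mean(x), mean(benefit)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, benefit))
    return sxy / sxx
```

      A reliably nonzero difference from `paired_permutation_test`, or a reliably nonzero slope from `log_slope`, would speak to a change across learning stages in a way that a significant effect in one bin and a null effect in another does not.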

      (2) A critical point the authors raise is that they investigate the buildup of expectations during training. They go on to show that the dampening effect disappears quickly, concluding: "the decoding benefit of invalid predictions [...] disappeared after approximately 15 minutes (or 50 trials per condition)". Maybe the authors can correct me, but my best understanding is as follows: Each bin has 50 trials per condition. The 2:1 condition has 4 leading images, this would mean ~12 trials per leading stimulus, 25% of which are unexpected, so ~9 expected trials per pair. Bin 1 represents the first time the participants see the associations. Therefore, the conclusion is that participants learn the associations so rapidly that ~9 expected trials per pair suffice to not only learn the expectations (in a probabilistic context) but learn them sufficiently well such that they result in a significant decoding difference in that same bin. If so, this would seem surprisingly fast, given that participants learn by means of incidental statistical learning (i.e. they were not informed about the statistical regularities). I acknowledge that we do not know how quickly the dampening/sharpening effects develop, however surprising results should be accompanied with a critical evaluation and exceptionally strong evidence (see point 1). Consider for example the following alternative account to explain these results. Category pairs were fixed across and within participants, i.e. the same leading image categories always predicted the same trailing image categories for all participants. Some category pairings will necessarily result in a larger representational overlap (i.e., visual similarity, etc.) and hence differences in decoding accuracy due to adaptation and related effects. 
For example, house → barn will result in a different decoding performance compared to coffee cup → barn, simply due to the larger visual and semantic similarity between house and barn compared to coffee cup and barn. These effects should occur upon first stimulus presentation, independent of statistical learning, and may attenuate over time e.g., due to increasing familiarity with the categories (i.e., an overall attenuation leading to smaller between condition differences) or pairs.
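
      To make the trial-count reasoning in point (2) explicit, the following is a minimal sketch of the arithmetic. It uses only the figures quoted in the comment (50 trials per condition per bin, 4 leading images, 25% unexpected trials); the variable names are illustrative and nothing here is taken from the authors' data.

```python
# Back-of-the-envelope count of expected trials per category pair in one bin.
# All inputs are the figures quoted in the reviewer's comment, not measured data.
trials_per_condition_per_bin = 50
n_leading_categories = 4   # leading images in the 2:1 condition
p_unexpected = 0.25        # share of trials that violate the learned pairing

trials_per_leading = trials_per_condition_per_bin / n_leading_categories
expected_per_pair = trials_per_leading * (1 - p_unexpected)

print(f"trials per leading category: {trials_per_leading:.1f}")
print(f"expected trials per pair:    {expected_per_pair:.1f}")
```

      Under these assumptions the estimate lands at roughly 12 trials per leading category and roughly 9 expected trials per pair, which matches the approximate figures given in the comment.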

      (3) In response to my previous comment, why the authors think their study may have found different results compared to multiple previous studies (e.g. Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011), particularly the sharpening to dampening switch, the authors emphasize the use of non-repeated stimuli (no repetition suppression and no familiarity confound) in their design. However, I fail to see how familiarity or RS could account for the absence of sharpening/dampening inversion in previous studies.

      First, if the authors' argument is about stimulus novelty and familiarity as described by Feuerriegel et al., 2021, I believe this point does not apply to the cited studies. Feuerriegel et al., 2021 note: "Relative stimulus novelty can be an important confound in situations where expected stimulus identities are presented often within an experiment, but neutral or surprising stimuli are presented only rarely", which indeed is a critical confound. However, none of the studies (Han et al., 2019; Richter et al., 2018; Kumar et al., 2017; Meyer and Olson, 2011) contained this confound, because all stimuli served as expected and unexpected stimuli, with the expectation status solely determined by the preceding cue. Thus, participants were equally familiar with the images across expectation conditions.

      Second, for a similar reason the authors' argument for RS accounting for the different results does not hold either in my opinion. Again, as Feuerriegel et al. 2021 correctly point out: "Adaptation-related effects can mimic ES when the expected stimuli are a repetition of the last-seen stimulus or have been encountered more recently than stimuli in neutral expectation conditions." However, it is critical to consider the precise design of previous studies. Take again the example of Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011: to my knowledge, none of these studies contained manipulations that would result in a more frequent or recent repetition of any specific stimulus in the expected compared to unexpected condition. The crucial manipulation in all these previous studies is not that a single stimulus or stimulus feature (which could be subject to familiarity or RS) determines the expectation status, but rather the transitional probability (i.e. cue-stimulus pairing) of a particular stimulus given the cue. Therefore, unless I am missing something critical, simple RS seems unlikely to differ between expectation conditions in the previous studies and hence seems implausible to account for differences in results compared to the current study.

      Moreover, studies cited by the authors (e.g. Todorovic & de Lange, 2012) showed that RS and ES are separable in time, again making me wonder how avoiding stimulus repetition should account for the difference in the present study compared to previous ones. I am happy to be corrected in my understanding, but with the currently provided arguments by the authors I do not see how RS and familiarity can account for the discrepancy in results.

      I agree with the authors that stimulus familiarity is a clear difference compared to previous designs, but without a valid explanation why this should affect results I find this account rather unsatisfying. I see the key difference in that the authors manipulated category predictability, instead of exemplar prediction - i.e. searching for a car instead of your car. However, if results in support of OPT would indeed depend on using novel images (i.e. without stimulus repetition), would this not severely limit the scope of the account and hence also its relevance? Certainly, the account provided by the authors casts the net wider and tries to explain visual prediction. Relatedly, if OPT only applies during training, as the authors seem to argue, would this again not significantly narrow the scope of the theory? Combined, these two caveats would seem to demote the account from a general account of prediction and perception to one about perception during very specific circumstances. In my understanding the appeal of OPT is that it accounts for multiple challenges faced by the perceptual system, elegantly integrating them into a cohesive framework. Most of this would be lost by claiming that OPT's primary prediction would only apply to specific circumstances - novel stimuli during learning of predictions. Moreover, in the original formulation of the account, as outlined by Press et al., I do not see any particular reason why it should be limited to these specific circumstances. This does of course not mean that the present results are incorrect; however, it does require an adequate discussion and acknowledgement in the manuscript.

      Impact:

      McDermott et al. present an interesting study with potentially impactful results. However, given my concerns raised in this and the previous round of comments, I am not entirely convinced of the reliability of the results. Moreover, the difficulty of reconciling some of the present results with previous studies highlights the need for more convincing explanations of these discrepancies and a stronger discussion of the present results in the context of the literature.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer 1 (Public Review):

      Many thanks for the positive and constructive feedback on the manuscript.

      This study reveals a great deal about how certain neural representations are altered by expectation and learning on shorter and longer timescales, so I am loath to describe certain limitations as 'weaknesses'. But one limitation inherent in this experimental design is that, by focusing on implicit, task-irrelevant predictions, there is not much opportunity to connect the predictive influences seen at the neural level to the perceptual performance itself (e.g., how participants make perceptual decisions about expected or unexpected events, or how these events are detected or appear).

      Thank you for the interesting comment. We now discuss the limitation of task-irrelevant prediction. In brief, some studies which showed sharpening found that task demands were relevant, while some studies which showed dampening were based on task-irrelevant predictions, but it is unlikely that task relevance - which was not manipulated in the current study - would explain the switch between sharpening and dampening that we observe within and across trials.

      The behavioural data that is displayed (from a post-recording behavioural session) shows that these predictions do influence perceptual choice - leading to faster reaction times when expectations are valid. In broad strokes, we may think that such a result is broadly consistent with a 'sharpening' view of perceptual prediction, and the fact that sharpening effects are found in the study to be larger at the end of the task than at the beginning. But it strikes me that the strongest test of the relevance of these (very interesting) EEG findings would be some evidence that the neural effects relate to behavioural influences (e.g., are participants actually more behaviourally sensitive to invalid signals in earlier phases of the experiment, given that this is where the neural effects show the most 'dampening' a.k.a., prediction error advantage?)

      Thank you for the suggestion. We calculated Pearson’s correlation coefficients for behavioural responses (difference in mean reaction times), neural responses during the sharpening effect (difference in decoding accuracy), and neural responses during the dampening effect for each participant, which resulted in null findings.
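
      As a rough illustration of the across-participant correlation described in this response, the sketch below computes Pearson's r between two per-participant difference scores. The values are invented placeholders and the function name `pearson_r` is hypothetical; this is not the analysis pipeline used for the manuscript.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Invented per-participant values, for illustration only
rt_benefit = [0.021, 0.004, 0.015, -0.003, 0.010]         # invalid minus valid mean RT (s)
decoding_benefit = [0.012, -0.004, 0.001, 0.008, -0.002]  # valid minus invalid accuracy

print(f"r = {pearson_r(rt_benefit, decoding_benefit):.3f}")
```

      A coefficient near zero across participants would correspond to the kind of null finding reported here; a significance test on r would additionally require the sample size.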

      Reviewer 2 (Public Review):

      Thank you for your helpful and constructive comments on the manuscript.

      The strength in controlling for repetition effects by introducing a neutral (50% expectation) condition also adds a weakness to the current version of the manuscript, as this neutral condition is not integrated into the behavioral (reaction times) and EEG (ERP and decoding) analyses. This procedure remained unclear to me. The reported results would be strengthened by showing differences between the neutral and expected (valid) conditions on the behavioral and neural levels. This would also provide a more rigorous check that participants had implicitly learned the associations between the picture category pairings.

      Following the reviewer's suggestion, we have included the neutral condition in the behavioural analysis and performed a repeated measures ANOVA on all three conditions.

      It is not entirely clear to me what is actually decoded in the prediction condition and why the authors did not perform decoding over trial bins in prediction decoding as potential differences across time could be hidden by averaging the data. The manuscript would generally benefit from a more detailed description of the analysis rationale and methods.

      In the original version of the manuscript, prediction decoding aimed at testing if the upcoming stimulus category can be decoded from the response to the preceding (leading) stimulus. However, in response to the other Reviewers’ comments we have decided to remove the prediction decoding analysis from the revised manuscript as it is now apparent that prediction decoding cannot be separated from category decoding based on pixel information.

      Finally, the scope of this study should be limited to expectation suppression in visual perception, as the generalization of these results to other sensory modalities or to the action domain remains open for future research.

      We have clarified the scope of the study in the revised manuscript.

      Reviewer 3 (Public Review):

      Thank you for the thought-provoking and interesting comments and suggestions.

      (1) The results in Figure 2C seem to show that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, Figure 2E suggests the prediction (surprisingly, valid or invalid) during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Unless I am misinterpreting the analyses, it seems implausible to me that a prediction, but not actually shown image, can be better decoded using EEG than an image that is presented on-screen.

      Following this and the remaining comments by the Reviewer (see below), we have decided to remove the prediction analysis from the manuscript. Specifically, we have focused on the Reviewer’s concern that it is implausible that image prediction would be better decoded than an image that is presented on-screen. This led us to perform a control analysis, in which we tried to decode the leading image category based on pixel values alone (rather than on EEG responses). Since this decoding was above chance, we could not rule out the possibility that EEG responses to leading images reflect physical differences between image categories. This issue does not extend to trailing images, as the results of the decoding analysis based on trailing images are based on accuracy comparisons between valid and invalid trials, and thus image features are counterbalanced. We would like to thank the Reviewer for raising this issue.
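
      The logic of the pixel-based control can be sketched as follows. The response does not state which classifier or image features were used, so this example substitutes a simple leave-one-out nearest-centroid classifier on synthetic "pixel" vectors; the function name, the data, and all parameters are assumptions made purely for illustration.

```python
import random

def nearest_centroid_loo(samples, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Build class centroids from every sample except the held-out one
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(samples, labels)):
            if j == i:
                continue
            if yj not in sums:
                sums[yj] = [0.0] * len(xj)
                counts[yj] = 0
            sums[yj] = [s + v for s, v in zip(sums[yj], xj)]
            counts[yj] += 1
        centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        # Predict the class whose centroid is closest (squared Euclidean distance)
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        correct += pred == y
    return correct / len(samples)

# Synthetic stand-in for images: four categories that differ slightly in
# mean pixel intensity (an assumption, not a property of the real stimuli)
random.seed(1)
n_categories, n_per_category, n_pixels = 4, 30, 16
samples, labels = [], []
for cat in range(n_categories):
    for _ in range(n_per_category):
        samples.append([random.gauss(0.4 + 0.05 * cat, 0.1) for _ in range(n_pixels)])
        labels.append(cat)

accuracy = nearest_centroid_loo(samples, labels)
print(f"pixel-based accuracy: {accuracy:.2f} (chance = {1 / n_categories:.2f})")
```

      If category can be recovered above chance from low-level image statistics alone, as in this toy case, then above-chance decoding from neural responses to the same images cannot be attributed to prediction without further controls.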

      (2) The "prediction decoding" analysis is described by the authors as "decoding the predictable trailing images based on the leading images". How this was done is however unclear to me. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there were only 2 possible trailing image categories: 1 valid, 1 invalid). How is it then possible that the analysis is performed separately for valid and invalid trials? If the authors simply decode which leading image category was shown, but combine L1+L2 and L4+L5 into one class respectively, the resulting decoder would in my opinion not decode prediction, but instead dissociate the representation of L1+L2 from L4+L5, which may also explain why the time-course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding predictions (e.g. Kok et al. 2017). Instead for the prediction analysis to be informative about the prediction, the decoder ought to decode the representation of the trailing image during the leading image and inter-stimulus interval. Therefore I am at present not convinced that the utilized analysis approach is informative about predictions.

      In this analysis, we attempted to decode (from the response to leading images) which trailing categories ought to be presented. The analysis was split between trials where the expected category was indeed presented (valid) vs. those in which it was not (invalid). The separation of valid vs invalid trials in the prediction decoding analysis served as a sanity check as no information about trial validity was yet available to participants. However, as mentioned above, we have decided to remove the “prediction decoding” analysis based on leading images as we cannot disentangle prediction decoding from category decoding.

      (3) I may be misunderstanding the reported statistics or analyses, but it seems unlikely that >10 of the reported contrasts have the exact same statistic of Tmax = 2.76. Similarly, it seems implausible, based on visual inspection of Figure 2, that the Tmax for the invalid condition decoding (reported as Tmax = 14.903) is substantially larger than for the valid condition decoding (reported as Tmax = 2.76), even though the valid condition appears to have superior peak decoding performance. Combined, these details may raise concerns about the reliability of the reported statistics.

      Thank you for bringing this to our attention. This copy error has now been rectified.

      (4) The reported analyses and results do not seem to support the conclusion of early learning resulting in dampening and later stages in sharpening. Specifically, the authors appear to base this conclusion on the absence of a decoding effect in some time-bins, while in my opinion a contrast between time-bins, showing a difference in decoding accuracy, is required. Or better yet, a non-zero slope of decoding accuracy over time should be shown (not contingent on post-hoc and seemingly arbitrary binning).

      Thank you for the helpful suggestion. We have performed an additional analysis to address this issue: we calculated the trial-by-trial time series of the decoding accuracy benefit for valid vs. invalid trials for each participant and averaged this benefit across time points for each of the two significant time windows. Based on this, we fitted a logarithmic model to quantify the change of this benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1% (i.e., accuracy had stabilized). Given the results of this analysis and to ensure a sufficient number of trials, we focussed our further analyses on bins 1-2 to directly assess the effects of learning. This is explained in more detail in the revised manuscript.
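
      One way to read the procedure described in this response is sketched below, under stated assumptions: the model is taken to be y = a + b·ln(trial), fitted by ordinary least squares, and "change < 0.1%" is interpreted as the fitted curve moving by less than 0.001 from one trial to the next. Neither the exact model form nor this reading of the criterion is confirmed by the response, the function names are hypothetical, and the data are synthetic.

```python
import math
import random

def fit_log_model(trials, values):
    """Least-squares fit of values ≈ a + b * ln(trial)."""
    xs = [math.log(t) for t in trials]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def stabilisation_trial(b, n_trials, threshold=0.001):
    """First trial t at which the fitted curve changes by less than
    `threshold` between trial t and t + 1; None if that never happens."""
    for t in range(1, n_trials):
        step = abs(b) * math.log((t + 1) / t)
        if step < threshold:
            return t
    return None

# Synthetic data only: a noisy logarithmic trend in the valid-minus-invalid benefit
random.seed(0)
trials = list(range(1, 401))
benefit = [-0.05 + 0.02 * math.log(t) + random.gauss(0, 0.01) for t in trials]

a, b = fit_log_model(trials, benefit)
t_stable = stabilisation_trial(b, len(trials))
print(f"a = {a:.3f}, b = {b:.3f}, stabilises at trial {t_stable}")
```

      Note that under this reading the stabilisation point depends only on the fitted slope b and the chosen threshold, which is one reason the choice of a 0.1% criterion would need to be justified.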

      (5) The present results both within and across trials are difficult to reconcile with previous studies using MEG (Kok et al., 2017; Han et al., 2019), single-unit and multi-unit recordings (Kumar et al., 2017; Meyer & Olson 2011), as well as fMRI (Richter et al., 2018), which investigated similar questions but yielded different results; i.e., no reversal within or across trials, as well as dampening effects after more training. The authors do not provide a convincing explanation as to why their results should differ from previous studies, arguably further compounding doubts about the present results raised by the methods and results concerns noted above.

      The discussion of these findings has been expanded in the revised manuscript. In short, the experimental design of the above studies did not allow for an assessment of these effects prior to learning. Several of them also used repeated stimuli (albeit some studies changed the pairings of stimuli between trials), potentially allowing for RS to confound their results.

      Recommendations for the Authors:

      Reviewer 1 (Recommendations for the authors):

      (1) On a first read, I was initially very confused by the statement on p.7 that each stimulus was only presented once - as I couldn't then work out how expectations were supposed to be learned! It became clear after reading the Methods that expectations are formed at the level of stimulus category (so categories are repeated multiple times even if exemplars are not). I suspect other readers could have a similar confusion, so it would be helpful if the description of the task in the 'Results' section (e.g., around p.7) was more explicit about the way that expectations were generated, and the (very large) stimulus set that examples are being drawn from.

      Following your suggestion, we have clarified the paradigm by adding details about the categories and the manner in which expectations are formed.

      (2) p.23: the authors write that their 1D decoding images were "subjected to statistical inference amounting to a paired t-test between valid and invalid categories". What is meant by 'amounting to' here? Was it a paired t-test or something statistically equivalent? If so, I would just say 'subjected to a paired t-test' to avoid any confusion, or explain explicitly what the statistical inference was done over.

      We have rephrased this as “subjected to (1) a one-sample t-test against chance-level, equivalent to a fixed-effects analysis, and (2) a paired t-test”.

      Relatedly, this description of an analysis amounting to a 'paired t-test' only seems relevant for the sensory decoding and memory decoding analyses (where there are validity effects) rather than the prediction decoding analysis. As far as I can tell the important thing is that the expected image category can be decoded, not that it can be decoded better or worse on valid or invalid trials.

      In the previous version of the manuscript, the comparison of prediction decoding between valid and invalid trials was meant as a sanity check. However, in response to the other Reviewers’ comments we have decided to remove the prediction decoding analysis from the revised manuscript due to confounds.

      It would be helpful if authors could say a bit more about how the statistical inferences were done for the prediction decoding analyses and the 'condition against baseline' contrasts (e.g., when it is stated that decoding accuracy in valid trials *in general* is above 0 at some cluster-wise corrected value). My guess is that this amounts to something like a one-sample t-test - but it may be worth noting that one-sample t-tests on information measures like decoding accuracy cannot support population-level inference, because these measures cannot meaningfully be below 0 (see Allefeld et al, 2016).

      When testing for decoding accuracy against baseline, we used one-sample t-tests against chance level (rather than against 0) throughout the manuscript. We now clarify in the manuscript that this corresponds to a fixed-effects analysis (Allefeld et al., 2016). In contrast, when testing for differences in decoding accuracy between valid and invalid conditions, we used paired-sample t-tests. As mentioned above, the prediction decoding analysis has been removed from the analysis.
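
      The two kinds of test mentioned in this response can be illustrated with a minimal sketch. The accuracies below are invented, the chance level of 0.25 is an assumption (a four-class decoder), and the helper names are hypothetical; only the t statistics are computed here, whereas in practice routines such as `scipy.stats.ttest_1samp` and `scipy.stats.ttest_rel` would also return p-values.

```python
import math
import statistics

def one_sample_t(values, popmean):
    """t statistic for the mean of `values` against `popmean`."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - popmean) / se

def paired_t(a, b):
    """Paired t statistic: a one-sample test on the pairwise differences."""
    return one_sample_t([x - y for x, y in zip(a, b)], 0.0)

CHANCE = 0.25  # assumed chance level for a four-class decoder

# Invented per-participant decoding accuracies, for illustration only
valid = [0.31, 0.29, 0.33, 0.27, 0.30, 0.32]
invalid = [0.28, 0.27, 0.30, 0.26, 0.29, 0.28]

print(f"valid vs. chance:  t = {one_sample_t(valid, CHANCE):.2f}")
print(f"valid vs. invalid: t = {paired_t(valid, invalid):.2f}")
```

      The first test asks whether accuracy exceeds chance, which is the comparison that Allefeld et al. (2016) argue supports only fixed-effects conclusions; the second compares two conditions within participants and is the one relevant to validity effects.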

      (3) By design, the researchers focus on implicit predictive learning which means the expectations being formed are (by definition) task-irrelevant. I thought it could be interesting if the authors might speculate in the discussion on how they think their results may or may not differ when predictions are deployed in task-relevant scenarios - particularly given that some studies have found sharpening effects do not seem to depend on task demands (e.g., Kok et al, 2012; Yon et al, 2018) while other studies have found that some dampening effects do seem to depend on what the observer is attending to (e.g., Richter et al, 2018). Do these results hint at a possible explanation for why this might be? Even if the authors think they don't, it might be helpful to say so!

      Thank you for the interesting comment. We have expanded on this in the revised manuscript.

      Reviewer 2 (Recommendations for the authors):

      Methods/results

      (1) The goal of this study is the assessment of expectation effects during statistical learning while controlling for repetition effects, one of the common confounds in prediction suppression studies (see, Feuerriegel et al., 2021). I agree that this is an important aspect and I assume that this was the reason why the authors introduced the P=0.5 neutral condition (Figure 1B, L3). However, I completely missed the analyses of this condition in the manuscript. In the figure caption of Figure 1C, it is stated that the reaction times of the valid, invalid, and neutral conditions are shown, but only data from the valid and invalid conditions are depicted. To ensure that participants had built up expectations and had learned the pairing, one would not only expect a difference between the valid and invalid conditions but also between the valid and neutral conditions. Moreover, it would also be important to integrate the neutral condition in the multivariate EEG analysis to actually control for repetition effects. Instead, the authors constructed another control condition based on the arbitrary pairings. But why was the neutral condition not compared to the valid and invalid prediction decoding results? Besides this, I also suggest calculating the ERP for the neutral condition and adding it to Figure 2A to provide a more complete picture.

      As mentioned above, we have included the neutral condition in the behavioural analysis, as outlined in the revised manuscript. We have also included a repeated measures ANOVA on all 3 conditions. The purpose of the neutral condition was not to avoid RS, but rather to provide a control condition. We avoided repetition by using individual, categorised stimuli. Figure 1C has been amended to include the neutral condition. In response to the remaining comments, we have decided to remove the prediction decoding analysis from the manuscript.

      (2) One of the main results that is taken as evidence for the OPT is that there is higher decoding accuracy for valid trials (indicating sharpening) early in the trial and higher decoding accuracy for invalid trials (indicating dampening) later in the trial. I would have expected this result for prediction decoding that surprisingly showed none of the two effects. Instead, the result pattern occurred in sensory decoding only, and partly (early sharpening) in memory decoding. How do the authors explain these results? Additionally, I would have expected similar results in the ERP; however, only the early effect was observed. I missed a more thorough discussion of this rather complex result pattern. The lack of the opposing effect in prediction decoding limits the overall conclusion that needs to be revised accordingly.

      Since sharpening vs. dampening rests on the comparison between valid and invalid trials, evidence for sharpening vs. dampening could only be obtained from decoding based on responses to trailing images. In prediction decoding (removed from the current version), information about the validity of the trial is not yet available. Thus, our original plan was to compare this analysis with the effects of validity on the decoding of trailing images (i.e. we expected valid trials to be decoded more accurately after the trailing image than before). The results of the memory decoding did mirror the sensory decoding of the trailing image in that we found significantly higher decoding accuracy of the valid trials from 123-180 ms. As with the sensory decoding, there was a tendency towards a later flip (280-296 ms) where decoding accuracy of invalid trials became nominally higher, but this effect did not reach statistical significance in the memory decoding.

      (3) To increase the comprehensibility of the result pattern, it would be helpful for the reader to clearly state the hypotheses for the ERP and multivariate EEG analyses. What did you expect for the separate decoding analyses? How should the results of different decoding analyses differ and why? Which result pattern would (partly, or not) support the OPT?

      Our hypotheses are now stated in the revised manuscript.

      (4) I was wondering why the authors did not test for changes during learning for prediction decoding. Despite the fact that there were no significant differences between valid and invalid conditions within-trial, differences could still emerge when the data set is separated into bins. Please test and report the results.

      As mentioned above, we have decided to remove the prediction decoding analysis from the current version of the manuscript.

      (5) To assess the effect of learning the authors write: 'Given the apparent consistency of bins 2-4, we focused our analyses on bins 1-2.' Please explain what you mean by 'apparent consistency'. Did you test for consistency or is it based on descriptive results? Why do the authors not provide the complete picture and perform the analyses for all bins? This would allow for a better assessment of changes over time between valid and invalid conditions. In Figure 3, were valid and invalid trials different in any of the QT3 or QT4 bins in sensory or memory encoding?

      We have performed an additional analysis to address this issue. The reasoning behind the decision to focus on bins 1-2 is now explained in the revised manuscript. In short, fitting a learning curve to trial-by-trial decoding estimates indicates that decoding stabilizes within <50% of the trials. To quantify changes in decoding occurring within these <50% of the trials while ensuring a sufficient number of trials for statistical comparisons, we decided to focus on bins 1-2 only.

      (6) Please provide the effect size for all statistical tests.

      Effect sizes have now been provided.

      (7) Please provide exact p-values for non-significant results and significant results larger than 0.001.

      Exact p-values have now been provided.

      (8) Decoding analyses: I suppose there is a copy/paste error in the T-values as nearly all T-values on pages 11 and 12 are identical (2.76) leading to highly significant p-values (0.001) as well as non-significant effects (>0.05). Please check.

      Thank you for bringing this to our attention. This error has now been corrected.

      (9) Page 12: There were some misleading phrases in the result section. To give one example: 'control analyses was slightly above change' - this sounds like a close to non-significant effect, but it was indeed a highly significant effect of p<0.001. Please revise.

      This phrase was part of the prediction decoding analysis and has therefore been removed.

      (10) Sample size: How was the sample size of the study determined (N=31)? Why did only a subgroup of participants perform the behavioral categorization task after the EEG recording? With a larger sample, it would have been interesting to test if participants who showed better learning (larger difference in reaction times between valid and invalid conditions) also showed higher decoding accuracies.

      This has been clarified in the revised manuscript. In short, the larger sample size of N=31 was based on previous research; ten participants were initially tested as part of a pilot which was then expanded to include the categorisation task.

      (11) I assume catch trials were removed before data analyses?

      We have clarified that catch trials were indeed removed prior to analyses.

      (12) Page 23, 1st line: 'In each, the decoder...' Something is missing here.

      Thank you for bringing this to our attention. This sentence has now been rephrased as “In both valid and invalid analyses” in the revised manuscript.

      Discussion

      (1) The analysis over multiple trials showed dampening within the first 15 min followed by sharpening. I found the discussion of this finding very lengthy and speculative (page 17). I recommend shortening this part and providing only the main arguments that could stimulate future research.

      Thank you for the suggestion. Since Reviewer 3 has requested additional details in this part of the discussion, we have opted to keep this paragraph in the manuscript. However, we have also made it clearer that this section is relatively speculative and the arguments provided for the across trials dynamics are meant to stimulate further research.

      (2) As this task is purely perceptual, the results support the OPT for the area of visual perception. For action, different results have been reported. Suppression within-trial has been shown to be larger for expected than unexpected features of action targets and suppression even starts before the start of the movement without showing any evidence for sharpening (e.g., Fuehrer et al., 2022, PNAS). For suppression across trials, it has been found that suppression decreases over the course of learning to associate a sensory consequence to a specific action (e.g., Kilteni et al., 2019, ELife). Therefore, expectation suppression might function differently in perception and action (an area that still requires further research). Please clarify the scope of your study and results on perceptual expectations in the introduction, discussion, and abstract.

      We have clarified the scope of the study in the revised manuscript.

      Figures

      (1) Figure 1A: Add 't' to the arrow to indicate time.

      This has been rectified.

      (2) Figure 3: In the figure caption, sensory and memory decoding seem to be mixed up. Please correct. Please add what the dashed horizontal line indicates.

      Thank you for bringing this to our attention. This has been rectified.

      Reviewer 3 (Recommendations for the authors):

      I applaud the authors for a well-written introduction and an excellent summary of a complicated topic, giving fair treatment to the different accounts proposed in the literature. However, I believe a few additional studies should be cited in the Introduction, particularly time-resolved studies such as Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011. This would provide the reader with a broader picture of the current state of the literature, as well as point the reader to critical time-resolved studies that did not find evidence in support of OPT, which are important to consider in the interpretation of the present results.

      The introduction has been expanded to include the aforementioned studies in the revised manuscript.

      Given previous neuroimaging studies investigating the present phenomenon, including with time-resolved measures (e.g. Kok et al., 2017; Han et al., 2019; Kumar et al., 2017; Meyer & Olson 2011), why do the authors think that their data, design, or analysis allowed them to find support for OPT but not previous studies? I do not see obvious modifications to the paradigm, data quantity or quality, or the analyses that would suggest a superior ability to test OPT predictions compared to previous studies. Given concerns regarding the data analyses (see points below), I think it is essential to convincingly answer this question to convince the reader to trust the present results.

      The most obvious alteration to the paradigm is the use of non-repeated stimuli. Each of the above time-resolved studies utilised repeated stimuli (either repeated, identical stimuli, or paired stimuli where pairings are changed but the pool of stimuli remains the same), allowing for RS to act as a confound as exemplars are still presented multiple times. By removing this confound, it is entirely plausible that we may find different time-resolved results given that it has been shown that RS and ES are separable in time (Todorovic & de Lange, 2012). We also test during learning rather than training participants on the task beforehand. By foregoing a training session, we are better equipped to assess OPT predictions as they emerge. In our across-trial results, learning appears to take place after approximately 15 minutes or 432 trials, at which point dampening reverses to sharpening. Had we trained the participants prior to testing, this effect would have been lost.

      What is actually decoded in the "prediction decoding" analysis? The authors state that it is "decoding the predictable trailing images based on the leading images" (p.11). The associated chance level (Figure 2E) is indicated as 50%. This suggests that the classes separated by the SVM are T6 vs T7. How this was done is however unclear. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there are only 2 possible trailing images, where one is the valid and the other the invalid image). How is it then possible that the analysis is performed separately for valid and invalid trials? Are the authors simply decoding which leading image was shown, but combine L1+L2 and L4+L5 into one class respectively? If so, this needs to be better explained in the manuscript. Moreover, the resulting decoder would in my opinion not decode the predicted image, but instead learn to dissociate the representation of L1+L2 from L4+L5, which may also explain why the time course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding (prestimulus) predictions (e.g. Kok et al. 2017). If this is indeed the case, I find it doubtful that this analysis relates to prediction. Instead for the prediction analysis to be informative about the predicted image the authors should, in my opinion, train the decoder on the representation of trailing images and test it during the prestimulus interval.

      As mentioned above, the prediction decoding analysis has been removed from the manuscript. The prediction decoding analysis was intended as a sanity check, as validity information was not yet available to participants.

      Related to the point above, were the leading/trailing image categories and their mapping to L1, L2, etc. in Figure 1B fixed across subjects? I.e. "'beach' and 'barn' as 'Leading' categories would result in 'church' as a 'Trailing' category with 75% validity" (p.20) for all participants? If so, this poses additional problems for the interpretation of the analysis discussed in the point above, as it may invalidate the control analyses depicted in Figure 2E, as systematic differences and similarities in the leading image categories could account for the observed results.

      Image categories and their mapping were indeed fixed across participants. While this may result in physical differences and similarities between images influencing results, counterbalancing categories across participants would not have addressed this issue. For example, had we swapped “beach” with “barn” in another participant, physical differences between images may still be reflected in the prediction decoding. On the other hand, counterbalancing categories across trials was not possible given our aim of examining the initial stages of learning over trials. Had we changed the mappings of categories throughout the experiment for each participant, we would have introduced reversal learning and nullified our ability to examine the initial stages of learning under flat priors. In any case, the prediction decoding analysis has been removed from the manuscript, as outlined above.

      Why was the neutral condition L3 not used for prediction decoding? After all, if during prediction decoding both the valid and invalid image can be decoded, as suggested by the authors, we would also expect significant decoding of T8/T9 during the L3 presentation.

      In the neutral condition, L3 was followed by T8 vs. T9 with 50% probability, precluding prediction decoding. While this could have served as an additional control analysis for EEG-based decoding, we have opted for removing prediction decoding from the analysis. However, in response to the other Reviewers’ comments, the neutral condition has now been included in the behavioral analysis.

      The following concern may arise due to a misunderstanding of the analyses, but I found the results in Figures 2C and 2E concerning. If my interpretation is correct, then these results suggest that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, the predicted (valid or invalid) image during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Does this seem reasonable? Unless I am misinterpreting the analyses, it seems implausible to me that a prediction but not actually shown image can be better decoded than an on-screen image. Moreover, to my knowledge studies reporting decoding of predictions can (1) decode expectations just above chance level (e.g. Kok et al., 2017; which is expected given the nature of what is decoded) and (2) report these prestimulus effects shortly before the anticipated stimulus onset, and not coinciding with the leading image onset ~800ms before the predicted stimulus onset. For the above reasons, the key results reported in the present manuscript seem implausible to me and may suggest the possibility of problems in the training or interpretation of the decoding analysis. If I misunderstood the analyses, the analysis text needs to be refined. If I understood the analyses correctly, at the very least the authors would need to provide strong support and arguments to convince the reader that the effects are reliable (ruling out bias and explaining why predictions can be decoded better than on-screen stimuli) and sensible (in the context of previous studies showing different time-courses and results).

      As explained above, we have addressed this concern by performing an additional analysis, implementing decoding based on image pixel values. Indeed, we could not rule out the possibility that “prediction” decoding reflected stimulus differences between leading images.

      Relatedly, the authors use the prestimulus interval (-200 ms to 0 ms before predicted stimulus onset) as the baseline period. Given that this period coincides with prestimulus expectation effects (Kok et al., 2017), would this not result in a bias during trailing image decoding? In other words, the baseline period would contain an anticipatory representation of the expected stimulus (Kok et al., 2017), which is then subtracted from the subsequent EEG signal, thereby allowing the decoder to pick up on this "negative representation" of the expected image. It seems to me that a cleaner contrast would be to use the 200 ms before leading image onset as the baseline.

      The analysis of trailing images aimed at testing specific hypotheses related to differences between decoding accuracy in valid vs. invalid trials. Since the baseline was by definition the same for both kinds of trials (since information about validity only appears at the onset of the trailing image), changing the baseline would not affect the results of the analysis. Valid and invalid trials would have the same prestimulus effect induced by the leading image.

      Again, maybe I misunderstood the analyses, but what exactly are the statistics reported on p. 11 onward? Why is the reported Tmax identical for multiple conditions, including the difference between conditions? Without further information this seems highly unlikely, further casting doubts on the rigor of the applied methods/analyses. For example: "In the sensory decoding analysis based on leading images, decoding accuracy was above chance for both valid (Tmax= 2.76, pFWE < 0.001) and invalid trials (Tmax= 2.76, pFWE < 0.001) from 100 ms, with no significant difference between them (Tmax= 2.76, pFWE > 0.05) (Fig. 2C)" (p.11).

      Thank you for bringing this to our attention. As previously mentioned, this copy error has been rectified in the revised manuscript.

      Relatedly, the statistics reported below in the same paragraph also seem unusual. Specifically, the Tmax difference between valid and invalid conditions seems unexpectedly large given visual inspection of the associated figure: "The decoding accuracy of both valid (Tmax = 2.76, pFWE < 0.001) and invalid trials (Tmax = 14.903, pFWE < 0.001)" (p.12). In fact, visual inspection suggests that the largest difference should probably be observed for the valid not invalid trials (i.e. larger Tmax).

      This copy error has also been rectified in the revised manuscript.

      Moreover, multiple subsequent sections of the Results continue to report the exact same Tmax value. I will not list all appearances of "Tmax = 2.76" here but would recommend the authors carefully check the reported statistics and analysis code, as it seems highly unlikely that >10 contrasts have exactly the same Tmax. Alternatively, if I misunderstand the applied methods, it would be essential to better explain the utilized method to avoid similar confusion in prospective readers.

      This error has also now been rectified. As mentioned above the prediction decoding analysis has been removed.

      I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. The inference made requires, in my opinion, not only a significant effect in one time bin and the absence of an effect in other bins. Instead, to reliably make this inference, one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, but a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, hence making any interpretation that depends on the absence of an effect in some bins yet more problematic and implausible.

      Thank you for the helpful suggestion. As previously mentioned, we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index at which the change of the logarithmic fit fell below 0.1%. Given the results of this analysis and to ensure a sufficient number of trials, we focussed our further analyses on bins 1-2. This is explained in more detail in the revised manuscript.
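
      The plateau criterion described above can be sketched roughly as follows. This is a minimal illustration, not the analysis code used in the manuscript: it assumes the model has the form benefit = a + b * ln(trial), that "change" means the relative change of the fitted curve from one trial to the next, and that 0.1% corresponds to a tolerance of 1e-3. The function names `fit_log` and `plateau_trial`, and the synthetic data, are hypothetical.

```python
import numpy as np


def fit_log(trials, benefit):
    """Least-squares fit of benefit = a + b * ln(trial); returns (a, b)."""
    # polyfit returns the highest-order coefficient first: slope b, then intercept a.
    b, a = np.polyfit(np.log(trials), benefit, deg=1)
    return a, b


def plateau_trial(a, b, n_trials, rel_tol=1e-3):
    """First trial n at which the fitted curve changes by less than rel_tol
    (relative change from trial n to n + 1). Returns None if never reached.
    Assumes the fitted curve is non-zero over the trial range."""
    n = np.arange(1, n_trials)
    f_n = a + b * np.log(n)
    f_next = a + b * np.log(n + 1)
    rel_change = np.abs(f_next - f_n) / np.abs(f_n)
    below = np.flatnonzero(rel_change < rel_tol)
    return int(n[below[0]]) if below.size else None


# Synthetic example only (assumed values, not data from the study).
rng = np.random.default_rng(0)
trials = np.arange(1, 1001)
benefit = 0.5 + 0.02 * np.log(trials) + rng.normal(0.0, 0.01, trials.size)
a_hat, b_hat = fit_log(trials, benefit)
print("fitted a, b:", a_hat, b_hat)
print("plateau trial:", plateau_trial(a_hat, b_hat, trials.size))
```

      Under these assumptions the plateau index depends on how "change" is normalised (relative to the current value, the initial value, or the total range), so the exact definition would need to be taken from the revised manuscript.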

      Relatedly, based on the literature there is no reason to assume that the dampening effect disappears with more training, thereby placing more burden of proof on the present results. Indeed, key studies supporting the dampening account (including human fMRI and MEG studies, as well as electrophysiology in non-human primates) usually seem to entail more learning than has occurred in bin 2 of the present study. How do the authors reconcile the observation that more training in previous studies results in significant dampening, while here the dampening effect is claimed to disappear with less training?

      The discussion of these findings has been expanded in the revised manuscript. As previously outlined, many of the studies supporting dampening did not explicitly test the effects of learning as they emerge, nor did they control for RS to the same extent.

      The Methods section is quite bare bones. This makes an exact replication difficult or even impossible. For example, the sections elaborating on the GLM and cluster-based FWE correction do not specify enough detail to replicate the procedure. Similarly, how exactly the time points for significant decoding effects were determined is unclear (e.g., p. 11). Relatedly, the explanation of the decoding analysis, e.g. the choice to perform PCA before decoding, is not well explained in the present iteration of the manuscript. Additionally, it is not mentioned how many PCs the applied threshold on average resulted in.

      Thank you for this suggestion, we have described our methods in more detail.

      To me, it is unclear whether the PCA step, which to my knowledge is not the default procedure for most decoding analyses using EEG, is essential to obtain the present results. While PCA is certainly not unusual, to my knowledge decoding of EEG data is frequently performed on the sensor level as SVMs are usually capable of dealing with the (relatively low) dimensionality of EEG data. In isolation this decision may not be too concerning; however, in combination with other doubts concerning the methods and results, I would suggest the authors replicate their analyses using a conventional decoding approach on the sensor level as well.

      Thank you for this suggestion, we have explained our decision to use PCA in the revised manuscript.

      Several choices, like the binning and the focus on bins 1-2 seem rather post-hoc. Consequently, frequentist statistics may strictly speaking not be appropriate. This further compounds above mentioned concerns regarding the reliability of the results.

      The reasoning behind our decision to focus on bins 1-2 is now explained in more detail in the revised manuscript.

      A notable difference in the present study, compared to most studies cited in the introduction motivating the present experiment, is that categories instead of exemplars were predicted.

      This seems like an important distinction to me, which surprisingly goes unaddressed in the Discussion section. This difference might be important, given that exemplar expectations allow for predictions across various feature levels (i.e., even at the pixel level), while category predictions only allow for rough (categorical) predictions.

      The decision to use categorical predictions over exemplars lies in the issue of RS, as it is impossible to control for RS while repeating stimuli over many trials. This has been discussed in more detail in the revised manuscript.

      While individually minor problems, I noticed multiple issues across several figures or associated figure texts. For example: Figure 1C only shows valid and invalid trials, but the figure text mentions the neutral condition. Why is the neutral condition not depicted but mentioned here? Additionally, the figure text lacks critical information, e.g. what the asterisk represents. The error shading in Figure 2 would benefit from transparency settings to not completely obscure the other time-courses. Increasing the figure content and font size within the figure (e.g. axis labels) would also help with legibility (e.g. consider compressing the time-course but therefore increasing the overall size of the figure). I would also recommend using more common methods to indicate statistical significance, such as a bar at the bottom of the time-course figure typically used for cluster permutation results instead of a box. Why is there no error shading in Figure 2A but all other panels? Fig 2C-F has the y-axis label "Decoding accuracy (%)" but certainly the y-axis, ranging roughly from 0.2 to 0.7, is not in %. The Figure 3 figure text gives no indication of what the error bars represent, making it impossible to interpret the depicted data. In general, I would recommend that the authors carefully revisit the figures and figure text to improve the quality and complete the information.

      Thank you for the suggestions. Figure 1C now includes the neutral condition. Asterisks denote significant results. The font size in Figure 2C-E has been increased. The y-axis on Figure 2C-E has been amended to accurately reflect decoding accuracy in percentage. Figure 2A has error shading, however, the error is sufficiently small that the error shading is difficult to see. The error bars in Figure 3 have been clarified.

      Given the choice of journal (eLife), which aims to support open science, I was surprised to find no indication of (planned) data or code sharing in the manuscript.

      Plans for sharing code/data are now outlined in the revised manuscript.

      While it is explained in sufficient detail later in the Methods section, it was not entirely clear to me, based on the method summary at the beginning of the Results section, whether categories or individual exemplars were predicted. The manuscript may benefit from clarifying this at the start of the Results section.

      Thank you for this suggestion. Following this and suggestions from other reviewers, the experimental paradigm and the mappings between categories have been further explained in the revised manuscript, to make it clearer that predictions are made at the categorical level.

      "Unexpected trials resulted in a significantly increased neural response 150 ms after image onset" (p.9). I assume the authors mean the more pronounced negative deflection here. Interpreting this, especially within the Results section as "increased neural response" without additional justification may stretch the inferences we can make from ERP data; i.e. to my knowledge more pronounced ERPs could also reflect increased synchrony. That said, I do agree with the authors that it is likely to reflect increased sensory responses, it would just be useful to be more cautious in the inference.

      Thank you for the interesting comment, this has been rephrased as a “more pronounced negative deflection” in the revised manuscript.

      Why was the ERP analysis focused exclusively on Oz? Why not a cluster around Oz? For object images, we may expect a rather wide dipole.

      Feuerriegel et al. (2021) have outlined issues questioning the robustness of univariate analyses for ES; as such, we opted for a targeted ROI approach on the channel showing peak amplitude of the visually evoked response (Fig. 2B). More details on this are in the revised manuscript.

      How exactly did the authors perform FWE? The description in the Method section does not appear to provide sufficient detail to replicate the procedure.

      FWE as implemented in SPM is a cluster-based method of correcting for multiple comparisons using random field theory. We have explained our thresholding methods in more detail in the revised manuscript.

      If I misunderstand the authors and they did indeed perform standard cluster permutation analyses, then I believe the results of the timing of significant clusters cannot be so readily interpreted as done here (e.g. p.11-12); see: Maris & Oostenveld 2007; Sassenhagen & Draschkow 2019.

      All statistics were based on FWE under random field theory assumptions (as implemented in SPM) rather than on cluster permutation tests (as implemented in, e.g., FieldTrip).

      Why did the authors choose not to perform spatiotemporal cluster permutation for the ERP results?

      As mentioned above, we opted to target our ERP analyses on Oz due to controversies in the literature regarding univariate effects of ES (Feuerriegel et al., 2021).

      Some results, e.g. on p.12 are reported as T29 instead of Tmax. Why?

      As mentioned above, prediction decoding analyses have been removed from the manuscript.

    1. eLife Assessment

      This valuable manuscript addresses the longstanding question of how the brain maintains serial order in working memory, proposing a biologically grounded model based on synaptic augmentation mechanisms that operates on longer time scales than facilitation. The authors show that augmentation provides a mechanism by which this order can be maintained in memory thanks to a temporal gradient of synaptic efficacies. Although the evidence remains incomplete at present, it can be made stronger by demonstrating robustness to network heterogeneity, spiking, and threshold values for encoding the working memory.

    2. Reviewer #1 (Public review):

      Summary:

      The issue of how the brain can maintain the serial order of presented items in working memory is a major unsolved question in cognitive neuroscience. It has been proposed that this serial order maintenance could be achieved thanks to periodic reactivations of different presented items at different phases of an oscillation, but the mechanisms by which this could be achieved by brain networks, as well as the mechanisms of read-out, are still unclear. In an influential 2008 paper, the authors have proposed a mechanism by which a recurrent network of neurons could maintain multiple items in working memory, thanks to 'population spikes' of populations of neurons encoding for the different items, occurring at alternating times. These population spikes occur in a specific regime of the network and are a result of synaptic facilitation, an experimentally observed type of synaptic short-term dynamics with time scales of order hundreds of ms.

      In the present manuscript, the authors extend their model to include another type of experimentally observed short-term synaptic plasticity termed synaptic augmentation, which operates on longer time scales on the order of 10s. They show that while a network without augmentation loses information about serial order, augmentation provides a mechanism by which this order can be maintained in memory thanks to a temporal gradient of synaptic efficacies. The order can then be read out using a read-out network whose synapses are also endowed with synaptic augmentation. Interestingly, the read-out speed can be regulated using background inputs.

      Strengths:

      This is an elegant solution to the problem of serial order maintenance that only relies on experimentally observed features of synapses. The model is consistent with a number of experimental observations in humans and monkeys. The paper will be of interest to a broad readership, and I believe it will have a strong impact on the field.

      Weaknesses:

      (1) The network they propose is extremely simple. This simplicity has pros and cons: on the one hand, it is nice to see the basic phenomenon exposed in the simplest possible setting. On the other hand, it would also be reassuring to check that the mechanism is robust when implemented in a more realistic setting, using, for instance, a network of spiking neurons similar to the one they used in the 2008 paper. The more noisy and heterogeneous the setting, the better.

      (2) One major issue with the population spike scenario is that (to my knowledge) there is no evidence that these highly synchronized events occur in delay periods of working memory experiments. It seems that highly synchronized population spikes would imply (a) a strong regularity of spike trains of neurons, at odds with what is typically observed in vivo (b) high synchronization of neurons encoding for the same item (and also of different items in situations where multiple items have to be held in working memory), also at odds with in vivo recordings that typically indicate weak synchronization at best. It would be nice if the authors at least mention this issue, and speculate on what could possibly bridge the gap between their highly regular and synchronized network, and brain networks that seem to lie at the opposite extreme (highly irregular and weakly synchronized). Of course, if they can demonstrate using a spiking network simulation that they can bridge the gap, even better.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors present a model to explain how working memory (WM) encodes both existence and timing simultaneously using transient synaptic augmentation. A simple yet intriguing idea.

      The model presented here has the potential to explain what previous theories like 'active maintenance via attractors' and 'liquid state machine' do not, and describe how novel sequences are immediately stored in WM. Altogether, the topic is of great interest to those studying higher cognitive processes, and the conclusions the authors draw are certainly thought-provoking from an experimental perspective. However, several questions remain that need to be addressed.

      The study relates to the well-known computational theory for working memory, which suggests short-term synaptic facilitation is required to maintain working memory, but doesn't rely on persistent spiking. This previous theory appears similar to the proposed theory, except for the change from facilitation to augmentation. A more detailed explanation of why the authors use augmentation instead of facilitation in this paper is warranted: is the facilitation too short to explain the whole process of WM? Can the theory with synaptic facilitation also explain the immediate storage of novel sequences in WM?

      In Figure 1, the authors mention that synaptic augmentation leads to an increased firing rate even after stimulus presentation. It would be good to determine, perhaps, what the lowest threshold is to see the encoding of a WM task, and whether that is biologically plausible.

      In the middle panel of Figure 4, after 15-16 sec, when the neuronal population prioritizes with the second retro-cue, although the second retro-cue item's synaptic spike dominates, why is the augmentation for the first retro-cue item higher than the second-cue augmentation until the 20 sec?

    4. Author response:

      Reviewer #1 (Public Review):

      (1) The network they propose is extremely simple. This simplicity has pros and cons: on the one hand, it is nice to see the basic phenomenon exposed in the simplest possible setting. On the other hand, it would also be reassuring to check that the mechanism is robust when implemented in a more realistic setting, using, for instance, a network of spiking neurons similar to the one they used in the 2008 paper. The more noisy and heterogeneous the setting, the better.

      The choice of a minimal model to illustrate our hypothesis is deliberate. Our main goal was to suggest a physiologically-grounded mechanism to rapidly encode temporally-structured information (i.e., sequences of stimuli) in Working Memory, where none was available before. Indeed, as discussed in the manuscript, previous proposals were unsatisfactory in several respects. In view of our main goal, we believe that a spiking implementation is beyond the scope of the present work.

      We would like to note that the mechanism originally proposed in Mongillo et al. (2008) has been repeatedly implemented, by many different groups, in various spiking network models with different levels of biological realism (see, e.g., Lundqvist et al. (2016), for an especially ‘detailed’ implementation) and, in all cases, the relevant dynamics have been observed. We take this as an indication of ‘robustness’; the relevant network dynamics do not critically depend on many implementation details and, importantly, these dynamics are qualitatively captured by a simple rate model (see, e.g., Mi et al. (2017)).

      In the present work, we make a relatively ‘minor’ (from a dynamical point of view) extension of the original model, i.e., we just add augmentation. Accordingly, we are fairly confident that a set of parameters for the augmentation dynamics can be found such that the spiking network behaves, qualitatively, as the rate model. A meaningful study, in our opinion, would then require extensively testing the (large) parameter space (different models of augmentation?) to see how the network behavior compares with the relevant experimental observations (which ones? behavioral? physiological?). As said above, we believe that this is beyond the scope of the present work.

      This being said, we definitely agree with the reviewer that not presenting a spiking implementation is a limitation of the present work. We will clearly acknowledge, and discuss, this limitation in the revised version.

      (2) One major issue with the population spike scenario is that (to my knowledge) there is no evidence that these highly synchronized events occur in delay periods of working memory experiments. It seems that highly synchronized population spikes would imply (a) a strong regularity of spike trains of neurons, at odds with what is typically observed in vivo (b) high synchronization of neurons encoding for the same item (and also of different items in situations where multiple items have to be held in working memory), also at odds with in vivo recordings that typically indicate weak synchronization at best. It would be nice if the authors at least mention this issue, and speculate on what could possibly bridge the gap between their highly regular and synchronized network, and brain networks that seem to lie at the opposite extreme (highly irregular and weakly synchronized). Of course, if they can demonstrate using a spiking network simulation that they can bridge the gap, even better.

      Direct experimental evidence (in monkeys) in support of the existence of highly synchronized events -- to be identified with the ‘population spikes’ of our model -- during the delay period of a memory task is available in the literature and we have cited it, i.e., Panichello et al. (2024). In the revised version, we will provide an explicit discussion of the results of Panichello et al. (2024) and how these results directly relate to our model. After submission, we became aware of another experimental study (in humans) specifically dealing with sequence memory, i.e., Liebe et al. (2025). Their results, again, are fully consistent with our model. We will also provide an explicit discussion of these results in the revised version.

      We note that there is no fundamental contradiction between highly synchronized events in ‘small’ neural populations (e.g., a cell assembly) on one hand, and temporally irregular (i.e., Poisson-like) spiking at the single-neuron level and weakly synchronized activity at the network level, on the other hand. This was already illustrated in our original publication, i.e., Mongillo et al. (2008) (see, in particular, Fig. S2).

      We further note that the mechanism we propose to encode temporal order -- a temporal gradient in the synaptic efficacies brought about by synaptic augmentation -- would also work if the memory of the items is maintained by ‘tonic’ persistent activity (i.e., without highly synchronized events), provided this activity occurs at suitably low rates such as to prevent the saturation of the synaptic augmentation.

      We will include a detailed discussion of these points in the revised version.

      Reviewer #2 (Public Review):

      The study relates to the well-known computational theory for working memory, which suggests short-term synaptic facilitation is required to maintain working memory, but doesn't rely on persistent spiking. This previous theory appears similar to the proposed theory, except for the change from facilitation to augmentation. A more detailed explanation of why the authors use augmentation instead of facilitation in this paper is warranted: is the facilitation too short to explain the whole process of WM? Can the theory with synaptic facilitation also explain the immediate storage of novel sequences in WM?

      In the model, synaptic dynamics displays both short-term facilitation and augmentation (and short-term depression). Indeed, synaptic facilitation, alone, would be too short-lived to encode novel sequences. This is illustrated in Fig. 1B. We will provide a more detailed discussion of this point in the revised version.

      In Figure 1, the authors mention that synaptic augmentation leads to an increased firing rate even after stimulus presentation. It would be good to determine, perhaps, what the lowest threshold is to see the encoding of a WM task, and whether that is biologically plausible.

      We believe that this comment is related to the above point. The reviewer is correct; augmentation alone would require fairly long stimulus presentations to encode an item in WM. ‘Fast’ encoding, indeed, is guaranteed by the presence of short-term facilitation. We will emphasize this important point in the revised version.

      In the middle panel of Figure 4, after 15-16 sec, when the neuronal population prioritizes with the second retro-cue, although the second retro-cue item's synaptic spike dominates, why is the augmentation for the first retro-cue item higher than the second-cue augmentation until the 20 sec?

      This is because of the slow build-up and slow decay of the augmentation. When the second item is prioritized, and the corresponding neuronal population re-activates, its augmentation level starts to increase. At the same time, as the first item is now de-prioritized and the corresponding neuronal population is now silent, its augmentation level starts to decrease. Because of the ‘slowness’ of both processes (i.e., augmentation build-up and decay), it takes about 5 seconds for the augmentation level of the second item to overcome the augmentation level of the first item.

      We note that the slow time scales of the augmentation dynamics, which are consistent with experimental observations, are necessary for our mechanism to work.
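      The delayed crossover described in the response above can be illustrated with a toy calculation. This is only a sketch under strong assumptions, not the authors' model: each augmentation level is treated as a single first-order variable that relaxes toward 1 while its population is active and toward 0 while it is silent, with one shared time constant. The function names, the starting levels (0.8 and 0.3) and the 10 s time constant are illustrative choices, not values from the paper.

```python
import math

def augmentation_traces(a1_start, a2_start, tau, t_end, dt=0.01):
    """Toy traces after the priority switch at t = 0.

    a1: de-prioritized item (population silent), decays toward 0.
    a2: newly prioritized item (population active), relaxes toward 1.
    All parameters are illustrative assumptions.
    """
    n_steps = int(round(t_end / dt)) + 1
    times = [i * dt for i in range(n_steps)]
    a1 = [a1_start * math.exp(-t / tau) for t in times]
    a2 = [1.0 - (1.0 - a2_start) * math.exp(-t / tau) for t in times]
    return times, a1, a2

def crossover_time(times, a1, a2):
    """First sampled time at which a2 reaches or exceeds a1, or None."""
    for t, x, y in zip(times, a1, a2):
        if y >= x:
            return t
    return None

times, a1, a2 = augmentation_traces(a1_start=0.8, a2_start=0.3, tau=10.0, t_end=20.0)
t_cross = crossover_time(times, a1, a2)
print(f"second item overtakes the first after about {t_cross:.1f} s")
```

      With a slow time constant, the prioritized item dominates the spiking immediately, yet its augmentation level overtakes that of the de-prioritized item only several seconds later, which is the qualitative point made in the response.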

    1. eLife Assessment

      This important paper takes a novel approach to the problem of automatically reconstructing long-range axonal projections from stacks of images. The key innovation is to separate the identification of sections of an axon from the statistical rules used to constrain global structure. The authors provide compelling evidence that their method is a significant improvement over existing measures in circumstances where the labelling of axons and dendrites is relatively dense.

    2. Reviewer #1 (Public review):

      Summary:

      The authors introduce a novel algorithm for the automatic identification of long-range axonal projections. This is an important problem as modern high-throughput imaging techniques can produce large amounts of raw data, but identifying neuronal morphologies and connectivities requires large amounts of manual work. The algorithm works by first identifying points in three-dimensional space corresponding to parts of labelled neural projections, these are then used to identify short sections of axon using an optimisation algorithm and the prior knowledge that axonal diameters are relatively constant. Finally, a statistical model that assumes axons tend to be smooth is used to connect the sections together into complete and distinct neural trees. The authors demonstrate that their algorithm is far superior to existing techniques, especially when a dense labelling of the tissue means that neighbouring neurites interfere with the reconstruction. Despite this improvement, however, the accuracy of reconstruction remains below 90%, so manual proof-reading is still necessary to produce accurate reconstructions of axons.

      Strengths:

      The new algorithm combines local and global information to make a significant improvement on the state-of-the-art for automatic axonal reconstruction. The method could be applied more broadly and might have applications to reconstructions of electron microscopy data, where similar issues of high-throughput imaging and relatively slow or inaccurate reconstruction remain.

      Weaknesses:

      There are three weaknesses with the algorithm and manuscript.

      (1) The best reconstruction accuracy is below 90%, which does not fully solve the problem of needing manual proof-reading.

      (2) The 'minimum information flow tree' model the authors use to construct connected axonal trees has the potential to bias data collection. In particular, the assumption that axons should always be as smooth as possible is not always correct. This is a good rule-of-thumb for reconstructions, but real axons in many systems can take quite sharp turns and this is also seen in the data presented in the paper (Fig 1C). I would like to see explicit acknowledgement of this bias in the current manuscript and ideally a relaxation of this rule in any later versions of the algorithm.

      (3) The writing of the manuscript is not always as clear as it could be. The manuscript would benefit from careful copy editing for language, and the Methods section in particular should be expanded to more clearly explain what each algorithm is doing. The pseudo code of the Supplemental Information could be brought into the Methods if possible as these algorithms are so fundamental to the manuscript.

      Comments on revisions: I have no further comments or recommendations.

    3. Reviewer #2 (Public review):

      The authors have addressed my comments in this revised version of their manuscript. PointTree is an improved method for the reconstruction of neuronal anatomy that will be useful for neuroscientists.

      In this manuscript, Cai et al. introduce PointTree, a new automated method for the reconstruction of complex neuronal projections. This method has the potential to drastically speed up the process of reconstructing complex neurites. The authors use semi-automated manual reconstruction of neurons and neurites to provide a 'ground-truth' for comparison between PointTree and other automated reconstruction methods. The reconstruction performance is evaluated for precision, recall and F1-score and positions. The performance of PointTree compared to other automated reconstruction methods is impressive based on these 3 criteria.

      As an experimentalist, I will not comment on the computational aspects of the manuscript. Rather, I am interested in how PointTree's performance decreases in noisy samples. This is because many imaging datasets contain some level of background noise for which the human eye appears essential for accurate reconstruction of neurites. Although the samples presented in Figure 5 represent an inherent challenge for any reconstruction method, the signal-to-noise ratio is extremely high (also the case in all raw data images in the paper). It would be interesting to see how PointTree's performance changes in increasingly noisy samples, and for the author to provide general guidance to the scientific community as to what samples might not be accurately reconstructed with PointTree.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce a novel algorithm for the automatic identification of long-range axonal projections. This is an important problem as modern high-throughput imaging techniques can produce large amounts of raw data, but identifying neuronal morphologies and connectivities requires large amounts of manual work. The algorithm works by first identifying points in three-dimensional space corresponding to parts of labelled neural projections, these are then used to identify short sections of axons using an optimisation algorithm and the prior knowledge that axonal diameters are relatively constant. Finally, a statistical model that assumes axons tend to be smooth is used to connect the sections together into complete and distinct neural trees. The authors demonstrate that their algorithm is far superior to existing techniques, especially when dense labelling of the tissue means that neighbouring neurites interfere with the reconstruction. Despite this improvement, however, the accuracy of reconstruction remains below 90%, so manual proofreading is still necessary to produce accurate reconstructions of axons.

      Strengths:

      The new algorithm combines local and global information to make a significant improvement on the state-of-the-art for automatic axonal reconstruction. The method could be applied more broadly and might have applications to reconstructions of electron microscopy data, where similar issues of high-throughput imaging and relatively slow or inaccurate reconstruction remain.

      We thank the reviewer for their positive comments and for taking the time to review our manuscript. We are truly grateful that the reviewer recognized the value of our method in automatically reconstructing long-range axonal projections. While we report that our method achieves reconstruction accuracy of approximately 85%, we fully acknowledge that manual proofreading is still necessary to ensure accuracy greater than 95%. We also appreciate the reviewer’s insightful suggestion regarding the potential adaptation of our algorithm for reconstructing electron microscopy (EM) data, where similar challenges in high-throughput imaging and relatively slow or inaccurate reconstruction persist. We look forward to exploring ways to integrate our method with EM data in future work.

      Weaknesses:

      There are three weaknesses in the algorithm and manuscript.

      (1) The best reconstruction accuracy is below 90%, which does not fully solve the problem of needing manual proofreading.

      We sincerely appreciate the reviewer's valuable insights regarding reconstruction accuracy. Indeed, as illustrated in Figure S4, our current best automated reconstruction accuracy on fMOST data is still below 90%. This indicates that manual proofreading remains essential to ensure reliability.

      For the reconstruction of long-range axonal projections, ensuring the accuracy of the reconstruction process necessitates manual revision of the automatically generated results. Existing literature has demonstrated that a higher accuracy in automatic reconstruction correlates with a reduced need for manual revisions, thereby facilitating an accelerated reconstruction process (Winnubst et al., Cell 2019; Liu et al., Nature Methods 2025).

      As the reviewer rightly points out, achieving an accuracy exceeding 95% currently necessitates manual proofreading. Although our method does not completely eliminate this requirement, it significantly alleviates the proofreading workload by: 1) Minimizing common errors in regions with dense neuron distributions; 2) Providing more reliable initial reconstructions; and 3) Reducing the number of corrections needed during the proofreading process.

      In the future, we will continue to enhance our reconstruction framework. As imaging systems achieve higher signal-to-noise ratios and deep learning techniques facilitate more accurate foreground detection, we anticipate that our method will attain even greater reconstruction accuracy. Furthermore, we plan to develop a software system capable of predicting potential error locations in our automated reconstruction results, thereby streamlining manual revisions. This approach distinguishes itself from existing models by obviating the need for individual traversal of the brain regions associated with each neuron reconstruction.

      (2) The 'minimum information flow tree' model the authors use to construct connected axonal trees has the potential to bias data collection. In particular, the assumption that axons should always be as smooth as possible is not always correct. This is a good rule-of-thumb for reconstructions, but real axons in many systems can take quite sharp turns and this is also seen in the data presented in the paper (Figure 1C). I would like to see explicit acknowledgement of this bias in the current manuscript and ideally a relaxation of this rule in any later versions of the algorithm.

      We appreciate the reviewer's insightful opinion regarding the potential bias introduced by our minimum information flow tree model. The reviewer is absolutely correct in noting that while axon smoothness serves as a useful reconstruction heuristic, it should not be treated as an absolute constraint given that real axons can exhibit sharp turns (as shown in Figure 1C). In response to this valuable feedback, we have added an explicit discussion of this limitation to the Discussion section, as follows: “Finally, the minimal information flow tree’s fundamental assumption, that axons should be as smooth as possible does not always hold true. In fact, real axons can take quite sharp turns leading the algorithm to erroneously separate a single continuous axon into disjoint neurites.”

      In our reconstruction process, the post-processing approach partially mitigates erroneous reconstructions derived from this rule. Specifically, when an axon takes a sharp turn, the minimum information flow tree decomposes the structure into two separate branches (Fig. S7A), but the decomposition node is explicitly recorded. Each newly decomposed branch then attempts to reconnect by searching for plausible neurites starting from its head node (determined by the minimum information flow tree). If no connectable neurite is found, the branch is automatically reconnected to its originally recorded decomposition node (Fig. S7B). Two reconstruction examples in Fig. S7C demonstrate the effectiveness of the post-processing approach.
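      The fallback logic described above can be sketched in a few lines. This is a hedged illustration of the control flow only, not the PointTree implementation: the `Branch` class, the `reattach` function and the `find_connectable` callback are hypothetical names, and how plausible neurites are actually searched for is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Optional

@dataclass
class Branch:
    """A branch split off by the tree model (hypothetical structure)."""
    head_node: Hashable           # where the search for a new attachment starts
    decomposition_node: Hashable  # node recorded when the branch was split off

def reattach(branch: Branch,
             find_connectable: Callable[[Hashable], Optional[Hashable]]) -> Hashable:
    """Return the node the branch should be attached to.

    find_connectable stands in for the search for a plausible neurite
    starting from the head node; it returns None when nothing is found.
    """
    candidate = find_connectable(branch.head_node)
    if candidate is not None:
        return candidate
    # No plausible neurite found: fall back to the recorded split point,
    # which restores the original continuous axon.
    return branch.decomposition_node

branch = Branch(head_node="h1", decomposition_node="d0")
print(reattach(branch, lambda head: None))  # falls back to the recorded node
```

      Recording the decomposition node is what makes the split reversible, so a sharp turn that was wrongly cut can be restored when no better attachment exists.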

      As pointed out by the reviewers, the proposed rule for revising neuron reconstruction does not encompass all scenarios. Relaxing the constraints of this rule may lead to numerous new erroneous connections. Currently, the proposed rule is solely based on the positions of neurite centerlines and does not integrate information regarding the intensity of the original images or segmentation data. Incorporating these elements into the rule could potentially reduce reconstruction errors. 

      (3) The writing of the manuscript is not always as clear as it could be. The manuscript would benefit from careful copy editing for language, and the Methods section in particular should be expanded to more clearly explain what each algorithm is doing. The pseudo-code of the Supplemental Information could be brought into the Methods if possible as these algorithms are so fundamental to the manuscript.

      We sincerely thank the reviewer for these valuable suggestions to improve our manuscript’s clarity and methodological presentation. We have implemented the following revisions:

      (1) Language Enhancement: we have conducted rigorous internal linguistic reviews to address grammatical inaccuracies and improve textual clarity.

      (2) Methods Expansion and Pseudo-code Integration: we have incorporated all relevant derivations from the Supplementary Materials into the Methods section, with additional explanatory text to clarify the purpose and implementation of each algorithm. All mathematical formulations have been systematically rederived with modifications to variable nomenclature, subscript/superscript notations and identified errors in the original submission. All pseudocode from Supplementary Materials has been integrated into their corresponding methods subsection.

      Reviewer #2 (Public review):

      In this manuscript, Cai et al. introduce PointTree, a new automated method for the reconstruction of complex neuronal projections. This method has the potential to drastically speed up the process of reconstructing complex neurites. The authors use semi-automated manual reconstruction of neurons and neurites to provide a 'ground-truth' for comparison between PointTree and other automated reconstruction methods. The reconstruction performance is evaluated for precision, recall, and F1-score and positions. The performance of PointTree compared to other automated reconstruction methods is impressive based on these 3 criteria.

      As an experimentalist, I will not comment on the computational aspects of the manuscript. Rather, I am interested in how PointTree's performance decreases in noisy samples. This is because many imaging datasets contain some level of background noise for which the human eye appears essential for the accurate reconstruction of neurites. Although the samples presented in Figure 5 represent an inherent challenge for any reconstruction method, the signal-to-noise ratio is extremely high (also the case in all raw data images in the paper). It would be interesting to see how PointTree's performance changes in increasingly noisy samples, and for the author to provide general guidance to the scientific community as to what samples might not be accurately reconstructed with PointTree.

      We thank the reviewer for their time in reviewing our manuscript and for their interest in how PointTree performs on noisy samples. It is important to clarify that PointTree is solely responsible for the reconstruction of neurons from the foreground regions of neural images. The foreground regions of these neuronal images are obtained through a deep learning segmentation network. In cases where the image has a low signal-to-noise ratio, if the segmentation network can accurately identify the foreground areas, then PointTree will be able to accurately reconstruct neurons. In fact, existing deep learning networks have demonstrated their capability to effectively extract foreground regions from low signal-to-noise ratio images; therefore, PointTree is well-suited for processing neuronal images characterized by low signal-to-noise ratios.

      In the revised manuscript, we conducted experiments on datasets with varying signal-to-noise ratios (SNR). The results demonstrate that Unet3D is capable of identifying the foreground regions in low-SNR images, thereby supporting the assertion that PointTree has broad applicability across diverse neuronal imaging datasets. 

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      It would be interesting to see how PointTree's performance changes in increasingly noisy samples, and for the author to provide general guidance to the scientific community as to what samples might not be accurately reconstructed with PointTree.

      We extend our heartfelt gratitude to the reviewer for their insightful suggestion concerning experiments involving different noisy samples. Here are the details of the datasets used:

      LSM dataset: Mean SNR = 5.01, with 25 samples, and a volume size of 192×192×192.

      fMOST dataset: Mean SNR = 8.68, with 25 samples, and a volume size of 192×192×192.

      HD-fMOST dataset: Mean SNR = 11.4, with 25 samples, and a volume size of 192×192×192.

      The experimental results reveal that, thanks to the deep learning network's robust feature extraction capabilities, even when working with low-SNR data (as depicted in Figure 4B, first two columns of the top row), satisfactory segmentation results (Figure 4B, first two columns of the third row) were achieved. These results laid a solid foundation for subsequent accurate reconstruction.

      PointTree demonstrated consistent mean F1-scores of 91.0%, 90.0%, and 93.3% across the three datasets, respectively. This underscores its reconstruction robustness under varying SNR conditions when supported by the segmentation network. For more in-depth information, please refer to the manuscript section titled "Reconstruction of data with different signal-to-noise ratios" and Figure 4.
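      For readers less familiar with the evaluation metrics, the standard definitions of precision, recall and F1-score can be written out as below. This is a generic sketch: how a reconstructed point is counted as a true positive against the ground truth is specific to the manuscript and is not reproduced here, and the counts in the example are made up for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only, not data from the study.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

      Because F1 is a harmonic mean, it stays high only when both precision and recall are high, which is why it is a reasonable single summary across datasets with different noise levels.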

    1. eLife Assessment

      This important work substantially advances our understanding of the interaction among gut microbiota, lipid metabolism, and the host in type 2 diabetes. The evidence supporting the claims of the authors is solid, although additional experiments for the control FMT are not yet satisfactory. The work will be of interest to medical biologists working on microbiota and diabetes.

    2. Reviewer #1 (Public review):

      Summary:

      The authors tried to identify the relationships between gut microbiota, lipid metabolites and the host in type 2 diabetes (T2DM) by using spontaneously developed T2DM in macaques, considered among the best human models.

      Strengths:

      The authors comprehensively compared the gut microbiota and plasma fatty acids between spontaneous T2DM and control macaques, and tried to verify the macaque results in a high-fat diet-fed mouse model.

      Weaknesses:

      The observed multi-omics on macaques can be done on humans, which weakens the conclusion of the manuscript, unless the observation/data on macaques could cover during the onset of T2DM that would be difficult to obtain from humans.

      Regarding the metabolomic analysis on fatty acids, the authors did not include the results obtained from the macaque fecal samples which should be important considering the authors claimed the importance of gut microbiota in the pathogenesis of T2DM. Instead, the authors measured palmitic acid in the mouse model and tried to validate their conclusions with that.

      In murine experiments, palmitic acid-containing diet were fed to mice to induce diabetic condition, but this does not mimic spontaneous T2DM in macaques, since the authors did not measure in macaque feces (or at least did not show the data from macaque feces of) palmitic acid or other fatty acids; instead, they assumed from blood metabolome data that palmitic acid would be absorbed from the intestine to affect the host metabolism, and added palmitic acid in the diet in mouse experiments. Here involves the probable leap of logic to support their conclusions and title of the study.

      In addition, the authors measured omics data after, but not before, the onset of spontaneous T2DM of macaques. This can reveal microbiota dysbiosis driven purely by disease progression, but does not support the causative effect of gut microbiota on T2DM development that the authors claim.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors tried to identify the relationships among the gut microbiota, lipid metabolites, and the host in type 2 diabetes (T2DM) by using macaques that spontaneously develop T2DM, considered one of the best models of the human disease.

      Strengths:

      The authors comprehensively compared the gut microbiota and plasma fatty acids between macaques with spontaneous T2DM and control macaques and verified the macaque results in a high-fat diet-fed mouse model.

      Weaknesses:

      Comment 1: The observed multi-omics of the macaques can be done on humans, which weakens the impact of the conclusion of the manuscript.

      We fully acknowledge the critical role of human studies in T2DM research. In our study, the spontaneous T2DM macaque model provided a unique window to address inherent challenges in human studies, including medication interference and environmental heterogeneity. Human studies have struggled to standardize confounding factors such as diet, exercise, and antibiotic use. Moreover, most human T2DM patients receive long-term glucose-lowering medications (e.g., metformin), which directly alter gut microbiota composition and function, masking disease-associated microbial signatures (Sun et al., 2018; Petakh et al., 2023). In contrast, the spontaneous T2DM macaques, untreated with glucose-lowering drugs or antibiotics under strictly controlled conditions, revealed microbiota dysbiosis driven purely by disease progression. Our work bridged the gap between rodent studies and human clinical trials, providing an important clinical reference for guiding targeted interventions, particularly microbiota modulation. We sincerely appreciate the valuable comments. We have added background to the part of the introduction, “In fact, T2DM macaques avoid medication interference and environmental heterogeneity under controlled experimental conditions, and share key pathological features with humans, such as amyloidosis of pancreatic islets, which is absent in mouse models (25, 26), suggesting that T2DM macaques are the optimal animal model for simulating human T2DM and its complications (27).” (Lines 98-103).

      References:

      Sun L., Xie C., Wang G., Wu Y., Wu Q., Wang X., Liu J., Deng Y., Xia J., et al. (2018) Gut microbiota and intestinal FXR mediate the clinical benefits of metformin. Nat. Med 24:1919-1929. https://doi.org/10.1038/s41591-018-0222-4

      Petakh P., Kamyshna I., Kamyshnyi A. (2023) Effects of metformin on the gut microbiota: A systematic review. Mol. metab 77:101805-101805. https://doi.org/10.1016/j.molmet.2023.101805

      Comment 2: In addition, the age and sex of the control macaque group did not necessarily match those of the T2DM group, leaving the possibility for compromising the analysis.

      Thank you for pointing this out. The availability of spontaneous T2DM macaques is very limited. Wang et al. (2018) identified only nine diabetic macaques among 2,000 screened, and our prior study (Jiang et al., 2022) found merely seven diabetic cases in 1,408 macaques. In this work, we obtained eight spontaneous T2DM macaques with FPG ≥ 7 mmol/L and eight healthy control macaques with FPG ≤ 6.1 mmol/L (three consecutive detections, at one-month intervals) from a population of 1,698 captive macaques. To prevent confounding factors from affecting the investigated macaques, all macaques were individually housed with standardized diets and environmental controls. Although age and sex were only partially matched, controls originated from the same population to minimize confounding. The T2DM and control groups were matched for age category (5 adult and 3 elderly) and had comparable mean ages (T2DM individuals = 12.88, control individuals = 11.25) (Table S1). Regarding sex matching, we compared blood metabolome data of 12 healthy adult female and 12 healthy adult male macaques from another study (Liu et al., 2023) and found only a small number of differential metabolites, none of which were associated with tryptophan (Author response table 1). We acknowledge this limitation and will prioritize matched controls in future studies.
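      The selection thresholds stated above can be expressed as a small rule. This is a sketch under assumptions: the function name and the "unclassified" label are hypothetical, and the code assumes that all three consecutive monthly readings must satisfy the same threshold, which is one reading of "three consecutive detections".

```python
def classify_by_fpg(fpg_readings, t2dm_min=7.0, control_max=6.1, required=3):
    """Classify an animal from consecutive fasting plasma glucose readings (mmol/L).

    Assumption: every one of the `required` readings must meet the threshold.
    """
    if len(fpg_readings) != required:
        raise ValueError(f"expected {required} consecutive readings")
    if all(value >= t2dm_min for value in fpg_readings):
        return "T2DM"
    if all(value <= control_max for value in fpg_readings):
        return "control"
    # Mixed or intermediate readings fit neither group in this sketch.
    return "unclassified"

# Illustrative readings only, not data from the study.
print(classify_by_fpg([7.4, 8.1, 7.9]))
print(classify_by_fpg([4.8, 5.2, 5.0]))
```

      Requiring repeated readings on the same side of a threshold reduces the chance that a single transient measurement places an animal in the wrong group.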

      Author response table 1.

      List of all differential metabolites.

      References:

      Wang J., Xu S., Gao J., Zhang L., Zhang Z., Yang W., Li Y., Liao S., Zhou H., Liu P., et al. (2018) SILAC-based quantitative proteomic analysis of the livers of spontaneous obese and diabetic rhesus monkeys. Am. J. Physiol-endoc. M 315:E29-E306. https://doi.org/10.1152/ajpendo.00016.2018

      Jiang C., Pan X., Luo J., Liu X., Zhang L., Liu Y., Lei G., Hu G., Li J. (2022) Alterations in microbiota and metabolites related to spontaneous diabetes and pre-diabetes in rhesus macaques. Genes 13:1513. https://doi.org/10.3390/genes13091513

      Liu X., Liu X.Y., Wang X.Q., Shang K., Li J.W., Lan Y., Wang J., Li J., et al. (2023) Multi-Omics Analysis Reveals Changes in Tryptophan and Cholesterol Metabolism before and after Sexual Maturation in Captive Macaques. BMC Genomics 24:308. https://doi.org/10.1186/s12864-023-09404-3

      Comment 3: Regarding the metabolomic analysis, the authors did not include fecal samples which are important, considering the authors' claim about the importance of gut microbiota in the pathogenesis of T2DM.

      We thank the reviewer for this suggestion. This study employed untargeted metabolomics on macaque fecal samples to identify metabolites associated with spontaneously developing T2DM. To validate the metabolites identified through the untargeted metabolomic analysis, we conducted targeted medium- and long-chain fatty acid (MLCFA) metabolomics on macaque serum, and we further quantitatively examined the content of palmitic acid (PA) in mice feces, ileum, and serum. Although targeted MLCFA metabolomics was not performed on macaque fecal samples, we performed untargeted metabolomics on macaque feces and confirmed the contribution of PA in mice that underwent fecal microbiota transplantation (FMT) from T2DM macaques. We have added future expectations in the part of the discussion, “Previous studies have shown that insulin-resistant patients exhibit increased fecal monosaccharides associated with microbial carbohydrate metabolism (70). Furthermore, commensal species of Lachnospiraceae actively overproduce long-chain fatty acids during metabolic dysfunction through altered bacterial lipid metabolism. The microbe-derived fatty acids impair intestinal epithelial integrity to exacerbate metabolic dysregulation (71). Given that microbial metabolic activity causally modulates host metabolic homeostasis, the content change of PA was potentially associated with a dynamic equilibrium between host absorption and microbial metabolism. Further integrative studies on the fecal fatty acid metabolome, microbial PA metabolism, and functional pathways will be crucial for delineating causal links between dysbiosis and lipid metabolic dysfunction in T2DM.” (Lines 426-437).

      Comment 4: In the mouse experiments, the control group should be given a FMT from control macaques rather than just untreated SPF mice since the fecal microbiota composition is likely very different between macaques and mice.

      Thanks for your helpful suggestion. We recognized the importance of a FMT control group and supplemented mouse experiments (using the C57BL/6J strain) with FMT from control macaques (HFT group). Another group of mice without FMT was set as control. Due to the lengthy experimental period, observations were concluded at 30 days post-FMT. We compared changes in the gut microbiota before and after antibiotic treatment in mice (-14D and 0D), and tracked body weight and fasting plasma glucose (FPG) levels from day -14 to day 30. At 30 days after FMT, fecal samples from all groups were collected for 16S rRNA sequencing. Additionally, samples of T2DM microbiota transplant (TP), and control transplant (HTP) were sequenced. Finally, we integrated the 16S sequencing data from the FTPA group (palmitic acid (PA) diet and FMT from T2DM macaques) and FT group (normal diet and FMT from T2DM macaques) at day 30 for combined analysis. The results showed that the antibiotic treatment used in this study effectively depleted the gut microbiota. Following FMT, gut microbial diversity stabilized within 30 days, with similar microbial community proportions between HFT and control groups. Core functional groups of the healthy microbiota (Bacteroidota and Bacillota) stably colonized mice despite host species divergence, confirming that T2DM phenotypes originate specifically from macaque microbiota. Importantly, increased abundance of Lachnospiraceae (including genera Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium) and the key species Ruminococcus gnavus (current name: Mediterraneibacter gnavus) were also observed in FT group versus HFT group on day 30, validating our original findings. We have added findings in the results, “To eliminate interference from host species divergence in gut microbiota composition, we supplemented mouse experiments using FMT from control macaques (HFT group) (Figure S4A). By day 30, the HFT group exhibited significantly lower body weight than the untreated control group (p < 0.05) (Figure S4B). Throughout the experimental period, FPG levels in both HFT and control groups remained within the normal range (< 6 mmol/L) without significant differences, indicating that transplantation of control macaque microbiota did not induce glycemic alterations (Figure S4C).” (Lines 276-283), and “Integrating 16S rRNA sequencing data from the HFT, FT, and FTPA groups showed that the antibiotic treatment effectively depleted the gut microbiota, resulting in microbial diversity decreased sharply, with the dominant phyla shifting from Bacteroidota and Bacillota to Pseudomonadota (Figure S4D-G). The HFT group restored microbial diversity within 30 days, achieving community proportions comparable to untreated controls. Core functional phyla (Bacteroidota and Bacillota) stably colonized in HFT group (Figure S4D-I). Critically, FT and FTPA groups exhibited increased Lachnospiraceae (including genera Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium) compared with the HFT group on day 30. In addition, LEfSe comparison identified significant R. gnavus (current name: M. gnavus) enrichment in the FT group (LDA > 3, p < 0.01) (Figure S4J-M).” (Lines 324-334, 825-837). Specifically:

      (1) Experimental design: transplant preparation and FMT from control macaques

      After single-cage housing and FPG testing, fecal samples from three control macaques were collected and pooled for transplant preparation. Then, 4 mL of diluent (Berland et al., 2021) was added per gram of feces. Sodium L-ascorbic acid (5% (w/v)) and L-cysteine hydrochloride monohydrate (0.1% (w/v)) were added to all suspensions (the sterile diluent given to the control group was supplemented with the same amounts of these reagents). The mixture was homogenized and filtered sequentially through 200, 400, and 800 μm sterile mesh screens. The filtrate was centrifuged (600 × g, 5 min), and supernatants were aliquoted (400 μL/tube) for storage at -80 °C. For use, the transplant was quickly thawed in a 37 °C water bath.
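
      The proportions in the preparation step above can be written out as a small calculation. This is only a sketch: the function name and the example mass are hypothetical, and it assumes that the % (w/v) values are expressed relative to the volume of added diluent, which the response does not state explicitly.

```python
def transplant_recipe(feces_g, diluent_ml_per_g=4.0,
                      ascorbate_pct_wv=5.0, cysteine_pct_wv=0.1):
    """Return diluent volume (mL) and additive masses (g) for a fecal suspension.

    Defaults follow the ratios quoted in the response: 4 mL diluent per gram
    of feces, 5% (w/v) ascorbate salt, 0.1% (w/v) L-cysteine hydrochloride.
    """
    if feces_g <= 0:
        raise ValueError("feces_g must be positive")
    diluent_ml = feces_g * diluent_ml_per_g
    # % (w/v) means grams per 100 mL; applying it to the diluent volume
    # is an assumption of this sketch, not something the source specifies.
    return {
        "diluent_ml": diluent_ml,
        "ascorbate_g": diluent_ml * ascorbate_pct_wv / 100.0,
        "cysteine_g": diluent_ml * cysteine_pct_wv / 100.0,
    }


if __name__ == "__main__":
    # Hypothetical example: 10 g of pooled feces.
    print(transplant_recipe(10.0))
```

      Under these assumptions, 10 g of feces would call for 40 mL of diluent; the additive masses scale linearly with that volume.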

      Specific-pathogen-free male C57BL/6J mice aged 6 weeks were randomized into control and HFT (receiving FMT from control macaques) groups. Mice received antibiotic water (ampicillin, neomycin sulfate, and metronidazole, 1 g/L each) from days -14 to 0. All mice were maintained under standard conditions (12 h light/dark, 22-25°C, 40-60% humidity) with sterile diet and twice-daily water changes. Body weight and fasting plasma glucose (FPG) were monitored, and fecal samples were collected throughout the study for 16S rRNA sequencing (Figure S4). The study was approved by the Ethics Committee of College of Life Sciences, Sichuan University, and conducted in accordance with the local legislation and institutional requirements.

      (2) Results

      Body weight monitoring revealed no significant difference between HFT and control groups before (-14D) and after (0D) antibiotic treatment. By day 30, the HFT group exhibited significantly lower body weight than the untreated control group (p < 0.05) (Figure S4B). Throughout the experimental period, FPG levels in both HFT and control groups remained within the normal range (< 6 mmol/L) without significant differences, indicating that transplantation of control macaque microbiota did not induce glycemic alterations (Figure S4C).

      Shannon and Simpson indices showed a significant reduction in gut microbiota diversity after antibiotic treatment (0D) (p < 0.01) (Figure S4D,E). The intestinal microbiota of normal mice (-14D) was predominantly composed of Bacteroidota and Bacillota. After two weeks of antibiotic treatment (0D), microbial diversity decreased sharply compared to the -14D group, with the dominant phyla shifting from Bacteroidota and Bacillota to Pseudomonadota (Author response image 1A; Figure S4L). In healthy gut homeostasis, obligate anaerobes such as Bacillota and Bacteroidota maintain intestinal equilibrium. Antibiotic disruption induced dysbiosis in mice, causing substantial restructuring of fecal microbial composition. During dysbiosis, colon epithelial cells shift to anaerobic glycolysis for energy production, increasing epithelial oxygenation and driving expansion of facultative anaerobic Pseudomonadota (de Nies et al., 2023; Szajewska et al., 2024).
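
      For readers less familiar with the two diversity measures named above, the following sketch computes them from a vector of taxon counts. It assumes the natural-log Shannon index and the Gini-Simpson form (1 minus the sum of squared proportions); the response does not say which variants or which software were used, and the example counts are invented purely for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p_i^2); higher means more diverse."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    even = [25, 25, 25, 25]   # hypothetical evenly mixed community
    skewed = [97, 1, 1, 1]    # hypothetical community dominated by one taxon
    for name, counts in (("even", even), ("skewed", skewed)):
        print(name, shannon_index(counts), simpson_index(counts))
```

      Both indices drop when a single taxon dominates, which is the pattern reported for the post-antibiotic (0D) samples.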

      NMDS analysis of integrated 16S rRNA sequencing data of FTPA30D (PA diet and FMT from T2DM macaques) and FT30D (normal diet and FMT from T2DM macaques) revealed high intra-group repeatability among pre-antibiotic (-14D), post-antibiotic (0D), HFT30D, T2DM microbiota transplant (TP), and control transplant (HTP) groups. The 0D group showed maximal separation from other clusters, while the -14D, control30D, and HFT30D groups clustered closely together, with HFT30D nearest to control30D (Figure S4F). On day 30, all groups showed restoration of microbiota community structure, and the composition of the gut microbiota in HFT30D was largely consistent with that of the control30D group at all taxonomic levels (Author response image 1A-C). At the phylum level, the HFT30D group showed significantly reduced relative abundance of Pseudomonadota and increased abundance of Bacteroidota, Bacillota_A, Bacillota_I, and gut barrier-enhancing Verrucomicrobiota (Author response image 1A). These findings demonstrated that FMT from control macaques effectively restored the gut microbiota of antibiotic-treated mice toward a normative state.

      Author response image 1.

      Composition of gut microbiota in mice. (A) Phylum level; (B) Family level; (C) Genus level.

      At the phylum level, the FT30D and FTPA30D groups exhibited lower proportions of Bacteroidota/Bacillota compared to the HFT30D group (Author response image 1A). Family-level analysis revealed markedly increased abundance of Lactobacillaceae and Lachnospiraceae in the FTPA30D and FT30D groups relative to HFT30D, consistent with the changes in the microbiota of spontaneously T2DM macaques (Author response image 1B). Notably, while both HTP and TP groups contained Lachnospiraceae, only FT30D and FTPA30D mice showed a significant increase in this family, reaching a level close to that in the TP group. Although Muribaculaceae and Bacteroidaceae showed partial recovery in these groups, their relative abundances remained substantially lower than in the control30D and HFT30D groups, suggesting that microbiota transplantation from T2DM macaques may reduce specific beneficial taxa while promoting expansion of conditionally pathogenic or metabolically altered bacteria, such as Lachnospiraceae.

      Further analysis of Lachnospiraceae dynamics revealed that at the genus level, most Lachnospiraceae members exhibited higher abundance in the TP group compared to the HTP group. FT30D and FTPA30D groups showed increased abundance of Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium relative to HFT30D group, consistent with prior analyses (Figure S4). LEfSe comparison between FT30D and HFT30D identified significantly enriched Ruminococcus gnavus (current name: Mediterraneibacter gnavus) in FT30D recipients (LDA > 3, p < 0.01), corroborating earlier findings (Figure S4L). As a mucin-degrading microbe, R. gnavus (current name: M. gnavus) promotes insulin resistance through modulation of tryptamine/phenethylamine levels (Zhai et al., 2023) and exhibits pro-inflammatory properties (Henke et al., 2019; Paone and Cani, 2020). The absence of R. gnavus (current name: M. gnavus) enrichment in FTPA30D was potentially related to differential long-term impacts of T2DM microbiota transplantation across the 30- versus 120-day experimental timelines.

      Author response image 2.

      Identification of differential microbiota in mice. (A) Linear discriminant analysis Effect Size (LEfSe) analysis between pre-antibiotic (-14D) and post-antibiotic (0D) groups; (B) HFT and FTPA groups; (C) HFT and FT groups.

      References:

      Berland M., Cadiou J., Levenez F., Galleron N., Quinquis B., Thirion F., Gauthier F., Le Chatelier E., Plaza Oñate F., Schwintner C., et al. (2021) High engraftment capacity of frozen ready-to-use human fecal microbiota transplants assessed in germ-free mice. Sci. Rep. 11. https://doi.org/10.1038/s41598-021-83638-7

      Szajewska H., Scott K.P., Meij T de., Forslund-Startceva S.K., Knight R., Koren O., Little P., Johnston B.C., Łukasik J., Suez J., Tancredi D.J., Sanders M.E. (2024) Antibiotic-perturbed microbiota and the role of probiotics. Nat. Rev. Gastro. Hepat. 1-18. https://doi.org/10.1038/s41575-024-01023-x

      de Nies L., Kobras C.M., Stracy M. (2023) Antibiotic-induced collateral damage to the microbiota and associated infections. Nat. Rev. Microbiol. 21:789-804. https://doi.org/10.1038/s41579-023-00936-9

      Zhai L., Xiao H., Lin C., Wong H.L.X., Lam Y.Y., Gong M., Wu G., Ning Z., Huang C., Zhang Y., et al. (2023) Gut microbiota-derived tryptamine and phenethylamine impair insulin sensitivity in metabolic syndrome and irritable bowel syndrome. Nat. Commun. 14. https://doi.org/10.1038/s41467-023-40552-y

      Henke M.T., Kenny D.J., Cassilly C.D., Vlamakis H., Xavier R.J., Clardy J. (2019) Ruminococcus gnavus, a member of the human gut microbiome associated with Crohn's disease, produces an inflammatory polysaccharide. Proc. Nat. Acad. Sci. 116:12672-12677. https://doi.org/10.1073/pnas.1904099116

      Paone P., Cani P.D. (2020) Mucus barrier, mucins and gut microbiota: the expected slimy partners? Gut 69:2232-2243. https://doi.org/10.1136/gutjnl-2020-322260

      Comment 5: Additionally, the palmitic acid-containing diets fed to mice to induce a diabetes-like condition do not mimic spontaneous T2DM in macaques.

      Thanks for your helpful suggestion. We agree that the palmitic acid (PA)-containing diet alone could not fully mimic spontaneous T2DM in macaques. In our study, the PA diet was employed in mouse experiments to investigate whether gut microbiota modulates serum PA levels and mediates T2DM progression. Our critical finding revealed that microbiota was essential for enhanced PA absorption, while simply increasing dietary levels of PA did not effectively enhance intestinal uptake. The fecal microbiota transplantation (FMT) combined with PA-diet approach successfully induced prediabetic states in mice, which can be further applied to the induction of T2DM in macaques. We have added future expectations in the part of the discussion, “Our study highlights the essential roles of gut microbiota in T2DM development, which may account for the inability of prior studies to induce T2DM in macaques through high-fat diet intervention alone (28, 29). Furthermore, applying this approach to induce T2DM in macaques will enable deeper investigation into gut-microbiota-driven mechanisms underlying disease pathogenesis.” (Lines 393-398).

      Reviewer #1 (Recommendations for the authors):

      General comments

      Comment 1: The authors used macaques in this study. The author claims that macaques may be the best animal model to investigate the relationships among gut microbiota, lipid metabolites, and the host in type 2 diabetes (T2DM). However, there have already been some studies investigating these relationships in humans (for example, doi: 10.1016/j.cmet.2022.12.013, and doi: 10.1038/s41586-023-06466-x). The authors should cite and discuss these papers.

      We thank the reviewer for this suggestion. We have cited the two papers in the part of discussion, “Previous studies have shown that insulin-resistant patients exhibit increased fecal monosaccharides associated with microbial carbohydrate metabolism (70). Furthermore, commensal species of Lachnospiraceae actively overproduce long-chain fatty acids during metabolic dysfunction through altered bacterial lipid metabolism. The microbe-derived fatty acids impair intestinal epithelial integrity to exacerbate metabolic dysregulation (71).” (Lines 426-432).

      Specific comments

      Major:

      Comment 2: (1) First of all, sex and age of the T2DM and control groups are different (Suppl Table 1). Since the size of the captive population is 1,698, the authors should be able to select the factors including the sex and age of the control group to match those of the T2DM group and they should do so.

      In this work, we obtained eight spontaneous T2DM macaques with FPG ≥ 7 mmol/L and eight healthy control macaques with FPG ≤ 6.1 mmol/L (three consecutive tests at one-month intervals) from a population of 1,698 captive macaques. To keep confounding factors from affecting the investigated macaques, all animals were individually housed with standardized diets and environmental controls. Although age and sex were only partially matched, the controls originated from the same population to minimize confounding. The T2DM and control groups were matched for age category (5 adults and 3 elderly individuals) and had comparable mean ages (mean age of T2DM individuals = 12.88, mean age of control individuals = 11.25) (Table S1). Regarding sex matching, we compared blood metabolome data of 12 healthy adult female and 12 healthy adult male macaques from another study (Liu et al., 2023) and found only a very small number of differential metabolites, none of which were associated with tryptophan (Author response table 1). We acknowledge this limitation and will prioritize matched controls in future studies.

      References:

      Liu X., Liu X.Y., Wang X.Q., Shang K., Li J.W., Lan Y., Wang J., Li J., et al. (2023) Multi-Omics Analysis Reveals Changes in Tryptophan and Cholesterol Metabolism before and after Sexual Maturation in Captive Macaques. BMC Genomics 24:308. https://doi.org/10.1186/s12864-023-09404-3

      Comment 3: (2) Are the normal ranges known for the parameters of macaques shown in Table 1? If so, the authors should include those values in Table 1. If not, the authors should show the values of average and SD or SE of all 1,698 individuals as the reference.

      We thank the reviewer for this suggestion. In this study, the normal ranges of fasting plasma glucose (FPG), fasting plasma insulin (FPI), homeostasis model assessment of insulin resistance (HOMA-IR), and glycosylated hemoglobin A1c (HbA1c) were referenced against human standards. According to the American Diabetes Association (ADA) criteria for glucose metabolism status and the diagnosis of diabetes, individuals with FPG ≥ 7 mmol/L were diagnosed as T2DM subjects, and individuals with FPG ≤ 6.1 mmol/L were controls. More sensitive assays show a normal fasting plasma insulin level to be under 12 μU/mL (Matsuda and DeFronzo, 1999). HOMA-IR ≥ 2.67 indicated the possibility of insulin resistance, a threshold used in clinical diagnosis (Lorenzo et al., 2012). HbA1c percentages higher than 6.5% were used as an auxiliary diagnostic index for diabetic macaques (Cowie et al., 2010). The normal ranges of triglycerides (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL), and low-density lipoprotein cholesterol (LDL) were referenced against the blood lipid index of rhesus macaques (Yu et al., 2019). We have added the normal ranges of these parameters to Table 1: “FPG: fasting plasma glucose (normal range: ≤ 6.1 mmol/L); FPI: fasting plasma insulin (normal range: ≤ 12 μU/mL); HOMA-IR: homeostasis model assessment of insulin resistance (normal range: ≤ 2.67); BMI: body mass index; HbA1c: glycosylated hemoglobin A1c (normal range: < 6.5%); TG: triglycerides (normal range: 0.95±0.47 mmol/L); TC: total cholesterol (normal range: 3.06±0.98 mmol/L); HDL: high-density lipoprotein cholesterol (normal range: 1.62±0.46 mmol/L); LDL: low-density lipoprotein cholesterol (normal range: 2.47±0.98 mmol/L). (30, 31, 32, 33).”

      References:

      Matsuda M., DeFronzo R.A. (1999) Insulin sensitivity indices obtained from oral glucose tolerance testing: comparison with the euglycemic insulin clamp. Diabetes Care 22:1462-1470. https://doi.org/10.2337/diacare.22.9.1462

      Lorenzo C., Hazuda H.P., Haffner S.M. (2012) Insulin resistance and excess risk of diabetes in Mexican-Americans: the San Antonio Heart Study. J. Clin. Endocr. Metab. 97:793-799. https://doi.org/10.1210/jc.2011-2272

      Cowie C.C., Rust K.F., Byrd-Holt D.D., Gregg E.W., Ford E.S., Geiss L.S., Bainbridge K.E., Fradkin J.E. (2010) Prevalence of diabetes and high risk for diabetes using A1C criteria in the US population in 1988–2006. Diabetes Care 33:562-568. https://doi.org/10.2337/dc09-1524

      Yu W., Hao X., Yang F., Ma J., Zhao Y., Li Y., Wang J., Xu H., Chen L., Liu Q., et al. (2019) Hematological and biochemical parameters for Chinese rhesus macaque. PLoS One 14:e0222338. https://doi.org/10.1371/journal.pone.0222338
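
      The glycemic thresholds quoted in the response to Comment 3 can be illustrated with a short sketch. It assumes the conventional HOMA-IR formula, FPG (mmol/L) × FPI (μU/mL) / 22.5, which the response does not spell out; the function names and the example values are hypothetical, and only the cut-offs (6.1 mmol/L, 12 μU/mL, 2.67, 6.5%) come from the text above.

```python
def homa_ir(fpg_mmol_l, fpi_uU_ml):
    """HOMA-IR using the conventional formula FPG * FPI / 22.5 (assumed here)."""
    if fpg_mmol_l <= 0 or fpi_uU_ml <= 0:
        raise ValueError("FPG and FPI must be positive")
    return fpg_mmol_l * fpi_uU_ml / 22.5


def flag_glycemic_indices(fpg_mmol_l, fpi_uU_ml, hba1c_pct):
    """Compare one animal's values with the cut-offs quoted in the response."""
    score = homa_ir(fpg_mmol_l, fpi_uU_ml)
    return {
        "homa_ir": score,
        "fpg_in_control_range": fpg_mmol_l <= 6.1,    # FPG <= 6.1 mmol/L
        "fpi_in_normal_range": fpi_uU_ml <= 12.0,     # FPI <= 12 uU/mL
        "possible_insulin_resistance": score >= 2.67, # HOMA-IR >= 2.67
        "hba1c_above_threshold": hba1c_pct > 6.5,     # HbA1c > 6.5%
    }


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    print(flag_glycemic_indices(5.0, 9.0, 5.5))
    print(flag_glycemic_indices(7.5, 12.0, 7.0))
```

      The flags are reported separately rather than combined into a single diagnosis, since the response treats HbA1c only as an auxiliary index.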

      Comment 4: (3) The authors measured the fasting plasma glucose (FPG) levels, but it is common to measure whole blood glucose since glucose is consumed during the processing of obtaining plasma which could compromise the results. Please explain why plasma glucose levels were measured.

      The criteria for screening spontaneous T2DM macaques followed the American Diabetes Association (ADA) criteria for glucose metabolism status and the diagnosis of diabetes. Individuals with FPG ≥ 7 mmol/L were diagnosed as T2DM subjects, and individuals with FPG ≤ 6.1 mmol/L were controls. For the identified subjects, three FPG tests were performed at one-month intervals to reduce possible error. These individuals were housed in single cages, and blood samples were collected after an overnight fast of at least 12 h. After the three test results met the criteria, venous blood was collected for FPG testing to maximize the reliability of the data. We have added the FPG values from the three tests to Table S1.
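
      The screening rule described above (three monthly FPG tests, all of which must meet the same threshold) can be sketched as follows. The function name and the "unclassified" label for animals meeting neither criterion are hypothetical; the response does not say how such animals were handled.

```python
def classify_by_fpg(fpg_values_mmol_l):
    """Classify an animal from three consecutive monthly FPG tests (mmol/L).

    'T2DM' if every test is >= 7.0, 'control' if every test is <= 6.1,
    otherwise 'unclassified' (a placeholder label for this sketch).
    """
    if len(fpg_values_mmol_l) != 3:
        raise ValueError("exactly three FPG measurements are expected")
    if all(v >= 7.0 for v in fpg_values_mmol_l):
        return "T2DM"
    if all(v <= 6.1 for v in fpg_values_mmol_l):
        return "control"
    return "unclassified"


if __name__ == "__main__":
    # Hypothetical readings, not data from Table S1.
    print(classify_by_fpg([7.4, 8.1, 7.9]))
    print(classify_by_fpg([4.8, 5.2, 5.0]))
    print(classify_by_fpg([6.5, 7.2, 5.9]))
```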

      Comment 5: (4) Since the BMI of the T2DM and control groups did not significantly differ (p>0.05, Table 1), the food intake of the two groups may not significantly differ as well. The authors should examine the food intake data. The food intake is also important in considering the relevance of feeding the PA diet in mice experiments. Were the intake of T2DM macaques including PA more than the control group?

      All macaques in this study were individually housed under standardized environments with timed and measured feeding to minimize confounders. Given the non-significant BMI difference between T2DM and control groups, food intake was probably not significantly different. Our findings highlight the essential roles of gut microbiota in T2DM development, and this is probably also why previous studies that used only a high-fat diet failed to induce T2DM in macaques (Ji et al., 2012; Tang, 2020). We agree that PA intake in T2DM macaques warrants focused investigation. Future investigations will incorporate detailed dietary monitoring, including palmitic acid (PA) intake and nutrient composition, to examine potential relationships between specific dietary components, metabolic parameters, and diabetes progression.

      References:

      Ji F., Jin L., Zeng X., Zhang X., Zhang Y., Sun Y., Gao L., He H., Rao J., Liu X., et al. (2012) Comparison of gene expression between naturally occurring and diet-induced T2DM in cynomolgus monkeys. Dongwuxue Yanjiu 33:79–84. https://doi.org/10.3724/SP.J.1141.2012.01079

      Tang M.T. (2020) Study on the Role of Glucose and Lipid in the Establishment of Type 2 Diabetic Cynomolgus Monkey Model. M.S. Thesis, Dept. Veterinary Med., South China Agricultural Univ.

      Comment 6: (5) It may be that the fecal microbiome of the T2DM macaques is involved in the pathogenesis of T2DM; however, it is more important how the gut microbiota compositions were obtained/established by those T2DM macaques. There was no description of when the fecal samples were collected during the course of T2DM. If it was after T2DM symptoms appeared, the authors should perform gut metagenome and also gut metabolome analyses to see the change in those parameters to try to understand how gut microbiome changes are induced leading to T2DM pathogenesis.

      The spontaneous T2DM macaques had not been treated with glucose-lowering drugs or antibiotics, so the microbiota dysbiosis they showed was driven purely by disease progression. After macaques met the diagnostic thresholds across three FPG assessments (at one-month intervals), we collected fresh fecal samples and stored them aseptically at -80 °C until analysis. The scarcity of spontaneous T2DM macaques precludes invasive sampling and restricts tissue collection to naturally deceased diabetic individuals, which prevented us from explicitly defining the disease stage of the T2DM individuals. We recognize the scientific value of gut metagenomic and metabolomic analyses for tracking microbiome changes during diabetes progression. This study explored the interaction of gut microbiota and metabolites in T2DM macaques, and future studies can investigate their dynamic changes over the course of T2DM.

      Comment 7: (6) Regarding the fatty acids, the authors only measured them in the plasma, but they also should measure in feces, since the authors focus on gut microbiota; in addition, a recent report showed fecal fatty acids, especially elaidic acid, contributed the pathogenesis of obesity and T2DM by acting on the gut epithelial cells (doi: 10.1016/j.cmet.2022.12.013). Besides, this study showed the link between a Lachnospiraceae species and fecal palmitic and elaidic acids, which the authors also focused on in this manuscript.

      We thank the reviewer for this suggestion. This study employed untargeted metabolomics on macaque fecal samples to identify metabolites associated with spontaneously developing T2DM. To validate the metabolites identified through the untargeted metabolomic analysis, we conducted targeted medium- and long-chain fatty acid (MLCFA) metabolomics on macaque serum, and we further quantitatively examined the content of palmitic acid (PA) in mice feces, ileum, and serum. Although targeted MLCFA metabolomics was not performed on macaque fecal samples, we did perform untargeted metabolomics on macaque feces and confirmed the contribution of PA in mice that underwent fecal microbiota transplantation (FMT) from T2DM macaques. We have added future expectations in the part of the discussion, “Previous studies have shown that insulin-resistant individuals exhibit increased fecal monosaccharides associated with microbial carbohydrate metabolism (70). Furthermore, commensal species of Lachnospiraceae actively overproduce long-chain fatty acids during metabolic dysfunction through altered bacterial lipid metabolism. The microbe-derived fatty acids impair intestinal epithelial integrity to exacerbate metabolic dysregulation (71). Given that microbial metabolic activity causally modulates host metabolic homeostasis, the content change of PA was potentially associated with a dynamic equilibrium between host absorption and microbial metabolism. Further integrative studies on the fecal fatty acid metabolome, microbial PA metabolism, and functional pathways will be crucial for delineating causal links between dysbiosis and lipid metabolic dysfunction in T2DM.” (Lines 426-437).

      Comment 8: (7) In FMT and PA diet experiments, SPF mice were used as the control group. However, the gut microbiota composition of the SPF mice is markedly different from that of macaques; the difference must be much bigger than the difference between T2DM and healthy control macaques; therefore, mice with FMT from healthy control macaques have to be used as the control group. As mentioned above (in point #4), is the feeding of mice with PA diet a relevant model reflecting the condition observed in macaques in this study?

      Thanks for your helpful suggestion. We recognized the importance of an FMT control group and supplemented the mouse experiments (using the C57BL/6J strain) with FMT from control macaques (HFT group). Another group of mice without FMT served as the control. Due to the lengthy experimental period, observations were concluded at 30 days post-FMT. We compared changes in the gut microbiota before and after antibiotic treatment in mice (-14D and 0D), and tracked body weight and fasting plasma glucose (FPG) levels from day -14 to day 30. At 30 days after FMT, fecal samples from all groups were collected for 16S rRNA sequencing. Additionally, samples of the T2DM microbiota transplant (TP) and the control transplant (HTP) were sequenced. Finally, we integrated the 16S sequencing data from the FTPA group (palmitic acid (PA) diet and FMT from T2DM macaques) and FT group (normal diet and FMT from T2DM macaques) at day 30 for combined analysis. The results showed that the antibiotic treatment used in this study effectively depleted the gut microbiota. Following FMT, gut microbial diversity stabilized within 30 days, with similar microbial community proportions between HFT and control groups. Core functional groups of the healthy microbiota (Bacteroidota and Bacillota) stably colonized mice despite host species divergence, confirming that T2DM phenotypes originate specifically from macaque microbiota. Importantly, increased abundance of Lachnospiraceae (including genera Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium) and the key species Ruminococcus gnavus (current name: Mediterraneibacter gnavus) were also observed in the FT group versus the HFT group on day 30, validating our original findings. We have added these findings to the Results: “To eliminate interference from host species divergence in gut microbiota composition, we supplemented mouse experiments using FMT from control macaques (HFT group) (Figure S4A). By day 30, the HFT group exhibited significantly lower body weight than the untreated control group (p < 0.05) (Figure S4B). Throughout the experimental period, FPG levels in both HFT and control groups remained within the normal range (< 6 mmol/L) without significant differences, indicating that transplantation of control macaque microbiota did not induce glycemic alterations (Figure S4C).” (Lines 276-283), and “Integrating 16S rRNA sequencing data from the HFT, FT, and FTPA groups showed that the antibiotic treatment effectively depleted the gut microbiota, resulting in microbial diversity decreased sharply, with the dominant phyla shifting from Bacteroidota and Bacillota to Pseudomonadota (Figure S4D-G). The HFT group restored microbial diversity within 30 days, achieving community proportions comparable to untreated controls. Core functional phyla (Bacteroidota and Bacillota) stably colonized in HFT group (Figure S4D-I). Critically, FT and FTPA groups exhibited increased Lachnospiraceae (including genera Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium) compared with the HFT group on day 30. In addition, LEfSe comparison identified significant R. gnavus (current name: M. gnavus) enrichment in the FT group (LDA > 3, p < 0.01) (Figure S4J-M).” (Lines 324-334, 825-837).

      We agree that the PA-containing diet alone could not fully mimic spontaneous T2DM in macaques. In our study, the PA diet was employed in mouse experiments to investigate whether gut microbiota modulates serum PA levels and mediates T2DM progression. Our critical finding revealed that microbiota was essential for enhanced PA absorption, while simply increasing dietary levels of PA did not effectively enhance intestinal uptake. The FMT combined with PA-diet approach successfully induced prediabetic states in mice, which can be further applied to the induction of T2DM in macaques. We have added future expectations in the part of the discussion, “Our study highlights the essential roles of gut microbiota in T2DM development, which may account for the inability of prior studies to induce T2DM in macaques through high-fat diet intervention alone (28, 29). Furthermore, applying this approach to induce T2DM in macaques will enable deeper investigation into gut-microbiota-driven mechanisms underlying disease pathogenesis.” (Lines 393-398).

      Comment 9: FPG was measured here in the mouse experiments, but there was no description of whether mice were under fasting conditions, and this should be clarified. If there are no fasting durations, this should be described in the Materials and Methods section.

      As suggested, we have added description to the Materials and Methods section, “Throughout the experiment, body weight and feces were collected every month, FPG was detected every half month under fasting at least 12 h.” (Lines 619-620).

      Comment 10: From the PA contents in feces, ileum, and serum in mice (Figures 5A-D), the authors concluded that the absorption of PA was significantly enhanced in the ileum leading to the increase of PA in serum. However, it could also be possible that consumption of PA by gut microbiota occurs at the same time and the authors should discuss the possibility.

      We thank the reviewer for spotting this. We have added a discussion to the manuscript, “Previous studies have shown that insulin-resistant individuals exhibit increased fecal monosaccharides associated with microbial carbohydrate metabolism (70). Furthermore, commensal species of Lachnospiraceae actively overproduce long-chain fatty acids during metabolic dysfunction through altered bacterial lipid metabolism. The microbe-derived fatty acids impair intestinal epithelial integrity to exacerbate metabolic dysregulation (71). Given that microbial metabolic activity causally modulates host metabolic homeostasis, the content change of PA was potentially associated with a dynamic equilibrium between host absorption and microbial metabolism. Further integrative studies on the fecal fatty acid metabolome, microbial PA metabolism, and functional pathways will be crucial for delineating causal links between dysbiosis and lipid metabolic dysfunction in T2DM.” (Lines 426-437).

      Comment 11: (8) Nomenclature and classification of bacteria has been revised by the List of Prokaryotic names with Standing in Nomenclature (LPSN) (https://lpsn.dsmz.de/) and recognized as Global Core Biodata Resource in 2023. For example, Ruminococcus gnavus is now Mediterraneibacter gnavus. Therefore, the name of microbes should be corrected accordingly; one proposal is to show the revised correct name with the previous name in parenthesis, such as "Mediterraneibacter gnavus (previously Ruminococcus gnavus)".

      Thank you for pointing this out. We have corrected the microbial names to “Ruminococcus (current name: Mediterraneibacter)”, “Ruminococcus gnavus (current name: Mediterraneibacter gnavus)”, and “R. gnavus (current name: M. gnavus)” (Lines 146, 313, 316-317, 336, 345, 367-368, 401, 404-405, 409, 448, 764-765).

      Minor:

      Comment 12:

      (1) The sentence starting "A total of..." (lines 143-144) seems grammatically wrong; a word such as "represented" should be inserted after "differentially", or alternatively "differentially" should be "differential"?

      (2) "medium-and" (line 220) needs a space between "medium-" and "and" to make it "medium- and".

      (3) Abbreviations should be spelled out when they appear for the first time in the main text; for example, WBC, NEU, and LYM in line 237.

      (4) Should FGP (line 437) be FPG?

      (5) What is the definition of "prediabetes" in mice? Is this clearly defined elsewhere?

      We sincerely thank the reviewer for the careful reading. As suggested, we have revised the statements as requested:

      (1) Line 143: “A total of 21 microbes were identified as differential microbes”.

      (2) Line 221: “targeted medium- and long-chain fatty acid”.

      (3) Lines 238-239: “white blood cell (WBC)”, “neutrophil (NEU)”, and “lymphocyte (LYM)”.

      (4) Line 472: “FPG, HbA1c and FPI were detected”.

      (5) Prediabetes, or impaired glucose regulation (IGR), is diagnosed when blood glucose is higher than normal yet below the diabetic threshold, and it is even more prevalent than T2DM in the population (American Diabetes Association, 2021). Given the higher glycemic diagnostic criteria in mice, we assessed diabetic manifestations by integrating physiological and pathological evidence. Compared to control mice, those receiving FMT from T2DM macaques combined with a high-palmitic-acid diet (FTPA group) developed prediabetic characteristics by day 120. Physiological alterations included elevated fasting plasma glucose (FPG), increased fasting plasma insulin (FPI), impaired glucose tolerance, heightened insulin resistance, weight gain, and elevated serum total cholesterol (TC) and triglyceride (TG) levels. Among the pathological changes, focal hepatocyte necrosis with inflammatory cell infiltration was commonly observed in the FTPA group, alongside decreased pancreatic islet volume and inflammatory cell infiltration (lines 258-276).

      References:

      American Diabetes Association (2021) 2. Classification and diagnosis of diabetes: standards of medical care in diabetes—2021. Diabetes Care 44:S15-S33. https://doi.org/10.2337/dc21-S002

      Reviewer #2 (Public review):

      This study analyzes the interaction among the gut microbiota, lipid metabolism, and the host in type 2 diabetes (T2DM) using rhesus macaques. The authors first identified 8 macaques with T2DM from 1698 individuals. Then, they observed in T2DM macaques: dysbiosis by 16S rRNA gene amplicon analysis and shotgun sequencing, imbalanced tryptophan metabolism and fatty acid beta-oxidation in the feces by metabolome analysis, increased plasma concentration of palmitic acid by MS analysis, and an inflammatory gene signature of blood cells by transcriptomic analysis. Finally, they transplanted feces of T2DM macaques into mice and fed them with palmitic acid and showed that those mice became diabetic through increased absorption of palmitic acid in the ileum.

      Comment 1: This study clearly shows the interaction among gut microbiota, lipid metabolism, and the host in T2DM. The experiments were well designed and performed, and the data are convincing. One point I would suggest is that in the experiments of mice with FMT, control mice should be those colonized with feces of healthy macaques, but not with no FMT.

      See response to Reviewer 1, Public review comment 4.

    1. eLife Assessment

      This study provides valuable evidence indicating that SynGap1 regulates the synaptic drive and membrane excitability of parvalbumin- and somatostatin-positive interneurons in the auditory cortex. Since haplo-insufficiency of SynGap1 has been linked to intellectual disabilities without a well-defined underlying cause, the central question of this study is timely. The experimental data is solid, as in their revisions the authors successfully addressed questions related to changes in thalamocortical presynaptic excitability, the contradiction between spontaneous and mini EPSCs data, and the anatomical analysis of excitatory synapses.

    2. Reviewer #2 (Public review):

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haplo-insufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      This is the third revision of the manuscript that has improved further, and the main issues were addressed. Specifically, the Authors addressed the contradiction of mEPSC and sEPSC data of the previous version by new experiments and revision of the manuscript text. While alternative explanations are still possible, the new control experiments provide necessary background for reproducibility and the manuscript text puts the observations in the right context. Furthermore, the manuscript now appropriately emphasizes that anatomical analysis was restricted to somatic excitatory synapses. Thus, the readers will be aware of the potential limitations of these measurements.

      Strengths:

      The questions are novel and relevant. Most of the issues in the experimental design are solved or answered.

      Weaknesses:

      Despite the interesting and novel questions, there are potential alternative interpretations of the observations, but these cannot be addressed within the breadth of a single paper.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haplo-insufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel

      Weaknesses:

      Despite the interesting and novel questions, there are significant issues regarding the experimental design and potential misinterpretations of key findings. Consequently, the manuscript contributes little to our understanding of SynGap1 loss mechanisms.

      Major issues in the second version of the manuscript:

      In the review of the first version, I raised major issues and contradictions in the sEPSC and mEPSC data. These were not resolved after the revision, and the new control experiments rather confirmed the contradiction.

      In the original review I stated: "One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar." Contradictions remained after the revision of the manuscript. On one hand, the authors claimed in the revised version that "We found no difference in mEPSC amplitude between the two genotypes (Fig. 1g), indicating that the observed difference in sEPSC amplitude (Figure 1b) could arise from decreased network excitability". On the other hand, later they show "no significative difference in either amplitude or inter-event intervals between sEPSC and mEPSC, suggesting that in acute slices from adult A1, most sEPSCs may actually be AP independent." The latter means that sEPSCs and mEPSCs are the same type of events, which should have the same sensitivity to manipulations.

      We thank the reviewer for the detailed comments. Our results suggest a diverse population of PV+ cells, with varying reliance on action potential-dependent and -independent release. Several PV+ cells indeed show TTX sensitivity (reduced EPSC event amplitudes following TTX application; see new Supplementary Figure 2b-e), but their individual responses are diluted when all cells are pooled together. To account for this variability, we recorded sEPSCs followed by mEPSCs from more mice of both genotypes (new Figure 1f-j). Further, following the editors' and reviewers’ suggestions, we removed speculations about the role of network activity changes.

      In summary, our data confirmed that TTX blocked APs in PV+ cells and that recordings were stable, as indicated by the lack of changes in series resistance during the recording period in our experimental setup (new Suppl. Figure 2f-i). We found no difference in mEPSC amplitude between the two genotypes (Fig. 1g, right), indicating that the observed difference in sEPSC amplitude (Figure 1c, right) could be due to impaired AP-dependent release in cHet mice and the presence of large-amplitude sEPSCs that are preferentially affected by TTX in control mice (new Suppl. Figure 2b-e). Conversely, cHet mice showed longer inter-mEPSC time intervals (cumulative distribution in Figure 1g, left), and significantly lower charge transfer and DQ*f (Figure 1j) compared to control littermates, suggesting a decrease of glutamatergic presynaptic release sites onto PV+ cells.

      Concerns about the quality of the synapse counting experiments were addressed by showing additional images in a different and explaining quantification. However, the admitted restriction of the analysis of excitatory synapses to the somatic region represents a limitation, as these synapses include only a small fraction of the total excitation, even if the slightly larger amplitudes of their EPSPs are considered.

      We agree with the reviewer that restricting the anatomical analysis of excitatory synapses to the PV cell somatic region is a limitation, as highlighted in the discussion of the revised manuscript. Recent studies, based on serial block-face scanning electron microscopy, suggest that cortical PV+ interneurons receive more robust excitatory inputs to their perisomatic region as compared to pyramidal neurons (see for example, Hwang et al. 2021, Cerebral Cortex, http://doi.org/10.1093/cercor/bhaa378). It is thus possible that putative glutamatergic synapses, analysed by vGlut1/PSD95 colocalisation around PV+ cell somata, may represent a substantial fraction of the excitatory input population. Since analysing putative excitatory synapses onto PV+ dendrites would be difficult and require a much longer time, we re-phrased the text to more clearly highlight the rationale and limitation of this approach.

      New experiments using paired-pulse stimulation provided an answer to issues 3 and 4. Note that the numbering of the Figures in the responses and manuscript are not consistent.

      We are glad that the reviewer found that the new paired-pulse experiments answered previously raised concerns. We corrected the discrepancy in figure numbers in the manuscript. Thank you for noticing.

      I agree that the low sampling rate of the APs does not change the observed large differences in AP threshold; however, the phase plots are still inconsistent in the sense that there appears to be an offset, as all values are shifted to more depolarized membrane potentials, including threshold, AP peak, and AHP peak. This consistent shift may be due to non-biological differences between the two sets of recordings, and, importantly, it may negate the interpretation of the I/f curve results (Fig. 5e).

      We agree with the reviewers that a higher sampling rate would allow a more accurate assessment of different parameters, such as AP peak, half-width, rise time, etc., while it would not affect the large differences in AP threshold we observed between control and mutant mice. Since the phase plots do not add to our result analysis, we removed them from the revised manuscript.

      Additional issues:

      The first paragraph of the Results mentioned that the recorded cells were identified by immunolabelling and axonal localization. However, neither the Results nor the Methods mention the criteria and levels of measurements of axonal arborization.

      Recorded MGE-derived interneurons were filled with biocytin, and their identity was confirmed by immunolabeling for neurochemical markers (PV or SST) and analysis of anatomical properties. In particular, whole biocytin-positive immunolabelled neurons were acquired using a Leica SP8-DLS confocal microscope (20x objective, NA 0.75; Z-step 1 μm). For each imaged neuron, which was the result of multiple merged confocal stacks, we visually determined the spatial distribution of the axonal arbor across cortical layers and whether its dendrites carried spines. We added this information in the method section. Furthermore, to better represent our methodological approach, we added a new figure (Supplemental Figure 1) including 1) two examples of PV+ interneurons, showing dendrites devoid of spines and axons spreading from Layer II to Layer V (new Suppl. Figure 1a); and 2) two examples of SST+ interneurons showing dendrites with spines and axons projecting from Layer IV to Layer I, where they gave rise to multiple collaterals (new Suppl. Figure 1b).

      The other issues of the first review were adequately addressed by the Authors and the manuscript improved by these changes.

      We are happy the reviewer found that the other issues were well addressed.

      Reviewer #3 (Public review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences between control and mutants in both interneuron populations, although they claim a predominance in PV+ cells. These results suggest that altered PV interneuron functions in the auditory cortex may contribute to the network dysfunctions observed in Syngap1 haploinsufficiency-related intellectual disability.

      The subject of the work is interesting, and most of the approach is rather direct and straightforward, which are strengths. There are also some methodological weaknesses and interpretative issues that reduce the impact of the paper.

      (1) Supplementary Figure 3: recording and data analysis. The data of Supplementary Figure 3 show no differences either in the frequency or amplitude of synaptic events recorded from the same cell in control (sEPSCs) vs TTX (mEPSCs). This suggests that, under the experimental conditions of the paper, sEPSCs are AP-independent quantal events. However, I am concerned by the high variability of the individual results included in the Figure. Indeed, several datapoints show dramatically different frequencies in control vs TTX, which may be explained by unstable recording conditions. It would be important to present these data as time course plots, so that stability can be evaluated. Also, the claim of lack of effect of TTX should be corroborated by positive control experiments verifying that TTX is working (block of action potentials, for example). Lastly, it is not clear whether the application of TTX was consistent in time and duration in all the experiments and the paper does not clarify what time window was used for quantification.

      We understand the reviewer’s concern about high variability. To account for this variability, we recorded sEPSC followed by mEPSC from more mice of both genotypes (see new Figure 1f-j). We confirmed that TTX worked as expected several times through the time course of this study, in different aliquots prepared from the same TTX vial that was used for all experiments. The results of the last test we performed, showing that TTX application blocks action potentials in a PV+ cell, are depicted in new Suppl. Figure 2a. Furthermore, new Suppl. Figure 2f-i shows series resistance (Rs) over time for 4 different PV+ interneurons, indicating recording stability. These results are representative of the entire population of recorded neurons, which we have meticulously analysed one by one. TTX was applied using the same protocol for all recorded neurons. In particular, sEPSCs were first sampled over a 2 min period. A TTX (1μM; Alomone Labs)-containing solution was then perfused into the recording chamber at a flow rate of 2 mL/min. We then waited for 5 min before sampling mEPSCs over a 2 min period. We added this information in the revised manuscript methods.

      (2)  Figure 1 and Supplementary Figure 3: apparent inconsistency. If, as the authors claim, TTX does not affect sEPSCs (either in the control or mutant genotype, Supplementary Figure 3 and point 1 above), then comparing sEPSC and mEPSC in control vs mutants should yield identical results. In contrast, Figure 1 reports a _selective_ reduction of sEPSCs amplitude (not in mEPSCs) in mutants, which is difficult to understand. The proposed explanation relying on different pools of synaptic vesicles mediating sEPSCs and mEPSCs does not clarify things. If this was the case, wouldn't it also imply a decrease of event frequency following TTX addition? However, this is not observed in Supplementary Figure 3. My understanding is that, according to this explanation, recordings in control solution would reflect the impact of two separate pools of vesicles, whereas, in the presence of TTX, only one pool would be available for release. Therefore, TTX should cause a decrease in the frequency of the recorded events, which is not what is observed in Supplementary Figure 3.

      To account for the large variability and clarify these results, we recorded sEPSCs followed by mEPSCs from more mice of both genotypes (new Figure 1f-j). We found no difference in mEPSC amplitude between the two genotypes (Fig. 1g, right), indicating that the observed difference in sEPSC amplitude (Figure 1c, right) could be due to impaired AP-dependent release in cHet mice and the presence of large-amplitude sEPSCs that are preferentially affected by TTX in control mice (new Suppl. Figure 2b-e). Conversely, cHet mice showed longer inter-mEPSC time intervals (cumulative distribution in Figure 1g, left), and significantly lower charge transfer and DQ*f (Figure 1j) compared to control littermates, suggesting a decrease of glutamatergic presynaptic release sites. We rephrased the text in the revised manuscript according to the updated data and, following the reviewer’s suggestions, we removed speculations relying on different pools of synaptic vesicles.

      (3) Figure 1: statistical analysis. Although I do appreciate the efforts of the authors to illustrate both cumulative distributions and plunger plots with individual data, I am confused by how the cumulative distributions of Figure 1b (sEPSC amplitude) may support statistically significant differences between genotypes, but this is not the case for the cumulative distributions of Figure 1g (inter mEPSC interval), where the curves appear even more separated. A difference in mEPSC frequency would also be consistent with the data of Supplementary Fig 2b, which otherwise are difficult to reconcile. I would encourage the authors to use the Kolmogorov-Smirnov test rather than a t-test for the comparison of cumulative distributions.

      We thank the reviewer for this thoughtful suggestion. We recorded more mice of both genotypes and the updated data now show a significant difference between the cumulative distributions of the inter mEPSC intervals recorded from the two genotypes (new Figure 1g). For statistical analysis, we based our conclusion on the results generated by a linear mixed model (LMM), modelling animal as a random effect and genotype as a fixed effect. We used this analysis because we considered the mice as independent replicates and the cells in each mouse as repeated measures (Berryer et al. 2016; Heggland et al., 2019; Yu et al., 2022). For cumulative distributions, the same number of events was chosen randomly from each cell and analysed by LMM, modelling animal as a random effect and genotype as a fixed effect. We decided to use LMM for our statistical analyses because of the growing concern over reproducibility in biomedical research and the ongoing discussion on how data are analysed (see for example, Yu et al. (2022), Neuron 110:21-35, https://doi.org/10.1016/j.neuron.2021.10.030; Aarts et al. (2014), Nat Neurosci 17, 491–496, https://doi.org/10.1038/nn.3648). We acknowledge that patch-clamp data have historically been analysed using the t-test and analysis of variance (ANOVA), or equivalent nonparametric tests. However, these tests assume that individual observations (recorded neurons in this case) are independent of each other. Whether neurons from the same mouse are independent or correlated variables is an unresolved question, but independence does not appear likely from a biological point of view. Statisticians have developed effective methods to analyze correlated data, including LMM.
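
      The non-independence argument above can be illustrated with a minimal sketch. It is not the LMM used for the manuscript; it only computes the one-way intraclass correlation (ICC) of cells grouped by animal, plus the resulting design effect, to show how clustering shrinks the effective sample size. The function names and the example values are hypothetical and chosen for illustration only.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC(1); `groups` is a list of per-animal
    lists of cell measurements (group sizes may differ)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand = sum(sum(g) for g in groups) / n_total
    # Between-animal and within-animal sums of squares.
    ss_between = sum(n * (mean(g) - grand) ** 2 for g, n in zip(groups, sizes))
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Average group size adjusted for unbalanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


def design_effect(icc, mean_cells_per_animal):
    """Variance inflation caused by clustering of cells within animals."""
    return 1 + (mean_cells_per_animal - 1) * icc


# Hypothetical values (not recorded data): cells resemble each other
# within an animal far more than across animals.
cells_by_animal = [[10.0, 10.2, 9.8], [20.0, 20.2, 19.8]]
icc = icc_oneway(cells_by_animal)
effective_n = 6 / design_effect(icc, 3)
```

      With an ICC close to 1, the six hypothetical cells carry roughly the information of two independent observations, which is why treating each cell as an independent replicate in a t-test or ANOVA can overstate significance.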

      (4) Methods. I still maintain that a threshold at around -20/-15 mV for the first action potential of a train seems too depolarized (see some datapoints of Fig 5c and Fig 7c) for a healthy spike. This suggests that some cells were either in precarious conditions or that the capacitance of the electrode was not compensated properly.

      As suggested by the reviewer, in the revised figures we excluded the neurons with threshold at -20/-15 mV. In addition, we performed statistical analysis with and without these cells (data reported below) and found that whether these cells are included or excluded, the statistical significance of the results does not change.

      Fig.5c: including the 2 outliers from cHet group with values of -16.5 and -20.6 mV: -42.6±1.01 mV in control, n=33 cells from 15 mice vs -35.3±1.2 mV in cHet, n=40 cells from 17 mice, ***p<0.001, LMM; excluding the 2 outliers from cHet group: -42.6±1.01 mV in control, n=33 cells from 15 mice vs -36.2±1.1 mV in cHet, n=38 cells from 17 mice, ***p<0.001, LMM.

      Fig.7c: including the 2 outliers from cHet group with values of -16.5 and -20.6 mV: -43.4±1.6 mV in control, n=12 cells from 9 mice vs -33.9±1.8 mV in cHet, n=24 cells from 13 mice, **p=0.002, LMM; excluding the 2 outliers from cHet group: -43.4±1.6 mV in control, n=12 cells from 9 mice vs -35.4±1.7 mV in cHet, n=22 cells from 13 mice, *p=0.037, LMM.
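
      The with/without comparison reported above can be sketched as follows. This is only an illustration of the bookkeeping (mean ± SEM before and after excluding strongly depolarized thresholds), not the LMM test itself; the cutoff of -20 mV, the function names, and the listed values are assumptions made for the example and are not recorded data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of values."""
    return mean(values), stdev(values) / sqrt(len(values))


def with_and_without(thresholds_mv, cutoff_mv=-20.0):
    """Summarise AP thresholds with all cells, and again after dropping
    cells whose threshold is more depolarized than `cutoff_mv`."""
    kept = [v for v in thresholds_mv if v <= cutoff_mv]
    n_excluded = len(thresholds_mv) - len(kept)
    return mean_sem(thresholds_mv), mean_sem(kept), n_excluded


# Hypothetical thresholds in mV; the last two play the role of outliers.
example = [-38.0, -36.0, -40.0, -34.0, -17.0, -19.5]
(all_mean, all_sem), (kept_mean, kept_sem), n_excluded = with_and_without(example)
```

      Reporting both summaries side by side, as done for Fig. 5c and Fig. 7c, lets the reader see directly how much the flagged cells move the group mean.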

      (5) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties (Figure 8d,e); however, their evoked firing properties were affected with fewer AP generated in response to the same depolarizing current injection".

      This sentence is intrinsically contradictory. Action potentials triggered by current injections are dependent on the integration of passive and active properties. If the curves of Figure 8f are different between genotypes, then some passive and/or active property MUST have changed. It is an unescapable conclusion. The general _blanket_ statement of the authors that there are no significant changes in active and passive properties is in direct contradiction with the current/#AP plot.

      We agreed with the reviewer and rephrased the abstract, results and discussion accordingly to better represent the data. As discussed in the previous revision, it's possible that other intrinsic factors, not assessed in this study, may have contributed to the effect shown in the current/#AP plot.

      (6) The phase plots of Figs 5c, 7c, and 7h suggest that the frequency of acquisition/filtering of current-clamp signals was not appropriate for fast waveforms such as spikes. The first two papers indicated by the authors in their rebuttal (Golomb et al., 2007; Stevens et al., 2021) did not perform a phase plot analysis (like those included in the manuscript). The last work quoted in the rebuttal (Zhang et al., 2023) did perform phase plot analysis, but data were digitized at a frequency of 20 kHz (not 10 kHz as incorrectly indicated by the authors) and filtered at 10 kHz (not 2-3 kHz as by the authors in the manuscript). To me, this remains a concern.

      We agree with the reviewer that a higher sampling rate would allow a more accurate assessment of different AP parameters, such as AP peak, half-width, rise time, etc. The papers were cited in the context of determining AP threshold, not performing phase plot analysis. We apologize for the confusion and error. Finally, we removed the phase plots since they did not add relevant information.

      (7)  The general logical flow of the manuscript could be improved. For example, Fig 4 seems to indicate no morphological differences in the dendritic trees of control vs mutant PV cells, but this conclusion is then rejected by Fig 6. Maybe Fig 4 is not necessary. Regarding Fig 6, did the authors check the integrity of the entire dendritic structure of the cells analyzed (i.e. no dendrites were cut in the slice)? This is critical as the dendritic geometry may affect the firing properties of neurons (Mainen and Sejnowski, Nature, 1996).

      As suggested by the reviewer, we removed Fig.4. All the reconstructions used for dendritic analysis contained intact cells with no evidently cut dendrites.

    1. eLife Assessment

      This paper describes the structure and connectivity of brain neurons that send descending connections to motor neurons and muscle in the fruit fly nerve cord, using a synapse-resolution connectome. This important work provides a wealth of hypotheses and predictions for future experimentation and modelling. Using state-of-the-art methods, the authors provide solid evidence for their conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      Cheong et al. use a synapse-resolution wiring map of the fruit fly nerve cord to comprehensively investigate circuitry between descending neurons (DNs) from the brain and motor neurons (MNs) that enact different behaviours. These neurons were painstakingly identified, categorised, and linked to existing genetic driver lines; this allows the investigation of circuitry to be informed by the extensive literature on how flies walk, fly, and escape from looming stimuli. New motifs and hypotheses of circuit function were presented. This work will be a lasting resource for those studying nerve cord function.

      Strengths:

      The authors present an impressive amount of work in reconstructing and categorising the neurons in the DN to MN pathways. There is always a strong link between the circuitry identified and what is known in the literature, making this an excellent resource for those interested in connectomics analysis or experimental circuits neuroscience. Because of this, there are many testable hypotheses presented with clear predictions, which I expect will result in many follow-up publications. Most MNs were mapped to the individual muscles that they innervate by linking this connectome to pre-existing light microscopy datasets. When combined with past fly brain connectome datasets (Hemibrain, FAFB) or future ones, there is now a tantalising possibility of following neural pathways from sensory inputs to motor neurons and muscle.

      Weaknesses:

      As with all connectome datasets, the sample size is low, limiting statistical analyses. Readers should keep this in mind, but note that this is the current state-of-the-art. Some figures are weakened by relying too much on depictions of wiring diagrams without additional quantification of connectivity. Readers may find the length of this work challenging, particularly the initial anatomical descriptions of the dataset, which span many figures and may not be of interest to those outside of the subfield.

    3. Reviewer #2 (Public review):

      Summary:

      In Cheong et al., the authors analyze a new motor system (ventral nerve cord) connectome of Drosophila. Through proofreading, cross-referencing with another female VNC connectome, they define key features of VNC circuits with a focus on descending neurons (DNs), motor neurons (MNs), and local interneuron circuits. They define DN tracts, MNs for limb and wing control and their nerves (although their sample suffers for a subset of MNs). They establish connectivity between DNs and MNs (minimal). They perform topological analysis of all VNC neurons including interneurons. They focus specifically on identifying core features of flight circuits (control of wings and halteres), leg control circuits with a focus on walking rather than other limbed behaviors (grooming, reaching, etc.), intermediate circuits like those for escape (GF). They put these features in the context of what is known or has been posited about these various circuits.

      Strengths

      Some strengths of the manuscript include the matching of new DN and MN types to light microscopy, including serial homology of leg motor neurons. This is a valuable contribution that will certainly open up future lines of experimental work. As well, the analysis of conserved connectivity patterns within each leg neuromere and interconnecting connectivity patterns between neuromeres will be incredibly valuable. The standard leg connectome is very nice. Finally, the finding of different connectivity statistics (degrees of feedback) in different neuropils is quite interesting and will stimulate future work aimed at determining its functional significance.

      Weaknesses

      The degradation of many motor neurons is unfortunate. Figure 5 supplement 1 shows that roughly 50% of the leg motor neurons have significantly compromised connectivity data, whereas for non-leg motor neurons, few seem to be compromised. As well, the infomap communities don't seem to be so well controlled/justified. Community detection can be run on any graph - why should I believe that the VNC graph is actually composed of discrete communities? Perhaps this comes from a lack of familiarity with the infomap algorithm, but I imagine most readers will be similarly unfamiliar with it, so more work should be done to demonstrate the degree to which these communities are really communities that connect more within than across communities.
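
      One simple way to probe the question raised above, whether detected communities really connect more within than across communities, is sketched below. It is not Infomap (which optimises the map equation) and not the authors' analysis; it compares the observed fraction of synaptic weight that stays within communities against the fraction expected from each community's total output and input weight, i.e. a directed modularity score. The function name, node labels, and weights are hypothetical.

```python
from collections import defaultdict


def within_community_fraction(edges, community):
    """edges: iterable of (pre, post, weight); community: dict node -> label.
    Returns (observed, expected, modularity) for a directed weighted graph."""
    total = 0.0
    within = defaultdict(float)
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    for pre, post, weight in edges:
        total += weight
        out_strength[community[pre]] += weight
        in_strength[community[post]] += weight
        if community[pre] == community[post]:
            within[community[pre]] += weight
    observed = sum(within.values()) / total
    # Expected within-community fraction if weight were wired at random
    # while preserving each community's total output and input weight.
    labels = set(community.values())
    expected = sum(out_strength[c] * in_strength[c] for c in labels) / total ** 2
    return observed, expected, observed - expected


# Hypothetical toy graph: two tightly coupled pairs joined by one weak edge.
toy_edges = [("a", "b", 10), ("b", "a", 10), ("c", "d", 10), ("d", "c", 10), ("b", "c", 1)]
toy_community = {"a": 0, "b": 0, "c": 1, "d": 1}
observed, expected, modularity = within_community_fraction(toy_edges, toy_community)
```

      An observed fraction well above the expected one (a clearly positive modularity) would support treating the partition as genuine communities; values near zero would suggest the partition is not much better than chance.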

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Cheong et al. use a synapse-resolution wiring map of the fruit fly nerve cord to comprehensively investigate circuitry between descending neurons (DNs) from the brain and motor neurons (MNs) that enact different behaviours. These neurons were painstakingly identified, categorised, and linked to existing genetic driver lines; this allows the investigation of circuitry to be informed by the extensive literature on how flies walk, fly, and escape from looming stimuli. New motifs and hypotheses of circuit function were presented. This work will be a lasting resource for those studying nerve cord function.

      Strengths:

      The authors present an impressive amount of work in reconstructing and categorising the neurons in the DN to MN pathways. There is always a strong link between the circuitry identified and what is known in the literature, making this an excellent resource for those interested in connectomics analysis or experimental circuits neuroscience. Because of this, there are many testable hypotheses presented with clear predictions, which I expect will result in many follow-up publications. Most MNs were mapped to the individual muscles that they innervate by linking this connectome to pre-existing light microscopy datasets. When combined with past fly brain connectome datasets (Hemibrain, FAFB) or future ones, there is now a tantalising possibility of following neural pathways from sensory inputs to motor neurons and muscle.

      Weaknesses:

      As with all connectome datasets, the sample size is low, limiting statistical analyses. Readers should keep this in mind, but note that this is the current state-of-the-art. Some figures are weakened by relying too much on depictions of wiring diagrams as evidence of circuit function, similarity between neuropils, etc. without additional quantitative justification.

      We thank the reviewer for their helpful comments. We are excited about the release of this densely reconstructed connectome and its potential to facilitate circuit exploration in the VNC. We note that while statistical methods for analyzing complicated networks such as the connectome are still being developed, the wiring diagrams presented are themselves visualizations of quantitative data. We address specific concerns below.

      Reviewer #2 (Public Review):

      Summary:

      In Cheong et al., the authors analyze a new motor system (ventral nerve cord) connectome of Drosophila. Through proofreading, cross-referencing with another female VNC connectome, they define key features of VNC circuits with a focus on descending neurons (DNs), motor neurons (MNs), and local interneuron circuits. They define DN tracts, MNs for limb and wing control, and their nerves (although their sample suffers for a subset of MNs). They establish connectivity between DNs and MNs (minimal). They perform topological analysis of all VNC neurons including interneurons. They focus specifically on identifying core features of flight circuits (control of wings and halteres), leg control circuits with a focus on walking rather than other limbed behaviors (grooming, reaching, etc.), and intermediate circuits like those for escape (GF). They put these features in the context of what is known or has been posited about these various circuits.

      Strengths:

      Some strengths of the manuscript include the matching of new DN and MN types to light microscopy, including the serial homology of leg motor neurons. This is a valuable contribution that will certainly open up future lines of experimental work.

      Also, the analysis of conserved connectivity patterns within each leg neuromere and interconnecting connectivity patterns between neuromeres will be incredibly valuable. The standard leg connectome is very nice.

      Finally, the finding of different connectivity statistics (degrees of feedback) in different neuropils is quite interesting and will stimulate future work aimed at determining its functional significance.

      We thank the reviewer for their constructive feedback, and are optimistic about the utility of the MANC connectome to the Drosophila neurobiology community in dissecting VNC circuit function.

      Weaknesses:

      First, it seems like quite a limitation that the neurotransmitter predictions were based on training data from a fairly small set of cells, none of which were DNs. It's wonderful that the authors did the experimental work to map DN neurotransmitter identity using FISH, and great that the predictions were overall decently accurate for both ACh and Glu, but unfortunate that they were not accurate for GABA. I hope there are plans to retrain the neurotransmitter predictions using all of this additional ground truth experimental data that the authors collected for DNs, in order to provide more accurate neurotransmitter type predictions across more cell types.

      The reviewer makes an excellent suggestion, and collecting further ground truth data and retraining the neurotransmitter classifier is an ongoing research project. 

      Second, the degradation of many motor neurons is unfortunate. Figure 5 Supplement 1 shows that roughly 50% of the leg motor neurons have significantly compromised connectivity data, whereas, for non-leg motor neurons, few seem to be compromised. If that is the correct interpretation of this figure, perhaps a sentence like this that includes some percentages (~50% of leg MNs, ~5% of other MNs) could be added to the main text so that readers can get a sense of the impact more easily.

      Thank you for this suggestion. We have added a line describing the percentage of leg and other MNs affected (L416-417).

      As well, Figure 5 Supplement 1 caption says "Note that MN groups where all members of the group have reconstruction issues may not be flagged" - could the authors comment on how common they think this is based on manual inspection? If it changes the estimate of the percentage of affected leg motor neurons from 50% to 75% for example, this caveat in the current analysis would need to be addressed more directly. Comparing with FANC motor neurons could perhaps be an alternative/additional approach for estimating the number of motor neurons that are compromised.

      We agree that a direct comparison to another dataset, such as FANC, would aid in identifying reconstruction issues. However, a full analysis is not currently possible as only a minority of FANC neurons have been proofread or annotated. We were able to gain some insights into reconstruction quality by looking at T1 motor neurons, where FANC MN reconstruction is more complete. As reported in the submitted manuscript, we were able to confidently match T1 MNs between FANC and MANC for all but one MN (we are missing one ltm MN on the right side of MANC). While some of the MANC neurons had smaller/less dense arbors than FANC, none of them would have been flagged as having reconstruction issues. However, for FANC, we observe that neurons on the right have less dense arbors and fewer reconstructed synapses than neurons on the left. We have prepared a reviewer figure analyzing the consistency of synapse counts for the T1 (front leg) MNs:

      Author response image 1.

      In these results (MANC on the left, FANC on the right) we compare the number of input synapses on matched motor neurons on the left (LHS) and right hand side (RHS) of each dataset. We see that the MANC distribution is much more symmetric, indicating left and right hand side synapse counts for matched MNs are more similar in MANC. This is likely largely due to the left-right difference in reconstruction completeness in the FANC T1 leg neuropils. The number of synapses per cell type is also more variable in FANC. Overall, we recommend that end users should inspect the morphology and total synapse counts of individual MNs of interest in either dataset as part of any detailed analysis.
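
      As an illustration only (not the analysis code behind the reviewer figure), the left-right comparison of matched MNs described above can be sketched as a simple ratio. The function name `lr_symmetry`, the MN type names, and the synapse counts are hypothetical, and the ratio of the smaller to the larger count is an assumed symmetry measure rather than one stated in the response.

```python
def lr_symmetry(left_count, right_count):
    """Return min/max of two input-synapse counts (1.0 means perfectly symmetric)."""
    larger = max(left_count, right_count)
    if larger == 0:
        return 1.0  # no synapses on either side: treat as symmetric
    return min(left_count, right_count) / larger


# Hypothetical input-synapse counts for matched T1 MN types: (LHS, RHS)
counts = {
    "mn_type_a": (1000, 900),
    "mn_type_b": (800, 400),
}

symmetry = {name: lr_symmetry(lhs, rhs) for name, (lhs, rhs) in counts.items()}
for name, value in sorted(symmetry.items()):
    print(f"{name}: {value:.2f}")
```

      A distribution of such ratios that is concentrated near 1 would correspond to the more symmetric pattern described for MANC.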

      This analysis might benefit from some sort of control for true biological variability in the number of MN synapses between left and right or across segments. I assume the authors chose the threshold of 0.7 because it seemed to do a good job of separating degraded neurons from differences in counts that could just be due to biological variability or reconstruction imperfections, but perhaps there's some way to show this more explicitly. For example, perhaps show how much variability there is in synapse counts across all homologs for one or two specific MN types that are not degraded and are reconstructed extremely well, so any variability in input counts for those neurons is likely to be biologically real. Especially because the identification of serial homologs among motor neurons is a key new contribution of this paper, a more in-depth analysis of similarities and differences in homologous leg MNs across segments could be interesting to the field if the degradation doesn't preclude it.

      We agree that there can be ambiguity in whether variability in synapse counts between left-right homologs of a MN type represents biological variability or technical issues. We have added a comparison of synapse counts of T1 leg MNs in MANC (Left) vs FANC (Right) as noted in the previous point. As the number of connectomes available to us increases, we will have a better idea of how synapse counts of MNs vary within and between animals.

      Fourth, the infomap communities don't seem to be so well controlled/justified. Community detection can be run on any graph - why should I believe that the VNC graph is actually composed of discrete communities? Perhaps this comes from a lack of familiarity with the infomap algorithm, but I imagine most readers will be similarly unfamiliar with it, so more work should be done to demonstrate the degree to which these communities are really communities that connect more within than across communities.

      A priori we expect that there is some degree of functional division between circuits controlling different limbs or motor systems, given current evidence that VNC neuropils and neural hemilineages are relatively specialized in controlling motor output. We have added this explanation to section 2.4.2 (L633-635).

      The Infomap algorithm was chosen out of several directed and undirected community detection methods that we tried, as it defined communities that each had connectivity with narrow and specific motor neuron subclasses. For example, it labeled populations in each of the six leg neuropils as belonging to distinct communities. We think this provides an interesting partitioning of the VNC network that could have biological relevance (which future functional studies should investigate). To the reviewer’s final sentence, we do show intra- vs inter-community connectivity in Fig. 9–supplement 1B. Notably, most communities except several small ones have far more intra-community connectivity than inter-community connectivity. We have added text highlighting this observation (L656-658).
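
      For readers who want to check a partition themselves, a minimal sketch of an intra- versus inter-community comparison is given below. It is not the code used for Fig. 9–supplement 1B: the function name `intra_inter_weights`, the toy neurons, and the synapse weights are hypothetical, and the community labels are assumed to come from any community detection run (Infomap or otherwise).

```python
from collections import defaultdict


def intra_inter_weights(edges, community):
    """Sum synapse weights within and across communities, per source community.

    edges: iterable of (pre, post, weight); community: dict neuron -> label.
    Returns {label: (intra_weight, inter_weight)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for pre, post, weight in edges:
        same = community[pre] == community[post]
        totals[community[pre]][0 if same else 1] += weight
    return {label: tuple(pair) for label, pair in totals.items()}


# Hypothetical toy graph with two communities, "A" and "B"
community = {"n1": "A", "n2": "A", "n3": "B", "n4": "B"}
edges = [
    ("n1", "n2", 50),
    ("n2", "n1", 30),
    ("n2", "n3", 5),
    ("n3", "n4", 40),
    ("n4", "n1", 10),
]

result = intra_inter_weights(edges, community)
for label, (intra, inter) in sorted(result.items()):
    print(label, intra, inter, round(intra / (intra + inter), 2))
```

      A community whose intra-community fraction is well above one half connects more within itself than to the rest of the graph, which is the property the reviewer asks about.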

      We do, however, agree with the general point of the reviewer that it is not yet known which community detection methods are ‘optimal’ for use with connectomics data, so we have added further text (L679-683) explaining that community detection in MANC will require further investigation and validation in the future.

      I think the length of this manuscript reduces its potential for impact, as I suspect the reality is that many people won't read through all 140 pages and 21 main figures of (overall excellent) work and analysis.

      We intend this paper to serve not only as a first look into the organization of descending-to-motor circuits, but also as a resource for future investigations in MANC. The provided detail is intended to serve these purposes.

      Reviewer #1 (Recommendations For The Authors):

      General comments:

      I find that there are too many main figures with too much content in them, as well as too much corresponding text. Much of the initial anatomical identification and description could be summarised in fewer main figures, with more supplementary figures if the authors desired. I think there is a lot of great insight in this paper, particularly in the second half, but I am concerned that the extensive detail in the initial sections may challenge reader engagement through to the later sections of the paper. It would also be useful to have a higher level and shorter discussion.

      Reiterating our response from above, we intend this paper to serve not only as a first look into the organization of descending-to-motor circuits, but also as a resource for future investigations in MANC. The provided detail is intended to serve these purposes.

      There is sometimes an over-reliance on wiring diagrams or complex plots as evidence without further quantification. I will mention several examples below, as well as additional suggestions.

      Specific comments:

      In Figure 2E, how are DNs divided into pair vs population type? This was a very interesting idea, particularly in light of "command-like" neurons vs ensembles of DNs controlling behaviour. However, it is not clear how this distinction is made. This concept is referenced throughout the manuscript, so I think a clear quantitative way of identifying "pair" vs "population" identity for each DN would be very useful. And at the very least, a thorough explanation of how it is done in the current manuscript.

      We have added additional text in the Figure 2 legend to point towards Materials and Methods where the DN grouping (pair vs. population) is explained. These groups were formed based on morphology and further split into types based on connectivity, if needed. However, as the connectome represents a static snapshot of connectivity with no functional data, it remains possible that some DNs that were grouped as populations may act functionally as multiple pairs. Future work should continue to update these annotations.

      In Figure 4, there are some inconsistencies between neurotransmitter predictions and experimental FISH data. Have the authors taken into consideration Lacin et al. 2019 (https://elifesciences.org/articles/43701)? Specifically in that paper, it is stated: "We did not find any cases of neurons using more than one neurotransmitter, but found that the acetylcholine specific gene ChAT is transcribed in many glutamatergic and GABAergic neurons, but these transcripts typically do not leave the nucleus and are not translated." I wonder if this might explain some of the inconsistencies between FISH (mRNA detection) and the neurotransmitter predictions (presumably based on indirect protein structures detected via EM imagery), or the presence of so much co-transmission.

      We agree and have added this possible explanation for apparent co-transmission in the text (L394-397).

      In Figure 8B, the authors state: "We found that individual DN and MN subclasses have direct downstream and upstream partners, respectively, that are relatively hemilineage-restricted (Figure 8B)." While the connectivity patterns highlighted are intriguing, further quantitative analysis could help strengthen this point. The connectivity matrices in Figure 8B are linked to activation phenotypes and hemilineages below. But I don't really know how to interpret "relatively hemilineage-restricted" in light of this plot. How does this connectivity pattern for example compare statistically to a randomly selected set of DNs (maintaining the same group size for example)? Would random DN sets be less hemilineage restricted? Similar quantification would be helpful to support this statement "...with high correspondence between the hemilineages connected to individual DN and MN subclasses that are expected to be functionally related."

      "both upper tectulum DNs (DNut) and wing MNs (MNwm) have significant connectivity with hemilineages 6A, 7B, 2A, 19B, 12A and 3B". What is significant connectivity? Looking at the plot in Figure 8B, why is DNut -> 16B not considered significant? Is there a threshold and if so, what is the justification?

      These plots aim to be descriptive rather than drawing hard quantitative thresholds between ‘significant’ and ‘non-significant’ connectivity. We have revised the text to remove the terms ‘restricted’ and ‘significant’ and to clarify our interpretation (L555-559).

      In Figure 9G-H, this is a very interesting finding, but how do we know that the difference is real? Why not do a statistical test to compare the brain and VNC? Or create a null model network with edge swaps, etc. to compare against.

      Statistical comparison between the brain and VNC may be problematic given differences in how these connectomes were generated, as well as missing connectivity in the hemibrain connectome (only half the brain is imaged). Comparison to a null model is possible and, for the purpose of understanding motif frequency in general, has already been done (see, for example, Lin et al., 2024, Nature). However, a null or shuffled model is not required for comparing motif frequencies between brain and VNC neuropils, which is the point of this particular graph. At present, we simply highlight a qualitative observation that will require future work to investigate.

      Referring to Figure 12 in the main text, "we observe that the power MN upstream network is largely shared among all power MNs and is highly bilateral." Quantifying the fraction of shared upstream neurons from power MNs would make this statement much stronger. Particularly if compared to other non-power MNs. Or potentially using some other network comparison metric.

      This is a good point. We have added cosine similarity to Figure 6 for wing/haltere MNs to show the similarity between inputs across these MNs, and added text discussing the cosine similarity in sections 2.3 (L461-465) and 2.5.3 (L987-988).
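
      As a reminder of what this metric measures, a minimal sketch of cosine similarity between the input vectors of two MNs follows. The vectors and variable names are hypothetical; each entry is assumed to be the synapse count from one upstream neuron, listed in the same order for every MN.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length input vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a neuron with no inputs is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


# Hypothetical synapse counts from three upstream neurons onto three MNs
mn_a = [10, 0, 5]
mn_b = [20, 0, 10]  # same input pattern as mn_a, twice the synapse count
mn_c = [0, 7, 0]    # input from a different upstream neuron only

print(cosine_similarity(mn_a, mn_b))
print(cosine_similarity(mn_a, mn_c))
```

      Because the measure depends on the pattern of inputs rather than their total number, two MNs with proportional inputs score close to 1 even if one has many more synapses.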

      In Figure 13B, "Nearly 50% of these restricted neurons (totalling about 1200 per leg neuropil) have been serially matched across the six neuropils (Figure 13B)". There seems like a disconnect here. In the IR, CR, and BR columns, I see ~2750, ~500, and ~1250 neurons not in a serial set (~4500 total); I see ~1500, ~750, and ~1000 in a serial set (~3250 total). This would mean that ~58% of neurons are not in serial sets, ~42% are in serial sets. Shouldn't the conclusion be the opposite then? That surprisingly most intrinsic neurons are not repeated across leg neuropils. I find this fascinating if true. Perhaps there is some confusion on my part, however.

      We now find that about half of the leg-restricted neurons are serially repeated across the six leg neuropils with similar morphology and connectivity, especially to the downstream leg motor neurons. Since the first submission of this paper, we have identified some additional serial homologues while completing the systematic cell typing described in the accompanying paper, Marin et al. 2024. Figure 13B has now been updated to reflect this. In total, 3998 of 7684 restricted neurons (IR, CR, BR) have been assigned to a serial set or serial type. The sentence in the text has been adjusted to report that 52% of these restricted neurons are in serial sets (L1125).

      In Figure 13D-E, "the Tect INs are not a homogenous population." Providing additional evidence could strengthen this statement. A connectivity matrix is shown in (D), followed by examples of morphologies in (E). What makes a population homogenous or heterogenous? For example, compared to all possible INs, the Tect IN morphology actually looks quite similar. Are those connectivity matrices in (D) really so different? What would a random selection of neurons look like?

      Our sister paper, Marin et al. (2024), has looked into variation of connectivity across neurons of the entire VNC in much more detail, including clustering methods that include connectivity and other criteria for cell typing. Thus, we have now amended the text to direct the reader to that paper for more detail on variability of connectivity in the Tect INs, which were divided into 5 cell types in Marin et al. (2024) (L1027-1031). In addition, we have replaced our clustering by connectivity in Figure 13 with the cell type clusters from Marin et al. (2024).

      In reference to Figure 13 - Supplement 1, "This standard leg connectome was very similar across legs, but there were small deviations between T1, T2, and T3 legs, as shown in Figure 13-Supplement 1." - what makes a deviation considered small? T1 seems to generally have many more synapses, T2 many less, and T3 a mixture depending on the connection. Also, are there lost connections or new connections? A quantification of these issues would be helpful instead of simply depicting the wiring diagrams.

      The connections that differ are likely due to the reconstruction state of leg MNs. We have now stated this in the main text for clarification (L1143-1145). In the leg neuropils, T2 and T3 left-hand-side MNs have sparser dendritic arbors than those on the right-hand side. Therefore, the differences in Figure 13–Supplement 1, which are almost exclusively in the connections from the leg-restricted neurons onto leg MNs, seem stronger in T1. Future work, bolstered by additional datasets, will undoubtedly reveal further insight into the comparison of circuits for the different legs.

      In Figure 15 - Supplement 2, "We used effective connectivity to identify leg DNs with similar MN connectivity patterns (Figure 15-Supplement 2). Of previously identified DNs, we found that DNg13 showed a highly similar effective connectivity fingerprint."

      How was this similarity calculated? How do we know these particular DNs have similar effective connectivity? The connectivity matrix depicted is quite complex, with both layer and connectivity scores quantified at each location. A principled way of determining similarity would make this statement much stronger.

      The similarity was calculated simply as the Euclidean distance between the effective connectivity matrices of each DN onto the set of MNs. While this is a straightforward comparison mathematically, effective connectivity calculations (first introduced in this context by our collaborators Larry Abbott and Ashok Litwin-Kumar in Li et al., 2020) have not yet been subject to functional validation. We therefore agree with the reviewer that this should not be overinterpreted at this point. Future functional work should explore hypotheses suggested here and more quantitatively compare the similarity of different DN-MN pathways.
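
      A minimal sketch of this kind of comparison is shown below. It is illustrative only: the DN names, the fingerprint values, and the helper `rank_by_similarity` are hypothetical, and each effective connectivity matrix is assumed to have been flattened into a vector with entries in the same MN order for every DN.

```python
import math


def rank_by_similarity(reference, fingerprints):
    """Return (name, distance) pairs sorted by Euclidean distance to the reference."""
    distances = {
        name: math.dist(reference, vector) for name, vector in fingerprints.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical effective-connectivity fingerprints onto three MN groups
reference = [0.8, 0.1, 0.0]
fingerprints = {
    "dn_x": [0.7, 0.2, 0.0],
    "dn_y": [0.0, 0.1, 0.9],
}

ranking = rank_by_similarity(reference, fingerprints)
for name, distance in ranking:
    print(f"{name}: {distance:.3f}")
```

      A smaller distance means a more similar fingerprint. Unlike cosine similarity, Euclidean distance is sensitive to overall magnitude as well as pattern.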

      Minor notes:

      In Figure 4E, the circles, squares, and triangles in the figure legend are too small. This is also true to some extent in the plot itself.

      We have increased the size of the symbols in the legend and plot.

      In Figure 8E right, the figure legend and x/y axes are not clear to me. Unfortunately, I'm not sure what the plot is showing because of this.

      The right plot in Figure 8E shows the number of DN groups each MN group receives input from, at a threshold of 1% input. As this plot is redundant with the left plot, we have decided to remove it.

      In Figure 8I, it would be interesting to see which neurons are directly downstream of DNs. One can't see layers 2/3/4 with the fan-out expansion of neurons and the y-axis scale.

      We have revised the plot to better show cell composition of individual layers.

      In Figure 19E, it would be helpful to also have a standard y-axis.

      The panel has been revised accordingly.

      Reviewer #2 (Recommendations For The Authors):

      General:

      In the Title, you do not mention DNs or MNs but these are a major focus of this study. The title could be more descriptive of the work.

      Per the reviewer’s comments, we have revised the title to “Transforming descending input into motor output: An analysis of the Drosophila Male Adult Nerve Cord connectome”.

      A glossary would be helpful, where all the paper's abbreviations and their definitions are provided in one place. Perhaps a hierarchical structure would help (for at least part of the glossary), so that terms like NTct, WTct, and HTct could be nested underneath UTct, for example.

      We do include a glossary in the sister paper, Marin et al. (2024), and in this paper have included a short glossary in the first figure. Please refer to these sources for abbreviation reference.

      Introduction:

      Define 'Premotor'.

      We have defined ‘premotor circuits’ to be ‘circuits that directly or indirectly control motor output’ in lines 45-46.

      It might be worthwhile to start with a broader introduction sentence than the current one that focuses just on the fly, in order to emphasize the impact of MANC as the first complete connectome of a motor circuit in any animal with limbs or wings.

      We have revised the introductory paragraph per the reviewer’s suggestions.

      "Muscles in the leg are not innervated uniformly; indeed, in the T1 legs the number of MNs per muscle varies by as much as an order of magnitude" needs to specify the axis of variability more clearly - the authors probably mean variability across muscles in the leg (not variability across individuals for example) but I think the current sentence is a bit ambiguous in that respect.

      We have reworded this sentence to clarify this point (L132-133).

      Line 182 end of paragraph: It would be useful to point out explicitly what makes the MANC project valuable in the context of a similar FANC project - for example, that the MANC connectome is more complete, is a male (so interesting for anyone interested in sexual dimorphism), and gives the field an n=2 for VNC connectome datasets.

      We agree, and have added a sentence describing the benefits of the MANC connectome on L209-212.

      Line 213: A brief phrase or sentence of context could be provided to help unaware readers understand that 42% of synaptic connectivity being captured is in the same sort of range as previous datasets like the hemibrain and likely leads to the vast majority of important cell-cell connections being identified (perhaps cite Buhmann et al 2021 Nature Methods which does an analysis of this), and therefore is a reason to think highly of this dataset's quality and its potential for impact on the field. The sentence at the end of this paragraph doesn't quite do it for me.

      We have added the comparison of MANC synapse completeness to that of the Hemibrain, and revised the ending sentence in L234-237.

      Line 271: Clarify what happened to the remaining 15% of DNs that weren't able to be assigned to a tract. They travelled outside the tracts, or data quality issues prevented assignment, or something else?

      Indeed, some DNs could not be assigned to a tract as they traveled outside of all axon tracts and did not bundle with other DNs. We have added this explanation to the text (L300-301).

      Figure 1:

      The pie chart "DN postsynaptic partners by neuron class" is a bit hard to interpret without having another pie chart next to it showing "Neurons in MANC by neuron class". I know these numbers are written on the schematic but it would be nice to be able to easily tell which cell classes are overrepresented or underrepresented in the set of postsynaptic partners of DNs. e.g. It's obvious that ANs are overrepresented and DNs are underrepresented in the set of postsynaptic partners of DNs, but it would be nice if readers didn't have to do any mental math to figure out if INs or MNs are under/overrepresented.

      We agree and have added a pie chart of the neuron class composition of the entire VNC to Figure 1.

      "35.9% of leg MNs are matched to FANC" Why is this number so low? Because FANC motor neurons were only identified in T1, so the remaining 2/3rds of leg MNs in MANC weren't matched? How successful was matching for the neurons where it was actually attempted?

      For this work, we only matched the T1 neurons across the two datasets. This was both a way of checking that we found everything in these segments and a way of being more sure of muscle target assignments as our collaborators in the FANC dataset had generated extensive light level data to match motor neurons with their target leg muscles. The T2 and T3 MNs were not fully proofread or identified in FANC, precluding further analysis, and leading to the 35.9% matched number. We hope to be able to compare between these datasets more thoroughly in future, and have matched all the premotor leg restricted intrinsic neurons of our standard connectome to FANC. We report on their stereotypy in our latest preprint, Stürner, Brooks et al. 2024.

      Figure 2:

      Figure 2A: Perhaps darken the color of the MTD-III skeletons. Currently, they're so light it's hard to see, and this is one of the most interesting tracts because the claim is that it's a new tract.

      We take the reviewer’s point, however, the color scheme used for the tracts in Figure 2 is coordinated between multiple figures and figure panels, and thus we would prefer to keep it as is. If readers would like to examine DNs of a particular tract, we encourage them to retrieve said DNs using the tract annotations in NeuPrint.

      Figure 2 supplement 1: It's not clear to me what I should be getting out of seeing the right side DNs as well. If you want readers to be able to visually compare the left and right side morphologies and appreciate the high degree of symmetry, you may want to put the left and right side DN panels side-by-side. Perhaps do that (show both the left and right side DNs) for one or two tracts in the main Fig2, and then leave out the remaining panels - or if you want to include the remaining panels, explain more clearly what readers are supposed to learn from seeing them.

      We agree and have now removed Figure 2 supplement 1.

      Figure 2C caption: Instead of "DN primary neurites" I think the authors probably mean "longest single branch of each DN" or something along those lines. I think "primary neurite" is usually used to refer to the thick non-synaptic branch coming out of a neuron's soma, which can't be how it's being used here.

      We agree and have changed all references to ‘primary neurite’ for DNs to ‘longest neurite’.

      Figure 2D+E: Perhaps add an overall % of neurons of each class to the legend. I ask because I would be very interested to know what % of all DNs exist as single pairs versus as populations, and I imagine that could be a number that is quoted a fair amount by others in the field when talking about DNs.

      We agree and have added the overall percentage of each neuron class to the results (L275-276) and Figure 2 legend.

      Figure 3:

      UTct.IntTct neurons are by far the largest class of DNxn neurons, so would it be worth calling these the DNxt class (DN projecting to some combination of tectulum neuropils), to mirror the DNxl class? I would vote for doing that.

      Thanks for the suggestion. However, the subclass naming scheme for DNs was coordinated between multiple groups of people working on MANC reconstruction and annotation. As making changes to subclasses would impact many analyses that have already been completed for existing work, we will refrain from doing so.

      Figure 3G feels a bit out of place in this figure and under-explained

      We have clarified in the text our citations to Figure 3G to better explain our interpretation of this data.

      Figure 4

      "DNp20 has few vesicles and may be electrically coupled": If I'm correct that DNp20 is also known as DNOVS1 and is the second largest diameter axon in the neck after the giant fiber, then yes, Suver et al. 2016 J Neurosci show that this DN is gap junction coupled to neck motor neurons (see their Fig 2F). This neuron (along with the giant fiber) is enough of an outlier that it might be more representative to show a different, more canonical DN that has a low prediction probability.

      The reviewer is right that DNp20 is also known as DNOVS1 with known gap junction coupling. We now clarify in the text (L366) how we think that could lead to a lower neurotransmitter prediction score, which is what we were trying to illustrate.

      Figure 4E: It looks like only a single DN has more inputs (~11000) than outputs (~9000), is that right? It could be interesting to dedicate some panels and text to the connectivity profile of that one unique neuron.

      Yes, that is correct: there is just one pair of DNs, DNxn166, that receives more input than it gives output (the two triangles lie on top of each other). We think that the other DN pair in that same box (more variable in total synapse number, so the triangles are further apart) also receives an unusually high amount of input relative to output. The morphology of these two types is shown in Figure 4F; both have fine processes that look more like dendrites, especially when compared to other DNs such as those in 4G. Unfortunately, neither of these two types has been matched to light microscopy images, so we cannot at this time say whether they have the same type of morphology in the brain, or further explore their brain connectivity.

      Figure 4E: "black rectangle ... gray rectangle" don't look different shades to me. It's obvious which is which based on where they are in the graph but if you want to color code this, pick more separate colors. Or code it with something other than colors.

      We have made the rectangle in Figure 4E a lighter shade of grey and added labels to refer to the panels D, F and G. The figure legend now also describes more clearly that we are plotting every DN as a single shape and exactly how many DN types are included in those rectangles to avoid confusion.

      Figure 5:

      "subclass is their two-letter muscle anatomical category" should be explained better, I'm not sure what "muscle anatomical category" means.

      We have changed the wording in the Figure 5 legend to better clarify that MN subclasses are the broad muscle category that they innervate (e.g. legs, wings).

      Figure 7:

      Leg MN identification and serial homology.

      Why are there no tarsus reductor (tarm1 and tarm2) motor neurons? Do we not know their anatomy from light microscopy well enough, perhaps? Were these MNs identified in FANC? Is it reasonable to guess that the remaining small number of unidentified T1 leg motor neurons in MANC would control these muscles? I think Marta Moita's lab has some ongoing projects on these muscles (see Twitter), so if more LM data is needed perhaps it will come from them.

      We now know that the small number of unidentified T1 leg motor neurons (a T1 pair with a serial T2 pair, serial set 17664) are not in fact MNs. A new and unpublished dataset (Janelia whole male CNS volume, the optic lobe from which has been published as Nern et al., 2025) shows they have axons within the VNC. The MN annotation for these neurons has been removed and they now have the type name INXXX471. Thus, we have no T1 leg MNs without a muscle target annotated. Our muscle target annotation comes from matching to the FANC dataset, which has also not annotated tarsus reductor MNs. We suspect that the tarsus reductor MNs are hard to distinguish from the tarsus depressor MNs, of which there are five per side and segment.

      It seems there are a few more leg motor neurons in MANC vs FANC. Any indication of which muscles they control?

      See above.

      Figure 7E: A qualitative comparison between the cosine similarity results here and from FANC could be useful. What generally is the same versus different? Any indication of male/female differences?

      We observe no differences in the cosine similarity of T1 leg MNs between MANC and FANC and only very minor differences between T1, T2 and T3, as shown in Figure 7. In our most recent work, now on bioRxiv (Stürner, Brooks et al., 2024), we were able to find all intrinsic leg serial sets that we included in our standard leg premotor circuit here in the FANC dataset. We do not see any differences between them in terms of morphology, and while we have several cases in which we are still missing 1 of the 6 neurons in a serial set in FANC, we see similar connectivity when comparing small circuits. We have also found almost all neurons interconnecting the legs, with some very interesting exceptions, mainly coming from the abdomen, that we believe are male specific. These male-specific neurons can also be found in this preprint (Stürner, Brooks et al., 2024).

      Figure 8

      Figure 8A: Why are ~1/3rd of the wing and leg motor neurons considered populations instead of pairs? I thought essentially all wing and leg motor neurons have unique morphologies.

      Pair vs populations are assigned based on MN morphology and connectivity. For the wing MNs, many sets of DVMns and DLMns have near-identical morphology and connectivity, are not easily distinguishable in the VNC and are categorized as a ‘population’. For the leg MNs, there are ‘true’ population MN types that provide multiple innervation of the same muscle.

      The text states "up to a maximum of 20% [traversal probability] (corresponding to a synapse input fraction of 1)" but I interpret the bottom of Figure 8G to have flipped values, where a synapse input fraction of 0.2 yields a traversal probability of 1. Is there a mistake here or have I misunderstood?

      Thank you for pointing this discrepancy out. The text description was indeed flipped, and we have corrected this error.

      Caption for J says "Layers without neurons are omitted". How is it possible to have a layer without neurons?? Something about how the traversal is done doesn't seem to be explained clearly enough. If it's really possible to have a layer without neurons, I think the approach might need to be revisited as this seems quite strange.

      Here, ‘layer’ should be viewed as a nonlinear measure of indirect connectivity combining path length and synaptic weights. Layers without neurons are possible due to the details of the calculation–layer position is assigned probabilistically by the downstream synapse connectivity of the source neurons, and the probability is scaled up to 1 at an input synapse fraction of 0.2. Neuron-to-neuron connectivity of an input synapse fraction of >=0.2 is very rare in the VNC connectome and thus neurons strictly assigned to layer 2 downstream of each DN type are similarly rare. We have updated the figure legend for figure 8 to better explain this.
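      The scaling described in this response can be sketched as follows. This is a simplified illustration under stated assumptions: the linear scaling capped at 1 and the 0.2 threshold come from the text above, while the function names, the single-run traversal loop, and the toy circuit are hypothetical and are not the authors' implementation.

```python
import random


def traversal_probability(input_fraction, threshold=0.2):
    """Scale a synapse input fraction linearly to a probability, capped at 1."""
    return min(1.0, input_fraction / threshold)


def traverse_once(edges, sources, max_steps=10, rng=None):
    """Run one stochastic traversal; return {neuron: step at which it was first reached}.

    edges maps (pre, post) -> fraction of post's input synapses that come from pre.
    """
    rng = rng or random.Random(0)
    layer = {n: 0 for n in sources}
    for step in range(1, max_steps + 1):
        reached = set()
        for (pre, post), fraction in edges.items():
            # Only neurons reached in earlier steps can recruit new neurons.
            if pre in layer and post not in layer:
                if rng.random() < traversal_probability(fraction):
                    reached.add(post)
        for n in reached:
            layer[n] = step
    return layer


# Hypothetical toy circuit (names are illustrative only).
edges = {("DN", "A"): 0.30, ("A", "B"): 0.25, ("DN", "C"): 0.02}
layers = traverse_once(edges, sources=["DN"])
```

      In this sketch, a connection with an input fraction of 0.2 or more is always traversed, whereas a weak connection such as DN to C (fraction 0.02, probability 0.1) is reached at a different step on different runs. If layer position is taken as an average over many runs, weakly connected neurons receive non-integer values, which is one way an integer layer can end up with no neurons assigned to it.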

      Section 2.6

      "flies have been shown to walk normally without proprioceptive feedback, suggesting that inter- and intra-leg coordination is not strictly dependent on sensory feedback loops from the legs" is quite a drastic overinterpretation of that paper's results. The ablation there was not complete (some subtypes of sensory neurons were not perturbed), and the perturbed flies certainly walked with some defects. This statement certainly should be removed or significantly softened.

      Thank you for pointing this detail out. The term ‘normally’ has been removed from this sentence to soften the statement.

      Figure 13, Standard leg connectome

      Unfortunately, the motor neurons controlling the tarsus could not be included here, I suppose due to the difficulty in identifying the T2 and T3 homologs for these motor neurons. This should be mentioned in the text. This version of the standard leg connectome is without a doubt still an incredibly valuable discovery, but readers should be made aware that this version of the standard leg connectome does in fact lack the motor neurons for one joint.

      The MNs controlling the tarsus could not be matched with high confidence. We have added a sentence pointing this out when the leg circuit is introduced (L1141-1142).

      The focus here is on locomotion in the absence of other behaviors, whereas the legs are also responsible for grooming, reaching, boxing, etc. How should we consider the leg connectome in light of this?

      This is a very good point, and we have indeed found known grooming neurons that target our leg premotor circuit (L1158-1161). We’ve now added this observation to the Discussion (L1949-1951).

      Minor points

      L84 - re: Descending neurons work together - cite Braun et al., bioRxiv 2023; cite Yang HH bioRxiv 2023 .

      We agree that these papers are relevant to the function of DNs in combination, and have added them to the introduction (L83-84, 86-87).

      L193 - "intrepid" is overly florid language; similar for L1507 "enigmatic".

      We have replaced these words with suitable synonyms.

      L273 - The acronym "ITD" is not explained. Please check all other acronyms. Related, it would be good to include a Table or Box with all acronyms for the reader.

      We have added the full name of the ITD to the text. A glossary is available in Figure 1, and a full glossary of MANC terms is available in Table 1 of our sister paper, Marin et al. 2024.

      -L514, you state that hemilineages 6A and 6B unexpectedly produce uncoordinated leg movements (flight-related was expected). However, Harris didn't study animals in tethered flight but headless on the ground.

      The experimental setup of Harris et al. was capable of assessing flight-like motor output even if not true flight, as seen in the predominantly wing movement phenotypes of activating hemilineages 7B, 11A/B and 2A. We now also note that hemilineage annotation in Marin et al., 2024, shows that the 6B hemilineage has some projections into the leg neuropils, in support of a leg motor role in addition to an upper tectular role (L570-571).

      L1425 - "the TTM" is repeated twice.

      This sentence addresses both the TTM and its MN (TTMn). We have revised this sentence to improve clarity by expanding the full name of TTM in that paragraph and leaving TTMn abbreviated.

      L1728 - Ascending neuron projections to the brain - cite Chen et al., Nat Neuro 2023.

      We agree that Chen et al. 2023 is relevant to the discussion of AN function, and have added this citation (L1836-1838).

      L1817, It is a good idea to compare with previous predictions for circuit control. But these originate from non-Drosophila work as well. Please cite and consider the original models from Buschges, Cruse, Holmes, and others.

      Thanks for the suggestion. We now cite the non-Drosophila literature as well (L1971).

      L1827, how precisely should these "theories" be updated? Be explicit.

      We summarize in the sentences before what is different in comparison to one of the suggested models. We have now additionally added examples to the sentence (L1942-1945) to suggest that theoretical leg circuits need to account for the posterior-to-anterior as well as anterior-to-posterior connections between leg neuropils, as well as relative lack of connectivity between the left and right mesothoracic leg neuropils.

      L1831, include a discussion about another alternative which is through mechanical coupling and sensory feedback.

      We agree that leg sensory input likely contributes to leg locomotor circuits. We have added the following sentence to point out that annotations of sensory neurons in MANC are available through work in a companion paper (Marin et al. 2024), and future work is necessary to examine the contribution of sensory input to leg motor circuits (L1954-1956).

      Methods

      https://flyconnectome.github.io/malevnc/ link doesn't work.

      We have updated the link.

    1. eLife Assessment

      In this useful study, ectopic expression and knockdown strategies were used to assess the effects of increasing and decreasing Cyclic di-AMP on the developmental cycle in Chlamydia. The authors convincingly demonstrate that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. Whilst the authors have attempted to revise the submission, the model currently proposed is not fully supported by the data presented.

    2. Reviewer #2 (Public review):

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. The main findings remain the same. The authors show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of transitionary and late genes. The authors also knocked down the expression of the dacA-ybbR operon and reported a modest reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion.

      Overall, this is a very intriguing study with important implications; however, the data is very preliminary, and the model is very rudimentary. The data support the observation that dramatically increased c-di-AMP has an impact on transitionary gene expression and late gene expression, suggesting dysregulation of the developmental cycle. This effect goes away with modest changes in c-di-AMP (detaTM-DacA vs detaTM-DacA (D164N)). However, the model's prediction that low levels of c-di-AMP delay EB production is not well supported by the data. If this prediction were true, then the growth rate would increase with c-di-AMP reduction, and the data does not show this. The levels of c-di-AMP at the lower end need to be better validated, as it seems that only very high levels make a difference for dysregulated late gene expression. On the low end, however, it is not clear what levels are needed to have an effect, as only DacAopMut and DacAopKD show any effects on the cycle and the c-di-AMP levels are only different at 24 hours.

      The authors responded to reviewers' critiques by adding the overexpression of DacA without the transmembrane region. This addition does not really help their case. They show that detaTM-DacA and detaTM-DacA (D164N) had the same effects on c-di-AMP levels but the figure shows no effects on the developmental cycle.

      Describing the significance of the findings:

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well.

      Describing the strength of evidence:

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported.

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings.

    3. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The paper by Lee and Ouellette explores the role of cyclic di-AMP in chlamydial developmental progression. The manuscript uses a collection of different recombinant plasmids to up- and down-regulate c-di-AMP production, and then uses classical molecular and microbiological approaches to examine the effects of expression induction in each of the transformed strains.

      Strengths: 

      This laboratory is a leader in the use of molecular genetic manipulation in Chlamydia trachomatis, and their efforts to make such approaches mainstream are commendable. Overall, the model described and defended by these investigators is thorough and significant.

      Thank you for these comments.

      Weaknesses: 

      The biggest weakness in the document is their reliance on quantitative data that is statistically not significant, in the interpretation of results. These challenges can be addressed in a revision by the authors. 

      Thank you for these comments. We point out that, while certain RT-qPCR data may not be statistically significant, our RNAseq data indicate late genes are, as a group, statistically significantly increased when increasing c-di-AMP levels and decreased when decreasing c-di-AMP levels. We do not believe running additional experiments to “achieve” statistical significance in the RT-qPCR data is worthwhile. We hope the reviewer agrees with this assessment.

      We have also included new data in this revised manuscript, which we believe further strengthens aspects of the conclusions linked to individual expression of full-length DacA isoforms. We have also quantified inclusion areas and bacterial sizes for critical strains.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. Chlamydia are obligate intracellular bacterial pathogens that rely on eukaryotic host cells for growth. The chlamydial life cycle depends on a cell form developmental cycle that produces phenotypically distinct cell forms with specific roles during the infectious cycle. The RB cell form replicates, amplifying chlamydia numbers, while the EB cell form mediates entry into new host cells, disseminating the infection to new hosts. Regulation of cell form development is a critical question in chlamydia biology and pathogenesis. Chlamydia must balance amplification (RB numbers) and dissemination (EB numbers) to maximize survival in its infection niche. The main findings in this manuscript show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. The authors also knocked down the expression of the dacA-ybbR operon and reported a reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. Overall, this is a very intriguing study with important implications; however, the data is very preliminary, and the model is very rudimentary and not well supported by the data.

      Thank you for your comments. Chlamydia is not an easy experimental system, but we have done our best to address the reviewer’s concerns in this revised submission.

      Describing the significance of the findings: 

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well. 

      Describing the strength of evidence: 

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported. 

      dacA-ybbR ectopic expression: 

      For the dacA-ybbR ectopic expression experiments they show that hctA is induced early but there is no significant change in OmcB gene expression. This is problematic as when RBs are treated with Pen (this paper) and (DOI 10.1128/MSYSTEMS.00689-20) hctA is expressed in the aberrant cell forms but these forms do not go on to express the late genes suggesting stress events can result in changes in the developmental expression kinetic profile. The RNA-seq data are a little reassuring as many of the EB/Late genes were shown to be upregulated by dacA-ybbR ectopic expression in this assay.

      As the reviewer notes, we also generated RNAseq data, which validates that late gene transcripts (including sigma28 and sigma54 regulated genes) are statistically significantly increased earlier in the developmental cycle in parallel to increased c-di-AMP levels. The lack of statistical significance in the RT-qPCR data for omcB, which shows a trend of higher transcripts, is less concerning given the statistically significant RNAseq dataset. We have reported the data from three replicates for the RT-qPCR and do not think it would be worthwhile to attempt more replicates in an attempt to “achieve” statistical significance.

      We recognize that hctA may also increase during stress as noted by the Grieshaber Lab. In re-evaluating these data, we decided to remove the Penicillin-linked studies from the manuscript since they detract from the focus of the story we are trying to tell given the potential caveat the reviewer mentions.

      The authors also demonstrate that this ectopic expression reduces the overall growth rate but produces EBs earlier in the cycle but overall fewer EBs late in the cycle. This observation matches their model well as when RBs convert early there is less amplification of cell numbers. 

      dacA knockdown and dacA(mut) 

      The authors showed that dacA knockdown and ectopic expression of the dacA mutant both reduced the amount of c-di-AMP. The authors show that for both of these conditions, hctA and omcB expression is reduced at 24 hpi. This was also partially supported by the RNA-seq data for the dacA knockdown as many of the late genes were downregulated. However, a shift to an increase in RB-only genes was not readily evident. This is maybe not surprising as the chlamydial inclusion would just have an increase in RB forms and changes in cell form ratios would need more time points.

      Thank you for this comment. We agree that it is not surprising given the shift in cell forms. The reduction in hctA transcripts argues against a stress state as noted above by the reviewer, and the RNAseq data from dacA-KD conditions indicates at least that secondary differentiation has been delayed. We agree that more time points would help address the reviewer’s point, but the time and cost to perform such studies is prohibitive with an obligate intracellular bacterium.

      Interestingly, the overall growth rate appears to differ in these two conditions, growth is unaffected by dacA knockdown but is significantly affected by the expression of the mutant. In both cases, EB production is repressed. The overall model they present does not support this data well as if RBs were blocked from converting into EBs then the growth rate should increase as the RB cell form replicates while the EB cell form does not. This should shift the population to replicating cells. 

      We agree that it seems that perturbing c-di-AMP production by knockdown or overexpressing the mutant DacA(D164N) has different impacts on chlamydial growth. We have generated new data, which we believe addresses this. Overexpressing membrane-localized DacA isoforms is clearly detrimental to chlamydiae as noted in the manuscript. However, when we removed the transmembrane domain and expressed N-terminal truncations of these isoforms, we observed no effects of overexpression on chlamydial morphology or growth. Importantly, for the wild-type full-length or truncated isoforms, overexpressing each resulted in the same level of c-di-AMP production, further supporting that the negative effect of overexpressing the wild-type full-length is linked to its membrane localization and not c-di-AMP levels. These data have been included as new Figure 3. These data indicate that too much DacA in the membrane is disruptive and suggest that the balance of DacA to YbbR is important since overexpression of both did not result in the same phenotype. This is further described in the Discussion.

      As it relates to knockdown of dacA-ybbR, we have essentially removed/reduced the amount of these proteins from the membrane and have blocked the production of c-di-AMP. This is fundamentally different from overexpression.

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings. 

      Reviewer #1 (Recommendations for the authors): 

      There is a generally consistent set of experiments conducted with each of the mutant strains, allowing a straightforward examination of the effects of each transformant. There are a few general and specific things that need to be addressed for both the benefit of the reader and the accuracy of interpretation. The following is a list of items that need to be addressed in the document, with an overall goal of making it more readable and making the interpretations more quantitatively defended. 

      Specific comments: 

      (1) The manuscript overall is wordy, and there are quite a few examples of text in the results that should be in the discussion (examples include lines 224-225, 248-262, 282-288, 304-308). The manuscript overall could use careful editing for verbosity.

      Thank you for this comment. We have removed some of the indicated sentences. However, to maintain the flow and logic of the manuscript, some statements may have been preserved to help transition between sections. As far as verbosity, we have tried to be as clear as possible in our descriptions of the results to minimize ambiguity. Others who read our manuscript appreciated the thoroughness of our descriptions.

      (2) There is also a trend in the document to base fact statements on qualitative and quantitative differences that do not approach statistical significance. Examples of this include the following lines: 156-158, 190-192, 198-199, 230-232, 239-242, 292-293. This is something the authors need to be careful about, as these statistically insignificant differences may tend to multiply a degree of uncertainty across the entire manuscript.

      We have quantified inclusion areas and tried to remove instances of qualitative assessments as noted by the reviewer. In regards to some of the transcripts, we can only report the data as they are. In some cases, there are trends that are not statistically significant, but it would seem to be inaccurate to state that they were unchanged. In other cases, a two-fold or less difference in transcript levels may be statistically significant but biologically insignificant. A reader can and should make their own conclusions.

      (3) Any description of inclusion or RB size being modestly different needs to be defended with microscopic quantification. 

      We have quantified inclusion areas and RB sizes and tried to remove instances of qualitative assessments as noted by the reviewer.

      (4) It would be very helpful to reviewers if there was a figure number added to each figure in the reviewer-delivered text. 

      Added.

      (5) Figure 1A: This should indicate that the genes indicated beneath each developmental form are on high (I think that is what that means). 

      We have reorganized Figure 1 to better improve the flow.

      (6) Figure 1B is exactly the same as the three images in Figure 8B. I would delete this in Figure 1. This relates to comment 9. 

      We presented this intentionally to clearly illustrate to the reader, who may not be knowledgeable in this area, what we propose is happening in the various strains. As such, we respectfully disagree and have left this aspect of the figure unchanged.

      (7) Figure 1D: It is not clear if the period in E.V has any meaning. I think this is just a typo. Also, the color coding needs to be indicated here. What do the gray bars represent? The labeling for the gene schematic for dacA-KDcom should not be directly below the first graph in D. This makes the reader think this is a label for the graph. This can be accomplished if the image in panel B is removed and the first graph in panel D is moved into B. This will make a better figure. 

      We have reorganized Figure 1 to better improve the flow.

      (8) Figure 2 C, G: The utility of these panels is not clear. For them to have any value, they need to be expressed in genome copies. If they are truly just a measure of chlamydia genomic DNA, they have minimal utility to the reader. There are similar panels in several other figures. 

      We have reported genome copies as suggested in lieu of ng gDNA for these measurements. Importantly, it does not alter any interpretations.
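      For readers unfamiliar with the conversion, a common way to express a DNA mass as genome copies is sketched below. This is an illustration under stated assumptions, not necessarily the authors' method (which may instead rely on a qPCR standard curve): the average mass of 650 g/mol per base pair is a standard approximation, and the genome length of roughly 1.04 Mb for C. trachomatis is an assumed value supplied here for the example.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
AVG_BP_MASS = 650.0             # g/mol per base pair (standard approximation)


def genome_copies(mass_ng, genome_length_bp):
    """Convert a double-stranded DNA mass in nanograms to genome copy number."""
    mass_g = mass_ng * 1e-9
    grams_per_genome = genome_length_bp * AVG_BP_MASS / AVOGADRO
    return mass_g / grams_per_genome


# Assumed genome length for illustration only (about 1.04 Mb).
copies_per_ng = genome_copies(1.0, 1_040_000)
```

      Because the conversion is a constant multiplication for a given genome, it rescales the axis without changing relative differences between samples, which is consistent with the statement that the interpretations are unaltered.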

      (9) I am not sure about the overall utility of Figure 8. Granted, a summary of their model is useful, but the cartoons in the figure are identical or very nearly identical to model figures shown in two other publications from the same group (PMID: 39576108, 39464112). These are referenced at least tangentially in the current manuscript (Jensen paper, now published, and ref 53). Because the model has been published before, if they are to be included, there needs to be a direct comparison of the results in each of these three papers, as they basically describe the same developmental process. The model images should also be referenced directly to the first of the other papers.

      This was intentional so that readers familiar with our work will see the similarities between these systems. We have added additional comments in the Discussion related to our newly published work. As an aside, Dr. Lee generated the first version of the figure that was adapted by others in the lab. It is perhaps unlucky that those other studies have been published before his work.

    1. eLife Assessment

      This important study presents a new framework (ASBAR) that combines open-source toolboxes for pose estimation and behavior recognition to automate the process of categorizing behaviors in wild apes from video data. The authors present compelling evidence that this pipeline can categorize simple wild ape behaviors from out-of-context video at a similar level of accuracy as previous models, while simultaneously vastly reducing the size of the model. The study's results should be of particular interest to primatologists and other behavioral biologists working with natural populations.

    2. Reviewer #1 (Public review):

      Summary:

      Advances in machine vision and computer learning have meant that there are now state-of-the-art and open-source toolboxes that allow for animal pose estimation and action recognition. These technologies have the potential to revolutionize behavioral observations of wild primates but are often held back by labor intensive model training and the need for some programming knowledge to effectively leverage such tools. The study presented here by Fuchs et al unveils a new framework (ASBAR) that aims to automate behavioral recognition in wild apes from video data. This framework combines robustly trained and well tested pose estimate and behavioral action recognition models. The framework performs admirably at the task of automatically identifying simple behaviors of wild apes from camera trap videos of variable quality and contexts. These results indicate that skeletal-based action recognition offers a reliable and lightweight methodology for studying ape behavior in the wild and the presented framework and GUI offer an accessible route for other researchers to utilize such tools.

      Given that automated behavior recognition in wild primates will likely be a major future direction within many subfields of primatology, open-source frameworks, like the one presented here, will present a significant impact on the field and will provide a strong foundation for others to build future research upon.

      Strengths:

      Clearly articulated the argument as to why the framework was needed and what advantages it could convey to the wider field.

      For a very technical paper, it was very well written. For every aspect of the framework, the authors clearly explained why it was chosen and how it was trained and tested. This information was broken down in a clear and easily digestible way that will be appreciated by technical and non-technical audiences alike.

      The study demonstrates which pose estimation architectures produce the most robust models for both within context and out of context pose estimates. This is invaluable knowledge for those wanting to produce their own robust models.

      The comparison of skeletal-based action recognition with other methodologies for action recognition is helpful in contextualizing the results.

      Weaknesses:

      While I note that this is a paper most likely aimed at the more technical reader, it will also be of interest to a wider primatological readership, including those who work extensively in the field. When outlining the need for future work I felt the paper offered almost exclusively very technical directions. This may have been a missed opportunity to engage the wider readership and suggest some practical ways those in the field could collect more ASBAR friendly video data to further improve accuracy.

      Comments on latest version:

      I think the new version is an improvement and applaud the authors on a well-written article that conveys some very technical details excellently. The authors have addressed my initial comments about reaching out to a wider, sometimes less technical, primatological audience by encouraging researchers to create large annotated datasets and make these publicly accessible. I also agree that fostering interdisciplinary collaboration is the best way to progress this field of research. These additions have certainly strengthened the paper but I still think some more practical advice for the actual collection of high-quality training data used to improve the pose estimates and behavioral classification in tough out-of-context environments could have been added. This doesn't detract from the quality of the paper though.

    3. Reviewer #2 (Public review):

      Fuchs et al. propose a framework for action recognition based on pose estimation. They integrate functions from DeepLabCut and MMAction2, two popular machine learning frameworks for behavioral analysis, in a new package called ASBAR.

      They test their framework by:

      Running pose estimation experiments on the OpenMonkeyChallenge (OMC) dataset (the public train + val parts) with DeepLabCut

      Also annotating pose data for around 320 images in the PanAf dataset (which contains behavioral annotations). They show that the ResNet-152 model generalizes best from the OMC data to this out-of-domain dataset.

      They then train a skeleton-based action recognition model on PanAf and show that the top-1/3 accuracy is slightly higher than video-based methods.
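      The top-1 and top-3 accuracy metrics mentioned above can be sketched as follows. This is a generic illustration of the metric; the function name and the toy scores are assumptions and do not come from ASBAR or MMAction2.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical class scores for three clips over three behavior classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
```

      Top-3 accuracy is always at least as high as top-1, since a prediction that is correct at rank 1 is also within the top three.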

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      Summary:

      Advances in machine vision and computer learning have meant that there are now state-of-the-art and open-source toolboxes that allow for animal pose estimation and action recognition. These technologies have the potential to revolutionize behavioral observations of wild primates but are often held back by labor-intensive model training and the need for some programming knowledge to effectively leverage such tools. The study presented here by Fuchs et al unveils a new framework (ASBAR) that aims to automate behavioral recognition in wild apes from video data. This framework combines robustly trained and well-tested pose estimate and behavioral action recognition models. The framework performs admirably at the task of automatically identifying simple behaviors of wild apes from camera trap videos of variable quality and contexts. These results indicate that skeletal-based action recognition offers a reliable and lightweight methodology for studying ape behavior in the wild and the presented framework and GUI offer an accessible route for other researchers to utilize such tools.

      Given that automated behavior recognition in wild primates will likely be a major future direction within many subfields of primatology, open-source frameworks, like the one presented here, will present a significant impact on the field and will provide a strong foundation for others to build future research upon.

      Strengths:

      Clearly articulated the argument as to why the framework was needed and what advantages it could convey to the wider field.

      For a very technical paper, it was very well written. For every aspect of the framework, the authors clearly explained why it was chosen and how it was trained and tested. This information was broken down in a clear and easily digestible way that will be appreciated by technical and non-technical audiences alike.

      The study demonstrates which pose estimation architectures produce the most robust models for both within-context and out-of-context pose estimates. This is invaluable knowledge for those wanting to produce their own robust models.

      The comparison of skeletal-based action recognition with other methodologies for action recognition helps contextualize the results.

      We thank Reviewer #1 for their thoughtful and constructive review of our manuscript. We are especially grateful for your recognition of the clarity of the manuscript, the strength of the technical framework, and its accessibility to both technical and non-technical audiences. Your feedback highlights exactly the kind of interdisciplinary engagement we hope to foster with this work.

      Weaknesses

      While I note that this is a paper most likely aimed at the more technical reader, it will also be of interest to a wider primatological readership, including those who work extensively in the field. When outlining the need for future work I felt the paper offered almost exclusively very technical directions. This may have been a missed opportunity to engage the wider readership and suggest some practical ways those in the field could collect more ASBAR-friendly video data to further improve accuracy.

      We appreciate this insightful suggestion and fully agree that emphasizing practical relevance is important for engaging a broader readership. In response, we have reformulated the opening of the Discussion section to place stronger emphasis on the value of shared, open-source resources and the real-world accessibility of the ASBAR framework. The revised text explicitly highlights the practical benefits of ASBAR for field researchers working in resource-constrained environments, and underscores the importance of community-driven data sharing to advance behavioral research in natural settings.

      This section now reads: Despite the growing availability of open-source resources, such as large-scale animal pose datasets and machine learning toolboxes for pose estimation and human skeleton-based action recognition, their integration for animal behavior recognition—particularly in natural settings—remains largely unexplored. With ASBAR, a framework combining animal pose estimation and skeleton-based action recognition, we provide a comprehensive data and model pipeline, methodology, and GUI to assist researchers in automatically classifying animal behaviors via pose estimation. We hope these resources will become valuable tools for advancing the understanding of animal behavior within the research community.

      To illustrate ASBAR’s capabilities, we applied it to the challenging task of classifying great ape behaviors in their natural habitat. Our skeleton-based approach achieved accuracy comparable to previous video-based studies for Top-K and Mean Class Accuracies. Additionally, by reducing the input size of the action recognition model by a factor of approximately 20 compared to video-based methods, our approach requires significantly less computational power, storage space, and data transfer resources. These qualities make ASBAR particularly suitable for field researchers working in resource-constrained environments.

      Our framework and results are built on the foundation of shared and open-source materials, including tools like DeepLabCut, MMAction2, and datasets such as OpenMonkeyChallenge and PanAf500. This underscores the importance of making resources publicly available, especially in primatology, where data scarcity often impedes progress in AI-assisted methodologies. We strongly encourage researchers with large annotated video datasets to make them publicly accessible to foster interdisciplinary collaboration and further advancements in animal behavior research.

      Reviewer #2 (Public Review)

      Fuchs et al. propose a framework for action recognition based on pose estimation. They integrate functions from DeepLabCut and MMAction2, two popular machine-learning frameworks for behavioral analysis, in a new package called ASBAR.

      They test their framework by

      Running pose estimation experiments on the OpenMonkeyChallenge (OMC) dataset (the public train + val parts) with DeepLabCut.

      Annotating around 320 image pose data in the PanAf dataset (which contains behavioral annotations). They show that the ResNet-152 model generalizes best from the OMC data to this out-of-domain dataset.

      They then train a skeleton-based action recognition model on PanAf and show that the top-1/3 accuracy is slightly higher than video-based methods (and strong), but that the mean class accuracy is lower - 33% vs 42%. Likely due to the imbalanced class frequencies. This should be clarified. For Table 1, confidence intervals would also be good (just like for the pose estimation results, where this is done very well).
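
      The gap between top-1 accuracy and mean class accuracy under class imbalance can be illustrated with a minimal sketch. The behavior labels and counts below are hypothetical and are not taken from PanAf; the two functions follow the usual definitions of the metrics.

```python
from collections import defaultdict


def top1_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Average of the per-class recalls, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels: 8 clips of a common behavior, 2 of a rare one.
y_true = ["walking"] * 8 + ["climbing"] * 2
# A model that always predicts the majority class.
y_pred = ["walking"] * 10

print(top1_accuracy(y_true, y_pred))        # 0.8
print(mean_class_accuracy(y_true, y_pred))  # 0.5
```

      A classifier that ignores the rare class still scores well on top-1 accuracy, while mean class accuracy exposes the missed class.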

      We thank Reviewer #2 for their clear and helpful summary of our work, and for the thoughtful suggestions to improve the manuscript. We appreciate this observation. In the revised manuscript, we now clarify that the lower Mean Class Accuracy (MCA) in the initial version was indeed driven by significant class imbalance in the PanAf dataset, which contains highly uneven representation across behavior categories. To address this, we made two key improvements to the action recognition model:

      (1) We replaced the standard cross-entropy loss with a class-balanced focal loss, following the approach of Sakib et al. (2021), to better account for rare behaviors during training.

      (2) We initialized the PoseConv3D model with pretrained weights from FineGym (Shao et al., 2020) rather than training from scratch, which increased performance across underrepresented classes.

      Together, these changes substantially improved model performance on tail classes, increasing the Mean Class Accuracy from 33.6% to 47%, now exceeding that of the video-based baseline.
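
      For readers unfamiliar with the loss in point (1), the following is a minimal sketch of one common formulation of a class-balanced focal loss. The response does not give the exact form used, so the effective-number weighting, the values of `beta` and `gamma`, and the class counts are all assumptions made for illustration.

```python
import math


def class_balanced_weights(class_counts, beta=0.999):
    """Weight each class by the inverse of its 'effective number' of samples.

    Assumed weighting scheme: w_c = (1 - beta) / (1 - beta ** n_c),
    rescaled so that the weights average to 1.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def class_balanced_focal_loss(logits, target, weights, gamma=2.0):
    """Focal loss for one sample, scaled by the weight of its true class."""
    p_t = softmax(logits)[target]
    # (1 - p_t) ** gamma down-weights samples the model already gets right.
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical counts: one frequent behavior and one rare behavior.
counts = [5000, 50]
weights = class_balanced_weights(counts)

# The same logits cost more when the true class is the rare one.
print(class_balanced_focal_loss([1.0, 0.0], 0, weights))
print(class_balanced_focal_loss([0.0, 1.0], 1, weights))
```

      With `gamma=0` and uniform weights the function reduces to ordinary cross-entropy, which makes the two added ingredients (class weighting and the focusing term) easy to inspect separately.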

      Moreover, we sincerely thank Reviewer #2 for the thorough and constructive private feedback. Your comments have greatly helped us improve both the structure and clarity of the manuscript, and we have implemented several key revisions based on your recommendations to streamline the text and sharpen its focus on the core contributions. In particular, we have revised the tone of both the Introduction and Discussion sections to more modestly and accurately reflect the scope of our findings. We removed unnecessary implementation details—such as the description of graph-based models that were not part of the final pipeline—to avoid distracting tangents. The Methods section has been clarified and consolidated to include all evaluation metrics, a description of the data augmentation, and other methodological elements that were previously scattered across the Results section. Additionally, the Discussion now explicitly addresses the limitations of our EfficientNet results, including a dedicated paragraph that acknowledges the use of suboptimal hyperparameters and highlights the need for architecture-specific tuning, particularly with respect to learning rate schedules.

    1. eLife Assessment

      The authors present a useful agent-based model to study the tensile force generated by myosin mini-filaments in actin systems (bundles and networks); by numerically solving a mechanical model of myosin-II filaments, the authors provide insights into how the geometry of the molecular components and their elastic responses determine the force production. This work is of interest to biophysicists (in particular theoreticians) investigating force generation of motor molecules from a biomechanical engineering and physics perspective. The authors convincingly show that cooperative effects between multiple myosin filaments can enhance the total force generated, but not the efficiency of force generation (force per myosin) if passive cross-linkers are present. This work would benefit from a more extensive discussion of the physiological relevance of the results in view of the existing experimental literature, and how the principles that govern the behavior could be different for different motor proteins.

    2. Reviewer #1 (Public review):

      Summary:

      This work by Ding et al uses agent-based simulations to explore the role of the structure of molecular motor myosin filaments in force generation in cytoskeletal structures. The focus of the study is on disordered actin bundles which can occur in the cell cytoskeleton and can be investigated with in vitro purified protein experiments. A key finding is that the force generation depends on the number of myosin motor heads and the spatial distribution of the myosin thick filaments in relation to passive crosslinkers.

      Strengths:

      The work develops a model where the detailed structure of the myosin motor filaments with multiple heads is represented. This allows the authors to test the dependence of myosin-generated forces on the number of myosin heads and their spatial distribution.

      The work highlights that forces from multiple myosin motors within a disordered actin bundle may not simply add up, but depend on their spatial distribution in relation to passive crosslinkers.

      This may explain prior experimental observations in in vitro reconstituted actomyosin bundles that the tension developed in the bundle was proportional to the number of myosin motor heads per filament rather than the number of myosin filaments. More generally, this type of modeling can guide fundamental understanding of the relationship between structure and mechanical force production.

      Weaknesses:

      The work focuses on the structure of myosin filaments but ignores other processes that may determine contractility of actomyosin structures such as the dynamics of crosslinker binding/unbinding and actin polymerization/depolymerization.

      The authors did not vary the relative concentration of myosin motors and passive crosslinkers. This would have revealed interesting competing effects between motor and crosslink density and distribution, that their model and other studies suggest are important.

      Given the above factors and the lack of direct quantitative comparisons with the experiment, the physiological significance of the work remains hard to ascertain.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors use a mechanical model to investigate how the geometry and deformations of myosin II filaments influence their force generation. They introduce a force generation efficiency that is defined as the ratio of the total generated force and the maximal force that the motors can generate. By changing the architecture of the myosin II filaments, they study the force generation efficiency in different systems: two filaments, a disorganized bundle, and a 2D network. In the simple two-filament systems, they found that in the presence of actin cross-linking proteins motors cannot add up their force because of steric hindrances. In the disorganized bundle, the authors identified a critical overlap of motors for cooperative force generation. This overlap is also influenced by the arrangement of the motor on the filaments and influenced by the length of the bare zone between the motor heads.
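
      Written out, the efficiency described in this summary takes the following form. The symbol F_max for the maximal force the motors can generate is an assumption made here for illustration; η and F_tot follow the notation used in the author response.

```latex
\eta = \frac{F_{\mathrm{tot}}}{F_{\max}}
```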

      Strengths:

      The strength of the study is the identification of organizational principles in myosin II filaments that influence force generation. It provides a complementary mechanistic perspective on the operation of these motor filaments. The force generation efficiency and the cooperative overlap number are quantitative ways to characterize the force generation of molecular motors in clusters and between filaments. These quantities and their conceptual implications are most likely also applicable in other systems.

      Weaknesses:

      The detailed model that the authors present relies on over 20 numerical parameters that are listed in the supplement. Because of this vast number of parameters, it is not clear how general the findings are. On the other hand, it was not obvious how specific the model is to myosin II, meaning how well it can describe experimental findings or make measurable predictions. Although the authors partially addressed this point in the revisions, I still think it is not easy to see what are the fundamental principles that govern the behavior and how they could be different for different motor proteins.

      The model seems to be quantitative, but the interpretation and connection to real experiments is rather qualitative in my point of view.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work by Ding et al uses agent-based simulations to explore the role of the structure of molecular motor myosin filaments in force generation in cytoskeletal structures. The focus of the study is on disordered actin bundles which can occur in the cell cytoskeleton and have also been investigated with in vitro purified protein experiments.

      Strengths:

      The key finding is that cooperative effects between multiple myosin filaments can enhance both total force and the efficiency of force generation (force per myosin). These trends were possible to obtain only because the detailed structure of the motor filaments with multiple heads is represented in the model.

      We appreciate your comments about the strength of our study. 

      Weaknesses:

      It is not clearly described what scientific/biological questions about cellular force production the work answers. There should be more discussion of how their simulation results compare with existing experiments or can be tested in future experiments.

      Please see our response to the comment (1) below.

      The model assumptions and scientific context need to be described better.

      We apologize for the insufficient descriptions about the model and the scientific context. We revised the manuscript to better explain model assumptions and scientific context as described in our responses below.

      The network contractility seems to be a mere appendix to the bundle contractility which is presented in much more detail.

      Please see our response to the comment (6) below.

      Reviewer #1 (Recommendations for the authors):

      (1) It is not clearly described what scientific/biological questions about cellular force production the work answers. There should be more discussion of how their simulation results compare with existing experiments, or can be tested in future experiments. The authors do briefly mention Reference 4 where different myosin isoforms were used, but it is not clear that these experiments support the scalings predicted in this work in Figures 3-6. Also, the experiments in Ref. 4 apparently did not involve passive crosslinkers (ACPs) which are key in this study.

      Thank you for the comment. In the 5th paragraph of the discussion section of the original manuscript, we applied our findings to understand how structural differences between ventral stress fibers and actin arcs could affect force generation. In addition, at the end of the discussion section, we mentioned that experiments with artificially-made myosin thick filaments could be used for verifying our results. 

      The experiments in Ref. 4 were the only ones that we could directly compare our results with. In a previous study, actomyosin bundles were experimentally created with ACPs (K.L. Weirich et al., Biophys J, 2021, 120: 1957-1970), but the motions of myosin thick filaments were the only quantities measured in the experiments. In general, measuring forces generated by in vitro actomyosin bundles is very challenging. This is why the predictions from our model are particularly valuable for understanding the force generation of actomyosin structures. 

      (2) The architecture of the bundles seems to be prescribed by hand in these simulations. Several well-known stochastic aspects of the dynamics of actin and actin-binding proteins are not included in the model. For example, there is no remodeling of the actin structures through actin polymerization and depolymerization, or crosslink (ACP) binding and unbinding. Can the authors comment on why these effects could be neglected for the questions they want to address?

      Thank you for the comment. We previously showed that the force generation process in actomyosin networks and bundles is affected by actin dynamics (Q. Yu et al., Biophys J, 2018, 115: 2003-2013) and the unbinding of ACPs (T. Kim, Biomech Model Mechanobiol, 2015, 14(2): 345-355 and W. Jung et al., Comput Part Mech, 2015, 2(4): 317-327). 

      However, we did not include the actin dynamics and the ACP unbinding in the current study to clearly understand the effects of the structural properties of thick filaments on the force generation process. We have learned that the stochastic behaviors of cytoskeletal components lead to noisier results, which requires us to run a much larger number of simulations to obtain statistically convincing data. We added the following paragraph in the discussion section of the revised manuscript:

      “Although this study focused mainly on parameters related to motor structures, we expect that other parameters would affect the force generation process. For example, as we showed before, a decrease in ACP density would reduce forces by deteriorating connectivity between filaments. With very low ACP density, some neighboring motors may not have ACPs between them, thus adding up their forces as shown in Fig. 2. However, such low ACP density may not maintain the structure of bundles or cross-linked networks well. In addition, the force-dependent unbinding of ACPs could change the spatial distribution of ACPs during force generation. If they behave as a slip bond which unbinds more frequently with higher forces, ACPs may not stay between two motors for a long time due to high tension. Then, forces generated by two motors may have a higher chance to add up. By contrast, if they behave as a catch bond which unbinds less frequently with larger forces, more ACPs will be recruited between two motors, reducing the chance to add up forces. The length of actin filaments is unlikely to affect the force generation process significantly unless filaments are very short. Additionally, as we showed before, actin turnover would reduce forces by competing with motor activities, change connectivity between filaments over time, and prevent motors from being stalled for a long time, all of which could affect force generation.”

      (3) The present study is confined to the fixed density of motors and ACPs. However, these can be easily varied in in vitro experiments. Works such as Reference 4 show an optimum in contractility vs myosin concentration. Myosins act not only to slide actin filaments but also crosslink them.

      Can the authors vary myosin concentration to demonstrate such effects in their model?

      As the reviewer pointed out, there is a belief that myosin thick filaments can serve as crosslinkers as well. However, unless there is a fraction of dead myosins (which remain bound on filaments without walking) or myosins dwell at the barbed ends of filaments for a very long time, it looks very hard for bundles or networks to generate large forces. A former experiment showed that active myosins increase the viscosity of actin networks, not elasticity (D. Humphrey et al., Nature, 2002, 416: 413-416). Computer simulations with reasonable assumptions did not show significant force generation without cross-linkers. We have tested systems with a large number of motors and a few cross-linkers in previous studies (T. Kim, Biomech Model Mechanobiol, 2015, 14(2): 345-355 and W. Jung et al., Comput Part Mech, 2015, 2(4): 317-327). We observed that large force/stress was generated momentarily, but it relaxed very fast. We expect similar outcomes if we try such conditions in the current study.

      (4) Why is there a (factor of 1.5-2) discrepancy in the measured (Ftot) and estimated (Fest) force values in Figure 4-6? How can the authors improve their scaling arguments to capture this? What about the estimated efficiency?

      Thank you for the comment. Indeed, there was a discrepancy between the actual and estimated forces. When the estimated force was calculated, we used the z positions of motors without consideration of the actual bundle geometry with multiple filaments. For example, if two motors are located on the opposite sides of the bundle (i.e., if they are located far from each other in x or y direction), forces generated by them may not counterbalance each other. Then, the estimated force can be smaller than the actual force because counterbalance between motors can be overcounted. The original manuscript had the following sentences to clarify this point: “F<sub>est</sub> was generally smaller than F<sub>tot</sub> because this analysis does not account for actual bundle geometry consisting of multiple F-actins; if two motors are located far from each other in x or y direction, they may not counterbalance or add up forces. Nevertheless, we found that F<sub>est</sub> captures the overall dependence of F<sub>tot</sub> on parameters well.”

      (5) Several choices of parameter values used in the simulations are not clear:

      a) Why consider F actin of 140 nm specifically? Actin can come in a range of lengths. How do their results depend upon the length scale of actin?

      It seems that there is a misunderstanding. 140 nm is the equilibrium length of one actin segment in our model. The actual F-actin consists of multiple actin segments. The length of F-actin was 9 μm in bundle simulations and 10 μm (average) in network simulations. We expect that the general tendency of our results would not change with different filament lengths. However, if the filament length becomes too short, the force generation process would be impaired due to a lack of connectivity between filaments. 

      b) Similarly, very specific values of myosin backbone length (42 nm), number of myosin heads (8), number of arms (24), and Actin Cross-linking Proteins (ACPs). What informs these values and how will the results change if they are different? It is not especially clear how an "Arm" differs from "heads" and what kind of coarse-graining is involved.

      In the “model overview” section of the original manuscript, we mentioned the following to clarify the definitions of motor arms and motor heads: 

      “To mimic the structure of bipolar filaments, each motor has a backbone, consisting of serially linked segments, and two arms on each endpoint of the backbone segments that represent 8 myosin heads (N<sub>h</sub> = 8).”

      We devised this coarse-graining scheme of myosin thick filaments in our previous work (T. Kim, Biomech Model Mechanobiol, 2015, 14(5): 1143-1155). Through extensive tests, we showed that force generation and motor behaviors are largely independent of coarse-graining level. In other words, a motor with the same value of N<sub>h</sub>N<sub>a</sub> leads to similar outcomes regardless of the value of N<sub>a</sub>. However, in a bundle with multiple filaments, each motor needs a sufficient number of arms to ensure simultaneous interactions with those filaments. This is why we decided to use N<sub>h</sub> = 8 and N<sub>a</sub> = 24. 

      To match the length of thick filaments and the total number of heads (N<sub>h</sub>N<sub>a</sub>) in the model with real myosin thick filaments, we have used 42 nm for each backbone length. Varying this length is equivalent to a variation in L<sub>sp</sub> that we did for Fig. 6.
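
      The reference motor geometry can be checked with a few lines of arithmetic. This is a sketch under the assumption that the backbone is a chain of equal 42 nm segments with two arms at every segment endpoint, as the quoted model overview suggests; the variable names are ours, and the resulting length is only compared against the L<sub>M</sub> = 462 nm quoted elsewhere in this response.

```python
# Values stated in the response for the reference condition.
backbone_segment_nm = 42   # length of one backbone segment
n_arms = 24                # N_a
heads_per_arm = 8          # N_h

# Assumption: two arms sit on each endpoint of the serially linked segments.
arms_per_endpoint = 2
n_endpoints = n_arms // arms_per_endpoint
n_segments = n_endpoints - 1

motor_length_nm = n_segments * backbone_segment_nm
total_heads = heads_per_arm * n_arms

print(n_segments)        # 11
print(motor_length_nm)   # 462
print(total_heads)       # 192
```

      Under this reading, 24 arms imply 11 backbone segments, which reproduces the 462 nm motor length, and each motor represents 192 myosin heads in total.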

      We used high ACP density to ensure connections between all neighboring pairs of actin filaments. We already showed how the presence of ACPs affects the force generation process in Fig. 2 using two actin filaments. It is expected that a variation of ACP density would affect our results to some extent. Since the main focus of the current study is the structural properties of motors, we did not explore the effects of ACP density. We hope that the reviewer understands our intention. 

      (6) The manuscript focuses on disordered bundles with only one figure on networks. However, actin fibers also ubiquitously exist as disordered networks, and it is important to explore in more detail the contractile forces in such network arrangements.

      We appreciate the comment. Because we plan to delve into the effects of motor structures on the force generation in networks as a follow-up study, we showed the minimal results in the current study to prove the generality of our findings. We hope that the reviewer understands our intention and plan.

      It is not described very clearly how these networks were generated.

      We apologize for the lack of explanation about how the networks were generated. We added the following section in Supplementary Text of the revised manuscript:

      “Network assembly

      Unlike F-actin in bundle simulations, F-actin in network simulations is formed by stochastic processes as in our previous studies. The formation of F-actin is initiated from a nucleation event with a rate constant, k<sub>n,A</sub>, with the appearance of one cylindrical segment in a random position with a random orientation perpendicular to the z direction. The polymerization of F-actin is simulated by adding cylindrical segments at the barbed end of existing filaments with a rate constant, k<sub>p,A</sub>. The ratio of k<sub>n,A</sub> to k<sub>p,A</sub> is adjusted to result in the average filament length of ~10 μm. The rest of the assembly process is identical to that described in the main text.”
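
      To illustrate why the ratio of the nucleation and polymerization rate constants sets the average filament length, the following toy sketch distributes a fixed pool of segments by random nucleation and elongation events. It is not the assembly algorithm of the model: positions and orientations are ignored, and the function name, the event rule, the pool size, and the rate values are all assumptions. Only the 140 nm segment length comes from the response.

```python
import random


def assemble_filaments(n_segments, k_nucleation, k_polymerization, seed=0):
    """Distribute n_segments over filaments by nucleation/elongation events.

    Returns a list with the number of segments in each filament.
    """
    rng = random.Random(seed)
    filaments = [1]  # the first event has to be a nucleation
    for _ in range(n_segments - 1):
        rate_nucleate = k_nucleation
        # Every existing filament offers one barbed end for elongation.
        rate_elongate = k_polymerization * len(filaments)
        if rng.random() < rate_nucleate / (rate_nucleate + rate_elongate):
            filaments.append(1)
        else:
            filaments[rng.randrange(len(filaments))] += 1
    return filaments


SEGMENT_NM = 140  # equilibrium length of one actin segment (from the response)

for ratio in (1e-1, 1e-2, 1e-3):
    lengths = assemble_filaments(20_000, k_nucleation=ratio, k_polymerization=1.0)
    mean_um = SEGMENT_NM * sum(lengths) / len(lengths) / 1000
    print(f"k_n/k_p = {ratio:g}: {len(lengths)} filaments, mean length {mean_um:.2f} um")
```

      Lowering the ratio produces fewer nucleation events and therefore longer filaments on average, which is the knob the quoted text describes tuning until the mean length is about 10 μm.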

      Crosslinked biopolymers like actin typically form disordered elastic networks with their coordination number below the rigidity percolation threshold (z=4 in 2D); see, for example, the review by Broedersz and MacKintosh, Rev. Mod. Phys. 2013. Such networks should exist in the bending-dominated regime, where bending forces play a vital role in force propagation. Was that observed in the simulations? Why or why not?

      We appreciate the comment. We are aware of the bending-dominated regime and indeed showed the importance of the bending stiffness of actin filaments at low shear strain level in our previous work (T. Kim et al., PLOS Comput Biol, 2009, 5(7): e1000439). In case of active networks with motors, such a bending-dominated regime has not been observed without external shear strain. Instead, buckling of actin filaments was found to be essential for breaking symmetry between tensile and compressive forces developed by motor activities. We have shown that the free contraction of networks is inhibited if filament bending stiffness is increased substantially (J. Li et al., Soft Matter, 2017, 13: 3213-3220 and T. Bidone et al., PLOS Comput Biol, 2017, 13(1): e1005277). We expect that contractile forces generated by bundles or networks will be reduced significantly if we highly increase bending stiffness. However, considering the focus of the current study is on the structural properties of motors, we did not perform such simulations. 

      (7) It would be interesting to see the simulated predictions of the bundle or network contraction dynamics. This can be done by changing to free boundary conditions so that the bundle can contract.

      Thank you for the suggestion. We have previously investigated the free contraction of actomyosin networks with different motor and ACP densities (J. Li et al., Soft Matter, 2017, 13: 3213). We observed that the rate of network contraction was higher with more motors and ACPs. However, we did not test the effects of the structural properties of thick filaments in the previous study. We plan to investigate the effects in future studies because the focus of the current study is the force generation process. Please note that in the discussion section of the original manuscript, we mentioned the following:

      “Although we focused on force generation, the contractile behaviors of actomyosin structures (i.e., a decrease in length) have also been of great interest. Our model can be used to study such contractile behaviors by deactivating the periodic boundary condition and removing connection between one end of bundle/network and a domain boundary as done previously [20]. To achieve higher contractile speed with the same total number of myosin heads, the existence of multiple contractile units would be better as suggested in a previous work [4]. This means that there is a trade-off between force generation and contractile speed. Previous studies also showed that the contractile speed of networks is proportional to motor density [18, 43, 51]. We may be able to use our model to systematically investigate how the contractile speed is regulated by parameters that we tested in this study, including the number, distribution, length, and structure of motors.”

      Minor suggestions for improvement:

      (1) What are the vertical markers in Figures 1E and F? They should be labelled. If they are crosslinkers, it is not clear why the color is different from Figure 1A and B.

      We believe that the reviewer meant Figs. 2E, F. Those vertical lines are indeed ACPs (crosslinkers). We changed the color of ACPs in Fig. 1A and Fig. 2B-D to purple to be consistent. In addition, we changed the colors of two filaments in Figs. 2B-D slightly to be consistent with Fig. 2E.

      (2) To help understanding, please include a figure showing how forces are measured.

      We added Fig. S1 in the revised manuscript to explain how the bundle force is calculated.

      (3) It should be possible to extend the scaling arguments to predict what is the crossover myosin density (N_M) in Figure 4a at which the efficiency changes from going as 1/N_M to saturating. 

      As the reviewer might have observed, the slope of the efficiency in Fig. 4A gradually changes, rather than showing a sharp transition. Thus, it is hard to define one crossover myosin density. 

      Similarly, what are the slopes in Figure 6a-b?

      We drew the reference lines in those two plots. Unfortunately, we do not have explanations about the origin of these slopes.

      (4) Some more explanation for the observed values should be added. Figure 4: Why does efficiency plateau at a value close to 0.8 in (A)? 

      We assume that the reviewer meant the plateau of η close to 0.08, not 0.8. Our speculation for the origin of this plateau value is related to L<sub>M</sub> (= 462 nm under the reference condition). Ideally, ~43 motors are required to cover the entire length of the bundle (= 20 μm). Under this condition, η is ~0.023. Although this is not 0.08, we believe that these two values are related to each other. For example, if we increase L<sub>M</sub>, this plateau level would increase. We added the following sentences in the result section of the revised manuscript:

      “The plateau level of η at ~0.08 is related to the minimum number of motors required for saturating an entire bundle, implying that the plateau level would be higher if each motor is longer.”
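
      The two numbers quoted in this answer can be reproduced with simple arithmetic. The response does not state how η ≈ 0.023 is obtained; reading it as the reciprocal of the number of motors needed to span the bundle is an assumption that happens to match the quoted value.

```python
bundle_length_nm = 20_000  # bundle length of 20 um, from the response
motor_length_nm = 462      # L_M under the reference condition

# Number of motors placed end to end that would span the bundle.
motors_to_cover = bundle_length_nm / motor_length_nm

# Assumed reading of the quoted efficiency: one motor's worth of force
# out of roughly 43 motors.
single_motor_fraction = 1 / round(motors_to_cover)

print(round(motors_to_cover, 1))        # 43.3
print(round(single_motor_fraction, 3))  # 0.023
```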

      Figure 5: Overlapping between motors seems to increase the total force applied by them because of cooperative effects. However, it is not abundantly clear why that should peak at a value of f = 0.06.

      As shown in Fig. 5B, smaller f always results in higher F<sub>tot</sub> due to higher level of cooperative overlap. The minimum value of f we tested in this study was 0.06, so F<sub>tot</sub> was maximal at f = 0.06.

      (5) Why is the network force expected to scale approximately as sqrt(N_M)? Is it because of the 2D geometry where the number of motors along the x or y-direction scale as sqrt(N_M)?

      We initially thought that the weaker dependence of the total force on N<sub>M</sub> was related to the random orientations of motors. However, if the network is fully saturated with motors, the inclusion of more motors will increase forces in both x and y directions almost linearly, resulting in the direct proportionality of F<sub>tot</sub> to N<sub>M</sub>. Our new hypothesis for weaker dependence is consistent with the reviewer’s speculation; the network is not fully saturated even with 1000 motors, so the entire regime shown in Fig. 7B corresponds to that with N<sub>M</sub> < 100 in Fig. 4A where similar weaker dependence on N<sub>M</sub> was observed. We added the following sentence in the result section of the revised manuscript to clarify this point:

      “the average number of motors in each direction which can experience the cooperative overlap would be ~√N<sub>M</sub>. Maximal N<sub>M</sub> tested with the network was ~2,500, so the dependence of F<sub>tot</sub> on N<sub>M</sub> with the network is similar to that with N<sub>M</sub> < ~50 with the bundle (Fig. 4A).”
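
      One way to read the comparison above follows the reviewer's suggestion that, in a 2D network, the number of motors along each axis scales as the square root of N<sub>M</sub>. The sketch below only illustrates that scaling. The helper name is hypothetical, and the square-root relation is an assumption taken from the reviewer's question, not a result of the simulations.

```python
import math

# Assumption: motors are spread evenly over a 2D network, so the number of
# motors available along one direction scales as sqrt(N_M).


def motors_per_direction(n_motors: int) -> float:
    """Approximate number of motors along one axis of a 2D network."""
    return math.sqrt(n_motors)


for n_motors in (100, 1000, 2500):
    per_axis = motors_per_direction(n_motors)
    print(f"N_M = {n_motors:5d} -> about {per_axis:.1f} motors per direction")
```

      With N<sub>M</sub> ≈ 2,500 this gives about 50 motors per direction, which matches the bundle regime (N<sub>M</sub> < ~50) that the authors compare against.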

      (6) Figures 6 D and A: Figure 6D suggests that there is a more full overlap in the cases where there was a longer bare zone or larger spacing between motor arms. However, the quantification of the total force in A shows that the force is highest for the case where LM was increased by increasing the number of arms. Why do the authors think that is? I would expect from the explanation in Fig 6D that the Lsp and Lbz would be higher than Na in Fig 6A.

      Fig. 6D shows a difference in the level of the cooperative overlap () between two motors. As the reviewer pointed out, the case with more arms shows the lowest , resulting in the lowest as we showed in Fig. S2B. However, as shown in Eq. 7, the total force is a function of both N<sub>a</sub> and . Thus, due to higher N<sub>a</sub> and lower , the force in the case with different N<sub>a</sub> can be similar to that in the case with different L<sub>bz</sub>. In the original manuscript, we had the following sentence to explain how the force can be similar between the two cases: 

      “Thus, was higher (Fig. S2B, blue), resulting in higher F<sub>tot</sub> and η despite smaller N<sub>a</sub>.”

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors use a mechanical model to investigate how the geometry and deformations of myosin II filaments influence their force generation. They introduce a force generation efficiency that is defined as the ratio of the total generated force and the maximal force that the motors can generate. By changing the architecture of the myosin II filaments, they study the force generation efficiency in different systems: two filaments, a disorganized bundle, and a 2D network. In the simple two-filament systems, they found that in the presence of actin crosslinking proteins motors cannot add up their force because of steric hindrances. In the disorganized bundle, the authors identified a critical overlap of motors for cooperative force generation. This overlap is also influenced by the arrangement of the motor on the filaments and influenced by the length of the bare zone between the motor heads.

      Strengths:

      The strength of the study is the identification of organizational principles in myosin II filaments that influence force generation. It provides a complementary mechanistic perspective on the operation of these motor filaments. The force generation efficiency and the cooperative overlap number are quantitative ways to characterize the force generation of molecular motors in clusters and between filaments. These quantities and their conceptual implications are most likely also applicable in other systems.

      Thank you for the comments about the strength of our study. 

      Weaknesses:

      The detailed model that the authors present relies on over 20 numerical parameters that are listed in the supplement. Because of this vast amount of parameters, it is not clear how general the findings are. On the other hand, it was not obvious how specific the model is to myosin II, meaning how well it can describe experimental findings or make measurable predictions. The model seems to be quantitative, but the interpretation and connection to real experiments are rather qualitative in my point of view.

      As the reviewer mentioned, all agent-based computational models for simulating the actin cytoskeleton inevitably involve a large number of parameters. Some of the parameter values are not well known, so we have tuned our parameter values carefully by comparing our results with experimental observations in our previous studies since 2009. We were aware of the importance of rigorously representing the unbinding and walking rates of myosin motors, so we implemented in our model the parallel cluster model, which can predict those rates with consideration of the mechanochemical rates of myosin II. Thus, we are convinced that our motors represent myosin II.

      In our manuscript, our results were compared with prior observations in Ref. 4 (Thoresen et al., Biophys J, 2013) several times. In particular, larger force generation with more myosin heads per thick filament was consistent between the experiment and our simulations. 

      Our study can make various predictions. First, our study explains why non-muscle myosin II in stress fibers shows focal distributions rather than uniform distributions; if the filaments stay close together, they can generate much larger forces in the stress fibers via the cooperative overlap. Our study also predicts a difference between bipolar structures (found in skeletal muscle myosins and nonmuscle myosins) and side polar structures (found in smooth muscle myosins) in terms of the likelihood of the cooperative overlap. As shown below, myosin filaments with the bipolar structure can add up their forces better than those with the side polar structure when their overlap level is the same.

      Author response image 1.

       

      It was often difficult for me to follow what parameters were changed and what parameters were set to what numerical values when inspecting the curve shown in the figures. The manuscript could be more specific by explicitly giving numbers. For example, in the caption for Figure 6, instead of saying "is varied by changing the number of motor arms, the bare zone length, the spacing between motor arms", the authors could be more specific and give the ranges: "is varied by changing the number of motor arms from ... to .., the bare zone length from .. to..., and the spacing between motor arms from .. to ..".

      This lack of specificity is also reflected in the text: "We ran simulations with a variation in either L<sub>sp</sub> or L<sub>bz</sub>" What is the range of this variation? "When L<sub>M</sub> was similar" similar to what? "despite different N<sub>M</sub>." What are the different values for N<sub>M</sub>? These are only a few examples that show that the text could be far more specific and quantitative instead of relying on qualitative descriptions.

      We appreciate the comment. In the revised manuscript, we specified the range of the variation in each parameter.

      In the text, after equation (2) the authors discuss assumptions about the binding of the motor to the actin filament. I think these model-related assumptions and explanations should be discussed not in the results section but rather in the "model overview" section.

      Thank you for pointing this out. In the original manuscript, we described all the details of the model in Supplementary Material. We feel that the assumptions about interactions between motors and actin filaments are too detailed to be included in the model overview section.

      The lines with different colors in Figure 2A are not explained. What systems and parameters do they represent?

      The different colors used in Fig. 2A were used for distinguishing 20 cases. We added the explanation about the colors in the figure caption in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      To guarantee the reproducibility of the results, I recommend that the authors publish their simulation code on GitHub.

      We appreciate the reviewer’s suggestion. Following the suggestion, we prepared and posted the code on GitHub as mentioned in the Data Availability section of the revised manuscript: “The source code of our model is available on GitHub: https://github.com/ktyman2/ThickFilament”

    1. eLife Assessment

      This important study uses data on over 56 million articles to examine the dynamics of interdisciplinarity and international collaborations in research journals. The data analytics used to quantify disciplinary and national diversity are convincing, and support the claims that journals have become more diverse in both aspects.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to explore how interdisciplinarity and internationalization, two increasingly prominent characteristics of scientific publishing, have evolved over the past century. By constructing entropy-based indices from a large-scale bibliometric dataset (OpenAlex), they examine both long-term trends and recent dynamics in these two dimensions across a selection of leading disciplinary and multidisciplinary journals. Their goal is to identify field-specific patterns and structural shifts that can inform our understanding of how science has become more globally collaborative and intellectually integrated.

      Strengths and Weaknesses:

      The paper's primary strength lies in its comprehensive temporal scope and use of a rich, openly available dataset covering over 56 million articles. The interdisciplinary and internationalization indices are well-founded and allow meaningful comparisons across fields and time. Moreover, the distinction between disciplinary and multidisciplinary journals adds valuable nuance. However, some methodological choices, such as the use of a 5-year sliding window to compute trend values, are insufficiently justified and under-explained. The paper also does not fully address disparities in data coverage across disciplines and time, which may affect the reliability of historical comparisons. Finally, minor issues in grammar and clarity reduce the overall polish of the manuscript.

      Evaluation of Findings:

      Overall, the authors have largely succeeded in achieving their stated aims. The findings, such as the sharp rise in internationalization in fields like Physics and the divergence in interdisciplinarity trends across disciplines, are clearly presented and generally well-supported by the data. The authors effectively demonstrate that scientific journals have not followed a uniform trajectory in terms of structural evolution. However, greater clarity in trend estimation methods and better acknowledgment of dataset limitations would help to further substantiate the conclusions and enhance their generalizability.

      Impact and Relevance:

      This study makes a timely and meaningful contribution to the fields of scientometrics, sociology of science, and science policy. Its combination of scale, historical depth, and field-level comparison offers a useful framework for understanding changes in scientific publishing practices. The entropy-based indicators are simple yet flexible, and the use of open bibliometric data enhances reproducibility and accessibility for future research. Policymakers, journal editors, and researchers interested in publication dynamics will likely find this work informative, and its methods could be applied or extended to other structural dimensions of scholarly communication.

    3. Reviewer #2 (Public review):

      Summary:

      This paper uses large-scale publication data to examine the dynamics of interdisciplinarity and international collaborations in research journals. The main finding is that interdisciplinarity and internationalism have been increasing over the past decades, especially in prestigious general science journals.

      Strengths:

      The paper uses a state-of-the-art large-scale publication database to examine the dynamics of interdisciplinarity and internationalism. The analyses span over a century and in major scientific fields in natural sciences, engineering, and social sciences. The study is well designed and has provided a range of robustness tests to enhance the main findings. The writing is clear and well organized.

      Weaknesses:

      While the research provides interesting perspectives for the reader to learn about the trends of journal preferences, I have a few points for the authors to consider that might help strengthen their work.

      The first thing that comes to mind is the epistemic mechanism of the study. Why should there be a joint discussion combining internationalism and interdisciplinarity? While internationalism is the tendency to form multinational research teams to work on research projects, interdisciplinarity refers to the scope and focus of papers that draw inspiration from multiple fields. These concepts may both fall into the realm of diversity, but it remains unclear if there is any conceptual interplay that underlies the dynamics of their increase in research journals.

      It is also unclear why internationalization is increasing. Although the authors have provided a few prominent examples in physics, such as CERN and LAGO, which are complex and expensive experimental facilities that demand collective efforts and investments from the global scientific community, whether some similar concerns or factors drive the growth of internationalism in other fields remains unknown. I can imagine that these concerns do not always apply in many fields, and the authors need to come up with some case studies in diverse fields with some sociological theory to support their empirical findings.

      The authors use Shannon entropy as a measure of diversity for both internationalism and interdisciplinarity. However, entropy may fail to account for the uneven correlations between fields, and its range of values changes when the number of categories changes. The science of science and scientometrics community has proposed a range of diversity indicators, such as the Rao-Stirling index and its derivatives. One obvious advantage of the RS index is that it explicitly accounts for the heterogeneous connections between fields, and the value ranges from 0 to 1. Using more state-of-the-art metrics to quantify interdisciplinarity may help strengthen the data analytics.
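
      The contrast between the two indicators can be illustrated with a small sketch. It uses made-up field shares and two hypothetical distance matrices, and it is not the manuscript's implementation. The Rao-Stirling form used here is the common sum over ordered pairs i ≠ j of p_i p_j d_ij, with distances in [0, 1].

```python
import math
from typing import Sequence

# Illustrative comparison only: the shares and distances below are invented.


def shannon_entropy(shares: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a share distribution."""
    return -sum(p * math.log(p) for p in shares if p > 0)


def rao_stirling(shares: Sequence[float],
                 distance: Sequence[Sequence[float]]) -> float:
    """Rao-Stirling diversity: sum over i != j of p_i * p_j * d_ij."""
    n = len(shares)
    return sum(
        shares[i] * shares[j] * distance[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    )


shares = [1 / 3, 1 / 3, 1 / 3]

# Case A: all three fields are maximally distant from one another.
all_distant = [[0.0, 1.0, 1.0],
               [1.0, 0.0, 1.0],
               [1.0, 1.0, 0.0]]

# Case B: fields 0 and 1 are closely related (distance 0.1).
two_similar = [[0.0, 0.1, 1.0],
               [0.1, 0.0, 1.0],
               [1.0, 1.0, 0.0]]

entropy = shannon_entropy(shares)
rs_distant = rao_stirling(shares, all_distant)
rs_similar = rao_stirling(shares, two_similar)
print(f"entropy (same in both cases): {entropy:.3f}")
print(f"Rao-Stirling, all distant:    {rs_distant:.3f}")
print(f"Rao-Stirling, two similar:    {rs_similar:.3f}")
```

      Entropy depends only on the shares, so it is identical in both cases, while the Rao-Stirling value drops when two of the fields are closely related. This is the sensitivity to field relatedness that the reviewer describes.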

    1. eLife Assessment

      This useful study examines excitation/inhibition (E/I) balance in the CA3-CA1 circuit of the hippocampus. Experimental and computational modeling results are presented, but these results provide incomplete evidence to support the paper's main claims due to shortcomings in the experimental and modeling approaches, as well as concerns about the neurobiological relevance of the results.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses optogenetics to activate CA3, while recording from CA1 neurons and characterizing the excitation/inhibition (E/I) balance. They observe use-dependent alterations in the E/I balance as a result of STP, and they develop a model to describe these observations. This is a very ambitious paper that deals with many issues using both experimental and modeling approaches.

      Strengths:

      This paper examines important principles regarding the manner in which synaptic circuitry and use-dependent synaptic plasticity can transform inputs and perform computations.

      Weaknesses:

      The use of selective ChR2 expression in CA3 cells is a good approach, but there are numerous issues that cause concern regarding the applicability of their slice recordings to physiological conditions and that make some aspects of their results difficult to interpret. Experiments are not performed under physiological conditions (high external calcium and low temperature), which makes the interpretation of their findings difficult. In addition, the reliability of stimulating action potentials in CA3 pyramidal cells needs to be determined, particularly during high-frequency trains. If it is unreliable, there are alternative approaches that might prove to be superior, such as the use of somatically targeted ChR2. In addition, a clearer, more detailed discussion of their model that distinguishes it from previous modeling studies would be helpful (and would make it seem less incremental).

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate EI balance in the CA3-CA1 projections, emphasizing synaptic depletion and the implied rebalancing of excitatory and inhibitory projections onto a single CA1 Pyramidal cell. They present physiological results with optical stimulation in CA3 and measuring various response features in CA1, showing signatures consistent with the adjustment of EI balance. In particular, the authors emphasize a transient effect where the neuron escapes from EI balance, which can be used for mismatch detection. They partially replicate these results in a computational model that looks at detailed properties of synaptic plasticity in CA1.

      Strengths:

      The authors provide compelling evidence that non-specific modulation of synaptic plasticity, combined with their differential effects on excitatory and inhibitory neurons, can be used by CA1 excitatory neurons to detect changes in the population activity of CA3 neurons. Indeed, they provide insight into the potential computational role of transient EI imbalance.

      Weaknesses:

      The authors observe that "little is known about how EI balance itself evolves dynamically due to activity-driven plasticity in sparsely active networks." This is an overstatement, or better an understatement, given the extensive literature on EI balance (e.g. Wen W, Turrigiano GG. Keeping Your Brain in Balance: Homeostatic Regulation of Network Function. Ann Rev Neurosci. 2024. https://doi.org/10.1146/annurev-neuro-092523-110001 PMID:38382543). This way of framing the question does a disservice to the field and fails to contextualize the current research properly.

      The evidence is incomplete because the authors do not show a specific relationship between synaptic change in CA1 and EI balance adjustment, i.e., the alternative could be that this is an unspecific effect unrelated to the specific regulation of EI balance and its functional role in the hippocampus and the cortex. Indeed, the paper drifts from addressing EI balance to elucidating the mismatch detection. The second shortcoming is that they do not show that the stimulation of the CA3 neurons occurs in a physiologically realistic regime, nor do they analyze what the impact will be of the excitatory transient in "mismatch detection", and CA1, when this would occur at the level of the whole population, i.e., the physiological impossibility of triggering uncontrolled chaotic excitatory responses. In particular, when we consider CA3 as an attractor memory system, the range of deviations (mismatches) that a CA1 neuron can be exposed to and detect, given the model presented in this paper, might be below those generated due to CA3 pattern-completion dynamics. In addition, the match between the model and the physiological results is not fully quantified, leaving it to the reader to make a leap of faith.

      In addition, the manuscript suffers from poor analysis and presentation. The work could be improved by putting more effort into translating results into insightful metrics.

      Overall, the authors have not achieved their original aim to show that the observed phenomenon is relevant to computation in CA1 or the brain outside of a highly controlled in vitro setup and reductionist single cell model.

      The authors combine several techniques for in vitro whole-cell patch-clamp recordings with patterned optical stimulation of the CA3 network in the mouse hippocampus, which is consistent with the state-of-the-art.

      They introduce a metric of similarity between expected and observed response patterns, called gamma. The name is confusing given the wide use of the label gamma for oscillation frequencies above 20 Hz. Gamma is calculated as (E*O)/(E-O). This means that gamma approaches infinity as the difference goes to 0, to mention one of the problems. This metric is not interpretable, and it is not clear why the authors did not follow a standard approach, e.g., likelihood, correlation, or percent error.
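
      The divergence noted above can be made concrete with a short sketch. It takes the formula exactly as it is written in this review, which may differ from the authors' actual definition, and compares it with percent error. The function names and sample values are illustrative.

```python
# Sketch of the metric as stated in the review: gamma = (E * O) / (E - O).
# Function names and the sample values are illustrative, not from the manuscript.


def gamma_as_described(expected: float, observed: float) -> float:
    """Metric as written in the review; undefined when expected == observed."""
    return (expected * observed) / (expected - observed)


def percent_error(expected: float, observed: float) -> float:
    """Percent error, which shrinks toward 0 as the match improves."""
    return abs(observed - expected) / abs(expected) * 100.0


expected = 10.0
for observed in (9.0, 9.9, 9.99):
    g = gamma_as_described(expected, observed)
    pe = percent_error(expected, observed)
    print(f"O = {observed:<5} gamma = {g:10.1f}   percent error = {pe:5.2f}")
```

      As the observed response approaches the expected one, the value computed this way grows without bound and is undefined at an exact match, whereas percent error goes smoothly to zero.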

      The authors aim to replicate the physiological results with an "abstract model of the hippocampal FFEI network". In practice, this is a conductance-based model of a single CA1 neuron, including chemical kinetics-based multi-step neurotransmitter vesicle release. This is an abstraction from the FFEI network that the paper starts with. It raises the question whether this is the right level at which to model the computational impacts of EI imbalance on CA1 neurons. Given the highly reduced model they have elaborated, the generalization to the complete CA3-CA1 network that the authors suggest can be achieved in the discussion is overoptimistic. Network models of CA3 and CA1 must be considered, together with afferents from the entorhinal cortex to accomplish this generalization.

      The authors reveal a potentially interesting physiological feature of CA1 excitatory neurons under very specific stimulus conditions. It could warrant follow-up studies to place EI imbalance in a physiologically realistic context.

    4. Reviewer #3 (Public review):

      Summary:

      This work shows experimentally and computationally that single CA1 neurons can perform mismatch detection on patterned CA3 inputs and that STP and EI balance underlie this detection.

      Strengths:

      It has been known that STP can enhance the EPSP when the corresponding presynaptic input exhibits abrupt changes in firing rate. This work provides experimental evidence and further computational support for the hypothesis that the basic computation through STP is useful for detecting abrupt changes in the spatial pattern of synaptic inputs at the Schaffer collaterals. Further, their results indicate the novel view that mismatch detection is most efficient when gamma-frequency bursting inputs exhibit mismatches between theta cycles.

      Weaknesses:

      Their model assumes that patterned activities in CA3 do not have overlaps. However, overlaps between memory engrams have been shown. Therefore, this assumption may not hold, and whether the proposed mechanism is valid for overlapping CA3 inputs needs further clarification.

    1. eLife Assessment

      This valuable study provides evidence that the integration of the nuclear envelope into the endoplasmic reticulum provides a mechanism for mechanical integration across this continuous membrane system. If robustly demonstrated, this work would open up new avenues for studying organelle membrane tension homeostasis. While the evidence is largely convincing and carefully quantified, a key limitation is the absence of data demonstrating that microinjection of cytoskeleton-depolymerizing drugs locally disrupts the target network.

    2. Reviewer #1 (Public review):

      Summary:

      Zare‑Eelanjegh et al. investigate how the endoplasmic reticulum, the nucleus, and the cell periphery are mechanically linked by indenting intact cells with specially shaped atomic‑force probes that double as drug injection devices. Fluorescence‑lifetime imaging of the membrane tension reporter Flipper‑TR reveals that these three compartments are mechanically linked and that the actin cytoskeleton, microtubules, and lamins modulate this coupling in complex ways.

      Strengths:

      (1) The study makes an important advance by applying FluidFM to probe organelle mechanics in living cells, a technically demanding but powerful approach.

      (2) Experimental design is quantitative, the data are clearly presented, and the conclusions are broadly consistent with the measurements.

      Weaknesses:

      (1) Calcium‑dependent effects: Indentation can evoke cytoplasmic Ca²⁺ elevations that drive myosin contraction and reshape the internal membrane network (e.g., vesiculation: PMID: 9200614, 32179693), possibly confounding the Flipper-TR responses; without simultaneous/matching Ca²⁺ imaging, cell viability assays (e.g., Sytox), and intracellular Ca²⁺ sequestration or myosin inhibition experiments, a more complex mechanochemical coupling cannot be excluded, weakening conclusions.

      (2) Baseline measurements: Flipper‑TR lifetime images acquired without indentation do not exclude potential light‑induced or time‑dependent changes, which weaken the conclusions.

      (3) Indentation depth versus nuclear stiffness/tension: Because lamin‑A/C depletion softens nuclei, a given force may produce a deeper pit and thus greater membrane stretch. It is unclear how the cytoskeletal perturbations affect indentation depth, which weakens the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      This useful study combines atomic force microscopy with genetic manipulations of the lamin meshwork and microinjection of cytoskeletal depolymerizing drugs to probe the mechanical responses of intracellular organelles to combinations of cytoskeletal perturbations. This study demonstrates both local and distal responses of intracellular organelles to mechanical forces and shows that these responses are affected by disruption of the actin, microtubule, and lamin cytoskeletal systems. Interpretation of these effects is limited by the absence of key data determining whether acute microinjection of cytoskeleton-depolymerizing drugs has complete or partial effects on the targeted cytoskeletal networks.

      Strengths:

      This study uses a sensitive micromanipulation system to apply and visualize the effects of force on intracellular organelles.

      Weaknesses:

      The choice to deliver cytoskeleton-depolymerizing drugs by local microinjection is unusual, and it is unclear to what extent actin and microtubule filaments are actually depolymerized immediately after microinjection and on the minutes-length timescale being evaluated in this study. This omission limits the interpretation of these data.

    4. Reviewer #3 (Public review):

      Summary:

      Using an approach developed by the authors (FluidFM) combined with FLIM, they discover that a mechanical force applied over the cell nucleus triggers mechanical responses dependent on the Lamina composition.

      Strengths:

      The authors present a new approach to study mechano-transduction in living cells, with which they uncover lamin-dependent properties of the nucleus.

      Weaknesses:

      (1) The transfer of the mechanical response from the Lamina to the ER is not fully covered.

      (2) In Figure 4D, WT dots are the same for each compartment. Why do the authors not make one graph for each compartment with WT, A-KO, B-KD, and A-KO/B-KD together?

      (2) In Figure 1E, the authors showed well how the probe deforms the nucleus. It is not indicated in the material and methods section or in the figure legend, where, in Z, the acquisition of FLIM images was made or if it is a maximum projection. I assume it was made at a plane in the middle of the nucleus to see the nuclear envelope border and the ER at the same time. Did the authors look at the nuclear membrane facing upward, where most of the deformation should occur? Are there more lifetime changes? In Figure D, before injection of CytoD, we can clearly see a difference at the pyramidal indentation site with two different lifetime colors.

      (3) A great result of this article regards the importance of Lamins, A and B, in triggering the response to a mechanical force applied to the nucleus. Could 3D imaging for LaminA and LaminB be performed at the different time points of indentation to see how the lamins meshworks are deformed and how they return to basal state? This could be correlated with the FLIM results described in the article.

      (4) Lamins form a meshwork underneath the nuclear membrane. They are connected to the cytoskeletons mainly by the LINC complex. Results presented here show that the cytoskeletons are implicated in transferring the stimulus from the nuclear envelope to the ER. Could the author perform the same experiments using Nesprin-2 or/and Nesprin-1 or/and SUN1/2 knockdowns to determine if this transmission is occurring through the LINC complex or rather in a passive way by modifying the nuclear close surroundings?

      (5) The authors used cytoskeleton drugs, CytoD and Nocodazole, with their FluidFM probe, but did not show if the drugs actually worked and to what extent by performing actin or microtubule stainings. In the original paper describing FluidFM, 15 s were enough to obtain a full FITC-positive cell after injection. Here, the experiments are around 5 minutes long. I therefore question the rationale behind the injection of the drugs compared to direct incubation, besides affecting only the cell currently under indentation.

    1. eLife Assessment

      This important study identifies a novel CRF-positive projection from the central amygdala and BNST to dorsal striatal cholinergic interneurons, revealing a previously unrecognized pathway by which stress signals modulate striatal function. The authors present strong and convincing evidence for the anatomical and functional connectivity of this circuit and demonstrate that alcohol disrupts CRF-mediated cholinergic activity, supporting its relevance to alcohol use disorder.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that corticotropin-releasing factor (CRF) neurons in the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) monosynaptically target cholinergic interneurons (CINs) in the dorsal striatum of rodents. Functionally, activation of CRFR1 receptors increases CIN firing rate, and this modulation was reduced by pre-exposure to ethanol. This is an interesting finding, with potential significance for alcohol use disorders, but some conclusions could use additional support.

      Strengths:

      Well-conceived circuit mapping experiments identify a novel pathway by which the CeA and BNST can modulate dorsal striatal function by controlling cholinergic tone. Important insight into how CRF, a neuropeptide that is important in mediating aspects of stress, affective/motivational processes, and drug-seeking, modulates dorsal striatal function.

      Weaknesses:

      (1) Tracing and expression experiments were performed both in mice and rats (in a mostly non-overlapping way). While these species are similar in many ways, some conclusions are based on assumptions of similarities that the presented data do not directly show. In most cases, this should be addressed in the text (but see point number 2).

      (2) Experiments in rats show that CRFR1 expression is largely confined to a subpopulation of striatal CINs. Is this true in mice, too? Since most electrophysiological experiments are done in various synaptic antagonists and/or TTX, it does not affect the interpretation of those data, but non-CIN expression of CRFR1 could potentially have a large impact on bath CRF-induced acetylcholine release.

      (3) Experiments in rats show that about 30% of CINs express CRFR1. Did only a similar percentage of CINs in mice respond to bath application of CRF? The effect sizes and error bars in Figure 5 imply that the majority of recorded CINs likely responded. Were exclusion criteria used in these experiments?

      (4) The conclusion that prior acute alcohol exposure reduces the ability of subsequent alcohol exposure to suppress CIN activity in the presence of CRF may be a bit overstated. In Figure 6D (no ethanol pre-exposure), ethanol does not fully suppress CIN firing rate to baseline after CRF exposure. The attenuated effect of CRF on CIN firing rate after ethanol pre-treatment (6E) may just reduce the maximum potential effect that ethanol can have on firing rate after CRF, due to a lowered starting point. It is possible that the lack of significant effect of ethanol after CRF in pre-treated mice is an issue of experimental sensitivity. Related to this point, does pre-treatment with ethanol reduce the later CIN response to acute ethanol application (in the absence of CRF)?

      (5) More details about the area of the dorsal striatum being examined would be helpful (i.e., a-p axis).

    3. Reviewer #2 (Public review):

      Summary:

      Essoh and colleagues present a thorough and elegant study identifying the central amygdala and BNST as key sources of CRF input to the dorsal striatum. Using monosynaptic rabies tracing and electrophysiology, they show direct connections to cholinergic interneurons. The study builds on previous findings that CRF increases CIN firing, extending them by measuring acetylcholine levels in slices and applying optogenetic stimulation of CRF+ fibers. It also uncovers a novel interaction between alcohol and CRF signaling in the striatum, likely to spark significant interest and future research.

      Strengths:

      A key strength is the integration of anatomical and functional approaches to demonstrate these projections and assess their impact on target cells, striatal cholinergic interneurons.

      Weaknesses:

      The nature of the interaction between alcohol and CRF actions on cholinergic neurons remains unclear. Also, further clarification of the ACh sensor used and others is required.

    4. Reviewer #3 (Public review):

      Summary:

      The authors demonstrate that CRF neurons in the extended amygdala form GABAergic synapses onto cholinergic interneurons and that CRF can excite these neurons. The evidence is strong; however, the authors fail to make a compelling connection showing that CRF released from these extended amygdala neurons is mediating any of these effects. Further, they show that acute alcohol appears to modulate this action, although the effect size is not particularly robust.

      Strengths:

      This is an exciting connection from the extended amygdala to the striatum that provides a new direction for how these regions can modulate behavior. The work is rigorous and well done.

      Weaknesses:

      While the authors show that opto stim of these neurons can increase firing, this is not shown to be CRFR1 dependent. In addition, the effects of acute ethanol are not particularly robust or rigorously evaluated. Further, the opto stim experiments are conducted in an Ai32 mouse, so it is impossible to determine if that is from CEA and BNST, vs. another population of CRF-containing neurons. This is an important caveat.

    5. Reviewer #4 (Public review):

      Summary:

      This manuscript presents a compelling and methodologically rigorous investigation into how corticotropin-releasing factor (CRF) modulates cholinergic interneurons (CINs) in the dorsal striatum - a brain region central to cognitive flexibility and action selection - and how this circuit is disrupted by alcohol exposure. Through an integrated series of anatomical, optogenetic, electrophysiological, and imaging experiments, the authors uncover a previously uncharacterized CRF⁺ projection from the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) to dorsal striatal CINs.

      Strengths:

      Key strengths of the study include the use of state-of-the-art monosynaptic rabies tracing, CRF-Cre transgenic models, CRFR1 reporter lines, and functional validation of synaptic connectivity and neurotransmitter release. The finding that CRF enhances CIN excitability and acetylcholine (ACh) release via CRFR1, and that this effect is attenuated by acute alcohol exposure and withdrawal, provides important mechanistic insight into how stress and alcohol interact to impair striatal function. These results position CRF signaling in CINs as a novel contributor to alcohol use disorder (AUD) pathophysiology, with implications for relapse vulnerability and cognitive inflexibility associated with chronic alcohol intake.

      The study is well-structured, with a clear rationale, thorough methodology, and logical progression of results. The discussion effectively contextualizes the findings within broader addiction neuroscience literature and suggests meaningful future directions, including therapeutic targeting of CRFR1 signaling in the dorsal striatum.

      Weaknesses:

      Minor areas for improvement include occasional redundancy in phrasing, slightly overlong descriptions in the abstract and significance sections, and a need for more concise language in some places. Nevertheless, these do not detract from the manuscript's overall quality or impact.

      Overall, this is a highly valuable contribution to the fields of addiction neuroscience and striatal circuit function, offering novel insights into stress-alcohol interactions at the cellular and circuit level, which requires minor editorial revisions.

    1. eLife Assessment

      This important study presents a meta-analysis confirming a statistically significant association between slow oscillation-spindle coupling and memory formation, although the reported effects are limited (~0.5% of variance). The evidence is overall convincing, but the statistical methods may be difficult to follow for readers unfamiliar with advanced techniques. This work will be of particular interest to neuroscientists studying the neural mechanisms of sleep and memory.

    2. Reviewer #1 (Public review):

      In this meta-analysis, Ng and colleagues review the association between slow-oscillation spindle coupling during sleep and overnight memory consolidation. The coupling of these oscillations (and also hippocampal sharp-wave ripples) has been central to theories and mechanistic models of active systems consolidation, which posit that the coupling between ripples, spindles, and slow oscillations (SOs) coordinates and drives the reactivation of memories in hippocampus and cortex, facilitating cross-regional information and ultimately memory strengthening and stabilisation.

      Given the importance that these coupling mechanisms have been given in theory, this is a timely and important contribution to the literature in terms of determining whether these theoretical assumptions hold true in human data. The results show that the timing of sleep spindles relative to the SO phase, and the consistency of that timing, predicted overnight memory consolidation in meta-analytic models. The overall amount of coupling events did not show as strong a relationship. Coupling phase in particular was moderated by a number of variables including spindle type (fast, slow), channel location (frontal, central, posterior), age, and memory type. The main takeaway is that fast spindles that consistently couple close to the peak of the SO in frontal channel locations are optimal for memory consolidation, in line with theoretical predictions. These findings will be very useful for future researchers in terms of determining necessary sample sizes to observe coupling - memory relationships, and in the selection and reporting of relevant coupling metrics.

      Although the meta-analysis covers the three main coupling metrics that are typically assessed (occurrence, timing, and consistency), the meta-analysis also includes spindle amplitude. This may be confusing to readers, as this is not a measurement of SO-spindle coupling but instead a measurement of spindles in general (which may or may not be coupled).

    3. Reviewer #2 (Public review):

      This article reviews the studies on the relationship between slow oscillation (SO)-spindle (SP) coupling and memory consolidation. It innovatively employs non-normal circular linear correlations through a Bayesian meta-analysis. A systematic analysis of the retrieved studies highlighted that co-coupling of SO and the fast SP's phase and amplitude at the frontal part better predicts memory consolidation performance.

      Regarding the moderator of age, this study not only provided evidence of the effect across all age groups but also the effect in a younger age group (without the small sample of elders that has a large gap from the younger age groups). The ageing effects become less pronounced, but the model still shows a moderate effect.

    4. Reviewer #3 (Public review):

      This manuscript presents a meta-analysis of 23 studies, which report 297 effect sizes, on the effect of SO-spindle coupling on memory performance. The analysis has been done with great care, and the results are described in great detail. In particular, there are separate analyses for coupling phase, spindle amplitude, coupling strength (e.g., measured by vector length or modulation index), and coupling percentage (i.e., the percentage of SPs coupled with SOs). The authors conclude that the precision and strength of coupling showed significant correlations with memory retention.

      There are two main points where I do not agree with the authors.

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect" (0.10

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is not. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent. This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots), which makes the analyses unnecessarily opaque. It is commendable that the authors now provide classical forest plots as Figs. S10.1-4.

      However, analyses that require a "Markov chain Monte Carlo (MCMC) method, [..] with the no-U-turn Hamiltonian Monte Carlo (HMC) samplers, [..] with each chain undergoing 12,000 iterations (including 2,000 warm-ups)" for calculating accurate Bayes Factors (BF), and checking its convergence "through graphical posterior predictive checks, [..] trace plots, and [..] Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains" still seems overly complex. It follows a recent trend in using more and more opaque methods. Where we had to trust published results a decade ago because the data were not openly available, today we must trust the results because methods (including open source software toolboxes) can no longer be checked with reasonable effort.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Given the importance that these coupling mechanisms have been given in theory, this is a timely and important contribution to the literature in terms of determining whether these theoretical assumptions hold true in human data.

      Thank you!

      I did not follow the logic behind including spindle amplitude in the meta-analysis. This is not a measure of SO-spindle coupling (which is the focus of the review), unless the authors were restricting their analysis of the amplitude of coupled spindles only. It doesn't sound like this is the case though. The effect of spindle amplitude on memory consolidation has been reviewed in another recent meta-analysis (Kumral et al, 2023, Neuropsychologia). As this isn't a measure of coupling, it wasn't clear why this measure was included in the present meta-analysis. You could easily make the argument that other spindle measures (e.g., density, oscillatory frequency) could also have been included, but that seems to take away from the overall goal of the paper which was to assess coupling.

      Indeed, spindle amplitude refers to all spindle events rather than only coupled spindles. This choice was made because we recognized the challenge of obtaining relevant data from each study—only 4 out of the 23 included studies performed their analyses after separating coupled and uncoupled spindles. This inconsistency strengthens the urgency and importance of this meta-analysis to standardize the methods and measures used for future analysis on SO-SP coupling and beyond. We agree that focusing on the amplitude of coupled spindles would better reveal their relations with coupling, and we have discussed this limitation in the manuscript.

      Nevertheless, we believe including spindle amplitude in our study remains valuable, as it served several purposes. First, SO-SP coupling involves the modulation between spindle amplitude and slow oscillation phase. Different studies have reported conflicting conclusions regarding how overall spindle amplitude was related to coupling as an indicator of oscillation strength overnight: some found significant correlations (e.g., Baena et al., 2023), while others did not (e.g., Roebber et al., 2022). This discrepancy highlights an indirect but potentially crucial insight into the role of spindle amplitude in coupling dynamics. Second, in studies related to SO-SP coupling, spindle amplitude is one of the most frequently reported measures along with other coupling measures that significantly correlated with over-sleep memory improvements (e.g., Kurz et al., 2023; Ladenbauer et al., 2021; Niknazar et al., 2015), so we believe that including this measure can provide a more comprehensive review of the existing literature on SO-SP coupling. Third, incorporating spindle amplitude allows for a direct comparison between the measurement of coupling and individual events alone in their contribution to memory consolidation, a question that has been extensively explored in recent research (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023). Finally, spindle amplitude was identified as the most important moderator for memory consolidation in Kumral et al.'s (2023) meta-analysis. By including it in our analysis, we sought to replicate their findings within a broader framework and introduce conceptual overlaps with existing reviews. Therefore, although we were not able to selectively include coupled spindles, there is still a unique relation between spindle amplitude and SO-SP coupling that other spindle measures do not have.

      Originally, we also intended to include coupling density or counts in the analysis, which seems more relevant to the coupling metrics. However, the lack of uniformity in methods used to measure coupling density posed a significant limitation. We hope that our study will encourage consistent reporting of all relevant parameters in future research, allowing future meta-analyses to incorporate these measures comprehensively. We have added this discussion to the revised version of the manuscript (p. 3) to further clarify these points.

      All other citations were referenced in the manuscript.

      At the end of the first paragraph of section 3.1 (page 13), the authors suggest their results "... further emphasise the role of coupling compared to isolated oscillation events in memory consolidation". This had me wondering how many studies actually test this. For example, in a hierarchical regression model, would coupled spindles explain significantly more variance than uncoupled spindles? We already know that spindle activity, independent of whether they are coupled or not, predicts memory consolidation (e.g., Kumral meta-analysis). Is the variance in overnight memory consolidation fully explained by just the coupled events? If both overall spindle density and coupling measures show an equal association with consolidation, then we couldn't conclude that coupling compared to isolated events is more important.

      While primary coupling measurements, including coupling phase and strength, showed strong evidence for their associations with memory consolidation, measures of spindles, including spindle amplitude, only exhibited limited evidence (or “non-significant” effect) for their association with consolidation. These results are consistent with multiple empirical studies using different techniques (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023), which reported that coupling metrics are more robust predictors of consolidation and synaptic plasticity than spindle or slow oscillation metrics alone. However, we agree with the reviewer that we did not directly separate the effect between coupled and uncoupled spindles, and a more precise comparison would involve contrasting the “coupling of oscillation events” with “individual oscillation events” rather than coupling versus isolated events.

      We recognized that Kumral and colleagues’ meta-analysis reported a moderate association between spindle measures and memory consolidation (e.g., for spindle amplitude-memory association they reported an effect size of approximately r = 0.30). However, one of the advantages of our study is that we actively cooperated with the authors to obtain a large number of unreported and insignificant data relevant to our analysis, as well as separated data that were originally reported under mixed conditions. This approach decreases the risk of false positives and selective reporting of results, making the effect size more likely to approach the true value. In contrast, we found only a weak effect size of r = 0.07 with minimal evidence for spindle amplitude-memory relation. However, we agree with the reviewer that using a more conservative term in this context would be a better choice since we did not measure all relevant spindle metrics including the density.

      To improve clarity in our manuscript, we have revised the statement to: “Together with other studies included in the review, our results suggest a crucial role of coupling but did not support the role of spindle events alone in memory consolidation,” and provide relevant references (p. 13). We believe this can more accurately reflect our findings and the existing literature to address the reviewer’s concern.

      It was very interesting to see that the relationship between the fast spindle coupling phase and overnight consolidation was strongest in the frontal electrodes. Given this, I wonder why memory promoting fast spindles shows a centro-parietal topography? Surely it would be more adaptive for fast spindles to be maximally expressed in frontal sites. Would a participant who shows a more frontal topography of fast spindles have better overnight consolidation than someone with a more canonical centro-parietal topography? Similarly, slow spindles would then be perfectly suited for memory consolidation given their frontal distribution, yet they seem less important for memory.

      Regarding the topography of fast spindles and their relationship to memory consolidation, we agree this is an intriguing issue. We have already made significant progress on this topic in our ongoing work and have found evidence that participants with a more frontal topography of fast spindles show better overnight consolidation. These findings will be presented in our future publications. We share a few relevant observations: First, there are significant discrepancies in the definition of “slow spindle” in the field. Some studies defined slow spindles from 9-12 Hz (e.g., Mölle et al., 2011; Kurz et al., 2021), while others performed the event detection within a range of 11-13/14 Hz and found a frontal-dominated topography (e.g., Barakat et al., 2011; D'Atri et al., 2018). Compounding this issue, individual and age differences in spindle frequency are often overlooked, leading to challenges in reliably distinguishing between slow and fast spindles. Some studies have reported difficulty in clearly separating the two types of spindles altogether (e.g., Hahn et al., 2020). Moreover, a critical factor often ignored in past research is the propagating nature of both slow oscillations and spindles across the cortex, where spindles are coupled with significantly different phases of slow oscillations (see Figure 5). In addition, the frontal region has the strongest and most active SOs as its origin site, which may contribute to the role of frontal coupling. In contrast, not all SOs propagate from PFC to centro-parietal sites. The reviewer also raised an interesting idea that slow spindles would be perfectly suited for memory consolidation given their frontal distribution. We propose that one possible explanation is that if SOs couple exclusively with slow SPs, they may lose their ability to coordinate inter-area activity between centro-parietal and frontal regions, which could play a critical role in long-range memory transmission across hippocampus, thalamus, and prefrontal cortex. This hypothesis requires investigation in future studies. We believe a better understanding of coupling in the context of the propagation of these waves will help us better understand the observed frontal relationship with consolidation. Therefore, we believe this result supports our conclusion that coupling precision is more important than intensity, and we have addressed this in the revised manuscript (pp. 15-16).

      The authors rightly note the issues with multiple comparisons in sleep physiology and memory studies. Multiple comparison issues arise in two ways in this literature. First are comparisons across multiple electrodes (many studies now use high-density systems with 64+ channels). Second are multiple comparisons across different outcome variables (at least 3 ways to quantify coupling (phase, consistency, occurrence) x 2 spindle types (fast, slow). Can the authors make some recommendations here in terms of how to move the field forward, as this issue has been raised numerous times before (e.g., Mantua 2018, Sleep; Cox & Fell 2020, Sleep Medicine Reviews for just a couple of examples). Should researchers just be focusing on the coupling phase? Or should researchers always report all three metrics of coupling, and correct for multiple comparisons? I think the use of pre-registration would be beneficial here, and perhaps could be noted by the authors in the final paragraph of section 3.5, where they discuss open research practices.

      There are indeed multiple methods that we can discuss, including cluster-based and non-parametric methods, to correct for multiple comparisons in EEG data with spatiotemporal structures. In addition, encouraging the reporting of all tested but insignificant results, at least in supplementary materials, is an important practice that helps readers understand the findings with reduced bias. We agree with the reviewer’s suggestions and have added more information in section 3.4-3.5 (p. 17) to advocate for a standardized “template” used to report effect sizes and correct multiple comparisons in future research.

      We advocate for the standardization of reporting all three coupling metrics: phase, strength, and prevalence (density, count, and/or percentage coupled). Each coupling metric captures a distinct property of the coupling process, and the metrics may interact with one another (Weiner et al., 2023). Therefore, we believe it is essential to report all three metrics to comprehensively explore their different roles in the “how, what, and where” of long-distance communication and consolidation of memory. As we advance toward a deeper understanding of the relationship between memory and sleep, we hope this work establishes a standard for the standardization, transparency, and replication of relevant studies.
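      As an illustration of the kind of correction discussed above, the sketch below applies a Holm-Bonferroni adjustment across the 3 coupling metrics x 2 spindle types mentioned by the reviewer. This is only one possible procedure and is not taken from the manuscript; the function name `holm_adjust` and all p-values are hypothetical and chosen purely for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the tests, sorted from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values (NOT from the study): metric x spindle type.
tests = {
    ("phase", "fast"): 0.004,
    ("phase", "slow"): 0.030,
    ("strength", "fast"): 0.020,
    ("strength", "slow"): 0.450,
    ("prevalence", "fast"): 0.012,
    ("prevalence", "slow"): 0.200,
}

adjusted = holm_adjust(list(tests.values()))
for (metric, spindle), p_raw, p_adj in zip(tests, tests.values(), adjusted):
    print(f"{metric:<10} {spindle:<4} raw p = {p_raw:.3f}  Holm p = {p_adj:.3f}")
```

      Holm's procedure controls the family-wise error rate and is never less powerful than a plain Bonferroni correction; cluster-based permutation methods, as named in the response, would be the more natural choice when correcting across many electrodes.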

      Reviewer #2 (Public review):

      Regarding the Moderator of Age: Although the authors discuss the limited studies on the analysis of children and elders regarding age as a moderator, the figure shows a significant gap between the ages of 40 and 60. Furthermore, there are only a few studies involving participants over the age of 60. Given the wide distribution of effect sizes from studies with participants younger than 40, did the authors test whether removing studies involving participants over 60 would still reveal a moderator effect?

      We agree that there is an age gap between younger and older adults, as current studies often focus on contrasting newly matured and fully aged populations to amplify the effect, while neglecting the gradual changes in memory consolidation mechanisms across the aging spectrum. We suggest that a non-linear analysis of age effects would be highly valuable, particularly when additional child and older adult data become available.

      In response to the reviewer’s suggestion, we re-tested the moderation effect of age after excluding effect sizes from older adults. The results revealed a decrease in the strength of evidence for phase-memory association due to increased variability, but were consistent for all other coupling parameters. The mean estimations also remained consistent (coupling phase-memory relation: -0.005 [-0.013, 0.004], BF10 = 5.51, the strength of evidence reduced from strong to moderate; coupling strength-memory relation: -0.005 [-0.015, 0.008], BF10 = 4.05, the strength of evidence remained moderate). These findings align with prior research, which typically observed a weak coupling-memory relationship in older adults during aging (Ladenbauer et al, 2021; Weiner et al., 2023) but not during development (Hahn et al., 2020; Kurz et al., 2021; Kurz et al., 2023). Therefore, this result is not surprising to us, and there are still observable moderate patterns in the data. We have reported these additional results in the revised manuscript (pp. 6, 11), and interpret “the moderator effect of age in the phase-memory association becomes less pronounced during development after excluding the older adult data”. We believe the original findings including the older adult group remain meaningful after cautious interpretation, given that the older adult data were derived from multiple studies and different groups, and they represent the aging effects.

      Reviewer #3 (Public review):

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect".

      While we acknowledge the concern about the small effect sizes reported in our study, it is important to contextualize these findings within the field of neuroscience, particularly memory research. Even in individual studies, small effect sizes are not uncommon due to the inherent complexity of the mechanisms involved and the multitude of confounding variables. This is an important factor to be considered in meta-analyses where we synthesize data from diverse populations and experimental conditions. For example, the relationship between SO-slow SP coupling and memory consolidation in older adults is expected to be insignificant.

      As Funder and Ozer (2019) concluded in their highly cited paper, an effect size of r = 0.3 in psychological and related fields should be considered large, with r = 0.4 or greater likely representing an overestimation and rarely found in a large sample or a replication. Therefore, we believe r = 0.1 should not be considered as a lower bound of the small effect. Bakker et al. (2019) also advocate for a contextual interpretation of the effect size. This is particularly important in meta-analyses, where the results are less prone to overestimation compared to individual studies, and we cooperated with all authors to include a large number of unreported and insignificant results. In this context, small correlations may contain substantial meaningful information to interpret. Although we agree that effect sizes reported in our study are indeed small at the overall level, they reflect a rigorous analysis that incorporates robust evidence across different levels of moderators. Our moderator analyses underscore the dynamic nature of coupling-memory relationships, with stronger associations observed in moderator subgroups that have historically exhibited better memory performance, particularly after excluding slow spindles and older adults. For example, both the coupling phase and strength of frontal fast spindles with slow oscillations exhibited "moderate-to-large" correlations with the consolidation of different types of memory, especially in young adults, with r values ranging from 0.18 to 0.32 (see Table S9.1-9.4). We have included discussion about the influence of moderators and hierarchical structures on the dynamics of coupling-memory associations (pp. 17, 20). In addition, we have updated the conclusion to be “SO-fast SP coupling should be considered as a general physiological mechanism for memory consolidation” (p. 1).
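      To make the effect-size discussion concrete, the correlations quoted in this exchange can be converted to the proportion of variance explained (r squared). The short sketch below does only that arithmetic; the helper name `variance_explained` is hypothetical, and the labels simply restate where each r value appears in the reviews and responses.

```python
def variance_explained(r):
    """Proportion of variance explained by a Pearson correlation r."""
    return r * r


# Correlations quoted in the reviews and author responses.
quoted = [
    ("spindle amplitude-memory (this meta-analysis)", 0.07),
    ("conventional 'small effect' threshold", 0.10),
    ("frontal fast spindle subgroup, lower end", 0.18),
    ("'large' per Funder and Ozer (2019)", 0.30),
    ("frontal fast spindle subgroup, upper end", 0.32),
]

for label, r in quoted:
    share = 100 * variance_explained(r)
    print(f"r = {r:.2f} -> {share:.2f}% of variance  ({label})")
```

      For instance, r = 0.07 corresponds to about 0.5% of variance and r = 0.30 to 9%, which is why the same result can look negligible or substantial depending on the benchmark applied.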

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is the greatest weakness of the manuscript. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent.

      This kind of analysis seems not to be made to be intelligible to the average reader. It follows a recent trend of using more and more opaque methods. Where we had to trust published results a decade ago because the data were not openly available, today we must trust the results because the methods can no longer be understood with reasonable effort.

      This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots). Presumably, they depend on the Bayesian priors used for the analysis. The use of these priors makes the analyses unnecessarily opaque, eventually leading the reader to question how much of the findings depend on subjective analysis choices (which might be answered by an additional analysis in the supplementary information).

      We appreciate the reviewer for sharing this viewpoint and we value the opportunity to clarify some key points. To address the concern about clarity, we have included more details in the methods section explaining how to interpret Bayesian statistics including priors, posteriors, and Bayes factors, making our results more accessible to those less familiar with this approach.

      On the use of Bayesian models, we believe there may have been a misunderstanding. Bayesian methods, far from being "opaque" or overly complex, are increasingly valued for their ability to provide nuanced, accurate, and transparent inferences (Sutton & Abrams, 2001; Hackenberger, 2020; van de Schoot et al., 2021; Smith et al., 1995; Kruschke & Liddell, 2018). They have been applied in more than 1,200 meta-analyses as of 2020 (Hackenberger, 2020). In our study, we used priors that assume no effect (mean set to 0, which aligns with the null) while allowing for a wide range of variation to account for large uncertainties. This approach reduces the risk of overestimation or false positives and demonstrates much-improved performance over traditional methods in handling variability (Williams et al., 2018; Kruschke & Liddell, 2018). In addition, priors can also increase transparency, since all assumptions are formally encoded and open to critique or sensitivity analysis. In contrast, frequentist methods often rely on hidden or implicit assumptions such as homogeneity of variance, fixed-effects models, and independence of observations that are not directly testable. Sensitivity analyses reported in the supplemental material (Table S9.1-9.4) confirmed the robustness of our choices of priors: our results did not vary by setting different priors.

      As Kruschke and Liddell (2018) described, “shrinkage (pulling extreme estimates closer to group averages) helps prevent false alarms caused by random conspiracies of rogue outlying data,” a well-known advantage of Bayesian over traditional approaches. This explains the observed differences between the distributions and grey dots in the forest plots, which is an advantage of Bayesian models in handling heterogeneity. Unlike p-values, which can be overestimated with a large sample size and underestimated with a small sample size, Bayesian methods make assumptions explicit, enabling others to challenge or refine them, an approach aligned with open science principles (van de Schoot et al., 2021). For example, a credible interval in a Bayesian model can be interpreted as “there is a 95% probability that the parameter lies within the interval,” while a confidence interval in a frequentist model means “in repeated experiments, 95% of the confidence intervals will contain the true value.” We believe the former is much more straightforward and convincing for readers to interpret. We will ensure our justification for using Bayesian models is more clearly presented in the manuscript (pp. 21-23).
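      The shrinkage and credible-interval ideas above can be sketched with the simplest conjugate case: a normal prior centred on zero combined with a single normally distributed estimate. This is a deliberately reduced illustration, not the hierarchical model fitted in the study; the prior standard deviation, the two example estimates, and the function names are all assumptions made for the example.

```python
from statistics import NormalDist


def posterior_normal(y, se, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean and sd for a normal prior and a normal likelihood.

    y is the observed estimate and se its standard error.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Precision-weighted average of the prior mean and the observation.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * y)
    return post_mean, post_var ** 0.5


def credible_interval(mean, sd, level=0.95):
    """Central credible interval of a normal posterior."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * sd, mean + z * sd


# Hypothetical estimates (NOT from the study): (label, estimate, standard error).
examples = [
    ("small, noisy study", 0.60, 0.30),
    ("large, precise study", 0.10, 0.05),
]

for label, y, se in examples:
    # prior_sd = 0.5 is an illustrative assumption, not the authors' prior.
    mean, sd = posterior_normal(y, se, prior_mean=0.0, prior_sd=0.5)
    low, high = credible_interval(mean, sd)
    print(f"{label}: observed {y:.2f} -> posterior {mean:.3f} "
          f"(95% credible interval {low:.3f} to {high:.3f})")
```

      The noisy estimate is pulled strongly toward the prior mean while the precise one barely moves, which is the pattern that separates the posterior distributions from the grey dots in a Bayesian forest plot. In a hierarchical model the pull is toward an estimated group average rather than a fixed prior mean.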

      We acknowledge that even with these justifications, different researchers may still have discrepancies in their preferences for Bayesian and frequentist models. To increase the effort of transparent reporting, we have also reported the traditional frequentist meta-analysis results in Supplemental Material 10 to justify the robustness of our analysis, which suggested non-significant differences between Bayesian and frequentist models. We have included clearer references in the updated version of the manuscript to direct readers to the figures that report the statistics provided by traditional models.

      However, most of the methods are not described in sufficient detail for the reader to understand the proceedings. It might be evident for an expert in Bayesian statistics what a "prior sensitivity test" and a "posterior predictive check" are, but I suppose most readers would wish for a more detailed description. However, using a "Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler" and checking its convergence "through graphical posterior predictive checks, trace plots, and the Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains" is surely something only rocket scientists understand. Whether this was done correctly in the present study cannot be ascertained because it is only mentioned in the methods and no corresponding results are provided. 

      We appreciate the reviewer’s concerns about accessibility and potential complexity in our descriptions of Bayesian methods. Our decision to provide a detailed account serves to enhance transparency and guide readers interested in replicating our study. We acknowledge that some terms may initially seem overwhelming. These steps, such as checking the MCMC chain convergence and robustness checks, are standard practices in Bayesian research and are analogous to “linearity”, “normality” and “equal variance” checks in frequentist analysis. In addition, Hamiltonian Monte Carlo (HMC) is the default algorithm Stan (the software we used to fit Bayesian models) uses to sample from the posterior distribution in Bayesian models. It is a type of MCMC method designed to be faster and more efficient than traditional sampling algorithms, especially for complex or high-dimensional models. We have added exemplary plots in the supplemental material S4.1-4.3 and the method section (pp. 21-22) to explain the results and interpretation of these convergence checks. We hope this will help address any concerns about methodological rigor.
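
      As a rough illustration of the Gelman and Rubin diagnostic mentioned above, the sketch below computes the classic (non-split) potential scale reduction factor for toy chains. It is not the computation reported in the manuscript: the function name `gelman_rubin`, the simulated draws, and the chain settings are all assumptions made for illustration, and Stan-based tools report refined variants of this statistic. Values close to 1 indicate that the chains agree with one another.

```python
import random
import statistics


def gelman_rubin(chains):
    """Classic (non-split) potential scale reduction factor, R-hat.

    chains: list of equal-length lists of posterior draws for one parameter.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length.
    b = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


rng = random.Random(0)
# Toy chains drawn from the same distribution: these should look converged.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Toy chains stuck at different locations: these should look unconverged.
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0, 1, 2, 3)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(f"R-hat, well-mixed chains: {rhat_mixed:.3f}")
print(f"R-hat, stuck chains:      {rhat_stuck:.3f}")
```

      In this toy setting the well-mixed chains give a value near 1, whereas the chains centred at different locations give a clearly larger value, which is the pattern a trace plot would also reveal visually.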

      In one point the method might not be sufficiently justified. The method used to transform circular-linear r (actually, all references cited by the authors for circular statistics use r² because there can be no negative values) into "Z_r", seems partially plausible and might be correct under the H0. However, Figure 12.3 seems to show that under the alternative Hypothesis H1, the assumptions are not accurate (peak Z_r=~0.70 for r=0.65). I am therefore, based on the presented evidence, unsure whether this transformation is valid. Also, saying that Z_r=-1 represents the null hypothesis and Z_r=1 the alternative hypothesis can be misinterpreted, since Z_r=0 also represents the null hypothesis and is not half way between H0 and H1.

      First, we realized that in the titles of Figures 12.2 and 12.3, “true r = 0.35” and “true r = 0.65” should be corrected to “true r_z” (note that we use r_z instead of Z_r in the revised manuscript per your suggestion). The method we used here is to first generate an underlying population that has a null (0), moderate (0.35), or large (0.65) r_z correlation, then test whether the sampling distribution drawn from these populations followed a normal distribution across varying sample sizes. Nevertheless, the reviewer correctly noticed discrepancies between the reported true r_z and the peak of its sampling distribution. This discrepancy arises because, when generating large population data, achieving exact values close to a strong correlation like r_z = 0.65 is unlikely. We loop through simulations to generate population data and ensure their r_z values fall within a threshold. For moderate effect sizes (e.g., r_z = 0.35), this is straightforward using a narrow range (0.34 < r_z < 0.35). However, for larger effect sizes like r_z = 0.65, a wider range (0.6 < r_z < 0.7) is required. Therefore, the population from which we drew the samples sometimes has an r_z that deviates slightly from 0.65. This remains reasonable since the main point of this analysis is to ensure that a large r_z still has a normal sampling distribution, rather than to achieve exactly r_z = 0.65.

      We acknowledge that this variability of the range used was not clearly explained in supplemental material 12 and it is not accurate to report “true r_z = 0.65”. In the revised version, we have addressed this issue by adding vertical lines to each subplot to indicate the r_z of the population we used to draw samples, making it easier to check if it aligns with the sampling peak. In addition, we have revised the title to “Sampling distributions of r_z drawn from strong correlations (r_z = 0.6-0.7)”. We confirmed that population r_z and the peak of their sampling distribution remain consistent under both H0 and H1 in all sample sizes with n > 25, and we hope this explanation can fully resolve your concern.

      We agree with the reviewer that claiming r_z = -1 represents the null hypothesis is not accurate. The circlin r_z = 0 is more closely analogous to Pearson’s r = 0, since both represent the mean drawn from the population under the null hypothesis. In contrast, the mean effect size under the null will be positive in the raw circlin r, which is one of the important reasons for the transformation. To provide a more accurate interpretation, we updated Table 6 to describe the following strength levels of evidence: no effect (r < 0), null (r = 0), small (r = 0.1), moderate (r = 0.3), and large (r = 0.5). We thank the reviewer again for their valuable feedback.

      Reviewer #2 (Recommendations for the authors):

      (1) There is an extra space in the Notes of Figure 1. "SW R sharp-wave ripple.".

      We thank the reviewer for pointing this out. We have confirmed that the "extra space" is not an actual error but a result of how italicized Times New Roman font is rendered in the LaTeX format. We believe that the journal’s formatting process will resolve this issue.

      (2) In the introduction, slow oscillations (SO) are defined with a frequency of 0.16-4 Hz, sleep spindles (SP) at 8-16 Hz, and sharp-wave ripples (SWR) at 80-300 Hz. The term "fast oscillation" (FO) is first introduced with the clarification "SPs in our case." However, on page 2, the authors state, "SO-FO coupling involving SWRs, SPs, and SOs..." There seems to be a discrepancy in the definition of FO; does it consistently refer to SPs and SWRs throughout the article?

      We appreciate the reviewer’s observation regarding the potential ambiguity of the term "FO." In our manuscript, "FO" is used as a general term to describe the interaction of a "relatively faster oscillation" with a "relatively slower oscillation" in the phase-amplitude coupling mechanism, therefore it is not intended to exclusively refer to SPs or SWRs. For example, it is usually used to describe SO–SP–SWR couplings during sleep memory studies, but Theta–Alpha–Gamma couplings in wakeful memory studies. To address this confusion, we removed the phrase "SPs in our case" and explicitly use "SPs" when referring to spindles. In addition, we have replaced "fast oscillation" with "faster oscillation" to emphasize that it is used in a relative sense (p. 1), rather than to refer to a specific oscillation. Also, we only retained the term “FO” when introducing the PAC mechanism.

      (3) On page 2, the first paragraph contains the phrase: "...which occur in the precise hierarchical temporal structure of SO-FO coupling involving SWRs, SPs, and SOs ..." Since "SO-FO" refers to slow and fast oscillations, it is better to maintain the order of frequencies, suggesting it as: SOs, SPs, and SWRs.

      We sincerely thank the reviewer for their valuable suggestion. We have updated the sentence to maintain the correct order from the lowest to the highest frequencies in the revised version (p. 2).

      (4) References should be provided:

      a. “Studies using calcium imaging after SP stimulation explained the significance of the precise coupling phase for synaptic plasticity."

      b. "Electrophysiology evidence indicates that the association between memory consolidation and SO-SP coupling is influenced by a variety of behavioral and physiological factors under different conditions."

      c. "Since some studies found that fast SPs predominate in the centroparietal region, while slow SPs are more common in the frontal region, a significant amount of studies only extracted specific types of SPs from limited electrodes. Some studies even averaged all electrodes to estimate coupling..."

      This is a great point. These have been referenced as follows:

      a. Rephrased: “Studies using calcium imaging and SP stimulation explained the significance of the precise coupling phase for synaptic plasticity.” We changed “after” to “and” to reflect that these were conducted as two separate experiments. This is a summary statement, with relevant citations provided in the following two sentences of the paragraph, including Niethard et al., 2018, and Rosanova et al., 2005. (p. 2)

      b. Included diverse sources of evidence: “Electrophysiology evidence from studies included in our meta-analysis (e.g. Denis et al., 2021; Hahn et al., 2020; Mylonas et al., 2020) and others (e.g. Bartsch et al., 2019; Muehlroth et al., 2019; Rodheim et al., 2023) reported that the association between memory consolidation and SO-SP coupling is influenced by a variety of behavioral and physiological factors under different conditions.” (p. 3)

      c. Added references and more details: “Since some studies found that fast SPs predominate in the centroparietal region, while slow SPs are more common in the frontal region, a significant amount of studies selectively extracted specific types of SPs from limited electrodes (e.g. Dehnavi et al., 2021; Perrault et al., 2019; Schreiner et al., 2021). Some studies even averaged all electrodes in their spectral and/or time-series analysis to estimate metrics of oscillations and their couplings (e.g. Denis et al., 2022; Mölle et al., 2011; Nicolas et al., 2022).” (p. 4)

      Reviewer #3 (Recommendations for the authors):

      There are a number of terms that are not clearly defined or used:

      (1) SP amplitude. Does this mean only the amplitude of coupled spindles or of spindles in general?

      This refers to the amplitude of spindles in general. We clarified this in the revised text (and see response to reviewer #1, point #1).

      (2) The definition of a small effect

      We thank the reviewer again for raising this important question. As we responded in the public review, small effect sizes are common in neuroscience and meta-analyses due to the complexity of the underlying mechanisms and the presence of numerous confounding variables and hierarchical levels. To help readers better interpret effect sizes, we changed rigid ranges to widely accepted benchmarks for effect size levels in neuroscience research: small (r=0.1), moderate (r=0.3), and large (r=0.5; Cohen, 1988). We also noted that an evidence and context-based framework will provide a more practical way to interpret the observed effect sizes compared to rigid categorizations.

      (3) Can a BF10 based on experimental evidence actually be "infinite" and a probability actually be 1.00?

      We appreciate the reviewer for highlighting this potential confusion. The formula used to calculate BF10 is P(data | H1) / P(data | H0). In the experimental setting with an informative prior, an ‘infinite’ BF10 value indicates that all posterior samples are overwhelmingly compatible with H1 given the data and assumptions (Cox et al., 2023; Heck et al., 2023; Ly et al., 2016). In such cases, the denominator P(data | H0) becomes vanishingly small, leading BF10 to converge to infinity. This scenario occurs when the probability of H1 converges to 1 (e.g., 0.9999999999…).

      It is a well-established convention in Bayesian statistics to report the Bayes factor as "infinity" in cases where the evidence is overwhelmingly strong, and BF10 exceeds the numerical limits of the computation tools to become effectively infinite. To address this ambiguity, we added a footnote in the revised version of the manuscript to clarify the interpretation of an 'infinite' BF10 . (p. 8)

      (4) Z_r should be renamed to r_z or similar. These are not Z values (-inf..+inf), but r values (-1..1).

      We thank the reviewers for their suggestions. We agree that r_z would provide a clearer and more accurate interpretation, while z is more appropriate for referring to Fisher's z-transformed r (see point (5)). We have updated the notation accordingly.

      (5) Also, it remains quite unclear at which points in the analyses, "r" values or "Fisher's z transformed r" values are used. Assumptions of normality should only apply to the transformed values. However, the formulas for the random effects model seem to assume normality for r values.

      The correlation values were z-transformed during preprocessing to ensure normality and the correct estimation of sampling variances before running the models. The outputs were then back-transformed to raw r values only when reporting the results to help readers interpret the effect size. We mentioned this in Section 5.5.1, therefore the normality assumptions are not a concern. We have updated the notation r to z (-inf..+inf) in the formula of the random and mixed effect models in the revised version of the manuscript (p. 22).
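
      The transform-then-back-transform workflow described above can be sketched as follows. This is a minimal illustration, not the code used in our analysis: the study values are hypothetical, the 1/(n - 3) sampling variance is the standard approximation for Pearson correlations (an assumption here, since the circular-linear case may differ), and fixed-effect inverse-variance pooling stands in for the Bayesian random-effects model purely for brevity.

```python
import math


def fisher_z(r):
    """Fisher's z-transform of a correlation coefficient r in (-1, 1)."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform a z value to the correlation scale."""
    return math.tanh(z)


def z_sampling_variance(n):
    """Approximate sampling variance of Fisher's z for a Pearson r (assumption)."""
    return 1.0 / (n - 3)


# Hypothetical study-level inputs: (correlation, sample size).
studies = [(0.10, 20), (0.30, 35), (0.50, 28)]

# Pool on the z scale, where the normality assumption is applied.
weights = [1.0 / z_sampling_variance(n) for _, n in studies]
z_values = [fisher_z(r) for r, _ in studies]
pooled_z = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)

# Back-transform only for reporting, so the effect size reads as a correlation.
pooled_r = inverse_fisher_z(pooled_z)
print(f"pooled z = {pooled_z:.3f}, reported r = {pooled_r:.3f}")
```

      The point of the sketch is the ordering of the steps: all modelling happens on the unbounded z scale, and r values appear only at the input and reporting stages.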

      Language

      (1) Frequency. In the introduction, the authors use "frequency" when they mean something like the incidence of spindles.

      We agree that the term "frequency" has been used inconsistently to describe both the incidence of events and the frequency bands of oscillations. We have replaced "frequency" with "prevalence" to refer to the incidence of coupling events where applicable (p. 3).

      (2) Moderate and mediate. These two terms are usually meant to indicate two different types of causal influences.

      We thank the reviewer for this suggestion. We agree that "moderate" is more appropriate to describe moderators in this study since it does not directly imply causality. We have replaced "mediate" with "moderate" in relevant contexts.

      (3) "the moderate effect of memory task is relatively weak": "moderator effect" or "moderate effect"?

      We appreciate the reviewer for pointing out this mistake. We have updated the term to "moderator effect" in Section 2.2.2 (p. 6).

      (4) "in frontal regions we found a latest coupled but most precise and strong SO-fast SP coupling" Meaning?

      We thank the reviewer for bringing this concern of clarity to our attention. By 'latest,' we refer to the delayed phase of SO-fast SP coupling observed in the frontal regions compared to the central and parietal regions (see Figure 5). "Precise and strong" describes the high precision and strength of phase-locking between the SO up-state and the fast SP peak in these regions. To improve clarity, we have rephrased this sentence as: “We found that SO-fast SP coupling in the frontal region occurred at the latest phase observed across all regions, characterized by the highest precision and strength of phase-locking.” (p. 9).

      (5) Figure 5 and others contain angles in degrees and radians.

      We appreciate the reviewer pointing out this inconsistency. We have updated the manuscript and supplementary material to consistently use radians throughout.

    1. eLife Assessment

      This well-designed study combining psychophysical and fMRI data presents a valuable finding regarding how adaptation alters spatial frequency processing in the cortex. The evidence supporting the claims of the authors is solid, although inclusion of more participants and better quality of the fMRI data would have strengthened the study. The study will be of interest to cognitive and perceptual neuroscientists working on human and non-human primates.

    2. Reviewer #2 (Public review):

      The revised manuscript by Altan et al. includes some real improvements to the visualizations and explanations of the authors' thesis statement with respect to fMRI measurements of pRF sizes. In particular, the deposition of the paper's data has allowed me to probe and refine several of my previous concerns. While I still have major concerns about how the data are presented in the current draft of the manuscript, my skepticism about data quality overall has been much alleviated. Note that this review focuses almost exclusively on the fMRI data as I was satisfied with the quality of the psychophysical data and analyses in my previous review.

      Major Concerns

      (I) Statistical Analysis

      In my previous review, I raised the concern that the small sample size combined with the noisiness of the fMRI data, a lack of clarity about some of the statistics, and a lack of code/data likely combine to make this paper difficult or impossible to reproduce as it stands. The authors have since addressed several aspects of this concern, most importantly by depositing their data. However their response leaves some major questions, which I detail below.

      First of all, the authors claim in their response to the previous review that the small sample size is not an issue because large samples are not necessary to obtain "conclusive" results. They are, of course, technically correct that a small sample size can yield significant results, but the response misses the point entirely. In fact, small samples are more likely than large samples to erroneously yield a significant result (Button et al., 2013, DOI:10.1038/nrn3475), especially when noise is high. The response by the authors cites Schwarzkopf & Huang (2024) to support their methods on this front. After reading the paper, I fail to see how it is at all relevant to the manuscript at hand or the criticism raised in the previous review. Schwarzkopf & Huang propose a statistical framework that is narrowly tailored to situations where one is already certain that some phenomenon (like the adaptation of pRF size to spatial frequency) either always occurs or never occurs. Such a framework is invalid if one cannot be certain that, for example, pRF size adapts in 98% of people but not the remaining 2%. Even if the paper were relevant to the current study, the authors don't cite this paper, use its framework, or admit the assumptions it requires in the current manuscript. The observation that a small dataset can theoretically lead to significance under a set of assumptions not appropriate for the current manuscript is not a serious response to the concern that this manuscript may not be reproducible.

      To overcome this concern, the authors should provide clear descriptions of their statistical analyses and explanations of why these analyses are appropriate for the data. Ideally, source code should be published that demonstrates how the statistical tests were run on the published data. (I was unable to find any such source code in the OSF repository.) If the effects in the paper were much stronger, this level of rigor might not be strictly necessary, but the data currently give the impression of being right near the boundary of significance, and the manuscript's analyses need to reflect that. The descriptions in the text were helpful, but I was only able to approximately reproduce the authors' analyses based on these descriptions alone. Specifically, I attempted to reproduce the Mood's median tests described in the second paragraph of section 3.2 after filtering the data based on the criteria described in the final paragraph of section 3.1. I found that 7/8 (V1), 7/8 (V2), 5/8 (V3), 5/8 (V4), and 4/8 (V3A) subjects passed the median test when accounting for the (40) multiple comparisons. These results are reasonably close to those reported in the manuscript and might just differ based on the multiple comparisons strategy used (which I did not find documented in the manuscript). However, Mood's median test does not test the direction of the difference, only whether the medians are different, so I additionally required that the median sigma of the high-adapted pRFs be greater than that of the low-adapted pRFs. Surprisingly, in V1 and V3, one subject each (not the same subject) failed this part of the test, meaning that they had significant differences between conditions but in the wrong direction. This leaves 6/8 (V1), 7/8 (V2), 4/8 (V3), 5/8 (V4), and 4/8 (V3A) subjects that appear to support the authors' conclusions.

      As the authors mention, however, this set of analyses runs the risk of comparing different parts of cortex, so I also performed Wilcoxon signed-rank tests on the (paired) vertex data for which both the high-adapted and low-adapted conditions passed all the authors' stated thresholds. These results largely agreed with the median test (only 5/8 subjects significant in V1 but 6/8 in V3A, other areas the same, though the two tests did not always agree which subjects had significant differences). These analyses were of course performed by a reviewer with a reviewer's time commitment to the project and shouldn't be considered a replacement for the authors' expertise with their own data. If the authors think that I have made a mistake in these calculations, then the best way to refute them would be to publish the source code they used to threshold the data and to perform the same tests.

      Setting aside the precise values of the relevant tests, we should also consider whether 5 of 8 subjects showing a significant effect (as they report for V3, for example) should count as significant evidence of the effect. If one assumes, as a null hypothesis, that there is no difference between the two conditions in V3 and that all differences are purely noise, then a binomial test across subjects would be appropriate. Even if 6 of 8 subjects show the effect, however (and ignoring multiple comparisons), the p-value of a one-sided binomial test is not significant at the 0.05 level (7 of 8 subjects is barely significant). Of course, a more rigorous way to approach this question could be something like an ANOVA, and the authors use an ANOVA analysis of the medians in the paragraph following their use of Mood's median test. However, ANOVA assumes normality, and the authors state in the previous paragraph that they employed Mood's median test because "the distribution of the pRF sizes is zero-bounded and highly skewed" so this choice does not make sense. The Central Limit Theorem might be applied to the medians in theory, but with only 8 subjects and with an underlying distribution of pRF sizes that is non-negative, the relevant data will almost certainly not be normally distributed. These tests should probably be something like a Kruskal-Wallis ANOVA on ranks.
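
      The binomial argument above can be checked with a few lines of code. This is only a sketch under a stated assumption: that under the null each of the 8 subjects independently has a probability of 0.5 of showing the effect in the predicted direction. The function name `binomial_upper_tail` is illustrative.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), i.e. the one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_subjects = 8
p_values = {k: binomial_upper_tail(k, n_subjects) for k in (5, 6, 7, 8)}

for k, p_value in p_values.items():
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{k} of {n_subjects} subjects: p = {p_value:.4f} ({verdict} at 0.05)")
```

      Under this assumption, 6 of 8 corresponds to 37/256 (about 0.145) and 7 of 8 to 9/256 (about 0.035), before any correction for testing several visual areas.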

      All of the above said, my intuition about the data is currently that there are significant changes to the adapted pRF size in V2. I am not currently convinced that the effects in other visual areas are significant, and I suspect that the paper would be improved if authors abandoned their claims that areas other than V2 show a substantial effect. Importantly, I don't think this causes the paper to lose any impact; in fact, if the authors agree with my assessments, then the paper might be improved by focusing on V2. Specifically, the authors already discuss psychophysical work related to the perception of texture on pages 18 and 19 and link it to their results. V2 is also implicated in the perception of texture (see, for example, Freeman et al., 2013; DOI:10.1038/nn.3402; Ziemba et al., 2016, DOI:10.1073/pnas.1510847113; Ziemba et al., 2019; DOI:10.1523/JNEUROSCI.1743-19.2019) and so would naturally be the part of the visual cortex where one might predict that spatial frequency adaptation would have a strong effect on pRF size. This neatly connects the psychophysical and imaging sides of this project and could make a very nice story out of the present work.

      (II) Visualizations

      The manuscript's visual evidence regarding the pRF data also remains fairly weak (but I found the pRF size comparisons in the OSF repository and Figure S1 to be better evidence; more in the next paragraph). The first line of the Results section still states, "A visual inspection on the pRF size maps in Figure 4c clearly shows a difference between the two conditions, which is evident in all regions." As I mentioned in my previous review, I don't agree with this claim (specifically, that it is clear). My impression when I look at these plots is of similarity between the maps, and, where there is dissimilarity, of likely artifacts. For example, the splotch of cortex near the upper vertical meridian (ventral boundary) of V1 that shows up in yellow in the upper plot but not the lower plot also has a weirdly high eccentricity and a polar angle near the opposite vertical meridian: almost certainly not the actual tuning of that patch of cortex. If this is the clearest example subject in the dataset, then the effect looks to me to be very small and inconsistently distributed across the visual areas. That said, I'm not convinced that the problem here is the data; rather, I think it's just very hard to communicate a small difference in parameter tuning across a visual area using this kind of side-by-side figure. I think that Figure S2, though noisy (as pRF maps typically are), is more convincing than Figure 4c, personally. For what it's worth, when looking at the data myself, I found that plotting log(𝜎(H) / 𝜎(L)), which will be unstable when noise causes 𝜎(H) or 𝜎(L) to approach zero, was less useful than plotting (𝜎(H) - 𝜎(L)) / (𝜎(H) + 𝜎(L)). This latter quantity will be constrained between -1 and 1 and shows something like a proportional change in the pRF size (and thus should be more comparable across eccentricity).
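
      To make the comparison of the two quantities concrete, here is a minimal sketch. The pRF sizes are hypothetical values chosen for illustration, not taken from the deposited data, and the function names are my own.

```python
import math


def log_ratio(sigma_high, sigma_low):
    """log(sigma_H / sigma_L): unbounded, and unstable as either size nears zero."""
    return math.log(sigma_high / sigma_low)


def contrast_index(sigma_high, sigma_low):
    """(sigma_H - sigma_L) / (sigma_H + sigma_L): within [-1, 1] for positive sizes."""
    return (sigma_high - sigma_low) / (sigma_high + sigma_low)


# Hypothetical pRF sizes in degrees of visual angle: (high-adapted, low-adapted).
# The last pair mimics a noisy vertex whose low-adapted size collapsed towards zero.
pairs = [(1.2, 1.0), (2.4, 2.0), (1.0, 0.001)]

for sigma_h, sigma_l in pairs:
    print(f"sigma_H = {sigma_h:<5} sigma_L = {sigma_l:<6} "
          f"log ratio = {log_ratio(sigma_h, sigma_l):6.3f}  "
          f"index = {contrast_index(sigma_h, sigma_l):6.3f}")
```

      The first two pairs share the same proportional change and therefore the same index, while the near-zero pair produces a large log ratio but an index that stays bounded just below 1.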

      In my opinion, the inclusion of the pRF size comparison plots in the OSF repository and Figure S1 made a stronger case than any of the plots of the cortical surface. I would suggest putting these on log-log plots since the distribution of pRF size (like eccentricity) is approximately exponential on the cortical surface. As-is, it's clear in many plots that there is a big splotch of data in the compressed lower left corner, but it's hard to get a sense for how these should be compared to the upper right expanse of the plots. It is frequently hard to tell whether there is a greater concentration of points above or below the line of equality in the lower left corner as well, and this is fairly central to the paper's claims. My intuition is that the upper right is showing relatively little data (maybe 10%?), but these data are very emphasized by the current plots.
      The authors might even want to consider putting a collection of these scatter-plots (or maybe just subject 007, or possibly all subjects' pRFs on a single scatter-plot) in the main paper and using these visualizations to provide intuitive support for the main conclusions about the fMRI data (where the manuscript currently uses Figure 4c for visual intuition).

      Minor Comments

      (1) Although eLife does not strictly require it, I would like to see more of the authors' code deposited along with the data (especially the code for calculating the statistics that were mentioned above). I do appreciate the simulation code that the authors added in the latest submission (largely added in response to my criticism in the previous reviews), and I'll admit that it helped me understand where the authors were coming from, but it also contains a bug and thus makes a good example of why I'd like to see more of the authors' code. If we set aside the scientific question of whether the simulation is representative of an fMRI voxel (more in Minor Comment 5, below), Figures 1A and the "AdaptaionEffectSimulated.png" file from the repository (https://osf.io/d5agf) imply that only small RFs were excluded in the high-adapted condition and only large RFs were excluded in the low-adapted condition. However, the script provided (SimlatePrfAdaptation.m: https://osf.io/u4d2h) does not do this. Lines 7 and 8 of the script set the small and large cutoffs at the 30th and 70th percentiles, respectively, then exclude everything greater than the 30th percentile in the "Large RFs adapted out" condition (lines 19-21) and exclude anything less than the 70th percentile in the "Small RFs adapted out" condition (lines 27-29). So the figures imply that they are representing 70% of the data but they are in fact representing only the most extreme 30% of the data. (Moreover, I was unable to run the script because it contains hard-coded paths to code in someone's home directory.) Just to be clear, these kinds of bugs are quite common in scientific code, and this bug was almost certainly an honest mistake.

      (2) I also noticed that the individual subject scatter-plots of high versus low adapted pRF sizes on the OSF seem to occasionally have a large concentration of values on the x=0 and y=0 axes. This isn't really a big deal in the plots, but the manuscript states that "we denoised the pRF data to remove artifactual vertices where at least one of the following criteria was met: (1) sigma values were equal to or less than zero ..." so I would encourage the authors to double-check that the rest of their analysis code was run with the stated filtering.

      (3) The manuscript also says that the median test was performed "on the raw pRF size values". I'm not really sure what the "raw" means here. Does this refer to pRF sizes without thresholding applied?

      (4) The eccentricity data are much clearer now with the additional comments from the authors and the full set of maps; my concerns about this point have been met.

      (5) Regarding the simulation of RFs in a voxel (setting aside the bug), I will admit both to hoping for a more biologically-grounded situation and to nonetheless understanding where the authors are coming from based on the provided example. What I mean by biologically-grounded: something like, assume a 2.5-mm isotropic voxel aligned to the surface of V1 at 4° of eccentricity; the voxel would span X to Y degrees of eccentricity, and we predict Z neurons with RFs in this voxel with a distribution of RF sizes at that eccentricity from [reference], etc. eventually demonstrating a plausible pRF size change commensurate to the paper's measurements. I do think that a simulation like this would make the paper more compelling, but I'll acknowledge that it probably isn't necessary and might be beyond the scope here.
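
      The percentile-cutoff issue raised in Minor Comment 1 can be illustrated with a short sketch. This is not the authors' MATLAB script: the simulated RF sizes, the nearest-rank `percentile` helper, and all variable names are assumptions made for illustration, and the "intended" selections reflect only my reading of what the figures imply.

```python
import math
import random


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * q / 100))
    return ordered[rank - 1]


rng = random.Random(1)
# Hypothetical RF sizes within one simulated voxel (arbitrary units).
rf_sizes = [rng.lognormvariate(0.0, 0.5) for _ in range(1000)]

small_cut = percentile(rf_sizes, 30)
large_cut = percentile(rf_sizes, 70)

# What the figures appear to imply: remove only the adapted tail (keeps about 70%).
large_adapted_intended = [s for s in rf_sizes if s <= large_cut]
small_adapted_intended = [s for s in rf_sizes if s >= small_cut]

# What the script is described as doing: keep only the opposite extreme (about 30%).
large_adapted_described = [s for s in rf_sizes if s <= small_cut]
small_adapted_described = [s for s in rf_sizes if s >= large_cut]

for label, kept in [
    ("large RFs adapted out, intended", large_adapted_intended),
    ("large RFs adapted out, described", large_adapted_described),
    ("small RFs adapted out, intended", small_adapted_intended),
    ("small RFs adapted out, described", small_adapted_described),
]:
    print(f"{label}: {len(kept) / len(rf_sizes):.0%} of RFs retained")
```

      Comparing the retained fractions makes the discrepancy visible at a glance: the two readings differ in which 40% of the distribution (between the two cutoffs) is kept or discarded.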

    3. Reviewer #3 (Public review):

      This is a well-designed study examining an important, surprisingly understudied question: how does adaptation affect spatial frequency processing in human visual cortex? Using a combination of psychophysics and neuroimaging, the authors test the hypothesis that spatial frequency tuning is shifted to higher or lower frequencies, depending on preadapted state (low or high s.f. adaptation). They do so by first validating the phenomenon psychophysically, showing that adapting to 0.5 cpd stimuli causes an increase in perceived s.f., and 3.5 cpd causes a relative decrease in perceived s.f. They then port the same stimuli to a neuroimaging study, in which population receptive fields are measured under high and low spatial frequency adaptation states. They find that adaptation changes pRF size, depending on adaptation state: adapting to high s.f. led to broader overall pRF sizes across early visual cortex, whereas adapting to low s.f. led to smaller overall pRF sizes. Finally, the authors carry out a control experiment to psychophysically rule out the possibility that the perceived contrast change with adaptation may have given rise to these imaging results (doesn't appear to be the case). All in all, I found this to be a good manuscript: the writing is taut, and the study is well designed.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      We thank the reviewer for their careful evaluation and positive comments. 

      Adaptation paradigm

      “why is it necessary to use an *adaptation* paradigm to study the link between SF tuning and pRF estimation? Couldn't you just use pRF bar stimuli with varying SFs?” 

      We thank the reviewer for this question. First, by using adaptation we can infer the correspondence between the perceptual and the neuronal adaptation to spatial frequency. We couldn’t draw any inference about perception if we only varied the SF inside the bar. More importantly, while changing the SF inside the bar might help drive different neuronal populations, this is not guaranteed. As we touched on in our discussion, responses obtained from the mapping stimuli are dominated by complex processing rather than the stimulus properties alone. A considerable proportion of the retinotopic mapping signal is probably simply due to spatial attention to the bar (de Haas & Schwarzkopf, 2018; Hughes et al., 2019). So, adaptation is a more targeted way to manipulate different neuronal populations.

      Other pRF estimates: polar angle and eccentricity 

      We included an additional plot showing the polar angle for both adapter conditions (Figure S4), as well as participant-wise scatter plots comparing raw pRF size, eccentricity, and polar angle between two adapter conditions (available in shared data repository). In line with previous work on the reliability of pRF estimates (van Dijk, de Haas, Moutsiana, & Schwarzkopf, 2016; Senden, Reithler, Gijsen, & Goebel, 2014), both polar angle and eccentricity maps are very stable between the two adaptation conditions. 

      Variability in pRF size change

      As the reviewer pointed out, the pRF size changes show some variability across eccentricities and ROIs (Figure 5A and 5B). This variability likely relates to the varying tuning properties of different regions and eccentricities for the specific SF we used in the mapping stimulus. So one reason V2 is most consistent could be that the stimulus is best matched to the tuning there. However, which factors contribute to this variability is an interesting question that will require further study.

      Other recommendations

      We have addressed the other recommendations of the reviewer with one exception. The reviewer suggested we should comment on the perceived contrast decrease after SF adaptation (as seen in Figure 6B) in the main text. However, since we refer the readers to the supplementary analyses (Supplementary section S8) where we discuss this in detail, we chose to keep this aspect unchanged to avoid overcomplicating the main text.

      Reviewer #2 (Public Review):

      We thank the reviewer for their comments - we improved how we report key findings which we hope will clarify matters raised by the reviewer.

      RF positions in a voxel

      The reviewer’s comments suggest that they may have misunderstood the diagram (Figure 1A) illustrating the theoretical basis of the adaptation effect, likely due to us inadvertently putting the small RFs in the middle of the illustration. We changed this figure to avoid such confusion.

      Theoretical explanation of adaptation effect

      The reviewer’s explanation for how adaptation should affect the size of pRF averaging across individual RFs is incorrect. When selecting RFs from a fixed range of semi-uniformly distributed positions (as in an fMRI voxel), the average position of RFs (corresponding to pRF position) is naturally near the center of this range. The average size (corresponding to pRF size) reflects the visual field coverage of these individual RFs. This aggregate visual field coverage thus also reflects the individual sizes. When large RFs have been adapted out, this means the visual field coverage at the boundaries is sparser, and the aggregate pRF is therefore smaller. The opposite happens when adapting out the contribution of small RFs. We demonstrate this with a simple simulation at this OSF link: https://osf.io/ebnky/. The pRF sizes of the simulated voxels illustrate that the adaptation effect should manifest precisely as we hypothesized.
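
      The logic of this argument can be illustrated with a minimal one-dimensional sketch. This is not the simulation hosted at the OSF link; it is a simplified stand-in in which every name and number (the RF sizes of 0.5 and 2.0, the position range, the adaptation gain of 0.3) is an assumption chosen for illustration. Individual RFs are modelled as Gaussians with uniformly distributed centers, and the aggregate pRF size is taken as the standard deviation of their weighted sum.

```python
import math
import random

def aggregate_prf_size(centers, sigmas, weights):
    """Standard deviation of a weighted sum of 1D Gaussian RFs (a Gaussian mixture)."""
    total = sum(weights)
    w = [x / total for x in weights]
    mean = sum(wi * c for wi, c in zip(w, centers))
    # Second moment of a mixture: each component contributes sigma^2 + center^2.
    second_moment = sum(wi * (s ** 2 + c ** 2) for wi, c, s in zip(w, centers, sigmas))
    return math.sqrt(second_moment - mean ** 2)

random.seed(0)
n = 1000  # hypothetical number of RFs sampled by one voxel
centers = [random.uniform(-1.0, 1.0) for _ in range(n)]  # positions within the voxel's range
is_large = [random.random() < 0.5 for _ in range(n)]     # half small, half large RFs
sigmas = [2.0 if big else 0.5 for big in is_large]

def gains(large_gain, small_gain):
    """Response gain per RF; adaptation is modelled as a reduced gain (an assumption)."""
    return [large_gain if big else small_gain for big in is_large]

baseline = aggregate_prf_size(centers, sigmas, gains(1.0, 1.0))
adapt_large = aggregate_prf_size(centers, sigmas, gains(0.3, 1.0))  # large RFs adapted out
adapt_small = aggregate_prf_size(centers, sigmas, gains(1.0, 0.3))  # small RFs adapted out

print(f"baseline: {baseline:.2f}, large adapted: {adapt_large:.2f}, small adapted: {adapt_small:.2f}")
```

      Under these assumptions, reducing the gain of the large RFs shrinks the aggregate pRF while the average position stays near the center of the range, and reducing the gain of the small RFs has the opposite effect.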

      Figure S2

      It is not actually possible to compare R² between regions by looking at Figure S2 because it shows the pRF size change, not R². Therefore, the arguments Reviewer #2 made based on their interpretation of the figure are not valid. Just as the reviewer expected, V1 is one of the brain regions with good pRF model fits. We included normalized and raw R² maps to make this more obvious to the readers.

      V1 appeared essentially empty in that plot primarily due to the sigma threshold we selected, which was unintentionally more conservative than those applied in our analyses and other figures. We apologize for this mistake. We corrected it in the revised version by including a plot with the appropriate sigma threshold.

      Thresholding details 

      Thresholding information was included in our original manuscript; however, we included more information in the figure captions to make it more obvious.

      2D plots replaced histograms

      We thank the reviewer for this suggestion. The original manuscript contained histograms showing the distribution of pRF size for both adaptation conditions for each participant and visual area (Figure S1). However, we agree that 2D plots better communicate the difference in pRF parameters between conditions. So we moved the histogram plots to the online repository, and included scatter plots with a color scheme revealing the 2D kernel density.

      We chose to implement 2D kernel density in scatter plots to display the distribution of individual pRF sizes transparently.

      (proportional) pRF size-change map 

      The reviewer requests pRF size difference maps. Figure S2 in fact demonstrates the proportional difference between the pRF sizes of the two adaptation conditions. Instead of simply taking the difference, we believe showing the proportional change map is more sensible because overall pRF size varies considerably between visual regions. We explained this more clearly in our revision. 
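
      A short sketch shows why a proportional measure is easier to compare across regions than a raw difference. The exact normalisation used for Figure S2 is not stated here, so dividing by the size in the reference condition is an assumption, and the pRF sizes below are hypothetical values in degrees of visual angle.

```python
def proportional_change(size_a, size_b):
    """Change in pRF size from condition B to condition A, as a fraction of B."""
    if size_b <= 0:
        raise ValueError("reference pRF size must be positive")
    return (size_a - size_b) / size_b

# Hypothetical pRF sizes: the same raw difference of 0.2 deg in two regions
# whose baseline pRF sizes differ by a factor of four.
early_region = proportional_change(1.2, 1.0)
higher_region = proportional_change(4.2, 4.0)

print(f"early region: {early_region:.1%}, higher region: {higher_region:.1%}")
```

      The same raw difference corresponds to a much larger relative change where pRFs are small, which is the reason for preferring a proportional map when overall pRF size varies between visual regions.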

      pRF eccentricity plot 

      “I suspect that the difference in PRF size across voxels correlates very strongly with the difference in eccentricity across voxels.”

      Our original manuscript already contained a supplementary plot (Figure S4 B, now Figure S4 C) comparing the eccentricity between adapter conditions, showing no notable shift in eccentricities except in V3A - but that is a small region and the results are generally more variable. In addition, we included participant-wise plots in the online repository, presenting raw comparisons of pRF size, eccentricity, and polar angle estimates between adaptation conditions. These 2D plots provide further evidence that the SF adapters resulted in a change in pRF size, while eccentricity and polar angle estimates did not show consistent differences.  

      To the reviewer’s point, even if there were an appreciable shift in eccentricity between conditions (as they suggest may have happened for the example participant we showed), this does not mean that the pRF size effect is “due [...] to shifts in eccentricity.” Parameters in a complex multi-dimensional model like the pRF are not independent. There is no way of knowing whether a change in one parameter is causally linked with a change in another. We can only report the parameter estimates the model produces. 

      In fact, it is conceivable that adaptation causes both: changes in pRF size and eccentricity. If more central or peripheral RFs tend to have smaller or larger RFs, respectively, then adapting out one part of the distribution will shift the average accordingly. However, as we already established, we find no compelling evidence that pRF eccentricity changes dramatically due to adaptation, while pRF size does.

      Other recommendations

      We have addressed the other recommendations of the reviewer, except for the y-axis alignment. Different regions in the visual hierarchy naturally vary substantially in pRF size. Aligning axes would therefore lead to incorrect visual inferences that (1) the absolute pRF sizes between ROIs are comparable, and (2) higher regions show the effect most prominently. However, for clarity, we now note this scale difference in our figure captions. Finally, as mentioned earlier, we also present a proportional pRF size change map to enable comparison of the adaptation effect between regions.

      Reviewer #3 (Public Review):

      We thank the reviewer for their comments.

      pRF model

      Top-up adapters were not modelled in our analyses because they are shared events in all TRs, critically also including the “blank” periods, providing a constant source of signal. Therefore modelling them separately cannot meaningfully change the results. However, the reviewer makes a good suggestion that it would be useful to mention this in the manuscript, so we added a discussion of this point in Section 3.1.5.

      pRF size vs eccentricity

      We added a plot showing pRF size in the two adaptation conditions (in addition to the pRF size difference) as a function of eccentricity.

      Correlation with behavioral effect

      In the original manuscript, we pointed out why the correlation between the magnitude of the behavioral effect and the pRF size change is not an appropriate test for our data. First, the reviewer is right that a larger sample size would be needed to reliably detect such a between-subject correlation. More importantly, as per our recruitment criteria for the fMRI experiment, we did not scan participants showing weak perceptual effects. This limits the variability in the perceptual effect and makes correlation inapplicable.

    1. eLife Assessment

      This work presents potentially important findings suggesting that a combination of transcranial stimulation approaches applied for a short period could improve memory performance. However, the evidence supporting the conclusions is currently incomplete. In particular, the claims relating to the specific neural mechanisms and anatomical sites of action underlying effects were viewed as overstated in the current version. The results potentially have implications for non-invasive enhancement of cognitive functions.

    2. Review #1 (Public Review):

      Summary:

      The authors employ a combination of repetitive transcranial magnetic stimulation (intermittent theta burst-iTBS) and transcranial alternating current stimulation (gamma tACS) as an approach aimed to improve memory in a face/name/profession task.

      Strengths:

      The paper has many strengths. The approach of stimulating the human brain non-invasively is potentially impactful because it could lead to a host of interesting applications. The current study aims to evaluate one such exciting application. The paper contains an unusual combination of noninvasive stimulation and brain imaging data, and includes independent replication samples.

      Weaknesses:

      (1) It remains unclear how this stimulation protocol is proposed to enhance memory. Memories are believed to be stored by precise inputs to specific neurons and highly tuned changes in synaptic strengths. It remains unclear whether proposed neural activity generated by the stimulation reflects the activation of specific memories or generally increased activity across all classes of neurons.

      (2) The claim that effects directly involve the precuneus lacks strong support. The measurements shown in Figure 3 appear to be weak (i.e., Figure 3A top and bottom look similar, and Figure 3C left and right look similar). The figure appears to show a more global brain pattern rather than effects that are limited to the precuneus. Related to this, it would perhaps be useful to show the different positions of the stimulation apparatus. This could perhaps show that the position of the stimulation matters and could perhaps illustrate a range of distances over which position of the stimulation matters.

      (3) Behavioral results showing an effect on memory would substantiate claims that the stimulation approach produces significant changes in brain activity. However, placebo effects can be extremely powerful and useful, and this should probably be mentioned. Also, in the behavioral results that are currently presented, there are several concerns:

      a) There does not appear to be a significant effect on the STMB task.

      b) The FNAT task is minimally described in the supplementary material. Experimental details that would help the reader understand what was done are not described. Experimental details are missing for: the size of the images, the duration of the image presentation, the degree of image repetition, how long the participants studied the images, whether the names and occupations were different, genders of the faces, and whether the same participant saw different faces across the different stimulation conditions. Regarding the latter point, if the same participant saw the same faces across the different stimulation conditions, then there could be memory effects across different conditions that would need to be included in the statistical analyses. If participants saw different faces across the different stimulus conditions, then it would be useful to show that the difficulty was the same across the different stimuli.

      c) Also, if I understand FNAT correctly, the task is based on just 12 presentations, and each point in Figure 2A represents a different participant. How the performance of individual participants changed across the conditions is unclear with the information provided. Lines joining performance measurements across conditions for each participant would be useful in this regard. Because there are only 12 faces, the results are quantized in multiples of 100/12 % in Figure 3A. While I do not doubt that the authors did their homework in terms of the statistical analyses, it seems as though these 12 measurements do not correspond to a large effect size. For example, in Figure 3A for the immediate condition (total), it seems that, on average, the participants may remember one more face/name/occupation.

      d) Block effects. If I understand correctly, the experiments were conducted in blocks. This is potentially problematic. An example study that articulates potential problems associated with block designs is described in Li et al (TPAMI 2021, https://ieeexplore.ieee.org/document/9264220). It is unclear if potential problems associated with block designs were taken into consideration.

      e) In the FNAT portion of the paper, some results are statistically significant, while others are not. The interpretation of this is unclear. In Figure 3A, it seems as though the authors claim that iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham. The interpretation of such a result is unclear. Results are also unclear when separated by name and occupation. There is only one condition that is statistically significant in Figure 3A in the name condition, and no significant results in the occupation condition. In short, the statistical analyses, and accompanying results that support the authors’ claims, should be explained more clearly.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript "Dual transcranial electromagnetic stimulation of the precuneus-hippocampus network boosts human long-term memory" by Borghi and colleagues provides evidence that the combination of intermittent theta burst TMS stimulation and gamma transcranial alternating current stimulation (γtACS) targeting the precuneus increases long-term associative memory in healthy subjects compared to iTBS alone and sham conditions. Using a rich dataset of TMS-EEG and resting-state functional connectivity (rs-FC) maps and structural MRI data, the authors also provide evidence that dual stimulation increased gamma oscillations and functional connectivity between the precuneus and hippocampus. Enhanced memory performance was linked to increased gamma oscillatory activity and connectivity through white matter tracts.

      Strengths:

      The combination of personalized repetitive TMS (iTBS) and gamma tACS is a novel approach to targeting the precuneus, and thereby, connected memory-related regions to enhance long-term associative memory. The authors leverage an existing neural mechanism engaged in memory binding, theta-gamma coupling, by applying TMS at theta burst patterns and tACS at gamma frequencies to enhance gamma oscillations. The authors conducted a thorough study that suggests that simultaneous iTBS and gamma tACS could be a powerful approach for enhancing long-term associative memory. The paper was well-written, clear, and concise.

      Weaknesses:

      (1) The study did not include a condition where γtACS was applied alone. This was likely because previous work indicated that a single 3-minute γtACS did not produce significant effects, but this limits the ability to isolate the specific contribution of γtACS in the context of this target and memory function.

      (2) The authors applied stimulation for 3 minutes, which seems to be based on prior tACS protocols. It would be helpful to present some rationale for both the duration and timing relative to the learning phase of the memory task. Would you expect additional stimulation prior to recall to benefit long-term associative memory?

      (3) How was the burst frequency of theta iTBS and gamma frequency of tACS chosen? Were these also personalized to subjects' endogenous theta and gamma oscillations? If not, were increases in gamma oscillations specific to patients' endogenous gamma oscillation frequencies or the tACS frequency?

      (4) The authors do a thorough job of analyzing the increase in gamma oscillations in the precuneus through TMS-EEG; however, the authors may also analyze whether theta oscillations were also enhanced through this protocol due to the iTBS potentially targeting theta oscillations. This may also be more robust than gamma oscillations increases since gamma oscillations detected on the scalp are very low amplitude and susceptible to noise and may reflect activity from multiple overlapping sources, making precise localization difficult without advanced techniques.

      (5) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      (6) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

    4. Reviewer #3 (Public review):

      Summary:

      Borghi and colleagues present results from 4 experiments aimed at investigating the effects of dual γtACS and iTBS stimulation of the precuneus on behavioral and neural markers of memory formation. In their first experiment (n = 20), they found that a 3-minute offline (i.e., prior to task completion) stimulation that combines both techniques leads to superior memory recall performance in an associative memory task immediately after learning associations between pictures of faces, names, and occupation, as well as after a 15-minute delay, compared to iTBS alone (+ tACS sham) or no stimulation (sham for both iTBS and tACS). Performance in a second task probing short-term memory was unaffected by the stimulation condition. In a second experiment (n = 10), they show that these effects persist over 24 hours and up to a full week after initial stimulation. A third (n = 14) and fourth (n = 16) experiment were conducted to investigate the neural effects of the stimulation protocol. The authors report that, once again, only combined iTBS and γtACS increase gamma oscillatory activity and neural excitability (as measured by concurrent TMS-EEG) specific to the stimulated area at the precuneus compared to a control region, as well as precuneus-hippocampus functional connectivity (measured by resting-state MRI), which seemed to be associated with structural white matter integrity of the bilateral middle longitudinal fasciculus (measured by DTI).

      Strengths:

      Combining non-invasive brain stimulation techniques is a novel, potentially very powerful method to maximize the effects of these kinds of interventions that are usually well-tolerated and thus accepted by patients and healthy participants. It is also very impressive that the stimulation-induced improvements in memory performance resulted from a short (3 min) intervention protocol. If the effects reported here turn out to be as clinically meaningful and generalizable across populations as implied, this approach could represent a promising avenue for the treatment of impaired memory functions in many conditions.

      Methodologically, this study is expertly done! I don't see any serious issues with the technical setup in any of the experiments (with the only caveat that I am not an expert in fMRI functional connectivity measures and DTI). It is also very commendable that the authors conceptually replicated the behavioral effects of experiment 1 in experiment 2 and then conducted two additional experiments to probe the neural mechanisms associated with these effects. This certainly increases the value of the study and the confidence in the results considerably.

      The authors used a within-subject approach in their experiments, which increases statistical power and allows for stronger inferences about the tested effects. They also individualized stimulation locations and intensities, which should further optimize the signal-to-noise ratio.

      Weaknesses:

      I want to state clearly that I think the strengths of this study far outweigh the concerns I have. I still list some points that I think should be clarified by the authors or taken into account by readers when interpreting the presented findings.

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis, but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. That being said, the authors do report significant effects, so they were by definition powered to find those. However, the effect sizes reported for their main findings are all relatively large, and it is known that significant findings from small samples may represent inflated effect sizes, which may hamper the generalizability of the current results. Ideally, the authors would replicate their main findings in a larger sample. Alternatively, I think running a sensitivity analysis to estimate the smallest effect the authors could have detected with a power of 80% could be very informative for readers to contextualize the findings. At the very least, however, I think it would be necessary to address this point as a potential limitation in the discussion of the paper.
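
      As a rough illustration of what such a sensitivity analysis would report, the sketch below uses a normal approximation for a two-sided paired comparison, where the smallest detectable standardized effect is approximately (z for 1 - alpha/2 plus z for the target power) divided by the square root of n. This is an approximation that slightly understates the value an exact t-based calculation would give, and the function name and defaults are assumptions made for illustration. The sample sizes are those reported for the four experiments.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_d(n, alpha=0.05, power=0.80):
    """Approximate smallest paired-design effect size (Cohen's d of the differences)
    detectable with the given power, using a normal approximation."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)

# Sample sizes of experiments 2, 3, 4 and 1, in ascending order.
for n in (10, 14, 16, 20):
    print(f"n = {n}: minimum detectable d is about {min_detectable_d(n):.2f}")
```

      Smaller samples can only detect larger effects reliably, which is the context readers would need to judge the reported effect sizes.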

      It seems that the statistical analysis approach differed slightly between studies. In experiment 1, the authors followed up significant effects of their ANOVAs by Bonferroni-adjusted post-hoc tests whereas it seems that in experiment 2, those post-hoc tests were "exploratory", which may suggest those were uncorrected. In experiment 3, the authors use one-tailed t-tests to follow up their ANOVAs. Given some of the reported p-values, these choices suggest that some of the comparisons might have failed to reach significance if properly corrected. This is not a critical issue per se, as the important test in all these cases is the initial ANOVA, but non-significant (corrected) post-hoc tests might be another indicator of an underpowered experiment. My assumptions here might be wrong, but even then, I would ask the authors to be more transparent about the reasons for their choices or provide additional justification. Finally, the authors sometimes report exact p-values whereas other times they simply say p < .05. I would ask them to be consistent and recommend using exact p-values for every result where p >= .001.

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but it is still worth pointing out to readers that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      There were no stimulation-related performance differences in the short-term memory task used in experiments 1 and 2. The authors argue that this demonstrates that the intervention specifically targeted long-term associative memory formation. While this is certainly possible, the STM task was a spatial memory task, whereas the LTM task relied (primarily) on verbal material. It is thus also possible that the stimulation effects were specific to a stimulus domain instead of memory type. In other words, could it be possible that the stimulation might have affected STM performance if the task taxed verbal STM instead? This is of course impossible to know without an additional experiment, but the authors could mention this possibility when discussing their findings regarding the lack of change in the STM task.

      While the authors discuss the potential neural mechanisms by which the combined stimulation conditions might have helped memory formation, the psychological processes are somewhat neglected. For example, do the authors think the stimulation primarily improves the encoding of new information or does it also improve consolidation processes? Interestingly, the beneficial effect of dual iTBS and γtACS on recall performance was very stable across all time points tested in experiments 1 and 2, as was the performance in the other conditions. Do the authors have any explanation as to why there seems to be no further forgetting of information over time in either condition when even at immediate recall, accuracy is below 50%? Further, participants started learning the associations of the FNAT immediately after the stimulation protocol was administered. What would happen if learning started with a delay? In other words, do the authors think there is an ideal time window post-stimulation in which memory formation is enhanced? If so, this might limit the usability of this procedure in real-life applications.

    5. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) It remains unclear how this stimulation protocol is proposed to enhance memory. Memories are believed to be stored by precise inputs to specific neurons and highly tuned changes in synaptic strengths. It remains unclear whether proposed neural activity generated by the stimulation reflects the activation of specific memories or generally increased activity across all classes of neurons.

      Thank you for raising the important issue of the actual neurophysiological effects of non-invasive brain stimulation. Unfortunately, invasive neurophysiological recordings in humans for this type of study are not feasible due to ethical constraints, while studies on cadavers or rodents would not fully resolve our question. Indeed, the authors of the cited study (Mihály Vöröslakos et al., Nature Communications, 2018) highlight the impossibility of drawing definitive conclusions about the exact voltage required in the in-vivo human brain due to significant differences between rats and humans, as well as between the in-vivo human brain and cadavers, owing to alterations in electrical conductivity that occur in postmortem tissue.

      We acknowledge that further exploration of this aspect would be highly valuable, and we agree that it is worth discussing both as a technical limitation and as a potential direction for future research; we will therefore modify the manuscript accordingly. However, to address the challenge of in vivo recordings, we conducted Experiments 3 and 4, which respectively examined the neurophysiological and connectivity changes induced by the stimulation in a non-invasive manner. The observed changes in brain oscillatory activity (increased gamma oscillatory activity), cortical excitability (enhanced posteromedial parietal cortex reactivity), and brain connectivity (strengthened connections between the precuneus and hippocampi) provided evidence of the effects of our non-invasive brain stimulation protocol, further supporting the behavioral data.

      Additionally, we carefully considered the issue of stimulation distribution and, in response, performed a biophysical modeling analysis and E-field calculation using the parameters employed in our study (see Supplementary Materials).

      (2) The claim that effects directly involve the precuneus lacks strong support. The measurements shown in Figure 3 appear to be weak (i.e., Figure 3A top and bottom look similar, and Figure 3C left and right look similar). The figure appears to show a more global brain pattern rather than effects that are limited to the precuneus. Related to this, it would perhaps be useful to show the different positions of the stimulation apparatus. This could perhaps show that the position of the stimulation matters and could perhaps illustrate a range of distances over which position of the stimulation matters.

      Thank you for your feedback. We will improve the clarity of the manuscript to better address this important aspect. Our assumption that the precuneus plays a key role in the observed effects is based on several factors:

      (1) The non-invasive stimulation protocol was applied to an individually identified precuneus for each participant. Given existing evidence on TMS propagation, we can reasonably assume that the precuneus was at least a mediator of the observed effects (Ridding & Rothwell, Nature Reviews Neuroscience 2007). For further details about target identification and TMS and tACS propagation, please refer to the MRI data acquisition section in the main text and Biophysical modeling and E-field calculation section in the supplementary materials.

      (2) To investigate the effects of the neuromodulation protocol on cortical responses, we conducted a whole-brain analysis using multiple paired t-tests comparing each data point between different experimental conditions. To minimize the type I error rate, data were permuted with the Monte Carlo approach and significant p-values were corrected with the false discovery rate method (see the Methods section for details). The results identified the posterior-medial parietal areas as the only regions showing significant differences across conditions.

      (3) To control for potential generalized effects, we included a control condition in which TMS-EEG recordings were performed over the left parietal cortex (adjacent to the precuneus). This condition did not yield any significant results, reinforcing the cortical specificity of the observed effects.
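
      For readers unfamiliar with the false discovery rate correction mentioned in point (2), the sketch below shows the widely used Benjamini-Hochberg step-up procedure. Whether the authors used this particular variant is an assumption; the Methods section is the authoritative description. The p-values in the example are made up purely for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests survive FDR control at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is below rank * q / m.
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    significant = [False] * m
    for idx in order[:cutoff]:
        significant[idx] = True
    return significant

# Hypothetical p-values from ten comparisons.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p))
```

      In this made-up example several p-values fall below .05 uncorrected, but only those that also satisfy the rank-dependent threshold are retained, which is how the procedure limits false positives across many data points.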

      However, as stated in the Discussion, we do not claim that precuneus activity alone accounts for the observed effects. As shown in Experiment 4, stimulation led to connectivity changes between the precuneus and hippocampus, a network widely recognized as a key contributor to long-term memory formation (Bliss & Collingridge, Nature 1993). These connectivity changes suggest that precuneus stimulation triggered a ripple effect extending beyond the stimulation site, engaging the broader precuneus-hippocampus network.

      Regarding Figure 3A, it represents the overall expression of oscillatory activity detected by TMS-EEG. Since each frequency band has a different optimal scaling, the figure reflects a graphical compromise. A more detailed representation of the significant results is provided in Figure 3B. The effect sizes for gamma oscillatory activity in the delta T1 and T2 conditions were 0.52 and 0.50, respectively, which correspond to a medium effect based on Cohen’s d interpretation.

      (3) Behavioral results showing an effect on memory would substantiate claims that the stimulation approach produces significant changes in brain activity. However, placebo effects can be extremely powerful and useful, and this should probably be mentioned. Also, in the behavioral results that are currently presented, there are several concerns:

      a) There does not appear to be a significant effect on the STMB task.

      b) The FNAT task is minimally described in the supplementary material. Experimental details that would help the reader understand what was done are not described. Experimental details are missing for: the size of the images, the duration of the image presentation, the degree of image repetition, how long the participants studied the images, whether the names and occupations were different, genders of the faces, and whether the same participant saw different faces across the different stimulation conditions. Regarding the latter point, if the same participant saw the same faces across the different stimulation conditions, then there could be memory effects across different conditions that would need to be included in the statistical analyses. If participants saw different faces across the different stimulus conditions, then it would be useful to show that the difficulty was the same across the different stimuli.

      Thank you for pointing out the gaps in the description of the FNAT task. We will add all the required information to the manuscript.

      In the meantime, we answer your questions here. The images measured 19 × 15 cm. In the learning phase and in immediate recall, each image was presented for 8 seconds; in delayed recall, images were shown (after the face recognition phase) until the subject answered. The learning phase, in which name and occupation were shown together with the faces, lasted around 2 minutes, including the instructions. We used a different set of stimuli for each stimulation condition, for a total of 3 parallel task forms balanced across conditions and session order. Each parallel form comprised 6 male and 6 female faces; for each sex there were 2 young adults (around 30 years old), 2 middle-aged adults (around 50 years old), and 2 older adults (around 70 years old). Before the experiments, we ran a pilot study to ensure there were no differences between the parallel forms of the task. We can provide the task and its parallel forms upon request. The chance level in immediate and delayed recall is not quantifiable, since participants had to freely recall the name and the occupation without multiple-choice options. In recognition, the chance level was around 33% (since there were 3 possible answers).

      c) Also, if I understand FNAT correctly, the task is based on just 12 presentations, and each point in Figure 2A represents a different participant. How the performance of individual participants changed across the conditions is unclear with the information provided. Lines joining performance measurements across conditions for each participant would be useful in this regard. Because there are only 12 faces, the results are quantized in multiples of 100/12 % in Figure 3A. While I do not doubt that the authors did their homework in terms of the statistical analyses, it seems as though these 12 measurements do not correspond to a large effect size. For example, in Figure 3A for the immediate condition (total), it seems that, on average, the participants may remember one more face/name/occupation.

      We will add another graph to the manuscript with lines connecting each participant's performance. Unfortunately, we were not able to incorporate it in the box-and-whisker plot.

      We apologize for the lack of clarity in the description of the FNAT. As you correctly pointed out, we used the percentage based on the single association between face, name and occupation (12 in total). However, each association consisted of three items, resulting in a total of 36 items to learn and associate – we will make it more explicit in the manuscript.

      In the example you mentioned, participants were, on average, able to recall three more items compared to the other conditions. While this difference may not seem striking at first glance, it is important to consider that we assessed memory performance after a single, three-minute stimulation session. Similar effects are typically observed only after multiple stimulation sessions (Koch et al., NeuroImage, 2018; Grover et al., Nature Neuroscience, 2022).

      d) Block effects. If I understand correctly, the experiments were conducted in blocks. This is potentially problematic. An example study that articulates potential problems associated with block designs is described in Li et al (TPAMI 2021, https://ieeexplore.ieee.org/document/9264220). It is unclear if potential problems associated with block designs were taken into consideration.

      Thank you for the interesting reference. According to this paper, in a block design, EEG or fMRI recordings are performed in response to different stimuli of a given class presented in succession. If this is the case, it does not correspond to our experimental design where both TMS-EEG and fMRI were conducted in a resting state on different days according to the different stimulation conditions.

      e) In the FNAT portion of the paper, some results are statistically significant, while others are not. The interpretation of this is unclear. In Figure 3A, it seems as though the authors claim that iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham. The interpretation of such a result is unclear. Results are also unclear when separated by name and occupation. There is only one condition that is statistically significant in Figure 3A in the name condition, and no significant results in the occupation condition. In short, the statistical analyses, and accompanying results that support the authors’ claims, should be explained more clearly.

      Thank you again for your feedback. We will work on making the large amount of data we reported easier to interpret.

      Hoping to have thoroughly addressed your initial concerns in our previous responses, we now move on to your observations regarding the behavioral results, assuming you were referring to Figure 2A. The main finding of this study is the improvement in long-term memory performance, specifically the ability to correctly recall the association between face, name, and occupation (total FNAT), which was significantly enhanced in both Experiments 1 and 2. However, we also aimed to explore the individual contributions of name and occupation separately to gain a deeper understanding of the results. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall. We understand that this may have caused some confusion. Therefore we will clarify this in the manuscript and consider presenting the name and occupation in a separate plot.

      Regarding the stimulation conditions, your concerns about the performance pattern (iTBS+gtACS > iTBS+sham-tACS, but iTBS+gtACS ~ sham+sham) are understandable. However, this new protocol was developed precisely in response to the variability observed in behavioral outcomes following non-invasive brain stimulation, particularly when used to modulate memory functions (Corp et al., 2020; Pabst et al., 2022). As discussed in the manuscript, it is intended as a boost to conventional non-invasive brain stimulation protocols, leveraging the mechanisms outlined in the Discussion section.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The study did not include a condition where γtACS was applied alone. This was likely because a previous work indicated that a single 3-minute γtACS did not produce significant effects, but this limits the ability to isolate the specific contribution of γtACS in the context of this target and memory function

      Thank you for your comments. As you pointed out, we did not include a condition where γtACS was applied alone. This decision was based on the findings of Guerra et al. (Brain Stimulation 2018), who investigated the same protocol and reported no aftereffects. Given the substantial burden of the experimental design on patients and our primary goal of demonstrating an enhancement of effects compared to the standalone iTBS protocol, we decided to leave out this condition. However, we agree that investigating the effects of γtACS alone is an interesting and relevant aspect worthy of further exploration. In line with these observations, we will expand the discussion on this point in the study’s limitations section.

      (2) The authors applied stimulation for 3 minutes, which seems to be based on prior tACS protocols. It would be helpful to present some rationale for both the duration and timing relative to the learning phase of the memory task. Would you expect additional stimulation prior to recall to benefit long-term associative memory?

      Thank you for your comment and for raising this interesting point. As you correctly noted, the protocol we used lasts three minutes, a choice based on previous studies demonstrating that, from a neurophysiological point of view, the combined protocol is more effective than either stimulation alone. Specifically, these studies have shown that the combined stimulation enhanced gamma-band oscillations and increased cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) are all associated with encoding processes, we decided to apply the co-stimulation immediately before the encoding phase to enhance its efficacy.

      Regarding the question of whether stimulation could also benefit recall, the answer is yes. We can speculate that repeating the stimulation before recall might provide an additional boost. This is supported by evidence showing that both the precuneus and gamma oscillations are involved in recall processes (Flanagin et al., Cerebral Cortex 2023; Griffiths et al., Trends in Neurosciences 2023). Furthermore, previous research suggests that reinstating the same brain state as during encoding can enhance recall performance (Javadi et al., The Journal of Neuroscience 2017).

      We will expand the study rationale and include these considerations in the future directions section.

      (3) How was the burst frequency of theta iTBS and gamma frequency of tACS chosen? Were these also personalized to subjects' endogenous theta and gamma oscillations? If not, were increases in gamma oscillations specific to patients' endogenous gamma oscillation frequencies or the tACS frequency?

      The stimulation protocol was chosen based on previous studies (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). The gamma tACS sinusoidal waveform was set at 70 Hz, while iTBS consisted of trains of ten bursts of three pulses at 50 Hz, each train lasting 2 s and repeated every 10 s (i.e., with an 8 s pause between consecutive trains), for a total of 600 pulses delivered over 190 s (see iTBS+γtACS neuromodulation protocol section). In particular, the theta iTBS was inspired by protocols used in animal models to elicit LTP in the hippocampus (Huang et al., Neuron 2005). Consequently, neither the theta iTBS nor the gamma frequency of tACS was personalized. The increase in gamma oscillations was measured relative to each patient’s baseline and did not correspond to the administered tACS frequency.
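
      As a quick consistency check on these iTBS parameters, the arithmetic can be sketched as follows. The sketch assumes that trains start every 10 s (a 2 s train plus an 8 s pause) and that all 600 pulses are delivered in complete trains; the function name `itbs_summary` is hypothetical.

```python
def itbs_summary(pulses_per_burst=3, bursts_per_train=10,
                 train_duration_s=2, pause_s=8, total_pulses=600):
    """Derive train count and timing from the iTBS parameters quoted above.

    Assumption: each train is followed by a pause, so consecutive trains
    start every (train_duration_s + pause_s) seconds.
    """
    pulses_per_train = pulses_per_burst * bursts_per_train
    n_trains = total_pulses // pulses_per_train
    onset_interval_s = train_duration_s + pause_s
    return {
        "pulses_per_train": pulses_per_train,
        "n_trains": n_trains,
        # Bursts per second within a train (the "theta" rhythm).
        "burst_rate_hz": bursts_per_train / train_duration_s,
        # Start time of the final train, measured from the first pulse.
        "last_train_onset_s": (n_trains - 1) * onset_interval_s,
        # End time of the final train, measured from the first pulse.
        "protocol_end_s": (n_trains - 1) * onset_interval_s + train_duration_s,
    }


summary = itbs_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

      Under these assumptions the protocol comprises 20 trains with bursts repeating at 5 Hz, and the final train begins 190 s after the first pulse and ends 2 s later, which agrees with the roughly 190 s quoted above.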

      (4) The authors do a thorough job of analyzing the increase in gamma oscillations in the precuneus through TMS-EEG; however, the authors may also analyze whether theta oscillations were also enhanced through this protocol due to the iTBS potentially targeting theta oscillations. This may also be more robust than gamma oscillations increases since gamma oscillations detected on the scalp are very low amplitude and susceptible to noise and may reflect activity from multiple overlapping sources, making precise localization difficult without advanced techniques.

      Thank you for the suggestion. We analyzed theta oscillations and found no changes.

      (5) Figure 4: Why are connectivity values pre-stimulation for the iTBS and sham tACS stimulation condition so much higher than the dual stimulation? We would expect baseline values to be more similar.

      We acknowledge that the pre-stimulation connectivity values for the iTBS and sham tACS conditions appear higher than those for the dual stimulation condition. However, as noted in our statistical analyses, there were no significant differences at baseline between conditions (p-FDR= 0.3514), suggesting that any apparent discrepancy is due to natural variability rather than systematic bias. One potential explanation for these differences is individual variability in baseline connectivity measures, which can fluctuate due to factors such as intrinsic neural dynamics, participant state, or measurement noise. Despite these variations, our statistical approach ensures that any observed post-stimulation effects are not confounded by pre-existing differences.

      (6) Figure 2: How are total association scores significantly different between stimulation conditions, but individual name and occupation associations are not? Further clarification of how the total FNAT score is calculated would be helpful.

      We apologize for any lack of clarity. The total FNAT score reflects the ability to correctly recall all the information associated with a person—specifically, the correct pairing of the face, name, and occupation. Participants received one point for each triplet they accurately recalled. The scores were then converted into percentages, as detailed in the Face-Name Associative Task Construction and Scoring section in the supplementary materials.
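
      The scoring rule described above can be sketched as follows. This is an illustration only: it assumes 12 faces per task form and one point per fully correct triplet, as stated in this response, and the function name `fnat_scores` and the input layout are hypothetical rather than taken from the supplementary materials.

```python
def fnat_scores(recalled, n_faces=12):
    """Compute FNAT percentages from per-face recall outcomes.

    `recalled` holds one (name_correct, occupation_correct) pair of booleans
    per face. Assumption: the total score awards one point only when both the
    name and the occupation are correctly associated with the face.
    """
    if len(recalled) != n_faces:
        raise ValueError("expected one entry per face")
    total_points = sum(1 for name_ok, job_ok in recalled if name_ok and job_ok)
    name_points = sum(1 for name_ok, _ in recalled if name_ok)
    job_points = sum(1 for _, job_ok in recalled if job_ok)
    return {
        "total": 100.0 * total_points / n_faces,
        "name": 100.0 * name_points / n_faces,
        "occupation": 100.0 * job_points / n_faces,
    }


# Hypothetical participant: 5 full triplets, 2 names only, 3 occupations only.
example = ([(True, True)] * 5 + [(True, False)] * 2
           + [(False, True)] * 3 + [(False, False)] * 2)
print(fnat_scores(example))
```

      With 12 faces, each additional correct triplet changes the total score by 100/12, or about 8.3 percentage points, which is why the plotted values are quantized.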

      Total FNAT was the primary outcome measure. However, we also analyzed name and occupation recall separately to better understand their individual contributions. Our analysis revealed that the improvement in total FNAT was primarily driven by an increase in name recall rather than occupation recall.

      We acknowledge that this distinction may have caused some confusion. To improve clarity, we will revise the manuscript accordingly and consider presenting name and occupation recall in separate plots.

      Reviewer #3 (Public review):

      Weaknesses:

      I want to state clearly that I think the strengths of this study far outweigh the concerns I have. I still list some points that I think should be clarified by the authors or taken into account by readers when interpreting the presented findings.

      I think one of the major weaknesses of this study is the overall low sample size in all of the experiments (between n = 10 and n = 20). This is, as I mentioned when discussing the strengths of the study, partly mitigated by the within-subject design and individualized stimulation parameters. The authors mention that they performed a power analysis but this analysis seemed to be based on electrophysiological readouts similar to those obtained in experiment 3. It is thus unclear whether the other experiments were sufficiently powered to reliably detect the behavioral effects of interest. That being said, the authors do report significant effects, so they were per definition powered to find those. However, the effect sizes reported for their main findings are all relatively large and it is known that significant findings from small samples may represent inflated effect sizes, which may hamper the generalizability of the current results. Ideally, the authors would replicate their main findings in a larger sample. Alternatively, I think running a sensitivity analysis to estimate the smallest effect the authors could have detected with a power of 80% could be very informative for readers to contextualize the findings. At the very least, however, I think it would be necessary to address this point as a potential limitation in the discussion of the paper.

      Thank you for the observation. As you mentioned, our power analysis was based on our previous study investigating the same neuromodulation protocol with a corresponding experimental design. The relatively small sample is a possible limitation of the study, which we will add to the discussion. A fundamental future step will be to replicate these results in a larger population. In the meantime, to strengthen our results, we performed the sensitivity analysis you suggested.

      In detail, we performed a sensitivity analysis for repeated-measures ANOVA with α = 0.05 and power (1−β) = 0.80, with no sphericity correction. For experiment 1, a sensitivity analysis with 1 group and 3 measurements showed a minimal detectable effect size of f = 0.524 with 20 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η² = 0.274, corresponding to f = 0.614; the ANOVA on FNAT delayed performance revealed an effect size of η² = 0.236, corresponding to f = 0.556. For experiment 2, a sensitivity analysis for total FNAT immediate performance (1 group and 3 measurements) showed a minimal detectable effect size of f = 0.797 with 10 participants. In our paper, the ANOVA on total FNAT immediate performance revealed an effect size of η² = 0.448, corresponding to f = 0.901. The sensitivity analysis for total FNAT delayed performance (1 group and 6 measurements) showed a minimal detectable effect size of f = 0.378 with 10 participants. In our paper, the ANOVA on total FNAT delayed performance revealed an effect size of η² = 0.484, corresponding to f = 0.968. Thus, the sensitivity analysis showed that both experiments were sufficiently powered: in each case, the observed effect size exceeded the minimal detectable effect size. We have now added this information to the manuscript, and we thank the reviewer for the suggestion.
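
      The conversions from η² to Cohen's f quoted above can be checked with a short sketch. It assumes the standard relation f = sqrt(η² / (1 − η²)); the function name and the dictionary labels are hypothetical, and the minimal detectable effect sizes are simply copied from the response rather than recomputed.

```python
import math


def eta_squared_to_cohens_f(eta_sq):
    """Convert an eta-squared effect size to Cohen's f.

    Assumption: the standard relation f = sqrt(eta_sq / (1 - eta_sq)).
    """
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


# (reported eta squared, minimal detectable f), as quoted in the response.
reported = {
    "Exp. 1, immediate": (0.274, 0.524),
    "Exp. 1, delayed": (0.236, 0.524),
    "Exp. 2, immediate": (0.448, 0.797),
    "Exp. 2, delayed": (0.484, 0.378),
}

for label, (eta_sq, min_f) in reported.items():
    f = eta_squared_to_cohens_f(eta_sq)
    # Flag whether the observed effect exceeds the detectable minimum.
    print(f"{label}: f = {f:.3f}, minimal detectable f = {min_f}, "
          f"exceeds minimum: {f > min_f}")
```

      The sketch only reproduces the effect-size conversion and the comparison; the minimal detectable values themselves depend on the power-analysis software and settings, which are not reproduced here.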

      It seems that the statistical analysis approach differed slightly between studies. In experiment 1, the authors followed up significant effects of their ANOVAs by Bonferroni-adjusted post-hoc tests whereas it seems that in experiment 2, those post-hoc tests were "exploratory", which may suggest those were uncorrected. In experiment 3, the authors use one-tailed t-tests to follow up their ANOVAs. Given some of the reported p-values, these choices suggest that some of the comparisons might have failed to reach significance if properly corrected. This is not a critical issue per se, as the important test in all these cases is the initial ANOVA but non-significant (corrected) post-hoc tests might be another indicator of an underpowered experiment. My assumptions here might be wrong, but even then, I would ask the authors to be more transparent about the reasons for their choices or provide additional justification. Finally, the authors sometimes report exact p-values whereas other times they simply say p < .05. I would ask them to be consistent and recommend using exact p-values for every result where p >= .001.

      Thank you again for the suggestions. Your observations are correct: we used slightly different statistical approaches depending on our hypotheses. Here are the details:

      In experiment 1, we used a repeated-measures ANOVA with one factor, “stimulation condition” (iTBS+γtACS; iTBS+sham-tACS; sham-iTBS+sham-tACS). Following the significant effect of this factor, we performed post-hoc analyses with Bonferroni correction.

      In experiment 2, we used a repeated-measures ANOVA with two factors, “stimulation condition” and “time”. As expected, we observed a significant effect of condition, confirming the result of experiment 1, but not of time. This means that the neuromodulatory effect was present regardless of the time point. However, to explore whether the effect of stimulation condition was present at each time point, we performed exploratory t-tests without correction for multiple comparisons, since this analysis was purely exploratory.

      In experiment 3, we used the same approach as in experiment 1. However, since we had a specific hypothesis about the direction of the effect already observed in our previous study, i.e., an increase in spectral power (Maiella et al., Scientific Reports 2022), our tests were one-tailed.

      For the p-values, we will correct the manuscript to report the exact values for every result.

      While the authors went to great lengths trying to probe the neural changes likely associated with the memory improvement after stimulation, it is impossible from their data to causally relate the findings from experiments 3 and 4 to the behavioral effects in experiments 1 and 2. This is acknowledged by the authors and there are good methodological reasons for why TMS-EEG and fMRI had to be collected in separate experiments, but it is still worth pointing out to readers that this limits inferences about how exactly dual iTBS and γtACS of the precuneus modulate learning and memory.

      Thank you for your comment. We fully agree with your observation, which is why this aspect has been considered in the study's limitations. To address your concern, we will further emphasize the fact that our findings do not allow precise inferences regarding the specific mechanisms by which dual iTBS and γtACS of the precuneus modulate learning and memory.

      There were no stimulation-related performance differences in the short-term memory task used in experiments 1 and 2. The authors argue that this demonstrates that the intervention specifically targeted long-term associative memory formation. While this is certainly possible, the STM task was a spatial memory task, whereas the LTM task relied (primarily) on verbal material. It is thus also possible that the stimulation effects were specific to a stimulus domain instead of memory type. In other words, could it be possible that the stimulation might have affected STM performance if the task taxed verbal STM instead? This is of course impossible to know without an additional experiment, but the authors could mention this possibility when discussing their findings regarding the lack of change in the STM task.

      Thank you for your insightful observation. We argue that the intervention primarily targeted long-term associative memory formation, as our findings demonstrated effects only on FNAT. However, as you correctly pointed out, we cannot exclude the possibility that the stimulation may also influence short-term verbal associative memory. We will acknowledge this potential effect when discussing the absence of significant findings in the STM task.

      While the authors discuss the potential neural mechanisms by which the combined stimulation conditions might have helped memory formation, the psychological processes are somewhat neglected. For example, do the authors think the stimulation primarily improves the encoding of new information or does it also improve consolidation processes? Interestingly, the beneficial effect of dual iTBS and γtACS on recall performance was very stable across all time points tested in experiments 1 and 2, as was the performance in the other conditions. Do the authors have any explanation as to why there seems to be no further forgetting of information over time in either condition when even at immediate recall, accuracy is below 50%? Further, participants started learning the associations of the FNAT immediately after the stimulation protocol was administered. What would happen if learning started with a delay? In other words, do the authors think there is an ideal time window post-stimulation in which memory formation is enhanced? If so, this might limit the usability of this procedure in real-life applications.

      Thank you for your comment and for raising these important points.

      We hypothesized that co-stimulation would enhance encoding processes. Previous studies have shown that co-stimulation can enhance gamma-band oscillations and increase cortical plasticity (Guerra et al., Brain Stimulation 2018; Maiella et al., Scientific Reports 2022). Given that the precuneus (Brodt et al., Science 2018; Schott et al., Human Brain Mapping 2018), gamma oscillations (Osipova et al., Journal of Neuroscience 2006; Deprés et al., Neurobiology of Aging 2017; Griffiths et al., Trends in Neurosciences 2023), and cortical plasticity (Brodt et al., Science 2018) have all been associated with encoding processes, we decided to apply co-stimulation before the encoding phase, to boost it.

      We applied the co-stimulation immediately before the learning phase to maximize its potential effects. While we observed a significant increase in gamma oscillatory activity lasting up to 20 minutes, we cannot determine whether the behavioral effects we observed would have been the same with a co-stimulation applied 20 minutes before learning. Based on existing literature, a reduction in the efficacy of co-stimulation over time could be expected (Huang et al., Neuron 2005; Thut et al., Brain Topography 2009). However, we hypothesize that multiple stimulation sessions might provide an additional boost, helping to sustain the effects over time (Thut et al., Brain Topography 2009; Koch et al., Neuroimage 2018; Koch et al., Brain 2022).

      Regarding the absence of further forgetting in both stimulation conditions, we think that the clinical and demographic characteristics of the sample (i.e., young and healthy subjects) explain the near absence of forgetting after one week.

    1. eLife Assessment

      This useful study employs optogenetics, genetically-encoded dopamine and serotonin sensors, and patch-clamp electrophysiology to investigate modulations of neurotransmitter release between striatal dopamine and serotonin neurons - a topic of interest to neuroscientists studying the basal ganglia. The results suggest that the dopamine and serotonin systems operate largely in parallel, with the activation of serotonin neurons resulting in a small, transient dopamine release. The authors suggest that this interaction occurs via glutamate release in the ventral tegmental area, findings that are closely related to previous work. Some conclusions are incomplete, requiring larger sample sizes and controls.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Liu et al use optogenetics and genetically encoded neuromodulator sensors to test the extent to which dopamine neuron stimulation produces striatal serotonin release, and vice versa. The study is timely given growing interest in dopamine/serotonin interactions and in the context of recent work showing bidirectional and dynamic regulation of striatal dopamine by another neuromodulator, acetylcholine. The authors find that striatal dopamine and serotonin afferents function largely independently, with dopamine neuron stimulation producing no striatal serotonin release and serotonin neuron stimulation producing minimal striatal dopamine release. This work will inform future work seeking to dissect the contributions of striatal dopamine, serotonin, and their interactions to various motivated behaviors. While the paper's main conclusions are adequately supported (see Strengths), additional controls and experiments would significantly broaden the paper's impact (see Weaknesses). Finally, this draft of the work is poorly presented with numerous errors, omissions, and inconsistencies evident throughout the text and the figures that should be addressed.

      Strengths:

      The study employs optogenetic stimulation simultaneously with fiber photometry recording of dopamine or serotonin release measured with genetically encoded sensors. These methods are state-of-the-art, offering tighter temporal control compared to pharmacological methods for manipulating dopamine and serotonin and improved selectivity over techniques like electrochemistry and microdialysis used to record neuromodulator release in previous studies on the subject. As a result, the paper's main conclusions are well supported.

      Weaknesses:

      (1) The electrophysiology experiments in Figure 3 are only tangentially related to the focus of the study, and their findings are almost entirely irrelevant to the paper's main conclusions. The results of these experiments are also not novel. Glutamate corelease from 5HT neurons has been previously shown, including in the OFC and VTA (Ren et al, 2018, Cell, McDevitt et al, 2014, Cell Rep, Liu et al 2014, Neuron; and others). The authors should explain more clearly what they think these data add to the manuscript and/or consider removing them altogether.

      (2) Related to the point above, as far as I can tell, the only value the electrophysiology data add is to suggest that perhaps activation of serotonin neurons may drive minimal striatal dopamine release via glutamate corelease in the VTA. The evidence provided in this version of the manuscript is insufficient to support that claim, but the manuscript would be significantly strengthened if the authors tested this hypothesis more directly. One way to do that could be to stimulate serotonin axons in the striatum (as opposed to the serotonin cell bodies) and record striatal dopamine release. A complementary anatomical approach would be to use retrograde tracing to test whether the DR 5HT neurons projecting to the striatum are the same or different from the VTA projecting population.

      (3) The findings would be strengthened by the addition of a fluorophore-only control group lacking opsin expression in all experiments in Figures 1 and 2.

      (4) The experiment of stimulating serotonin neurons and recording serotonin release in the NAc was not performed. It would be useful to be able to compare the magnitudes of evoked serotonin release in these two striatal regions, though it is not central to the main claims of the paper.

      (5) The interpretation of the results from Figure 2 is described inconsistently throughout the manuscript. The title implies there is significant crosstalk between the dopamine and serotonin systems. The abstract calls the crosstalk "transient", which is a description of its temporal dynamics, not its magnitude. Then the introduction, figures, and discussion all suggest the crosstalk is minimal. I suggest the authors describe the main findings - minimal crosstalk between the dopamine and serotonin systems - clearly and consistently in the title, abstract, and main text.

    3. Reviewer #2 (Public review):

      Summary:

      This brief communication aims to clarify interactions between the dopamine (DA) and serotonin (5HT) systems of mice. The authors use optogenetic stimulation of DA neurons in the VTA or of 5HT neurons in the DRN, while monitoring the fluorescence of DA and 5HT sensors in the nucleus accumbens (NAc) and dorsal striatum (DS) using fiber photometry. The authors report on a small release of DA in the NAc following DRN stimulation, which they attribute to glutamate co-release onto VTA DA neurons using slice electrophysiology. The authors also report on cocaine-induced 5-HT release in the striatum.

      Strengths:

      This is a topic well worth studying.

      Weaknesses:

      In its current form, this is an incomplete and underpowered study that does little to clarify the complicated relationship that exists between DA and 5HT in the mammalian brain under physiological conditions or during cocaine use.

    4. Reviewer #3 (Public review):

      The authors suggest that the small release of DA may be due to glutamate release from DRN 5-HT neurons onto the VTA, which weakly and transiently stimulates VTA DA neurons and, in turn, produces a small, transient release of DA in the NAc.

      Their findings add information on the previously reported complex and only partially understood crosstalk between 5-HT and DA in the NAc.

      I only have some minor concerns about the manuscript:

      (1) In Figure 2F, there is a missing curve for 5-HT in NAc. Besides, the legend shows n=2, making it difficult to perform statistical analysis with that data.

      (2) In Figure 3, the use of NBQX/AP5 is shown, but it is not mentioned either in the methodology or in the discussion. What is the meaning of those results?

      (3) Line 98 compares results from two different places of stimulation. The results are related to stimulation in the VTA, but the comparison indicates that the stimulation was made in the DRN.

      (4) If 5-HT release in the NAc does not occur, the abstract needs to state precisely that 5-HT is released in the dorsal striatum (DS) but not in the NAc (line 19).

      (5) Be consistent in how you refer to the 5-HT neurons. For example, lines 106 to 119 use "SERT neurons", whereas earlier text uses "5-HT neurons".

      (6) There are several points of confusion when referring to the figures, making the text difficult to follow because the text explains something that is not shown in the figure cited.

    5. Author response:

      We appreciate the reviewers’ insightful feedback and propose to undertake an extensive revision of the manuscript to strengthen our findings and underscore the significance of this work. We remain convinced that our study offers critical insights into the largely independent dopamine and serotonin neural circuits. Nevertheless, we concur that substantial revisions are warranted, as the current organization may not be ideal to showcase the central findings. In particular, we will increase the number of animals to address data variability and enhance the reproducibility of the observed effects. We also recognize the need to perform additional control experiments and to include complementary anatomical tracing studies. Moreover, we will reformat the manuscript and conduct additional analyses to emphasize that evoked dopamine and serotonin release originate from distinct loci with minimal crosstalk. To address all of these points thoroughly, we estimate that a 12-month revision period will be required.

    1. eLife Assessment

      This valuable paper introduces the Dyadic Interaction Platform, an experimental setup that enables researchers to study real-time social interactions between two participants in a controlled environment while maintaining direct face-to-face visibility. The evidence supporting the platform's effectiveness is convincing, with demonstrations of distinct experimental paradigms showing how transparency and continuous access to partners' actions can influence strategic coordination, decision-making, and learning. The work will be of broad interest to researchers studying social cognition across humans and non-human primates, providing a versatile tool that bridges the gap between naturalistic social interactions and controlled laboratory experiments.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors aim to address significant limitations of existing experimental paradigms used to study dyadic social interactions by introducing a novel experimental setup - the Dyadic Interaction Platform (DIP). The DIP uniquely allows participants to interact dynamically, face-to-face, with simultaneous access to both social cues and task-related stimuli. The authors demonstrate the versatility and utility of this platform across several exemplary scenarios, notably highlighting cases of significant behavioral differences in conditions involving direct visibility of a partner.

      Major strengths include comprehensive descriptions of previous paradigms, detailed explanations of the DIP's technical features, and clear illustrations of multimodal data integration. These elements greatly enhance the reproducibility of the methods and clarify the potential applications across various research domains and species. Particularly compelling is the authors' demonstration of behavioral impacts related to transparency in interactions, as evidenced by the macaque-human experiments using the Bach-or-Stravinsky game scenario.

      Strengths:

      The DIP represents a methodological advance in the study of social cognition. Its transparent, touch-sensitive display elegantly solves the problem of enabling participants to attend to both their social partner and task stimuli simultaneously without requiring attention switching. This paper marks a notable step forward toward more options for naturalistic yet still lab-based studies of social decision-making, an area where the field is actively moving, especially given recent research highlighting significant differences in neural activity depending upon the context in which an action is performed. The DIP offers researchers a valuable tool to bridge the gap between tightly controlled laboratory paradigms and the dynamic, bidirectional nature of real-world social interactions.

      The authors do well to provide comprehensive documentation of the technical specifications for the four different implementations of the platform, allowing other researchers to adapt and build upon their work. The detailed information about hardware configurations demonstrates careful attention to practical implementation details. They also highlight numerous options for integration with other tools and software, further demonstrating the versatility of this apparatus and the variety of research questions to which it could be applied.

      The historical review of dyadic experimental paradigms is thorough and effectively positions the DIP as addressing a critical gap in existing methodologies. The authors convincingly argue that studying continuous, dynamic social interactions is essential for understanding real-world social cognition, and that existing paradigms often force unnatural attention-splitting or turn-taking behaviors that don't reflect naturalistic interaction patterns.

      The four example applications showcase the DIP's versatility across diverse research questions. The Bach-or-Stravinsky economic game example is particularly compelling, demonstrating how continuous access to partners' actions substantially changes coordination strategies in non-human primates. This highlights a key strength of the DIP, which is that it removes a level of abstraction that can make tasks more difficult for non-human primates to learn. By being able to see their partner and actions directly, rather than having to understand that a cursor on a screen represents a partner, the platform makes the task more accessible to non-human primates and possibly children as well. This opens up important avenues for enhanced cross-species investigations of cognition, allowing researchers to study social dynamics in a setting that remains naturalistic yet controlled across different populations.

      Weaknesses:

      Some of the experimental applications would benefit from stronger evidence demonstrating the unique advantages of the transparent setup. For instance, in the dyadic foraging example, it's not entirely clear how participants' behavior differs from what might be observed when simply tracking each other's cursor movements in a non-transparent setup. More evidence showing how direct visibility of the partner, beyond simply being able to track the position of the partner's cursor, influences behavior would strengthen this example. Similarly, in the continuous perceptual report (CPR) task, the subjects could perform this task and see feedback from their partners' actions without having to see their partner through the transparent screen. Evidence showing that 1) subjects do indeed look at their partner during the task and 2) viewing their partner influences their performance on the task would significantly strengthen the claim that the ability to view the partner brings in a new dimension to this task. These additions would better demonstrate the specific value added by the transparent nature of the DIP beyond what could be achieved with standard cursor-tracking paradigms.

      A significant limitation that is inadequately addressed relates to neural investigations. While the authors position the platform's ability to merge attention to social stimuli and task stimuli as a key advantage, they don't sufficiently acknowledge the challenges this creates for dissociating neural signals attributed to social cues versus task-based stimuli. More traditional lab-based experiments intentionally separate components like task-stimulus perception, social perception, and decision-making periods so that researchers can isolate the neural signals associated with each process. This deliberate separation, which the authors frame as a weakness, actually serves an important functional purpose in neural investigations. The paper would be strengthened by explicitly discussing this limitation and offering potential approaches to address it in experimental design or data analysis. For instance, the authors could suggest methodological innovations or analytical techniques that might help disentangle the overlapping neural signals that would inevitably arise from the integrated presentation of social and task stimuli in the DIP setup.

      Furthermore, the authors' suggestion to arrange task stimuli around the periphery of the screen to maintain a clear middle area for viewing the partner appears to contradict their own critique of traditional paradigms. This recommended arrangement would seemingly reintroduce the very problem of attentional switching between task stimuli and social partners that the authors identified as a limitation of previous approaches. The paper would be strengthened by discussing the potential trade-offs associated with their suggested stimulus arrangement. Additionally, offering potential approaches to address these limitations in experimental design or data analysis would enhance the paper's contribution to the field.

    3. Reviewer #2 (Public review):

      Summary:

      This work proposes a new platform to study social cognition in a more naturalistic setting. The authors give an overview of previous work that extends from static unidirectional paradigms (i.e., the subject is presented with social stimuli such as still images or faces), to more dynamic unidirectional paradigms (i.e., the subject is presented with movies, or another individual's behavior) to dyadic interactions in a laboratory setting or in real life (i.e., interacting with a real person). Overall, this literature demonstrates that findings from realistic social situations can differ dramatically from unidirectional laboratory settings. The authors frame previous work along a spectrum, ranging from highly controlled, low-ecological-validity experiments to naturalistic, unconstrained approaches with high ecological validity, situating their current work within this continuum. They focus on a specific sub-domain of social interactions, i.e., goal-directed contexts in which interactions are purposeful for solving joint tasks or obtaining rewards. This new dyadic interaction platform claims to embed tight experimental control in a naturalistic face-to-face social interaction with the goal of investigating social information processing in bidirectional, dynamic social interactions.

      Strengths:

      The proposed dyadic interaction platform (DIP) is highly flexible, accommodating diverse visual displays, interactive components, and recording devices, making it suitable for various experiments.

      The manuscript does a good job of highlighting the strengths and weaknesses of the various display options. This clarity allows readers to easily assess which display best suits their specific experimental setup and objectives.

      One of the platform's key strengths is its versatility, allowing the same experimental setup to be used across multiple species and developmental stages, and enabling NHPs and humans to be studied as subjects within the same paradigm. Highlighting this capability could further underscore the platform's broad applicability.

      Weaknesses:

      The manuscript emphasizes the importance of ecological validity alongside tight experimental control, a significant challenge in naturalistic neuroscience. While the platform achieves tight control, the ecological validity of such a set-up remains questionable and warrants further testing and validation. For example, while the platform is designed to be more naturalistic in principle, its application to NHPs is still complex and may be as constrained as traditional NHP research. To realize its full potential for animal studies, the platform should be combined with complementary methodologies - such as wireless electrophysiology and freely moving paradigms - to truly achieve a balance between ecological validity and experimental control. Further validation in this direction could significantly enhance its utility.

      The manuscript is somewhat lengthy and occasionally reads more like a review paper, which slightly shifts the focus away from the primary emphasis on the innovative technological advancement and the considerable effort invested in optimizing this new platform. Streamlining the presentation to more directly highlight these key contributions could enhance clarity and impact.

      Overall, there is compelling evidence supporting the feasibility and value of DIP for investigating specific types of social interactions, particularly in contexts where individuals share a workspace and have full transparency regarding their opponent's actions. While I believe that DIP has the potential to significantly impact the field, which is supported by preliminary data, its broader applicability remains an open question. This platform aligns well with recent initiatives aimed at enhancing ecological validity in neuroscience research across both human and animal models. To maximize its impact, it would be beneficial to more explicitly situate this work within that broader movement, emphasizing its relevance and potential to advance ecologically valid approaches in the field.