  1. Oct 2022
    1. Author Response

      Reviewer #1 (Public Review):

      Overview

      In this work, the authors set out to study the effects of topographic connectivity in a hierarchical model of neural networks. They hypothesize that the topographic connectivity, often observed in cortical networks, is essential for signal propagation and allows faithful transmission of signals. To study the effects of topographic connectivity on the dynamics, the authors consider a network composed of several layers. Each layer is a recurrent neural network with excitatory and inhibitory sub-populations. The excitatory neurons in each layer innervate a sub-population of the following layer. The receiving excitatory sub-population targets a specific group in the next layer, and so on. This procedure leads to separate channels that carry the inputs through the network. The authors study how the degree of specificity in each targeted projection, called ’modularity’, affects signal propagation through the network.
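      To make the notion of modularity concrete, the following minimal Python sketch shows one way to wire topographic feedforward projections between two layers. It assumes (our illustration, not the authors' exact definition) that each layer is split into equally sized modules, that every presynaptic excitatory neuron makes a fixed number of feedforward connections, and that each connection lands in the matching module with probability m and otherwise in one of the other modules chosen uniformly. The names `modular_feedforward`, `within_module_fraction`, `k_out` and `seed` are hypothetical.

```python
import random


def modular_feedforward(n_per_module, n_modules, k_out, m, seed=0):
    """Return (pre, post) index pairs for projections from one layer to the next.

    Hypothetical wiring rule: each presynaptic neuron in module c makes k_out
    connections; each lands in module c with probability m and otherwise in
    one of the other n_modules - 1 modules, chosen uniformly. Repeated
    (pre, post) pairs are allowed for simplicity.
    """
    rng = random.Random(seed)
    edges = []
    for pre in range(n_per_module * n_modules):
        c = pre // n_per_module  # module of the presynaptic neuron
        others = [j for j in range(n_modules) if j != c]
        for _ in range(k_out):
            if not others or rng.random() < m:
                target = c  # stay within the matching channel
            else:
                target = rng.choice(others)  # spill over into another channel
            post = target * n_per_module + rng.randrange(n_per_module)
            edges.append((pre, post))
    return edges


def within_module_fraction(edges, n_per_module):
    """Fraction of edges whose pre- and postsynaptic neurons share a module."""
    same = sum(pre // n_per_module == post // n_per_module for pre, post in edges)
    return same / len(edges)


for m in (0.25, 0.83, 1.0):
    edges = modular_feedforward(n_per_module=100, n_modules=4, k_out=10, m=m)
    print(m, round(within_module_fraction(edges, 100), 3))
```

      Under this rule, m = 1/C (0.25 with four modules) reduces to unstructured random projections, because every module, including the matching one, then receives the same share 1/C of the connections, while m = 1 yields fully segregated channels.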

      The authors find that the network reduces noise above a critical level of network modularity: the deep layers show a clear separation of an active channel and inactive channels, despite the noisy input signal. They study how different dynamical and structural properties affect the signal propagation through the network layers and suggest that the dynamics can implement a winner-takes-all computation.

      We thank the reviewer for the concise summary of our work.

      Strengths and novelty

      Topographic projections, in which sub-populations of neurons target specific cells in efferent populations, are common in the central nervous system. The dynamical and computational benefits of this organization are not fully understood. With their simple model, the authors were able to quantify the amount of topographic structure and selectivity in the network and study its impact on the network’s steady state. In particular, a bifurcation point suggests a qualitative difference between networks with and without sufficient topographic modularity. The theoretical analysis in the paper is rigorous, and the mean-field study shows good agreement with computer simulations of the model.

      We thank the reviewer for acknowledging the rigor of our work both in terms of theory and simulations.

      The authors describe simulation results of networks with different dynamical properties, including rate-based networks, integrate-and-fire neurons, and more realistic conductance-based spiking neurons. All simulations exhibit similar qualitative behavior, supporting the conclusion that the behavior due to structural modularity will carry over to more complex and biologically relevant neural dynamics.

      Overall, the authors convincingly show that the topographic structure of the network can lead to noise reduction, given that the input to the network is provided as distinct channels.

      Weaknesses

      The authors support their hypothesis and show a relation between topographic connection and noise reduction in their model. However, I find the study limited and struggle to see the impact it will have on the field. The paper is purely theoretical; it does not provide any physiological evidence that supports the conclusion. On the other hand, and this is the key issue, I do not find real theoretical insights in this work. In the following, I elaborate on why I hold this opinion.

      We understand the reviewer’s point and therefore significantly extended our theoretical results and their conclusions in the revised manuscript (see below). We are confident that the revised manuscript provides the theoretical insights that the reviewer was asking for.

      The hypothesis is that topographic projections in cortical areas allow faithful signal propagation. However, as the authors point out, reliable transmission can be achieved in other ways, such as by direct routing of information (lines 17-19). Furthermore, denoising can be accomplished by a simple feedforward network (e.g., ref 38) without E/I balance and with plasticity rules that do not require topographic connectivity. Thus, I find the computational model not well motivated.

      The reviewer mentions an important point that has not been sufficiently addressed in the previous version, namely the distinguishing feature of our model. Direct routing is indeed a simple way to transmit signals, but without the possibility of denoising them. The reviewer is also right that the denoising solution in the work by Kadmon and Sompolinsky (ref 38) does not require any topographic connectivity. However, their model does not constrain feedforward connections between layers in any way. In particular, neurons can excite and inhibit other neurons (i.e., ignoring Dale’s law) in downstream layers so that feedforward input covers a much wider range, thereby extending the activity range of the target neurons and generating fixed points more easily. In the biologically more plausible setting that we study (excitatory and inhibitory populations, excitatory background input and excitatory feedforward connectivity), we find that recurrent inhibition is crucial to compensate for the excitation from previous layers and the external input. Only if the recurrent inhibition is sufficiently strong does the topographic organization of feedforward connections enable denoising. This is addressed in a new section ”Critical modularity for denoising” of the revised manuscript, where we also study the cases of no recurrent connectivity and of excitatory recurrent connectivity (for further details, see answers below). We further extended our discussion on other forms of signal transmission and denoising (see lines 489-498).

      The task studied here is a simple classification of static inputs: the efferent readout needs to identify the active channel. Again, this could be achieved by a single layer of simple binary neurons [Babadi and Sompolinsky 2014]. The recurrent connectivity and E/I balance suggest that dynamics should play an essential part in the model. However, the task is not well suited for understanding the role of dynamics.

      We appreciate the reviewer’s comments and completely agree. The simple classification task we explored can certainly be performed by simpler network architectures, such as the one studied in Babadi and Sompolinsky. However, as discussed above, this only works if the feedforward connectivity is unconstrained. In the case of Babadi and Sompolinsky, there is an expansion of inputs into a higher dimensional space through random connectivity drawn from a centered Gaussian distribution and appropriately chosen readout weights. This scenario is not compatible with the well-established biological constraints mentioned above that our model takes into account. In the new section ”Critical modularity for denoising” of the revised manuscript we show that recurrent inhibition is necessary to enable signal transmission and denoising under these constraints. The inhibition thereby not only generates competition between input channels but it also allows the modules to track their input very rapidly (as originally demonstrated by van Vreeswijk and Sompolinsky in 1996). To demonstrate this point and emphasize the relevance of dynamics, we added a new signal reconstruction task in the new section ”Reconstruction and denoising of dynamic inputs”, where we show that our model can faithfully track and denoise spatially encoded time-varying inputs.

      The authors perform a mean-field study to explain how modularity affects signal propagation. At the heart of their argument is that the E/I network exhibits bistability. However, bistability can be achieved by an excitatory population with a threshold [Renart et al., 2013]. The role of the inhibitory population does not seem crucial for the task, which calls into question the motivation for this analysis.

      We thank the reviewer for raising this important point, which we address in the section ”Critical modularity for denoising” of the revised manuscript. The reviewer is correct that bistability can be obtained in a purely excitatory network, and the modular topographic connectivity in our work essentially renders the stimulated pathway excitatory. The important feature of our model, however, is that the non-stimulated pathways remain inhibitory, which distinguishes stimulated from non-stimulated populations and enables denoising. This is only achieved by recurrent inhibition that causes competition between pathways. Our analyses show that networks without recurrent connections, or with only excitatory recurrent connections, lack mechanisms to compensate for the excitatory feedforward and external background input. In these cases, all populations show high (and synchronous) activity, and no classification or denoising can be achieved. Therefore, the revised manuscript unambiguously demonstrates the critical role of recurrent inhibition.
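      The argument about recurrent inhibition can be illustrated with a deliberately reduced toy model (our simplification, not the model in the manuscript): each layer has C channels, each described by a single rate passed through a saturating piecewise-linear activation; the feedforward weight w is split between channels according to the modularity m; and all channels in a layer share one inhibitory feedback signal h = g * mean(rate). The names `propagate`, `solve_layer`, `contrast`, `w`, `g` and `b` are hypothetical. In this sketch, with inhibition and full modularity a noisy input sharpens into a winner-takes-all pattern, whereas with g = 0 the excitatory drive saturates every channel and the distinction is lost.

```python
def f(x):
    """Saturating piecewise-linear activation: 0 below 0, 1 above 1."""
    return min(max(x, 0.0), 1.0)


def solve_layer(drive, g, tol=1e-12):
    """Self-consistent rates r_i = f(drive_i - h) with h = g * mean(r).

    h is a single inhibitory signal shared by all channels of the layer.
    phi(h) = h - g * mean(f(drive - h)) is increasing in h, with phi(0) <= 0
    and phi(g) >= 0, so bisection on [0, g] finds the unique solution.
    """
    def phi(h):
        return h - g * sum(f(d - h) for d in drive) / len(drive)

    lo, hi = 0.0, g
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            hi = mid
        else:
            lo = mid
    h = 0.5 * (lo + hi)
    return [f(d - h) for d in drive]


def propagate(r0, n_layers, m, w, g, b=0.0):
    """Push channel rates r0 (C >= 2 channels) through n_layers identical layers.

    Each channel receives a fraction m of the feedforward weight w from the
    matching upstream channel and (1 - m) / (C - 1) from every other channel,
    plus a common background input b.
    """
    C = len(r0)
    rates = [list(r0)]
    for _ in range(n_layers):
        r, total = rates[-1], sum(rates[-1])
        drive = [w * (m * r[i] + (1 - m) / (C - 1) * (total - r[i])) + b
                 for i in range(C)]
        rates.append(solve_layer(drive, g))
    return rates


def contrast(r):
    """Rate of channel 0 (the stimulated one) minus the mean of the others."""
    return r[0] - sum(r[1:]) / (len(r) - 1)


noisy_input = [0.6, 0.4, 0.4]  # channel 0 stimulated, the others carry noise
print(propagate(noisy_input, 3, m=1.0, w=2.0, g=1.5)[-1])  # → [1.0, 0.0, 0.0]
print(propagate(noisy_input, 3, m=1.0, w=2.0, g=0.0)[-1])  # → [1.0, 1.0, 1.0]
```

      Bisection is used rather than naive fixed-point iteration because the shared inhibition makes each layer's rates depend on their own mean; the monotonicity of phi(h) guarantees a unique self-consistent solution that bisection finds reliably.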

      Active and inactive channels are decided by the two stable states of the network: the high and the low activity regimes. However, noise fluctuations and their propagation through the network may have a prominent role in the overall dynamics. I find that noise fluctuation analysis is conspicuously missing in this work.

      Fig. 7b of the previous version showed the stability of theoretically predicted fixed points using numerical fluctuation analysis around the fixed points. We apologize for not having made this sufficiently clear, and have therefore updated the caption of Fig. 7 to emphasize this point and extended the subsection ”Fixed point analysis” of the Methods detailing our approach. Furthermore, we fully agree with the reviewer that fluctuation analyses are important to understand the dynamics of our system. Therefore, we performed a theoretical fluctuation analysis in the new Figure 8 and the extended Appendix B of the revised version. This extended theory shows that competition induced by recurrent inhibition stabilizes the low activity state of non-stimulated sub-populations such that fluctuations cannot build up and propagate across layers, in line with the previously presented numerical simulation results.

      The main finding is a critical level of modularity, m = 0.83, above which the network shows denoising properties of silencing inactive channels and increasing the mean activity of active ones. However, the critical modularity is numerically demonstrated and is not derived theoretically. For a theoretical insight into this transition between denoising and mixing properties of the network, I would have liked to see a more rigorous discussion on the critical value. What does the critical point depend on? The authors show that the single-neuron dynamics do not affect the critical value, but what about other structural elements such as the relative efficacies of the E/I and the feedforward connectivity matrices? Do the authors suggest that m = 0.83 is a universal number? I expect a more detailed analysis and discussion of this core issue in a theoretical paper.

      We fully agree with the reviewer and are grateful that this point was brought up. The initial submission did not provide a sufficiently deep discussion of which features determine the critical modularity, and it certainly is important to do so. We also apologize that our presentation was misleading and suggested a universal number for the critical modularity. Unfortunately, there is no closed-form expression for the critical modularity for the non-linear activation functions shown in the previous version. We therefore added a new analysis with a fully tractable piecewise linear activation function that allows us to derive a closed-form solution for the critical modularity. The new section ”Critical modularity for denoising” and Appendix B show the results of this analysis and discuss the various parameters that affect the value of the critical modularity. In short, the reviewer was completely right that the critical modularity depends on a number of connectivity parameters as well as single-neuron properties. In particular, our theoretical results show that recurrent inhibition is crucial for denoising.
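      For intuition about why a critical modularity exists and what it can depend on, a linearized calculation in a reduced toy is instructive. This is our own assumption-laden illustration, not the closed-form result of the revised manuscript. Assume C channels per layer with rates r_i^{(k)} in layer k, a feedforward gain w (synaptic weight times the slope of a piecewise-linear activation), modularity m, a common background input b, and an inhibitory term h^{(k+1)} shared by every channel of layer k+1. Let s denote the stimulated channel and u any non-stimulated one. In the linear range of the activation:

```latex
\begin{align}
r_i^{(k+1)} &= w\Big[m\, r_i^{(k)} + \frac{1-m}{C-1}\sum_{j\neq i} r_j^{(k)}\Big] + b - h^{(k+1)},\\
\Delta^{(k+1)} \equiv r_s^{(k+1)} - r_u^{(k+1)}
  &= w\Big(m - \frac{1-m}{C-1}\Big)\Delta^{(k)}
   = w\,\frac{Cm-1}{C-1}\,\Delta^{(k)},\\
\Delta^{(k)}\ \text{grows with depth} &\iff w\,\frac{Cm-1}{C-1} > 1
  \iff m > m^{*} = \frac{w + C - 1}{C\,w}.
\end{align}
```

      Because b and h^{(k+1)} are common to all channels, they cancel in the difference. In this toy, m* depends on the gain w and the number of channels C (for C = 3 and w = 2, m* = 2/3; for w ≤ 1, m* ≥ 1 and the contrast can never grow). The inhibition does not enter m* here; instead it controls the common-mode activity, keeping channels within the linear range. Without it, the shared excitatory drive pushes every channel into saturation, where the contrast gain drops to zero.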

      To conclude my main criticism, I believe that a theoretical paper should offer a more in-depth analysis and discussion of the core ideas presented and not rely mainly on simulations. For example, to provide theoretical insight, the authors should address central questions such as the origin of the critical modularity, the role of the balanced recurrent connectivity, and how the network can facilitate computations other than winner-takes-all among channels. Alternatively, if the authors aim to describe a neural dynamics model without deep theoretical insights, I would expect to see physiological evidence supporting the suggested dynamics.

      We are very grateful for the reviewer’s criticism and believe the manuscript has substantially improved as a consequence. We are confident that our revised manuscript, by addressing these issues and extending the theoretical insights, now provides a much more thorough and comprehensive understanding.

      Conclusions

      The model studied by the authors is novel and provides a valuable way of exploring the effects of modularity and topographic connectivity on signal propagation through hierarchical recurrent neural networks. However, the study lacks theoretical insights into cortical circuit functions in its current version. I believe that for this work to impact the field, it needs to show further analysis and not rely on a numerical study of the model with limited theoretical derivations.

      Reviewer #2 (Public Review):

      This manuscript puts forward a new idea that topography in neural networks helps to remove noise from inputs. The neural network consists of multiple stages. At each stage, the network is structured to be balanced in terms of the strength of inhibitory and excitatory signals. Because of topography, the networks become ”dis-balanced” and receive more recurrent excitatory signals locally for those regions that receive strong initial inputs. This leads to error correction. The main weakness in the manuscript is that the approach will only work for inputs that are constant-in-time. It is important to acknowledge this limitation in both the title and throughout the manuscript.

      We thank the reviewer for the concise summary of our work and for acknowledging its novelty. Given the importance of the issue raised by the reviewer regarding the nature of the input signals, in the revised manuscript we added a new section ”Reconstruction and denoising of dynamic inputs” in which we investigate more complex, time-varying inputs and demonstrate that the model, due to the balance between excitation and inhibition, is able to quickly follow, process and denoise the external inputs. There are of course limits to the signal frequencies that can be successfully denoised, which we discuss in the Supplementary Materials (see Figure 10 - supplement 1) and elaborate on in the Discussion, but these are roughly within the ranges found in human psychophysics studies.

    1. Author Response

      Reviewer #1 (Public Review):

      In the article "Neuroendocrinology of the lung revealed by single cell RNA sequencing", Kuo et al. described various aspects of pulmonary neuroendocrine cells (PNECs), including the scRNA-seq profile of one human lung carcinoid sample. Overall, although this manuscript does not have any specific storyline, it is informative and would be an asset for researchers exploring various new roles of PNECs.

      Thank you for appreciating the significance of the data presented. Our storyline focuses on the newly uncovered molecular diversity of PNECs and the extraordinary repertoire of peptidergic signals they express and cell types these signals can directly target in (and outside) the lung, in mice and human, and in health and disease (human carcinoid tumor).

      Major comments:

      The major concern about the work is that most results are preliminary and descriptive: conclusions or sub-conclusions are derived from scRNA-seq analysis only, without in-depth functional analysis or validation by other methods or in other systems. Many open-ended results are predicted by the authors from their scRNA-seq data analysis without functional validation. To turn these results into a constructive roadmap, it would be better to survey the literature and frame them as potential or probable hypotheses, citing the available literature. This should be done in each section of the Results. The paper lacks a main theme or specific biological question to address. In addition, the description of the human lung carcinoid by scRNA-seq is somewhat disconnected from the main storyline. Also, these results are derived from only a single patient, lacking statistical power.

      We agree that much of the data and analysis presented in the paper is descriptive and hypothesis-generating for PNECs, however we do not consider it preliminary. We focused on validating two key conclusions from the scRNA-seq analysis: PNECs are extraordinarily diverse molecularly (as validated by multiplex in situ hybridization and immunostaining) and they express many different combinations of peptidergic signals (and appear to package them in separate vesicles). From the lung expression profiles of the cognate receptors, we also predicted the direct lung targets of the dozens of new PNEC peptidergic signals we uncovered, and validated the cell target (PSN4, a recently identified subtype of pulmonary sensory neuron) of one of the newly identified PNEC signals (the classic hormone angiotensin) by confirming expression of the cognate receptor gene in PSN4 neurons that innervate PNECs and showing that the hormone can directly activate PSN4 neurons. The characterized human carcinoid provided evidence that during tumorigenesis, the amplified PNECs retain a memory (albeit imperfect) of the molecular subtype of PNEC from which they originated. As suggested by the Reviewer, we have provided more background in Results by adding additional citations from the literature to clarify the rationale for each analysis and what was known prior to the analysis. We feel that our paper provides a broad foundation for exploring the diversity and signaling functions of PNECs, and although each molecular type of PNEC and new PNEC peptidergic signal we uncovered and potential target cell in (and outside) the lung warrants follow up (as do the sensory and other properties of PNECs we inferred from their expression profiles), such studies will require the effort of many individuals in many labs studying both normal and disease physiology in mouse and human, and exploiting the data, hypotheses, approaches, and framework we provide.

      Reviewer #2 (Public Review):

      Pulmonary neuroendocrine cells (PNECs) are known to monitor oxygen levels in the airway and can serve as stem cells that repair the lung epithelium after injury. Due to their rarity, however, their functions are still poorly understood. To identify potential sensory functions of PNECs, the authors have used single-cell RNA-sequencing (scRNA-seq) to profile hundreds of mouse and human PNECs. They report that PNECs express over 40 distinct peptidergic genes, and over 150 distinct combinations of these genes can be detected. Receptors for these neuropeptides and peptide hormones are expressed in a wide range of lung cell types, suggesting that PNECs may have mechanical, thermal, acid, and oxygen sensory roles, among others. However, since some of these cognate receptors are not expressed in the lung, PNECs may also have systemic endocrine functions. Although these data are largely descriptive, the results represent a significant resource for understanding the potential roles of PNECs in normal biology as well as in pulmonary diseases and cancer and are likely to be relevant for understanding neuroendocrine cells in other tissue contexts.

      However, there are several aspects of the data analysis that are unclear and require clarification, most notably the definition of a neuroendocrine cell (points #1 and #2 below).

      1) Figure S1 shows the sorting strategy used for isolation of putative PNECs from Ascl1CreER/+; Rosa26ZsGreen/+ mice, and distinguishes neuroendocrine cells defined as ZsGreen+ EpCAM+ and "neural" cells defined as ZsGreen+ EpCAM-; the figure legend also refers to the ZsGreen+ EpCAM- cells as "control" cells. However, the table shown in panel D indicates that the NE population combines 112 ZsGreen+ EpCAM+ cells together with 64 ZsGreen+ EpCAM- cells to generate the 176 cells used for subsequent analyses. Why are these ZsGreen+ EpCAM- cells initially labeled as neural or control, but are then defined as neuroendocrine? If these do not express an epithelial marker, can they be rigorously considered as neuroendocrine?

      As explained above in the response to Essential Revision point 1, we define pulmonary neuroendocrine cells (PNECs) throughout the paper by their transcriptomic clustering and signatures, which include the dozens of newly identified PNEC markers as well as the few extant marker genes available before this study (listed in Table S2). The confusion here arises from the two previously known markers (Ascl1 lineage marker ZsGreen, EpCAM) we used for flow sorting to enrich for these rare cells for transcriptomic profiling (Fig. S1). Although most of the cells with PNEC transcriptomic profiles were from the ZsGreen^hi EpCAM^hi sorted population (as expected), some were from the ZsGreen^hi EpCAM^lo sorted population. The latter resulted from the high EpCAM gating threshold we used during flow sorting, which excluded some PNECs with intermediate levels of surface EpCAM. Indeed, nearly all PNECs (> 95%) expressed EpCAM by scRNA-seq, and there was no difference in EpCAM transcript levels or transcriptomic clustering of PNECs that were from the ZsGreen^hi EpCAM^hi vs. ZsGreen^hi EpCAM^lo sorted populations, as we now show in the new panels (C', C'') added to Fig S1C. This point is now clarified in the legend to Fig. S1C, and it nicely demonstrates that transcriptomic profiling is a more robust method of identifying PNECs than flow sorting based on two classical markers.

      2) Similarly, in the human scRNA-seq analysis, how were PNECs defined? The methods description states that these cells were identified by their expression of CALCA and ASCL1, but does not indicate whether they also expressed epithelial markers.

      Human PNECs were identified in the single cell transcriptomic analysis by the same strategy described above for mouse PNECs: by their transcriptomic clustering and signatures, which include the dozens of newly identified PNEC markers as well as the few extant marker genes available before this study (listed in Table S2). In addition to expression of classic and new markers, the human PNEC cluster defined by scRNA-seq indeed showed the expected expression of epithelial markers (e.g., EPCAM; see dot plot below), like other epithelial cells.

      3) The presentation of sensitivity and specificity in Figure 1 is confusing and potentially misleading. According to Figure 1B, Pcsk1 and Nov are two of the top-ranked differentially expressed genes in PNECs with respect to both sensitivity and specificity. However, the specificity of these two genes appears to be lower than that of Scg5, Chgb, and several other genes, as suggested in Figure 1C and Figure S1E. In contrast, Chgb appears to have higher specificity and sensitivity than Pcsk1 in Figures 1C and E but is not shown in the list of markers in Figure 1B.

      As explained above in the response to Essential Revision point 2, because different marker features are important for different applications, we have provided several different graphical formats (Figs. 1B,C, Fig. S1E) and a table (Table S1) to aid in selection of the optimal markers for each application. Fig. 1B shows the most sensitive and specific PNEC markers identified by the ratio of the natural logs of the average expression of the marker in PNECs vs. non-PNEC epithelial cells (Table S1), and we have added a two-dimensional plot of this sensitivity and specificity for a large set of PNEC markers (new panel E of Fig. S1). The violin plots in Fig. 1C allow visual comparison of expression of selected markers across PNECs and 40 other lung cell types including non-epithelial cells (from our extensive mouse lung atlas in Travaglini, Nabhan et al, Nature 2020). Pcsk1 and Nov score high in the analysis of Fig. 1B because they are highly sensitive and specific markers within the pulmonary epithelium, and they are also valuable markers because they are highly expressed in PNECs. However, they appear slightly less specific in the violin plots of Fig. 1C (Pcsk1) and Fig. S1F (Nov) because of expression (though at much lower levels) in individual lung cell types outside the epithelium: Pcsk1 is also expressed at low levels in some Alox5+ lymphocytes, and Nov is expressed at low levels in some smooth muscle cells. Chgb is a new PNEC marker that did not make the cutoff for the list in Fig. 1B because it is expressed in a slightly higher percentage of non-PNEC epithelial cells than the markers shown, which ranked slightly above it by this metric (see Table S1).
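      The point that a marker's apparent specificity depends on which cells form the comparison set can be shown with a small Python sketch. It uses detection-based sensitivity (fraction of PNECs with a nonzero count) and specificity (fraction of comparison cells with a zero count), which is one common convention and not necessarily the exact score behind Fig. 1B or Table S1. The function name `detection_stats` and the toy counts are hypothetical.

```python
def detection_stats(counts, labels, target, background):
    """Sensitivity and specificity of one marker gene from per-cell counts.

    sensitivity = fraction of `target` cells with a nonzero count
    specificity = fraction of `background` cells with a zero count
    """
    tgt = [n for n, lab in zip(counts, labels) if lab == target]
    bg = [n for n, lab in zip(counts, labels) if lab in background]
    sensitivity = sum(n > 0 for n in tgt) / len(tgt)
    specificity = sum(n == 0 for n in bg) / len(bg)
    return sensitivity, specificity


# Hypothetical toy data: a marker that is high in PNECs, absent from another
# epithelial type, and detected at low levels in some immune cells.
labels = ["PNEC"] * 4 + ["AT2"] * 4 + ["lymphocyte"] * 4
counts = [12, 9, 15, 11,  # PNEC
          0, 0, 0, 0,     # AT2 (epithelial)
          1, 0, 2, 0]     # lymphocyte (non-epithelial)

epithelial = {"AT2"}
all_other = {"AT2", "lymphocyte"}
print(detection_stats(counts, labels, "PNEC", epithelial))  # → (1.0, 1.0)
print(detection_stats(counts, labels, "PNEC", all_other))   # → (1.0, 0.75)
```

      The same toy gene is perfectly specific against the epithelial background but not against all lung cells, which mirrors the explanation given above for Pcsk1 and Nov.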

      4) The expression of serotonin biosynthetic genes in mouse versus human PNECs deserves some comment. The authors fail to detect the expression of Tph1 and Tph2 in any of the mouse PNECs analyzed, but TPH1 is expressed in 76% of the human PNECs (Table S8). Is it possible that Tph1 and Tph2 are not detected in the mouse scRNA-seq data due to gene drop-out? If serotonin signaling by mouse PNECs is due to serotonin reuptake, as implied on p. 5, is there a discrepancy between serotonin expression as detected by smFISH versus immunostaining?

      It is always possible that the failure to detect expression of Tph1 and Tph2 in the mouse scRNA-seq dataset is due to technical dropout; however, when we analyzed this in our other mouse PNEC scRNA-seq dataset obtained using a microfluidic platform and also deeply sequenced (Ouadah et al, Cell 2019), we found values similar to those in the previously analyzed dataset: no Tph2 expression was detected and only 3% (3 of 92) of PNECs had detected Tph1 expression, whereas 24% (22 of 92) had detected expression of the serotonin re-uptake transporter Slc6a4. Because our mouse and human scRNA-seq datasets were prepared similarly and sequenced to a similar depth (10^5 to 10^6 reads/cell), the difference observed in Tph1/TPH1 expression between mouse (0-3% PNECs) and human (76% PNECs) is more likely a true biological difference. We also analyzed serotonin levels in mouse PNECs by immunohistochemistry (not shown) and detected serotonin in nearly all (~90%) embryonic PNECs but only ~10% of adult PNECs. Systematic follow-up studies will be necessary to resolve the mechanism of serotonin biogenesis and uptake in PNECs, and the potential stage- and species-specific differences in these processes suggested by these initial data.

      5) The smFISH and immunostaining analyses are often presented without any indication of the number of independent replicate samples analyzed (e.g., Figure 2B, Figure 3F, G).

      The numbers of samples analyzed have been added (the values for Fig. 2B are given in the legend to Fig. 2C, the quantification of Fig. 2B).

      6) It would be helpful to provide a statistical analysis of the similarities and differences shown in the graphs in Figures 1E and G.

      We added a statistical analysis (Fisher's exact test, two-sided) of Fig. 1E comparing expression of each examined gene in the two scRNA-seq datasets (Table S4). We added a similar statistical analysis of Fig. 1G comparing the expression values of each examined gene by scRNA-seq vs smFISH (see Fig. 1G legend).
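      For readers who want to reproduce this kind of comparison, a two-sided Fisher's exact test on a 2×2 table can be written with the Python standard library alone. The table layout assumed here (cells with vs. without detected expression of a gene, in dataset A vs. dataset B) is our illustration, since the response does not specify how the tables were built; in practice `scipy.stats.fisher_exact` provides the same two-sided test. The function name `fisher_exact_two_sided` is hypothetical. The p-value follows the usual convention of summing the probabilities of all tables with the observed margins that are no more likely than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Conditions on the row and column sums; under the null hypothesis the
    top-left cell follows a hypergeometric distribution. Integer weights
    keep the "no more likely than observed" comparison exact.
    """
    r1, c1, n = a + b, a + c, a + b + c + d
    c2 = n - c1
    lo, hi = max(0, r1 - c2), min(r1, c1)  # feasible values of the top-left cell

    def weight(x):
        # proportional to P(top-left cell == x); the common denominator is comb(n, r1)
        return comb(c1, x) * comb(c2, r1 - x)

    observed = weight(a)
    tail = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return tail / comb(n, r1)


print(fisher_exact_two_sided(6, 2, 1, 4))  # → 0.10256410256410256 (= 132/1287)
```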

    1. Author Response

      Reviewer #2 (Public Review):

      SIGNIFICANCE: Movement is based on the coordinated activation and deactivation of muscle groups that depend on the timing and strength of firing of the motoneurons connected to them. Motoneuron recruitment ultimately depends on the activity of local interneurons. Unlike those in other CNS regions, the interneurons in the spinal cord controlling motor output display a very high diversity in genetics, anatomy, localization, and electrophysiological properties. Making sense of the interneuronal circuits that modulate motor output to the different muscles of the body has proven to be quite complex. One technique proposed over 10 years ago is the use of retrograde transsynaptic-monosynaptic tracing with modified rabies virus injected in single muscles to define premotor connections to individual motor pools controlling single muscles. Using this technique, the original authors suggested that interneurons controlling flexors and extensors occupied different locations in the spinal cord. This idea was an extension of pioneering work from the Jessell lab at Columbia University demonstrating that positional identity determined input connectivity of motoneurons, at least from Ia afferents. This principle, if extended to premotor spinal interneurons, would simplify mechanisms by which extensor and flexor interneuron networks could be connected and controlled. In this paper, the authors combine data from four independent groups to show this principle might not be correct. In other words, interneurons connected to individual motor pools are highly intermingled in the spinal cord. This raises the bar for understanding both the intrinsic organization principles of interneuron microcircuits in the spinal cord (if any) and how they develop their specific connectivity.

      STRENGTHS AND WEAKNESSES: The authors propose that the conflicting conclusions occur because of technical differences. The technique is based on complementation of rabies virus glycoprotein (G) in specific targeted motoneurons infected with a glycoprotein-deficient rabies virus (RVdG). The way G and RVdG are delivered to specific motoneurons controlling one muscle differs between studies. Originally this was accomplished by co-injecting RVdG and an AAV-G vector in the same muscle simultaneously. However, as previously published by a different group and now confirmed by the authors, this approach also infects muscle sensory afferents capable of transsynaptically labeling populations of interneurons in the spinal cord anterogradely. This results in the labeling of mixed interneuron populations through their output to specific motor pools and/or their input from primary afferents of the same muscle. To avoid this problem the authors used transgenic approaches to induce expression of G in all motoneurons (not sensory neurons) and obtain muscle specificity by injecting RVdG in single muscles. Unfortunately, there is no single gene that selects only motoneurons for transgenic expression, and tools for intersectional approaches were not available. Therefore, G is unavoidably expressed in some interneurons, in addition to motoneurons. These interneurons could be an additional source of transsynaptic jumps if they receive the RVdG from the motoneurons, raising the possibility that some labeling is the result of disynaptic, not monosynaptic, connections. The authors try to control for this possibility by comparing two different Cre lines to direct G expression in motoneurons, each with different types of additional interneurons targeted. The results in both lines are similar, raising confidence in the main conclusions. Moreover, the authors indicate that some motoneurons outside the intended pools are also labeled because of motoneuron-to-motoneuron connections. In other words, the starter neurons for tracing monosynaptic connections are not as homogeneous or specific to a single motor pool as desired. This is acknowledged as a current limitation and is addressed in the discussion by proposing possible alternative approaches. Despite this weakness, the main conclusion of the study remains strong.

      A second technical issue raised by the authors is possible leakage during injection into the muscle. To reduce this possibility the authors reduced the injected volume from the 5 microliters used in previous studies to 1 microliter and checked the injection site post hoc for possible leakage (these are neonatal pups with muscle volumes of 2-3 microliters or less). In addition, they make a nice comparison by injecting different titers of RVdG to demonstrate that variations of one or two orders of magnitude in the number of infected motoneurons do not alter the main conclusion on the topographic positioning of the interneurons connected to different motor pools. One weakness is that the exact number of motoneurons that start the tracing is impossible to evaluate, and this prevents accurate comparisons across experiments. This is because cell death induced by the rabies virus is to be expected and only a variable subset of surviving neurons can be identified. Currently, this is an unavoidable characteristic of the technique. Nevertheless, there is a nice correlation between titer, surviving motoneuron numbers, and the number and location of labeled interneurons. The large number of replicates and their consistency further raise confidence in the authors' claim of high specificity and replicability during injection, despite variable numbers of recovered motoneurons. The authors conclude that it is very important to check the number and localization of starter motoneurons to confirm specificity after the injections. This reviewer totally agrees and is surprised this was not done in the experiment in which they try to replicate previous experiments by co-injecting AAV-G and RVdG into the muscle.

      We agree with the reviewer that ideally the starter cells should have been identified in all the experiments. However, data were collected independently, at very different times in each of the labs involved, with different initial aims, and there was no prior agreement on the details of injection and post-processing. The realization that we had similar experiments, performed with different techniques, led us to pool our observations in order to give a picture of the distribution of premotor interneurons, the leitmotif of this paper, and a great effort has been devoted to ensuring that all the cell counting procedures were uniform across labs. The lack of initial coordination is the reason why in some datasets the starter cells have not been quantified. Furthermore, in the previous version we had wrongly indicated that motor neurons analysed at Glasgow University were identified by ChAT expression. We have corrected this in the current version, since for those experiments motor neurons were only identified by location within any of the motor nuclei and by size (diameter greater than 30 µm). On the other hand, since we started comparing results, we have agreed on a uniform way of analysing and representing the data using the same normalization criteria. Therefore, while we cannot compare quantities like the ratio of secondary to primary infected cells for all the experiments (but we do so for the subset in which this is possible, see new Figure 4-Figure supplement 3 and comment number 3 below), the positional analysis has been done following exactly the same criteria.

      One final problem with interpretation is that, for yet unknown reasons, the technique is dependent on the age of the animal and cannot be implemented in mature animals. Therefore, the connectivity revealed here is the one present during the first few days of life in the mouse. This is a period of significant synaptogenesis and synaptic selection and de-selection. The authors are encouraged to discuss further this limitation when interpreting interneuron connectivity in adult from studies in neonates.

      A very important point, see detailed answer to comment number 10 below.

      In summary, the authors have introduced new technical variations to trace premotor interneurons and challenge a major idea in the field, namely that interneurons connected to flexors and extensors occupy different positions in the spinal cord. The technique still has some weaknesses: 1) the possibility of disynaptic jumps, 2) the accurate identification of starter neurons, and 3) the restriction to neonates. However, the authors strengthen their conclusions by considering alternatives and introducing a large number of controls (two cre lines, different titers, a large number of animals analyzed, large numbers and consistency of replicates, independent counting in different labs, etc.). This is an important and very useful study that suggests topographic localization is not a major organizing principle for interneuron connections with motor pools. It therefore remains to be investigated which organizational mechanisms couple interneurons to functionally distinct motor pools.

      The weaknesses summarized in the paragraph above are addressed in detail below in the answers to the specific comments.

      Reviewer #3 (Public Review):

      The manuscript by Ronzano et al. presents a rigorous neuroanatomical study that convincingly demonstrates that there is no difference in the medio-lateral organization of flexor and extensor premotor interneurons. The study uses monosynaptically restricted transsynaptic tracing from ankle flexor and extensor muscles with four strategies for delivery of the G protein complement and the delta G rabies virus, plus two additional variations that consider titer and cre line. The authors went to great lengths here in an attempt to replicate prior studies with which their initial findings conflicted. Further, the experiments are performed in laboratories in four different locations. The variations on rabies and complement delivery, regardless of the lab performing the experiment and analysis, all converge on the same conclusion. Aside from the primary conclusion, the paper can be used as a manual for anyone considering transsynaptic tracing, as it details the benefits and caveats of each strategy with examples.

      The initial conflicting results put the onus on the authors to demonstrate where the divergence occurred. The authors took a highly comprehensive approach, which is a clear strength of the paper. All of the data is fully and transparently presented. Standardizations and differences between experiments run or analyzed in each lab are well laid out. Figure 1 and Table 2 provide a great summary of the techniques and their limitations. These are also well thought out and discussed within each section of results.

      The only thing missing is a likely explanation for the differences seen. Although the authors made several attempts to provide such an explanation, the question remains: how did two groups who published independent studies using different strategies demonstrate flexor and extensor separation in the dorsal horn, when this study, using several strategies in multiple labs, shows that the premotor neurons overlap completely? Additional small differences in methodologies could be identified which are not discussed and may provide potential explanations, but only for discrepancies in the results of single techniques, not for all of the strategies used. The lack of a reason for the discrepancy with prior studies despite the extensive efforts is unsatisfying, but, most importantly, the experiments were rigorously performed and the data support the conclusions presented.

      We thank the reviewer for the positive comments, and we share the opinion that the discrepancy is unsatisfying. While we propose possible explanations, despite the extensive efforts we could not provide a definite answer, but we hope that making our work public and all the data available will trigger even more efforts from the rest of the community.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper. In this manuscript Hendi et al. examined how two independent mechanisms, Wnt signalling and gap junctions, control two critical aspects of neuronal tiling. They have quite elegantly used two neighboring GABAergic motor neurons to show that, while one specific C. elegans Wnt homolog, EGL-20, regulates axonal tiling, innexin UNC-9-mediated gap junctions at a very specific position on these axons regulate chemical synapse tiling on these axons. They also performed multiple experiments to show that the UNC-9 gap junctions control chemical synapse tiling independently of their channel activity.

      Overall, the paper is interesting and would be of general interest for many neuroscience researchers, specifically to those who are studying neuronal tiling and the role of gap junctions. However, there are some concerns with this study.

      Major concerns:

      1) The authors only looked at the tiling of axons and presynaptic clusters in DD5/DD6 axons. However, these neurites are transformed in L1 from dendrite to axon, and the nature of the synaptic termini subsequently changes from postsynaptic to presynaptic. To claim that egl-20/UNC-9 specifically control axonal tiling and GABAergic presynaptic tiling, the authors must check dendritic tiling and the tiling of postsynaptic termini. Specifically, a) do UNC-9 channels also affect postsynaptic patterning in L1? b) when do unc-9 puncta form? Are they present at the L1 stage, or do they appear at the L2 stage only after the fate switch from dendrite to axon? c) does egl-20 also control dendritic tiling in L1?

      We thank the reviewer for their insightful comments. As described in our original manuscript, we could not check the dendritic tiling between DD5 and DD6 at L4 stage due to the inconsistent labeling of DD6 dendrite with our fluorescent marker. As an alternative method, we measured the length of the (ventral) posterior dendrite of DD5 and showed that it is significantly longer in the egl-20(n585) mutant than in wild type at L4 stage. We also measured the length of postsynaptic domains in the DD5 posterior dendrite and showed that it was also longer in the egl-20(n585) mutant than wild type. Furthermore, we show that the UNC-9 localization at the tip of DD6 dendrite is unaffected in the egl-20(n585) mutant, despite the extension of postsynaptic domains. From these observations, we suggested that postsynaptic spines are distributed throughout the dendrite of DD5 in the egl-20(n585) mutant, and it is not regulated by unc-9.

      In the revised manuscript, we included images of wild type and egl-20(n585) animal in which ACR-12::GFP is co-labeled with mCherry::CAAX. In these strains, the expression of mCherry::CAAX and ACR-12::GFP is not detectable in DD6 in most animals. Using these strains, we confirmed that the DD5 postsynaptic sites are present throughout the dendrite of DD5 in both wild type and egl-20(n585) mutant backgrounds (Figure 1- figure supplement 1).

      a) Unfortunately we were not able to quantify postsynaptic patterning at L1 due to the low expression of ACR-12::GFP and mCherry::CAAX at L1 stage.

      b) UNC-9::7×GFP puncta are present at the tiling border of DD neurons on both ventral and dorsal sides throughout development. In the original manuscript, we only showed UNC-9 localization on the dorsal side. We believe our limited description of UNC-9 in the dendrites has caused confusion regarding the phenotypes of the DD5 posterior dendrite and postsynaptic sites. In the revised manuscript, we have updated the images of UNC-9::7×GFP to show that the puncta are present in both axons and dendrites (Figures 2F-H).

      In the revised manuscript we also show that UNC-9 puncta are present at DD tiling border in L1 animals. We have included images of UNC-9::7×GFP at L1 at the axonal and dendritic tiling borders of DD5 and DD6 in both wild type and egl-20(n585) animals in Figure 2- figure supplement 5.

      c) As described above, we could not quantify dendritic tiling at L1 due to the low expression of our fluorescent markers at the L1 stage.

      2) The authors have shown that the previously known regulators of gap junction formation, NLR-1 and ZOO-1, do not regulate UNC-9 gap junction puncta on DD5/DD6 axons. Since they are a cell adhesion molecule and a tight junction component, respectively, presynaptic tiling should be checked in these mutants as well. Also, it is not clear whether these proteins are expressed in DD5/DD6 neurons at all. Since NLR-1 has previously been shown to regulate unc-9 puncta in the nerve ring, expression of these genes in DD5/DD6 neurons should be checked before drawing these conclusions.

      In the revised manuscript, we have included the presynaptic tiling quantification in zoo-1(tm4133); egl-20(n585) and nlr-1(miz202) egl-20(n585) mutants which showed no significant presynaptic tiling defects (Figure 2- figure supplement 1). We also cited a paper (Taylor et al., 2021) that described the expression of zoo-1 and nlr-1 in the DD neurons.

      3) The authors assumed the relevant gap junction to be an UNC-9 homotypic homomeric channel, but DD neurons also express several other innexins (inx-1, inx-2, inx-10, inx-14 and unc-7). This raises the possibility that the unc-9 channel could be heteromeric. The effect on synaptic tiling of the other expressed innexins, apart from unc-7, should also be tested.

      We thank the reviewer for their comment. As per their advice, we tested four additional innexins (inx-1, inx-2, inx-10, and inx-14) which have been reported to be expressed in DD neurons and examined their potential role in presynaptic tiling in egl-20(n585) mutant background. We found that none of them showed significant presynaptic tiling defect. In the revised manuscript, we have included this data in Figure 2E.

      4) Effect of unc-9(Del18) and unc-1 double mutant should be tested.

      We knocked out unc-1 using CRISPR/Cas9 genome editing in the egl-20(n585); unc-9(syb3236 [unc-9(ΔN18)]) mutant background and observed no significant presynaptic tiling defect compared with egl-20(n585); unc-9(syb3236 [unc-9(ΔN18)]), which further strengthens our model that the gap junction channel activity of UNC-9 is dispensable for its function in presynaptic tiling. We have included these data in Figure 5D.

      5) The authors have acknowledged the need to study the role of UNC-9 gap junction channels in maintaining presynaptic patterning. This reviewer appreciates that idea and suggests the authors check whether late expression of UNC-9 is sufficient to rescue the presynaptic patterning defect observed in egl-20; unc-9 double mutant animals.

      We thank the reviewer for their comment. We conducted a late rescue experiment using a heat shock promoter to express unc-9 at the L2 stage, after presynaptic tiling is complete. We did not observe significant rescue of the presynaptic tiling defect in two independent transgenic lines of Phsp::unc-9. While we understand that this does not rule out a function of unc-9 in the maintenance of presynaptic tiling, this result is consistent with the idea that unc-9 is required for the establishment of presynaptic tiling. We have included these data in Figure 2- figure supplement 4.

      Reviewer #3 (Public Review):

      This interesting paper from Hendi et al. describes a novel mechanism governing synaptic tiling that depends on expression of a gap junction protein at the border between adjacent presynaptic domains of neighboring neurons. The authors define the role of innexin UNC-9 in establishing the spatial arrangement of synapses in adjacent C. elegans GABA motor neurons. They show that axonal tiling is controlled by Wnt signaling. However, synaptic tiling is preserved when axonal tiling is disrupted in egl-20/Wnt mutants. Synaptic and axonal tiling are both disrupted in egl-20; unc-9 double mutants, suggesting these two processes are controlled through distinct molecular mechanisms. The authors find that UNC-9 is localized to the border between axons of adjacent GABA neurons and provide evidence that the function of UNC-9 in tiling does not require its channel function. The experiments are made possible by the development of a new system for labeling adjacent GABA motor neurons that will also be of general use to the field. The studies rule out requirements for either gap junction activity or several other genes previously implicated in gap junction function/localization, but fall short of clearly defining mechanism. Instead, the study provides additional support for channel-independent structural roles of gap junctions in the nervous system.

      The study's conclusions are generally well-supported by the data but more clarification is required in some areas:

      1) Overlaps between DD5 and DD6 dendrites are not evaluated directly. The authors show the extent of labeling in the DD5 dendrite. This should be clarified.

      We thank the reviewer for their comment. As described above, we could not directly quantify dendritic tiling defect between DD5 and DD6 neurons due to the inconsistent expression of mCherry in the dendrite of DD6. Alternatively, we measured the length of DD5 posterior dendrite in wild type and the egl-20(n585) mutant, and found a significant increase in the DD5 posterior dendrite length in the egl-20(n585) mutants. In the revised manuscript, we have edited the text to more clearly explain the defect of DD5 posterior dendrite.

      2) The authors suggest UNC-9 establishes axonal tiling as early as L2 stage, immediately following DD remodeling. However, no data is shown for UNC-9 localization at this developmental stage. It would also be interesting to know whether UNC-9 performs a similar role prior to remodeling, or if UNC-9 itself undergoes redistribution during the remodeling process.

      We thank the reviewer for their comment. As described above, we acknowledge our initial description of UNC-9 localization in the DD neurons was not sufficient. UNC-9 is present at both the axonal and dendritic tiling borders between DD5 and DD6 neurons throughout the larval development.

      In the revised manuscript, we included UNC-9 localization at the axonal and dendritic tiling borders between DD5 and DD6 in both wild type and egl-20(n585) animals at the L1 stage (Figure 2- figure supplement 5). However, we could not determine whether the egl-20(n585); unc-9(e101) mutant exhibits a presynaptic patterning defect in the ventral axons prior to remodeling at the L1 stage, due to the low expression of our axonal and presynaptic markers at the L1 stage.

      3) Based on the representative image, UNC-9 abundance appears reduced in unc-104. The authors should quantify.

      We thank the reviewer for their comment. In the revised manuscript, we quantified the signal intensity of UNC-9::7×GFP at the DD5-DD6 axonal tiling border in wild type, egl-20(n585), unc-104(e1265), zoo-1(tm4133) and nlr-1(gk366849). We found that the fluorescent intensity of UNC-9::7×GFP was indeed slightly lower in egl-20(n585) and unc-104(e1265) mutants compared with wild type animals. This result implies that egl-20 and unc-104 have a minor role in UNC-9 localization. Nevertheless, the UNC-9 puncta are always present in all genotypes we examined. The quantification is included in Figure 2- figure supplement 6, and we suggest that the weak presynaptic tiling defect in the egl-20 single mutant could be explained by this reduction of UNC-9 localization (lines 284-285).

      4) The authors show the distribution of muscle NLG-1 mirrors that of RAB-3. While this suggests the altered distribution of RAB-3 reports on synaptic rearrangement, this conclusion would be strengthened by analysis of an active zone marker.

      We agree with the reviewer that examining the co-localization of RAB-3 with an active zone protein would strengthen our conclusion. We therefore expressed BFP::RAB-3 under the DD-specific promoter flp-13 in a transgenic marker strain (wyIs292) that expresses the active zone protein UNC-10::tdTomato under the GABAergic promoter unc-25, and NLG-1::YFP under the body wall muscle promoter unc-129dm (Maro et al., 2015). Using this strain, we show that RAB-3 co-localized with UNC-10 and was apposed to postsynaptic NLG-1 in both wild type and the egl-20(n585); unc-9(e101) mutant. The representative images are included in Figure 2- figure supplement 2.

    1. Author Response

      Reviewer #1 (Public Review):

      The stated goal of this research was to look for interactions between metabolism, (manipulated by glucose starvation) and the circadian clock. This is a hot topic currently, as bi-directional links between metabolism and rhythmicity are found in several organisms and this connection has important implications for human health. The authors work with the model organism Neurospora crassa, a filamentous fungus that has many advantages for this type of research.

      The authors' first approach was to assay the effects of glucose starvation on the levels of the RNA and protein products of the key clock genes frq, wc-1, and wc-2. The WC-1 and WC-2 proteins form a complex, WCC, that activates frq transcription. The surprising finding was that WC-1 and WC-2 protein levels and WCC transcriptional activity were drastically reduced but frq RNA and protein levels remained the same. Under conditions where rhythmicity is expressed, the rhythms of frq RNA, FRQ protein, and expression of clock-driven "output" genes were also unaffected by starvation. The standard model for the molecular clock is a transcription/translation feedback loop dependent on the levels and activity of these clock gene products, so this disconnect between the starvation-induced changes in the stoichiometry of the loop components and the lack of effects of starvation on rhythmicity calls into question our understanding of the molecular mechanism of the clock. This is yet another example of the inadequacy of the TTFL model to explain rhythmicity. For me, the most significant sentence in the paper was this: "...an unknown mechanism must recalibrate the central clockwork to keep frq transcript levels and oscillation glucose-compensated despite the decline in WCC levels."

      The authors' second approach was to try to identify mechanisms for the response to starvation by focussing on frq and its regulators, using mutations in the frq gene and strains with alterations in the activity of kinases and phosphatases known to modify FRQ protein. The authors take the finding that all of these manipulations have some effect on the starvation-induced changes in WC protein levels to indicate a role for FRQ itself in the response to starvation. This conclusion is subject to the caveat that manipulating the activity of multifunctional kinases and phosphatases will certainly have pleiotropic effects on many cellular processes beyond FRQ protein activity.

      Because of the sometimes-speculative nature of our conclusions and based on the suggestion of the editor, we restructured the Discussion and discuss now the mechanism addressed by the Reviewer in the subsection "Ideas and Speculation". We added a sentence to the section about the possible pleiotropic effects of the tested signaling pathways: "Starvation triggers characteristic changes in the activity of signaling routes that affect basic components of the circadian clock. Although the multifunctional pathways might act via pleiotropic mechanisms as well, based on their earlier characterized role in the control of the Neurospora clock, their action can be inserted into a model describing the glucose-dependent reorganization of the oscillator."

      The third section of the paper is a major transcriptomic study of the effects of starvation on global gene expression. Two strains are compared under two conditions: wc wild-type and the wc-1 knockout strain, under fed and starved conditions. The hypothesis is that WCC has a role in the starvation response. The results of starvation on the wild-type are unsurprising and predictable: the expression of many genes involved in metabolic processes is affected. There are no new insights that come from these results and no new testable hypotheses are generated by the data.

      We agree with the reviewer that it is not surprising that glucose depletion strongly affects genes involved in metabolic processes and monosaccharide transport. These data obtained in wt served rather as a control for our experimental conditions. As a new aspect, our analysis focused on the differences between wt and wc-1 in the transcriptomic response to altered glucose availability.

      The authors refer to the wc-1 mutant strain as "clockless" and discuss its effects on the transcriptome only in terms of WC-1's function in the clock mechanism. However, WCC is known to be a major transcriptional regulator, controlling a number of genes beyond the TTFL. As acknowledged earlier in the paper, WC-1 is also the major light receptor in Neurospora. The transcriptomics experiments were carried out in a light/dark cycle, with cultures harvested at the end of the light period, when "an adapted state for light-dependent genes can be expected" according to the authors. However, wc-1 mutants are essentially blind, and so those samples are equivalent to being harvested in the dark. The multifunctional nature of WCC complicates the interpretation of the transcriptomics data. The differences in the transcriptome between wild-type and wc-1 may not be due to loss of clock function, but rather the loss of a major multifunctional transcription factor, or the difference between light and "dark".

      The reviewer is right, when we discussed the difference between wt and wc-1 in the transcriptional response to glucose, we did not emphasize the possible contribution of the photoreceptor function of the WCC. We added the following sentence to the revised version of the discussion: "Further investigations could differentiate between the clock and photoreceptor functions of the WCC in the glucose-dependent control of the transcriptome." Furthermore, we more specifically indicate that in wc-1 the lack of the WCC (and not the lack of a functional clock) results in the altered transcriptomic response to starvation when compared to wt (P15 L14-17).

      In the final set of experiments, the authors tested the hypothesis that the changes in the transcriptome between wild type and wc-1 might make wc-1 less competent to recover growth after starvation. They also test the recovery of frq9, a "clockless" mutant. The very surprising result is that the growth rates of these two mutants are slower than the wild type after transfer from starvation media to high glucose. This is surprising because there will be several generations of nuclear division and doublings of mass within a few hours and the transcriptome should have recovered fully fairly rapidly. A mechanism for this apparent "after-effect" is suggested with evidence concerning differences in expression of a glucose transporter, but it is not clear why this expression should not change rapidly with re-feeding on high glucose. As with previous experiments, the cultures were grown in light/dark cycles, which results in different conditions for the mutants, both of which have very low or absent WC-1 and are therefore blind to light. The potential effects of light have been disregarded.

      The reviewer is right that several generations of nuclear divisions occur within a few hours and lead to a number of doublings of the biomass. However, when the first phase of regeneration is delayed in one or more strains compared to the control, a substantial difference in biomass can be expected by the time the stationary phase is reached.

      Regarding the expression change of the glucose transporter: in order to emphasize how differently glt-1 levels respond to glucose in the different strains, in the previous version of the manuscript we normalized the expression levels to the beginning of recovery (the time point of glucose addition). Thus, expression differences between the strains were not shown. To give a more comprehensive picture, the revised version of the manuscript depicts expression levels without normalization (Fig 5F). The mutants did not adapt efficiently to changes in glucose levels, i.e. expression of the transporter was relatively high in both wc-1 and frq10 during starvation and did not increase further upon glucose addition. On the other hand, 24 hours after glucose resupply, glt-1 levels were similar in all strains, which might contribute to the similar growth rates observed under steady-state conditions in the standard medium.

      Regarding the photoreceptor-independent function of the WCC during growth recovery: in the revised version of the manuscript we present additional data suggesting the importance of the photoreceptor-independent function of the WCC for efficient recovery from starvation. Fig. 5C and Fig. 5D now show that upon resupply of glucose, wt grows faster than the clock-deficient strains Δwc-1 and frq10 in both LD cycles and constant darkness, indicating that the role of the WCC in growth regeneration is at least partially independent of its photoreceptor function. Regarding the function of the WCC in frq10: frq10 cannot be considered blind. Although both Δwc-1 and frq10 lack a functional clock and WC levels are reduced in frq10, these strains show significant differences in WCC activity. While Δwc-1 is considered blind, in frq10 the lack of negative feedback results in high WCC activity in both DD and LL, and expression levels of all examined light-sensitive or light-dependent genes were found to be comparable in wt and in frq-less mutants (Schafmeier et al., 2005; Hunt et al., 2007; own unpublished data).

      The title of the paper refers to a "flexible circadian clock" but this concept of flexibility is not developed in the paper. I would substitute "the White Collar Complex" for this phrase: "Adaptation to starvation requires a functional White Collar Complex in Neurospora crassa" would be more accurate. Some experiments are also conducted using an frq null "clockless" strain, but because WC expression is very low in frq null mutants, any effects of frq null could also be attributed to WC depletion.

      As detailed above, low level of the WCC in the frq-less mutant does not mean low transcriptional activity and accordingly, the two clock mutants, wc-1 and frq10 show important functional differences. We used the word "flexible" to indicate that the molecular clock is able to operate under critical nutrient conditions and with a significantly changed stoichiometry of its key components. Results of our new experiments performed in DD (mentioned above) indicate that growth regeneration is rather independent of the photoreceptor function of the WCC. Nevertheless, we accepted the criticism of the reviewer and changed the title to "Adaptation to glucose starvation is associated with molecular reorganization of the circadian clock in Neurospora crassa".

      The major conclusion I took away from this paper is the multifunctional nature of the WCC as a transcription factor complex. It has been known for a long time that WCC controls the expression of many genes beyond the frq gene at the core of the circadian transcription/translation feedback loop. WC-1 is also the major blue light photoreceptor in Neurospora, controlling the expression of light-regulated genes, and this fact is barely touched on in the paper. These new data now extend the role of WCC in the regulation of metabolic networks as well.

      Reviewer #2 (Public Review):

      The authors have performed an interesting study addressing a topical question: how circadian oscillators remain accurate under changing environmental conditions, and how these circadian oscillators contribute to responses to environmental changes. The authors have performed their studies in Neurospora crassa. They have made a very interesting finding that starvation causes a profound decrease in WHITE COLLAR 1 (WC-1) abundance, yet the circadian system continues to run despite this decrease in the abundance of a core oscillator component. The study of chronic glucose starvation in a Δwc-1 mutant is interesting and provides the opportunity to investigate the role of the WHITE COLLAR COMPLEX (WCC) and the clock system in adaptation to starvation.

      Strengths:

      The authors have used a range of techniques to measure clock behaviour, including qPCR, phosphorylation, protein abundance, and subcellular localisation studies.

      An frq9 mutant was used to test the effects of FRQ on WC-1 abundance, since WC-1 decreased during starvation. This is elegant, though the logic of this experiment is not quite clear: FRQ abundance did not change during starvation, so why did the authors think this experiment was needed?

      We regret that the examination of frq9 was not clearly justified in the previous version of the manuscript. It is true that FRQ levels did not change during starvation, only phosphorylation of the protein was affected, i.e. FRQ became more phosphorylated (displayed by an electrophoretic mobility shift on the Western blot (Garceau N, Liu Y, Loros J J, Dunlap J C. Cell. 1997;89:469–476.)) under low glucose conditions. We tested the starvation response in the FRQ-less strain because WCC level changed significantly in wt upon glucose depletion and expression of WC proteins is known to be controlled by FRQ. In the revised version of the manuscript we tried to introduce and explain the experiments performed with frq9 more thoroughly (P7 L22-P8 L14; P16 L21 – P17 L6).

      An interesting experiment was performed to test whether CK1a-dependent phosphorylation and inactivation of the WCC are involved in the starvation response. An FRQΔFCD1-2 mutant is used in which FRQ cannot interact with CK1a and therefore CK1a cannot phosphorylate and inactivate WC. This experiment suggested that CK1a is not involved in the response to starvation, again leading to the conclusion that FRQ is not involved in the starvation regulation of WC.

      The referee is right: the effect of FRQ-bound CK1a on the adaptation of the molecular clock to starvation seems to be minor, and this is also our conclusion in the manuscript. The major message of this experiment was that FRQ became phosphorylated in response to starvation without stably interacting with CK1a, probably via another mechanism. We agree with the notion that the behavior of WCC levels upon starvation was similar to that in the FRQ-less mutant.

      PKA is shown to be involved in the starvation-induced reduction of WC because the starvation-induced reduction in abundances of WC-1 was absent in the mcb strain in which the regulatory subunit of PKA is defective and hence, PKA is constitutively active.

      The authors have found an interesting potential link between glucose levels and WCC phosphorylation, they demonstrated that starvation reduces PP2A activity and that in a regulatory mutant of PP2A, which has reduced PP2A activity, there is little effect of starvation on WCC levels, suggesting the hypothesis that glucose-dependent PP2A dephosphorylation stabilises WCC.

      Analysis of starvation-regulated transcriptome in Δwc-1 and wild type found strong evidence that the transcriptomic response to starvation is in part dependent on WCC. Much of the misregulated transcriptome appears to be associated with metabolism.

      In a series of growth studies in wild-type, frq and wc-1 mutant strains, the authors provide strong evidence that FRQ and WC are involved in growth and survival following starvation, and in recovery from starvation.

      Weaknesses:

      The authors describe Neurospora crassa as a model for circadian biology and apparently make the assumption that the findings are indicative of the behaviour of clock systems in other kingdoms. This is not the case. Neurospora crassa is a wonderful model for studying fungal clocks and is a great tool for studying basic circadian dynamics, but the interesting findings here are of a detailed molecular nature and therefore are applicable for fungal clocks, but not other kingdoms.

      We agree that we still do not know whether the described mechanism is specific to fungal clocks. However, besides the basic feedback loop, overlapping mechanisms (controlled by, e.g., casein kinases, glycogen synthase kinase, PKA and PP2A) are involved in the regulation of circadian timekeeping in different eukaryotic systems (reviewed in Reischl and Kramer, 2011, FEBS Lett; Brenna and Albrecht, 2020, Front Physiol). Our results suggest that some of these common factors (PKA, GSK, PP2A) are involved in the reorganization of the Neurospora clock in response to changes in glucose availability. Therefore, analogous changes may occur in the timekeeping mechanisms of other eukaryotic systems when they face serious environmental challenges.

      We included a short section in the Discussion that gives an overview of known interactions between glucose availability and circadian timekeeping at different levels of the phylogenetic hierarchy (P15 L18 – P16 L7).

      The authors assume that the reader is intimately familiar with the intricacies of Neurospora crassa circadian studies and the significance of differences between LL (constant light) and DD (constant darkness) investigations. More background on the logic of the experiments would be helpful for readers from other fields.

      Thank you for the comment. In the revised version of the manuscript we tried to introduce the molecular clock of Neurospora more thoroughly and completed the description of the experimental conditions with detailed explanations.

      The data in Figure 2 are essential for the interpretation of the findings, demonstrating the presence of free-running rhythms. However, the data are entirely qualitative, making it hard to fully assess the authors' interpretations; a more quantitative assessment of the data would improve clarity.

      We quantified the Western blot signals and show the results in Fig 1E in the new version of the manuscript (according to the reviewer's suggestion Fig 2 of the old version is now part of Fig 1). Our data indicate that oscillation of FRQ levels is similar under both nutrient conditions.

      The conclusion that FRQ contributes to the regulation of WC-1 abundance in response to starvation does not seem to be supported by the data, because frq RNA does not change upon starvation. Furthermore, the authors' conclusion that the starvation-induced decrease in WC-1 and WC-2 protein levels is due to FRQ, based on the lack of reduction in an frq9 mutant, is open to misinterpretation: this mutant has low WC levels, so starvation might not lower already low levels further. Indeed, WC-1 is lower in the frq9 mutant under any condition than in the WT under starvation, and WC-2 does decrease in abundance in the frq9 mutant upon starvation. The data strongly suggest to this reader that FRQ does not participate in the regulation of WC abundance in response to starvation.

      After rereading the criticized section, we admit that the text was not well structured and we carried out several modifications. We intended to emphasize that upon drastic changes of the glucose availability frq RNA levels remained compensated in wt, but this compensation was affected when functional FRQ was not present. We agree with the reviewer's opinion that the low expression of the WCC in frq9 makes it difficult to compare the glucose-dependence of WCC expression in frq9 and wt. We modified the conclusion by adding this information and now mainly focus on the strain-dependent difference in the changes of frq RNA expression. (P7 L22-P8 L14)

      The discussion accurately summarises the results and provides an interpretation, but it lacks a comparison to circadian systems in other kingdoms. How do the data compare with the effects of glucose and other sugars on the mammalian, plant, and insect clocks?

      We included a short section in the Discussion that gives an overview of known interactions between glucose availability and circadian timekeeping in different organisms (P15 L18-P16 L7).

      How changes in WCC might result in changes in transcription is not explained. This might be very obvious to the authors, but to the reader it is not. Are the transcriptional outputs direct targets of WCC? Has WCC ChIP-seq been performed by the authors or others, and are the regulated transcripts directly bound by WCC? Which promoter sequences are enriched in the regulated genes, and is it possible to identify the network through which these changes in transcription occur?

      We now show the list of genes (Figure 4 – Figure supplement 2) that changed in a strain-specific manner in response to glucose starvation and that, based on ChIP-seq results, were earlier described as direct targets of the WCC (Smith et al., 2010; Hurley et al., 2014). Given literature data showing that the WCC affects the expression of several other transcription factors and controls basic cellular functions that might affect the expression of further genes, it was not surprising that only 90 of the 1377 genes were reported to be direct targets of the WCC.

      Whilst the authors claim it is the circadian clock that is involved in the starvation response, in my view a more precise interpretation of the data is that WCC is involved in the response. Since WCC is a photoreceptor with dual function in the clock, is it yet possible to conclude that the effects discovered are due to the clock role of WCC? Or do the data support the role of light signalling in regulating the starvation response through WCC?

      We thank you for the comment. In the revised version of the manuscript we indicate more specifically that in wc-1 the lack of the WCC (and not the lack of a functional clock) results in the altered transcriptomic response to starvation compared to wt. In addition, in the revised version we present a new experiment (Fig. 5D) showing that, upon resupply of glucose, wt also grows faster in constant darkness than the clock-deficient strains wc-1 and frq10 do. This indicates that the role of the WCC in growth regeneration is largely independent of its photoreceptor function.

      The authors do not appear to reconcile the fact that starvation hugely decreases WCC levels with their finding that the transcriptional and growth response to starvation requires WCC.

      We agree with the reviewer that the problem of how low levels of WCC could sufficiently support the transcription of frq and different output genes under starvation conditions was not discussed properly. Our results suggest a model in which the maintained level of nuclear WCC and the weakened inhibition by both FRQ (the hyperphosphorylated form is less active in the negative feedback) and PKA (its activity lowered upon glucose depletion) together might ensure that transcriptional activity of the WCC is preserved upon glucose withdrawal in both DD and LL despite the decrease of the overall level of the complex. In the revised version these aspects are discussed more thoroughly (P16-18).

      This study contributes to the increased focus of the circadian community on the regulation of outputs by circadian oscillators. The manuscript will be of interest to many in the field. There needs to be less assumption of knowledge about the N. crassa circadian system, and a better discussion in the broader context of clocks in other kingdoms.

      We added a new section to the Discussion with data concerning interrelationships between glucose availability and the circadian clock in other organisms.

    1. Author Response

      Reviewer #1 (Public Review):

      Drosophila ovarian follicle cells have been used as a model system to study organogenesis and tumorigenesis of epithelia. Studies have found that lack of proper cell polarity causes invasive delamination of cells and formation of multilayered epithelia, reminiscent of epithelial-mesenchymal transition (EMT). Using this system, the authors analyzed the single-cell transcriptome of follicle cells and show that distinct cell populations emerge shortly after induction of polarity loss. The authors identified dynamic activation of the Keap1-Nrf2 pathway. Finally, subpopulation classification and analysis of regulon activity identified the Keap1-Nrf2 pathway as responsible for the epithelial multilayering caused by polarity loss.

      Strengths:

      The authors characterized the single-cell transcriptome of follicle cell subpopulations after induction of polarity loss. Using a temperature-inducible driver, they can induce polarity loss within a short period of time, which enables detection of epithelial populations at various transition stages. The detected cell heterogeneity could arise intrinsically or from environmental cues within the tissue in vivo. Therefore, the system likely recapitulates tumorigenesis in vivo well.

      Weaknesses:

      1) The authors should show cells corresponding to the identified key cell clusters within the tissue by immunostaining, GFP-trap, or RNA FISH.

      We thank the reviewer for their comment. However, for this particular case, we would like to underscore the observation that the clusters derived from our integrated analysis do not exhibit mutually exclusive gene expression. This is unlike other studies where different clusters exhibit unique markers. The different clusters in this study represent distinguishable cell states and not distinct cell types. Even though the Lgl-KD follicle cells transcriptomically deviate from their corresponding cells of origin to form their own clusters, they continue to express several markers that show gene-expression overlap with normal follicle cells. This overlap exacerbates the problem of identifying distinct cells using differentially-enriched markers.

      However, we have used antibody staining against Drpr to identify cluster 8 follicle cells that associate with Dcp1+ dying germline cells. We have used the GstD-lacZ reporter (cluster 7 marker, specifically cluster 7_3) to show pathway activity within the multilayer. Besides GstD-lacZ, we also show that F-actin, which is significantly enriched in invasive cells, is enriched in cluster 7 (specifically 7_3) cells. Additionally, we have now added images depicting the cell- and stage-specific expression patterns of the JNK pathway components pJNK and puc; of Thor (4E-BP), which is expressed at high levels in cluster 8 and medium levels in cluster 7; and of Xbp1-GFP (UPR stress sensor), which marks late stages of Lgl-KD cells.

      2) Images are at low magnification, making it difficult to see individual cells.

      We have replaced several such images in the revised manuscript. Specifically, the revised manuscript has entirely new (or improved versions of) image panels in figure 5. In figure 1A, the focus is the entire ovariole and therefore, we have only highlighted the enrichment of Hnt and pH3 antibody staining separately for a subsetted region of interest (ROI). The ROI panels are included within the larger image itself. For figure 6, we have converted the LUTs of panels showing distinct channels for RFP and Shg/Arm antibody stainings to grayscale.

      3) The manuscript is weighted toward technical aspects; more of the biology behind this study should be discussed.

      We have added new paragraphs discussing the evidence for loss of polarity, specifically loss of Lgl, in human cancers. Additionally, we discuss how our results regarding Keap1 relate to what is already known about it, and the implications of our results in the context of cancer progression and metastasis.

      Reviewer #2 (Public Review):

      Chatterjee et al. perform extensive imaging and single-cell RNA sequencing (scRNA-seq) analysis of Drosophila ovaries with and without knockdown of a gene, Lethal giant larvae (Lgl), which is known to establish apical-basal polarity and to control proliferation of epithelial tissues. The goal of the study is to characterize the effect of apicobasal-polarity loss in epithelial cells via Lgl knockdown on Drosophila ovaries at the phenotypic, cellular, single-cell gene expression and regulatory levels. By focusing on single-cell gene expression clusters that are unique to Lgl-KD compared with those from flies without the knockdown, they were able to identify a highly transient cluster (cluster 7) consisting of tumorigenic cells. Differential markers within a sub-cluster of this cluster (cluster 7_3), followed by validation using a GstD-lacZ enhancer-trap reporter assay, led to their conclusion that cluster 7 represents the cells of the multilayering phenotype (i.e., the major Lgl-KD phenotype observed from image analysis), in which activation of Keap1-Nrf2 signaling was observed. The KEAP1-NRF2 pathway is associated with protecting cells from oxidative stress. KEAP1 forms part of an E3 ubiquitin ligase, which controls NRF2, a transcription factor, by targeting it for ubiquitin-mediated proteasomal degradation. Surprisingly, inducing loss of function of Keap1 and, separately, of NRF2 (cnc in Drosophila) in Lgl-KD cells resulted in the same phenotype/rescue: loss of the multilayering phenotype. Overexpression of Keap1 in Lgl-KD increased multilayer volume compared with Lgl-KD alone, further supporting the role of Keap1 in cellular invasion and possibly in the early stages of tumorigenesis, when epithelial cells start losing their polarity.

      The strengths of this paper are:

      The mutually reinforcing advanced imaging, scRNA-seq and genetic manipulation (knockdown and over expression) experiments/analyses that largely support the major conclusions of the manuscript which are summarized above as well as more minor observations that the authors make.

      The systems biology flow of the study from broad to a specific gene/pathway implicated in the phenotype. The authors start with a clear phenotypic characterization of Lgl-KD and genome-wide scRNA-seq analysis. This leads to regulatory factor enrichment and further identification of a cluster (cluster 7) and then to a sub-cluster (cluster 7_3). This is followed by the identification of the KEAP1-NRF2 pathway and the demonstration that KEAP1 knockdown and overexpression in Lgl-KD rescue and aggravate the cell multilayering phenotype, respectively.

      The multilayering phenotype, genes and regulatory factors associated with loss of polarity are known to play an important role in the epithelial to mesenchymal transition (EMT). For example, this includes the enrichment of AP-1 family members, which are known to regulate EMT, in the regulon analyses as well as identification of KEAP1-NRF2.

      The weaknesses of the paper are:

      The framing/motivation of the study could be improved especially for those who study EMT/metastasis in humans. Given that loss of polarity is one of many events associated with tumorigenesis and metastatic progression, the claims made that studying Lgl-KD in Drosophila ovaries directly leads to insights into tumor cell invasiveness, early stages of tumorigenesis and EMT may leave some readers doubtful if they are not familiar with Lgl. Reviewing major findings that show that Lgl is a tumor suppressor as is its human homologue Hugl-1 as well as making a stronger case that studying Lgl-KD in Drosophila is relevant for tumorigenesis and EMT would be helpful.

      We thank the reviewer for these suggestions. Accordingly, we have added new paragraphs to the Discussion section that discuss how Lgl-KD-mediated polarity loss links to mammalian tumorigenesis, as well as the implications of our results.

      Given that Keap1 antagonizes NRF2, the apparently contradictory result that loss of function of Keap1 and, separately, of NRF2 (cnc in Drosophila) in Lgl-KD cells resulted in the same phenotype/rescue (loss of the multilayering phenotype) is not fully addressed. Keap1 overexpression was shown to aggravate multilayering, but NRF2 overexpression experiments were not performed. In addition, overexpression and knockdown of Keap1 were shown not to affect NRF2 gene expression (Figure 5C); however, Keap1 regulates Nrf2 directly at the protein level via ubiquitin-mediated proteasomal degradation. Nrf2 protein levels in flies with and without Lgl-KD under the various manipulations of Keap1 (control, KD and OE) were not measured.

      As the Keap1-Nrf2 pathway is widely studied in the context of oxidative-stress response signaling, Keap1 is widely accepted as a negative regulator of Nrf2-driven transcription. However, Nrf2 has also been found to positively drive the expression of Keap1 (Sykiotis and Bohmann, 2008), and manipulating Keap1 did not change Nrf2 expression (Fig. 5C). In response to this comment, we performed additional experiments driving the ectopic expression of Nrf2 (CncC-OE) in Lgl-KD cells, which increased the invasiveness of Lgl-KD cells, similar to Keap1-OE. Since the UAS-CncC line has been shown to upregulate Keap1 expression (Sykiotis and Bohmann, 2008), we concluded that this increase in invasiveness is indirectly due to the increase in Keap1 expression itself.

      Given that the antagonistic relationship between Keap1 and Nrf2 is only relevant to the oxidative-stress response pathway, the genetic epistasis experiments in this study render that relationship irrelevant in the context of the observed phenotype, as KD or OE of both components results in comparable phenotypes. Previous studies showing that Keap1 plays a role in cytoskeletal regulation (which is in agreement with our observation) also add weight to the argument that the observed phenotype is likely an indirect consequence of Keap1-Nrf2 signaling activation.

      Many of the conclusions in early Results paragraphs are purely technical and not biological. For example, "These observations highlight the limitations of marker validation to identify specific cells of the differential Lgl-KD phenotype" and "SCENIC was able to detect the common as well as distinct transcriptomic states of the cells in unique Lgl-KD clusters, while also highlighting the heterogeneity among them". Some of these technical conclusions could be part of brief discussions in the Methods section.

      For those not familiar with various detailed scRNA-seq analysis approaches (e.g., RNA velocity analysis), a brief description of how they should be interpreted biologically in Methods would be helpful. This might help resolve what appear to be contradictory/confusing results. First, the upper branch of cluster 7 (which is a focus of the study) shown in Fig. 3B is in a "late" stage based on Velocity Pseudotime analysis (left panel) and a "root" or an early stage based on Terminal end-points of differential analysis (right panel). The bottom branch of cluster 7 is "late"/"stable end point" based on these two analyses which is now consistent. Second, given these differences between the upper and lower branch of cluster 7, how is cluster 7 biologically the same cluster? Third, the bottom branch of cluster 7 bleeds into cluster 8 and while Ets21C is uniquely expressed in the bottom branch of 7, important markers of the study including Jra, kay (AP-1 family members), grnd, cnc (NRF2), Keap1, and the genes shown in Fig. 6F are all robustly expressed in clusters 7 (bottom branch) and 8. The biologically relevant distinction between the bottom branch of cluster 7 and 8 is not clear. Is cluster 8 important/relevant to the phenotypes observed as well?

      We have now added the following paragraph elaborating the logical choices made within the analytical pipeline in our Methods section:

      In this study, we have highlighted RNA velocity-derived interpretations that strictly agree with the other analytical perspectives pursued in this study. We applied scVelo to obtain information on the underlying lineage for (1) all unique Lgl-KD clusters, and (2) cluster-7 cells. The cells of the unique Lgl-KD clusters represent a mixed population of mitotic, post-mitotic, border-follicle cells and dying germline-cell associating cells that depict inconsistent transcriptional lineages. In this group of cells, the true developmental end-point of the observed Lgl-KD lineage is cluster 8 (germline-cell death occurs at the end of Lgl-KD follicular development), which likely consists of a mixed population of cells from the lateral epithelia as well as the multilayered epithelia, all responding to germline-cell death. Indeed, certain sections of cluster 7 appear more similar to cluster 8 and others seem comparable to that of cluster 13. These observations underscore our conclusions that the unique Lgl-KD clusters exhibit distinguishable gene expression, representing different cell states. For cluster 7, the state of transcriptomic heterogeneity is what defines its unique state of gene expression and we have assessed this heterogeneity by specifically sub-setting those cells.

      For a comprehensive interpretation of the results of the RNA-velocity based analysis, more information can be found in the scVelo tutorial (https://scvelo.readthedocs.io/).

    1. Author Response

      Reviewer #1 (Public Review):

      Gu et al. examine how activity in the substantia nigra pars reticulata (SNr) contributes to proactive inhibition - the suppression of upcoming actions - by recording SNr activity in rats performing a task requiring them to be prepared to cancel a planned movement. This task was developed in a previous study by the same authors in which they examined how globus pallidus pars externa (GPe) activity depends on proactive inhibition (Gu et al., 2020), which motivated the present focus on SNr. The task is rich and the complementary analyses of how the neural activity relates to the behavior, at the level of individual neurons and populations, are appropriate and illuminating. Overall, this study is well done and has the potential to be a nice contribution to our understanding of how the SNr, and therefore the basal ganglia, mediate behavioral inhibition. Addressing a few questions, however, would improve the paper.

      We appreciate both the positive comments and constructive criticism.

      • It is not obvious why the presence or absence of proactive inhibition should be determined on a session-by-session basis. It seems quite possible that proactive inhibition is not an all-or-none phenomenon, and also that it might be exhibited to a greater or lesser extent across a session (e.g., due to changes in motivational drive). It would therefore strengthen the paper to better explain the rationale for comparing neural activity across entire sessions "with" and "without" proactive inhibition. Within-session variation in proactive inhibition could be quite advantageous, allowing for within-neuron comparisons. It is even possible that the differences in neural activity that the authors report here using session-by-session analysis are an underestimate of the true effect of proactive inhibition.

      It is true that some of our analyses compare whole sessions with- and without- overall behavioral evidence for proactive inhibition. But our primary results come from within-session comparisons of Maybe-Stop to No-Stop trials. For this purpose, the session-wide assessment of proactive inhibition is primarily a screen for which sessions to use for within-session analysis.

      It would be desirable if we could use behavior to determine the degree of proactive inhibition on each individual trial, and then compare this to neural measures. Unfortunately, this is not generally feasible in our experiments. Our key evidence for proactive inhibition is the prolongation of reaction times (RTs). However, RTs are famously highly variable over trials. This variability likely reflects a variety of factors, not simply proactive inhibition. For example, in our previous paper (Gu et al. 2020) we showed that dividing trials into slower and faster RTs did not reproduce the same neural differences as comparing Maybe-Stop to No-Stop trials.

      An alternative approach to investigating proactive inhibition is to focus on the increased restraint that typically follows over-hasty responses. We found that when rats fail to Stop, on the next trial the degree of SNr variability increases (Fig. 6). We have now expanded this analysis to include additional types of errors. We find that another form of over-hasty action, premature responses before the Go cue, are also followed on the next trial by increased SNr variability (Fig. 6- supp1). By contrast, other error types (wrong choices; failure to respond quickly enough) do not provoke greater variability. These additional within-session analyses provide convergent evidence for increased variability as an adaptive response to failures evoked by excessive haste.

      • It is difficult to rule out alternative explanations for the observed differences in SNr activity. While the authors acknowledge this point in the 3rd paragraph of the discussion, they only discuss one potential alternative - reward expectation. Another difference between maybe-stop and no-stop trials is the likelihood that a particular target should be selected, which has also been shown to modulate SNr activity (Basso & Wurtz, 2002). As is often the case with complex behavioral tasks, there may be many other differences between trial types that may contribute to differences in neural activity. It would be helpful for the authors to more fully explain how their results relate to contextual modulation of SNr activity, and why the dependence of SNr activity on proactive inhibition may be a novel finding.

      We have expanded the Discussion to include additional alternative explanations.

      • A natural question arising from this study, as with most studies of neural recordings during behavior, is the causal nature of the neural activity. It would be non-trivial and beyond the scope of the current study to perform the sort of perturbations that could determine whether population variability causally relates to preparation to suppress actions. But it would be useful to discuss future experiments that might be able to test causality.

      We added in Discussion the possibility of using optogenetic manipulations of specific inputs to SNr, to help determine their distinct contributions to SNr firing patterns and proactive behavior.

      Reviewer #2 (Public Review):

      The authors have recorded the activity of neurons in the rat substantia nigra pars reticulata (SNr) while animals performed a version of a stop-signal task. The goal of this study is to investigate and describe the contribution of the SNr to proactive inhibitory control. By examining single-cell responses as well as population activity, the authors show that increasing the probability of stop-signal trials induces several changes in SNr responses. First, specific populations of SNr neurons increase their activity during proactive, direction-specific inhibition. At the population level, neurons are biased away from the side of the movement that may have to be inhibited. Second, during proactive inhibition, neuron activity is more variable, at both the single-cell and population levels. Finally, the authors show that the animals' outcome history influences both the firing rates and the variability of neuron responses in the current trial. In particular, neural variability is increased following a failure to inhibit a movement.

      Strengths

      The manuscript provides an interesting and timely insight into the role of the basal ganglia output nucleus in movement initiation control. The paper is often clearly and concisely written (although see one issue related to this below). One of the main strengths of the work is to allow an interesting comparison with recent work by the same team, aimed at investigating the responses of another basal ganglia nucleus (GPe) in the same task, using similar analyses (this comparison is not extensively exploited in the discussion section though). Another potential strength is the use of different analysis scales. The authors investigated single-unit responses as well as population "trajectories" in the neural state space. This is an interesting option that could have been better motivated, given that the two approaches assume quite different brain operations.

      Thank you for the interest and careful comments.

      Weaknesses

      The analyses and results sometimes lack clarity and detail. For instance, and unless I missed the information, it is not clearly stated whether "maybe-stop" trial analyses include only Go trials or whether (failed) Stop trials are also considered. Moreover, quite complicated figures are often described very briefly in the main text. Methods are also often too succinctly described, and sometimes refer to a previous publication (Gu et al., 2020) that readers may not have read.

      We have made a range of changes to make the analyses and their rationale more clear. This includes specifying that Maybe-Stop trials include both Go and Stop trials (and why). We have also added more details in both main text and Methods.

      There are some points that the authors might need to discuss further. In particular, a global picture of the role of the different basal ganglia nuclei during movement control would have been appreciated. Also, the authors monitored the activity of the rat basal ganglia output. We would have appreciated more information regarding the impact of this output activity on SNr target areas, as compared to their previous work that focused on GPe, for instance. Another example concerns the observation that SNr activity is elevated during active inhibition regardless of the firing rate pattern before movement (increase or decrease). As noted by the authors themselves, this is inconsistent with the classical role assigned to the basal ganglia output nucleus (i.e., a decrease in activity promotes movement). Although this observation is of potential interest to readers working on the basal ganglia, it is not discussed.

      The revised Discussion includes a section on how altered basal ganglia output may affect targets to alter behavior.

    1. Author Response

      Reviewer #3 (Public Review):

      In the submitted manuscript, Eliazer et al. conclude that Dll4 and Mib present on myofibers maintain a continuum of SC fates, providing SCs capable of regenerating muscle and repopulating the SC niche. The data provide new insights into the maintenance of SCs, demonstrating that niche-derived factors are responsible for regulating SC behavior. Loss of either Dll4 or Mib from the myofiber reduces SC numbers and impairs muscle regeneration. Overall, the data provide compelling evidence that niche-derived Dll4 and Mib regulate SC fate; however, whether this interaction maintains a continuum of SC fates, as concluded by the authors, is insufficiently supported by the data provided.

      We thank the reviewer for their comments.

      One significant issue with the manuscript is the "discovery" of an SC continuum related to the relative levels of Pax7 expression. A similar continuum was established nearly two decades ago by Zammit et al., 2004 and Olguin et al., 2004 and thus is not new. The authors need to cite this work and discuss the previously published data with regard to the observations in the current manuscript. The data establishing a continuum of SCs and its relationship to Pax7 protein levels can largely be eliminated and replaced by references to these two earlier manuscripts. For example, these manuscripts establish that elevated Pax7 levels drive quiescence and low Pax7 levels correlate with differentiation. The data from these manuscripts also establish that SCs with modest Pax7 protein levels can acquire quiescence accompanied by increases in Pax7 protein.

      The omission of these two seminal papers was a major oversight on our part. They have now been included. In the original manuscript, we acknowledged that SCs exist on a continuum, a gradual transition from one state to another, based on scRNA-seq studies and the present data (Dll4, Pax7 and Ddx6 expression). The references for the sequencing data were included. But with all due respect to the reviewer, the Zammit and Olguin papers binned Pax7 into discrete classes once satellite cells had activated. This is not a demonstration of a continuum. Moreover, we do not make any statements about Pax7 levels in activated conditions. Therefore, the reviewer is drawing comparisons between two different contexts. The statements we have made as they pertain to a continuum under homeostatic conditions are consistent with publications to date.

      The data relating the level of Pax7 expression to Dll4 and Mib are intriguing, but the authors do not establish a direct relationship, i.e., they do not demonstrate that Dll4 or Mib regulate Pax7 levels. An alternative explanation is that Dll4 and Mib inhibit differentiation and thus promote SC quiescence indirectly. This is a critical distinction, as the authors could be correct that Dll4, via Mib, regulates SC fate.

      We don’t make the claim that Dll4/Mib1 regulates Pax7 directly. We would side with the majority of publications showing that Notch signaling directly regulates Pax7. We have now added further experiments to examine whether Dll4 regulates Notch signaling. We crossed a transgenic mouse line harboring a Notch reporter with MF-Dll4 mice to analyze Notch signaling in SCs. The first experiment we performed with this reporter was to correlate the levels of Pax7 and Notch signaling on a cell-by-cell basis. In control mice, we found a linear positive relationship between levels of Pax7 and the Notch reporter. Next, we compared Notch reporter levels in control versus Dll4-null. We observed that Notch reporter levels decreased to below detectable levels in Dll4 null muscle. Therefore, Dll4 acts non-autonomously to regulate Notch signaling in SCs during homeostasis (refer to Reviewer 1 comment 1, and Essential revisions #3).

      The reviewer raises an important point: does Notch regulate quiescence directly, or a differentiation/commitment program when SCs are in a quiescent state? We never claimed that Dll4/Mib1 regulates quiescence. The only way to conclude anything about quiescence would be to examine the expression of proliferative markers in vivo. Rather, throughout the manuscript we referred to Dll4 regulating the state of the quiescent SC pool, as measured by changes in Pax7 and Ddx6 expression. In the Discussion, we noted that Notch signaling may regulate differentiation/commitment of cells in a quiescent state.

      It is unclear whether the loss of Dll4 or Mib1 reduces the diversity of SCs. If these factors repress differentiation, then their loss would be expected to enhance differentiation and reduce SC numbers, which is what the data demonstrate.

      Diversity can be restated as the variability across a population. We demonstrate that the variance of Pax7 and Ddx6 expression decreases after Dll4 deletion. It is important to note that we are analyzing the SCs that are not lost through differentiation. The loss of some SCs through differentiation is not inconsistent with a shift in the continuum; we expect SCs to be lost through differentiation as they shift along the continuum towards a Dll4/Notch/Pax7-low state.

      We observe a reduced number of Dll4/Pax7-high cells, which is consistent with a shift in the continuum. The counterpoint would be that Dll4/Notch/Pax7-high cells commit to differentiation. There is no evidence for that conclusion in this work or in any other work published to date. We discuss this issue in the Results section.

      We have also performed an experiment in which mice were treated with a lower dose of TMX to reduce, rather than delete, Dll4. We find that the total number of SCs does not change, while the relative number of Dll4/Pax7-high cells is reduced and the relative numbers of mid and low cells are increased (Figure 4). This is consistent with a shift in a continuum of states.

      Finally, the injury data provided are for 4 d post injury, and thus the data may represent a delay in regeneration as opposed to a failure to regenerate. At 30 d post injury, regeneration is typically considered complete. How do wild-type, Dll4-null and Mib-null muscles compare at 30 d post injury?

      We analyzed muscle regeneration of MF-Dll4fl/fl tissue 40 days after injury. The mean CSA of muscle fibers is significantly smaller than that of control fibers, suggesting a defect in tissue regeneration. This is now included in Figure 5-figure supplement 2. Due to time constraints, we have not performed the same experiment with Mib1 mutants.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The data presented here is, on the whole, descriptive. Whilst the descriptive elements are strong and important, more analysis and quantification is required to support the conclusions made in the paper. For example, in contrast to their analysis of the rail-MIP, their assertion that the ciliary vane orientation is linked to the CPC orientation is not backed up by quantification. In addition, this paper does not extensively discuss proteins within the MIP densities and central pair complex in detail, to the extent they can be discussed using the recent structures from Chlamydomonas.

      We thank the reviewer for pointing out these areas for improvement, which we have addressed. We are grateful for their helpful suggestions, which we have incorporated to the best of our ability to improve the quality of the manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      Wang et al. elegantly exploit single-cell RNA-seq datasets to question the putative involvement of lncRNAs in human germ cell development. In the first part of the study, the authors use computational approaches to identify and characterize, from existing data, lncRNAs expressed in the germline. Of note, the scRNA-seq data used were generated from polyA+ RNAs, and thus non-polyadenylated lncRNAs could not be retrieved. Most of the lncRNAs identified in the germ cells and in the somatic cells of the gonads were previously unannotated. While this increases the catalog of lncRNA genes in the human genome, further characterization is needed to determine which fraction of these newly identified lncRNAs represent bona fide transcripts or transcriptional noise.

      Differential expression analysis between developmental stages, sexes, or cell types led to several observations: (i) whatever the stage of development, the number of expressed lncRNAs is higher in fetal germ cells than in gonadal somatic cells; (ii) there is a continuous increase in the number of expressed lncRNAs during the development of the germline; of note, a similar, although more subtle, trend is observed for protein-coding genes; (iii) the developmental stage at which the highest number of lncRNAs is expressed differs between male and female germ cells. While convincing, the significance of these observations is difficult to assess. However, the authors remain prudent in their conclusions and do not over-interpret their findings.

      We appreciate Reviewer #2's precise summary of our analysis and thank the reviewer for highlighting the significance of these datasets for other researchers and future studies.

      Interestingly, integrating lncRNA expression to classify cell types led to the identification of a novel population of cells in the female germline that had not been revealed by classification based on protein-coding genes only. The biological relevance of this population, which clusters with mitotic populations, remains to be demonstrated. Finally, by examining lncRNA biotypes, the authors could demonstrate an enrichment, in the germ cells, of the antisense head-to-head organization (relative to the nearby protein-coding gene) compared to other biotypes. Whether this differs from the general distribution of lncRNAs should be discussed.

      We analyzed the lncRNAs in the NONCODEv5 database (human genome). The XH type accounted for 21.73% of the intragenic lncRNA-mRNA pairs in NONCODEv5, which is lower than the 26.58% in fGC and 26.23% in mGC (Response Figure 1).

      Response Figure 1. Genomic distribution and biotypes of the lncRNAs in NONCODEv5 database and lncRNAs expressed in human gonad.

      In the second part of the manuscript, Wang et al. focus on one pair of divergent lncRNA-protein-coding genes (LNC1845-LHX8). To justify the choice of this particular pair, it would be informative to indicate its correlation score in Figure 3C. The existence of this transcript was validated using female fetal ovaries, and its function was addressed in late primordial germ cell-like cells (PGCLCs) derived from human embryonic stem cells (hESCs). The authors have used an admirable set of orthogonal approaches that led them to conclude that LNC1845 regulates the nearby gene LHX8 in cis. They then went on to identify the underlying mechanism, which involves modification of the chromatin landscape through direct interaction of LNC1845 with a histone modifier. Among the different strategies used (KO, stop transcription, overexpression), shRNA-mediated knock-down is the only one that specifically addresses the function of the transcript itself, as opposed to active transcription. The result of this experiment led the authors to conclude that the LNC1845 RNA is functional, a conclusion that is reinforced by the demonstration of a physical interaction between the LNC1845 RNA and WDR5, a component of MLL methyltransferase complexes. The result of the KD experiment is, however, puzzling, as RNAi has been shown not to be the method of choice for targeting nuclear lncRNAs (Lennox et al. NAR 2016).

      We thank Reviewer #2 for the suggestion to add the correlation score of the LNC1845-LHX8 pair; the Pearson correlation of this pair is 0.3268. We have added this value to Figure 4C, where the expression correlation of LNC1845 and LHX8 is first mentioned. Having compared many similar studies, we note that shRNA knockdown has been widely used to target nuclear lncRNAs (Guttman et al. Nature 2011; Luo et al. Cell Stem Cell 2016; Subhash et al. Nucleic Acids Res. 2018; Li et al. Genome Res 2021), and the knockdown efficiency appeared feasible and acceptable for this purpose. The knockdown results are consistent with the deletion mutation and stop transcription approaches: all three show that LNC1845 transcriptional expression is required for proper LHX8 expression in late PGCLCs.

      Overall, the functional investigation is convincing and strengthened by the inclusion of multiple clones for each approach, and by the convergence in the outcome of each individual approach. The depth of characterization is also remarkable. The analyses of the underlying mechanisms are somewhat less solid, as there is less evidence demonstrating the involvement of the LNC1845 RNA and its interaction with WDR5.

      We have added more experimental evidence to strengthen the model, especially regarding the interaction of LNC1845 and WDR5. In addition to the WDR5 RIP-qPCR results demonstrating enrichment of LNC1845 in the WDR5 pulldown (Figure S8D), we performed a chromatin isolation by RNA purification (ChIRP) assay using antisense oligos spanning the entire LNC1845 transcript sequence. The ChIRP results confirmed that WDR5 protein was enriched when anti-LNC1845 oligo probes were used to isolate the complex, but not in the controls without probes or without overexpression of the LNC1845 transcript (Response Figure 2). Taken together, the findings of both approaches support the model that LNC1845 directly interacts with WDR5 to modulate the H3K4me3 modification for LHX8 transcriptional activation. (Related to Supplementary Figure 8D and 8E.)

      Response Figure 2. LNC1845 binding to WDR5 was verified by ChIRP-western blot.

      Altogether, this study provides a convincing demonstration of the role of a lncRNA on the regulation of a nearby gene in the context of the germline. However, to have a better understanding of the functionality of lncRNA genes in general, it would be interesting to know whether other pairs of lncRNA-PC genes have been functionally investigated in this context, where no function for the lncRNA gene could be demonstrated. Negative results are highly informative and if so, these could be included in the manuscript.

      We appreciate Reviewer #2's suggestion to add results for other lncRNA-PC gene pairs. In fact, we have analyzed and presented the results for two other pairs in Figure 7D. The lncRNAs LNC3346 and LNC15266 were also transcriptionally regulated by FOXP3, and they may regulate their neighboring genes TMCO1 and MPP5, as shown in Figure 7D. Our analysis showed that other lncRNA-PC gene pairs may also be subject to transcriptional regulation similar to that of LNC1845-LHX8 during germ cell development.

    1. Author Response

      Reviewer #1 (Public Review):

      Einarsson et al have produced CAGE data from EBV-immortalised lymphoblastoid cells from more than a hundred individuals from two genetically diverse African populations (YRI and LWK), and used it to study how sequence variation affects the activity of promoters at the level of expression variability and at the level of transcription start site usage within promoters across individuals.

      The dataset is very exciting, and the analyses were performed carefully and described well. The results show that promoters in the genome vary a lot with respect to their expression variability across individuals and that their level of variability is closely associated with their biological function and their sequence and architectural features. These results are often confirmatory - it is well established that promoters have different architectures associated with different sequence elements, different types of gene regulation and even differences across individual cells. In general, the multifarious observations boil down to one key distinction:

      • Regulated genes have promoters that look and act differently from those of housekeeping genes.

      We are pleased that the reviewer is as excited as we are about the unique dataset, the rigorous analyses performed, and the biological results. While we agree that housekeeping and regulated genes show apparent differences in terms of promoter variability, our analyses were not informed or guided by the expression of promoters/genes across cell types or tissues, but rather by their variability within the same cell type across individuals. It is indeed interesting that the same underlying mechanisms that cause stable expression across cell types also attenuate variability across individuals. Similarly, promoters that display cell-type-restricted (regulated) expression also tend to be more variable within the same cell type across individuals. While one may argue that these relationships are unsurprising, they have, to the best of our knowledge, not been demonstrated before. Of note, however, while most low variable promoters regulate housekeeping genes and highly variable ones regulate regulatory genes, this is not always the case.

      While this is unsurprising, the authors then proceed to analyse other underlying differences between low variability (mostly housekeeping) and high variability (overwhelmingly regulated) promoters. Several observations have alternative and sometimes more elegant explanations if some of the previously worked out properties of housekeeping vs regulated promoters are taken into consideration:

      • The authors are keen to interpret the architectural features of ubiquitously expressed (housekeeping) promoters as selected for robustness against mutations in ensuring stable and steady expression levels.

      However, there are some known facts about both housekeeping and regulated promoters that make alternative interpretations plausible.

      • When discussing broad promoters, the authors disregard the well known fact that the most commonly used transcription start positions are those with YR sequence at (-1,+1) position. Any mutation within the span of broad promoter cluster that removes an existing YR or introduces a new one has the capacity to change both the TSS distribution pattern and overall level of expression of that promoter - but only slightly. This way, broad promoters can be viewed as adaptation not for robustness but for ability to take many mutations with small effect size that will drive any positive selection smoothly across a changing fitness landscape.

      We thank the reviewer for these remarks. We fully agree with the scenario described by the reviewer, that disruptions of TSSs may have different consequences depending on whether they occur in a broad promoter with multiple YR sequences or in a sharp promoter. However, we argue that the observation that promoters containing such flexible TSSs are only mildly affected by genetic perturbations reveals robustness. By definition, robustness is the ability to produce a persistent phenotype (in our case, the molecular phenotype of promoter expression) even under perturbed conditions (e.g., under the influence of natural genetic variation affecting TSS usage). The very fact that TSS disruptions have only small effect sizes in certain promoters but not in others tells us that the unaffected or only mildly affected promoters have architectural properties that minimize the effect sizes of these disruptions and thereby confer robustness in overall promoter expression. Hence, we do not see our explanation and that of the reviewer as contradicting each other.

      • Indeed, the main property of low variability promoters is that there isn't a single nucleotide change (either substitution or indel) that can substantially change their activity. (In that they are clearly different from e.g. TATA-dependent promoters, where one change can abolish TBP binding or deprive the promoter of a YR dinucleotide at a suitable distance from the TATA box.) This is achieved by their dependence on broad and weak sequence signatures such as GC composition and nucleosome positioning signal. However, most such genes are not known to have a strict requirement for dosage control. On the contrary, dosage seems to be much more critical for the functional classes that in the authors' analysis show variable expression.

      • Whether it is a removal of YR dinucleotide, introduction of a new one, or the change of nucleosome positioning, it seems that the transcription level from housekeeping, low variability promoters is unaffected, or at least affected mildly enough that it is not within the statistical power of the CAGE data across different individuals to detect the difference. Rather than robustness, it can be interpreted as competition - the architecture recruits preinitiation complex at a fairly constant rate, and it is the different YR positions that "compete" for serving as transcription initiation position, with the CAGE signal reflecting the relative effectiveness of each position in that competition. If one of the YR dinucleotides is removed, often the other, neighbouring ones will be used instead. The same might happen for potential multiple nucleosome positioning signals - if one becomes less efficient at stopping a nucleosome, another will be used more often.

      • The fact that decomposed parts of housekeeping promoters add up to approximately the same expression level across individuals, even when they are uncorrelated, suggests that they might actually be anticorrelated; indeed, in the UFSP2 plot in Figure 4E the two decomposed promoters look anticorrelated. That would argue against the independence of the decomposed promoters; indeed, it may again point to "competition", where decreased use of one simply shifts most initiation events to the other.

      We thank the reviewer for these thoughts. The reviewer has made an excellent observation regarding the correlation between decomposed promoters within low variable promoters. While decomposed promoter pairs of highly variable promoters frequently have correlated expression levels, low variable multi-modal promoters often contain decomposed promoters that have low or even negative expression correlation across individuals. We agree that negative correlation points to the possibility that these decomposed promoters are competing for the transcriptional machinery. Indeed, nucleosome positioning analysis (described below) suggests the existence of diverse configurations of chromatin accessibility within low variable multi-modal promoters with low or negatively correlated decomposed promoters. This may indicate competition between their decomposed promoters. We have revised the manuscript to better reflect this aspect, discussed the potential for YR shifts encoded within the promoter sequence, and toned down our claims about the independence of decomposed promoters. However, regardless of whether decomposed promoters are independent (low correlation) or competing for the transcriptional machinery (negative correlation), we do not agree that this contradicts our conclusion of robustness. Competition between decomposed promoters within a low variable multi-modal promoter would favor the strongest decomposed promoter. If the strongest decomposed promoter is affected by a genetic perturbation (for instance, through disruption of YRs or proximal TF binding), this will affect the competition and shift the dominant usage to another decomposed promoter, as suggested by the frQTL analysis. The result is minimal change in total promoter expression, i.e., a robust molecular phenotype.
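      The compensation argument above rests on a standard identity for the variance of a sum: Var(A + B) = Var(A) + Var(B) + 2 Cov(A, B). When two decomposed promoters are negatively correlated across individuals, the covariance term offsets their individual variances, so total promoter expression varies less than either component. The sketch below illustrates that arithmetic. The expression values are made up and are not data from this study, and the helper names (pop_covariance, total_variance) are hypothetical.

```python
from statistics import mean, pvariance


def pop_covariance(x, y):
    """Population covariance of two equal-length sequences (divides by n, matching pvariance)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def total_variance(a, b):
    """Return Var(a + b) computed directly and via Var(a) + Var(b) + 2*Cov(a, b)."""
    totals = [x + y for x, y in zip(a, b)]
    direct = pvariance(totals)
    decomposed = pvariance(a) + pvariance(b) + 2 * pop_covariance(a, b)
    return direct, decomposed


# Hypothetical expression of two decomposed promoters across five individuals.
# "Competing": initiation shifts from one sub-peak to the other, so every total is 20.
competing_a = [10, 12, 8, 14, 6]
competing_b = [10, 8, 12, 6, 14]
# "Co-regulated": both sub-peaks move together (e.g. a shared external input).
correlated_a = [10, 12, 8, 14, 6]
correlated_b = [10, 12, 8, 14, 6]

if __name__ == "__main__":
    for label, (a, b) in {
        "competing": (competing_a, competing_b),
        "correlated": (correlated_a, correlated_b),
    }.items():
        direct, decomposed = total_variance(a, b)
        print(label, "Var(a)+Var(b) =", pvariance(a) + pvariance(b),
              "Var(a+b) =", direct, "via identity =", decomposed)
```

      Each component has the same variance in both scenarios. Only the sign of the covariance decides whether the total is flat (competing) or amplified (co-regulated).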

      • In general, not everything is a result of direct evolutionary selection, and what is should bear clear landmarks of purifying selection. On the contrary, promoters, especially housekeeping promoters, have vastly different nucleotide and dinucleotide compositions across Metazoa, both at large and at relatively short distances. This means they can undergo concerted evolution as a group, and therefore they should be "robust" to mutations in a way that allows them to change much more, and more rapidly, than some other promoter architectures. This applies especially to TATA-dependent architectures, whose key elements and the spacing between them have not substantially changed for more than a billion years, and possibly longer.

      We fully agree with the reviewer and have revised the manuscript to remove the evolutionary aspect of robustness. We believe our results are better interpreted with regards to the existence of inherent mechanisms of low variable multi-modal promoters to provide regulatory robustness. Indeed, the vastly different sequence composition of housekeeping promoters between species makes these properties even more interesting. We do not believe that the robustness for perturbations need to be encoded by a specific sequence signature. Rather, we observe that multimodal promoters with low variability require broad initiation regions and a flexibility in the usage of TSSs. This fits well with observations in flies (Schor et al, 2017, DOI: 10.1038/ng.3791) of shifts in the shape of the promoter, which we believe to reflect shifts in decomposed promoter usage, upon genetic perturbation.

      • While housekeeping promoters are broad but mostly not among the broadest, regulated promoters can be either broad or narrow. This is also known - while narrow promoters are overrepresented for tissue-specific and non-CGI promoters, promoters of Polycomb-bound developmental genes are often broad and have large CpG islands; the latter may account for some of the broadest CAGE clusters observed in the data. It would be an interesting finding if both TATA-dependent and developmental promoters were found to be variable across individuals in a non-trivial way (the trivial way being the variability due to larger dynamic range of their expression - e.g. the expression of SIX3 in many cell types is basically zero, while the dynamic range of RPL26L1 is very limited) - this should be checked by analysing them separately.

      We agree that an analysis of the variability of developmental, Polycomb-bound promoters would be very interesting and thank the reviewer for ideas for a follow-up study. We do not feel that LCLs are the best model system for analyzing developmental promoters and therefore argue that this is out of scope in this study.

      • While broad promoters can be decomposed into subclusters with differential expression across individuals, the authors do not seem to allow for the decomposition of intertwined TSS positions within the cluster, but rather postulate hard boundaries between subclusters. This is different from e.g. overlapping maternal and zygotic promoter use (Haberle et al Nature 2014), where the distribution of the used TSS positions is different but the clusters can overlap.

      This is correct; we do not allow for overlapping decomposed promoters. We agree that the work by Haberle et al. (2014, DOI: 10.1038/nature12974) on switches between maternal and zygotic TSSs is an excellent demonstration of how intertwined promoters can occur and be of biological relevance. Our analysis is based on the observation that low variable promoters are often multimodal and cannot be well explained by promoter width alone. This led us to decompose multimodal promoters into their sub-peak constituents. We believe that the frQTL analysis and the new decomposed promoter QTL (dprQTL) analysis clearly demonstrate the value of our approach. While it would indeed be interesting to see the results of an alternative approach to decomposition, we feel this is out of scope for this study. We acknowledge, however, that additional determinants of promoter variability may be discovered using alternative strategies.

      • Both Dreos et al (PLOS Comp Biol 2016) and Haberle et al. (2014) show that one stable element of a broad promoter is the positioning signal of its first downstream nucleosome. As seen very convincingly in both Drosophila and zebrafish, the dominant TSS position of the broad promoter is highly predictive of the position of first downstream nucleosome and its underlying positioning sequence, and the most plausible interpretation is that there is an "optimal" distance from nucleosome for transcriptional initiation, resulting in the dominant (i.e. most often used) TSS position. In mammals, broad promoters are even broader than in those two species and might have multiple nucleosome positioning signals they can use. In such cases, mutations in one of the nucleosome positioning signals, or indels changing the spacing between the nucleosome and the part of sequence that contains TSS, might lead to differential use of one nucleosome signal vs other. This would be compatible with the authors' observations in low variability promoters that decomposed promoters are used to different extents in different individuals.

      We thank the reviewer for this excellent suggestion. In the revised manuscript, we have analyzed both the preferred distance between the dominant TSS and the downstream (+1) nucleosome and the positional fuzziness of that nucleosome. We observe a clear separation between low variable multimodal promoters with highly correlated decomposed promoters and those with low correlated decomposed promoters. Interestingly, those with low correlated decomposed promoters show much less restrictive +1 nucleosome positioning with higher fuzziness. This contrasts with what we would expect from broad CGI promoters, which have a reported fixed +1 nucleosome position. While this may be unexpected, it fits well with a model in which a flexible nucleosome positioning architecture allows differential usage of decomposed promoters. Our results suggest that an array of underlying nucleosome positioning configurations exists for these promoters across single cells, which causes fuzzy nucleosome positioning and may allow for competition between initiation sites, which provide robustness through their compensatory usage. Interestingly, we find that these results are consistent when analyzing the relationship between transcription initiation and nucleosome positioning within a single individual. This suggests that there is an inherent mechanism of flexibility in TSS usage in these robust promoters even when there is no differential influence of genetic variants. However, the extent to which TSS preference is affected by nucleosome positioning, or whether nucleosome positioning reflects TSS usage, remains unclear. We believe these results further strengthen our general conclusions and thank the reviewer for this constructive suggestion for a new analysis.

      • If we were to look for sources of difference other than the actual sequence architecture, some differences between regulated and unregulated promoters can be explained by the key difference: the regulation of regulated genes comes from outside the core promoter; the regulation of housekeeping genes is largely dependent on the intrinsic activity of the core promoter itself. This way, for example, in the absence of a causative variant in the promoter itself, the observed variability in the SIX3 promoter might not be encoded in the promoter itself - instead, enhancer responsiveness might be encoded in the promoter, and the variability itself could be due to an enhancer that can be hundreds of kilobases away. Such a scenario combined with a broad promoter would likely result in decomposed promoters that are highly correlated across individuals - because they are both externally controlled by the same regulatory inputs.

      These thoughts are very much in line with our own ideas on how enhancers may influence expression variation. Here, we aimed to investigate variability from a promoter perspective and we are confident that we observe several promoter features associated with low variability. Describing these, we agree that it is important to speculate also on the added contributions by distal elements. We now acknowledge the likely added contribution by enhancers in the Discussion:

      “The promoter sequence may also encode a promoter’s intrinsic enhancer responsiveness (Arnold et al., 2017), which may influence its expression variability. Although current data cannot distinguish between direct or secondary effects, an increased variability mediated via enhancers is supported by a higher dependency on enhancer-promoter interactions for cell-type specific genes compared to housekeeping genes (Furlong and Levine, 2018; Schoenfelder and Fraser, 2019). However, compatibility differences between human promoter classes and enhancers only result in subtle effects in vitro (Bergman et al., 2022), suggesting that measurable promoter variability is likely a result of both intrinsic promoter variability and additive or synergistic contributions from enhancers. Directly modeling the influence and context-dependency of enhancers on promoter variability would therefore be important to further characterize regulatory features that may amplify gene expression variability.”

      Reviewer #2 (Public Review):

      This manuscript by Einarsson and colleagues in the Andersson lab examined how genetic variability across a population impacts both gene expression and promoter architecture in a human population. The authors generate new CAGE data in 108 lymphoblastoid cell lines (LCLs). The authors' analysis is focused on defining how DNA sequence and promoter architecture correlate with population-variation in expression across this cohort. In general, there is a lot that I like about this manuscript: The dataset will be an extremely valuable resource for the genomics community. Furthermore, the biological findings are often thoughtful and potentially interesting and significant for the community. The analysis is generally very strong and is clearly conducted by a lab that has a lot of expertise in this area. My main concerns are centered around the often unwarranted implication that DNA sequence or promoter features cause differences in variation at different genes.

      We are pleased that the reviewer is as excited as we are about the unique dataset, the rigorous analyses performed, and the biological results. In our revised manuscript we have followed the recommendations by the reviewer and:

      ● Toned down implied causal relationships and added additional interpretations to our results, including YR positional preferences

      ● Performed additional analyses on nucleosome positioning of low variable promoters, as well as genetic association testing for decomposed promoter expression

      In all, we believe these revisions substantially improved our manuscript and even strengthened our previous conclusions.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript "Centrally expressed Cav3.2 T-type calcium channel is critical for the initiation and maintenance of neuropathic pain" identifies a subset of parvalbumin-expressing GABAergic neurons in the anterior pretectum (APT) that co-express Cav3.2 T-type calcium channels. The firing frequency and burst patterns are potentiated in these neurons following spared nerve injury (SNI) and the development of neuropathic pain. Deletion of the channels in these cells reduced both the development and maintenance of mechanical and cold allodynia. Studies show nice co-expression of PV and GFP in the Cav3.2eGFP-flox KI mouse line.

      Multi-unit recordings from PV-Cre X Ai32 mice show that PV neurons in the APT are fast-spiking and that the mean firing rate and frequency of spikes in bursts are potentiated in SNI animals. The graphs in Fig. 2, panel F show compiled data of 18-20 cells from 6-8 animals depending on the treatment. The statistical design for the in vivo experiments (and indeed for all of the studies) is not clearly stated, including degrees of freedom. It is important to know if recordings from a single animal are considered independent observations, and if so, what the rationale for that is. This information should be included in the Quantification and Statistical Analysis section. In addition, it would be interesting to determine if T-type calcium channel blockers can reverse this behavior in these recordings.

      We considered each unit as an independent observation. We did so since the number of recorded PV+-units per animal (identified with the PINP method) was small and varied greatly between animals, from 2 to 6 units. We are not aware of statistical methods using a nested design for multiple cells in animals that could be used in such conditions.

      Since the measured variables did not follow normal distributions, we performed unpaired comparisons with the Wilcoxon rank-sum test. This is now stated in the section ‘quantification and statistical analysis’ (page 19, line 683). P-values are now included in the figures and the result section when appropriate.
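      For readers less familiar with the test named above, the following is a minimal sketch of a two-sided Wilcoxon rank-sum comparison using the large-sample normal approximation, with tied values given average ranks and no tie or continuity correction. It treats every unit as an independent observation, mirroring the design described in the response, and therefore does not model the nesting of units within animals. The function names and the toy values are hypothetical and are not data from the manuscript.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based slots i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation). Returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return z, p


# Hypothetical firing rates (arbitrary units) for two groups of units.
z, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(z, 3), round(p, 4))  # z ≈ -1.964, p ≈ 0.0495
```

With samples as small as a few units per animal, exact permutation versions of the test are often preferred; the normal approximation is shown here only because it is compact.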

      In vitro electrophysiological studies show that the PV-expressing APT neurons exhibit fast-spiking to depolarization and single-cell RT-PCR shows that Cav3.2 is expressed in APT neurons that also express GABA. These cells show an after-hyperpolarization burst of APs that is reduced by blockers of Cav3.2 channels. There are no statistics displayed on panels C-E in Fig. 3, although they are reported in the text. Again, the test used and degrees of freedom, etc. should also be reported as it allows for evaluation of the experimental design.

      We apologize for the lack of statistics in Fig. 3 (now Fig. 4). Statistics are now clearly presented on each figure panel and the statistical tests are stated in the figure legends and in the results (page 4, line 144-159).

      As it is now stated in the “Quantification and Analysis section” (page 19, line 682), each neuron was considered as an independent observation since 1 to 3 neurons were recorded per mouse. The number of mice and the mean number of neurons per mouse are indicated in the data-set for each experimental condition in order to allow for a clear evaluation of the experimental design. Note that in the experiments with application of T-channel antagonist, only one neuron was recorded per slice. This is now specified in the Method Details (page 17, line 579).

      It is also noted in the Discussion (lines 185-186) that "Our in vitro data indicate that 92% of APT-PV+ neurons are able to discharge bursts of action potentials at high frequency underpinned by a large transient depolarization due to the activation of T channels." It would be clearer to refer specifically to the rebound, as the figure shows both the fast-spiking properties in response to depolarization and the transient depolarization underlying the rebound, but an effect of Cav3.2 only on the rebound.

      We agree and have changed the sentence accordingly (page 5, line 202).

      Behavioral studies of mechanical and cold allodynia in male and female naïve and SNI-treated KI and KO mice were performed. These results show a clear contribution of the Cav3.2 channels in APT in both the development and maintenance of neuropathic pain. Again, the statistical design is not clearly defined and it is extremely difficult to resolve what comparisons are delineated in panels B-E of Fig. 4.

      We fully agree with the reviewer that the rationale for the choice of statistical tests used to analyze the behavioral data was lacking. We have rewritten the relevant paragraph in the Quantification and Analysis section (page 19, lines 699-714). The statistical results presented in the Fig 4 and its supplemental figure (now Fig 5 and Figure 5 – Figure supplement 1) are now clearly stated in the legends.

      Reviewer #3 (Public Review):

      The authors used state-of-the-art techniques to investigate the role of centrally located (GABAergic APT neurons) CaV3.2 isoform of T-channels in an animal model of neuropathic pain using the spared nerve injury model. This is generally an excellent and very rigorous study. The data is very compelling and it is likely going to have a major impact in the field of ion channels and pain transmission. The data presentation is superb and major conclusions are highly justified. Major strengths include the use of powerful complementary techniques such as molecular (single-cell PCR), mouse genetics, and pain testing in vivo, as well as sophisticated ex vivo (slice physiology) and in vivo recordings (burst analysis using tetrodes). This study may explain recent clinical studies that failed to show the efficacy of peripherally acting Cav3.2 channel blockers in patients with neuropathic pain. Hence, this study has the potential to change the focus from peripheral to supraspinal Cav3.2 channels in various pain pathologies.

      Some moderate weaknesses are identified and should be addressed:

      1) The data showing the effect of T-channel deletion on the excitability of GABAergic neurons of APT is very convincing. However, what is missing is a discussion of how changes in the excitability of inhibitory APT neurons impact the circuitry that is involved. Without knowing the circuitry involved, one could speculate that blocking inhibitory drive may do just the opposite effect of what is proposed and increase hyperalgesia.

      We agree with the reviewer that discussing this issue is essential and it has now been added (page 6, lines 270-295).

      2) Methods should clearly state if any experiments were done in a blinded fashion.

      Behavioral experiments were performed blind. This has been added in the method section (page 18, line 624). For in vivo electrophysiological experiments, we cannot say that we performed blind experiments (although we tried). Indeed, under anesthesia, the forelimb of SNI animals presents a slight but observable withdrawal.

      3) There is no mention anywhere of how selective Cav3.2 knock-out was achieved, nor of how it was assessed. It would be very helpful if the authors could perform recordings of T-channel amplitudes in sham animals, animals after SNI and after selective knock-out in the SNI group.

      The efficiency of the Cav3.2 deletion after Cre virus injection was assessed by immunolabeling of GFP in APT slices. As shown in Figure 5 – Figure supplement 1, we checked that unilateral injection of AAV8-hSyn-Cre-mCherry virus induced a drastic reduction in the number of GFP+ neurons when compared to the non-injected hemisphere. The absence of Cav3.2 expression in Cre injected APTs was systematically checked in each mouse at the end of the behavioral tests (Figure 5A). This is now added in the Method Details (page 18, line 655).

      4) It should be discussed that global Cav3.2 knockout animals had only a minimal neuropathic pain phenotype (Choi et al., 2007).

      This point is now discussed (page 6, line 251).

    1. Author Response

      Reviewer #3 (Public Review):

      Dingus et al. have developed an innovative and powerful approach for improving the intracellular stability of nanobodies. Nanobodies are single chain antibodies that are typically generated in select species such as llamas or alpacas. Because nanobodies are secreted and are present in general in the extracellular environment, they often become unstable when expressed in the reduced intracellular environment. Dingus et al. investigated 75 nanobodies from the Protein Data Bank and found that 42 were unstable when expressed intracellularly. In order to improve stability of these nanobodies, they first determined consensus residues that were present within the framework region, which does not include the CDR regions, in over 80% of the stable nanobodies. Mutating residues within the framework of unstable nanobodies to match consensus residues in the stable nanobodies stabilized 26 of 42 nanobodies. Mutating consensus unstable residues stabilized another 11. Thus 37/42 unstable nanobodies were stabilized using this mutational approach. Further experiments provided evidence that some of the stabilized nanobodies still had some affinity for their targets. Furthermore, one stabilized nanobody was stable when expressed in the retina in vivo and 3 of 5 were stable when expressed in bacteria.

      1) This study provides a straightforward approach to improving the intracellular stability of nanobodies that could prove to be very useful for solving a common and vexing problem.

      Thanks!

      2) From the data provided, it was difficult to determine whether the binding affinity of the mutated nanobodies had been diminished by the mutations that increased stability, and if so, by how much. Furthermore, target binding affinity was assessed for just 5 nanobodies, which calls into question whether this strategy will be useful.

      It is the case that we are unable to guarantee that any nanobody stabilized by our consensus-based approach will retain full target-binding affinity. It is additionally not guaranteed that a given nanobody will be able to bind its target in cells in the absence of any mutagenesis, as paratope structure may be influenced/compromised in the intracellular environment. We are additionally limited in what we can test intracellularly, as the majority of current nanobodies target extracellular factors that cannot be effectively expressed intracellularly. What we provide is a rationale for limited impact on target binding via partial consensus mutagenesis, which excludes from mutagenesis the highly variable framework positions that are most likely to contribute directly to binding. While our approach to generalizable intracellular stabilization may not be perfect for every nanobody, we believe it is likely to be a simple and useful approach in a variety of cases, and likely the majority of cases.
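      The consensus procedure summarized by Reviewer #3 (framework residues present in over 80% of stable nanobodies) and the partial-consensus rationale above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the input strings are already aligned and restricted to framework columns (CDRs removed), treats the 80% cut-off as the only filter, and does not model the separate step of mutating consensus unstable residues. The function names and the six-residue toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def consensus_positions(aligned: Sequence[str], threshold: float = 0.8) -> Dict[int, str]:
    """Map column -> residue for columns where one residue exceeds the threshold.

    Variable columns fall below the cut-off and are left out, so they are never
    proposed for mutagenesis. Gap columns ('-') never yield a consensus.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = {}
    for col in range(length):
        residue, count = Counter(s[col] for s in aligned).most_common(1)[0]
        if residue != "-" and count / len(aligned) > threshold:
            consensus[col] = residue
    return consensus


def propose_mutations(target: str, consensus: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """List (column, current residue, consensus residue) where the target differs."""
    return [(col, target[col], res) for col, res in sorted(consensus.items())
            if target[col] != res]


# Hypothetical aligned framework fragments from "stable" nanobodies.
stable = ["QVQLVE", "QVQLVE", "QVKLAE", "QVQLAE", "QVQLVE", "EVQLSE"]
cons = consensus_positions(stable)
print(cons)                               # column 4 is too variable and is excluded
print(propose_mutations("EVKLAQ", cons))  # [(0, 'E', 'Q'), (2, 'K', 'Q'), (5, 'Q', 'E')]
```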

      3) Ultimately, the goal of expressing most nanobodies intracellularly is to bind to endogenous targets. It is difficult to assess how useful the stabilization strategy will be since it was not determined whether any of the stabilized nanobodies could bind their endogenous targets intracellularly.

      We are limited in the number of intracellular targets we are currently able to test, as most current nanobodies target extracellular antigens. Endogenous intracellular targets are even more limited. However, we agree that targeting endogenous targets is ultimately the goal. We have included an in vivo experiment against the endogenous target GFAP in our revised manuscript, where we show that binding is preserved following mutagenesis (Figure 6).

    1. Author Response

      Reviewer #1 (Public Review):

      Kohler and Murray present high-throughput image-based measurements of how low-copy F plasmids move (segregate) inside E. coli cells. This active segregation ensures that each daughter cell inherits an equal share of the plasmids. Previous work by different labs has shown that faithful F-plasmid segregation (as well as segregation of many other low-copy plasmids, segregation of chromosomes in many bacterial species and segregation of some supramolecular complexes) requires ParA and ParB proteins (or proteins similar to them) and is achieved by an active transport mechanism. ParB is known to bind to the cargo (plasmid), and ParA forms a dimer upon ATP binding that binds to DNA (chromosome) non-specifically and can also bind to ParB (associated with cargo). After ATP hydrolysis (stimulated by the interaction with ParB), the ParA dimer dissociates into monomers and detaches from ParB and the chromosome. While different mechanisms of ParA-dependent active transport have been proposed, two have recently become most popular: one based on the elastic dynamics of the chromatin (Lim et al. eLife 2014, Surovtsev PNAS 2016, Hu et al Biophys.J 2017, Schumacher Dev.Cell 2017) and the other based on a theoretically derived "chemophoretic" force (Sugawara & Kaneko Biophysics 2011, Walter et al. Phys.Rev.Lett. 2017).

      It is a minor comment, but we would like to point out that we do not consider these two model types as alternatives but rather as models with different levels of coarse-graining. Our interest is in the molecular-level (stochastic) models (Lim et al. eLife 2014, Surovtsev PNAS 2016, Hu et al PNAS 2015, Hu et al Biophys.J 2017, Schumacher Dev.Cell 2017).

      The authors start by following the motion of F plasmids in cells with one or two plasmids and by analyzing plasmid spatial distribution, plasmid displacement (referred to as velocity) as a function of relative position, and autocorrelations of the position and the displacement. They concluded that these metrics are consistent with 'true positioning' (i.e. average displacement is biased toward the target position: the center for one plasmid and the 1/4 and 3/4 positions for two plasmids) but not with 'approximate positioning' (i.e. when the plasmid moves around the target position, for example, in near-oscillatory fashion). This 'true positioning' can be described as a particle moving on an overdamped spring. They reproduce this behavior by expanding the previous model of the 'DNA-relay' mechanism (Lim et al. eLife 2014, Surovtsev PNAS 2016), in which the plasmid is actively moved by the elastic force from the chromosome and ParA serves to transmit this force from the chromosome to the plasmid. Now, the authors explicitly consider in the model that chromosome-bound ParA can diffuse (which the authors refer to as 'hopping'), and this allows the model to achieve 'true plasmid positioning' for some combinations of model parameters, in addition to the oscillatory dynamics reported in the original paper (Surovtsev PNAS 2016).
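      The 'overdamped spring' picture and the displacement-versus-position metric described above can be illustrated with a generic sketch. It simulates an Ornstein-Uhlenbeck process, $dx = -kx\,dt + \sqrt{2D}\,dW$, with the Euler-Maruyama scheme, where $x$ is the position relative to the target, and then fits the least-squares slope of the one-step displacement against position. A restoring bias toward the target gives a slope near $-k\,dt$, while free diffusion ($k = 0$) gives a slope near zero. This is not the authors' ParABS model, and the parameter values are hypothetical and in arbitrary units.

```python
import random
from typing import List


def simulate_overdamped_spring(k: float, D: float, dt: float, steps: int,
                               seed: int = 0) -> List[float]:
    """Euler-Maruyama trajectory of dx = -k x dt + sqrt(2 D) dW, starting at the target x = 0."""
    rng = random.Random(seed)
    noise_scale = (2.0 * D * dt) ** 0.5
    x = 0.0
    xs = [x]
    for _ in range(steps):
        x += -k * x * dt + noise_scale * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def displacement_slope(xs: List[float]) -> float:
    """Least-squares slope of the displacement x[t+1] - x[t] against the position x[t]."""
    pos = xs[:-1]
    disp = [b - a for a, b in zip(xs[:-1], xs[1:])]
    n = len(pos)
    mean_pos = sum(pos) / n
    mean_disp = sum(disp) / n
    cov = sum((p - mean_pos) * (d - mean_disp) for p, d in zip(pos, disp))
    var = sum((p - mean_pos) ** 2 for p in pos)
    return cov / var


positioned = simulate_overdamped_spring(k=1.0, D=0.01, dt=0.01, steps=100_000)
free = simulate_overdamped_spring(k=0.0, D=0.01, dt=0.01, steps=100_000)
print(displacement_slope(positioned))  # expected near -k*dt = -0.01 (bias toward target)
print(displacement_slope(free))        # expected near 0 (no positional bias)
```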

      Based on their computational model, the authors proposed that two parameters, the diffusion scale of ParA, $\lambda = 2(2D_h/k_d)^{1/2}/L$ (the typical length diffused by ParA before dissociation), and the ratio of ParB-dependent to ParB-independent hydrolysis rates, $\epsilon = k_h/k_d$, are the key control parameters defining which qualitative behavior is observed: random diffusion, near-oscillatory behavior, or an overdamped spring ('true positioning'). They vary these two parameters over ~30-fold and ~200-fold ranges by changing $D_h$ and $k_h$, respectively, to illustrate how the dynamics of the system change between these three modes of motion. While these parameters clearly play an important role, the drawback is that the authors neither give a theoretical argument for why these parameters are truly governing nor show it by varying other model parameters ($k_h$, number of ParA $N_{ParA}$, spring constant of the chromosome $k$, diffusion coefficient of the plasmid $D_p$) to show that only these combinations define the type of system behavior. The authors' qualitative analysis of the importance of $\lambda$ relies on the steady-state solution of the diffusion equation for ParA. It is really unfortunate that the ParA distribution was not measured simultaneously with the plasmid motion, as this would allow experimental ParA profiles to be compared with the expected quasi-steady-state solutions.

      We spend almost an entire section and a figure explaining the theoretical reasoning behind the identification of the $\lambda=s/(L/2n)$ as an important system parameter (section “Hopping of ParA-ATP on the nucleoid as an explanation of regular positioning” and Figure 2) and predicted that regular positioning could only occur for $\lambda>1$. This was confirmed by parameter sweeps for the cases of 1 (Figure 3I) and multiple plasmids (Figure 5-figure supplement 1), indicating that $\lambda$ is indeed an important system parameter and that our conceptual understanding of this aspect of the system is correct. This point has now been made clearer.

      However, we agree that the reasoning for $\epsilon$ (varied through the hydrolysis rate $k_h$) was not clear. It was chosen to allow us to modulate the ParA concentration at the plasmid compared to elsewhere, motivated by the differences between different ParABS systems. We originally had also considered a third quantity related to the number of nucleoid-bound ParA but we found that this had little effect on the nature of the dynamics. All three quantities describe how the timescale of a reaction/process (ParA hopping/diffusion across the nucleoid, ParB induced hydrolysis, ParA association to the nucleoid) compares to the timescale of basal hydrolysis, which we use as a reference timescale.
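      The two control parameters discussed above can be computed with a short sketch. It takes $\lambda = s/(L/2n)$ and $\epsilon = k_h/k_d$ from the response and assumes the diffusive length $s = \sqrt{2D_h/k_d}$, which is the choice that reproduces the reviewer's single-plasmid expression $\lambda = 2(2D_h/k_d)^{1/2}/L$. The check $\lambda > 1$ encodes only the stated necessary condition for regular positioning, not a full phase diagram. The function names and parameter values are hypothetical illustrations, not fitted values from the manuscript.

```python
import math


def diffusive_length(D_h: float, k_d: float) -> float:
    """Typical distance s a nucleoid-bound ParA dimer diffuses before basal dissociation.

    Assumed here to be s = sqrt(2 * D_h / k_d).
    """
    return math.sqrt(2.0 * D_h / k_d)


def control_parameters(D_h: float, k_d: float, k_h: float, L: float, n: int):
    """Return (lambda, epsilon) with lambda = s / (L / 2n) and epsilon = k_h / k_d."""
    s = diffusive_length(D_h, k_d)
    lam = s / (L / (2 * n))
    eps = k_h / k_d  # ParB-stimulated relative to basal hydrolysis
    return lam, eps


def regular_positioning_allowed(lam: float) -> bool:
    """Necessary (not sufficient) condition: regular positioning requires lambda > 1."""
    return lam > 1.0


# Hypothetical values: D_h in um^2/s, rates in 1/s, cell length L in um, one plasmid.
lam, eps = control_parameters(D_h=0.01, k_d=0.01, k_h=1.0, L=2.5, n=1)
print(round(lam, 3), round(eps, 1), regular_positioning_allowed(lam))  # 1.131 100.0 True
```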

      We have now made this clearer as well as adding supplementary figures showing the effect of varying other system parameters at several locations in the phase diagram (Figure 3-figure supplement 3 and 4). These sweeps justify our identification of $\epsilon$ and $\lambda$ as a useful/important set of quantities for determining the dynamics of the system.

      Additionally, we now add example kymographs showing the ParA distribution (Figure 3-figure supplement 2C).

      The authors also show by simulations that overdamped spring dynamics can transition into oscillatory behavior when $\lambda$ decreases, for example through cell growth. Indeed, they observed more oscillatory behavior when they compared single-plasmid dynamics in longer cells with those in shorter cells. This was not the case in double-plasmid cells, in perfect agreement with their analysis. They also calculated ATP consumption in the model and concluded that the system operates close to, but below (perhaps "above" should be used, as it refers to larger $\lambda$), the threshold of the oscillatory regime, which minimizes ATP consumption. While the ATP consumption analysis is very intriguing, this statement (Abstract Ln24-25) seems at odds with the authors' own analysis that another ParA-dependent plasmid system, pB171, operates mostly in the oscillatory regime, and it is actually for this regime that the authors' analysis suggests minimal ATP consumption (Fig. 8).

      To clarify, we found that pB171 (which in our hands has a copy number of 2-3 in the SR1 reduced-copy-number strain) is only clearly oscillatory in cells with a single plasmid (and only mildly so in cells with two plasmids). Otherwise, it behaves very similarly to F plasmid. We therefore believe that these two distantly related ParABS systems exhibit, overall, similar dynamics and differ only in how close the systems are to the threshold of oscillatory instability. This was not clear as we did not specify the copy number of pB171. We now provide this in Figure 7–figure supplement 1.

      We refer to these systems as lying just below, rather than above, the threshold of the oscillatory instability because, on average, plasmids do not oscillate but only do so in cells with the lowest plasmid concentration.
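      Because $\lambda = s/(L/2n)$, cell length $L$ and plasmid number $n$ enter only through the plasmid concentration, which is why the cells with the lowest plasmid concentration are the ones expected to leave the regular positioning regime first. A short rearrangement, assuming the diffusive length $s$ is the same in all cells:

$$\lambda = \frac{s}{L/(2n)} = \frac{2ns}{L} = 2s\,c, \qquad c \equiv \frac{n}{L},$$

$$\lambda > 1 \iff c > \frac{1}{2s}.$$

      Doubling the cell length at fixed $n$ halves $\lambda$, whereas a second plasmid in a cell of the same length doubles it, so long single-plasmid cells are the first to fall below the $\lambda > 1$ condition required for regular positioning.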

      I think the real strength of the paper is that it can potentially show that, if one considers that the intracellular cargo can be moved by the fluctuating chromosome via ParA-mediated attachments, then various dynamics can be achieved depending on combinations of several control parameters (plasmid diffusion coefficient, ParA diffusion coefficient, rate of hydrolysis and so on), including the previously reported 'oscillations' (Surovtsev PNAS 2016), 'local excursions' (Hu et al Biophys.J 2017) and 'true positioning' (Schumacher Dev.Cell 2017). The main drawback, in this reviewer's opinion, is that this is obscured by the current presentation and discussion of this work and of previous modelling work on ParA-dependent systems. For example, instead of using the "unifying" potential of the presented model, yet another name, 'relay and hopping', is used in addition to the previously used 'DNA-relay', 'Brownian ratchet', 'Flux-based positioning', …

      In the abstract and discussion, we already refer to developing a “unified” model (p1 L21, p15 L22 of the original manuscript) and in the discussion we explain how our model contains other models as limiting cases. But we agree with this recommendation - the unifying nature of our model is its main strength. We now emphasise this more.

      Regarding the model name, we felt obliged to refer to the previous named models (DNA-relay and Brownian ratchet) and simply gave our model a name to avoid confusion when making comparisons. We have now removed almost all mention of ‘hopping and relay’ and just refer to ‘our model’. However, our gitlab repository with the code must have a name and therefore is still called ‘Hopping and relay’ and so the same term is used in Table 3.

      … and it appears that the presented model is an alternative to these previously published work. And only in model description (in Methods section) one can find that the "... model is an extension of the previous DNA-relay model (Surovtsev et al., 2016a) that incorporates hopping and basal hydrolysis of ParA and uses analytic expressions for the fluctuations rather than a second order approximation"(p.17, ln15-17).

      We are sorry that this reviewer felt that the fact that our model is an extension of DNA relay is hidden in the methods. However, we wrote in the main text:

      “Motivated by the previous discussion, we decided to develop our own minimal molecular model (‘hopping and relay’) of ParABS positioning, taking the DNA relay model as a starting point … The original scheme is as follows… We supplemented this scheme with two additional components: diffusion (hopping) of DNA-bound ParA-ATP dimers across the nucleoid (with diffusion coefficient Dh, where the subscript indicates diffusion of the home position) and plasmid-independent ATP hydrolysis and dissociation (with rate kd). See Material and Methods for further details of the model. “

      We now make this clearer.

      However, we would argue that as models of the same system, there are naturally overlaps and the models of Hu et al and Schumacher et al could also be thought of as extensions of the DNA relay model.

      While it is of course the authors right to decide how to name their model, it should be explicitly clear to the reader what is a real conceptual difference between presented and previous models from the abstract, introduction and discussion section of the paper, not from the "fine-print" details in the supplementary materials.

      The main conceptual difference is that we have identified the importance of having a finite diffusive length scale for ParA diffusion/hopping on the nucleoid. This allows both oscillations and regular positioning to occur for biologically relevant parameter values and reproduces the length dependent transition from mid-cell positioning to confined oscillations that we observe for F plasmid. The DNA relay model does not have this behaviour as the ParA diffusive length scale is zero, while it is infinite in the models of Ietswaart et al 2014 and Schumacher et al 2017. The model of Hu et al 2017 does have a finite length scale, but the authors appear not to have realised its importance and never discovered the regular positioning regime at $\lambda > 1$. While we make these points in the discussion in the context of Figure 8A, where we compare our model to the others, we agree with this reviewer that we should have been more explicit in the abstract and introduction. We have now corrected this.

      This would avoid unnecessary confusion (especially for readers not directly involved in the modelling of the ParA/B system) and clarify that all these models rely on the elastic behavior of the fluctuating chromosome to drive active transport of the cargo. This reviewer believes that a more explicit discussion of the differences and similarities between the models (the authors' and previously published ones) will help our understanding of how ParA-dependent systems operate. This discussion should also include work on the PomXYZ system, in which it was shown that a similar dynamic system can lead to specific positioning within the cell (Schumacher Dev.Cell 2017, Kober et al. Biophys.J 2019). This will make it explicit that the model's results have a direct impact beyond ParA-dependent plasmid segregation.

      To further clarify the differences between the models (beyond the second and third sections of the main text and the discussion), we have now added a section to the methods and a new table (Table 3). We have also included the mentioned PomXYZ model. However, we would like to point out that this was not the first stochastic model to have ‘true’ positioning, as the reviewer's citation above suggests. Though it did not include the mechanism of force generation, the model of Ietswaart et al 2014 produces regularly positioned plasmids and is referenced repeatedly in Schumacher et al. 2017.

      I think that expanded parameter analysis, and explicit model comparison/discussion will make the contribution of this work to the field more clear and with the potential to advance our general understanding of how the same underlying mechanism can lead to various modes of intracellular dynamics and patterning depending on parameters combination.

      Reviewer #2 (Public Review):

      The work presented in this manuscript details an analysis of the partitioning of low copy plasmids under the control of the ParABS system in bacteria. Using a high throughput imaging set up they were able to track the dynamics of the partition complex of one to a few plasmids over many cell cycles. The work provides an impressive amount of quantitative data for this chemo-mechanical system. Using this data, the paper sought to clarify whether the dynamics of plasmids is due to regular positioning or noisy oscillations around a mean position. They supplement their experimental work with an intuitive model that combines elements of previous modelling efforts. Their model relies on diffusion of the ParA substrate on the nucleoid with the dynamics of the ParB partition complex being driven by the underlying elastic force due to the nucleoid on which the substrate is tethered. Their model dynamics depend on two parameters, the ratio of the length over which the substrate can explore to the characteristic length of the space and the ratio of stimulated to non-stimulated hydrolysis rates of the substrate. If the length ratio is large, ParA can fully explore the space before interacting with the ParB complex leading to balanced fluxes and regular positioning. If it gets reduced, for example by lengthening the cell, oscillations can emerge as fluxes of substrates become imbalanced and a net force can pull the partition complex.

      Strengths:

      Given the large amount of data, the observations unambiguously show that one particular ParABS system under the conditions studied is carrying out regular positioning of plasmids. The model synthesizes prior work into a nice intuitive picture. These model parameters can be fit to the data leading to estimates of molecular kinetic parameters that are reasonable and in line with other observations. Lining up the experimental observations with the phase space of the model suggests that the system is poised on the edge of oscillations, allowing for the system to have regular positioning with low resource consumption.

      Weaknesses:

      However, despite the correspondence of the simulated results with the experimental findings, other explanations are not completely ruled out. The paper emphasizes that ParA diffusion/hopping on the nucleoid is essential for the establishment of regular positioning and that, without it, only oscillations were possible. Prior simulation efforts cited in the paper, which include ParA diffusion and mixing in the cytosol but no diffusion on the nucleoid, have shown that regular positioning is possible and that oscillations could be triggered as the system lengthened. Thus ParA hopping is not a necessity for regular positioning (as claimed in the paper), but it may well be needed for the kinetic parameters of the system studied here.

      We now comment on this result. In short, we believe that this model/regime is not relevant because of stochastic effects. With biologically relevant parameters, we are not able to produce regular positioning without ParA hopping.

      The paper also presents experimental results for a second ParABS system (pB171) that is more likely to show oscillations. The authors attribute the greater likelihood of oscillations for pB171 to ParA exploring a smaller space than in the F plasmid system, which showed regular positioning. This is pure conjecture, and the paper does not provide any evidence for it. It is therefore hard to rule out that the oscillations are due to other factors.

      We do not explicitly make that claim. We did have a point in the phase diagram of Figure 8A representing pB171 with a lower value of lambda than F plasmid and stated “The location of pB171 is an estimate based on a qualitative comparison of its dynamics”. We agree this was unclear.

      We now indicate the region that has oscillations with roughly the same period as single plasmids of pB171. We also make it clear that we speculate, but have not shown, that the length scale of ParA hopping is smaller than for F plasmid.

      An important point here is that we can explain both oscillations and regular positioning in the same model with the same kinetic parameters, the regimes being determined by the cell length and plasmid number in a manner consistent with experimental observations.

    1. Author Response

      Reviewer #1 (Public Review):

      This work sheds light on the adverse effects on gut homeostasis in non-susceptible organisms of Bacillus thuringiensis, a highly pathogenic bacterium used as a microbial pesticide to kill lepidopteran larvae that threaten crops. Using Drosophila melanogaster as a non-susceptible model organism, this paper reveals the mechanisms by which the bacteria disrupt gut homeostasis. The authors combined different genetic tools and Western blot experiments to demonstrate that bacterial protoxins are released and activated throughout the fly gut after ingestion and influence intestinal stem cell proliferation and intestinal cell differentiation. This phenomenon relies on the interaction of activated protoxins with specific components of adherens junctions within the intestinal epithelium. Because the mechanisms governing intestinal cell differentiation are conserved, this work could be the starting point for further studies in mammals.

      The conclusions proposed by the authors are in general well supported by the data. However, some improvements in data representation, as well as additional key control experiments, would be needed to further reinforce some key points of the paper.

      We thank Reviewer 1 for her appreciation of the work and her in-depth analysis of the data. We agree with all her comments and believe the suggestions significantly improved the manuscript.

      1) Figure 1 and others: Several graphs in the manuscript show the number of cells/20000µm2. What is the shape of the gut in the different conditions studied in this manuscript? The gut shape (shrunken gut versus normal gut, for example) could influence the number of cells seen in a small area. For example, the total number of cells quantified in a small area (here 20000µm2) of a shrunken gut can increase while their size decreases. As a result, the quantification of a specific cell type in a small region (here 20000µm2) can be biased and not represent the real number of cells present in the whole posterior part of the R4 region. Would it make sense to calculate a ratio "number of X cells/number of DAPI positive cells per 20000µm2"?

      We address this concern in our answer to "Essential Revisions point 1". To summarize, we have now added images of the whole posterior midgut in the different conditions to highlight the intestinal morphology (Figure 1-figure supplement 1A). The whole gut morphology was not affected by the different challenges we performed. Indeed, we used low doses of spores and/or toxins in order to mimic the "natural" amounts of spores/toxins the fly can eat in the environment and to avoid drastic disturbances of the gut lining.

      We have also added the cell type ratios in Figure 1-figure supplement 2.
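      The normalization suggested by the reviewer can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function name and the counts are hypothetical. It shows why the ratio of marker-positive cells to DAPI-positive nuclei in the same field is insensitive to changes in overall cell density, such as those a shrunken gut would cause.

```python
def cell_type_fractions(marker_counts, dapi_counts):
    """Hypothetical helper: fraction of marker-positive cells among
    DAPI-positive nuclei, computed field by field (e.g. per 20000 µm2)."""
    if len(marker_counts) != len(dapi_counts):
        raise ValueError("need one DAPI count per marker count")
    fractions = []
    for marker, dapi in zip(marker_counts, dapi_counts):
        if dapi <= 0:
            raise ValueError("each field must contain at least one nucleus")
        if marker > dapi:
            raise ValueError("marker-positive cells cannot exceed nuclei")
        fractions.append(marker / dapi)
    return fractions


# Toy counts: the second field is twice as dense (e.g. a shrunken gut),
# so raw counts per area double but the fraction does not change.
normal_field = cell_type_fractions([12], [120])
dense_field = cell_type_fractions([24], [240])
print(normal_field, dense_field)  # [0.1] [0.1]
```

      Raw counts per 20000µm2 would differ twofold between these two toy fields, whereas the fraction is identical.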

      2) Figure 4: Is it possible that Arm staining is less intense between ISC and progenitors after ingestion of the bacteria due to the fact there is a high rate of stem cell proliferation? Could it be an indirect effect of stem cell proliferation rather than the binding of the toxins to Cadherins?

      We thank the reviewer for this pertinent comment. Indeed, for this reason, we compared the intensity of Arm expression at the junction between neighboring progenitors with the Arm intensity around the rest of the cellular membranes and calculated the ratio between both values (see Figure 4-figure supplement 1F-G for an illustration of how we proceeded and the new section in the Material and Methods 736-742). Using this method, even if the whole Arm staining intensity is different (in all the midgut), the ratio reflects the internal cell-cell interaction changes between the two neighboring cells. Moreover, we have observed that Arm staining (using the usual monoclonal antibody N2 7A1 from the DSHB) was highly variable from one midgut to another in the same feeding/intoxication condition. We therefore do not want to draw conclusions about the overall Arm intensity, given this variability, regardless of the intoxication condition. Finally, the challenged guts always displayed a more disorganized epithelium due to cell proliferation and differentiation. Consequently, Arm staining in ECs and progenitor cells is found in the same focal plane, whereas in unchallenged, well-organized guts, Arm staining in ECs lies above the focal plane of Arm staining in progenitor cells. This likely gives the impression that Arm staining is more intense in challenged midguts. A description of this method has now been added to the Material and methods section (lines 736-742).
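      The ratio described above can be sketched as follows, as a minimal illustration under stated assumptions: each cell pair is represented by hypothetical lists of pixel intensities sampled along the shared junction and along the rest of the two cells' membranes, and no background subtraction is shown (the response does not say whether or how background was handled). Because both terms scale together, a global change in staining intensity between midguts cancels out of the ratio, which is the property the authors rely on.

```python
from statistics import mean


def junction_ratio(contact_pixels, membrane_pixels):
    """Mean intensity at the contact between two neighbouring cells divided by
    the mean intensity along the rest of their membranes (hypothetical helper)."""
    if not contact_pixels or not membrane_pixels:
        raise ValueError("both pixel lists must be non-empty")
    membrane_mean = mean(membrane_pixels)
    if membrane_mean <= 0:
        raise ValueError("membrane intensity must be positive")
    return mean(contact_pixels) / membrane_mean


# Toy intensities for one cell pair, and the same pair imaged with
# staining twice as bright overall: the ratio is unchanged.
dim = junction_ratio([150, 160, 170], [100, 110, 90])
bright = junction_ratio([300, 320, 340], [200, 220, 180])
print(round(dim, 3), round(bright, 3))  # 1.6 1.6
```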

      Could the authors use the ReDDM system to distinguish between "old" and newly formed cells? This could be a good control to make sure that the signal is quantified in similar cells between the control and the different conditions.

      We analyzed the intensity of Arm expression between pairs of GFP+ cells. Most of these pairs arose from de novo divisions. Indeed, as shown in control conditions (water) with Dl-ReDDM (for example, see Figure 1-figure supplement 1D), pairs of GFP+ cells (ISC-ISC) are rare. Most pairs correspond to ISC-EB or ISC-EEP pairs in which the progenitor is marked by RFP, meaning that it has just arisen from the GFP+ mother ISC. We therefore assume that, in the esg>GFP genotype, pairs of GFP+ cells correspond to one ISC and one progenitor (see Figure 4-figure supplement 1A-A'). Consequently, the pairs of GFP+ cells in which we analyzed Arm intensity after intoxication are very likely "newborn" cells. Even if some ISCs and progenitors remain attached to each other for a long time (for instance several days), Cry1A toxins may also disrupt their cell junctions. In the context of Cry1A toxin activity, it seems important to analyze the overall impact on cell-cell junctions without discriminating between old and new cell-cell interactions.

      We tried to use anti-Arm and anti-Pros double staining to mark new EEPs. Unfortunately, the anti-Arm and anti-Prospero antibodies were both raised in mice. Co-staining with both antibodies gave poor labelling for Arm, for Prospero, or for both. Our first author spent a lot of effort trying to establish suitable conditions, but unfortunately without success.

      Here is an example of what we obtained (the best image we got) with esg>GFP flies fed with water (control) and labelled for Arm and Pros in red. White arrows point to two EEPs. Red arrows point to the Arm staining between two precursors (ISC/ISC, ISC/EB or EB/EB). It was extremely hard to identify junctions marked by Arm between EEPs and ISCs because the Pros staining was too strong.

      Another example with flies fed with spores of SA11 (which increases the number of EEs). esg>GFP is shown in green, and Arm and Prospero in red. The right panel corresponds to the red channel alone (Arm/Prospero).

      Nevertheless, we have now performed a similar analysis in an esg>GFP, Shg::RFP background and analyzed Shg::RFP (Tomato::DE-Cadherin) labelling intensity. We found similar results, which are presented in the new Figure 4 (the Arm data have been moved to Figure 4-figure supplement 1). This analysis has been included in the text (lines 285-299).

      Figure 4E' and 4G': Arm staining seems more intense when looking at the whole membrane levels of cells compared to control. Is it possible that the measured ratio contact intensity/membrane intensity presented in Figure 4I could be impacted and not reflect the real contact intensity between ISC and progenitor cells?

      Please see our answer just above: "…//… we have observed that Arm staining (using the usual monoclonal antibody N2 7A1 from the DSHB) was highly variable from one midgut to another in the same feeding/intoxication condition. We therefore do not want to draw conclusions about the overall Arm intensity, given this variability, regardless of the intoxication condition".

      See also our intensity measurement method described above to avoid bias: "…//… we compared the intensity of Arm expression at the junction between neighboring progenitors with the Arm intensity around the rest of the cellular membranes and calculated the ratio between both values (see Figure 4-figure supplement 1F-G for an illustration of how we proceeded and the new section in the Material and Methods 736-742). Using this method, even if the whole Arm staining intensity is different (in all the midgut), the ratio reflects the internal cell-cell interaction changes between the two neighboring cells."

      What is the hypothesis of the authors about the decrease of Arm or DE-Cad seen after bacterial/crystal ingestion? Does the interaction between the toxins and DE-Cad induce a relocation of DE-Cad?

      It has been shown that E-Cadherin can be recycled when adherens junctions are destabilized, both in Drosophila and in mammals (Buchon et al., 2010; O'Keefe et al., 2007; Tiwari et al., 2018). To investigate this possibility, we tried to analyze DE-Cad cytoplasmic relocalization using anti-DE-Cad immunostaining (DCAD2 antibody from DSHB) as well as the Shg::RFP (Bloomington stock #58789) or Shg::GFP (Bloomington stock #60584) endogenous fusions. Unfortunately, we did not see obvious differences. We have nevertheless added the split channels of the Shg::RFP labelling in the different conditions in Figure 4A-D'. We remain interested in the behavior of DE-cadherin (and its signaling, see (Liang et al., 2017)) upon binding of the Cry1A toxin, and N. Zucchini-Pascal (an author of this article) is currently investigating this question.

      The authors should add more details about the quantification method in the Material and methods section. How many cells were quantified per intestine? How did they choose the cells in which they quantified the contact intensity? Etc.

      These details were missing from the methods, and we thank the reviewer for highlighting this issue. We have added this information to the methods (lines 725-742). The number of cell pairs analyzed was present in the raw data related to Figure 4 but absent from the main figure and legend; this has now been rectified. We only measured the intensity in isolated pairs of cells.

      Figure 4B, D, F and H: How did the authors recognize the ISCs?

      We agree with the reviewer's comment. We cannot recognize ISCs per se: green cells correspond either to ISCs or to EBs. We have modified the text accordingly (lines 285-287).

      Could the authors do quantifications of DE-Cad signal?

      This has been done and is now shown in Figure 4E and in Table 1. We also adapted the text (lines 289-299) to refine our interpretation in light of this new analysis. In particular, we now define a "mild" adherens junction intensity as a ratio between 1.4 and 1.6, instead of the previous range (1.3 to 1.6), because we observed that most EEP progenitors displayed a junction intensity ratio with their mother cell below 1.4 (see Table 1).
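      The revised cut-offs can be expressed as a simple classification rule. This is a sketch only: the response defines only the "mild" range, so the labels "low" and "high" for ratios outside it are hypothetical, and treating 1.4 and 1.6 as inclusive bounds is an assumption.

```python
from collections import Counter


def classify_junction(ratio, mild_lower=1.4, mild_upper=1.6):
    """Assign a junction-intensity ratio to a hypothetical category.
    'mild' uses the revised range [1.4, 1.6]; bounds assumed inclusive."""
    if ratio < mild_lower:
        return "low"
    if ratio <= mild_upper:
        return "mild"
    return "high"


def summarize(ratios, **bounds):
    """Count how many cell pairs fall into each category."""
    return Counter(classify_junction(r, **bounds) for r in ratios)


# A pair with ratio 1.35 was "mild" under the previous range (1.3-1.6)
# but is "low" under the revised one.
print(classify_junction(1.35, mild_lower=1.3))  # mild
print(classify_junction(1.35))                  # low
```

      Keeping the bounds as parameters makes it easy to compare how the same set of ratios is distributed under the old and new definitions.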

      Like Arm staining, the staining seems stronger at the whole membrane level in F and H compared to the control.

      As described above for Arm staining, the intensity of Tomato::DE-Cad labelling can differ from one posterior midgut to another. One simple explanation relates to changes in the structure of the midgut epithelium, which is well organized in unchallenged conditions, whereas in challenged midguts the epithelial cells are no longer well arranged because of rapid cell proliferation and differentiation. Consequently, DE-Cad labelling in ECs lies in the same focal plane as that in ISC/progenitor cells, giving the impression that the labelling is stronger.

      3) Figure 5: How is the stem cell proliferation upon overexpression of DE-Cad in control or upon bacteria/crystals ingestion? Do the authors think that the decrease of Pros+RFP+ new cells upon overexpression of DE-Cad could result from a decrease of stem cell proliferation?

      Great suggestion. We therefore counted the progenitor cells (GFP+ cells), which reflect ISC division during the last 3 days. This also has the advantage of using the same images (samples) as for all the analyses shown in Figure 5 and Figure 5-figure supplement 1. If we consider the number of GFP+ cells (esg-expressing cells corresponding to ISCs, EBs or EEPs) in challenged midguts, overexpression of DE-Cad did not seem to alter ISC division. In addition, we still observed more GFP+ cells when the midguts were challenged with SA11 or crystals than with BtkCry, in agreement with the rate of ISC division observed in the WT genetic background shown in Figure 1B.

      We have now added the counts of GFP+ cells in Figure 5-figure supplement 1E. The text has been modified to integrate these results (lines 306-308).

      Did the authors quantify the % of new ECs in the context of overexpression of DE-Cad?

      The data have been added in Figure 5F. The text has been modified to integrate this result (lines 312-313).

      Figure 5F: As asked before, did the authors distinguish the signal between newly born cells and the signal between older cells?

      In the new Figure 5G, we used the esg-ReDDM system, which is very efficient: almost all ISCs and progenitors express GFP. Counting was performed on cell pairs that express both GFP and RFP, as specified in the text (lines 310-311). Nevertheless, we cannot distinguish between new and old cells here. Indeed, the esg-ReDDM system induces both GFP and RFP in all esg+ cells (old and new). Hence, if a division occurred just before induction of the system, giving rise for instance to an ISC and an EB, both cells will express GFP and RFP. Should such pairs be considered old or new cells? Notably, because we analyzed junction intensity 3 days after intoxication and induction of the ReDDM system, we assume that the pairs of GFP+/RFP+ cells arose after induction of the system. Indeed, to our knowledge, nobody has shown in the posterior midgut that a progenitor remains attached to its mother ISC for as long as 3 days. Even if this event can occur, Cry1A toxins may also disrupt their cell junctions.

      We have now removed the DAPI channel and added the RFP channel in Figure 5-figure supplement 1A-D' (previously Figure S4A-D) to illustrate this explanation and to facilitate interpretation by the reader.

      It would be interesting to compare the junction intensity between mother ISCs and their daughter progenitors before and after intoxication in a same intestine. But we think that this event is quite rare because of the experimental conditions we used (i.e. analyses 3 days after the induction of the ReDDM/intoxication).

      The same experiments (stem cell proliferation + quantification of the % of new ECs) could also be done when the authors overexpress Connectin (supplemental figure 5). This would be another control to conclude that the effects on cell differentiation are specifically due to the interaction between DE-Cad and the toxins.

      We have added the analyses in Figure 5 - figure supplement 2J and K.

      The text has been updated accordingly (lines 317-320).

      In the "crystals" condition, the overexpression of Connectin seems to partially rescue the increased percentage of new Pros+RFP+ cells observed in Figure 3F (Figure S5I compared to Figure 3F).

      Yes, we agree with the reviewer's comment. In an esg-ReDDM background (Figure 3F), crystals induced a much greater increase in EE numbers than SA11 spores did. However, in a WT or esg>GFP background, crystals induced an increase in EE/EEP similar to that induced by SA11 spores. We do not yet have an explanation for this, other than the genetic background of esg-ReDDM.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use the nanobody tools generated in the companion manuscript and have combined them with DNA-Paint oligonucleotide labeling to generate super-resolution images of indirect flight muscles. Using this approach, they could map the precise organization of the different domains of the two giant titin-like fly homologs, Sallimus and Projectin, against which the nanobodies had been raised, with a precision ranging from 1 nm to 4 nm depending on the distance between them. They show that in indirect flight muscles the N-ter of Sallimus is located within 50 nm of the Z-disc, and that its C-ter reaches the A-band roughly 100 nm away from the Z-disc. Likewise, they show that the N-ter of Projectin colocalizes with the C-ter of Sallimus at the edge of the A-band, whereas its C-ter is located about 250 nm away in the A-band and 350 nm from the Z-disc. Overall, this suggests a staggered and linear organization of both proteins with a potential area of overlap spanning 10-12 nm, and that Sallimus could bridge the Z-disc to the A-band, acting as a ruler, while Projectin should only overlap with 15% of the A-band and possibly 10 nm of the I-band.

      Thanks for this nice summary of our findings.

      The value of this work comes from its use of advanced technologies (DNA-Paint + superresolution). The biological conclusions confirm and refine earlier and recent papers, especially EM papers and the impressive and very comprehensive JCB paper by Szikora et al in 2020, although the conclusions of the present work differ somewhat from those of Szikora who had predicted that Sallimus does not reach the A-band. That aspect could have been better discussed.

      We have further extended our discussions of the results from Szikora et al. 2020, in particular regarding Sallimus in this revised version.

      Reviewer #2 (Public Review):

      Taking advantage of the high molecular order of the Drosophila flight muscle, Schueder, Mangeol et al. leverage small (<4 nm) original nanobodies, tailored coupling to fluorophores, and DNA-PAINT resolution capabilities, to map the nanoarchitecture of two titin homologs, Sallimus and Projectin.

      Using a toolkit of nanobodies designed to bind to specific domains of the two proteins (described in the companion article "A nanobody toolbox to investigate localisation and dynamics of Drosophila titins"), Schueder, Mangeol et al. position these domains within the sarcomere with <5 nm resolution, and demonstrate that the C-ter of Sallimus overlaps with the N-ter of Projectin at the A-band/I-band interface. They propose this architecture may help to anchor Sallimus to the muscle, thus supporting flight muscle function while ensuring muscle integrity.

      This study nicely extends previous work by Szikora et al. and precisely dissects the sarcomeric geography of Sallimus and Projectin. From these results, the authors formulate specific functional hypotheses regarding the organization of flight muscles and how these are tuned to the mechanical constraints they undergo.

      Although they remain descriptive in essence, the conclusions of the paper are well supported by the experimental results.

      We thank this reviewer for the nice summary of our results.

      Reviewer #3 (Public Review):

      This manuscript by Schueder et al. provides new insight into an important question in muscle biology: how can the smaller titin-like molecules of the much larger sarcomeres of invertebrate muscle perform the same function as the larger titin of vertebrate muscles which have smaller sarcomeres? These functions include the assembly, stability and elasticity of the sarcomere. Using two state of the art methods--nanobodies and DNA-PAINT superresolution microscopy, the authors definitively show that in the highly ordered indirect flight muscle of Drosophila, the elongated proteins Sallimus and Projectin are arranged such that the N-terminus of Sallimus is embedded in the Z-disk, and the C-terminus is embedded in the outer portion of the A-band, and that in this outer portion of the A-band is also embedded the C-terminus of Projectin; thus, if the C-terminus of Sallimus can bind to thick filaments, and/or these overlapping portions of Sallimus and Projectin interact, there would be a linkage of the Z-disk and/or thin filament to the thick filaments to help determine the length and stability of the sarcomere.

      The strengths of this paper include the implementation of nanobody and DNA-PAINT superresolution microscopy for the first time for muscle. The extraordinary 5-10 nm resolution of this method allows imaging for definitive localization of the termini of these elongated proteins in the Drosophila flight muscle sarcomere. In addition, the manuscript is well written with sufficient background information and rationale presented, is easy to read, complex new methods are well-described, the figures are of high quality, and the conclusions are well-justified. A minor weakness is that despite the authors demonstrating that the C-terminus of Sallimus is located at the outer edge of the A-band, and that the N-terminus of Projectin is also located at the outer edge of the A-band, the authors provide no data to show whether, for example, these portions of these titin-like molecules interact, or whether Sallimus might interact with thick filaments. Such data would be required to prove their model. However, I can understand that this would require extensive additional study, and the authors have already provided a tremendous amount of data for this first step in supporting the model. Nevertheless, the authors should cite a relevant previous study on the Sallimus homolog in C. elegans called TTN-1, which is also a 2 MDa polypeptide of similar domain organization to at least the large isoforms of Sallimus found in fly synchronous muscles. In the study by Forbes et al. (2010), immunostaining, albeit not to the impressive resolution achieved in the present paper, showed that TTN-1 was also localized to the I-band with extension into the outer edge of the A-band. More importantly, that study also showed that "fragment 11/12", Ig38-40, which is located fairly close to the C-terminus of TTN-1, binds to myosin with nanomolar affinity (Kd = 1.5 nM), making plausible the idea that TTN-1 may bind to the thick filament in vivo.

      We thank this reviewer for sharing his enthusiasm about our results and methodology, and also about the way the data are presented. This is one more argument for us to leave a shortened Figure 1 in the PAINT manuscript.

      We are particularly thankful for pointing out the important C. elegans data that we had missed and that, as the reviewer said, perfectly fit with the model we propose for flight muscle (and also the larval muscle data, as the C-term of Sls is the same). Hence, we highlight this paper now in our discussion and compare to our findings.

      Reviewer #4 (Public Review):

      This manuscript reports combining nanobodies against Sallimus and Projectin, recently developed and described in the accompanying paper, with DNA-Paint technology, which allows super-resolution imaging. The presented data show that such a combination provides a powerful system for imaging large, protein-dense structures such as the Drosophila flight muscle at the nanoscale. The main outcome is the observation that in flight muscle sarcomeres Sallimus and Projectin overlap at the I/A band border. This was elegantly achieved using two-color DNA-Paint with Sls and Projectin nanobodies.

      We thank the reviewer for appreciating the quality of our work.

      Overall, as it stands, this manuscript, even if of high technological value, remains entirely descriptive and falls short of providing new insights into muscle structure and architecture. The main finding, an overlap between the short Sls isoform and Proj in flight muscle sarcomeres, is redundant with the authors' observation (described in the companion paper "A nanobody toolbox to investigate localisation and dynamics of Drosophila titins") that in larval muscles expressing a long Sls isoform, Sls and Proj overlap as well.

      Alternatively, combination of Sls and Proj nanobodies with DNA-Paint represents an interesting example of technological development that could strengthen the accompanying nanobodies toolkit manuscript.

      Every structural paper reports a structure and is thus, by definition, descriptive. This is the aim of our manuscript. We do not think that the other nanobody resource paper reports an overlap of Sls and Projectin in the larvae; resolving such an overlap would require super-resolution. The other paper does report that the larval Sls isoform is dramatically stretched (more than 2 µm) and that Projectin decorates the thick filament, likely in an oriented manner. Whether the N-terminus of Projectin overlaps with the C-terminus of Sallimus in this muscle is an open question that requires DNA-PAINT imaging of larval muscle. This requires a TIRF setting that is technically not trivial to achieve for larval muscle, and hence it has not yet been done by anybody.

    1. Author Response

      Reviewer #2 (Public Review):

      Point 1: The transcriptomic analysis of E12.5 endocardial cushion cells in the various mouse models is informative in the extraction of Igf2- and H19-specific gene functions. In Fig. 6D, a huge sex effect is obvious with many more DEGs in female embryos compared to males. How can this be explained given that Igf2/H19 reside on Chr7 and do not primarily affect gene expression on the X chromosome? Is any chromosomal bias observed in the genomic distribution of DEGs?

      We examined the chromosomal distribution of DEGs between WT and +/hIC1 (Supplemental Figure 6D) and did not see any bias on the X chromosome. We described this result on lines 278-280: "Although the number of +/hIC1-specific DEGs largely differed between males and females, there was no sex-specific bias on the X chromosome (Supplemental Figure 6D)." Additionally, we agree with the reviewer that it is noteworthy that dysregulated H19/Igf2 expression affected the transcriptome in a sex-specific manner, especially since the mutation is located on an autosome. Although investigating the role of hormones versus sex chromosomes in these effects would be quite interesting, it is beyond the scope of the current study.

      Point 2: A separate issue is raised by Fig. 6E that shows a most dramatic dysregulation of a single gene in the delta3.8/hIC1 "rescue" model. Interestingly, this gene is Shh. Hence, these embryos should exhibit some dramatic skeletal abnormalities or other defects linked to sonic hedgehog function.

      Shh appeared to be differentially expressed between wild-type and d3.8/hIC1 samples because Shh expression was 0 in all samples except two wild-type samples. In order to detect all DEGs, including lowly expressed ones, we did not filter DEGs based on the level of total expression. As a result, Shh was reported as significantly differentially expressed in d3.8/hIC1 samples, although its expression in our samples appears too low to have any significant effect on development. This explanation was added to lines 310-312. To confirm that this was an exceptional case, we analyzed the expression of DEGs obtained from the other pairwise comparisons. In the volcano plots below, genes whose expression is not statistically different between two groups are marked grey. Genes whose expression is statistically different and detected in both groups are marked red. Genes that are statistically different but not detected at all in one group, such as Shh, are marked blue (Figure G). Almost all of our DEGs are expressed consistently across the groups, and genes with no detected expression in one group are very rare.
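      The colour coding of the volcano plots can be written as a rule applied gene by gene. This is a minimal sketch, not the authors' pipeline: the per-sample count inputs, the adjusted p-value input and the 0.05 cut-off are assumptions, and "not detected" is taken to mean zero counts in every sample of a group. A missing adjusted p-value is treated here as not significant.

```python
def volcano_colour(counts_a, counts_b, padj, alpha=0.05):
    """Colour rule described in the response (threshold assumed):
    grey: not significant; red: significant and detected in both groups;
    blue: significant but with zero counts in every sample of one group."""
    if padj is None or padj >= alpha:
        return "grey"
    detected_a = any(c > 0 for c in counts_a)
    detected_b = any(c > 0 for c in counts_b)
    if detected_a and detected_b:
        return "red"
    return "blue"


# Toy counts mimicking the Shh situation: expressed in only two
# wild-type samples and absent from every mutant sample.
wild_type = [0, 4, 0, 7]
mutant = [0, 0, 0, 0]
print(volcano_colour(wild_type, mutant, padj=0.01))  # blue
```

      Counting how many significant genes fall into the blue category is one way to check that Shh-like cases are rare.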

      Point 3: The placental analysis needs to be strengthened. Placentas should be consistently positioned with the decidua facing up, and the chorionic plate down. The placentas in Fig. 3F are sectioned at an angle and the chorionic plate is missing. These images must be replaced with better histological sections.

      As requested, we have replaced placental images with better representative sections (Figure 3F and 4E). In addition, we have improved alignment of placental histology figures.

      Point 4: The CD34 staining has not worked and does not show any fetal vasculature, in particular not in the WT sample.

      As requested, we have replaced the CD34 vascular stained images with those that better represent fetal vasculature (Figure 3G).

      Point 5: The "thrombi" highlighted in Fig. 4E are well within the normal range, to make the point that these are persistent abnormalities more thorough measurements would need to be performed (number, size, etc).

      As requested, we measured the number and relative size of the thrombi found in dH19/hIC1 placentas with lesions. No thrombi were found in wild-type placentas, whereas an average of 1.3 thrombi was found in six dH19/hIC1 placentas. The size of the thrombi varied widely, but they occupied on average 2.58% of the labyrinth zone where these lesions were found (Supplemental Figure 4D). Additionally, we replaced the image in Figure 4E with a section that better represents the lesion.

      Point 6: The statement that H19 is disproportionately contributing to the labyrinth phenotype (lines 154/155) is not warranted as Igf2 expression is reduced to virtually nothing in these mice. Even though there is more H19 in the labyrinth than in the junctional zone, the phenotype may still be driven by a loss of Igf2. Given the quasi Igf2-null situation in +/hIC1 mice, is the glycogen cell type phenotype recapitulated in these mice, and how do glycogen numbers compare in the other mouse models?

      The sentence was edited in line 157. We performed periodic acid-Schiff (PAS) staining on +/hIC1 placentas to address whether glycogen cells are affected by abnormal H19/Igf2 expression (Supplemental Figure 1E). Contrary to previous reports, in which Igf2-null mice had lower placental glycogen concentration (Lopez et al., 1996) and H19 deletion led to increased placental glycogen storage (Esquiliano et al., 2009), our quantification of PAS-stained images showed that glycogen content is not significantly different between wild-type and +/hIC1 placentas. We have described this result in lines 166-168.

      Point 7: How do delta3.8/+ and delta3.8/hIC1 mice with a VSD survive? Is it resolved some time after birth such that heart function is compatible with postnatal viability? And more importantly, do H19 expression levels correlate with phenotype severity on an individual basis?

      Our study was limited to phenotypes prior to birth, so postnatal and adult phenotypes were not examined. Because the VSD showed only partial penetrance in these mice, we cannot state that d3.8/+ or d3.8/hIC1 mice with VSDs survive. It has also been reported previously, in another mouse model with incomplete penetrance of a VSD, that the mice which survived to adulthood did not have VSDs (Sakata et al., 2002). We find it highly unlikely that either mouse model would survive significantly past the postnatal timepoint with a VSD. We have examined two PN0 d3.8/hIC1 neonates, and neither had a VSD.

      Regarding the second point, the only way to address this question quantitatively would be to perform qPCR or RNA-seq on individual hearts, which would then make it impossible to examine those hearts histologically to confirm the VSD. Thus, hearts used to identify VSDs via histology could not also be used for quantitative H19 measurements. Notably, however, H19/Igf2 expression in independent replicates of d3.8/hIC1 cardiac ECs used in our RNA-seq experiment is quite variable: unlike the replicates of the other mouse models used in this study, they do not cluster together (Fig. 6A). Such a wide range of variability in the extent of H19/Igf2 dysregulation suggests that H19/Igf2 levels could have an impact on the penetrance or severity of the VSD phenotype in d3.8/hIC1 embryos.

    1. Author Response

      Reviewer #2 (Public Review):

      Zylbertal and Bianco propose a new model of trial-to-trial neuronal variability that incorporates the spatial distance between neurons. The 7-parameter model is attractive because of its simplicity: A neuron's activity is a function of stimulus drive, neighboring neurons, and global inhibition. A neuroscientist studying almost any brain area in any model organism could make use of this model, provided that they have access to 1) simultaneously-recorded neurons and 2) the spatial locations of those neurons. I could foresee this model being the de-facto model to compare to all future models, as it is easy to code up and interpret. The paper explores the effectiveness of this distance model by modeling neural activity in the zebrafish optic tectum. They find that this distance-based model can capture 1) bursting found in spontaneous activity, 2) ongoing co-fluctuations during stimulus-evoked activity, and 3) adaptation effects during prey-catching behavior.

      Strengths:

      The main strength of the paper is the interpretability of the distance-based model. This model is agnostic to the brain area from which the population of neurons is recorded, making the model broadly applicable to many neuroscientists. I would certainly use this model for any baseline comparisons of trial-to-trial variability.

      The model is assessed in three different contexts, including spontaneous activity and behavior. That the model provides some prediction in all three contexts is a strong indicator that this model will be useful in other contexts, including other model organisms. The model could reasonably be extended to other cognitive states (e.g., spatial attention) or accounting for other neuron properties (such as feature tuning, as mentioned in the manuscript).

      The analyses and intuition to show how the distance-based model explains adaptation were insightful and concise.

      We thank the reviewer for these supportive comments.

      Weaknesses:

      Model evaluation and comparison: The paper does not fully evaluate the model or its assumptions; here, I note details in which evaluation is needed. A key assumption of the model, that correlations fall off in a Gaussian manner (Fig. 1C-E), is not supported by Fig. 1C, which appears to have an exponential fall-off. Functions other than a Gaussian may provide better fits.

      A key feature of our model is that connection strengths smoothly decrease with distance. However, we did not intend to make strong claims about the exact function parametrizing this distance relationship. In light of the reviewer’s comment, we have additionally tested an exponential function and find that it too can describe activity correlations in OT with a negligible decrease in r^2 (Figure 1 – figure supplement 1A-C). The main purpose of the analysis was to show that the correlation is maximal around the seed and decays uniformly with distance from it (i.e. no sub-networks or cliques are detected). We have emphasized this in a revised conclusion paragraph and note that while multiple functions can be used to parameterize the relationship, they are nonetheless certainly simplifications. Secondly, we also ran a version of the network simulation where the connections decay in space according to an exponential rather than Gaussian function and show that, as expected, tectal bursting is robust to this change.

      Furthermore, it is not clear whether the r^2s in Fig. 1E are computed in a held-out manner (more details about what goes into computing r^2 are needed).

      These values are computed by fitting the 2-d Gaussian (or exponential function) to all neurons excluding the seed itself (added a short clarification in the Methods).
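      To make the r^2 comparison concrete, here is a minimal Python sketch that fits a Gaussian and an exponential decay of seed correlation against distance while leaving out the seed itself. It is a simplification, not the analysis used in the paper: the kernel is assumed radially symmetric and centred on the seed (so it does not estimate the peak location examined in Fig. 1E), it has no baseline term, and the length scale is found by a simple grid search. The function names are hypothetical.

```python
import numpy as np


def fit_decay(dist, corr, kind, scales):
    """Fit corr ~ A * kernel(dist; s) by least squares over a grid of scales s.

    For a fixed s the best amplitude A has a closed form, so only s is searched.
    Returns (r2, A, s) for the scale with the highest r^2.
    """
    ss_tot = np.sum((corr - corr.mean()) ** 2)
    best = None
    for s in scales:
        if kind == "gaussian":
            g = np.exp(-dist**2 / (2 * s**2))
        elif kind == "exponential":
            g = np.exp(-dist / s)
        else:
            raise ValueError(f"unknown kernel: {kind}")
        a = np.dot(corr, g) / np.dot(g, g)      # closed-form amplitude
        r2 = 1 - np.sum((corr - a * g) ** 2) / ss_tot
        if best is None or r2 > best[0]:
            best = (r2, a, s)
    return best


def seed_fits(positions, activity, seed, scales):
    """Correlate each neuron with the seed, drop the seed itself, fit both kernels.

    positions: (n_neurons, 2) array; activity: (n_neurons, n_timepoints) array.
    """
    corr = np.array([np.corrcoef(activity[seed], activity[i])[0, 1]
                     for i in range(len(activity))])
    dist = np.linalg.norm(positions - positions[seed], axis=1)
    keep = np.arange(len(activity)) != seed  # exclude the seed (r = 1 at d = 0)
    return {kind: fit_decay(dist[keep], corr[keep], kind, scales)
            for kind in ("gaussian", "exponential")}
```

      Excluding the seed matters because its self-correlation of 1 at zero distance would otherwise dominate the fit of both kernels.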

      Assessing the model based on peak location alone (Fig. 1E) is not sufficient, as other smooth monotonically-decreasing functions may perform similarly.

      As discussed above, an exponential function indeed performs similarly to a Gaussian. However, goodness of fit is secondary to the main aim of Fig 1E, which is to show that the correlation peak tends to fall near the seed cell.

      Simulating from the model greatly improves the reader's understanding (Fig. 2D), but no explanation is given for why the simulations (Fig. 2D) have almost no background spikes and much fewer, non-co-occurring bursts than those of real data (Fig. 2E).

      In part this is because the simulation results depicted in Fig 2D were derived from the ‘baseline model’, prior to optimizing to match biological bursting statistics. It is thus expected that activity will differ from experimental observation and was our main motive to tune the model parameters (now emphasized in the text). However, the model will certainly not account for all aspects of tectal activity; rather, it was designed to reproduce bursting as a prominent feature of ongoing activity and in the second part of the paper we explore the extent to which it can account for other phenomena. As noted above, in the revised abstract, introduction and discussion we have tried to clarify the motivation for developing the model and how it was used to gain insight into activity-dependent changes in network excitability.

      A key assumption of the distance model (Fig. 2A) is that each neuron has the same gaussian fall-off (i.e., sigma_excitation and sigma_inhibition), but it is unclear if the data support this assumption.

      We intentionally opted for a simple model (i.e. described by few parameters), in part due to the lack of connectivity data and additionally to set a lower bound on the extent to which multiple features of tectal activity could be accounted for. More complex models with additional degrees of freedom (such as cell-specific connectivity) may well describe the data better, but likely at the cost of interpretability. We consider such extensions to be beyond the scope of the present study, but they might be fruitful avenues for future research.

      Although an excitatory and inhibitory gain is assumed (Fig. 2A), it is not clear from the data (Fig. 1C) that an inhibitory gain is needed (no negative correlations are observed in Fig. 1C-D).

      This is now explored in the revised Figure 3A which includes the condition of zero inhibition gain. See also response to reviewer 1.

      After optimization (Fig. 3), the model is evaluated on predicting burst properties but not evaluated on predicting held-out responses (R^2s or likelihoods), and no other model (e.g., fitting a GLM or a model with only an excitatory gain) is considered. In particular, one may consider a model in which "assemblies" do exist - does such an assembly model lead to better held-out prediction performance?

      The model we developed is a mechanistic, generative model. In contrast to Pillow et al. 2008, we did not fit the model to data; rather, we used it to simulate network activity and tuned the seven parameters (using EMOO) to best match biological observations. Thus, rather than assessing goodness-of-fit using cross-validation, our approach involved comparison of summary statistics related to the target emergent phenomenon (tectal bursting). This was necessary as bursting appears highly stochastic. Further to the comments above, we have expanded the parameter space to include instances with only an excitatory gain (where bursting failed) and no distance-dependence (again, bursting failed). Introducing assemblies into the model will inevitably support bursting (and introduce many more free parameters), but one of our key observations is that such assemblies are not required for this aspect of spontaneous activity. Again, our aim was not to produce a detailed picture of tectal connectivity, but rather to develop a minimal model and estimate the extent to which it can account for observed features of activity. Note that the second half of the paper (Figure 4 onwards) shows the model can explain phenomena that were not considered during parameter tuning.

      It is unclear why a genetic algorithm (Fig. 1A-C) is necessary versus a grid search; it appears that solutions in Generation 2 (Fig. 3C, leftmost plot, points close to the origin) are as good as solutions in Generation 30 and that the spreads of points across generations do not shrink (as one would expect from better mutations). Given the small number of parameters (7), a grid search is reasonable, computationally tractable, and easier to understand for all readers (Fig. 3A).

      Perhaps in hindsight a grid search would have worked, but at increased computational cost (each instantiation of the model is computationally expensive). At the time we chose EMOO, and since it produced satisfactory results, we kept it. As often happens with multi-objective optimization, an improvement in one objective usually comes at the expense of other objectives, so the spread of the points does not shrink much, but the points move closer to the axes (i.e. reduced error). The final parameter combination is closer to the origin than any point in generation 2, though admittedly not by much. Importantly, however, optimizing the model using the training features generalized to other burst-related statistics.
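      For scale, a grid with k values per parameter needs k^7 model runs for seven parameters (10^7 runs for k = 10). The trade-off between objectives described above can be made concrete with a Pareto-front filter. The Python sketch below is not EMOO's implementation; it only shows how non-dominated parameter sets are identified when all objective errors are minimized, and how one might then pick the front member closest to the origin (which assumes the objectives are on comparable scales). Function names are hypothetical.

```python
import numpy as np


def pareto_front(errors):
    """Boolean mask of non-dominated rows; every objective is minimized.

    Row i is dominated if another row is <= it in every objective
    and strictly < in at least one.
    """
    errors = np.asarray(errors, dtype=float)
    mask = np.ones(len(errors), dtype=bool)
    for i in range(len(errors)):
        others = np.delete(errors, i, axis=0)
        dominated = np.any(np.all(others <= errors[i], axis=1) &
                           np.any(others < errors[i], axis=1))
        mask[i] = not dominated
    return mask


def closest_to_origin(errors):
    """Index of the Pareto-front member with the smallest Euclidean error norm."""
    errors = np.asarray(errors, dtype=float)
    front = np.flatnonzero(pareto_front(errors))
    norms = np.linalg.norm(errors[front], axis=1)
    return int(front[np.argmin(norms)])


# Hypothetical two-objective errors for five parameter sets
errs = [[1, 4], [2, 2], [4, 1], [3, 3], [2, 5]]
print(pareto_front(errs), closest_to_origin(errs))
```

      Improving one error along the front necessarily worsens another, which is why a population can move towards the axes without its spread shrinking.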

      It is unclear why the excitatory and inhibitory gains of the temporal profiles (Fig. 3I) appear to be gaussian but are formulated as exponential (formula for I_ij^X in Methods).

      The interactions indeed have exponential decay in time. These might appear Gaussian because the axis scale is logarithmic.

      Overall, comparing this model to other possible (similar) models and reporting held-out prediction performance will support the claim that the distance model is a good explanation for trial-to-trial variability.

      See comments above. A key point we want to stress is that we intentionally explored a minimal network model and found that, despite obvious simplifications of the biology, it was nonetheless able to explain multiple aspects of tectal physiology and behaviour. We hope that it inspires future studies and can be extended, in parallel to experimental findings, to more accurately represent the cell-type diversity and cell-specific connectivity of the tectal network.

      Data results: Data results were clear and straightforward. However, the explanation was not given for certain results. For example, the relationship between pre-stimulus linear drive and delta R was weak; the examples in Fig. 4C do not appear to be representative of the other sessions. The example sessions in Fig. 4C have R^2=0.17 and 0.19, the two outliers in the R^2 histogram (Fig. 4D).

      The revised figure 4 is based on new data and new analysis (see below), and the presented examples no longer represent the extreme tail of the distribution (they still, however, represent strong examples, as is now explicitly indicated in the figure legend).

      The black trace in Fig. 4D has large variations (e.g., a linear drive of 25 and 30 have a change in delta R of ~0.1 - greater than the overall change of the dashed line at both ends, ~0.08) but the SEMs are very tight. This suggests that either this last fluctuation is real and a major effect of the data (although not present in Fig. 4C) or the SEM is not conservative enough. No null distribution or statistics were computed on the R^2 distribution (Fig. 4C, blue distribution) to confirm the R^2s are statistically significant and not due to random fluctuations.

      We agree that this was not sufficiently robust and in response to this comment we undertook a significant revision to figure 4 and the associated text:

      i) The revised figure is based on an entirely new dataset, allowing us to verify the results on independent data. We used 5 min ISI for all stimulus presentations, regardless of stimulus type (high or low elevation), thus ensuring that we are only examining differences in state brought about by previous ongoing activity, without risk of ‘contamination’ by evoked activity.

      ii) As per the reviewer’s suggestion, we compared model-estimated pre-stimulus state to a null estimate using randomly sampled time-points. We additionally compared the optimised model with the baseline model. Whereas the null (random times) estimates had no predictive power, both models using pre-stimulus activity were able to explain a fraction of the response residuals with the optimised model performing better.

      iii) We refined the binning process by first computing, for each response, the mean of response residuals across neurons for each bin of estimated linear drive, and then averaging across responses. This prevents the relationship being skewed by rare instances involving unusually large numbers of neurons for a particular linear drive bin, and thereby eliminates the fluctuations the reviewer was referring to.
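      The two-stage averaging described in iii) can be sketched in Python as follows. The input layout (one array of estimated linear drives and one of response residuals per stimulus presentation, one entry per neuron), the bin edges and the function name are assumptions for illustration.

```python
import numpy as np


def binned_relationship(drives, residuals, bin_edges):
    """Two-stage average of response residuals as a function of linear drive.

    drives, residuals: lists with one 1-D array per response (one value per neuron).
    Stage 1: within each response, average residuals over the neurons in each bin.
    Stage 2: average these per-response bin means across responses, so each
    response counts once regardless of how many neurons fall into a bin.
    Drives outside [bin_edges[0], bin_edges[-1]) are ignored.
    """
    n_bins = len(bin_edges) - 1
    per_response = np.full((len(drives), n_bins), np.nan)
    for r, (d, res) in enumerate(zip(drives, residuals)):
        idx = np.digitize(d, bin_edges) - 1  # bin index for each neuron
        for b in range(n_bins):
            in_bin = idx == b
            if in_bin.any():
                per_response[r, b] = res[in_bin].mean()
    n = np.sum(~np.isnan(per_response), axis=0)   # responses contributing per bin
    mean = np.nanmean(per_response, axis=0)
    sem = np.nanstd(per_response, axis=0, ddof=1) / np.sqrt(n)
    return mean, sem, n


# Hypothetical example: two responses, two drive bins
drives = [np.array([0.5, 0.6, 1.5]), np.array([0.2, 1.2, 1.8, 1.9])]
residuals = [np.array([1.0, 3.0, 10.0]), np.array([5.0, 2.0, 4.0, 6.0])]
mean, sem, n = binned_relationship(drives, residuals, [0.0, 1.0, 2.0])
print(mean, sem, n)
```

      In this toy example, pooling all neurons in the second bin would give 5.5, whereas the two-stage mean is 7.0, because the single-neuron response is no longer outweighed by the response that contributes three neurons.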

      The absence of any background activity in Fig. 6B (e.g., during the rest blocks) is confusing, given that in spontaneous activity many bursts and background activity are present (Fig. 2E).

      The raster only presents evoked responses and no background activity is shown. This has been clarified in the revised figure and legend.

      Finally, it appears that the anterior optic tectum contributes to convergent saccades (CS) (Fig. 7E) but no post-saccadic activity is shown to assess how activity changes after the saccade (e.g., plotting activity from 0 to 60).

      Activity before and after the saccade is shown in Fig 7A. Fig 7E shows the ‘linear drive’ (or ‘excitability’), and how it changes leading up to the saccade. Since we were interested in the association between pre-saccade state and saccade-associated activity, we did not plot post-saccadic linear drive. However, as can be seen in the below figure for the reviewer, linear drive is strongly suppressed by the saccade, as expected due to CS-associated activity.

      No explanation is given why activity drops ~30 seconds before a convergent saccade (Fig. 7E).

      This is no longer shown, as we trimmed the history data in Fig 7E in accordance with a comment from reviewer 1. We speculate, however, that the mean linear drive of a compact population of neurons would be somewhat periodic, since a high linear drive leads to a burst, which results in prolonged inhibition (low linear drive) followed by slow recovery, and so on.

      No statistical test is performed on the R^2 distribution (Fig. 7H) to confirm the R^2s (with a mean close to R^2=0.01) are meaningful and not due to random fluctuations.

      We revised the analysis in Fig 7 along the same lines as the revision of Fig 4. Model-estimated linear drive predicts CS-associated activity whereas a null estimate (random times) shows no such relationship.
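      A null comparison of this kind can be sketched as follows: the R^2 obtained with the drive read at the actual pre-event times is compared with the distribution of R^2 values obtained when the drive is read at randomly sampled times. This Python sketch simplifies the analysis to one scalar drive value per event and a linear fit; the names and the synthetic data are hypothetical, and the add-one empirical p-value is one common convention, not necessarily the one used in the paper.

```python
import numpy as np


def r2_linear(x, y):
    """R^2 of an ordinary least-squares line y ~ intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)


def null_r2(drive_trace, residuals, n_samples, rng):
    """R^2 values obtained when the drive is read at random time points."""
    out = np.empty(n_samples)
    for k in range(n_samples):
        t = rng.integers(0, len(drive_trace), size=len(residuals))
        out[k] = r2_linear(drive_trace[t], residuals)
    return out


def empirical_p(observed, null):
    """Add-one empirical p-value: share of null values at least as large."""
    return (1 + np.sum(null >= observed)) / (1 + len(null))


# Synthetic demonstration (hypothetical data)
rng = np.random.default_rng(1)
trace = rng.normal(size=10_000)                  # drive time series
onsets = rng.integers(0, trace.size, size=100)   # event times
resid = trace[onsets] + 0.5 * rng.normal(size=100)
obs = r2_linear(trace[onsets], resid)
null = null_r2(trace, resid, 500, rng)
print(obs, np.percentile(null, 95), empirical_p(obs, null))
```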

      Presentation: A disjointed part of the paper is that for the first part (Figs. 1-3), the focus is on capturing burst activity, but for the second part (Figs. 4-7), the focus is on trial-to-trial variability with no mention of bursts. It is unclear how the reader should relate the two and if bursts serve a purpose for stimulus-evoked activity.

      In the first part of the paper (Figs. 1-3), we use ongoing activity to develop an understanding (formulated as a network model) of how activity modulates the network state. In the second part, we test this understanding in the context of evoked responses and show that model-estimated network state explains a fraction of visual response variability and experience-dependent changes in activity and behaviour. In the revised MS we further emphasize this idea and have edited the results text to strengthen the connections between these parts of the study. See also comments above.

      Citations: The manuscript may cite other relevant studies in electrophysiology that have investigated noise correlations, such as:

      • Luczak et al., Neuron 2009 (comparing spontaneous and evoked activity).

      • Cohen and Kohn, Nat Neuro 2011 (review on noise correlations).

      • Smith and Kohn, JNeurosci 2008 (looking at correlations over distance).

      • Lin et al., Neuron 2015 (modeling shared variability).

      • Goris et al., Nat Neuro 2014 (check out Fig. 4).

      • Umakantha et al., Neuron 2021 (links noise correlation and dim reduction; includes other recent references to noise correlations).

      We agree that the manuscript could benefit from citing some of these suggested studies and have added citations accordingly.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by McCafferty et al. presents the integrative computational structural modelling of the IFT-A complex, which is important to proper cilium organelle formation in eukaryotic cells. Recent advances in protein structure prediction (AlphaFold) allowed the authors to model the structures of the 6 individual subunits of the IFT-A complex. Interactions between IFT-A proteins were experimentally investigated by purifying Tetrahymena cilia, isolating IFT complexes, and utilizing chemical crosslinking and mass spectrometry (MS). In addition, the authors present a somewhat improved 23Å cryo-electron tomography (cryo-ET) map of the IFT-A complex (previously determined cryo-ET structures of IFT trains have resolutions of 24 - 40 Å). Integrative modelling using the predicted structures of the 6 IFT-A proteins and the experimental data as restraints allows the authors to present a structural model for the entire IFT-A complex. This model is analysed in the context of the polymeric IFT train structure, interactions with the IFT-B complex, and the structural position of ciliopathy disease variants.

      This is in principle a timely and interesting study that attempts to push the limits of structural modelling of large protein complexes using structure prediction in combination with experimental data. Unfortunately, the study has several shortcomings and the data providing restraints for the integrative modelling are not optimal.

      1) Chemical crosslinking and MS were used to obtain both intra-molecular crosslinks used to validate the structural models of the individual IFT-A proteins as well as inter-molecular crosslinks used as restraints in the structural modelling of the hexameric IFT-A complex. It is mentioned on p. 4, line 9, that IFT-A complexes were enriched from the flagellar lysate M+M fractions using SEC and that fractions from SEC containing IFT-A complexes were crosslinked for MS analysis. However, the authors do not show the data for this sample, neither SEC profiles, SDS-PAGE nor data of the cross-linked samples. On p. 7 the authors write that their SEC profile corresponds to monomeric IFT-A, but this is not shown anywhere in the manuscript. The reason this is so important is that the IFT-A complex assembles into linear polymeric structures together with the IFT-B complex as so-called IFT trains in cilia. Data obtained from isolated IFT trains would thus have additional crosslinks between subunits in neighbouring IFT-A complexes that, if used to restrain the position of subunits within a hexameric IFT-A complex, would likely result in a wrong architecture. The fact that the authors also observe crosslinks between IFT-A and IFT-B proteins strongly suggests that they indeed carried out the crosslinking experiment on polymeric rather than monomeric IFT complexes.

      These are excellent points, and we apologize for previously omitting these data. In the new Figure 1—figure supplement 2, we now include size exclusion chromatography elution profiles for IFT-A along with molecular weight calibrants, plotting the mass spectrometrically-determined abundances of IFT-A subunits. Based on these data, we experimentally determined the molecular weight of the IFT-A particles that we analyzed to lie between 720 kDa and 1.1 MDa, consistent with the expected monomeric molecular weight of 772 kDa.
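      Estimating molecular weight from SEC elution usually relies on a calibration in which log10(MW) is approximately linear in elution volume over the column's separation range. A minimal Python sketch with hypothetical calibrant values (real calibrations often use the partition coefficient K_av instead of raw elution volume); the function names are illustrative.

```python
import numpy as np


def fit_sec_calibration(elution_ml, mw_kda):
    """Fit log10(MW) = m * Ve + c to calibrant standards; returns (m, c)."""
    m, c = np.polyfit(np.asarray(elution_ml, float), np.log10(mw_kda), 1)
    return m, c


def estimate_mw(ve_ml, m, c):
    """Invert the calibration: molecular weight (kDa) at elution volume Ve."""
    return 10 ** (m * np.asarray(ve_ml, float) + c)


# Hypothetical calibrants lying exactly on log10(MW) = -0.25 * Ve + 5
ve = np.array([8.0, 10.0, 12.0, 14.0])
mw = 10 ** (-0.25 * ve + 5.0)
m, c = fit_sec_calibration(ve, mw)
print(estimate_mw(9.0, m, c))  # MW of a peak eluting between the first two standards
```

      Reading the calibration at the two edges of an elution window gives a molecular-weight range, which is how an estimate such as "between 720 kDa and 1.1 MDa" can be compared with an expected mass.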

      These samples were isolated directly from Tetrahymena cilia and were composed of ~3% each of IFT-A and IFT-B. However, as we now note on p. 11, the samples were subsequently concentrated before crosslinking. We speculate that concentrating the particles could have induced some degree of oligomerization and interactions with IFT-B, which may in turn explain the small number of crosslinks consistent with IFT-A/IFT-A and IFT-A/IFT-B interactions. However, we have now removed all discussion of specific IFT-A/B contacts in the paper and present only the general orientation of the two complexes as determined by cryo-ET.

      2) Given that the crosslink/MS data are unlikely to provide sufficient restraints for IFT-A structure assembly (and may even be misleading), the cryo-ET data become increasingly important. Unfortunately, the 23Å cryo-ET map does not provide sufficient detail to unambiguously fit domains of the IFT-A subunits as several of these have similar architectures consisting of WD-repeats followed by TPRs.

      We now address this comment using a different approach, which we describe in full on p. 5, 14-15, and Figure 2—figure supplements 1-4 of the paper.

      In particular, we used AlphaFold-Multimer (AF-Multimer) to identify confidently-modeled rigid-body domains and domain-domain interactions for directly contacting protein pairs (see Figure 2—figure supplements 1-2), which we used as starting models for integrative modeling (see Figure 2—figure supplements 3-4). We incorporated our cross-links as distance restraints for the modeling. This approach allowed us to model the entire IFT-A complex in a manner compatible both with our experimental structural data and the computationally derived restraints. We suspect this will be a very useful strategy for others to adopt, as the approach should be generalizable to many other large molecular assemblies that are too big to predict using AF-Multimer alone. Importantly, we see high concordance between the AlphaFold intermolecular constraints and our crosslinks (as plotted in the new Figure 2—figure supplement 4), and the models produced by this strategy agree well with the two structures presented in the newly posted preprints, which were arrived at using very distinct methodologies.
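      Checking cross-links against a structural model amounts to measuring the distance between the linked residues and comparing it with the maximum span allowed by the cross-linker. The Python sketch below assumes Cα-Cα distances and a 30 Å cutoff, a value commonly used for lysine-reactive linkers such as DSS; neither the cross-linker nor the cutoff is stated in this response, and the chain and residue identifiers are hypothetical.

```python
import numpy as np

# Assumed maximum Cα-Cα span for a satisfied cross-link (not from the source)
MAX_CA_DISTANCE = 30.0


def crosslink_satisfaction(ca_coords, crosslinks, cutoff=MAX_CA_DISTANCE):
    """Distances for each cross-link and the fraction within the cutoff.

    ca_coords: dict mapping (chain, residue) -> Cα xyz coordinates (Å).
    crosslinks: iterable of ((chain1, res1), (chain2, res2)) pairs.
    """
    dists = np.array([np.linalg.norm(np.asarray(ca_coords[a]) - np.asarray(ca_coords[b]))
                      for a, b in crosslinks])
    return dists, float(np.mean(dists <= cutoff))


# Hypothetical model coordinates and cross-links
coords = {("A", 1): [0.0, 0.0, 0.0],
          ("B", 5): [10.0, 0.0, 0.0],
          ("C", 9): [0.0, 40.0, 0.0]}
links = [(("A", 1), ("B", 5)), (("A", 1), ("C", 9))]
d, frac = crosslink_satisfaction(coords, links)
print(d, frac)
```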

      This approach allowed us to withhold the cryo-ET tomogram from the modeling altogether in order to generate a fully independent model. We could then compare the final model to the subtomogram average and, by docking the model into the cryo-ET tomogram, to build a model of polymeric IFT-A, as described on p. 6 and presented in the new Figure 4, Figure 4—figure supplement 1, and Figure 4—animation 1.

      3) Two preprints of the IFT-A structure appeared over the last few weeks. Hesketh et al., (https://www.biorxiv.org/content/10.1101/2022.08.09.503213v1) have obtained a single particle cryo-EM structure of the human IFT-A complex at 3.5Å resolution for the IFT121/122/139 part of the complex providing amino acids side-chain information. In addition, Lacey et al. (https://www.biorxiv.org/content/10.1101/2022.08.01.502329v1) provide a 10-18Å resolution cryo-ET structure of the Chlamydomonas IFT trains containing both IFT-A and IFT-B. It is noteworthy that the model outlined in the current manuscript is very different from the IFT-A models of Hesketh et al., and Lacey et al. (the Lacey et al. manuscript by the way shares an author with the McCafferty et al., manuscript). In both Hesketh et al., and Lacey et al. the IFT121 and IFT122 subunits interact via the N-terminal WD-repeats and the C-terminal TPRs with the beta-propellers (WD-repeats) positioned parallel and in close contact. In the model proposed by McCafferty, the beta-propellers of IFT121 and IFT122 are positioned far away from each other (>50Å) and are perpendicular to each other. Several other large discrepancies are found in the relative positions of IFT-A subunits. This suggests serious problems with the structural model of IFT-A proposed by McCafferty and needs to be addressed with great care.

      This is an important point that we have indeed considered with great care. Our new model now positions the WD-40 domains of IFT121 and IFT122 proximal to each other and broadly matches the 2 preprints in general placement and orientation of all subunits, including the placement of IFT43, for which only we and Hesketh provide models.

      We now include an extensive comparison to the structures reported in the other two preprints. Note that a direct 3D alignment of the structures was not possible, as we were the only group to deposit our atomic coordinates. However, we now include a new Figure 4—figure supplement 2 orienting our structure to match figures appearing in those preprints, and use this as the basis for comparison, which can be found on p. 10. While it is not possible to calculate a quantitative measure of agreement (e.g. RMSD), our IFT43/120/121/139 structure visually agrees with the structure of Hesketh et al., even to the placement of IFT43, which is highly disordered for the most part, and which is omitted from Lacey et al. Our structure also generally agrees with that of Lacey et al. in this region, with the exception of what appears to be a re-orientation of the N-terminus of IFT139 in the Lacey structure relative to that of ours and Hesketh, which appear to be concordant with each other (again, with the caveat that we are limited in the comparisons we can make without having access to atomic coordinates). Most importantly, all three structures agree with respect to the nature of the IFT-A monomer-monomer interactions in the polymeric train, with IFT140 acting to bridge adjacent monomers. Differences in the resolutions of the cryo-ET subtomogram averages (which range from 18 to 30 Å) are relatively small across the 3 studies and, at least as best we can tell by this necessarily crude comparison at this stage, do not obviously lead to any major changes in the structures of the polymeric assemblies.
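      For reference, the quantitative comparison that was not possible here would typically be an RMSD after optimal superposition of matched atoms. A minimal Python sketch of the Kabsch method, assuming two N x 3 arrays of Cα coordinates with a known one-to-one residue correspondence (which, for structures from different species, would itself require a sequence alignment):

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched N x 3 coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)                  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                           # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # -1 would mean a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation taking p_c onto q_c
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff**2).sum() / len(p)))


# Hypothetical check: a rotated and translated copy superposes exactly
rng = np.random.default_rng(2)
q = rng.normal(size=(20, 3))
th = np.deg2rad(30)
rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th), np.cos(th), 0.0],
               [0.0, 0.0, 1.0]])
p = q @ rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(p, q))
```

      The determinant correction restricts the fit to proper rotations, so a mirror-image arrangement of subunits is not spuriously reported as a perfect match.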

      4) The authors observe crosslinks between the IFT-A proteins (IFT122 and IFT140) and IFT-B proteins (IFT70, IFT88, and IFT172) as discussed on pg. 6 and shown in figure 5A. To accommodate these crosslinks into the structural model of the IFT train shown in Figure 5A, the authors place the IFT-B subunits IFT70 and IFT88 far apart in the IFT-B complex. However, these subunits are known to interact directly (Taschner et al. JCB 2014) and indeed sit in proximity to the IFT train structure as observed by Lacey et al. While the crosslinking data may well be correct, the incorrect structural model of IFT-A likely forces an incorrect positioning of IFT-B proteins to fulfill the crosslinking data.

      It is now clearly evident that the earlier segmentation of the monomeric unit from within the polymeric IFT-A chain, which we based on the published segmentation of Jordan et al. Nat Cell Biol. 2018, did not properly capture the true boundaries of an IFT-A monomer, especially with regard to IFT140, which extends outward to connect adjacent monomers. The use of this artificially truncated monomer as a molecular envelope in our initial modeling effectively forced the IFT-A subunits to pack in a reversed orientation in order to fit the truncated density.

      In order to address this issue, we omitted the cryo-ET data from the modeling altogether and instead incorporated evidence capturing domain-domain structures of interacting protein pairs from AlphaFold-Multimer. This substantially reduced the number of degrees of freedom to be explored by the integrative modeling process in order to satisfy the available structural restraints, leading in turn to significantly better convergence of independent modeling runs and high concordance with the input data (Figure 2—figure supplement 3 and Figure 2—figure supplement 4), and a significantly improved structural model of the IFT-A monomer. Docking this refined monomer structure into the (now fully independent) cryo-ET tomogram produced a model of the polymer that fit well into the cryo-ET density (Figure 4 and Figure 4—figure supplement 1) and agreed in large part with those derived by Hesketh and Lacey, as described above and visualized in Figure 4—figure supplement 2.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Himmel et al is an interesting study representing a topic of substantial interest to the somatosensory neurobiology community. Here, the authors use CIII peripheral neurons to investigate polymodality of sensory neurons. From vertebrates to invertebrates, this is a long-standing question in the field: how is it that the same class of sensory neurons that express receptors for myriad sensory modalities encode different behavioral responses. This system in Drosophila seems to be an intriguing system to study this question, making use of the genetic toolkit in the fly and ease of behavioral assays. In this study, the authors identify a number of channels that are important for cold nociception, and they showed that some of these do not appear to also encode mechanosensation. Despite my initial enthusiasm for this paper, halfway through, it felt as if I were reading two different papers that were loosely tied together. This lack of cohesion significantly reduced my enthusiasm for this work. Below are some of my criticisms:

      We thank Reviewer #1 for their feedback. In addition to the points below, and in accordance with the reviewer’s overall criticisms, we have revised the body text to make it more cohesive. Our main goal with this revision was to better explain to the reader the shift from anoctamins to SLC12 cotransporters.

      1) The first half of the paper is about a role for Anoctamins in cold nociception, but the second half switches somewhat abruptly to ncc69 and kcc. I assumed the authors would connect these genes in a genetic pathway, performing some kind of epistatic genetic interaction studies or even biochemical assays, and that this was the reason to switch the focus of the paper midway through. But this was not the case. Moreover, they performed a different constellation of experiments for the genes in the first half vs. the second half of the paper (e.g., showing a role in cold nociception vs. mechanosensation, or showing a phenotype from overexpression). This lack of cohesion made it difficult to follow the work.

      We have edited the text to better explain this shift. Two notable changes are: (1) moving the phylogenetics to Figure 1, to more immediately present and demonstrate that subdued is part of the ANO1/ANO2 family of calcium-activated chloride channels; and (2) a new cartoon schematic in Figure 6 to more strongly communicate to a reader that chloride is a hypothetical mechanism of cold discrimination.

      In short, previous work and our phylogenetic analyses indicate that subdued is a Cl- channel (we have moved the phylogeny earlier in the paper to make this clear from the onset). We were therefore surprised that knockdown/mutation resulted in reduced CT behavior, as neural Cl- currents are often inhibitory. Thus, we looked to known mechanisms of Cl- homeostasis to try to formulate an informed hypothesis about the function of anoctamins in this system; hence the shift in focus to SLC12.

      In response to the second half of the comment: we have in fact performed cold nociception and mechanosensation experiments for both the anoctamins and the SLC12 cotransporters, although the SLC12 mechanosensation results were in a supplemental figure. We have moved the mechanosensation results to the main Figure 6 to make this clearer. With respect to simple overexpression, the goal of the anoctamin experiments was to test the necessity of anoctamins for cold-evoked behavior, whereas the goal of the SLC12 experiments was to differentially modulate Cl- homeostasis, which could hypothetically be accomplished by both knockdown and overexpression (hence we performed both).

      2) In Fig. 1B,C, how does one confirm that a CIII neuron is being analyzed? It might help the reader if there were at least some zoomed-out photos where all the cell types are labeled and potentially compared to a schematic. Moreover, is there a CIII-specific marker that could be used in a co-stain to confirm neuron type?

      Our CIII fusion is a specific marker for CIII neurons. To better demonstrate this, we have added images of the new CIII fusion expression patterns overlapping with a previously described CIII GAL4 driver (i.e. nompC-GAL4), and provided text describing how the CIII fusion transgene was discovered and generated. Please see the new Figure 1-Figure supplement 1.

      3) As this paper is predicated on detecting differences by behavioral phenotype, the scoring analysis is not as robust as it could be, especially considering the wealth of tools in Drosophila for mapping behaviors. The "CT" phenotype is begging for a richer behavioral quantification. This critique becomes relevant when considering the optogenetically induced CT behavior in Fig. 5. If the authors were to use unbiased quantitative metrics to measure behavior, they could show how similar the optogenetic behavior is to the natural cold-evoked behavior. Perhaps the two are not the same, although both loosely fit under the umbrella of "CT".

      In accordance with our response above to necessary revisions, we have added one additional metric and reorganized the figures to better demonstrate the complexity of the behavior. We have no further data or new tools at this time.

      To improve our optogenetic analyses, we have added data for Channelrhodopsin-dependent CIII activation, which has been previously shown to induce cold-like behaviors at high levels of activation and innocuous touch-like behaviors at low levels of activation (Turner, Armengol et al 2016). Further, we have added videos (Figure 5—videos 1-3) showing behavior in response to both Channelrhodopsin and Aurora activation.

      With respect to differences in behavior, we have pointed out some differences in the Aurora-evoked behavior from the cold-evoked behavior: chloride optogenetics induces innocuous touch-like behaviors following CT. Please see lines 296-299.

      4) Following on from the last comment, the touch assays in Fig. 3 use a different measurement system from the other figures. Perhaps touch deficits would be identified with richer behavioral quantification. Moreover, do these RNAi larvae show any responses to noxious mechanical stimulation?

      The touch assays necessarily use different metrics from the cold assays, as touch-evoked behaviors are quite different from cold-evoked changes in body length (which are relatively simple, prima facie).

      With respect to noxious mechanical stimulation, while Class III neurons have been shown to facilitate this modality and to be connected to relevant circuitry (please see Hu et al 2017 https://doi.org/10.1038/nn.4580 and Takagi et al 2017 https://doi.org/10.1016/j.neuron.2017.10.030), Class IV neurons are the primary sensory neurons that initiate the noxious mechanical-induced rolling response. Although this is an interesting question, we believe it is outside the scope of this study.

      Reviewer #2 (Public Review):

      Himmel and colleagues study how individual sensory neurons can be tuned to detect noxious vs. gentle touch stimuli. Functional studies of Drosophila class III dendritic arborization neurons characterized roles in gentle touch and identified a receptor, NompC, and other factors that mediate these responses. Subsequent work primarily from the authors of the current study focused on roles for the same sensory neurons in cold nociception. The two proposed sensory inputs lead to quite distinct sets of behaviors, with touch leading to halting, head turning and reverse peristalsis, and noxious cold leading to whole body contraction. How activity of one type of sensory neuron could lead to such different responses remains an outstanding question, both at the levels of reception and circuitry.

      The cIII responses to noxious cold and innocuous touch raise questions that the authors address here, proposing that studies of this system could advance the understanding of chronic neuropathic pain. A candidate approach inspired by studies in vertebrate nociceptors led the authors to study the anoctamin/TMEM16 channels subdued and CG15270 (termed wwk by the authors). The authors focus on a pathway for gentle touch vs. cold nociception discrimination through anoctamins. Several of the experiments in this manuscript are well done; in particular, the electrophysiological recordings provide a substantial advance. However, the genetic and expression analysis has several gaps and should be strengthened. The data also do not provide strong support for some key aspects of the proposed model, namely the importance of the relative levels of Cl- cotransporters.

      Major comments:

      1) Knockout studies are accomplished using two MiMIC insertions whose effects on subdued or CG15270/wwk are not characterized by the authors. This needs to be established. The MiMIC system is also not well explained in the text for readers.

      We have modified the text to better explain MiMICs (Lines 137-140) and we have verified the mutagenic effects of these MiMIC insertions via RT-PCR (Figure 2 – supplement 1). We believe these data, in conjunction with other converging lines of evidence (e.g. rescue) demonstrate necessity of these genes in cold nociception.

      2) Subdued expression is inferred by a Gal4 enhancer trap. This can be a hazardous way of determining expression patterns given the uncertain relevance of the local enhancers driving the expression. According to microarray analysis subdued is strongly expressed in cIII neurons, but c240-Gal4 is barely present compared to nearby neurons, raising questions about whether this line reflects the expression pattern, including levels, even though the authors suggest that the line is previously validated (line 95; it is unclear what previously validated means). Figure 1B should not be labeled "subdued > GFP" since it is not clear that this is the case. Another more direct method of assessing expression in cIII is necessary. Confidence is higher for wwk using a T2A-Gal4 line, however, Figure 1C might be misleading to readers and indicate that wwk-T2A-Gal4 is cIII specific whereas in supplemental data the authors show how it is much more broadly expressed. The expression pattern in the supplemental figures should be moved to the main figures.

      We have removed the phrase “previously validated” and we have modified Figure 1 to change how we refer to the GFP expression (removed “subdued > GFP”).

      In accordance with the response to necessary revisions above, we make use of several converging lines of evidence to infer expression, including GAL4 expression patterns, microarray, and qPCR (the two latter experiments from isolated CIII samples). That subdued and wwk are expressed in CIII is clearly the most parsimonious hypothesis.

      We have also carefully reviewed our body text to be certain we do not make claims of differential expression between different neural subtypes based on differences in fluorescence in the GAL4-driven GFP imaging. We do not believe that this would be a reasonable way to infer differences in expression levels in any instance.

      With respect to the design of Figure 1, the intent is not to mislead the reader, and we state in the text that wwk is not solely expressed in CIII (lines 120-125). As eLife makes supplemental figures available directly alongside the main figures, we have left the relevant supplemental figures as supplements – we simply think this makes more sense from a standpoint of readability and style.

      3) In Figure 8, the authors propose a model in which the relative levels of the cation-Cl- cotransporters Kcc (outward) and Ncc69 (inward) in cIII neurons produce high intracellular Cl- levels and a depolarizing Cl- current in cIII neurons. They test this model using overexpression and loss-of-function data, but the results do not support their model, since most of the overexpression and LOF manipulations of kcc and ncc69 do not significantly affect cold nociception, the exception being ncc69 RNAi. The authors suggest that this could be due to Cl- homeostasis regulated by other cotransporters. Nonetheless, it leaves a significant unexplained gap in the model that needs to be addressed.

      We respectfully disagree that our results are not consistent with the stated hypothesis. In fact, it is the lack of change under certain conditions that lends evidence against the alternative hypothesis that CIII neurons maintain relatively low intracellular Cl-. The hypothesis we are testing is that ncc69 expression drives relatively high intracellular Cl- concentrations, thus resulting in depolarizing Cl- currents.

      Under this hypothesis, we would predict that knockdown of ncc69 and overexpression of kcc would reduce cold sensitivity at 5˚C. That knockdown of ncc69 and overexpression of kcc reduce cold sensitivity is consistent with this hypothesis (and we point out in the text that the evidence for kcc is less convincing); at the least, these results do not disprove it.

      Under this hypothesis, we would also predict that knockdown of kcc and overexpression of ncc69 would not result in reduced cold sensitivity at 5˚C. As there was no phenotype at 5˚C, our results are likewise consistent with the hypothesis (at the least, they do not disprove it).

      We did find it curious that ncc69 RNAi did not affect neural activity at 10˚C, but we speculate that our inability to detect physiological effects of ncc69 knockdown reflects limitations of our electrophysiological methodology (and we discuss this in the manuscript).

      The only piece of data that may be inconsistent with the hypothesis is that kcc overexpression may not have affected cold nociception at 5˚C; the data are not overwhelmingly convincing. However, this is only one experiment among many, and we believe the preponderance of evidence is consistent with the hypothesis. That is not to say that we believe this hypothesis has complete explanatory power, as noted by our discussion of both the ncc69 electrophysiological and kcc behavioral data, and by our suggestion that there may be other regulatory mechanisms at work. This latter suggestion is wholly speculative, and we believe it is appropriate for the discussion section. We agree (and state in the discussion) that this would require further experimentation.

      4) Related to the #3, the authors should verify the microarray data that form the basis for their differential expression model.

      We have performed qPCR for ncc69 and kcc. Although qPCR is semi-quantitative when comparing between genes, the Ct value for ncc69 was lower than for kcc, indicating that more transcripts were present at the start of amplification (assuming identical amplification efficiency). These data (although semi-quantitative), the microarray, and our behavioral and electrophysiological data are consistent with the stated hypothesis.
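      For readers less familiar with qPCR, the reasoning behind "a lower Ct means more starting transcript" can be written out under the idealized assumptions stated in the response: both amplicons are read at the same fluorescence threshold N_T and amplify with the same per-cycle efficiency E (E = 1 meaning perfect doubling). The symbols below are our own notation for illustration, not values reported in the study.

```latex
N_T = N_0\,(1+E)^{\mathrm{Ct}}
\quad\Longrightarrow\quad
\frac{N_0^{\,ncc69}}{N_0^{\,kcc}} = (1+E)^{\,\mathrm{Ct}_{kcc}-\mathrm{Ct}_{ncc69}}
\;\approx\; 2^{\,\mathrm{Ct}_{kcc}-\mathrm{Ct}_{ncc69}} \qquad (E \approx 1)
```

      A lower Ct for ncc69 makes the exponent positive and the ratio greater than 1. Because primer efficiencies and amplicon fluorescence generally differ between genes, this between-gene ratio is only semi-quantitative, which is the caveat the authors raise.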

      Reviewer #3 (Public Review):

      There are also several modest weaknesses in the paper:

      1) A notable gap remains in the evidence for the hypothesized mechanisms that enhance electrical activity during cold stimulation and the proposed role of anoctamins (Fig. 8): the lack of evidence for Ca2+-dependent activation of Cl- current. The recording methods used in the fillet preparation should enable direct tests of this important part of the model.

      We have performed an additional experiment at the reviewer’s suggestion. Please see above (in essential revisions) and below (in recommendations for authors).

      2) The behavioral and electrophysiological consequences of knocking down either of the two anoctamins are incomplete (Fig. 2), raising the significant question of whether combined knockdown of both anoctamins in the CIII neurons would largely eliminate the cold-specific responses.

      While the results of this experiment would certainly be interesting, we are unsure of how it would be acutely informative in this context and are not convinced that any possible outcomes would disprove any particular hypothesis. In part, this is because we know that blocking synaptic transmission in CIII neurons (via tetanus toxin) does not completely ablate cold-evoked behavior (Turner & Armengol et al 2016 https://doi.org/10.1016/j.cub.2016.09.038). This is also the case for combinatorial mutation of other genes associated with cold nociception (please see Turner & Armengol et al 2016; and more recently, Patel et al 2022 https://doi.org/10.3389/fnmol.2022.942548). Further, the husbandry required to generate the double knockdowns would be quite challenging and might result in GAL4 titration (hypothetically less strongly knocking down each gene). For these reasons, we have not performed this suggested experiment.

      3) Blind procedures were not used to minimize unconscious bias in the analyses of video-recorded behavior, although some of the analyses were partially automated.

      This is correct and a relative weakness of the study. We note it in our methods section. The use of semi-automated data analyses of the behavioral videos is designed to minimize experimenter-specific variability.

      4) The term "hypersensitization" is confusing. Pain physiologists typically use "sensitization" when behavioral or neural responses are increased from normal. In the case of increased neuronal sensitivity, if the mechanism involves an increase in responsiveness to depolarizing inputs or an increased probability of spontaneous discharge, the term "hyperexcitability" is appropriate. Hypersensitization connotes an extreme sensitization state compared to a known normal sensitization state (which already signifies increased sensitivity). In contrast, the effects of ncc69 overexpression in this manuscript are best described simply as sensitization (increased reflexive and neuronal sensitivity to cooling) and hyperexcitability (expressed as increased spontaneous activity at room temperature).

      We have modified the text in accordance with the reviewer’s suggestions (see recommendations for authors section). We have also changed the title of the paper to “Chloride-dependent mechanisms of multimodal sensory discrimination and nociceptive sensitization in Drosophila”

    1. Author Response

      Reviewer #3 (Public Review):

      This paper focuses on characterizing differences between D. suzukii and D. melanogaster preferences for laying eggs on substrates of varying sugar content and stiffness. The authors demonstrate that D. suzukii show a weaker preference for multiple sugars in oviposition choice assays, that D. suzukii show a loss of sugar responsiveness in some labellar sensilla, and that some GR-encoding genes are expressed at much lower levels in the legs and labellum compared to D. melanogaster. Intriguingly, a number of mechanosensory channel genes are upregulated in D. suzukii legs and labellum. The authors show that D. suzukii females prefer stiffer oviposition substrates compared to D. melanogaster, and that the balance of sweetness/texture preference differs between the two species. This is consistent with their ecological niches, with D. suzukii generally preferring to lay eggs in ripe fruit and D. melanogaster generally preferring overripe fruit.

      This paper builds on previous work from this group (Dweck et al., 2021) and others (Karageorgi et al., 2017 and others) that previously demonstrated that D. suzukii prefer to lay eggs on stiffer substrates compared to D. melanogaster, will tolerate more bitter substrates and show reduced expression of several bitter GR genes. This manuscript appropriately acknowledges this work and the findings are consistent with these studies.

      The manuscript is well-written, the experiments are well-controlled, the figures clearly convey the experimental findings, the data support the authors' conclusions, and the statistical analysis is appropriate.

      The weakest point of the paper is the lack of connection drawn between the sequencing, electrophysiological, and behavioral data. For example, the electrophysiological responses to glucose appear to be the same in both species in Figure 3 but the behavioral responses in Figure 2 are different between the two species. The authors do not provide any speculation as to what could account for this seeming discrepancy.

      The revised ms. contains the following statement: "The weaker behavioral responses to glucose observed in D. suzukii could derive from weaker responses of untested taste neurons. Multiple taste organs, including the pharynx as well as the labellum and legs, contribute to oviposition behavior; sensory neurons of the ovipositor appear to play an important role as well (Yang et al., 2008; Joseph et al., 2012; Chen et al., 2022). The weaker behavioral response to glucose in D. suzukii could also arise from differences in central processing of glucose signals. It will be interesting to determine if there are differences in the connectivity of taste circuits in the two species. Alternatively, taste projection neurons in D. suzukii could have a reduced dynamic range, saturate at lower levels of receptor neuron firing, and be less able to distinguish among higher sugar concentrations."

      Additionally, although Gr64d transcript is almost completely absent in D. suzukii leg RNA seq data in Figure 4B, there are no differences in the electrophysiological responses in leg sensilla in Figure 3.

      This seems to imply that, although there are differences in the expression of some Grs, these do not necessarily lead to a functional difference.

      We have added to the Discussion a statement clarifying that, although similar, the sugar responses of leg sensilla are in fact not the same in the two species: "Leg sensilla of D. suzukii responded to sucrose, but dose-response analysis of the f5s sensillum of the leg showed that its response to higher concentrations of sucrose was lower than that of its D. melanogaster counterpart (Figure 3—figure supplement 1E)."

      The authors identify mechanosensory genes that are upregulated in D. suzukii compared to D. melanogaster and suggest that these changes underlie the difference in preference for substrate stiffness. However, it is not immediately clear that high levels of these mechanosensors would impart a new oviposition preference. Although the authors acknowledge that there are likely circuit-level differences between the two species, they do not directly test the role of any of these mechanosensors in oviposition preference in either species.

      See response below to the point about nompC.

      In Figure 3, there are clear differences in some of the labellar responses, but the leg responses look similar overall. This suggests that the labellum plays a special role in evaluating oviposition sites. The paper would be strengthened by providing more insight into which tissues (labellum, legs, wings, ovipositor, etc.) are likely used to sample potential egg-laying substrates.

      We agree and have added the following to the Discussion: "Multiple taste organs, including the pharynx as well as the labellum and legs, contribute to oviposition behavior; sensory neurons of the ovipositor appear to play an important role as well (Yang et al., 2008; Joseph et al., 2012; Chen et al., 2022)."

    1. Author Response

      Reviewer #3 (Public Review):

      Main results:

      1) TCR convergence is different from publicity: The authors look at CDR3 sequence features of convergent TCRs in the large Emerson CMV cohort. Amino acid usage does not perfectly correlate with codon degeneracy; for example, arginine (which has 6 codons) is less common in convergent TCRs, whereas leucine and serine are elevated. It's argued that there's more to convergence than just recombination biases, which makes sense. (I wonder if the trends for charged amino acids could be explained by the enrichment of convergent TCRs in CD8 T cells, which tend to have more acidic CDR3 loops.) There's also a claim that the overlap between convergent and public TCRs is lower in tumors with a high mutational burden (TMB), but this part is sketchy: the definition of public TCRs is murky and hard to interpret, and the correlation between TMB and convergence-publicity overlap is modest (two cohorts with low TMB have higher overlap and the other three have lower overlap, but there is no association among those three; if anything, the trend is in the other direction). It's also not clear why the overlap between COVID-19 cohort convergent TCRs and public TCRs defined by the pre-2019 Emerson cohort should be high. A confounder here is the potential association between convergence and clonal expansion, since expanded clonotypes can spawn apparently convergent TCRs due to sequencing errors. The paper "TCR Convergence in Individuals Treated With Immune Checkpoint Inhibition for Cancer" (Ref 5 here) gives evidence that sequencing errors may be inflating convergence in this specific dataset.

      We really appreciate the reviewer’s feedback. We respond to each of the reviewer’s points below:

      (1) Amino acid preference of convergent TCRs might be caused by CD8+ T cell enrichment. To test this hypothesis, we performed the same analysis using only CD8+ T cells (using the Cader 2019 lymphoma cohort). The results are shown below. We do not observe significant changes after excluding CD4+ T cells, indicating that this enrichment might be caused by factors other than CD4/CD8 differences.

      (2) Definition of public TCRs. We have changed the definition of public TCRs. Instead of mixing the Emerson cohort into each group and using the mixed cohort to define the public TCRs, we used only the 666 samples of the Emerson cohort to define a single set of public TCRs and applied it to each cohort. Both the dataset and the approach used in this manuscript are consistent with a previous study on the same topic (Madi et al., 2014, eLife).

      (3) Convergence-publicity overlap: We agree with the reviewer that some high TMB tumors did not show further decrease of convergence-publicity overlap. One potential explanation is that the correlation between the two is not linear. By adding additional cohorts in this revision (healthy and recovered COVID-19 patients), we confirmed the previously observed overall trend between TMB and the overlap, which supported our conclusions (see figure below). On the other hand, we believe that the high overlap of convergent TCRs among healthy cohorts might result from exposure to common antigens. In the cancer patients, while still exposed, private antigens derived from tumor cells are expected to compete for resources, thus reducing the proportion of these public TCRs in the blood repertoire. The above discussion has been added to the revised manuscript:

      “Healthy individuals are expected to be exposed to common pathogens, which might induce public T cell responses. On the other hand, cancer patients have more neoantigens due to accumulated mutations, which drive their antigen-specific T cells to recognize these 'private' antigens. This reduces the proportion of public TCRs among antigen-specific TCRs. Furthermore, a higher tumor mutation burden (TMB) would indicate a higher abundance of neoantigens, resulting in a lower ratio of public TCRs.”

      2) Convergent TCRs are more likely to be antigen-specific: This is nicely shown on two datasets: the large dextramer dataset from 10x genomics, and the COVID19 datasets from Adaptive biotech. But given previous work on TCR convergence, for example, the Pogorelyy ALICE paper, and many others, this is also not super-surprising.

      We thank the reviewer for bringing up this related work. In the Pogorelyy ALICE paper, the authors defined TCR neighbors based on a one-nucleotide difference in a given CDR3, which included both synonymous and non-synonymous changes. In other words, ALICE treats both convergent and mismatched (Hamming distance 1) sequences as neighbors. Although highly relevant, our approach differs in focusing only on convergence, as mismatches have been extensively investigated by previous studies. We have now added this paper as Ref 27 and discussed the difference between ALICE and our method in the revised manuscript.

      3) Convergent T cells exhibit a CD8+ cytotoxic gene signature: This is based on a nice analysis of mouse and human single-cell datasets. One striking finding is that convergent TCRs are WAY more common in CD8+ T cells than in CD4+ T cells. It would be interesting to know how much of this could be explained by greater clonal expansion of CD8+ T cells, together with sequencing errors. A subtle point here is that some of the P values are probably inflated by the presence of expanded clonotypes: a group of cells belonging to the same expanded clonotype will tend to have similar gene expression (and therefore similar cluster membership), and will necessarily all be either convergent or not convergent collectively since they share the same TCR. So it's probably not quite right to treat them as independent for the purposes of assessing associations between gene expression clusters and convergence (or any other TCR-defined feature). You can see evidence for clonal expansion in Figure 3C, where TRAV genes are among the most enriched, suggesting that Cluster 04 may contain expanded clones.

      (1) We agree with the reviewer that a possible explanation of the CD8/CD4 difference is the greater clonal expansion of CD8+ T cells. We tested this hypothesis by counting the number of T cell clones instead of the number of cells, to remove the effect of CD8+ T cell expansion. We first investigated the bulk TCR repertoire sequencing samples (Figure 3 - figure supplement 2C-2D; see figure below). We observed higher convergence levels for CD8+ T cell clones than for CD4+ T cell clones. An additional description of this analysis was added to the last paragraph of the Results section “Convergent T cells exhibit a CD8+ cytotoxic gene signature” as follows:

      “The results may be explained by greater clonal expansion of CD8+ T cells than of CD4+ T cells. Therefore, we calculated the number of convergent clones within CD8+ T cells and CD4+ T cells from the above datasets to exclude the effects of cell expansion. As a result, in the scRNA-seq mouse data, while only 1.54% of the CD4+ clones were convergent, 3.76% of the CD8+ clones showed convergence. Likewise, in the human scRNA-seq data, 0.17% of CD4+ T cell clones and 1.03% of CD8+ T cell clones were convergent. In the bulk TCR-seq lymphoma data, similar results were also observed, where the gap between the convergence levels of CD4+ and CD8+ T cells narrowed but remained significant (Figure 3—figure supplement 2C-2D). In conclusion, these results suggest that CD8+ T cells show higher levels of convergence than CD4+ T cells, which substantiates our hypothesis that convergent T cells are more likely to be antigen-experienced. This observation has been tested using multiple datasets with diverse sequencing platforms and sequencing depths to minimize the impact of batch effects or other technical artifacts.”

      (2) We next investigated the effect of cell expansion in the single-cell analysis. We agree with the reviewer that some highly expanded convergent clones could inflate the p-values. Therefore, we revised the calculation of TCR convergence to count T cell clones instead of individual cells. We observed that the clusters of interest mentioned in the paper (for both mouse and human data) remain at the top convergence level among all clusters (see table below), with p-values estimated using the exact binomial test. These results support our hypothesis that TCR convergence is enriched in T cell clusters that are more likely to be antigen-experienced.
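      To make the cell-versus-clone distinction above concrete, here is a minimal Python sketch; it is not the authors' pipeline. It assumes that a TCR is convergent when its amino-acid CDR3 is encoded by two or more distinct nucleotide CDR3s in the same sample, that a "clone" is a unique nucleotide CDR3 within a cluster, and that cluster enrichment is assessed with a one-sided exact binomial test against the pooled clone-level convergence rate. The function names (convergent_nt_clones, per_cluster_counts, binom_sf) and the toy data are hypothetical.

```python
from collections import defaultdict
from math import comb


def convergent_nt_clones(cells):
    """Return nucleotide clones whose amino-acid CDR3 is also encoded by
    at least one other, distinct nucleotide CDR3 in the same sample.

    cells: iterable of (cluster, aa_cdr3, nt_cdr3) tuples, one per cell.
    """
    nt_by_aa = defaultdict(set)
    for _cluster, aa, nt in cells:
        nt_by_aa[aa].add(nt)
    return {nt for nts in nt_by_aa.values() if len(nts) >= 2 for nt in nts}


def per_cluster_counts(cells, by_clone=True):
    """Return {cluster: (n_convergent, n_total)}.

    by_clone=True collapses all cells of an expanded clone into one unit
    per cluster; by_clone=False counts every cell separately.
    """
    cells = list(cells)
    conv = convergent_nt_clones(cells)
    if by_clone:
        units = {(cluster, nt) for cluster, _aa, nt in cells}
    else:
        units = [(cluster, nt) for cluster, _aa, nt in cells]
    counts = defaultdict(lambda: [0, 0])
    for cluster, nt in units:
        counts[cluster][1] += 1
        counts[cluster][0] += int(nt in conv)
    return {cluster: tuple(v) for cluster, v in counts.items()}


def binom_sf(k, n, p):
    """One-sided exact binomial tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical toy data: cluster "C04" holds one expanded clone (nt1, 3 cells)
# whose amino-acid CDR3 is also encoded by nt2, so nt1 and nt2 are convergent.
toy_cells = [
    ("C04", "CASSLGF", "nt1"),
    ("C04", "CASSLGF", "nt1"),
    ("C04", "CASSLGF", "nt1"),
    ("C04", "CASSLGF", "nt2"),
    ("C04", "CASRPTF", "nt3"),
    ("C01", "CASQYF", "nt4"),
    ("C01", "CASQYF", "nt4"),
    ("C01", "CAWSVF", "nt5"),
]

by_cell = per_cluster_counts(toy_cells, by_clone=False)  # C04: 4 of 5 cells
by_clone = per_cluster_counts(toy_cells, by_clone=True)  # C04: 2 of 3 clones

# Background rate: fraction of all clones (pooled over clusters) that are convergent.
k_all = sum(k for k, _ in by_clone.values())
n_all = sum(n for _, n in by_clone.values())
p0 = k_all / n_all
k, n = by_clone["C04"]
print(by_cell, by_clone, round(binom_sf(k, n, p0), 3))
```

      Collapsing the three nt1 cells into one clone lowers cluster C04 from 4/5 convergent cells to 2/3 convergent clones, which is the kind of inflation the reviewer described. In practice, SciPy's `scipy.stats.binomtest(k, n, p0, alternative="greater").pvalue` computes the same tail; here the background rate includes the tested cluster itself, which is a simplification.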

      4) TCR convergence is associated with the clinical outcome of ICB treatment: The associations for the first analysis are described as significant in the text, and they are, but just barely (0.045 and 0.047, but you have to check the figure to see that).

      As suggested by the reviewer, we have added the p-values to the text so that they are easier to see. In this revision, we adopted another definition of convergence level, changing from the ratio of convergent TCRs to the absolute number of convergent T cell clones within each sample. The p-values were more significant using this new indicator (0.02 and 0.00038). To avoid the effect of other variables that might be correlated with convergence levels, especially sequencing depth, a multivariate Cox model was used for both datasets tested in the paper, correcting for TCR clonality, TCR diversity and sequencing depth (and treatment regimen for the melanoma data). As a result, convergence remains significantly prognostic after adjusting for these additional variables.

      5) Introduction/Discussion: Overall, the authors could do a better job citing previous work on convergence, for example, papers from Venturi on convergent recombination and the work from Mora and Walczak (ALICE, another recombination modeling). They also present the use of convergence as an ICB biomarker as a novel finding, but Ref 5 introduces this concept and validates it in another cohort. Ref 5 also has a careful analysis of the link between sequencing errors and convergence, which could have been more carefully considered here.

      We thank the reviewer for this excellent suggestion. We have added the citation of Venturi on convergent recombination as Ref 43 and cited it in the last paragraph of the Results section:

      “Convergent recombination has been proposed as the mechanistic basis for public TCR responses in many previous studies (Quigley et al., 2010; Venturi et al., 2006).”

      We also included work from Mora and Walczak in the fourth paragraph of the introduction and the third paragraph of the discussion as Ref 27 to introduce this TCR similarity-based clustering method as well as its application in predicting ICB response:

      “This idea has led to the development of several TCR similarity-based clustering algorithms, such as ALICE (Pogorelyy et al., 2019), TCRdist (Dash et al., 2017), GLIPH2 (Huang et al., 2020), iSMART (Zhang et al., 2020), and GIANA (Zhang et al., 2021), for studying antigen-driven T cell expansion during viral infection or tumorigenesis.”

      “In addition, the potential prognostic value of TCR convergence and TCR similarity-based clustering has been tested in other studies (Looney et al., 2019; Pogorelyy et al., 2019).”

      Ref 5 is now cited in the fourth paragraph of the Discussion, where we discuss the effect of sequencing errors on TCR convergence:

      “Improper handling of sequencing errors may result in the overestimation of TCR convergence (Looney et al., 2019).”

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by Borsatto et al. describes atomic-level structural details of the central core domain of non-structural protein 1 (Nsp1) of SARS-CoV-2, the virus responsible for the ongoing COVID-19 pandemic. The authors combined X-ray crystallography, fragment screening, computational modelling, and molecular dynamics simulation approaches to characterize potentially druggable pockets in the Nsp1 core (aa 10-126). This study presents several notable strengths. For example, the authors screened and tested 60 fragments from the Maybridge Ro3 library and solved a co-crystal structure of the Nsp1 core with one such fragment, 2E10 (N-(2,3-dihydro-1H-inden-5-yl)acetamide), to 1.1 Å resolution. The molecular dynamics simulations and other computational experiments were performed rigorously.

      Nsp1 blocks the mRNA path in the ribosome to modulate protein synthesis in the host cell. Nsp1 also binds the first stem-loop (SL1) of SARS-CoV-2 mRNA. The authors used a molecular docking program (HADDOCK) to build models of the Nsp1/RNA complex and predicted two modes of Nsp1 binding to SL1 RNA. A comparative structural analysis of the experimental Nsp1/2E10 structure with the Nsp1/SL1 model reveals that small-molecule compounds occupying this site may block RNA binding by Nsp1. Given the established role of this interface in modulating host and viral gene expression programs, this finding provides an important framework for designing small molecules capable of completely blocking this interface.

      A weakness of this study is the lack of experimental validation of the two modes of Nsp1 binding to SL1 RNA.

      The mechanism of binding, in particular whether Nsp1 binds to the ribosome first and then to SL1 or the other way round, is still debated. Moreover, to the best of our knowledge, there is to date no structure of the N-terminal region of Nsp1 bound to the ribosome. Thus, we expect that obtaining a structure of the binary, and possibly ternary, complex to validate the predicted binding mode will require considerable time and effort; this will hopefully be the focus of a follow-up study.

      Reviewer #2 (Public Review):

      In this manuscript, the authors have identified cryptic pockets in the Nsp1 protein of the SARS-CoV-2 virus. The authors used computational methods to identify these pockets and demonstrate drug binding via simulation studies. The authors also show that such cryptic pockets exist in other beta-coronaviruses.

      The authors carried out fragment-based screening using macromolecular crystallography and confirmed the presence of a bound compound in one of the identified pockets. However, the binding assays showed weak binding with a large error.

      Weak binding is typical for fragments; however, we agree that the error was high. We therefore re-measured the data (for both Nsp1N and full-length Nsp1) to reduce the error. The new values can be found in Figure 6 – Figure supplement 2.

      Further, the authors perform Nsp1-mRNA simulation studies to identify how Nsp1 binds to the 5'UTR of SARS-CoV-2 mRNA and suggest that targeting the identified pocket in Nsp1-N could disrupt the SARS-CoV-2 Nsp1-mRNA complex. However, there are conflicting reports on direct binding between SARS-CoV-2 Nsp1 and mRNA (see references 17 & 29).

      Nsp1 helps establish viral infection in the host, and hence identifying druggable sites in this protein is important. Therefore, this study is important and exciting.

    1. Author Response

      Reviewer #1 (Public Review):

      Yanis Zekri et al. have addressed an important question: the possible role of thyroid hormone (T3) and its nuclear receptor (TR) in local BAT thermogenesis and energy expenditure. In this well-written manuscript and well-executed work, the authors address this question as follows. A) They generated BATKO mice, in which TR signaling is selectively eliminated in BAT by knocking in TRα1L400R, a dominant-negative version of the TRα1 receptor, and by floxing the Thrb gene. They characterized these mice thoroughly to show that their BAT totally lacked T3 responsiveness. Using qPCR, they evaluated the selective abrogation of Thrb and Hr expression in BAT relative to other tissues. B) Using time-course transcriptome analysis, they then listed all direct T3/TR target genes using well-defined criteria and, by further linking these with their ChIP-seq data, identified 639 putative target genes under the direct control of T3/TR signaling. Interestingly, their gene analysis led them to some target genes directly involved with UCP1 and PGC1α, in addition to genes of many other metabolic processes related to BAT thermogenesis. The experiment on denervated BAT in PTU-fed wild-type mice was a rather neat way to eliminate the influence of noradrenergic nerve terminals on BAT target genes. Furthermore, the cold exposure and high-fat diet feeding experiments, with the series of complex analyses, led them to the conclusion that BATKO animals suffered from reduced efficiency of BAT adaptive thermogenesis. By comparing the BAT transcriptome of BATKO and CTRL mice after 24 h at 4°C, the authors further show how BAT TR signaling controls other subsets of genes, including a wide variety of metabolic pathways, especially lipolysis/fatty acid oxidation. Finally, EdU injection experiments showed a direct effect of T3 on BAT proliferation.

      I think this is a well-thought-out and well-designed study for understanding the complex, cell-autonomous regulation of adaptive thermogenesis by T3. The conclusions of this paper are well supported by the data provided.

      Thank you very much for this very pertinent and kind summary of our work.

      Reviewer #2 (Public Review):

      The authors designed this study to identify the direct T3 target genes that underlie T3 actions in brown adipose tissue (BAT). The unique model used (a dominant-negative TRα knock-in combined with a TRβ knock-out) allows BAT-specific actions to be isolated from other well-known systemic effects on thermogenesis, including those of the central nervous system. The strengths of the study reside in the novel methodological approach. Previous studies of T3 actions in BAT used animal models that did not allow full isolation of BAT-specific effects of T3. A limitation, however, is the combination of the TRα knock-in (which causes permanent suppression of TRα-dependent genes) with the TRβ knockout, which only prevents T3 induction of TRβ-dependent genes. Nonetheless, the results were impressive, with the identification of about 1,500 genes differentially expressed in BAT, among which UCP1 and PGC1α were the two main ones. Although it was already known that both UCP1 and PGC1α are downstream targets of T3, this work establishes through an ingenious approach the critical direct role played by T3 in BAT thermogenesis. In addition, the genetic approach used here is of great value and could easily be extended to other tissues and systems.

      Thank you for this very pertinent summary of our work. We just want to clarify one point: we do not believe that Ucp1 is, quantitatively, one of the main genes regulated by T3 in BAT. First, it is not among the most strongly induced genes after T3/T4 injection in mice. Most importantly, Ucp1 expression was not altered in BATKO mice exposed to 4°C according to unbiased RNA-seq analysis; only targeted qRT-PCR analysis revealed a modest change. We do not call into question the crucial role of Ucp1 in BAT thermogenesis. However, we think that our approach puts into perspective the relevance of Ucp1 in the T3-dependent control of BAT thermogenesis, suggesting that other mechanisms might be more directly linked to T3 activity.

      Reviewer #3 (Public Review):

      This paper details the importance of thyroid hormone signaling in BAT in response to environmental and nutritional stress. The authors use a novel genetic model to specifically target BAT and impair thyroid hormone signaling. The physiological insight is of great interest. The role of the sympathetic nervous system in the BAT response is not fully addressed, but it appears that cell-autonomous signaling mediates TH signaling in response to hyperthyroidism. The cistromic link between TR and PGC1 is also novel and of interest.

      Thank you very much for your kind comments, which are highly appreciated.

    1. Author Response

      Reviewer #1 (Public Review):

      This work addresses a long-standing question about how tolerance develops at the presynaptic level. That the number of receptors is unchanged following treatment of animals with opioids has been known since early work using receptor binding assays. Disrupted receptor/effector coupling was therefore thought to be the primary mechanism underlying tolerance. This work indicates that the location of receptors is critically important in the development of tolerance. This work is groundbreaking and a game changer in the understanding of tolerance at the cellular level.

      We appreciate that the Reviewer is positive about the potential impact of our study.

      Reviewer #2 (Public Review):

      Jullie et al. addressed the long-standing question of how presynaptic desensitization of opioid receptor signaling can occur on the timescale of hours despite the fact that it does not occur on the timescale of minutes. They also compared the mu and delta opioid receptors in this context and asked whether their desensitization occurs in a homologous or heterologous manner when co-expressed in the same neurons.

      A major strength of the work is the use of a relatively high-volume imaging assay of synaptic transmission based on VAMP2-SEP to detect exocytosis of synaptic vesicles and its modulation by heterologously expressed opioid receptors in cultured neurons. This allowed for large data sets to be acquired and analyzed with good statistical power. It also reports on a validated metric of synaptic transmission.

      A significant weakness arises from the need to overexpress opioid receptors in cultured striatal neurons in order to conduct the experiments with high reliability. Because the authors did not attempt to address receptor expression levels and relate overexpression to endogenous receptor expression levels in axons, the physiological significance of the findings remains, to some extent, in doubt.

      Using heterologously expressed receptors, the primary finding, that slow desensitization (of presynaptic suppression of neurotransmission) occurs via endocytosis of membrane-localized opioid receptors, is well supported by multiple lines of evidence. 1) Blocking receptor endocytosis, either via mutation of GRK2/3 phosphorylation sites or pharmacological block with compound 101, prevents slow desensitization of MOR. 2) SEP-MOR and SEP-DOR fluorescence (indicative of plasma membrane localization) is reduced by chronic agonist treatment.

      The secondary findings that MOR and DOR do not desensitize or undergo endocytosis in a heterologous manner, and that DOR-depletion from the plasma membrane is more facile than MOR and independent of C-terminus phosphorylation, are well supported by the data and analyses.

      Despite the reliance on heterologously expressed opioid receptors, the findings are likely to have a high impact on the fields of GPCR trafficking and opioid signaling, as they address a major outstanding question with direct relevance to opioid drug tolerance and may generalize to other GPCRs.

      The findings also raise new questions that will spur further work in the field. For example, focusing just on DOR, by what mechanism does agonist-driven DOR endocytosis occur, if not via GRK2/3 phosphorylation? By extension, would G protein-biased DOR agonists be expected to produce less tolerance? To be clear, these are not to be addressed in this manuscript.

      We appreciate that this Reviewer found that the current manuscript addresses long standing questions in the field and that our results are well supported by the data, acknowledging the strength of the presented method. We agree that our methods and results do have some associated limitations, particularly with respect to linking the present mechanistic findings to true physiology, and that the question of receptor expression level is pertinent to this link. We have attempted to address this to the best of our ability in the revised manuscript, as summarized below. We agree with the Reviewer that there remain many interesting questions for further study, and have modified the Discussion to more clearly point this out.

      Reviewer #3 (Public Review):

      The studies in the manuscript "Endocytic trafficking determines cellular tolerance of presynaptic opioid signaling" use a novel approach to assess the signaling of presynaptic opioid receptors that inhibit the release of neurotransmitters. Historically, studies have used whole-cell patch-clamp recordings of spontaneous and evoked neurotransmitter release to measure the presynaptic effects of opioid receptors. Since the recordings were made in postsynaptic cells that expressed receptors for the released neurotransmitter, the electrophysiological measurements are indirect with respect to the presynaptic receptors under study. The technique used in this manuscript is a pHluorin-based unquenching assay that measures synaptic vesicle exocytosis. In this case, super-ecliptic pHluorin (SEP) is a pH-sensitive GFP whose fluorescence increases as the synaptic vesicle protein it is attached to (VAMP2-SEP) relocates from the acidic synaptic vesicle lumen to the plasma membrane. Opioid agonists inhibit this activity with acute administration, and this inhibition is reduced with prolonged, or chronic, administration (hours), demonstrating tolerance. SEP can also be fused to opioid receptors and used to measure the proportion of receptors on the plasma membrane compared to internalized receptors. The studies show that agonist activation of mu-opioid receptors (MORs) induces endocytosis that is dependent on phosphorylation of the C-terminus and that the development of tolerance is correlated with the loss of MORs at the surface. The results are different for the delta-opioid receptor (DOR), which is also internalized with acute agonist administration, but loss of receptors from the membrane occurs more rapidly and is not dependent on phosphorylation of the C-terminus.
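
      The logic of this readout can be illustrated with a minimal sketch, assuming that evoked exocytosis is quantified as the peak fluorescence change relative to baseline (ΔF/F0) and that presynaptic inhibition is expressed as the fractional reduction of ΔF/F0 in the presence of agonist. The exact metric used in the manuscript may differ, and the function names are hypothetical.

```python
def delta_f_over_f0(trace, n_baseline):
    """Peak ΔF/F0 of one fluorescence trace; the first n_baseline frames define F0."""
    f0 = sum(trace[:n_baseline]) / n_baseline
    # Assumed: the evoked response is the peak of the post-baseline frames.
    peak = max(trace[n_baseline:])
    return (peak - f0) / f0


def fractional_inhibition(control_trace, agonist_trace, n_baseline):
    """Hypothetical metric: 1 - (ΔF/F0 with agonist) / (ΔF/F0 without agonist)."""
    evoked_agonist = delta_f_over_f0(agonist_trace, n_baseline)
    evoked_control = delta_f_over_f0(control_trace, n_baseline)
    return 1.0 - evoked_agonist / evoked_control
```

      Under this convention, tolerance after chronic agonist exposure would appear as a smaller fractional inhibition for the same acute agonist challenge.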

      The results in the studies are clearly presented and substantiate prior electrophysiological work showing the late development of tolerance of presynaptic opioid receptor signaling. The studies extend prior work by showing that endocytosis of both MOR and DOR occurs at presynaptic locations but that the cellular mechanisms underlying the maintenance of these receptors on the plasma membrane differ. The imaging results show convincing effect sizes, even with genetic and pharmacological manipulations, that will allow for further investigation into the cellular mechanisms underlying the development of tolerance. Since these studies transfected the cultured striatal neurons with both the opioid receptors and VAMP2-SEP, one question that remains is whether VAMP2-SEP imaging has the resolution to detect inhibition of exocytosis by endogenous opioid receptors. Since the authors make the point that this technique has advantages over traditional electrophysiological approaches, it is important that the technique allows for the measurement of effects mediated by endogenous levels of receptors. There are minor questions about the statistics used in some of the graphs, and about the utility of presenting p values on the right-hand axis, but these concerns do not alter the overall significance of the studies, which are high impact.

      We are pleased that this Reviewer found our results generally convincing and impactful. We are grateful for the critical comments and suggestions, particularly with regard to improving the statistical analysis and simplifying / removing speculation from our model. We have done our best to address both important aspects in the revised manuscript, as detailed below.

    1. Author Response

      Reviewer #3 (Public Review):

      1) This work focuses exclusively on excitatory input. However, as the authors mention, LGMD neurons also receive inhibitory inputs, and these inputs also appear to segregate to different areas of the dendritic tree depending on the pathway. The contribution of inhibition is mostly ignored throughout the manuscript, but I think that it would be beneficial to discuss how inhibitory inputs fit into the story. For example, if OFF inhibition maps onto the C field, then presumably when there is mixed ON/OFF stimulation there is inhibition of the ON excitation onto the C field? If so, how much excitation of the C field is left? How much does the retention of spatial coherence sensitivity with mixed stimuli arise from the fact that OFF excitation might dominate because it inhibits the C field? I don't think that additional experiments are needed, but a discussion would be useful. Relatedly, does the model include inhibitory synapses?

      We have not elaborated more specifically on inhibition, as its interaction with excitation has not yet been characterized experimentally. We agree that the interaction between excitation and inhibition for mixed ON/OFF stimuli in field C is an interesting topic, but it is unlikely to substantially affect responses to ON stimuli alone. We added a paragraph on E-I integration to the Discussion (lines 461-473). The model does include inhibitory synapses, which are now more clearly described.

      2) The argument that the cellular organization found here is good because it allows grasshoppers to be sensitive to white approaching stimuli while disregarding spatial coherence and saving energy seems plausible. But it's not clear to me why this is 'optimal' (from the title - 'optimizes neuronal computation'). What exactly is being optimized here? And why is it good that grasshoppers can't discriminate the spatial coherence of ON looming stimuli? Is everything that approaches a grasshopper fast and white always a bad thing, but not the case if the approaching thing is black? Some further placement of these findings into an ecological setting might be helpful here.

      Our thinking is not that there is an advantage to responding to incoherent white looms (on the contrary), but that white looming stimuli in nature are likely less frequent than black/white mixtures or than all dark stimuli. Thus, the inability to discriminate white spatial coherence might have been sacrificed to decrease energy expenditure. We agree that ‘optimal’ might be too strong a wording and we have modified the title and text accordingly. Hopefully the text is now clearer on this point.

    1. Author Response

      Reviewer #1 (Public Review):

      Abdel-Hag et al. investigated the beneficial effects of a fiber-rich diet on the pathology of α-synuclein-overexpressing (ASO) mice, a preclinical model of Parkinson's disease. They found that a prebiotic intervention attenuates motor deficits and reduces microglial reactivity in the substantia nigra and striatum. They extended these findings using scRNA sequencing and identified the expansion of protective disease-associated microglia (DAM), a microglial subset previously described during the early stages of disease in several mouse models. Interestingly, the data indicate that microglia do not influence the behavior of ASO mice in the early stages of disease progression. However, microglia are the key mediators of the protective effects of prebiotic treatment in ASO mice. Overall, the conclusions of this paper are well supported by the data, but some aspects should be considered to improve the manuscript.

      1) Colony-stimulating factor 1 receptor (CSF1R) inhibition has been widely used as a method for microglia depletion; however, the impact of this approach on peripheral immune cells is controversial. The authors elegantly showed that most gut-associated immune cell populations were unaffected by PLX5622. However, CSF1R signaling has been implicated in the maintenance of gut homeostasis. Could PLX5622 treatment directly affect the gut microbiome composition? Are the beneficial changes in gut microbiome composition induced by the prebiotic diet maintained in combination with PLX5622? CSF1R inhibitors with low brain penetration, such as PLX73086, which are therefore unable to deplete resident microglia (Bellver-Landete et al., Nat Commun, 2019), would be helpful to rule out peripheral off-target effects.

      We agree that the loss of benefits of the prebiotic diet following PLX5622 treatment may be due to changes in the microbiome, and we cannot exclude this possibility. The mechanism by which PLX5622 might reshape the microbiome would most likely involve changes in immune (or other) cell types in the gut, as the drug is not known to have direct effects on the microbiome. As the referee notes, we carefully profiled the mucosal immune system of mice treated with PLX5622 or control chow and found only minor changes associated with the drug. These are control experiments that very few previous studies using PLX5622 have performed, and they suggest that any immune-mediated microbiome changes may be subtle. Further, we do not suggest in the manuscript that microbiome changes mediated the benefits of the prebiotic diet in the first place; rather, the current study focuses on the well-known effect of microglia depletion by PLX5622.

      Microbiome profiling and additional experiments transferring microbiota from diet-treated animals, with and without PLX5622, to naïve mice would be needed to determine the functional effects of gut bacteria on microglial activation and motor symptoms. The use of PLX73086 is also an excellent way to address this point, as are several additional approaches. In our respectful opinion, comprehensively investigating the indirect contributions of the microbiome to motor symptoms in ASO mice represents a separate series of studies. Nonetheless, this is an important caveat of our work, and we now include the following text in the Discussion section to address this point: “Our study does not rule out indirect effects of PLX5622 that include reshaping the microbiome to promote motor symptoms in prebiotic diet-fed mice”. We thank the referee for this comment.

      2) The authors claimed that microglial depletion eliminates the protective effects of the prebiotic diet in ASO mice by showing increased levels of aggregated aSyn in the SN (Fig 5G). However, microglial depletion also has the same effect on WT mice. How do authors interpret this result?

      The referee raises an astute point. Microglia appear to play a complex role in PD and its mouse models, with both positive and negative effects demonstrated in various contexts (for example, PMIDs: 29401614, 32086763). A primary, though not exclusive, function of microglia is the removal of α-synuclein accumulations (PMIDs: 32170061, 34555357). Importantly, there is no change in motor behavior in prebiotic-fed WT mice with or without PLX5622 treatment, as expected (see Figs. 5D-F). We have been careful in the manuscript not to suggest that microglial effects on motor symptoms occur via a process that involves α-synuclein aggregates, as this has not been convincingly shown in this mouse model at the time point we are studying (i.e., 22 weeks of age). While it would be straightforward to add a statement suggesting why α-synuclein levels increase in WT mice on the drug, our preferred remedy is to point out this observation so that it does not go unnoticed, but to refrain from speculation in the absence of data, since this is not a major point of the study. We have now inserted the statement “However, in prebiotic-fed WT and ASO mice, depletion of microglia significantly increased levels of aggregated αSyn in the SN, while levels in the STR remained unchanged (Figure 5G-H).” We thank the referee for this important comment.

      3) What is the rationale for a long-term (17-week) prebiotic intervention? Have the authors considered a short-term intervention? The prebiotic diet should change the gut microbiome composition quickly, within a few days or weeks.

      We have previously shown that long-term microbiome depletion is required to impact motor performance in ASO mice, on a timeline similar to that of the current prebiotic study (Sampson et al., Cell, 2016). In unpublished data, short-term antibiotic treatment (4 weeks before motor testing) did not improve motor symptoms in ASO mice. We therefore chose the timeframe for the current prebiotic studies based on empirical data, but details on dosing intervals remain unknown. We agree that the microbiome should respond rapidly to the prebiotic diet, but it is unknown whether this response is durable or whether the pre-treatment microbiome profile would re-establish at some point after removal of the experimental diet. We respectfully suggest that these more specialized studies are better suited for future projects.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript applies the framework of information theory to study a subset of cellular receptors (called lectins) that bind to glycan molecules, with a specific focus on the kinds of glycans that are typical of fungal pathogens. The authors use the concentration of various types of ligands as the input to the signaling channel, and measure the "response" of individual cells using a GFP reporter whose expression is driven by a promoter that responds to NFκB. While this work is overall technically solid, I would suggest that readers keep several issues in mind while evaluating these results.

      1) One of the largest potential limitations of the study is the reliance of the authors on exogenous expression of the relevant receptors in U937 cells. Using a cell-line system like this has several advantages, most notably the fact that the authors can engineer different reporters and different combinations of receptors easily into the same cells. This would be much more difficult with, say, primary cells extracted from a mouse or a human. While the ability to introduce different proteins into the cells is a benefit, the problem is that it is not clear how physiologically relevant the results are. To their credit, the authors perform several controls that suggest that differences in transfection efficiency are not the source of the differences in channel capacity between, say, dectin-1 and dectin-2. As the authors themselves clearly demonstrate, however, the differences in the properties of these signaling systems are not based on receptor expression levels, but rather on some other property of the receptor. Now, it could be that the dectin-2 receptor is somehow just more "noisy" in terms of its activity compared to, say, dectin-1. This seems a somewhat less likely explanation, however, and so it is likely that downstream details of the signaling systems differ in some way between dectin-2 and the more "information efficient" receptors studied by the authors.

      The channel capacity of a cell signaling network depends critically on the distributions of the downstream signaling molecules in question: see the original paper by Cheong et al. (2011, Science 334 (6054), 354-8) and subsequent papers (notably Selimkhanov et al. (2014) Science 346 (6215), 1370-3 and Suderman et al. (2018) Interface Focus 8 (6), 20180039). The U937 cells considered here clearly don't serve the physiological function of detecting the glycans considered by the authors; despite the fact that this is an artificial cell line, the fact the authors have to exogenously express the relevant receptors indicates that these cells are not necessarily a good model for the types of cells in the body that actually have evolved to sense these glycan molecules.

      Signaling molecules readily exhibit cell-type-specific expression levels that influence cellular responses to external stimuli (Rowland et al. (2017) Nat Commun 8, 16009). So it is unclear that the distributions of downstream signaling molecules in U937 cells mirror those that would be observed in the immune cell types relevant to this response. As such, the physiological relevance of the differences between dectin-2 channel capacities and those exhibited by the other receptors are currently unclear.

      We appreciate Reviewer #1’s in-depth comments on the physiological relevance of the U937 cell line. A major benefit of using information theory to investigate a biological communication channel is that it allows quantitative measurement of the information that the channel transmits without detailed measurement of the spatiotemporal dynamics of receptors and downstream signaling cascades. In addition, comparing the measured quantity of information across conditions in turn gives a useful indication of the underlying signaling mechanisms. For example, we investigated how transmission of glycan information by dectin-2 is synergistically modulated in the presence of dectin-1, DC-SIGN or mincle. Our approach allows us to investigate how individual lectins on immune cells contribute to glycan information transmission and how their signals are integrated in the presence of other types of lectins. Therefore, the findings describe in a more defined way how physiologically relevant lectins integrate extracellular signals. Furthermore, we found that our model cell line has one order of magnitude higher expression of dectin-2 than primary human monocytes and exhibits a similar zymosan binding pattern (described in the Recommendations for the authors and Figure R8).

      We fully agree that acquiring more information on the information transmission capability of primary immune cells would increase physiological relevance. In the revised manuscript, we addressed this concern by comparing the receptor expression levels of our model cell lines with those of primary monocytes, finding agreement in cellular heterogeneity. However, we would also like to point out that our very basic question, how information stored in glycans is processed by lectins, is not tightly bound to these differences between primary cells and cell lines.

      Line 382: Finally, it is important to consider that our conclusions come from model cell lines, which were used as a surrogate for the cell-type-specific lectin expression patterns of primary immune cells. Human monocytes and dectin-2-positive U937 cells have comparable receptor densities and respond similarly to stimulation with zymosan particles (SI Fig. 6A and B).

      2) Another issue that readers might want to keep in mind is that the details of the channel capacity calculation are a bit unclear as the manuscript is currently written. The authors indicate that their channel capacity calculations follow the approach of Cheong et al. (2011) Science 334 (6054), 354-8. However, the extent to which they follow that previous approach is not obvious. For instance, the calculations presented in the 2011 work use a combined bootstrapping/linear extrapolation approach to estimate the mutual information at infinite population size in order to deal with known inaccuracies in the calculation that arise from finite-size effects. The Cheong approach also deals with the question of how many bins to use in order to estimate the joint probability distribution across signal and response.

      They do this by comparing the mutual information they calculate for the real data with that calculated for random data to ensure that they are not calculating spuriously high mutual information based on having too many bins. While the Cheong et al. paper does a great job explaining why these steps need to be undertaken, a subsequent paper by Suderman et al. (2017, PNAS 114 (22), 5755-60) explains the approach in even greater detail in the supporting information. Those authors also implemented several improvements to the general approach, including a bootstrap method for more accurately estimating the error in the mutual information and channel capacity estimates.

      The problem here is that, while the authors claim to follow the approach of Cheong et al., it seems that they have re-implemented the calculation, and they do not provide sufficient detail to evaluate the extent to which they are performing the same exact calculation. Since estimates of mutual information are technically challenging, specific details of the steps in their approach would be helpful in order to understand how closely their results can be compared with the results of previous authors. For instance, Cheong et al. estimate the "channel capacity" by trying a set of likely unimodal and bimodal distributions for the input to the channel, and choosing the maximal value as the channel capacity. This is clearly a very approximate approach, since the channel capacity is defined as the supremum over an (uncountably infinite) set of input probability distributions. In any case, the authors of the current manuscript use a different approach to this maximization problem. Although it is a bit unclear how their approach works, it seems that they treat the probability of each input bin as an independent parameter (under the constraint that the probabilities sum to one) and then use an optimization algorithm implemented in Python to maximize the mutual information. In principle, this could be a better approach, since the set of input distributions considered is potentially much larger. The details of the optimization algorithm matter, however, and those are currently unclear as the paper is written.
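
      The subsampling-and-extrapolation idea the reviewer describes can be sketched as follows. This is a minimal illustration assuming already-binned integer inputs and outputs, a plug-in (histogram) estimator, and a linear fit of the mean mutual information against inverse sample size 1/N; the subsample fractions, repeat count, and function names are arbitrary choices made here, not the settings of Cheong et al. or of the authors.

```python
import numpy as np


def plugin_mi_bits(x_idx, y_idx, kx, ky):
    """Plug-in (histogram) mutual information in bits from integer bin indices."""
    joint = np.zeros((kx, ky))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / outer[nz])))


def extrapolated_mi_bits(x_idx, y_idx, kx, ky,
                         fractions=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                         reps=20, seed=0):
    """Estimate MI at infinite sample size by linear extrapolation in 1/N (sketch)."""
    rng = np.random.default_rng(seed)
    x_idx, y_idx = np.asarray(x_idx), np.asarray(y_idx)
    n = len(x_idx)
    inv_n, mean_mi = [], []
    for f in fractions:
        m = int(f * n)
        # Average the plug-in estimate over random subsamples of size m.
        vals = [plugin_mi_bits(x_idx[idx], y_idx[idx], kx, ky)
                for idx in (rng.choice(n, size=m, replace=False) for _ in range(reps))]
        inv_n.append(1.0 / m)
        mean_mi.append(np.mean(vals))
    # Fit MI ~ a + b / N; the intercept a is the N -> infinity estimate.
    _, intercept = np.polyfit(inv_n, mean_mi, 1)
    return float(intercept)
```

      For independent inputs and outputs, the raw plug-in estimate is biased upward, while the extrapolated intercept should lie close to zero; a bootstrap over the subsamples (as in Suderman et al.) would additionally give an error bar.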

      We thank Reviewer #1 for the recommendation to increase the legitimacy of the calculation. In the revised manuscript, we explain the channel capacity calculation procedure in more detail, including the statistical approaches adopted from Cheong et al. (2011) and Suderman et al. (2018) (SI sections 1 and 2). Furthermore, we chose the number of bins based not only on a random dataset but also on the total number of samples, as shown below:

      Figure R1. A) Extrapolated channel capacity of random datasets at infinite subsample size, for various total numbers of samples and numbers of output bins. The white line in the heatmap marks a channel capacity of 0.01 bits. B) Extrapolated channel capacity at infinite subsample size between the input (TNF-α dose) and output (GFP reporter) response of U937 cells.

      Figure R1 shows channel capacity values for random (A) and experimental (B, TNFAR + TNF-α) datasets. The values from random data show how the channel capacity estimate depends on the number of output bins and the total number of samples. Based on this heatmap, we set the allowed bias to 0.01 bits, shown as the contour line in Figure R1A. Since our smallest dataset used for channel capacity calculation in the absence of labelled input contains close to 90,000 samples, the expected bias in the channel capacity calculation is less than 0.01 bits for 10 to 1000 output bins (Figure R1A).
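
      Why random data yield a nonzero estimate that grows with the number of bins and shrinks with sample size can be checked with a rough calculation. A minimal sketch, assuming a plug-in histogram estimator with well-populated bins and no extrapolation (so this is not the extrapolated quantity plotted in Figure R1A): for independent input and output, the expected plug-in mutual information is approximately (K_x - 1)(K_y - 1) / (2 N ln 2) bits, where K_x and K_y are the numbers of input and output bins and N is the number of samples. The function names are illustrative.

```python
import numpy as np


def plugin_mi_bits(x_idx, y_idx, kx, ky):
    """Plug-in (histogram) mutual information in bits from integer bin indices."""
    joint = np.zeros((kx, ky))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / outer[nz])))


def expected_null_bias_bits(n_samples, kx, ky):
    """First-order bias of plug-in MI when X and Y are truly independent."""
    return (kx - 1) * (ky - 1) / (2 * n_samples * np.log(2))


def simulated_null_mi(n_samples, kx, ky, seed=0):
    """Plug-in MI of one random dataset whose output ignores the input."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, kx, size=n_samples)
    y = rng.integers(0, ky, size=n_samples)  # drawn independently of x
    return plugin_mi_bits(x, y, kx, ky)
```

      Comparing `simulated_null_mi` with `expected_null_bias_bits` over a grid of sample sizes and bin numbers gives a raw-estimator analogue of the heatmap; the extrapolation step is what pushes this bias toward zero.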

      Furthermore, we performed the mutual information maximization using predefined unimodal and bimodal input distributions and compared the result with the systematic method used in this work. We found no noticeable difference in channel capacity between the two approaches (SI Figure 3M).

      3) Another issue to be careful about when interpreting these findings is the fact that the authors use logarithmic bins when calculating the channel capacity estimates. This is equivalent to saying that the "output" of the cell signaling channel is not the amount of protein produced under the control of the NFκB promoter, but rather the log of the protein level. Essentially, the authors are considering a case where the relevant output of the system is not the amount of protein itself, but the fold change in the amount of protein. That might be a reasonable assumption, especially if the protein being produced is a transcription factor whose own promoters have evolved to detect fold changes. For many proteins, however, the cell is likely responsive to linear changes in protein concentration, not fold changes. And so choosing the log of the protein level as the output may not make sense in terms of understanding how much information is actually contained in this particular output variable. Regardless, choosing logarithmic bins is not purely a matter of convenience or arbitrary choice, but rather corresponds to a very strong statement about what the relevant output of the channel is.

      We understand Reviewer #1’s concern regarding the choice of logarithmic binning. We found that when the number of bins exceeds 200, the estimated channel capacities converge to the same value regardless of the binning method (linear, logarithmic, or equal-frequency). The only difference is how quickly the estimate approaches the converged channel capacity as the number of bins increases (Figure R2). In the revised manuscript, we used linear binning, which better represents protein signaling, as the Reviewer noted. Note that the channel capacity values calculated with linear binning do not differ noticeably from our previously calculated values.

      On the other hand, when labelled (i.e., continuous) input is included in the channel capacity calculation, linear binning introduces significant bias, because the input must then also be binned, which increases the total number of bins.

      Figure R2. Dependence of the channel capacity estimated from the experimental dataset on the number of output bins and the binning method. The inset plots show the difference of each channel capacity value relative to the maximum channel capacity over the entire binning range (i.e., 10 to 1000 bins) for the corresponding binning method.

      Following Reviewer #1’s comment, we have changed the binning method from logarithmic to linear binning for the whole experimental dataset, except in the presence of labelled input (i.e., dectin-2 antibody). When we consider the channel capacity between labelled input and the NF-κB reporter, equal-frequency binning is used for every layer of the channel (i.e., labelled input-binding, binding-GFP, and labelled input-GFP).
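
      The three binning schemes compared in Figure R2 can be sketched with NumPy as below. This is an illustrative helper, not the authors' code; the function names and the choice to place the maximum value in the last bin are assumptions. Equal-frequency edges come from empirical quantiles, so heavily tied data can produce repeated edges and unevenly filled bins.

```python
import numpy as np


def bin_edges(values, n_bins, method):
    """Return n_bins + 1 edges spanning the data for the chosen binning method."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if method == "linear":
        return np.linspace(lo, hi, n_bins + 1)
    if method == "log":
        if lo <= 0:
            raise ValueError("logarithmic binning requires strictly positive values")
        return np.geomspace(lo, hi, n_bins + 1)
    if method == "equal_frequency":
        return np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    raise ValueError(f"unknown binning method: {method!r}")


def bin_counts(values, n_bins, method):
    """Assign each value a bin index in 0..n_bins-1 and count bin occupancy."""
    edges = bin_edges(values, n_bins, method)
    # Digitizing against interior edges only places the maximum in the last bin.
    idx = np.digitize(values, edges[1:-1])
    return np.bincount(idx, minlength=n_bins)
```

      On log-normally distributed reporter intensities, linear bins concentrate most cells in the lowest bins, which is one reason the different schemes approach the converged capacity at different rates as the number of bins grows.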

      Reviewer #2 (Public Review):

      My expertise is more on the theoretical than the experimental aspects of this paper, so those will be the focus of these comments.

      Signal transduction is an important area of study for mathematical biologists and biophysicists. This setting is a natural one for information-theoretic methods, and such methods are attracting increasing research interest. Experimental results that attempt to directly quantify the Shannon capacity of signal transduction are particularly interesting. This paper represents an important contribution to this emerging field.

      My main comments are about the rigorousness and correctness of the theoretical results. More details about these results would improve the paper and help the reader understand the results.

      We understand Reviewer #2’s comment regarding the rigor and correctness of the theoretical results of this work. In the revised manuscript, we added the following content to help readers better understand the channel capacity calculation procedure.

      • A general illustrative introduction to how we measured the input and output datasets and how we processed these data to prepare the joint probability distribution (SI sections 1.1 and 1.2).

      • Worked examples of the mutual information maximization procedure using experimental and arbitrary datasets (SI section 1.3).

      The calculation of channel capacity, given in the methods, is quite a standard calculation and appears to be correct. However, I was confused by the use of the "weighting value" w_i, which is not specified in the manuscript. The input distribution appears to be a product of the weight w_i and the input probability value p_i, and these appear always to occur together as a product w_i p_i. (In joint probabilities w_i p(i,j), the input probability can be extracted using Bayes' rule, leaving w_i p_i p(j|i).) This leads me to wonder about two things. First, what role does w_i play (is it even necessary)? Second, of particular interest here is the capacity-achieving input distribution p_i, but w_i obscures it; is the physical input distribution p_i equal to the capacity-achieving distribution? If not, what is the meaning of capacity?

      We thank Reviewer #2 for the comment regarding the arbitrariness of the weightings. We realize that the original manuscript lacked an explanation of the weighting values. $P_X(i)$ is the marginal probability distribution of the input in the original dataset, and $P'_X(i)$ is the marginal probability distribution of the modified input that maximizes the mutual information. In general, $P_X(i)$ is not equal to $P'_X(i)$, and one therefore needs to find $P'_X(i)$ from $P_X(i)$. Because each $P'_X(i)$ is a rescaled $P_X(i)$, it can be written as $P'_X(i) = w(i)\,P_X(i)$, where $w(i)$ are the weightings, under the constraint $\sum_i w(i)\,P_X(i) = 1$. The changed input distribution in turn modifies the joint probability distribution as $P'_{XY}(i,j) = w(i)\,P_{XY}(i,j) = P'_X(i)\,P(j \mid i)$, so the channel's conditional response $P(j \mid i)$ is unchanged and only the input distribution is optimized. To help readers understand this work, we expanded the Appendix with illustrative descriptions.
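
      Reweighting the input while keeping the channel P(j | i) fixed is exactly the maximization that defines channel capacity. As an illustration, the sketch below uses the classical Blahut-Arimoto iteration, a standard alternative to a general-purpose optimizer; it is not the Python optimization routine used in the manuscript. It assumes a joint histogram whose input marginals are all strictly positive, and it returns the capacity, the capacity-achieving input distribution, and the implied weights w(i) = P'_X(i) / P_X(i). The function names are illustrative.

```python
import numpy as np


def blahut_arimoto(p_y_given_x, tol=1e-12, max_iter=10_000):
    """Capacity (bits) and capacity-achieving input of a discrete channel.

    p_y_given_x has shape (n_inputs, n_outputs); each row sums to 1.
    """
    p = np.asarray(p_y_given_x, dtype=float)
    q = np.full(p.shape[0], 1.0 / p.shape[0])  # start from a uniform input

    def divergences(q):
        r = q @ p  # output marginal induced by input distribution q
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(p > 0, np.log(p / r), 0.0)
        return np.sum(p * log_ratio, axis=1)  # D(p(.|i) || r) in nats

    for _ in range(max_iter):
        q_new = q * np.exp(divergences(q))
        q_new /= q_new.sum()
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    # Mutual information at the final input distribution, converted to bits.
    capacity_bits = float(q @ divergences(q)) / np.log(2)
    return capacity_bits, q


def capacity_and_weights(joint):
    """Split a joint histogram into P_X and P(j|i), then optimize the input."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    p_x = joint.sum(axis=1)
    p_y_given_x = joint / p_x[:, None]
    capacity, q = blahut_arimoto(p_y_given_x)
    weights = q / p_x  # w(i) such that P'_X(i) = w(i) * P_X(i)
    return capacity, q, weights
```

      In this framing the physical input distribution P_X and the capacity-achieving distribution generally differ; the weights simply record how far apart they are, while the channel itself is untouched.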

      A more minor but important point: the inputs and outputs of the communication channel are never explicitly defined, which makes the meaning of the results unclear. When evaluating the capacity of an information channel, the inputs X and outputs Y should be carefully defined, so that the mutual information I(X;Y) is meaningful; the mutual information is then maximized to obtain capacity. Although it can be inferred that the input X is the ligand concentration, and the output Y is the expression of GFP, it would be helpful if this were stated explicitly.

      We agree with the Reviewer’s suggestion to describe the input and output more clearly. We have therefore modified Figure 1A and B and the main text to describe the sources of input and output more clearly, as follows:

      Line 92: Because cellular signaling is stochastic, information theory provides robust and quantitative tools to analyze complex communication channels. A fundamental metric of information theory is entropy, which quantifies the disorder or uncertainty of a variable. In this respect, cellular signaling pathways with highly variable initiating input signals (e.g., stimulants) and correspondingly highly variable output responses (i.e., cellular signaling) can be characterized as having high entropy. Importantly, input and output can be mutually dependent, so knowing the input can partly provide information about the output. If noise is present in the communication channel, this mutual dependence between input and output is reduced. The mutual dependence between input and output is called mutual information. Mutual information is therefore a function of the input distribution, and its upper bound over all input distributions is called the channel capacity (SI section 1) (Cover and Thomas, 2012). In this report, a communication channel describes the signal transduction pathway of a C-type lectin receptor, which ultimately leads to NF-κB translocation and finally to GFP expression in the reporter model (Fig. 1A). To quantify the signaling information of the communication channels, we used channel capacity. Importantly, the channel capacity does not merely describe the resulting maximum intensity of the reporter cells. It takes into account cell-to-cell variation and activation, measured at single-cell resolution, across the whole range of incoming stimuli, and condenses all of these data into a single number.
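
      For reference, the standard definitions (Cover and Thomas) behind this paragraph are written below for discrete, binned data, taking X as the stimulus (input) and Y as the GFP reporter level (output), as the reviewer inferred; P_X, P_Y and P_{XY} denote the input, output, and joint distributions.

```latex
H(X) = -\sum_{i} P_X(i)\,\log_2 P_X(i)

I(X;Y) = \sum_{i,j} P_{XY}(i,j)\,\log_2 \frac{P_{XY}(i,j)}{P_X(i)\,P_Y(j)} = H(Y) - H(Y \mid X)

C = \max_{P_X} I(X;Y)
```

      The maximization in C runs over input distributions only; the conditional response P(Y | X), which is what the cells implement, is held fixed.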

    1. Author Response

      Reviewer #1 (Public Review):

      The software presented in this paper is well documented and represents a significant achievement that breaks new ground in terms of what is possible to render and explore in the web browser. This tool is essential for the exploration of SC2 data, but equally useful for the tree of life and other tree-like data sets.

      Thank you for reviewing my work and for this generous assessment.

      Reviewer #2 (Public Review):

      This manuscript describes a web-based tool (Taxonium) for interactively visualizing large trees that can be annotated with metadata. Having worked on similar problems in the analysis and visualization of enormous SARS-CoV-2 data sets, I am quite impressed with the performance and "look and feel" of the Taxonium-powered cov2tree web interface, particularly its speed at rendering trees (or at least a subgraph of the tree).

      Thank you for the kind words.

      The manuscript is written well, although it uses some technical "web 2.0" terminology that may not be accessible to a general scientific readership, e.g., "protobuf" (presumably protocol buffer) and "autoscaling Kubernetes cluster". The latter is like referring to a piece of lab equipment, so the author should provide some sort of reference to the manufacturer, i.e., https://kubernetes.io/.

      Thank you for flagging this. I have now replaced the colloquial "protobuf" with "protocol buffer". I have now provided a URL for Kubernetes. It is always difficult to judge how much to explain technical terms. I certainly agree that many people will be unfamiliar with, for instance, protocol buffers, but an explanation of what they are (which may not be particularly important for understanding Taxonium) can sometimes overshadow more important details. So my preference in that particular case is for an interested reader to research the unfamiliar term.

      In other respects, the manuscript lacks some methodological details, such as exactly how the tree is "sparsified" to reduce the number of branches being displayed for a given range of coordinates.

      This is an important point also raised by Reviewer 3. I have added a new section in the Materials and Methods which discusses this in some detail.

      Some statements are inaccurate or not supported by current knowledge in the field. For instance, it is not true that the phylogeny "closely approximates" the transmission tree for RNA viruses.

      I agree that this was an overly broad claim, and have softened it, now saying:

      "The fundamental representation of a viral epidemic for genomic epidemiology is a phylogenetic tree, which approximates the transmission tree and can allow insights into the direction of migration of viral lineages."

      Mutations are not associated with a "point in the phylogeny", but rather the branch that is associated with that internal node.

      I have changed this as suggested.

      A major limitation of displaying a single phylogenetic tree (albeit an enormous one) is that the uncertainty in reconstructing specific branches is not readily communicated to the user. This problem is exacerbated for large trees where the number of observations far exceeds the amount of data (alignment length). Hence, it would be very helpful to have some means of annotating the tree display with levels of uncertainty, e.g., "we actually have no idea if this is the correct subtree". DensiTree endeavours to do this by drawing a joint representation of a posterior sample of trees, but it would be onerous to map a user interface to this display. I'm raising this point as something for the developers to consider as a feature addition, and not a required revision for this manuscript.

      I entirely agree with this point. I have added a sentence in the discussion:

      "Even where sequences are accurate, phylogenetic topology is often uncertain, and finding ways to communicate this at scale, building on prior work [Densitree citation] would be valuable."

      The authors make multiple claims of novelty - e.g., "[...] existing web-based tools [...] do not scale to the size of data sets now available for SARS-CoV-2" and "Taxonium is the only tool that readily displays the number of independent times a given mutation has occurred [...]" - that are not entirely accurate. For example, RASCL (https://observablehq.com/@aglucaci/rascl) allows users to annotate phylogenies to examine the repeated occurrence of specific mutations. Our own system, CoVizu, also enables users to visualize and explore the evolutionary relationships among millions of SARS-CoV-2 genomes, although it takes a very different approach from Taxonium. Taxonium is an excellent and innovative tool, and it should not be necessary to claim priority.

      I agree that comparisons with existing tools are difficult and often provide a sense of unnecessary competition. I attempted to be quite careful in the specific section focused on comparison, but may have been less careful earlier on. The intent with this first sentence in the abstract was to provide a succinct description of the gap that Taxonium was developed to fill with "however, existing web-based tools for analysing and exploring phylogenies do not scale to the size of datasets now available for SARS-CoV-2". I have now removed the words "analysing and", focusing on the exploration of phylogenies. I think this new sentence is defensible in that valuable tools such as CoVizu intentionally do not explore a phylogeny directly but instead take a multi-level approach, and this new sentence better matches the comparisons in the paper. In the second sentence, I have removed the phrase "is the only tool that", which I agree adds little and may not be accurate, depending on one's interpretation of "readily". Thank you for these points.

      Although the source code (largely JavaScript with some Python) is quite clean and has a consistent style, there is a surprising lack of documentation in the code. This makes me concerned about whether Taxonium can be a maintainable and extensible open-source project since this complex system has been almost entirely written by a single developer. For example, usher_to_taxonium.py has a single inline comment (a command-line example) and no docstring for the main function. JBrowsePanel.jsx has a single inline comment for 293 lines of code. There is some external documentation (e.g., DEVELOPMENT.md) that provides instructions for installing a development build, but it would be very helpful to extend this documentation to describe the relationships among the different files and their specific roles. Again, this is something for the developers to consider for future work and not the current manuscript.

      This is an entirely fair comment. The version of Taxonium presented in the manuscript is "2.0", which is a new version built from scratch with considerably less technical debt than the version that preceded it. Its technical strengths are that (with the exception of the backend) it is relatively well-modularised into functional components. But the limitations that the reviewer notes with respect to commenting are entirely fair. What I would say is that in the time since this manuscript was submitted, several important features have been added by an external collaborator, Alex Kramer, most notably the Treenome Browser (https://www.biorxiv.org/content/10.1101/2022.09.28.509985v1). I hope that the ability of Alex to add these features with little need for support provides some evidence of Taxonium's extensibility. But I acknowledge there is room for improvement.

      Reviewer #3 (Public Review):

      The paper succinctly provides an overview of the current approaches to generating and displaying super-large phylogenies (>10,000 tips). The results presented here provide a comprehensive set of tools to address the display and exploration of such phylogenies. The tools are well-described and comprehensive, and additional online documentation is welcome.

      The technical work to display such large datasets in a responsive fashion is impressive and this is aptly described in the paper. The author rightly decides that displaying large phylogenies is not simply a matter of rendering "more nodes", and so in my eyes, the major advancement is the approach used to downsample trees on-the-fly so that the number of nodes displayed at one time is manageable. This is detailed only briefly (Results section, 1st paragraph, 2 sentences). I would like to see more discussion about the details of this approach.

      Thank you for this point, also raised by Reviewer 2. I have now added a lengthy section on this in the Materials and Methods, which I hope is helpful. The approach is not especially sophisticated, but it does the job and runs quickly.

      Examples that came up while exploring the tool: the (well implemented) search functionality reports results from the entire tree (e.g. in Figure 4, the number of red circles is not a function of zoom level), how does this interact with a tree showing only a subset of nodes?

      Yes, this is an important feature which I perhaps did not do justice to in the write-up. I have included in the new section in the Materials and Methods a paragraph discussing search results:

      "In order to ensure that search results are always comprehensive, but at the same time to avoid overplotting, we take the following approach:

      ● Searches are performed across every single node on the tree to select a set of nodes that match the search. The total number of matches is displayed in the client.

      ● If fewer than 10,000 matches are detected, these are simply displayed in the client as circles.

      ● If more than 10,000 matches are detected, the results are sparsified using the method above, and then displayed.

      ● Upon zooming or panning, the sparsification is repeated for the new bounding box."
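
      The search procedure quoted above can be sketched as follows. This is a hypothetical illustration in Python, not Taxonium's code (which is JavaScript): the `Node` type, the grid-cell rule for deciding that two nodes overlap, and the treatment of exactly 10,000 matches are all assumptions; the manuscript's Materials and Methods describes the actual sparsification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    x: float
    y: float
    label: str


def sparsify(nodes, bbox, cell_size):
    """Keep at most one node per cell of a viewport grid (hypothetical overlap rule)."""
    x_min, x_max, y_min, y_max = bbox
    occupied, kept = set(), []
    for node in nodes:
        if not (x_min <= node.x <= x_max and y_min <= node.y <= y_max):
            continue  # outside the current view
        cell = (int((node.x - x_min) // cell_size), int((node.y - y_min) // cell_size))
        if cell not in occupied:  # later nodes in this cell would overlap the kept one
            occupied.add(cell)
            kept.append(node)
    return kept


def search(nodes, predicate, bbox, cell_size, max_plain=10_000):
    """Return (total matches over the whole tree, matches to draw in this view)."""
    matches = [node for node in nodes if predicate(node)]  # every node is searched
    if len(matches) <= max_plain:
        return len(matches), matches  # few enough to draw directly as circles
    return len(matches), sparsify(matches, bbox, cell_size)
```

      Zooming or panning corresponds to calling `search` again with the new bounding box, so the reported total stays constant while the drawn subset is recomputed for the view.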

      How is the node order chosen with regards to "nodes that would be hidden by other nodes are excluded" and could this affect interpretations depending on the colouring used?

      This perhaps was slightly sloppy language which did not directly describe the implementation. I have now rephrased this to "only nodes that overlap other nodes are excluded", as we don't in fact consider a notion of z-index when doing this. The way the sparsification works (now better described) means that the nodes excluded are determined essentially by position and I don't foresee this introducing particular biases, but this was an insightful point to raise.

      Taxonium takes the approach of displaying all available data (sparsification of nodes notwithstanding). Biases in the generation of sequences, especially geographical, will therefore be present (especially so in the two main datasets discussed here - SARS-CoV-2 and monkeypox). This caveat should be made explicit.

      This is certainly true. I have added this new paragraph in the Discussion:

      "A further challenge is the vastly different densities of sampling in different geographic regions. Because Cov2Tree does not downsample sequences from countries which are able to sequence a greater proportion of their cases, the number of tips on a tree is not indicative of the size of an outbreak and in some cases even inferences of the directionality of migration may be confounded. There would be value in the development of techniques that allow visual normalisation of trees for sampling biases, which might allow for less biased phylogenetic representations without downsampling."

      Has the author considered choosing which nodes to exclude for sparsified trees in such a way as to minimise known sampling biases?

      The last sentence of the new paragraph above alludes to a sort-of-similar approach. I hadn't directly considered the approach the reviewer suggests. It is an interesting idea. The downsampling algorithm has to be very computationally inexpensive but it would be interesting to explore ways to do this. I am tracking this in https://github.com/theosanderson/taxonium/issues/437.

      Interoperability between different software tools is discussed in a technical sense but not as it pertains to discovering the questions to ask of the data. As an example, spotting the specific mutations shown in figure 3 + 4 is not feasible by checking every position iteratively; instead, the ability to have mutations flagged elsewhere and then seamlessly explore them in Taxonium is a much more powerful workflow. This kind of interoperability (which Taxonium supports) enhances the claim of "providing insights into the evolution of the virus".

      Thank you for flagging this point -- I am very excited by the growing ecosystem of interoperable tools. You are absolutely right that most of the insights Taxonium can bring into evolution rely also on this broader ecosystem. I have added a florid sentence in the concluding paragraph: "It forms part of an ecosystem of open-source tools that together turn an avalanche of sequencing data into actionable insights into ongoing evolution."

      The prosaic reason I don't discuss Taxonium's interoperability features in more detail in this manuscript is that it aims to describe the version of Taxonium I initially developed, and these features were developed collaboratively by a broader group later on (and after deposition of this preprint).

      Taxonium has been a fantastic resource for the analysis of SARS-CoV-2 and this paper fluently presents the tool in the context of the wider ecosystem of bioinformatic tools in use today, with the interoperability of the different pieces being a welcome direction.

    1. Author Response

      Reviewer #1 (Public Review):

      “This manuscript reports the results of studies on the effects of an ActRIIB-Fc ligand trap inhibitor of myostatin on muscle contractures that develop when brachial plexus nerve roots are severed at 6 after birth. One component of this pathological response seems to be a failure to add sarcomeres as the skeleton grows resulting in short muscles. The authors use a carefully performed set of animal studies to test the effects of the ligand trap on denervation-induced limitations in range of motion in young mice. They also investigate several biochemical mechanisms that might contribute to contractures and be modified by the ligand trap. Finally, they test for gender discordance in the protective effect of a proteasome inhibitor against contractures. The major finding of these studies is that the ligand trap improved the range of motion at the elbow and shoulder in female mice but not in males. The major caveat to interpreting the data is that group sizes are relatively small such that the study may have been underpowered to detect smaller effects on a range of motion and biochemical endpoints.”

      Thank you very much for your thoughtful review of our manuscript. We have taken your feedback regarding the interpretation of our data into consideration, and revised our manuscript accordingly.

      We appreciate the reviewer’s careful scrutiny of our group sizes. As mentioned in the Statistical analysis section of our Materials and Methods, we included at least 6 mice per group for all range of motion and physiological endpoints. Based on an a priori power analysis, this is the number of mice per group necessary to detect a 10° difference in contractures and a 0.2 µm difference in sarcomere lengths at 80% power between experimental conditions. However, the small size of the forelimb muscles, especially following denervation, precluded the investigation of all biochemical parameters in each muscle. Therefore, we used smaller subgroup sizes for certain biochemical endpoints (Akt, Smad2/3, and Atrogin-1). In our revised Discussion, we acknowledge that our study may be underpowered to detect smaller effects in these parameters of protein dynamics.

      Discussion (lines #461-468): First, the small size of our denervated muscles precluded the use of the same muscles for all analyses, instead requiring smaller subgroup sizes as well as different muscles for certain biochemical endpoints (Akt, Smad2/3, MuRF1, and Atrogin-1). We therefore acknowledge that our study may be underpowered to detect smaller effects in certain parameters of protein dynamics, specifically signaling proteins and ubiquitin ligases. We also acknowledge that the precision of our findings would be further enhanced with the use of the same muscle type across all of our morphological, physiological, and biochemical analyses.
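
      The kind of a priori sample-size calculation mentioned above can be illustrated with the usual normal-approximation formula for comparing two group means, n = 2 ((z_{1-α/2} + z_{1-β}) σ / δ)² per group, where δ is the difference to detect and σ the within-group standard deviation. A minimal sketch: σ is not reported in this response, so the value in the example is purely hypothetical, and an exact t-test calculation (as performed by many power-analysis packages) would return a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical example: detect a 10 degree difference assuming an SD of 8 degrees.
print(n_per_group(delta=10, sigma=8))  # -> 11
```

      The required n scales with (σ/δ)², so halving the detectable difference roughly quadruples the group size; this is why smaller subgroups can miss smaller effects.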

      Reviewer #2 (Public Review):

      “The manuscript by Emmert et al. describes an original and straightforward study demonstrating the utility of targeted therapy in a neonatal brachial plexus injury (NBPI) mouse model. The authors sought to investigate whether pharmacologic inhibition of MSTN signaling using a soluble decoy receptor (ACVR2B-Fc) could preserve longitudinal muscle growth and prevent contractures after NBPI. More specifically, through in vivo experiments using wild-type female and male mice, the authors assessed the impact of inhibiting the MSTN signaling in basal and pathophysiological conditions, on developmental, morphological, and biomechanical parameters, and on several biochemical markers of protein synthesis, protein degradation, and their associated signaling pathways, in forelimb skeletal muscle.

      The authors provide multiple lines of compelling evidence that ACVR2B-Fc improves skeletal muscle biology and function in NBPI mice, provokes hypertrophy, rescues longitudinal growth, and impedes neuromuscular contractures in denervated muscles. Rather than improving the condition independently of sex, the treatment appears selective for the muscles of female mice, revealing a sex-specific improvement and thus a sex dimorphism. The experiments also attempt to provide a mechanistic explanation, although it ultimately remains incompletely clear why and how this occurs.

      Overall, the study details a promising intervention in NBPI mice and begins to highlight a pathway that can be exploited for this goal. While the reviewer did enjoy the manuscript, and the conclusions of this paper are mostly well supported by data, there are certain deficiencies that cannot be overlooked.

      Strengths:

      A) This study includes a clear-cut demonstration leading to a coherent narrative of a potential intervention for children affected by NBPI, which is well supported by prior literature mentioning the effects of palliative mechanical solutions and investigating the effects of pharmacologic strategies for the prevention of muscle contractures.

      B) This study uses a pharmacologic chronic treatment, in vivo, on female and male neonatal mice to investigate the effects and relevant mechanisms of the MSTN signaling inhibition, using a soluble decoy receptor (ACVR2B-Fc), from the whole organism into the skeletal muscle and further into cellular signaling pathways.

      C) This study provides promising data about the effects of the MSTN signaling inhibition on developmental, morphological, and biomechanical parameters, as well as biochemical markers in the NBPI mice.

      D) This study underlines the importance of using female and male mice during experimental procedures, clearly showing that sex dimorphism can produce very different results.

      E) The manuscript is well written, well organized, and cogent.

      Weaknesses and Limitations:

      A) This study attempts to provide mechanistic information to support and explain the results observed. However, the analysis remains superficial and should go into more detail, especially by fully investigating the different molecular pathways considered and their non-canonical alternatives.

      B) The use of different muscles for the biochemical analyses than for the developmental, morphological, and biomechanical measurements limits the interpretation of the data, since some of the observed effects could, for example, reflect differences between muscles rather than effects of treatment.

      C) The interpretation of the findings should be done carefully, knowing that it is an MSTN/Activin A signaling blockade and not an MSTN inhibition alone.

      D) The conclusions would be reinforced by data obtained at later time points (8 and/or 12 weeks).”

      Thank you very much for your comprehensive and insightful review. Your detailed comments and suggestions have not only allowed us to improve our current manuscript with greater clarity and additional data, but they also reinforce our plan to elucidate biochemical mechanisms more completely in future studies. In this revision, we have provided additional experiments to strengthen our analysis of known pathways downstream of MSTN signaling, addressed the use of different muscles as well as the four-week time point, and discussed the potential implications stemming from the broad specificity of the ligand trap. We certainly share your enthusiasm about dissecting the different molecular pathways and non-canonical alternatives. Indeed, we intend to interrogate these mechanistic underpinnings with the same rigor with which we obtained our physiological and translational findings, which cannot be completed within the scope of the current study. Follow-up studies will focus on exploring non-canonical alternatives and investigating long-term effects at skeletal maturity and beyond.

      Reviewer #3 (Public Review):

      “This timely manuscript describes the sex dimorphisms in neonatal development as it applies to muscle injury and denervation. More and more studies are identifying sex differences in skeletal muscle function and dysfunction. This is one more study to point out differences. A missing piece in the field, and in this study, is the mechanistic link between skeletal muscle function/dysfunction and sex differences. This paper starts to point to a mechanism highlighting the non-canonical AKT pathway. This is a very well-written manuscript with a clear experimental plan and workflow. I have no major concerns.

      My biggest question is the molecular mechanism linking sex differences and skeletal muscle function and dysfunction. However, this is perhaps a follow-up study to the already complete study the authors present.”

      Thank you very much for your kind words and enthusiasm! We likewise find it important to improve our understanding of sex differences in muscle function/dysfunction, and are committed to unraveling the molecular mechanism(s) that link them in future studies.

    1. Author Response

      Reviewer #1 (Public Review):

      According to the space-time wiring hypothesis proposed by (Kim, Greene et al. 2014), the BC-off SAC circuit mimics the structure of a Reichardt detector; BCs closer to SAC soma have slower dynamics (they can be more sustained, have a delay in activation or slower rise time), while BCs further away are more transient. Later studies confirmed the connectivity and expanded the model on SACs (Ding, Smith et al. 2016, Greene, Kim et al. 2016). However, physiological studies that used somatic recordings to assess the BC properties at different dendritic distances were inconclusive (Stincic, Smith et al. 2016, Fransen and Borghuis 2017). Here, the authors used iGluSnFR, a glutamate sensor to measure the signals impinging on SAC dendrites. Their experimental findings align with the space-time wiring hypothesis, revealing sustained responses closer to SAC soma (mediated by prolonged release from type 7 BCs, and only slightly affected by amacrine cells), which according to their simulated SAC should produce a substantial increase in direction selectivity (DS).
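      To make the logic of the space-time wiring hypothesis concrete, here is a minimal sketch (not the authors' model) assuming two linearly summed inputs with instantaneous rise and exponential decay: a sustained input near the SAC soma and a transient input near the dendritic tip. The time constants and the travel delay between the two input sites are hypothetical values chosen only for illustration.

```python
import math

def exp_kernel(t, tau):
    """Instantaneous rise, exponential decay with time constant tau (s); zero before onset."""
    return math.exp(-t / tau) if t >= 0 else 0.0

def peak_response(onsets, taus, dt=0.001, t_max=1.0):
    """Peak of the linear sum of inputs, each starting at its own onset time (s)."""
    n_steps = int(round(t_max / dt))
    peak = 0.0
    for i in range(n_steps + 1):
        t = i * dt
        total = sum(exp_kernel(t - onset, tau) for onset, tau in zip(onsets, taus))
        peak = max(peak, total)
    return peak

# Hypothetical parameters, for illustration only.
TAU_PROXIMAL = 0.5  # sustained input near the soma (s)
TAU_DISTAL = 0.05   # transient input near the dendritic tip (s)
DELAY = 0.1         # time for the stimulus to travel between the two input sites (s)

# Centrifugal motion (soma -> tip): the proximal input is activated first.
centrifugal = peak_response([0.0, DELAY], [TAU_PROXIMAL, TAU_DISTAL])
# Centripetal motion (tip -> soma): the distal input is activated first.
centripetal = peak_response([DELAY, 0.0], [TAU_PROXIMAL, TAU_DISTAL])

dsi = (centrifugal - centripetal) / (centrifugal + centripetal)
print(f"centrifugal={centrifugal:.3f} centripetal={centripetal:.3f} DSi={dsi:.2f}")
```

      Because the sustained input is still active when the transient input is recruited during centrifugal motion, but not vice versa, the summed peak is larger for centrifugal motion, which is the direction-selective outcome the hypothesis predicts.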

      I find the work to be clear and well presented. However, I do have some reservations with the findings:

      Main points:

      1) Very low number of cells examined in the key experiment presented in the first figure. The authors used a viral approach to express flex-iGluSnFR in SACs in Chat-Cre mice. Sometimes (apparently twice) the construct was expressed in individual SACs; this is a very underpowered experiment! The low number of successes precludes adequately judging the validity of the findings.

      We agree with the reviewer that measuring iGluSnFR signals from single starburst dendrites is a powerful approach to confirm the space-time wiring hypothesis. To bolster our data, we doubled our n number (updated Figure 1C and D, n = 66 ROIs; 20 dendrites and 4 retinas/FOVs). It should be noted that the results from these experiments are also validated on a larger scale across the starburst plexus (Figure 2).

      2) The model doesn't represent key known properties of BC-SACs and the interactions within SAC dendrites. First, the authors decided to construct a ball and stick model that doesn't consider the dendritic morphology of the starburst cell. A stimulus moving over a SAC is expected to engage multiple dendrites with complex spatiotemporal patterns that are expected to have a substantial effect on the voltages recorded on the investigated dendrite (Koren, Grove et al. 2017). For example, the dendrites in the orthogonal orientation will be activated at about the same time as the proximal dendrites; how such strong input will affect dendritic integration is unclear but should be taken into account in the model. Second, the authors assume a similar peak BC drive between proximal and distal inputs. However, a recent study found an enhanced glutamate release from proximal BCs, mediated by cholinergic SAC drive ((Hellmer, Hall et al. 2021); not cited). How different release amplitude would affect the conclusions of the model?

      It is well established that individual starburst dendritic sectors are relatively electrically isolated from each other (Miller & Bloomfield, 1983; Euler et al., 2002; Tukker et al., 2004; Poleg-Polsky et al., 2018), and thus we used a simple ball-and-stick model to study direction selectivity in starburst dendrites.

      Related to the Reviewer’s point, in the paper we explicitly acknowledge that the simple ball and stick model will not capture important network interactions that are expected to impact direction selectivity (e.g. SAC-SAC inhibition). We suggest this as a future line of investigation.

      The idea of different synaptic weights across the starburst dendrite is an interesting one. If the proximal inputs are stronger relative to distal ones, as the Reviewer suggests, direction selectivity might be expected to be further enhanced. However, in a preliminary analysis, we did not find strong evidence for direction selectivity or sensitivity to MLA that would support the idea of cholinergic modulation.

      3) Another reason for including an accurate dendritic morphology is the difference in the number of BCs that target a cell. Because SAC dendrites cover the entire receptive field area, type 7 BCs, which occupy the proximal third of the dendrites (Ding, Smith et al. 2016, Greene, Kim et al. 2016), are expected to cover only 11% of the area covered by SAC dendrites (1/3 x 1/3 = 1/9) and correspondingly mediate just 11% of the BC drive. The non-bifurcating model presented here would dramatically overrepresent their contribution to SAC responses.
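      The reviewer's geometric argument can be checked with a short calculation, assuming that SAC dendrites cover a disk uniformly, so that the area (and, under the same assumption, the number of BC contacts) within a radius r scales as r squared. The sketch computes the fraction of the disk covered by each radial third.

```python
from fractions import Fraction

def annulus_fraction(r_inner, r_outer):
    """Fraction of a unit-radius disk covered by the annulus r_inner <= r < r_outer,
    assuming uniform coverage so that area scales with radius squared."""
    return r_outer ** 2 - r_inner ** 2

thirds = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
fractions_by_third = [annulus_fraction(a, b) for a, b in zip(thirds, thirds[1:])]
print(fractions_by_third)  # [Fraction(1, 9), Fraction(1, 3), Fraction(5, 9)]
```

      Under this uniform-coverage assumption, the proximal third accounts for 1/9 of the area, while the distal third accounts for 5/9.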

      We have estimated BC numbers directly from the connectomics data which takes into account starburst morphology (Ding et al., 2016). To capture the heterogeneity of BCs that might be encountered at the level of single dendrites, in the revised manuscript, we have averaged responses over many trials in which the precise BC numbers varied according to the probability density functions observed in the connectomics data set (Ding et al., 2016). The details of the model parameters are now provided in the Methods section.
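      The trial-averaging procedure can be sketched as follows. The sketch assumes a hypothetical probability table for the number of BCs contacting a dendrite and a placeholder model run whose response is simply proportional to BC number. In the actual model, each trial would be a full biophysical simulation, and the distributions would come from the connectomics data (Ding et al., 2016).

```python
import random

# Hypothetical distribution: number of BCs contacting one dendritic segment -> probability.
BC_COUNT_PMF = {2: 0.1, 3: 0.25, 4: 0.35, 5: 0.2, 6: 0.1}

def simulate_trial(n_bcs, drive_per_bc=1.0):
    """Placeholder for one model run; here the response is proportional to BC number."""
    return n_bcs * drive_per_bc

def trial_averaged_response(n_trials, rng):
    """Average the model response over trials in which BC numbers are drawn from the PMF."""
    counts = list(BC_COUNT_PMF)
    weights = list(BC_COUNT_PMF.values())
    draws = rng.choices(counts, weights=weights, k=n_trials)
    return sum(simulate_trial(n) for n in draws) / n_trials

rng = random.Random(0)  # fixed seed for reproducibility
mean_response = trial_averaged_response(20_000, rng)
expected = sum(n * p for n, p in BC_COUNT_PMF.items())  # analytic mean of the placeholder
print(f"trial-averaged response = {mean_response:.3f} (analytic {expected:.3f})")
```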

      4) (Fransen and Borghuis 2017) found that off-SACs have a more pronounced distinction in the time to peak than on-SACs. Given the large body of work demonstrating the effectiveness of the viral approach in expressing iGluSnFR in off BCs (Borghuis, Marvin et al. 2013, Franke, Berens et al. 2017, Szatko, Korympidou et al. 2020, Gaynes, Budoff et al. 2021, Strauss, Korympidou et al. 2021), I found it surprising that the authors did not compare the on and off SAC populations.

      It is possible that the kinetic differences are more pronounced for inputs to OFF starbursts. However, we observed a weaker iGluSnFR expression in the OFF starburst layer and the S/N was below what was required for our analysis. Therefore, we focused on the ON starburst.

      5) Recent work (Gaynes, Budoff et al. 2021) suggests that BCs' responses to motion and to static flashes have distinct dynamics. However, the current manuscript tests responses to flashed stationary stimuli experimentally, and then combines them in a simulation modeling a moving stimulus. This potential limitation of the study should at least be discussed.

      The Reviewer correctly points out that static and motion stimuli might have distinct dynamics (especially ‘emerging’ stimuli). We now describe this limitation of our study and discuss the findings of Gaynes et al. (2022).

      We have revised our model to use BC release rates truncated according to stimulus velocity and size, so that they better represent the duration of the stimulus.
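      One way to implement such truncation is sketched below. The sketch assumes that release simply stops once the trailing edge of the bar has passed; the Methods describe the actual procedure. Under this assumption, only the portion of each release-rate trace that falls within the bar's dwell time over a given BC (bar width divided by velocity) is kept. For the 400 µm bar used in the simulations, the dwell time ranges from 4 s at 0.1 mm/s to 0.2 s at 2 mm/s. The function name and sampling interval are hypothetical.

```python
def truncate_release(rate, dt_s, bar_width_um, velocity_mm_s):
    """Keep the release-rate trace only while the bar covers the BC's location.

    rate: release rate sampled every dt_s seconds, starting when the leading edge arrives.
    Samples after the dwell time (bar width / velocity) are set to zero, which is a
    simplifying assumption about what happens at stimulus offset.
    """
    dwell_s = (bar_width_um / 1000.0) / velocity_mm_s  # um -> mm, then mm / (mm/s) = s
    n_keep = round(dwell_s / dt_s)
    return [r if i < n_keep else 0.0 for i, r in enumerate(rate)]

trace = [1.0] * 100  # hypothetical constant release rate: 1 s sampled every 10 ms
for v in (0.1, 1.0, 2.0):
    kept = truncate_release(trace, 0.01, 400.0, v)
    print(f"{v} mm/s: {sum(1 for r in kept if r > 0)} of {len(trace)} samples kept")
```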

      Reviewer #2 (Public Review):

      The authors present a nice series of imaging experiments confirming previous anatomical and electrophysiological evidence for the "space-time wiring" model for directionally selective responses in SAC dendrites. Fluorescence measurements with a genetically encoded glutamate indicator show that excitatory inputs to proximal SAC dendrites are more sustained than distal dendrites. Although the signals are shaped by surround inhibition, the fundamental differences persist with inhibition blocked, suggesting intrinsic differences in the synaptic release processes in different cone bipolar cell types.

      The authors examine iGluSnFR dynamics in individual SACs (Figure 1) and in a population of SACs (Figure 2). The latter is possible because distal inputs to all SACs occur deeper in the IPL and so can be imaged separately from the proximal inputs, and it permits the measurement of many more synapses in each experiment. The former approach is particularly powerful, however, because it allows careful mapping of the different types of inputs along the dendritic axis of individual SACs. This experiment was performed in only seven dendrites in two retinas, however; consequently, the confidence intervals for any spatial fitting would be quite broad. This experiment would be strengthened with additional data from more dendrites.

      We have now increased the n number for Figure 1 in the revised manuscript (updated Figure 1C and D, n = 66 ROIs; 20 dendrites and 4 retinas/FOVs). Please see our response to Reviewer #1.

      It is very interesting that white noise stimuli do not pull out the kinetic differences, interesting enough to merit inclusion in the primary figures rather than as a supplement. These results seem valuable to our understanding of DS processing, but the implications remain unclear. Is it really the case that DS is eliminated, or even substantially degraded, when motion stimuli are presented atop some background (i.e., conditions in which the circuit is continuously stimulated)? Are the distinct kinetics brought about by abrupt, large changes in luminance? If so, wouldn't one expect much weaker DS in response to drifting sinusoidal gratings?

      The question of how direction is encoded in natural scenes is very interesting but beyond the scope of this study. We presented a preliminary white noise analysis to show that our recordings are consistent with other recent reports (e.g. Strauss et al., 2022) that cast doubt on the space-time wiring model, rather than to directly address this specific issue. It should also be noted that complementary inhibitory and/or other intrinsic dendritic mechanisms may ensure that dendrites remain DS in regimes in which BC mechanisms appear to be ineffective.

      In the introduction (p. 3A) the authors suggest that space clamp errors could distort EPSC kinetics, causing EPSCs arriving distally to appear more transient than those arriving more proximally. This seems contrary to what one would typically expect: cable theory would predict that more distal inputs ought to be filtered more, therefore appearing more prolonged, not more transient, than proximal inputs. It does not seem necessary to cast doubt on the previous results (Fransen and Borghuis, 2017) to motivate sufficiently the present experiments. One might simply point out that electrophysiological recordings do not provide precise information regarding the anatomical location of synaptic inputs.

      In the revised manuscript, we changed the text according to the Reviewer’s suggestion.

      Reviewer #3 (Public Review):

      In the study "Spatiotemporal properties of glutamate input support direction selectivity in the dendrites of retinal starburst amacrine cells", Srivastava, deRosenroll, and colleagues study the role of excitatory inputs in generating direction selectivity in the mouse retina. Computational and anatomical studies have suggested that the "space-time-wiring" model contributes to direction-selective responses in the mammalian retina. This model relies on temporally distinct excitatory inputs that are offset in space, thereby yielding stronger responses for motion in one direction than in the other. Conceptually, this is similar to the Reichardt motion detector proposed many decades ago. So far, however, there is little functional evidence for the implementation of the space-time-wiring model. Here, Srivastava, deRosenroll and colleagues use local glutamate imaging in the ex vivo mouse retina combined with biophysical modeling to test whether temporally distinct and spatially offset excitatory inputs might generate direction-selective responses in starburst amacrine cells (SACs). Consistent with the space-time-wiring model, they find that glutamatergic inputs at proximal SAC dendrites are more sustained than inputs at distal dendrites. This finding was consistent across different sizes of stationary, flashed stimuli. They further linked the sustained input component to the genetically identified type 7 bipolar cell and showed that the difference in temporal responses between proximal and distal inputs was independent of inhibition and instead relied on excitatory interactions. By estimating vesicle release rates and building a simple biophysical model, the authors suggest that, alongside established mechanisms such as asymmetric inhibition, excitatory inputs with distinct kinetics contribute to direction-selective responses in SACs for slow and relatively large stimuli.

      In general, this study is well-written, the data is clearly presented and the conclusion that (i) the temporal kinetics of excitatory inputs varies along SAC dendrites and that (ii) this might then contribute to direction selectivity is supported by the data. The study addresses the important question of how excitation contributes to the generation of direction-selective responses. There have been several other studies published on this topic recently, and I believe that the results will be of great interest to the visual neuroscience community.

      However, the authors should address the following concerns:

      • They should demonstrate that differences in response kinetics between proximal and distal dendrites are unrelated to differences in signal-to-noise ratio.

      In response to the Reviewer’s comment, we have now added new plots to supplementary Figure S1 (A, B) that show that the response kinetics are not strongly related to signal strength.

      • To demonstrate consistency across recordings/mice, the authors should indicate data points from different recordings (e.g. Fig. 2C).

      In the updated Figure 2C-E, we have now added the average values for each recording to indicate the consistency/variability in the data.

      • The authors mention in the introduction that the space-time-wiring model is conceptually similar to other correlation-type motion detectors that have been experimentally verified in different species. It would be great to expand on the similarity and differences of the different mechanisms in the Discussion, especially focusing on Drosophila where experimental evidence at the synaptic level exists.

      It should be noted that the results of the influential Nature paper describing the space-time wiring of inputs to T4 DS neurons in the fly system were not reproduced by the same group. A new paper from Axel Borst’s group, however, shows that a distinct source of spatially offset excitation (disinhibition mediated by glutamate) may underlie the multiplicative operation. Nevertheless, to our knowledge, no studies have mapped out the spatiotemporal properties of inputs across single T4/T5 DS neurons as we have done for the starburst. In the revised manuscript, we briefly summarize the fly literature in response to the Reviewer’s suggestion.

      • The authors use stationary spot stimuli of different sizes to characterize the response kinetics of excitatory inputs to SACs. I suggest the authors add an explanation for choosing only stationary stimuli for studying the role of excitatory inputs in direction selectivity/motion processing.

      Please see the above response to Reviewer #1.

      In addition, the authors use simulated moving edges to stimulate the model bipolar cells. They should provide details about the size of the stimulus and the rationale behind using this size, given their previous results.

      For simulation experiments, bipolar cell inputs were triggered by a 400 µm wide bar moving over a range of velocities (0.1 – 2 mm/s). We have now added more details in the Methods section and main text in the revised manuscript.

      • Using the biophysical model, the authors show that converting sustained bipolar cell inputs to transient ones reduces direction selectivity in SACs. I suggest the authors also do the opposite manipulation/flip the proximal and distal inputs or provide a rationale why they performed this specific manipulation.

      Thanks for the suggestion. We have now updated Figure 6B showing DSi vs velocity plots for several different bipolar cell input distributions – (i) sustained-transient, (ii) transient-sustained, (iii) all transient, and (iv) all sustained.
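      For intuition about how the four input distributions should differ, here is a closed-form sketch. It assumes two unit-amplitude inputs with instantaneous rise, exponential decay, and linear summation, separated by a hypothetical 50 µm along the dendrite. Under these assumptions, the peak response occurs when the second input is activated, and DSi = (P - N)/(P + N), where P and N are the peak responses to centrifugal and centripetal motion. This is one common DSi definition and may differ from the one used in the manuscript. The sketch also ignores the dendritic nonlinearities captured by a biophysical model.

```python
import math

def dsi(tau_proximal, tau_distal, velocity_mm_s, separation_mm=0.05):
    """DSi for two exponentially decaying inputs activated in sequence by a moving edge."""
    delay = separation_mm / velocity_mm_s  # s between activation of the two input sites
    # Peak = 1 (newly activated input) + residual of the first input when the second arrives.
    preferred = 1 + math.exp(-delay / tau_proximal)  # centrifugal: proximal input first
    null = 1 + math.exp(-delay / tau_distal)         # centripetal: distal input first
    return (preferred - null) / (preferred + null)

SUSTAINED, TRANSIENT = 0.5, 0.05  # hypothetical decay time constants (s)
configs = {
    "sustained-transient": (SUSTAINED, TRANSIENT),
    "transient-sustained": (TRANSIENT, SUSTAINED),
    "all transient": (TRANSIENT, TRANSIENT),
    "all sustained": (SUSTAINED, SUSTAINED),
}
velocities = [0.1, 0.25, 0.5, 1.0, 2.0]  # mm/s, spanning the simulated range
table = {name: [dsi(tp, td, v) for v in velocities] for name, (tp, td) in configs.items()}
for name, values in table.items():
    print(f"{name:>20}: " + " ".join(f"{x:+.2f}" for x in values))
```

      In this linear sketch, swapping the proximal and distal kinetics flips the sign of DSi, and uniform kinetics abolish it.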

      • In each figure, the authors should note whether traces show single trial responses or mean across how many trials. If the mean is presented (e.g. Suppl. Fig. 2a), the authors should include a measure of variability - either show single ROIs in addition and/or add an s.d. shading to the mean traces.

      In the revised manuscript, we have now indicated the mean and number of trials for each figure. We have also added S.E.M values to the mean traces in the figures.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript by Kim et al., the authors use live-cell imaging of transcription in the Drosophila blastoderm to motivate quantitative models of gene regulation. Specifically, they focus on the role of repressors and use a 'thermodynamic' model as the conceptual framework for understanding the addition and placement of the repressor Runt, i.e. synthetic insertion of Runt repressor sites into the Bicoid-activated hunchback P2 enhancer. Coupled with kinetic modeling and live-cell imaging, this study is a sort of mathematical enhancer bashing experiment. The overarching theme is measuring the input/output relationship between an activator (bicoid), repressor (runt), and mRNA synthesis. Transcriptional repression is understudied in my opinion. One finding is that the inclusion of cooperativity between trans-acting factors is necessary for understanding transcriptional regulation. Most, if not all, of the tools used in this paper have been published elsewhere, but the real contribution is a deep, quantitative dissection of transcriptional regulation during development. As such, the only real questions for this referee are whether the modeling was done rigorously to produce some general biological conclusions. By and large, I think the answer is yes.
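      To illustrate the thermodynamic framework, here is a minimal sketch for a hypothetical enhancer with a single Bicoid site and a single Runt site; the constructs in the study carry multiple Bicoid sites. The sketch uses the direct-repression assumption, in which bound Runt interacts only with RNAP through a factor w_rp. The probability that RNAP is bound is the summed statistical weight of the RNAP-bound states divided by the partition function. Concentrations are expressed relative to the dissociation constants, and all parameter values are arbitrary.

```python
from itertools import product

def p_bound(bcd, runt, K_b, K_r, p, w_bp, w_rp):
    """Probability that RNAP is bound, summing statistical weights over all
    occupancy states of one Bicoid site, one Runt site, and the promoter.
    Direct repression: Runt interacts only with bound RNAP (factor w_rp)."""
    z_total = 0.0  # partition function
    z_on = 0.0     # summed weight of RNAP-bound states
    for b, r, pol in product((0, 1), repeat=3):
        weight = (bcd / K_b) ** b * (runt / K_r) ** r * p ** pol
        if pol:
            weight *= w_bp ** b * w_rp ** r  # activator- and repressor-RNAP interactions
            z_on += weight
        z_total += weight
    return z_on / z_total

# Arbitrary illustrative parameters; w_rp < 1 means bound Runt destabilizes RNAP binding.
params = dict(bcd=2.0, K_b=1.0, K_r=1.0, p=0.1, w_bp=5.0, w_rp=0.2)
for runt in (0.0, 1.0, 5.0):
    print(f"[Runt]/K_R = {runt}: p_bound = {p_bound(runt=runt, **params):.3f}")
```

      In such models, the initial rate of RNAP loading is typically taken to be proportional to p_bound.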

      We thank the reviewer for this thoughtful evaluation of our work. We agree with the reviewer’s assessment that transcriptional repression, especially the quantitative dissection of transcriptional repression, is understudied compared to transcriptional activation.

      Comments:

      Fig. 6 was the most striking figure for this referee, specifically that different placements of Runt molecules on the enhancer lead to distinct higher order interactions. I am wondering if the middle data column in Fig. 6 represents a real difference from the other two, and if so, it seems that the positioning - as opposed to simply the stoichiometry - is essential in cooperativity. This conclusion implies that transcriptional regulation is more precise than what some claim is just a mushy ball of factors close to a promoter. In other words, orientation may matter. Proximity may matter. Interactions in trans matter.

      We thank the reviewer for pointing out a feature of our data that we did not emphasize enough originally. Indeed, the construct in the middle column, which we termed [101], could be better recapitulated with the simplest model of zero free parameters than the other two constructs. As the reviewer pointed out, this raises an interesting question about the “grammar” of an enhancer: the placement and orientation of binding sites for transcription factors might matter yet we do not have a clear understanding of the logic. We have now incorporated a discussion of this topic in the Discussion section.

      There needs to be at least one prediction which is validated at the level of smFISH / mRNA in the embryo. Without detracting from the effort the authors have expended in looking directly at transcription, if the effects can't be felt by the blastoderm at the level of mRNA/cell, it becomes difficult to argue for the relevance to development. Also, I feel there is little chance that these measurements can be quantitatively replicated unless translated to the level of total protein or mRNA. Such a measurement (orthogonal quantitative confirmation of the repressor cooperativity result) would also assuage my concern about the time averaging as shown in Fig. S3.

      Our study focused on predicting the initial rate of transcription because it is the measurable quantity that most directly relates to the binding and action of the transcriptional activators and repressors used in this study. We argue that the action of transcription factors would be more accurately assessed by monitoring the rate of transcription, rather than the accumulated mRNA, which could be confounded by the dynamics of the whole transcription cycle—initiation, elongation and termination—as well as nuclear export, diffusion and degradation of transcripts. We are, of course, excited to eventually be able to predict a whole pattern of cytoplasmic mRNA over space and time from knowledge of the enhancer sequence. However, if we cannot predict the initial rate of RNA polymerase loading dictated by an enhancer, we argue that there is little hope in predicting such cytoplasmic patterns. We emphasized this point in the Discussion (Line XX-YY). Regardless, to assuage the reviewer’s concern, we have performed additional analyses to assess the effect of repression at the level of accumulated mRNA.

      First, we have quantified the accumulated mRNA during nuclear cycle 14, which is the time window that we have focused on in this study. To make this possible, we integrated the area under the curve of the MS2 time traces, a quantity that has previously been shown, by comparison with FISH, to report on the total amount of mRNA produced (Garcia et al., Current Biology 23:2140, 2013; Lammers et al., PNAS 17:836, 2020). This integrated signal, which reports on accumulated mRNA, is now shown for all constructs in the presence and absence of Runt protein in the new Figure S17. This figure clearly shows that the consequences of repression are present in the blastoderm, not just at the level of transcriptional initiation, but also at the level of accumulated mRNA.
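      The integration step amounts to computing the area under each MS2 trace over the time window of interest. A minimal trapezoidal-rule sketch is shown below, assuming a fluorescence trace sampled at possibly uneven time points. Converting the result to absolute mRNA numbers would require a calibration factor, which is not included here.

```python
def integrate_trace(times, signal):
    """Trapezoidal area under an MS2 fluorescence time trace (arbitrary units x time)."""
    if len(times) != len(signal):
        raise ValueError("times and signal must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (signal[i - 1] + signal[i]) * (times[i] - times[i - 1])
    return area

# Hypothetical trace: fluorescence rises, plateaus, then decays (time in minutes).
times = [0.0, 2.0, 4.0, 6.0, 10.0, 14.0]
signal = [0.0, 50.0, 120.0, 120.0, 80.0, 0.0]
print(f"integrated signal = {integrate_trace(times, signal):.1f} AU x min")
```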

      We then compared the accumulated mRNA profiles shown in Figure S17 to the initial rate of RNAP loading at each position of the embryo along the anterior-posterior axis for all constructs in the presence and absence of Runt protein. These new results are shown in a new figure, Figure S19. Interestingly, we saw a good correlation (Pearson correlation coefficient of 0.90) between these two metrics. Thus, we argue that our conclusion that higher-order cooperativity is necessary to account for the initial rate of RNA polymerase loading would still hold for predicting the accumulated mRNA.
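      The comparison reduces to a Pearson correlation between the two metrics across positions along the anterior-posterior axis (and constructs). A minimal sketch is shown below, with hypothetical per-bin values standing in for the measured data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-AP-bin values (arbitrary units), standing in for measured data.
initial_rate = [12.0, 10.5, 8.1, 5.2, 2.9, 1.1]
accumulated_mrna = [950.0, 870.0, 600.0, 410.0, 260.0, 70.0]
print(f"Pearson r = {pearson_r(initial_rate, accumulated_mrna):.2f}")
```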

      Reviewer #3 (Public Review):

      The authors have presented results from carefully planned and executed experiments that probe enhancer-driven expression patterns in varying cellular conditions (of the early Drosophila embryo) and test whether standard models of cis-regulatory encoding suffice to explain the data. They show that this is not the case, and propose a mechanistic aspect (higher order cooperativity) that ought to be explored more carefully in future studies. The presentation (especially the figures and schematics) is excellent, and the narrative is crisp and well organized. The work is significant because it challenges our current understanding of how enhancers encode the combinatorial action of multiple transcription factors through multiple binding sites. The work will motivate additional modeling of the presented data, and experimental follow-up studies to explore the proposed mechanisms of higher order cooperativity. The work is an excellent example of iterative experimentation and quantitative modeling in the context of cis-regulatory grammar. At the same time, the work as it stands currently raises some doubts regarding the statistical interpretation of results and modeling, as outlined below.

      We thank the reviewer for noting the significance of our work. We tried our best to address the concerns of the reviewer regarding the statistical interpretation of results and theoretical modeling throughout our responses below.

      The results presented in Figure 5 are used to claim that the data support (i) an unchanging K_R regardless of the position of the Runt site in the enhancer and (ii) an \omega_RP that decreases as the site moves further away from the promoter, as might be expected from a direct repression model. This claim is based on testing only the specific model that the authors hypothesize, and no alternative model is compared. For instance, are the fits significantly worse if \omega_RP is kept constant and K_R is allowed to vary across the three sites? If different placements of the Runt site can result in puzzling differences in RNAP-promoter interaction, it seems entirely possible that the different site placements might result in different K_R, perhaps due to unmodeled interference from Bicoid binding. Due to these considerations, it is not clear if the data indeed argue for a fixed K_R and distance-dependent \omega_RP.

      We apologize for the lack of justification in assuming that Kr remains constant and wrp varies depending on the position of the Runt binding sites. Following the reviewer’s suggestion, we tested the alternative scenarios where we either fix or vary different combinations of wrp and Kr for our one-Runt binding site constructs. The result is now shown in a new figure, Figure S16. In short, as reported by the Akaike Information Criterion (AIC) in Figure S16F, the MCMC fit explains the data best in the scenario of fixed Kr and different wrp values for one-Runt binding site constructs. Furthermore, we also performed the MCMC inference in the case where we varied both Kr and wrp values across constructs. From this analysis, we obtained similar values of Kr while having different values of wrp across constructs as shown in Figure S16G. Overall, we believe that this evidence strongly supports our assumption of having consistent Kr values but different wrp values for the one-Runt binding site constructs.

      Results presented in Figure 6 make the case that higher order cooperativity involving two DNA-bound molecules of Runt and the RNAP is sufficient to explain the data. The trained values of such cooperativity in the three tested enhancers appear orders of magnitude different. As a result, it is hard to assess the evidence (from model fits) in a statistical sense. Indeed, if all of the assumptions of the model are correct, then using the high-order cooperativity is better than not using it. To some extent, this sounds statistically uninteresting (one additional parameter, better fits). It is not the case that the new parameter explains the data perfectly, so some form of statistical assessment is essential.

      The inferred cooperativity values are indeed orders of magnitude different. However, the cooperativity terms can also be written as “w = exp(-E/(kBT))”, where E is the interaction energy, kB is the Boltzmann constant, and T is the temperature. As a result, the magnitudes of the different cooperativities should be compared on a log scale. In brief, the interaction energies associated with wrr for the three two-Runt binding site constructs range between 0 and 1 kBT, and the energy associated with the higher-order cooperativity wrrp lies between -2 and 4 kBT. Interestingly, these energies are of the same order of magnitude as the interaction energies typically reported for bacterial transcription factors (e.g., Dodd et al., Genes and Development 18:344-54, 2004). It is important to note that our inferred interaction energies could be either positive or negative, suggesting that both cooperativity and anti-cooperativity can be at play depending on the architecture of the two Runt binding sites. We now report these interactions as energies in Table S1 and elaborate on this in the Discussion section (Line XX-YY).
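      The conversion between cooperativity factors and interaction energies follows directly from w = exp(-E/(kBT)). The sketch below shows that factors spanning four orders of magnitude (0.01 to 100) correspond to energies of only about ±4.6 kBT, which is why a log-scale comparison is the natural one.

```python
import math

def omega_to_energy_kbt(omega):
    """Interaction energy in units of kB*T, from w = exp(-E/(kB*T)): E = -ln(w) = ln(1/w)."""
    return math.log(1.0 / omega)

def energy_kbt_to_omega(energy_kbt):
    """Cooperativity factor from an interaction energy given in units of kB*T."""
    return math.exp(-energy_kbt)

for omega in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"w = {omega:>6}: E = {omega_to_energy_kbt(omega):+.2f} kBT")
```

      Negative energies (w > 1) correspond to cooperativity, and positive energies (w < 1) to anti-cooperativity.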

      Finally, following the reviewer’s suggestion on statistical assessment of whether addition of parameters indeed explains the data better, we adopted the Akaike Information Criterion (AIC) as a metric to compare different models used in Figure 6 and now show the results in a new panel, panel G. Briefly, AIC is calculated by assessing the model’s ability to explain the data while penalizing for having more parameters. The smaller the AIC value is, the better the model explains the data. As we have claimed, the AIC showed a dramatic decrease when adopting the higher-order cooperativity as shown in Figure 6G. Thus we argue that the addition of higher-order cooperativity, while not being able to completely explain the data, is indeed capable of increasing the agreement between experiments and theory across all our two-Runt site constructs.
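      For reference, AIC = 2k - 2 ln L, where k is the number of free parameters and L is the maximized likelihood. A minimal sketch is shown below. It assumes independent Gaussian measurement noise with a known standard deviation; the likelihood used for the comparisons in Figure 6G may be defined differently. The residuals are hypothetical.

```python
import math

def gaussian_log_likelihood(residuals, sigma):
    """Log-likelihood of residuals under independent Gaussian noise with known s.d. sigma."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)

def aic(log_likelihood, n_params):
    """Akaike Information Criterion, 2k - 2 ln L; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

SIGMA = 0.2  # hypothetical measurement noise
# Hypothetical residuals: a model without and a model with one extra cooperativity parameter.
residuals_pairwise = [0.3, -0.4, 0.5, -0.3, 0.4, -0.5]
residuals_higher_order = [0.1, -0.1, 0.15, -0.1, 0.1, -0.15]

aic_pairwise = aic(gaussian_log_likelihood(residuals_pairwise, SIGMA), n_params=3)
aic_higher_order = aic(gaussian_log_likelihood(residuals_higher_order, SIGMA), n_params=4)
print(f"AIC pairwise = {aic_pairwise:.2f}, AIC higher-order = {aic_higher_order:.2f}")
```

      The extra parameter costs 2 AIC units, so it is favored only when it improves the log-likelihood by more than 1.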

      Moreover, it is not the case that the model structure being tested is the only obvious biophysics-driven choice: since this is the first time that such higher order effects are being tested, one has to be careful about testing alternative model structures, e.g., repression models that go beyond direct repression and pairwise cooperativity that goes beyond the traditional approach of a single (pseudo)energy term.

      We agree with the reviewer that alternative models with different mechanisms of repression should be mentioned, and we have clarified this point further in the Discussion (Line XX-YY). In summary, we tested both the “competition” and “quenching” models of repression proposed by Gray et al. (Genes and Development 8:1829, 1994). Interestingly, Figure S5 shows that the “competition” model gives a worse fit than the “direct repression” and “quenching” models for the one-Runt binding site constructs. We further tested these alternative models on the two-Runt binding site constructs; the results are shown in Figures S7 (competition) and S8 (quenching). These figures also reveal that the “competition” model underperforms the “direct repression” and “quenching” models. For the “quenching” model to fit the data, we also had to invoke cooperativity beyond pairwise interactions. Thus, we believe that the requirement for higher-order cooperativity holds regardless of the specific choice of model. Of course, our models of repression are very likely an oversimplification of how repressors actually work. However, given that these simple models have been the prevalent proposed mechanisms of transcriptional repression for the past decades, we believe that the significance of our work lies in challenging these models by turning them into precise mathematical statements (in the form of widely used thermodynamic models) and confronting them with quantitative data.
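      A direct-repression thermodynamic model with pairwise (wrr) and higher-order (wrrp) cooperativity can be sketched as a sum over binding states. The block below is a simplified, hypothetical version: it assumes one effective Bicoid site, two Runt sites, a dimensionless RNAP weight p, and a rate equal to R times the probability that RNAP is bound. The parameter names mirror those in the text, but the manuscript's actual model (number of Bicoid sites, how R enters) may differ. Competition or quenching variants would instead change how bound Runt interacts with the Bicoid terms rather than with RNAP directly.

```python
import itertools


def transcription_rate(bcd, run, Kb, Kr, wbp, wrp, wrr, wrrp, p, R):
    """R * P(RNAP bound) in a hypothetical direct-repression thermodynamic model.

    States: one effective Bicoid site, two Runt sites, and the promoter (RNAP).
    bcd and run are concentrations; Kb and Kr are dissociation constants.
    """
    z_total = 0.0
    z_bound = 0.0
    for b, r1, r2, pol in itertools.product((0, 1), repeat=4):
        weight = (bcd / Kb) ** b * (run / Kr) ** (r1 + r2) * p ** pol
        if r1 and r2:
            weight *= wrr                 # pairwise Runt-Runt cooperativity
        if pol:
            weight *= wbp ** b            # Bicoid-RNAP interaction (activation if > 1)
            weight *= wrp ** (r1 + r2)    # each bound Runt acts on RNAP (repression if < 1)
            if r1 and r2:
                weight *= wrrp            # higher-order Runt-Runt-RNAP term
        z_total += weight
        if pol:
            z_bound += weight
    return R * z_bound / z_total


# Hypothetical parameter values, chosen only for illustration.
params = dict(Kb=1.0, Kr=1.0, wbp=10.0, wrp=0.3, wrr=1.0, p=0.05, R=1.0)
for wrrp in (1.0, 0.2):
    print(wrrp, transcription_rate(bcd=2.0, run=2.0, wrrp=wrrp, **params))
```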

      The general theme seen in Figure 6 is seen again in Figure 7, when a 3-site construct is tested: model complexities inferred from all of the previous analyses are insufficient to explain the new data, and new parameters have to be trained to explain the results. The authors do not seem to claim that the higher-order cooperativity terms (two parameters) explain the data, rather that such terms may be useful.

      We agree that our previous presentation was confusing. Figure 7A indeed incorporated all the parameters inferred in the previous rounds of inference (Kb, wbp, p, R, as well as Kr, wrp, wrr, and wrrp). However, it is clear that this set of parameters, even including the higher-order cooperativity from the two-Runt binding site cases, was not enough to explain the data from the three-Runt binding site case. Thus, we had to invoke another free parameter, which we termed wrrrp, to explain the data. We have revised Figure 7B so that it now shows the “best” MCMC fit, which explains the data quite well, instead of only showing the “improvement” of the fits.

    1. Author Response

      Reviewer #1 (Public Review):

      Xu et al. show that mutants in three DNA replication proteins, Mcm2, Pole3, and Pole4, have defects in differentiation in a mouse embryonic stem cell (ESC) model. The Mcm2 mutant (called Mcm2-2A), which specifically blocks the interaction of Mcm2 with histones, has defects in multilineage differentiation and neural differentiation, despite having minimal effect on ESC proliferation or gene expression. Mcm2-2A fails to fully silence ESC genes and activate appropriate differentiation genes. Chromatin profiling analyses show that Mcm2 binds many promoters. During differentiation, the Mcm2-2A mutant retains H3K27me3 at differentiation gene promoters and shows reduced accessibility, consistent with the observed defects in gene expression.

      The findings that Mcm2-2A has minimal effect on proliferation and gene expression in ESCs, but impairs differentiation are interesting, particularly since this mutant seems to separate the histone binding roles of Mcm2 and its roles in DNA replication. Furthermore, the fact the histone binding function is only necessary when cells exit the pluripotent state is of interest. The studies were reasonably thorough and generally support the conclusions that Mcm2 is important for reshaping histone modifications during differentiation, although the details by which this occurs are not clear. Although the authors used two different strategies for identifying the direct binding sites of Mcm2 on chromatin, Mcm2 enrichment at individual loci was relatively weak, suggesting Mcm2 may localize somewhat diffusely. This somewhat weakens the conclusions about the direct vs indirect effects of Mcm2 on chromatin structure and gene expression.

      Overall, this paper reports an interesting set of findings that have a few caveats/limitations regarding how Mcm2 mediates these effects on chromatin during ESC differentiation.

      My biggest question is about the Mcm2 CUT&RUN data, which appear to have a low signal-to-noise ratio. The authors appear to be aware of this issue, as they also used an Mcm2-FLAG line for CUT&RUN studies, with a similarly low signal-to-noise ratio. To be clear, this may be due to the binding properties of Mcm2, which may bind chromatin relatively broadly, so that few highly enriched peaks are observed (similar to the cohesin complex in the absence of CTCF). However, it makes the Mcm2 binding data difficult to interpret. First, most Mcm2 peaks seem to be near promoters. Promoters often have a small amount of signal in negative control (IgG or irrelevant antibody) CUT&RUN experiments, presumably due to their MNase accessibility. It is not clear to what extent Mcm2 peaks exceed background, because no negative control CUT&RUN was performed. The high correlation between the FLAG and Mcm2 CUT&RUN libraries might still be evident if some of this signal is due to background at TSSs. Second, the authors call 13,742 peaks, but browser tracks of some example peaks at the Pax6 and Nanog promoters show minimal enrichment relative to surrounding regions (Fig. 5I, 5S1B). I am concerned that some of these statistically significant peaks are not biologically meaningful.
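      The point about background at accessible promoters can be made concrete by comparing library-size-normalized read counts in a region against a negative-control library. The sketch below computes a log2 fold enrichment with a pseudocount; the counts and the default pseudocount are hypothetical, and this is not the peak-calling procedure used in the manuscript.

```python
import math


def log2_enrichment(target_reads, target_total, control_reads, control_total, pseudocount=1.0):
    """log2 fold enrichment of a region in a target library over a negative-control library.

    Counts are scaled to counts per million before comparison; the pseudocount
    keeps the ratio finite when the control has no reads in the region.
    """
    target_cpm = (target_reads + pseudocount) / target_total * 1e6
    control_cpm = (control_reads + pseudocount) / control_total * 1e6
    return math.log2(target_cpm / control_cpm)


# Hypothetical promoter region: a signal that stands out from flanking regions
# can still show little enrichment over an IgG control.
print(log2_enrichment(target_reads=40, target_total=10_000_000,
                      control_reads=30, control_total=8_000_000))
```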

      We thank the reviewer for his/her time in reviewing this study and for his/her positive comments. We share the reviewer’s concern about the low signal-to-noise ratio of the Mcm2 CUT&RUN data. However, the Mcm2 CUT&RUN signals most likely reflect Mcm2 binding.

      Reviewer #2 (Public Review):

      It is established that different histone chaperones not only facilitate the assembly of DNA into nucleosomes following DNA replication and transcription but are also essential for stem cell maintenance and differentiation. Here, Xiaowei Xu et al. propose a novel role in stem cell differentiation for the Mcm2 DNA helicase, a subunit of the origin licensing complex Mcm2-7, in addition to or in connection with its role in maintaining genomic integrity during DNA replication. This study is a continuation of the authors' previously published work implicating the Mcm2-Ctf4-Polα axis in the transfer of parental histone H3-H4 to lagging strands. The present study is elegantly executed, with a systematic analysis of the role of Mcm2 in ES cell differentiation to the neuronal lineage.

      We thank the reviewer for his/her time to review the manuscript and for his/her positive comments.

      Major questions

      1) Mouse ES cells with a mutation in the histone-binding motif of Mcm2 (Mcm2-2A) grew normally but exhibited defects in differentiation. Also, the global changes in gene expression, chromatin accessibility and histone modifications linked to the Mcm2-2A mutation were less apparent in mouse ES cells than in NPCs. The authors suggest that, similar to DNA replication, the excess of Mcm2 in ES cells safeguards chromatin accessibility and gene expression, allowing Mcm2-2A mutant ES cells to restore the symmetric distribution of parental histones before cell division. What is the mechanism underlying this difference, given that overabundant Mcm2 is present in both ES cells and NPCs?

      This is an excellent question, on which we can only speculate. As discussed above and below, our results indicate that Mcm2 functions with Asf1a to resolve bivalent chromatin domains during pluripotency exit. It is therefore highly likely that Mcm2’s role in differentiation is independent of its role in DNA replication. Accordingly, in the revised manuscript, we downplayed this possibility and suggested that the differentiation defects in Mcm2-2A mutant cells may arise from the involvement of Mcm2 in resolving bivalent chromatin domains (p24).

      2) CAF-1, Asf1a, and Mcm2 partake in similar or redundant chromatin regulation during differentiation with silencing of pluripotent genes and induction of lineage-specific genes. These processes were found commonly dysregulated in both Mcm2-2A cells and Asf1a KO ES cells, albeit with varying degrees. How can authors exclude the possibility of Mcm2 affecting the differentiation via Asf1 with which it forms a complex, as a potentially redundant mechanism in the deposition of newly synthesized or recycled histones?

      To address this question, we performed the following experiments. First, we overexpressed Asf1a in both WT and Mcm2-2A mutant ES cells and determined whether Asf1a overexpression suppresses the differentiation defects of Mcm2-2A mutant cells (Figure 2- figure supplement 2A). Asf1a overexpression did not rescue the differentiation defects of Mcm2-2A mutant cells, based on cell morphology (Figure 2- figure supplement 2B) as well as on the expression of Oct4 and lineage-specific genes during differentiation (Figure 2- figure supplement 2C-E).

      Second, we knocked out Asf1a in both WT and Mcm2-2A mutant cells using CRISPR/Cas9 (Figure 2- figure supplement 2F and 2G) and compared the effects of Asf1a KO, Mcm2-2A, and the Mcm2-2A Asf1a KO double mutation on differentiation. As detailed above, these results indicate that Mcm2’s function in the induction of lineage-specific genes depends on Asf1a. However, Mcm2 also has an independent role in regulating pluripotency genes, which might act through its unique roles in parental histone deposition and gene expression regulation. We discuss these points in the Results (p10-13) and Discussion (p22-23).

      It is known that CAF-1 and Mcm2 are involved in deposition of new H3-H4 and parental H3-H4, respectively. Further, there is little evidence that CAF-1 interacts with Mcm2 in the literature. Therefore, we did not analyze the relationship between CAF-1 and Mcm2 during differentiation. In the revised manuscript, we discussed these points to address the reviewer’s concern.

      Can authors test potential redundancy between Mcm2 and other histone chaperones and modifiers? Can the authors rescue the NPC phenotype induced by Mcm2 -2A mutant? Can the authors rescue the Mcm2-2A phenotype by overexpression of another histone chaperone or modifier?

      As stated above, we overexpressed Asf1a, which is known to interact with Mcm2, and found that Asf1a overexpression did not rescue the differentiation defects of Mcm2-2A mutant cells. In contrast, overexpression of Mcm2 in Mcm2-2A cells did rescue the differentiation defects (Figure 2E-G). As discussed above, our analyses of the Asf1a KO Mcm2-2A double mutant and of the RNA-seq datasets of Asf1a KO and Mcm2-2A cells during differentiation indicate that Mcm2 and Asf1a function in the same pathway to resolve bivalent chromatin domains. However, the Oct4 silencing defect of Mcm2-2A cells was not observed in Asf1a mutant cells. Together, these results indicate that the differentiation defects of Mcm2-2A cells are, at least in part, due to a reduced interaction with Asf1a, and that Mcm2 also has a unique role in promoting the silencing of pluripotency genes.

      3) The authors argue that Mcm2 may regulate the deposition of newly synthesized or recycled histones via (1) its ability to recycle parental H3.1 and H3.3, (2) direct binding to H3-H4, and/or (3) Pol II transcription. Which of these mechanisms may be more unique to Mcm2 compared with other histone chaperones and modifiers?

      This is a very interesting but challenging question to address, for the following reasons. First, while the Mcm2-2A mutant showed defects in binding to both H3.1 and H3.3, it is almost impossible to identify an Mcm2 mutant that binds H3.1 and H3.3 differently. Based on our recent studies, the defects in the induction of lineage-specific genes are likely due to a loss of the interaction with Asf1a, whereas the defect in silencing pluripotency genes such as Oct4 is unlikely to be due to a loss of this interaction. Therefore, we suggest that the defects in silencing pluripotency genes in Mcm2-2A mutant cells are likely due to Mcm2’s role in parental histone transfer and/or gene transcription. In the revised manuscript, we substantially revised the Discussion to reflect the new results and to further address the reviewer’s concerns.

      4) Authors observed that in the ES cells the majority of Mcm2 CUT&RUN peaks were enriched with H3K4me3 CUT&RUN signals and ATAC-seq peaks and a small fraction of Mcm2 CUT&RUN peaks were engaged at the bivalent chromatin domains (H3K4me3+ and H3K27me3+). In contrast, in wild-type NPCs all the Mcm2 peaks co-localized with H3K4me3 and ATAC-seq peaks (H3K4me3+, H3K27me3-). The authors thus argued that Mcm2 binding to chromatin is rewired during differentiation citing this differential engagement of Mcm2 with the bivalent chromatin domains in ES and NPCs. What is the mechanism of Mcm2 differential engagement with the bivalent chromatin domains?

      As stated above, the original Discussion may have been misleading. In the revised manuscript, we substantially rewrote the Discussion based on the new results, which indicate that Mcm2 and Asf1a function similarly in the induction of lineage-specific genes marked by bivalent promoters during pluripotency exit.

      5) The authors indicated that in mouse ES cells, Mcm2 CUT&RUN peaks exhibited low densities at replication origins. DNA replication origins are licensed by the MCM2-7 complexes, and most of them remain dormant. Dormant origins rescue stalled replication forks in S phase and ensure genome integrity. ES cells are reported to contain more dormant origins than progenitor cells such as NPCs, which may prevent replication stress. Also, partial depletion of dormant origins does not affect ESC self-renewal but impairs differentiation, including toward the neural lineage. Moreover, reduction of dormant origins in NPCs impairs their self-renewal due to accumulation of DNA damage and apoptosis. Can the authors exclude a role for reduced dormant origins, reflected in the reduced density of Mcm2 at origins, in the defective differentiation to neuronal lineages?

      We thank the reviewer for these excellent suggestions. We now discuss the potential role of Mcm2 at dormant origins in the differentiation defects in the Discussion (p24). However, we would like to point out that, based on the new results, this is an unlikely mechanism. Supporting this idea, Mcm2-2A mutant yeast cells and mouse ES cells are known not to be sensitive to replication stress such as HU (Foltman, Evrin et al. 2013, Huang, Stromme et al. 2015).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper introduces a new statistical framework to study cellular lineages and traits. Several new measures are introduced to infer selection strength from individual lineages. The key observation is that one can simply relate cumulants of a fitness landscape to population growth, and all of this can be simply computed from one generating function, that can be inferred from data. This formalism is then applied to experimental cell lineage data.

      I think this is a very interesting and clever paper. However, in its current form the paper is very hard to read, with very few explanations beyond the mathematical observations/definitions, which in my opinion makes it almost unreadable for people outside of the field. Some more intuitive explanations should be given for a broader audience, on all aspects: definitions of fitness « landscape », selection strength(s), connections between cumulants and other properties (including skewness), etc. Many new definitions are given with names reminiscent of classical concepts in evolutionary theory, but the connection is not always obvious. It would be great to explain what they mean, beyond the maths, with very simple, intuitive examples. Some of this might be obvious to population geneticists, and in fact some explanations in the Discussion are more illuminating, but they would be much better placed earlier. I give more specific comments below.

      We thank the reviewer for calling our attention to the lack of accessible explanations on the significant terms and quantities in this framework. Following the suggestion in the comments below, we added Box 1, providing intuitive and plain explanations on the terms of fitness, fitness landscape, selection, selection strength, and cumulants. In each section, we explain the standard usage of these terms in evolutionary biology and clarify the similarities and differences in this framework. We also added a figure to Box 1 and provided a schematic explanation of the relationships among chronological and retrospective distributions, fitness landscapes, and selection strength. We believe that these explanations and a figure would better clarify the meanings and functions of these quantities.

      Major comments :

      1) The authors give names to several functions: for instance, before equation (1) they mention « fitness landscape », then describe « net fitness », which allows the authors to define « fitness cumulants ». Later on, a « selection » is defined. These terms might mean different things to different authors depending on the context, to the point that they are sometimes confusing. For instance, why is h a « landscape »? For me, a landscape is kind of like a potential, and I really do not see how this is connected to h. « fitness cumulants » is particularly jargony. There are also two kinds of selection strengths, which is very confusing. I would recommend that the authors make a glossary of the terms, explain intuitively what they mean, and maybe connect them to standard definitions.

      We appreciate the suggestion of making a glossary of the terms. Following the suggestion, we added Box 1 to provide intuitive and plain explanations of the terms used in this framework.

      In Box 1, we explain why we called h(x) a fitness landscape, referring to its standard usage in evolutionary biology. In evolutionary biology, fitness landscapes (also called adaptive landscapes) are visual representations of relationships between reproductive abilities (fitness) and genotypes. The height of landscapes corresponds to fitness. Since constructing "genotype space" is usually difficult, fitness is often mapped on an allele frequency or phenotype (trait) space to depict a "landscape." Fitness landscapes introduced in our framework are analogous to those in evolutionary biology in that fitness differences are mapped on trait spaces. Although fitness landscapes in evolutionary biology are usually metaphorical or conceptual tools for understanding evolutionary processes, the landscapes in our framework are directly measurable from division count and trait dynamics on cellular lineages.

      We also explain "selection" and "selection strength" in Box 1. As pointed out, we define three kinds of selection strength measures. These three measures share a similar property of reporting the overall correlations between traits and fitness. However, they also have critical differences regarding the additional selection effects they represent: $S_{\mathrm{KL}}^{(1)}$ for growth rate gain, $S_{\mathrm{KL}}^{(2)}$ for additional loss of growth rate under perturbations, and their difference $S_{\mathrm{KL}}^{(2)} - S_{\mathrm{KL}}^{(1)}$ for the effect of selection on fitness variance. We restructured the sections in Results and clarified these important meanings of the different selection strength measures.

      We removed the term "fitness cumulants", as it is non-standard and might confuse readers. We have now rephrased it more precisely as "cumulants of a fitness landscape (with respect to the chronological distribution)." In addition, we added a general explanation of "cumulants" to Box 1 and clarified what the first-, second-, and third-order cumulants represent about distributions.
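      As a rough illustration of how these cumulants connect to population growth: summing $e^{h(x)}Q_{\mathrm{cl}}(x)$ over x in the definition of h(x) (Eq. 1) gives $e^{\tau\Lambda}$, so the cumulant generating function $K(s) = \ln\langle e^{s h(X)}\rangle_{\mathrm{cl}}$ satisfies $\tau\Lambda = K(1) = \kappa_1 + \kappa_2/2 + \kappa_3/6 + \dots$, where $\kappa_1$ is the mean, $\kappa_2$ the variance, and $\kappa_3$ the third central moment (which carries the skewness). The sketch checks this third-order truncation on a made-up two-state landscape; the numbers are hypothetical.

```python
import math


def cumulants_up_to_3(probs, values):
    """Mean, variance and third central moment (kappa_1..kappa_3) of a discrete distribution."""
    k1 = sum(p * v for p, v in zip(probs, values))
    k2 = sum(p * (v - k1) ** 2 for p, v in zip(probs, values))
    k3 = sum(p * (v - k1) ** 3 for p, v in zip(probs, values))
    return k1, k2, k3


def cgf(s, probs, values):
    """Cumulant generating function K(s) = ln sum_x Q_cl(x) exp(s * h(x))."""
    return math.log(sum(p * math.exp(s * v) for p, v in zip(probs, values)))


# Hypothetical chronological distribution over two trait states and their h(x).
q_cl = [0.5, 0.5]
h = [math.log(2.0), math.log(4.0)]

tau_lambda = cgf(1.0, q_cl, h)       # exact value: ln(0.5*2 + 0.5*4) = ln 3
k1, k2, k3 = cumulants_up_to_3(q_cl, h)
truncated = k1 + k2 / 2 + k3 / 6     # third-order truncation of K(1)
print(tau_lambda, truncated)
```

      When the truncation is close to K(1), as here, cumulants of order four and higher contribute little to the growth rate.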

      2) Along the same line, it would be good to give more intuitive explanations of the different functions introduced. For instance, I find (2) more intuitive than (1) to define h. I think some more intuition on what the authors call selection strengths would be super useful. In Table 1, selection strengths are related to the Kullback-Leibler divergence (which does not seem to be defined); it would be good to explain this better.

      In addition to Box 1, we included more intuitive explanations on fitness landscapes and selection strength where they first appear in the Theoretical background section. As pointed out, descriptions of the linkage between the selection strength measures and Kullback-Leibler divergence were only in the Supplemental Information in the original manuscript. We now explicitly show this linkage where we first define the selection strength.

      Following this comment, we also changed the definition of a fitness landscape from the original one to $h(x) := \tau\Lambda + \ln\left[Q_{\mathrm{rs}}(x)/Q_{\mathrm{cl}}(x)\right]$ (Eq. 1), using the chronological and retrospective distributions introduced in the preceding paragraph. This definition is mathematically equivalent to the previous one, but we believe it is more intuitive.
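      A minimal sketch of Eq. 1 on a toy genealogy, assuming a single ancestor, a complete tree observed over [0, τ], and Λ defined so that τΛ = ln[N(τ)/N(0)]. Each lineage (root-to-leaf path) with D divisions gets chronological weight 2^(-D), and each gets retrospective weight 1/N(τ). The tree and trait labels are hypothetical. Under these weights h(x) equals ln⟨2^D | x⟩ under the chronological distribution, which suggests one intuitive reading of the landscape: the log of the expected number of descendants of a lineage in trait state x.

```python
import math
from collections import defaultdict

# Hypothetical genealogy from one ancestor: root -> (a, c); a -> (a1, a2).
# Each cell alive at time tau defines one lineage: (division count D, trait label x).
lineages = [(2, "fast"), (2, "fast"), (1, "slow")]


def fitness_landscape(lineages):
    """Return tau*Lambda, Q_cl, Q_rs and h(x) = tau*Lambda + ln(Q_rs(x) / Q_cl(x))."""
    n_final = len(lineages)
    tau_lambda = math.log(n_final)          # single ancestor, so N(0) = 1
    q_cl = defaultdict(float)
    q_rs = defaultdict(float)
    for d, x in lineages:
        q_cl[x] += 2.0 ** (-d)              # chronological weight of the lineage
        q_rs[x] += 1.0 / n_final            # retrospective weight of the lineage
    h = {x: tau_lambda + math.log(q_rs[x] / q_cl[x]) for x in q_cl}
    return tau_lambda, dict(q_cl), dict(q_rs), h


tau_lambda, q_cl, q_rs, h = fitness_landscape(lineages)
print(h)   # h["fast"] is approximately ln 4, h["slow"] approximately ln 2
```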

      3) It seems to me that the authors implicitly assume that, along a lineage, one would have almost stationary phenotypes (e.g. a constant division rate). However, one could imagine very different situations; for instance, the division rates could depend on interactions with other cells in the growing population, and thus change with time along a lineage. Division rates could also have strong random components over time. I am wondering how these more complex cases would impact the results and the discussion.

      We thank the reviewer for pointing out our insufficient explanation of an essential feature of this framework. As we now explain in the "Examples of biological questions" section (L62-65) and Discussion (L492-493), this framework does not assume stationary phenotypes (traits) on cellular lineages. On the contrary, we developed this framework so that one can quantify fitness and selection strength even for non-stationary phenotypes (traits) due to factors such as non-constant environments and inherent stochasticity.

      In fact, if traits are stationary in cellular lineages, this framework becomes essentially identical to the individual-based evolutionary biology framework (see ref. 26, for example). Our framework treats a cell lineage as the unit of selection and any quantity measurable along cellular lineages as a lineage trait, whether it is stationary or non-stationary. Therefore, our framework can evaluate fitness landscapes and selection strength without explicitly taking the environmental conditions around cells into account. This means that h(x) and S[X] in this framework extract the correlations between the traits of interest and division counts, among the various factors that could potentially influence division counts. On the other hand, this design imposes a limitation: the framework cannot say anything about the influence of factors such as non-quantified traits and potential variations in environmental conditions. We now explain these important points explicitly in the revised manuscript (L493-496).

      Likewise, stochasticity in division rate does affect division count distributions, and its influence appears as differences in the selection strength of the division count, S[D]. As stated in the text, S[D] sets the maximum bound for the selection strength of any lineage trait (L143-145). Therefore, $S_{\mathrm{rel}}[X] := S[X]/S[D]$ reports the relative strength of the correlation between the trait X and lineage fitness at a given level of S[D] in each condition.
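      To illustrate why S[D] bounds the selection strength of any lineage trait, the sketch below computes selection strengths on a hypothetical five-lineage genealogy, taking $S[Y] = D_{\mathrm{KL}}(Q_{\mathrm{rs}}^{Y} \,\|\, Q_{\mathrm{cl}}^{Y})$ as an assumed form of the measure (Table 1 and the Supplementary Information give the exact definitions). Because the ratio of retrospective to chronological weight depends on a lineage only through D, grouping lineages by any other trait X cannot increase the divergence (data-processing inequality), so 0 ≤ S_rel[X] ≤ 1.

```python
import math
from collections import defaultdict

# Hypothetical genealogy from one ancestor with five surviving lineages:
# root -> (a, b); a -> (a1, a2); b -> (b1, b2); b2 -> (b21, b22).
# Entries in leaf order a1, a2, b1, b21, b22: (division count D, trait label X).
lineages = [(2, "hi"), (2, "lo"), (2, "lo"), (3, "hi"), (3, "lo")]


def selection_strength(lineages, key):
    """KL divergence D(Q_rs || Q_cl) of the marginal distribution of key(D, X)."""
    n_final = len(lineages)
    q_cl = defaultdict(float)
    q_rs = defaultdict(float)
    for d, x in lineages:
        y = key(d, x)
        q_cl[y] += 2.0 ** (-d)      # chronological weight
        q_rs[y] += 1.0 / n_final    # retrospective weight
    return sum(q_rs[y] * math.log(q_rs[y] / q_cl[y]) for y in q_rs)


s_d = selection_strength(lineages, key=lambda d, x: d)   # S[D]
s_x = selection_strength(lineages, key=lambda d, x: x)   # S[X]
s_rel = s_x / s_d                                        # S_rel[X] = S[X] / S[D]
print(s_d, s_x, s_rel)
```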

      To clarify the influence of stochasticity in division rate, we present a cell population model in which cells divide stochastically according to generation time (interdivision time) distributions in Appendix 2 (we moved this section from the Supplemental Information with modifications). We can confirm from this model that the shapes of generation time distributions influence the selection strength S[D]. Importantly, one can understand from this model that stochasticity in generation times constantly introduces selection to cell populations and modulates the growth rate and selection strength even in the long-term limit. We now clarify this important point in the Discussion (L519-526).
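      As a standard reference point (not necessarily the parameterization used in Appendix 2), for independent generation times with density f, the long-term growth rate satisfies the Euler-Lotka condition 2∫e^(-Λt) f(t) dt = 1. Assuming gamma-distributed generation times with shape k and scale θ, this gives Λ = (2^(1/k) - 1)/θ. The sketch holds the mean generation time fixed and shows Λ increasing as generation times become more variable, approaching ln 2 divided by the mean in the deterministic limit.

```python
import math


def growth_rate_gamma(mean_gen_time: float, shape: float) -> float:
    """Population growth rate for i.i.d. gamma generation times (Euler-Lotka).

    Solves 2 * (1 + theta * Lam) ** (-shape) = 1 with scale theta = mean / shape.
    """
    theta = mean_gen_time / shape
    return (2.0 ** (1.0 / shape) - 1.0) / theta


def euler_lotka_residual(lam: float, mean_gen_time: float, shape: float) -> float:
    """2 * E[exp(-lam * T)] - 1 for gamma-distributed T; zero at the growth rate."""
    theta = mean_gen_time / shape
    return 2.0 * (1.0 + theta * lam) ** (-shape) - 1.0


mean_t = 1.0
for k in (1, 4, 25, 400):   # coefficient of variation = 1 / sqrt(k)
    lam = growth_rate_gamma(mean_t, k)
    print(f"CV = {1 / math.sqrt(k):.2f}  Lambda = {lam:.4f}")
print("deterministic limit ln2 / mean =", math.log(2) / mean_t)
```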

      4) « Therefore, in contrast to a common assumption that selection necessarily decreases fitness variance, here we show that under certain conditions selection can increase fitness variance among cellular ». This is a super interesting statement, but there is such a lack of explanations and intuition here that it is obscure to me what actually happens here.

      When a decrease in fitness variance by selection is mentioned in evolutionary biology, an upper bound and inheritance of fitness across the generations of individuals are usually assumed. In such circumstances, selection drives the fitness distribution toward the maximum value, and the selection eventually causes fitness variance to decrease. However, even in this process, a decrease is not assured for every step; whether selection reduces fitness variance at each step depends on the fitness distribution at that time.

      In our argument, we compared fitness variances between chronological and retrospective distributions. We showed both theoretically and experimentally that there are cases where the variances of the retrospective distributions (distributions after selection) become larger than those of the chronological distributions (distributions before selection). The direction of variance change depends on the shape of the chronological distributions, primarily on their skewness (positive skew for increasing the variance and negative skew for decreasing the variance). The direction of variance change can also be probed by the difference between the two selection strength measures, $S_{\mathrm{KL}}^{(2)} - S_{\mathrm{KL}}^{(1)}$. Notably, we can demonstrate that there are cases where retrospective fitness variances are larger than chronological fitness variances even in the long-term limit, as shown by a cell population model in Appendix 2.
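      A two-state toy case makes the dependence on skewness explicit. Suppose, hypothetically, that a fraction q of lineages has fitness h = 1 and the rest h = 0. Selection reweights each state by e^h, so the retrospective fraction is q' = qe/(qe + 1 - q). When q is small the chronological distribution is positively skewed and selection increases the variance; when q is large it is negatively skewed and the variance decreases. The sketch also evaluates D_KL(Q_cl‖Q_rs) - D_KL(Q_rs‖Q_cl), which we assume corresponds to $S_{\mathrm{KL}}^{(2)} - S_{\mathrm{KL}}^{(1)}$. In this example its sign is opposite to that of the variance change.

```python
import math


def two_state_selection(q: float):
    """Chronological vs retrospective statistics for h in {0, 1} with P_cl(h = 1) = q."""
    tau_lambda = math.log(1 - q + q * math.e)     # K(1) = ln E_cl[exp(h)]
    q_rs = q * math.e / (1 - q + q * math.e)       # retrospective P(h = 1)
    var_cl = q * (1 - q)
    var_rs = q_rs * (1 - q_rs)
    kl_rs_cl = q_rs * math.log(q_rs / q) + (1 - q_rs) * math.log((1 - q_rs) / (1 - q))
    kl_cl_rs = q * math.log(q / q_rs) + (1 - q) * math.log((1 - q) / (1 - q_rs))
    return tau_lambda, var_cl, var_rs, kl_cl_rs - kl_rs_cl


for q in (0.1, 0.9):   # positively vs negatively skewed chronological distribution
    _, var_cl, var_rs, diff = two_state_selection(q)
    print(f"q={q}: var_cl={var_cl:.3f} var_rs={var_rs:.3f} S2-S1={diff:+.4f}")
```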

      We now explain what kind of situations are usually premised when reduction of fitness variance is mentioned and clarify that, in our framework, we compare the fitness variances between chronological and retrospective distributions (L542-548). We also explain that a selection effect on fitness variance generally depends on fitness distribution and that a larger fitness variance in retrospective distribution is possible even in the long-term limit (L548-557).

      Reviewer #2 (Public Review):

      The paper addresses a fundamental question: how do phenotypic variations among lineages relate to the growth rate of a population. A mathematical framework is presented which focuses on lineage traits, i.e. the value of a quantitative trait averaged over a cell lineage, thus defining a fitness landscape h(x). Several measures of selection strengths are introduced, whose relationships are clarified through the introduction of the cumulant generating function of h(x). These relationships are illustrated in analytical mathematical models and examined in the context of experimental data. It is found that higher than third order cumulants are negligible when cells are in early exponential phase but not when they are regrowing from a stationary phase.

      The framework is elegant and its independence from mechanistic models appealing. The statistical approach is broadly applicable to lineage data, which are becoming increasingly available, and can for instance be used to identify the conditions under which specific traits are subject to selection.

      We appreciate the reviewer for the positive evaluation. We will reply to your specific comments below.

      Reviewer #3 (Public Review):

      In this work the authors have constructed a useful mathematical framework to delineate contributions leading to differences in lineages of populations of cells. In principle, the framework is widely applicable to exponentially growing populations. An attractive feature is that the framework is not tailored to particular growth models or environmental conditions. I expect it will be valuable for systems where contributions from phenotypic heterogeneity overwhelm contributions from intrinsic stochasticity in cellular dynamics.

      I am generally very positive about this work. Nevertheless, a few specific concerns:

      1) Here, lineages are considered fitter if they have more division events. But this consideration neglects inherent stochasticity in division events. Even in a completely homogeneous population, the number of division events differs between lineages due to intrinsic stochasticity, and applying the methods discussed in this manuscript may lead to falsely assigning different fitness levels to different lineages. The reason why these lineages ought to be assigned the same fitness level (despite having different numbers of division events) is that future generations of these cells will have identical statistics, in contrast with those of cells that are phenotypically different. Extending the idea to heterogeneous populations, the actual differences in fitness levels may differ significantly from those obtained with the mathematical framework presented here, depending on the level of inherent stochasticity.

      We thank the reviewer for this comment on a point that our original manuscript explained insufficiently. Intrinsic stochasticity in interdivision time (generation time) is, in fact, critical for selection. For example, if a cell divides with a generation time shorter than the average due to stochasticity, this cell is likely to have more descendant cells in the future population, on average, than the other cells born at the same time, even if the descendants follow identical statistics. Therefore, the properties of intrinsic stochasticity, including the shapes of generation time distributions and transgenerational correlations, significantly affect the overall selection strength $S_{\mathrm{KL}}^{(1)}[D]$ (and also $S_{\mathrm{KL}}^{(2)}[D]$). We now explain this important point in the Results section, referring to the analytical model in Appendix 2 (L327-334), and also in the Discussion (L519-524).
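      The following Monte Carlo sketch illustrates this for a homogeneous population. Every cell draws its generation time independently from the same gamma distribution (shape, scale, horizon and number of ancestors are hypothetical), yet division counts still differ among lineages at the end time. The retrospective distribution of D is shifted relative to the chronological one, giving a positive D_KL(Q_rs‖Q_cl) for D. It is a sketch of the general idea, not the model of Appendix 2.

```python
import heapq
import math
import random
from collections import Counter


def simulate_division_counts(t_end, shape, scale, n_roots, seed=1):
    """Division counts D of all cells alive at t_end, from n_roots newborn ancestors.

    Every cell draws its generation time i.i.d. from Gamma(shape, scale),
    so cells differ only through this intrinsic noise.
    """
    rng = random.Random(seed)
    heap = [(rng.gammavariate(shape, scale), 0) for _ in range(n_roots)]  # (next division time, D)
    heapq.heapify(heap)
    while heap[0][0] <= t_end:
        t_div, d = heapq.heappop(heap)
        for _ in range(2):  # two daughters, each with a fresh generation time
            heapq.heappush(heap, (t_div + rng.gammavariate(shape, scale), d + 1))
    return [d for _, d in heap]


n_roots = 20
counts = simulate_division_counts(t_end=8.0, shape=4.0, scale=0.25, n_roots=n_roots)

# Chronological weight 2^-D per surviving cell (per ancestor); retrospective weight 1/N.
q_cl = Counter()
for d in counts:
    q_cl[d] += 2.0 ** (-d) / n_roots
q_rs = {d: c / len(counts) for d, c in Counter(counts).items()}
s_d = sum(p * math.log(p / q_cl[d]) for d, p in q_rs.items())
print(len(counts), s_d)
```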

      Importantly, even when cell division processes seem purely stochastic, different states in some traits might underlie these variations in generation times. In such cases, evaluating h(x) and S_rel[X] can still unravel the correlations between trait values and fitness. In particular, the relative selection strength $S_{\mathrm{rel}}[X] := S_{\mathrm{KL}}^{(1)}[X]/S_{\mathrm{KL}}^{(1)}[D]$ extracts the correlation of the trait values at a given level of division count heterogeneity in each condition. We now clarify this important aspect of the framework in the Discussion (L524-526).

      When a cell population is composed of heterogeneous subpopulations, each following a distinct statistical rule, our framework evaluates the combined effects of the heterogeneous rules and the inherent stochasticity of each subpopulation. Untangling these two contributions is generally challenging unless we have appropriate markers for distinguishing the subpopulations. However, when the subpopulations follow significantly distinct statistics, the division count distribution should become skewed or multimodal, and the difference between the two selection strength measures, $S_{\mathrm{KL}}^{(2)}[D] - S_{\mathrm{KL}}^{(1)}[D]$, can suggest the existence of such subpopulations. Therefore, detailed analyses using all the selection strength measures and the fitness landscapes can provide insights into the internal structure of cell populations and the selection acting on them.

      We now explain the effect of inherent stochasticity in generation times (L327-334 and L519-524) and discuss how we can probe the existence of subpopulations based on the selection strength measures (L508-512). Please also refer to our reply to comment 3 of Reviewer #1.

      2) In one of the sections the authors mention having performed analytical calculations for a cellular population in which cells divide with gamma distributed uncorrelated interdivision times. It's unclear whether 1) there are specific sub-populations, the cells within each sub-population divide with the same division time, and the distribution of division times is due to the diverse distribution of sub-populations; or 2) there are no such sub-populations and all cells stochastically choose division time from the same distribution irrespective of their past lineage. If the latter, then I do not see the need for a lineage-based mathematical formulation when the problem can be dealt with in much simpler traditional ways that do not keep track of lineages.

      This model addresses situation 2). As the reviewer notes, the chronological and retrospective mean fitness and the population growth rate can be calculated with a simpler individual-based age-structured population model (see ref. 10, for example). However, applying our framework to this model can clarify three things: the utility of the cumulant generating function, the meaning of the differences between these fitness measures, and the effect of the statistical properties of intrinsic stochasticity on long-term growth rate and selection. Therefore, we kept this model in Appendix 2 (the section has been moved from the Supplemental Information). We added clarification of our motivation for the analysis and the implications of the results.
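      For the uncorrelated gamma model, the population growth rate Λ also follows from the classical Euler-Lotka relation for binary fission, 2·E[exp(-Λτ)] = 1, where τ is the generation time. The Laplace transform of a gamma(k, θ) distribution is (1 + Λθ)^(-k). Substituting it gives the closed form Λ = (2^(1/k) - 1)/θ. The sketch below implements this standard textbook result, not the lineage-based framework of the manuscript, and its function names are illustrative. It shows how, at a fixed mean generation time kθ, the shape of the generation time distribution alone changes the long-term growth rate.

```python
import math


def growth_rate_gamma(shape, scale):
    """Growth rate Lambda for binary fission with i.i.d. gamma(shape, scale) generation times.

    Closed-form solution of the Euler-Lotka equation 2 * (1 + Lambda * scale) ** (-shape) = 1.
    """
    return (2.0 ** (1.0 / shape) - 1.0) / scale


def euler_lotka_residual(rate, shape, scale):
    """Left-hand side of the Euler-Lotka equation minus 1; zero at the true growth rate."""
    return 2.0 * (1.0 + rate * scale) ** (-shape) - 1.0


if __name__ == "__main__":
    mean_tau = 1.0
    for k in (1.0, 4.0, 100.0):
        # Keep the mean generation time fixed: shape * scale = mean_tau.
        lam = growth_rate_gamma(k, mean_tau / k)
        print(f"shape={k:>5}: Lambda={lam:.4f}  (ln 2 / mean = {math.log(2) / mean_tau:.4f})")
```

      With the mean generation time held at 1:

      - for the exponential case (k = 1), Λ = 1;
      - as k grows and the distribution narrows, Λ decreases monotonically toward ln 2.

      In this simple model, therefore, greater variability in generation times yields faster long-term growth, even without heritable differences between cells.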

      3) The analytical calculations provided seem to be exact only for trajectories of almost infinite duration (or in practice, duration much greater than typical interdivision time). For example, if the observation time is of the order of division time, this would create significant artifacts / artificial bias in the weights of lineages depending on whether the cell was able to divide within the observation time or not. Thus, the results claiming that contributions of higher order cumulants become significant in the regrowth from a late stationary phase are questionable, especially since authors note that 90% of cells showed no divisions within the observation time.

      We thank the reviewer for this insightful comment. It is true that the duration of observation influences the results. In the regrowth experiments with E. coli, we aimed to compare two cell populations regrowing from different stages of the stationary phase. Therefore, it is appropriate to use the same time window for both conditions. Even though a significant fraction of cell lineages remains undivided, the regrowing cells already divide several times within this time window. Therefore, the results are valid for comparing and discussing selection levels on this time scale. However, clarifying selection on longer time scales requires a more detailed characterization of the lag time distributions under both conditions.

      We now clarify the range of validity of the results and the limitations on prediction for the long-term selection without knowing the details of the lag time distributions in Discussion (L536-539).

    1. Author Response

      Reviewer #2 (Public Review):

      There is emerging evidence that connexin43 hemichannels localized to mitochondria can influence their function. Here the authors demonstrated using an osteocyte cell model that connexin43 is localized to mitochondria and that this is enhanced in response to oxidative stress. Several lines of evidence were presented showing that mitochondrial connexin43 forms functional hemichannels and that connexin43 is required for optimal mitochondrial respiration and ATP generation. These aspects were major strengths of the study.

      The authors also show that connexin43 is recruited to mitochondria in response to oxidant stress, as a cell protective mechanism. This was primarily done using hydrogen peroxide to generate oxidant stress; primary osteocytes from Csf-1+/- mice, which are prone to Nox4 induced oxidant stress, also show enhanced mitochondrial connexin43 when compared with wild type osteocytes.

      Several approaches were used to demonstrate that connexin43 interacts with the ATP synthase subunit, ATP5J2, suggesting a direct role for connexin43 in the control of ATP synthesis by mediating mitochondrial ion homeostasis. Several experiments were done using a series of pHluorin fusion protein constructs as a proton sensor, these experiments hint at a potential role for connexin43 in regulating H+ permeability to support ATP production. However, the effects of inhibiting connexin43 on pH were modest, suggesting that additional roles for mitochondrial connexin43 in ATP generation should be considered.

      Thank you for your positive and thoughtful comments. We agree that additional roles for mitochondrial Cx43 may be possible. As an example, we consider that there may be a change in the stability of ATP synthase that occurs after mtCx43 deficiency. This and other possible roles of mtCx43 ought to be investigated in the future.

      Reviewer #3 (Public Review):

      This manuscript should be of broad interest to readers not only in the field of gap junction (GJ) mediated cell-to-cell communication but also to scientists and clinicians working on the function of mitochondria and metabolism. Their data elucidate a new function of Cx43 in regulating the energy (ATP) generation of mitochondria, e.g., under oxidative stress.

      The canonical function of gap junctions is direct cell-to-cell communication: they form channels that traverse the plasma membranes and electrically and chemically connect the cytoplasms of adjacent cells. These channels are assembled from connexin proteins, such as connexin 43 (Cx43). More recently, however, new, non-canonical cellular locations and functions of Cx43 have been discovered, e.g., mitochondrial Cx43 (mtCx43). Yet very little is known about several aspects of mtCx43:

      - where the Cx43 transported into mitochondria is derived from;
      - how Cx43 is transported into mitochondria;
      - where it is located in mitochondria;
      - in which form Cx43 is present in mitochondria (polypeptides, hemi-channels (HCs), or complete GJ channels);
      - what the function of mtCx43 is.

      The authors addressed the last question, and they provide convincing evidence that mtCx43 modulates mitochondrial homeostasis and function in bone osteocytes under oxidative stress. Together, their study suggests that mtCx43 hemi-channels regulate mitochondrial ATP generation by mediating K+, H+, and ATP transfer across the mitochondrial inner membrane. They do so by directly interacting with mitochondrial ATP synthase (ATP5J2), leading to enhanced protection of osteocytes against oxidative insult. These findings provide important information on a role of Cx43 functioning directly in mitochondria rather than at its canonical location in the plasma membrane. Most of the functional assays presented in Figures 2-8 appear solid. However, three aspects shown in Figure 1 are less convincing: the mitochondrial localization of Cx43, its translocation into mitochondria under oxidative stress, and its configuration as hemi-channels. I have five general comments that should be addressed:

      1) This study was performed in MLO-Y4 osteocyte cells. Is the H2O2 induced increase of mitochondrial Cx43 MLO-Y4 cell type or osteocyte specific, or is Cx43 playing a more general role in mitochondrial function, e.g. under oxidative stress? Osteoblasts such as MC3T3-E1 and MG63, and many other cell types endogenously express Cx43, and oxidative stress is a general physiological stressor, not only for osteocytes and bone cells. Attending to this question would address the generality of the findings for mitochondrial function.

      We thank the reviewer for bringing up these valid points; seeing the phenotype displayed in secondary cell types, such as osteoblasts, would be of great relevance and interest. To address this, we conducted new experiments on MC3T3-E1 cells (Figure 1-figure supplement 2). After 2 hrs of H2O2 treatment, Cx43 accumulated on the mitochondria, marked by MitoTracker. Statistical analysis also showed a significant increase in the colocalization of Cx43 and MitoTracker (Figure 1-figure supplement 2B). The colocalization coefficient in the Ctrl group of MC3T3-E1 cells is higher than in the MLO-Y4 Ctrl group, indicating a different response level in other cell lines; osteoblasts seemed to be more sensitive to redox interference. Overall, these results suggest that under oxidative stress, mtCx43 may display a similar phenotype across multiple cell lines, although the degree of sensitivity may differ.

      2) The images of MLO-Y4 cells (Figure 1A) and the primary osteocytes isolated from Csf-1+/- and control mice (Figure 8) do not show visible gap junctions. I guess this is due to the fact that slides were stained with the Cx43(E2) antibody. I feel, staining of these cells in addition with the Cx43(CT) antibody would be helpful to get a better understanding on the distribution of Cx43 in gap junctions and undocked/un-oligomerized Cx43 in these cells.

      Thank you for the suggestion. To get a better understanding of the distribution of Cx43, in either GJ or HC form, we performed additional experiments in MLO-Y4 cells using the Cx43(CT) antibody; the data are shown below. With Cx43(CT) staining, we observed more signal in the cells and on the plasma membrane. After H2O2 treatment, we observed increased and stronger signals localized on the mitochondria compared with the untreated control group. The stronger signals observed at the plasma membrane indicate gap junctions stained by the Cx43(CT) antibody.

      3) The images of cells presented in Figure 1A are quite fuzzy. No mitochondria are visible, and the Cx43 staining is hazy and does not localize to any subcellular structures. Also, it is not clear whether the higher resolution image presented in Figure 1C actually represents a mitochondrion. A good DIC image, or co-staining with another mitochondrial marker such as MitoTracker (as shown in Figure 4-S1), would make the localization and translocation of Cx43 into mitochondria upon oxidative stress more convincing. This is especially important as the translocation, although statistically significant, increases only by about 10% or less (Figure 1B). Such a small difference (also represented in the Western analyses presented in Figure 1D) could easily be artefactual, depending on how the correlation coefficient was generated. Of note in this respect is that control cells in Figure 1A appear larger (compare the size of the nuclei) and are spread out more than the H2O2 treated cells. Better, clearer images would make the mitochondrial localization/translocation more convincing.

      The reviewer made great points. To improve image clarity, we redid the staining/imaging and determined the colocalization of SDHA and MitoTracker Deepred. The result (shown below) suggested that under normal conditions without H2O2 treatment, SDHA and MitoTracker merged perfectly. After H2O2 treatment for 2 hrs, however, mitochondria became fragmented and the SDHA signal exhibited a more dotted pattern than the MitoTracker signal. Overall, we feel that MitoTracker better represents the distribution of mitochondria. SDHA is a subunit of mitochondrial complex II. The images we presented in Figure 1C were captured from isolated mitochondria under a confocal microscope with SDHA and Cx43(CT) co-staining. Considering the specificity of SDHA (see images below), we believe the Cx43 signal we captured demonstrates the mitochondrial localization/translocation. After using MitoTracker as a mitochondrial marker and higher-magnification images, the correlation coefficient increased from 0.35 to 0.47, a statistically significant 32% increase. As for nuclear size, some cells indeed are smaller, which may be affected by varied local cell density. The new images presented in Figure 1A are much more consistent in nuclear size.

      4) How pure are the mitochondria that were probed for Cx43 by Western shown in Figure 1D? The preparation method described is relatively simple, collecting the 10,000xg supernatant (here 9,000xg supernatant) as mitochondrial fraction. Is it possible that the Cx43 signal, at least in part, is derived from other, contaminating membranes, such as PM, Golgi, or ER? Testing the mitochondrial preparation by Western with marker proteins specific for these compartments would strengthen the author's results.

      The reviewer made a great suggestion. To address this, we performed a western blot to test mitochondrial purity. Indeed, this centrifugation method is simple, and as expected there was some contamination with ER (marked by PDI) and Golgi (marked by STX6). To further assess the purity of the mitochondrial fraction, we used fluorescent dyes for mitochondria (MitoTracker Deepred), ER (ER-Tracker Blue-White), and nuclei (Hoechst). The organelle-specific dyes indicated that most of the fraction consisted of mitochondria, with some ER fragments and minimal nuclear contamination. Combining our western blot and immunofluorescence data, we conclude that our Cx43 signal is primarily derived from mitochondria.

      5) The authors rely on previous studies to postulate that Cx43 in mitochondria forms hemichannels in their system, is localized in the inner membrane, and is oriented with the Cx43 C-termini facing the inter-membrane space (as schemed in Figure 8C). The authors use lucifer yellow (LY) dye transfer and carbenoxolone, but neither is a hemi-channel-specific probe: LY is also transferred by GJ channels, and carbenoxolone blocks them as well. Experiments using hemi-channel-specific probes would be more convincing. This is important, as the information cited is based on only two references (Boengler et al., 2009; Miro-Casas et al., 2009). Moreover, Cx43 is a membrane protein that is co-translationally inserted into the ER membrane and then traffics through the Golgi to be inserted into the plasma membrane. It is still highly unclear how such a protein is actually imported into mitochondria, and in which state (monomeric, hexameric). It is also neither described nor clear why the Cx43(CT)-specific antibody traverses the outer mitochondrial membrane and reaches the Cx43 CT while the Cx43(E2)-specific antibody does not. Were these mitochondria permeabilized with Triton X-100 as described in M&M?

      We edited the Methods section. We did not use Triton X-100 to permeabilize mitochondria. PMP appeared to preserve mitochondrial inner membrane integrity, allowing us to assess the localization of the Cx43(CT) antibody on mitochondria. We show these new immunofluorescence images in Figure 5-figure supplement 2. PMP, used as a plasma membrane permeabilizer, has a sixfold higher affinity for the mitochondrial outer membrane (MOM) than for the mitochondrial inner membrane (MIM). Meanwhile, no Cx43(E2) Ab signal was detected in mitochondria, suggesting that the extracellular loop of Cx43 faces the matrix and cannot be accessed by the Cx43(E2) antibody.

      The translocation of Cx43 to mitochondria was reported to involve the chaperone Hsp90-dependent TOM complex pathway (Rodriguez-Sinovas et al., 2006). Whether mtCx43 forms gap junctions in mitochondria after translocation is unclear. Lucifer yellow is widely used in both hemichannel-mediated dye uptake and gap junction-mediated dye transfer assays. In our case, considering the channel orientation, mtCx43 should form hemichannels. The Cx43(CT) Ab could therefore be used as a specific Cx43 HC blocker, as reported in a study in cardiomyocytes (Lillo et al., 2019).

    1. Author Response:

      Reviewer #1 (Public Review):

      Here, Servello et al explore the role of temperature and the temperature-sensing neuron AFD in promoting protection against peroxide damage. Unlike many other environmental threats, peroxide toxicity is expected to be temperature-dependent, since its chemical reactivity should be enhanced by higher temperatures. The authors convincingly and rigorously show that transient exposure to 25C, a condition of mild heat stress in C. elegans, activates animals' defenses against peroxides but potentially not other agents. Interestingly, this response requires the temperature-sensing AFD neurons, though whether temperature-dependent AFD activity is itself involved in this regulation is not explored. Further, the authors find that temperature regulates AFD's expression of the insulin ins-39 and provide evidence supporting the idea that repression of ins-39 at 25C contributes to enhanced peroxide defense. The authors use transcriptomic approaches to explore gene expression changes in animals in which AFD neurons are ablated, providing evidence that the FoxO-family transcription factor DAF-16 potentiates AFD signaling. However, because AFD ablation triggers effects broader than transient 25C exposure, the significance of these findings for temperature-dependent peroxide defense is somewhat unclear. Additionally, the possibility that DAF-16 (as well as another protective factor, SKN-1) function in parallel to temperature stress is consistent with many of the results shown but is not as thoroughly considered. Together, these studies identify a fascinating example of pre-emptive threat response triggered by the detection of a potentiator of that threat, a phenomenon they term "enhancer sensing." While some predictions of the specificity of this phenomenon remain untested, the paper provides intriguing insight into the potential mechanisms by which it may occur.

      Major issues:

      The dependence of the enhancer-sensing phenomenon on AFD leads the authors to conclude that the 25C stimulus is sensed by AFD itself, but this needs to be directly tested. To do this, they could ask whether tax-4 function is required in AFD, or use mutants in which AFD's thermosensory function is compromised.

      We thank the reviewer for suggesting these experiments. As requested, we determined whether previously identified mechanisms for temperature perception by the AFD neurons were required for the temperature-dependent regulation of peroxide resistance using gcy-18 gcy-8 gcy-23 triple mutants and the respective single mutants. The findings from the new experiments lead us to conclude that temperature perception by AFD via the GCY-8, GCY-18, and GCY-23 receptor guanylate cyclases, which are exclusively expressed in the AFD neurons, contributes to the temperature-dependent regulation of peroxide resistance in C. elegans. These experiments are detailed in the following new paragraph in the results section:

      “Last, we determined whether previously identified mechanisms for temperature perception by the AFD neurons were required for the temperature-dependent regulation of peroxide resistance. The AFD neurons sense temperature using receptor guanylate cyclases, which catalyze cGMP production, leading to the opening of TAX-4 channels (Goodman and Sengupta, 2019). Three receptor guanylate cyclases are expressed exclusively in AFD neurons: GCY-8, GCY-18, and GCY-23 (Inada et al., 2006; Yu et al., 1997) and are thought to act as temperature sensors (Takeishi et al., 2016). Triple mutants lacking gcy-8, gcy-18, and gcy-23 function are behaviorally atactic on thermal gradients and fail to display changes in intracellular calcium or thermoreceptor current in the AFD neurons in response to temperature changes (Inada et al., 2006; Ramot et al., 2008; Takeishi et al., 2016; Wang et al., 2013; Wasserman et al., 2011). We found that when grown and assayed at 20°C, gcy-23(oy150) gcy-8(oy44) gcy-18(nj38) triple null mutants survived 43% longer in the presence of tBuOOH than wild-type controls (Figure 3J). In contrast, at 25°C, the gcy-23 gcy-8 gcy-18 triple mutants showed a 12% decrease in peroxide resistance relative to wild-type controls (Figure 3K). Therefore, the three AFD-specific receptor guanylate cyclases influenced the temperature dependence of peroxide resistance, lowering peroxide resistance at 20°C and slightly increasing it at 25°C. At 20°C, the gcy-8(oy44), gcy-18(nj38), and gcy-23(oy150) single mutants increased peroxide resistance by 10%, 51%, and 21%, respectively, relative to wild-type controls (Figure 3L). Therefore, each of the three AFD-specific receptor guanylate cyclases regulates peroxide resistance. We conclude that temperature perception by AFD via GCY-8, GCY-18, and GCY-23 enables C. elegans to lower their peroxide resistance at the lower cultivation temperature.”

      The enhancer-sensing model is fascinating, but as it stands it is somewhat oversold. The authors could tone down the writing, indicating that this model is suggested rather than shown. Alternatively, they could more carefully test some of its predictions - for example by exploring the response to other threats (e.g. some of the toxicants described in Fig. S5) at 20C and 25C in WT and AFD-ablated animals.

      We edited the manuscript and expanded the manuscript’s discussion to address these concerns as well as similar concerns from reviewer #3. In the paper we show that the regulation of the induction of H2O2 defenses in C. elegans is coupled to the perception of temperature (an inherent enhancer of the reactivity of H2O2). To understand the significance of this finding in an evolutionary context, and to explain why such a regulatory system would evolve, we introduced in the discussion a new conceptual framework, “enhancer sensing,” and devoted a section of the discussion to demonstrating that the phenomenon that we observed could not be adequately explained by existing frameworks used to understand the evolutionary origins of the regulatory systems for defense responses.

      We now realize that we did not sufficiently and clearly explain the scope of the criteria for establishing that a phenomenon represents enhancer sensing. This led Reviewers 1 and 3 to incorrect predictions about (a) whether what we observed in C. elegans is an instance of enhancer sensing (or more proof is needed) and (b) what the enhancer sensing model for the coupling of temperature perception to H2O2 defense would predict about how temperature and the AFD neurons would affect resilience to other chemicals. We regret failing to adequately explain the model’s scope and predictions. We believe that we have now explicitly addressed the scope of what constitutes enhancer sensing and the predictions of the model. In particular, we previously did not spell out (a) the distinction between the enhancer sensing strategy and the mechanistic implementation of that strategy. More importantly, we did not spell out (b) what the enhancer sensing strategy coupling temperature perception to H2O2 defense in C. elegans predicted (and did not predict) about whether C. elegans would be expected to use a similar strategy against other temperature-dependent threats. We now address these issues in two new paragraphs in the discussion that read:

      “We show here that C. elegans uses an enhancer sensing strategy that couples H2O2 defense to the perception of high temperature. We expect this strategy’s output (the level of H2O2 defense) to provide the nematodes with an evolutionarily optimal strategy across ecologically relevant inputs (cultivation temperatures) (Kussell and Leibler, 2005; Maynard Smith, 1982; Wolf et al., 2005). This strategy is implemented at the organismic level through the division of labor between the AFD neurons, which sense and broadcast temperature information, and the intestine, which responds to that information by providing H2O2 defense (Figure 9D). Ascertaining that C. elegans relies on this enhancer sensing strategy does not depend on the temperature information broadcast by AFD exclusively regulating defense responses to temperature-dependent threats, because the regulation of defenses towards temperature-insensitive threats could affect defenses towards temperature-dependent threats; for example, suppressing defenses towards a temperature-insensitive threat would be beneficial if those defenses interfered with H2O2 defense or depleted energy resources contributing to H2O2 defense.

      As with any sensing strategy, enhancer sensing strategies are more likely to evolve when sensing is informative and responding is beneficial. In their natural habitat, C. elegans encounter many environmental chemicals that, like H2O2, are inherently more reactive at higher temperatures. It will be interesting to determine the extent to which C. elegans uses enhancer sensing strategies coupling temperature perception to the induction of defenses towards those chemicals, and whether those strategies rely on temperature perception and broadcasting by the AFD neurons. We expect that sensing strategies regulating defense towards those chemicals would be more likely to evolve when those chemicals are common, reactive, and cause consequential damage.”

      We note that our ability to predict survival to other toxicants, such as those that trigger specific gene-expression responses that are AFD-dependent but are unaffected between 20C and 25C (as proposed by the reviewer), is limited not only by our lack of knowledge about the specific mechanisms that protect worms from those toxicants, but also by our lack of knowledge about whether defense towards hydrogen peroxide interferes (or synergizes) with defense towards each of those toxicants and whether defense towards those toxicants interferes (or synergizes) with H2O2 defense. We therefore think that those experiments would be better addressed in future studies.

      The role of ins-39 remains somewhat speculative. Fig 4F shows that ins-39 mutants have a reduced induction of peroxide defense, but it seems that this could be the result of a ceiling effect. The authors' model predicts that overexpression of ins-39, particularly at 25C, should sensitize animals to peroxide damage, a prediction that should be tested directly. Further, the authors seem to assume that AFD is the relevant site of ins-39 function, but this needs to be better supported.

      As requested by all three reviewers, we determined whether ins-39 gene expression in AFD was sufficient to lower peroxide resistance by restoring ins-39(+) gene expression only in the AFD neurons using the AFD-specific gcy-8 promoter. As predicted by the reviewer, these worms were more sensitive to peroxide than wild-type worms. The findings from this experiment lead us to conclude that expression of ins-39 in the AFD neurons was sufficient to regulate the nematode’s peroxide resistance. The new section reads:

      “Next, we determined whether the INS-39 signal from AFD regulated the nematode’s peroxide resistance. The tm6467 null mutation in ins-39 deletes 520 bases, removing almost all the ins-39 coding sequence (Figure 5A). At that location, it inserts 142 bases identical to an intervening sequence located between ins-39 and its adjacent gene. In nematodes grown and assayed at 20°C, ins-39(tm6467) increased peroxide resistance by 26% relative to wild-type controls (Figure 5F). We then asked whether ins-39 gene expression in AFD was sufficient to lower peroxide resistance. To do so, we restored ins-39(+) expression only in the AFD neurons of ins-39(tm6467) mutants, using the AFD-specific gcy-8 promoter (Inada et al., 2006; Yu et al., 1997). Expression of ins-39(+) only in AFD eliminated the increase in peroxide resistance of ins-39(tm6467) mutants (Figure 5F). Notably, the peroxide resistance of the two independent transgenic lines was 28% and 30% lower than that of wild-type controls, likely due to overexpression of the gene beyond wild-type levels. We conclude that the gene dose-dependent expression of ins-39 in the AFD neurons regulated the nematode’s peroxide resistance.”

      The temperature-shift experiments in Figure 5G (formerly 4F) examined two effects on peroxide resistance at 20C: that of prior growth at 25C and that of the ins-39 mutation. These effects were not additive. We interpreted this epistatic interaction as evidence of action in a common pathway. It is possible, however, that growth at 25C, while increasing the subsequent peroxide resistance at 20C, also limits any further increase in peroxide resistance at 20C when combined with another intervention. This could occur even if those interventions act via parallel mechanisms (a ceiling effect, as proposed by the reviewer). We favor the interpretation that the mechanisms act sequentially because we found that ins-39 gene expression within AFD was lower at 25C than at 20C. This finding led us to propose the sequential model in Figure 5H (formerly 4G).

      Most of the daf-16 and skn-1 experiments are carried out in AFD-ablated animals, making the relevance of these findings for the 25C-dependent induction of peroxide defense somewhat unclear. As the authors show, AFD ablation causes much more extensive changes than transient 25C exposure, clearly seen in slope of the line in 3C. Further, unlike 25C exposure, AFD ablation is a chronic and non-physiological state. It would be useful for the authors to be cautious in their interpretation of these findings and to be clearer about how strongly they can connect them to the "enhancer sensing" phenomenon. Along these lines, the potentiation idea could be toned down a bit. Much of the data is consistent with parallel function for daf-16 (and skn-1) - for example, Fig 5C indicates additive effects of daf-16 and 25C exposure; 6C shows that AFD ablation still has a clear effect on peroxide sensitivity in the absence of both daf-16 and skn-1; and Fig S8a shows that much of the transcriptional response to AFD ablation (along PC1) is intact in daf-16 animals.

      We have made several adjustments in the text to address these concerns. As the reviewer noted, the experiments with skn-1 were performed only in AFD ablated worms. We have renamed the section heading to “SKN-1/NRF and DAF-16/FOXO collaborate to increase the nematodes’ peroxide resistance in response to AFD ablation” to make that clear.

      In contrast, the peroxide resistance experiments with daf-16 were also done in worms grown at 25C and then shifted to 20C during the peroxide resistance assay. We established the connection of daf-16 with the temperature-dependent regulation of peroxide resistance in two sets of temperature-shift experiments: in daf-16 single mutants (Figure 6C, formerly 5C) and in transgenic worms rescuing the daf-16 mutant only in the intestine (Figure 6F). In the revised text we make it clearer that the effect of the daf-16 mutation is bigger when the nematodes are shifted from 25C to 20C: “The daf-16(mu86) null mutation decreased peroxide resistance in nematodes grown at 25°C and assayed at 20°C by 35%, a greater extent than the 21% reduction in peroxide resistance induced by that mutation in nematodes grown and assayed at 20°C (Figure 6C).”

      As the reviewer noted, daf-16 and skn-1 have a role in peroxide resistance when the AFD neurons are not ablated (albeit a smaller one than when those neurons are ablated). We have made several changes and additions to the text to make that explicit. Most notably, the revised last paragraph of the SKN-1 section now reads: “We propose that when nematodes are cultured at 20°C, the AFD neurons promote signaling by the DAF-2/insulin/IGF1 receptor in target tissues, which subsequently lowers the nematode’s peroxide resistance by repressing transcriptional activation by SKN-1/NRF and DAF-16/FOXO. However, this repression is not complete, because both daf-16(mu86) and skn-1(RNAi) lowered peroxide resistance at 20°C when the AFD neurons were present. It is also likely that DAF-16 and SKN-1 are not the only factors that contribute to peroxide resistance in AFD-ablated nematodes at 20°C, because AFD ablation increased peroxide resistance in daf-16(mu86); skn-1(RNAi) nematodes, albeit to a lesser extent than in daf-16(+) or skn-1(+) backgrounds.”

      The potentiation idea was specific to the effects of DAF-16 on gene expression. As the reviewer noted, much of the transcriptional response to AFD ablation is intact (albeit reduced in magnitude) in AFD-ablated daf-16 mutants, leading to a shift in the PC1 score for the mutant. At the level of individual genes, we quantified those effects in Figure 8G (formerly 7D). When we did the RNAseq experiments, we had expected that lack of daf-16 would eliminate either all the changes in gene expression induced by AFD ablation or those changes for a subset of genes. Instead, what we found was much more subtle and unexpected: the size of the gene expression change induced by AFD ablation was reduced by the daf-16 mutation, and that reduction was systematic. Specifically, we found that the bigger the change in gene expression induced by AFD ablation, the bigger the effect of daf-16 in the AFD-ablated animals (that is, potentiation), leading to a change in the slope of the regression line in Figure 8G. We revised the paper to ensure we only used the word potentiation in this context (gene expression), even though formally DAF-16 also potentiated the effects of AFD ablation (and of the temperature shift from 25C to 20C) on peroxide resistance.

      Reviewer #3 (Public Review):

      This paper offers novel mechanistic insights into how pre-exposure to warm temperature increases the resistance of C. elegans to peroxides, which are more toxic at warmer temperature. The temperature range tested in this study lies within the animal's living conditions and is much lower than that of heat shock. Therefore, this study expands our understanding of how past thermosensory experience shapes physiological fitness under chemical stress. The paper is technically sound with most experiments or analyses carried out rigorously, and therefore the conclusions are solid. However, it challenges our current understanding of the role of the C. elegans thermosensory system in coping with stress. The traditional view is that the AFD thermosensory neuron is activated upon sensing temperature rise, and that temperature sensation through AFD positively regulates systemic heat shock response and promotes longevity in C. elegans. Thus, it is quite unexpected that AFD ablation activates DAF-16 and improves peroxide resistance. It also appears counterintuitive that genes upregulated at 25 degrees overlap extensively with those upregulated by AFD ablation at 20 degrees. I feel that it is premature to coin the term "enhancer sensing" for such a phenomenon, as their work does not rule out the possibility that AFD ablation increases resistance to other stresses that are independent of temperature regarding their toxicity or magnitude of hazard. Additional work is necessary to clarify these issues.

      1. Whether the role of AFD in inhibiting peroxide resistance is related to AFD activity needs further clarification. AFD activity depends on the animal's thermosensory experience. As animals in this study are maintained at 20 degrees unless indicated specifically, the AFD displays activities starting around 17 degrees and peaks around 20 degrees. Under such condition, the AFD displays little or no activity to thermal stimuli around 15 degrees. It will be important to test whether cultivation of animals at 20 degrees improves peroxide resistance at 15 degrees, compared to 15 degrees-cultivation/15 degrees peroxide testing. The authors should also test whether AFD ablation further improves survival under peroxides at 15 degrees for animals grown at 20 degrees, whose AFD should show little or no activities at 15 degrees.

      The reviewer raises an interesting point about the relation between the mechanisms that determine AFD activity in response to temperature and those that enable AFD to regulate peroxide resistance. In the revised manuscript we tested whether known mechanisms enabling AFD to sense changes in temperature acutely (receptor guanylate cyclases GCY-8, GCY-18, and GCY-23) played a role in the temperature dependence of peroxide resistance. We found that they did, as detailed in our response to reviewer #1’s point 1.

      As noted by reviewer #2 in their point 1, and in our reply to that comment (and in a new discussion paragraph in the revised manuscript), the relationship between the known mechanisms that acutely regulate the activity of AFD in response to temperature and the mechanisms by which constant cultivation temperature regulates gene expression in AFD (and therefore the expression of peroxide-resistance-regulating signals like INS-39) is not well understood. Therefore, it is difficult to predict which temperatures will cause induction of peroxide defenses via AFD-dependent mechanisms or via other mechanisms. While we agree with the reviewer that it will be interesting to characterize the extent to which cultivation temperatures other than 25C lead to increased peroxide resistance at lower temperatures (including the proposed shifts from 20C to 15C), we think that those questions will be better addressed in future studies.

      2. The importance of the thermosensory function of AFD should be verified. In the current study, the tax-4 mutation was used to infer AFD activity, but tax-4 is expressed in sensory neurons other than AFD. In addition to AFD, AWC can sense temperature and it also expresses tax-4. Therefore, influence on AFD from other tax-4-expressing neurons cannot be excluded. On the other hand, ablation of AFD removes all AFD functions, including those that are constitutive and temperature-independent. Therefore, the authors should test the gcy-18 gcy-8 gcy-23 triple mutant, in which the AFD neurons are fully differentiated but completely insensitive to thermal stimuli. These three thermosensor genes are exclusively expressed in AFD. Compared to the tax-4 mutant that is broadly defective in multiple sensory modalities, this triple gcy mutant shows defects specifically in thermosensation. They should see whether results obtained from the AFD ablated animals could be reproduced by experiments using the gcy-18 gcy-8 gcy-23 triple mutant. The authors are also recommended to investigate ins-39 expression in AFD and profile gene expression patterns in the gcy-18 gcy-8 gcy-23 triple mutant.

      We thank the reviewer for this suggestion. We have performed the requested experiments, as detailed in our response to reviewer #1’s point 1. Briefly, we found that gcy-18 gcy-8 gcy-23 triple mutants increased peroxide resistance at 20C but not at 25C, and that the respective gcy single mutants affected peroxide resistance at 20C. In light of these findings, we concluded that temperature perception by AFD via GCY-8, GCY-18, and GCY-23 enables C. elegans to lower their peroxide defenses at the lower cultivation temperature.

      3. The literature suggests that AFD promotes longevity likely in part through daf-16 (Chen et al., 2016) or independently of daf-16 (Lee & Kenyon, 2009). Whichever it is, various studies show that activation of AFD and daf-16 promote a normal lifespan at higher temperature, and AFD ablation shortens lifespan at either 20 or 25 degrees. Therefore, the finding that DAF-16-upregulated genes overlap extensively with those upregulated by AFD ablation is quite unexpected (Figure 5B). The authors should perform further gene ontology (GO) analysis to identify subsets of genes co-regulated by DAF-16 and AFD ablation, and determine whether these genes are reported to be involved in longevity regulation, immunity, stress response, etc.

      We thank the reviewer for this interesting comment about the complex mechanisms by which AFD regulates longevity. We note that AFD also has additional temperature-dependent roles in lifespan regulation, as Murphy et al. 2003 found that RNAi of gcy-18 increased lifespan in wild-type worms at 20C but not at 25C. Therefore, AFD-specific interventions can also be lifespan extending at 20C.

      We performed WormCat analysis, which is similar to gene ontology (GO) enrichment analysis, in Figure 8–figure supplement 2 (formerly Figure S8G), which we described in the results section: “we found that the extent to which AFD ablation affected the average expression of sets of genes with related functions (Higgins et al., 2022; Holdorf et al., 2020) was systematically lower in daf-16(mu86) mutants than in daf-16(+) nematodes (R² = 86%, slope = 0.67, P < 0.0001, Figure 8—figure supplement 2).” Visual inspection of the plot and the very high coefficient of determination of 86% indicate that the size of the effect of AFD ablation on gene expression was systematically smaller when the contribution of DAF-16 to gene expression was removed.

      In the revised manuscript we also moved the three panels quantifying the expression of DAF-16 targets and daf-16-regulated genes from the supplement to the main figure. One of those panels (Figure 8F) shows that genes upregulated by daf-16(+) in daf-2 mutants were disproportionately affected by lack of daf-16 in AFD-ablated worms, as we described in the results section: “In addition, in AFD ablated nematodes, lack of daf-16 lowered the expression of genes upregulated in a daf-16-dependent manner in daf-2(-) mutants (Murphy et al., 2003) to a greater degree than in unablated nematodes (Figure 8F).”

      4. I feel that "enhancer sensing" is an overstatement, or at least a premature term that is not sufficiently supported without further investigations. The authors should explore whether AFD ablation or pre-exposure to warm temperature specifically enhances resistance to a stressor the toxicity of which is increased at higher temperature, but does not affect the resistance to other temperature-insensitive threats.

      We edited the manuscript and expanded the manuscript’s discussion to address these concerns as well as similar concerns from reviewer #1. For clarity, we repeat much of our response to reviewer #1’s point 2 here, with the last paragraph of this response specific to this reviewer’s comment.

      In the paper we show that in C. elegans the regulation of the induction of H2O2 defenses is coupled to the perception of temperature (an inherent enhancer of the reactivity of H2O2). To understand the significance of this finding in an evolutionary context, and to explain why such a regulatory system would evolve, we introduced in the discussion a new conceptual framework, “enhancer sensing,” and devoted a section of the discussion to demonstrating that the phenomenon that we observed could not be adequately explained by existing frameworks used to understand the evolutionary origins of the regulatory systems for defense responses.

      We now realize that we did not sufficiently and clearly explain the scope of the criteria for establishing that a phenomenon represents enhancer sensing, leading to incorrect predictions by reviewers 1 and 3 about (a) whether what we observed in C. elegans is an instance of enhancer sensing (or more proof is needed) and (b) what the enhancer sensing model for the coupling of temperature perception to H2O2 defense would predict about how temperature and the AFD neurons would affect resilience to other chemicals. We regret failing to adequately explain the model’s scope and predictions and believe that we have now explicitly addressed the scope of what constitutes enhancer sensing and the predictions of the model. In particular, we previously did not spell out (a) the distinction between the enhancer sensing strategy and the mechanistic implementation of that strategy; and, importantly, (b) what the enhancer sensing strategy coupling temperature perception to H2O2 defense in C. elegans predicted (and did not predict) about whether a similar strategy would be expected to be used by C. elegans to deal with other temperature-dependent threats. We now address these issues in two new paragraphs in the discussion that read:

      “We show here that C. elegans uses an enhancer sensing strategy that couples H2O2 defense to the perception of high temperature. We expect this strategy’s output (the level of H2O2 defense) to provide the nematodes with an evolutionarily optimal strategy across ecologically relevant inputs (cultivation temperatures) (Kussell and Leibler, 2005; Maynard Smith, 1982; Wolf et al., 2005). This strategy is implemented at the organismic level through the division of labor between the AFD neurons, which sense and broadcast temperature information, and the intestine, which responds to that information by providing H2O2 defense (Figure 9D). Ascertaining that C. elegans relies on this enhancer sensing strategy does not depend on the temperature information broadcast by AFD exclusively regulating defense responses to temperature-dependent threats, because the regulation of defense towards temperature-insensitive threats could affect defenses towards temperature-dependent threats; for example, suppressing defenses towards a temperature-insensitive threat would be beneficial if those defenses interfered with H2O2 defense or depleted energy resources contributing to H2O2 defense.

      As with any sensing strategy, enhancer sensing strategies are more likely to evolve when sensing is informative and responding is beneficial. In their natural habitat, C. elegans encounter many environmental chemicals that, like H2O2, are inherently more reactive at higher temperatures. It will be interesting to determine the extent to which C. elegans uses enhancer sensing strategies coupling temperature perception to the induction of defenses towards those chemicals, and whether those strategies rely on temperature perception and broadcasting by the AFD neurons. We expect that sensing strategies regulating defense towards those chemicals would be more likely to evolve when those chemicals are common, reactive, and cause consequential damage.”

      We note, in the first of the new discussion paragraphs, that the existence of an enhancer sensing strategy is not contingent on the AFD neurons (which implement the temperature sensing and temperature-information broadcasting functions regulating peroxide defenses) refraining from regulating defense responses to temperature-insensitive threats. For example, it may be beneficial to an animal facing high concentrations of environmental peroxides to suppress defense against a temperature-insensitive threat when those defenses are detrimental to hydrogen peroxide defense. This could occur because there is an energetic trade-off when mounting multiple defense responses, or because specific defenses towards temperature-insensitive threats interfere with peroxide defense. As we noted in our response to reviewer #1’s point 2, our ability to predict survival to threats other than H2O2 (including temperature-independent threats) is limited not only by our lack of knowledge about the specific mechanisms that protect worms from those threats, but also by our inability to predict the extent to which defenses towards different threats operate independently of, constructively with, or destructively with those that provide hydrogen peroxide defense. We therefore think that those questions would be better addressed in future studies.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript investigates the gene regulatory mechanisms that are involved in the development and evolution of motor neurons, utilizing cross-species comparison of RNA-sequencing and ATAC-sequencing data from little skate, chick and mouse. The authors suggest that both conserved and divergent mechanisms contribute to motor neuron specification in each species. They also claim that more complex regulatory mechanisms have evolved in tetrapods to accommodate sophisticated motor behaviors. While this is strongly suggested by the authors' ATAC-seq data, some additional validation would be required to thoroughly support this claim.

      Strengths of the manuscript:

      1) The manuscript provides a valuable resource to the field by generating an assembly of the little skate genome, containing precise gene annotations that can now be utilized to perform gene expression and epigenetic analyses. The authors take advantage of this novel resource to identify novel gene expression programs and regulatory modules in little skate motor neurons.

      2) Cross-species RNA-seq and ATAC-seq data comparisons are combined in a powerful approach to identify novel mechanisms that control motor neuron development and evolution.

      Weaknesses:

      1) It is surprising that the analysis of RNA-seq datasets between mouse, chick, and little skate only identified 5 genes that are common between the 3 species, especially given the authors' previous work identifying highly conserved molecular programs between little skate and mouse motor neurons, including core transcription factors (Isl1, Hb9, Lhx3), Hox genes and cholinergic transmission genes. This raises some questions about the robustness of the sequencing data and whether the genes identified represent the full transcriptome of these motor neurons.

      To address reviewer #1’s questions, we generated RNA sequencing data from mouse forelimb MNs and re-analyzed the RNA-seq data using only the homologous MN populations among the different species (Figure 3). As a result, many genes (1038 genes) are found to be commonly expressed in MNs of the different species, including many known MN marker genes. In the results section, we have added the following:

      “The evolution of genetic programs in MNs was investigated unbiasedly by comparing highly expressed genes in pec-MNs (percentile expression > 70) of little skate with the ones from MNs of mouse and chick, two well-studied tetrapod species. In order to compare gene expression with homologous cell types from each species, we performed RNA sequencing on forelimb MNs of mouse embryos at embryonic day 13.5 (e13.5) and wing level MNs of chick embryos at Hamburger-Hamilton (HH) stage 26–27…”

      We have also compared our re-analysis with previous results in Figure 2–figure supplement 1. Most of the fin MN genes (21/24) are highly expressed in pec-MNs (percentile > 70), consistent with the previous in situ experiments. In the Results we have added the following:

      “Although the total number of DEGs are different from the previous data (592 vs. 135 genes in pec-MN DEGs), which might be caused by different statistical analysis with different reference genome, previous RNA-seq data based on de novo assembly and annotation using zebrafish was mostly recapitulated in our DEG analysis based on our new skate genome (21 out of 24 previous fin MN marker genes have the expression level ranked above 70th percentile in Pec-MNs; Figure 2‒figure supplement 1).”

      2) The authors suggest, based on analysis of binding motifs in their ATAC-seq data, that the greater number of putative binding sites in the mouse MNs allows for a higher complexity of regulation and specialization of putative motor pools. This could certainly be true in theory but needs to be further validated. The authors show FoxP1 as an example, which seems to be more heavily regulated in the mouse, but there is no evidence that the FoxP1 expression profile is different between mouse and skate. It is suggested in Fig. 5 that FoxP1 might be differentially regulated by Snai1 in mouse and skate, but the expression of Snai1 in MNs in either species is not shown.

      We have added further discussion and data about differential expression of Foxp1 in mouse and little skate in Figure 5–figure supplement 16 and have discussed as follows:

      “Foxp1, the major limb/fin MN determinant appears to be differentially regulated in tetrapod and little skate. Although Foxp1 is expressed in and required for the specification of all limb MNs in tetrapods, Foxp1 is downregulated in Pea3 positive MN pools during maturation in mice (Catela et al., 2016; Dasen et al., 2008). In addition, preganglionic motor column neurons (PGC MNs) in the thoracic spinal cord of mouse and chick express half the level of Foxp1 expression than limb MNs. Although PGC neurons have not yet been identified in little skate, we tested the expression level of Foxp1 using a previously characterized tetrapod PGC marker, pSmad. We observed that Foxp1 is not expressed in MNs that express pSmad (Figure 5‒figure supplement 3). Since there is currently no known marker for PGC MNs in little skate, our conclusion should be taken with caution.”

      As for Snai1, in the revision we performed a motif enrichment analysis with an unbiased gene list in which Snai1 was not identified. However, when we performed an RNA in situ hybridization experiment for Snai1 (Figure 5–figure supplement 3), we found that Snai1 is expressed in MNs of both mouse and little skate, but not in chick, which has been shown previously (Cheung et al., 2005). To examine the function of Snai1 in the regulation of Foxp1 expression, we ectopically expressed Snai1 in the chick spinal cord by in ovo electroporation. However, we did not detect any changes in Foxp1. Instead, we observed an increase in the number of neurons and abnormal MN exits from the spinal cord, which is reminiscent of a previous observation (Zander et al., 2014). Although we did not detect any changes in Foxp1 expression, we cannot rule out the possibility that Snai1 regulates Foxp1 in mouse and little skate, which may require a gene knockout experiment. Because binding sites of Snai1 were not enriched in the new gene sets that we analyzed in the revision, we have not discussed Snai1 further in the text.

      3) In their discussion section the authors state that they found both conserved and divergent molecular markers across multiple species but they do not validate the expression of novel markers in either category beyond RNA-seq, for example by in situ or antibody staining.

      We have added RNA in situ hybridization results in Figure 3C and Figure 3–figure supplements 1 and 2. Most of the genes were expressed in tissues in accordance with the sequencing results (6 out of 9 common MN genes; 4 out of 6 mouse-specific genes; 5 out of 7 skate-specific genes). Specifically, Uchl1, Slc5a7, Alcam, and Serinc1 are expressed in MNs of all three species; Coch, Ppp1rc, Ctxn1, and Clmp are expressed in MNs of mouse but not in MNs of the other species; Eya1, Etv5, Dnmbp, and Spint1 are expressed in MNs of skate but not in MNs of the other species. In the results section, we have summarized the results as follows:

      “These results were validated by performing RNA in situ hybridization in tissue sections on a subset of species-specific genes …”

    1. Author Response

      Reviewer #1 (Public Review):

      Switching between epithelial and mesenchymal populations is an important stage for cancer growth and metastasis, but it is difficult to study because the cells in this transition are rare. In this study, Xu et al. investigate changes in the splicing regulator environment and in specific splice events by monitoring colon cancer cell populations that have epithelial and mesenchymal properties (and so are potentially in transition), compared with their epithelial counterparts. Using these potentially transitioning cells should reveal new insights into the causative changes occurring during EMT, a key life-threatening step in colon cancer progression, and in other cancers too.

      The authors were trying to establish if changes in the splicing environment occurred between epithelial and quasi-mesenchymal cells and to what extent this is important for colon cancer in establishing gene expression programs and cell behavior related to metastasis. The take home message is that these more "plastic" mesenchymal cells are expressing the mesenchymal transcription factor ZEB1 and reducing expression of the epithelial splicing factor ESRP1 (as well as some other RBPs). The FACS analysis showing that over-expression of ESRP1 alone can switch cell population ratios is very clear and indicates that reduction of this RBP plays a key role in making cells more metastatic. The lentiviral overexpression of CD44s and NUMB2/4 had very dramatic effects on increasing metastatic cellular properties. The clinical stratification analysis of splice isoforms and ZEB1/ESRP1 expression was very informative for understanding what is happening in actual tumors. The methods used and results from these studies are likely to have an impact on understanding the gene expression changes that take place during EMT.

      Strengths: The authors have used cell lines that model switching cells between epithelial and quasi-mesenchymal states, based on expression of the markers EpCAM (epithelial cell adhesion molecule, expressed in epithelial cells) and CD44. The study utilizes shRNA-mediated knockdown and lentiviral overexpression of ESRP1 and splice isoforms, and monitors endogenous mRNA splice isoforms by RNAseq and qRT-PCR, protein isoforms by western blot, cell surface expression of EpCAM and CD44 by FACS, metastatic potential using a mouse model, and patient gene expression data from TCGA.

      Weaknesses: Some of the data here might be novel for colon cancer, but the roles of these RNA binding proteins and ESRP1 target exons are better known in other cancers. Both CD44 and NUMB are known ESRP1 targets already in cells undergoing plasticity (e.g. PMID: 30692202). RBM47 is already known to be downregulated in EMT and quaking upregulated (PMID: 28680090; PMID: 27044866). There is also a lot of literature on ESRP1 expression in cancer and EMT. This should be better discussed.

      Out of the 3 references mentioned, 2 are already discussed in the submitted manuscript, while the third (Rokavec et al.) has now been added to the Discussion. As specified above, we never claimed to be the first to report on these RBPs and downstream AS targets. Unfortunately, it is not clear how the reviewer wants us to improve on these aspects (“should be better discussed” is rather vague) but we have now tried to extend the discussion relative to these issues in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to establish canine tissue-specific organoids for propagation, storage and potential use in biomedical and translational medicine.

      Strengths - The project is ambitious in aim, seeking to raise 6 tissue-specific, stem cell-derived organoid lines.

      Weaknesses -

      1) While the manuscript refers to stem cell lines, no evidence of progressive organoid morphogenesis from undifferentiated single stem cells or stem cell clusters has been shown. This omission makes it difficult to distinguish true organoids from surviving pieces of parental tissue that the authors actually include within their cultures. The authors infer that high-order tissue complexity can be generated within short-term 3D cultures. For example, their kidney organoids contained glomeruli, renal tubules and a Bowman's capsule. These remarkable findings contrast with a previous study by Chen et al. 2019 that showed kidney organoids had restricted morphogenic capacity, forming only simple epithelial dome-like structures (Chen et al. 2019). Although the Chen study was cited, the major differences in study findings were not discussed. In the current study, no compelling evidence is provided for the integrated assembly of the glomerular microvascular capillary network, the glomerular epithelial capsule and complex tubular epithelial collecting ducts during organoid growth.

      Thank you, clarification was made regarding the differences between Chen et al. 2019 and our organoids in our revision of the manuscript (Lines 445-447). The sentence regarding glomeruli, renal tubules, and Bowman's capsules was modified to specifically state the morphological resemblance our organoids have to these structures. We further clarified in the text that we are not stating these structures are complete (glomerular microvascular capillary) (Line 236), as our data are too preliminary to support this statement. However, in future publications we are excited to complete more in-depth characterization and investigation, along with functional assessment. To aid in characterization, three immunohistochemistry (IHC) antibodies were added to Figure 2.

      2) The potential of the organoids for freezing, storage and re-culture is unclear from the data presented.

      We did not present data regarding the re-culturing of organoids in this manuscript. However, in additional publications currently in preparation, we have already thawed and regrown multiple cell lines from this manuscript, leading us to believe that this is possible for all lines cultured in this manuscript. Long-term expression changes after thawing warrant investigation in future studies.

      3) Organoid capacity for regenerative growth in xenograft models has not been tested.

      We did not investigate this in the current manuscript. From our understanding, manuscripts that describe a new organoid model typically do not use xenografts to confirm regenerative growth capacity. The use of organoids in xenograft models is an exciting avenue to explore in the future.

      4) Figure 4 lacks appropriate positive and negative tissue controls.

      Please see Figure 5-figure supplement 1 for all negative control images. The tissues of origin in Figure 4 (now Figure 5) were used as positive controls for the antibody.

      5) Gene expression differences between tissues and organoids are inadequately explained.

      Our apologies for the lack of clarity of our original manuscript. Differences in gene expression between tissues and organoids are now compared in the revised Results section for each organ. To better describe the differences seen between tissues and organoids, information was added to the discussion elaborating on the cell types present in and missing from our samples (Lines 315-321).

      6) Methodological detail is sparse. It is not clear how tissue biopsies are obtained, what size they are and how they are processed for organoid preparation.

      Our apologies. Information regarding biopsies was added in Experimental Procedures specifically in the Tissue collection section (Lines 509-510, 519-535). Additional details on protocols for organoid preparation and culturing were added to the Experimental Procedures and are cited in Gabriel et al. 2022 (Lines 535-548).

      7) The manuscript as a whole is poorly focussed and difficult to follow. The introduction is repetitive with only weak relevance to the main experiments.

      We appreciate the reviewer’s concern. To better focus this manuscript, we re-ordered the introduction to be more linear and improve the focus regarding the main experiments. We hope our revisions will satisfy the reviewer.

      Appraisal - The lack of morphogenesis and xenograft data undermines confidence that the authors have achieved their aims. The above concerns are also likely to hamper utility of the methods for the scientific community.

      We appreciate the listed concerns. These novel organoid models are not limited to applications pertaining to xenografts. Our aim was to develop novel organoid lines that we believe can be of use to a variety of fields including pharmacology, virology, and basic research. The testing of these organoids in xenograft models is outside the scope of the current manuscript.

      Reviewer #2 (Public Review):

      Zydryski et al. develop a comprehensive toolbox of organ-specific canine organoids. Building on previous work on kidney, urinary bladder, and liver organoids, they now report on lung, endometrium, and pancreatic organoids; all six organoid lines are derived from two canines. The authors attempt to benchmark these organoids via histological, transcriptomic, and immunofluorescence characterization to their cognate organs. These efforts are a welcome development for the organoid field, broaden the scope of use to studies with canine models, and seek to establish robust standards. The organ specific RNAseq dataset is also likely to be useful to other researchers working with the canine model.

      A key methodological advance would appear to be that the authors culture these organ-specific organoids using a common cell culture media. This is not the typical protocol in the organoid field; however, the authors do not provide enough information in the manuscript to evaluate if this is a good choice. Furthermore, it is likely that the authors were successful because they included additional tissue components in the co-culture for the organoids which might have provided the necessary tissue specific cues, but the methodological details to reproduce this and the technical evaluation of this approach are missing.

      This is an excellent point, and we have added details to the Methods section to better explain the embedding process in the revised manuscript (Lines 519-532). Your hypothesis about tissue-specific cues is very intriguing and something we should explore in the future. Previous publications have isolated ECM from the native tissue (Giobbe et al. 2019); as you suggest, a similar mechanism may be at work here.

      The authors also directly compare the transcriptional responses of the organoids with the organs, but this is a challenging enterprise given that the organoid models do not incorporate resident immune cells and typically are composed only of epithelial cells. This lack of an 'apples to apples' comparison might explain why in many cases the organoids and organs are highly divergent; however, it could also be that the common cell culture media did not lead to specific maturation of cell types.

      We agree; this manuscript aimed to derive epithelial organoids, and we acknowledge that not all cell types present in the tissues are represented. The comparison was meant to identify similarities (in epithelial cells) as well as the current limitations of the organoid model. We expanded the Discussion, specifically the Insights into organ-specific genes section, to further clarify this point (Lines 315-321).

      Reviewer #3 (Public Review):

      Zydryski et al. describe the generation and characterization of organoids from multiple adult canine tissues. While canine-derived organoids could potentially be advantageous over murine and human organoids, the novelty of generation and characterization is limited, as organoid systems are now being rapidly genetically edited using CRISPR technologies and modeled within immunocompetent environments. Certain points limit my enthusiasm.

      First, the authors do not support the use of serum (FBS) in their media and why they include the same growth and differentiation factors across all tissue types.

      We added a sentence to the Discussion (Canine organoids as biomedical models) to further clarify the reasoning behind the inclusion of the same growth factors for all tissue types:

      “The use of the same media composition lends itself to future applications of co-culture or use in assembloid models where multiple organoid lines are combined and continued growth in a shared media is required”

      Because this medium is based on canine intestinal organoid medium, FBS was included in case future applications require co-culture with intestinal organoids.

      Second, while bulk RNA sequencing data shows similarity to the corresponding tissue for certain genes, there is a lack of detailed characterization of the passage at which these organoids were harvested and how they change over time. Do they become more stem-like, and are they genetically stable?

      The passage numbers at which samples were harvested are listed in Supplemental file 1. Genetic stability is an excellent point regarding organoids. We have not yet examined it in these canine organoids; however, previous publications have addressed how organoids remain genomically stable over time with respect to chromosome number and base-pair changes, and we have added these citations to the introduction (Line 48). The current manuscript focuses on derivation and initial characterization; future work will focus on the re-growth, genetic stability, and functional assessment of canine organoid lines.

      Third, it would be important to demonstrate that these organoids can be genetically manipulated or be exposed to drugs and how they might be beneficial over murine and human organoids.

      Genetic editing of all twelve organoid lines is outside the scope of this paper, and we plan to include this element in future publications. We believe that the organoids can be useful for veterinary medicine as well as serving as an important model of human disease, as canines typically represent humans better than mice do (Lines 28-34, 77-95).

      Fourth, the organoid complexity is not clear and cannot be ascertained from bulk RNA sequencing- for example, do kidney organoids recapitulate canonical markers at the protein level of proximal tubules, distal convoluted tubules, etc. Are different lung cells represented (AT1/AT2/club) and what is the composition of these cells? Why are these cells selected for?

      Thank you. We agree that bulk RNA sequencing has its limitations when it comes to heterogeneous cell populations. It was meant to give a first insight into whether the organoid lines resemble their tissue of origin; adding single-cell RNA sequencing in future work is worth investigating.

      Fifth, as the authors note, methodologically these canine organoids have been developed before from other tissues. For these reasons, my enthusiasm is diminished and unfortunately many of the necessary experiments for further consideration appear out of the scope of the study.

      Three of these organoid lines have previously been published in canines. However, this manuscript also includes the growth and characterization of three novel organoid lines, whereas a manuscript typically focuses on a single novel organoid line. Furthermore, unique to this study is the multi-organ comparison of expression across both tissues and organoids from the same animal, with a related individual serving as the biological replicate. In the human and murine fields, organoid media must be adjusted to each organ from which the stem cells are isolated. We show that our media composition, which is similar to that which previously supported hepatic and intestinal canine organoids, can now support organoids from six different tissues, bringing a novel approach to the field. To our knowledge, this is the most comprehensive comparison of canine organoids across tissue types. Additionally, we are not aware of any literature comparing six different organoid lines from the same individual, with a related individual as a biological replicate, in any other species.

    1. Author Response

      Reviewer #1 (Public Review):

      This study examines whether the D2 receptor antagonist amisulpride and the mu-opioid receptor antagonist naltrexone bias model-based vs model-free behavior in a well-established two-step task of behavioral control. The authors find that amisulpride enhances model-based choices, which is further supported by computational modeling of the data, revealing an increase in the relative contribution of model-based control of behavior. Naltrexone, on the other hand, had no reliable effect on model-based behavior.

      Overall, this is a very nice study with many strengths, including the task and data analysis. A particular strength of the design is the combination of a between-subject drug administration protocol with two within-subject (baseline vs. drug) sessions. This reduces between-subject variability in baseline model-based vs model-free behavior and enhances the power to detect drug effects.

      The introduction could do a better job articulating the rationale for testing the effect of these two specific drugs. Currently, the rationale is that both transmitter systems targeted by these drugs are involved in drug addiction, which is characterized by an imbalance in model-based vs. habitual control of behavior. This appears somewhat indirect.

      Blood draws were used to determine serum levels for amisulpride and naltrexone but these data are not included as covariates in the analysis.

      We thank the reviewer for the positive assessment of our study and for the constructive comments to improve it. We acknowledge that the introduction did not motivate the main research goal of the manuscript clearly enough. We have now extended this section and provided further insight into our reasoning behind the study design. Beyond the involvement of opioid- and dopamine-promoting drugs in addiction, there is abundant evidence from experimental studies showing comparable effects of manipulating receptors of both systems on model-free processes such as reinforcement and habit formation. Based on this overlap, one may predict that blocking the receptors of either neurotransmitter system disrupts habit formation in a similar fashion and thereby improves the ability to behave in a model-based manner. However, as we now elaborate in the manuscript, an argument against this is that disrupting model-free processes might not be enough to promote model-based behaviour, as such behaviour relies heavily on cognitive control. It is therefore especially interesting to compare opioid antagonists, which do not enhance cognitive function, with a D2 antagonist at a dosage that has been shown to increase cognitive control as well as the desire to exert cognitive effort.

      This is expressed in the following paragraphs of the Introduction (p.2 §3 and p.3 §1):

      “Opiates, psychostimulants, and most other drugs of abuse increase the release of dopamine along the mesolimbic pathway (Chiara, 1999; Koob & Bloom, 1988), a circuit that plays a central role in reinforcement learning (Schultz, Dayan, & Montague, 1997). On top of this, the reinforcing properties of addictive drugs also depend on their ability to activate the μ opioid receptors (Becker, Grecksch, & Kraus, 2002; Benjamin, Grant, & Pohorecky, 1993; Le Merrer, Becker, Befort, & Kieffer, 2009). This suggests that both the dopamine and the opioid systems might be particularly relevant in model-free reinforcement learning processes that drive the formation of habitual behaviour. Studies in rodents show that activating receptors of both systems across the striatum increases cue-triggered wanting of rewards (Peciña & Berridge, 2013; Soares-Cunha et al., 2016). Conversely, inhibition of both D1-type and D2-type of dopamine receptors (referred to as D1 and D2 from here on) as well as opioid receptors reduces motivation to obtain or consume rewards (Laurent, Leung, Maidment, & Balleine, 2012; Peciña, 2008; Soares-Cunha et al., 2016). This data raises the hypothesis that the drift towards habitual control is enabled by dopamine and opioid receptors via a common neural pathway. Recent work in humans provides some evidence in this direction, whereby systemic administration of opioid and D2 dopamine receptor antagonists causes a comparable reduction of cue responsivity and reward impulsivity (Weber et al., 2016) and decreases the effort to obtain immediate primary rewards (Korb et al., 2020). This suggests that when allocating control between the model-based and model-free system, dopamine or opioid receptor antagonists might comparatively disrupt model-free behavioural strategies and increase model-based behaviour. Yet, no study in humans has directly investigated this. 
Furthermore, disrupting habit formation might not in itself lead to increased model-based control, without either increasing the perceived value of applying cognitive control or making it easier to do so.”

      We also mention the implications of this direct comparison of the two compounds in the Discussion (p.8 §1):

      “Our findings provide initial evidence for a divergent involvement of the dopamine and opioid neurotransmitter systems in the shift between habitual and goal-directed behaviour. The lack of effects of naltrexone on the model-based/model-free trade-off also provides some support for the notion that simply disrupting neurobiological systems that subserve habitual behaviour might not be enough to increase goal-directed behaviour in this task. An increase in the model-based/model-free weight following amisulpride administration advocates for dopamine playing a decisive role in flexibly applying cognitive control to facilitate model-based behavior and highlights the specific functional contribution of the D2 receptor subtype.”

      Reviewer #3 (Public Review):

      I think this is an interesting study on an important topic. I agree that there is not enough research to understand how the dopaminergic system interfaces with goal-directed planning, and I like the focus on specific types of dopamine receptors. It is interesting that they seem to find a specific effect on just the dopamine antagonist. I also appreciate the clarity with which the authors describe this field of research and their results. However, I also feel that there are several concerns with this paper, both in terms of framing and in terms of the experimental design and analysis. For completeness, I must note that I am not a dopamine expert.

      I felt that the introduction of the paper did not sufficiently motivate the focus on the comparison between neurotransmitters systems, and (for the dopaminergic system) the distinction between D1/D2 receptors. Why is the mapping between stability/flexibility and D1/D2 receptors important? How does this relate to model-based control? Why do the authors predict that model-based control would increase when D2 receptors are blocked? If the hypothesis is about contrasting the contribution of D1 and D2 receptors to goal-directed control, why did the authors not use antagonists directly targeting these two systems?

      In addition, the predictions that are more explicit, for example, that blocking D2 receptors increases MB control by stabilizing goal-relevant information, are fairly specific. However, the current version of the two-step task is not amenable to testing such a specific hypothesis, because it doesn't allow us to measure the specific components of planning (e.g., maintaining goals, the representation of the structure, prospective reasoning). Moreover, MB control in this version of the two-step task is marked by flexibility, because it requires the agent to be sensitive to switching starting states.

      The predictions for the opioid system are also lacking. Why are the authors targeting this system? Why are they comparing the effects of the D2 antagonist with the opioid antagonist? Why do the authors predict that amisulpride should have a stronger effect than naltrexone? In my opinion, these predictions were not sufficiently laid out, which made it difficult to appreciate the authors' motivation to run the study.

      We thank the reviewer for their critical take on the manuscript and for clearly pointing out the weaknesses in argumentation. In particular, we appreciate the reviewer’s comment on the lack of clarity in describing why the comparison of dopamine and opioid antagonists’ effects on MB/MF behaviour might be particularly interesting and why we focused on D2 and not D1 receptors. We have now extended the Introduction to clarify our rationale for comparing these two compounds (p.2-3). In short, apart from the fact that both systems are implicated in addiction, there is also abundant experimental evidence from human and non-human animal studies that the two systems are involved in processes related to forming habitual responses to primary and secondary rewards. This suggests that blocking receptors of either system might comparably affect the MB/MF trade-off by impairing model-free processes. We therefore proceeded to compare opioid and dopamine antagonists.

      As we note, D1 antagonists would likely be detrimental to cognitive-control-related processes and therefore more likely to decrease model-based performance. We therefore chose to compare opioid antagonists with D2 receptor antagonists. Another important reason for comparing the effects of opioid and D2 dopamine antagonists is that it is not clear whether blocking model-free processes is in itself enough to promote model-based behaviour without also boosting cognitive-control-related processes. Given the recent evidence that D2 antagonists increase cognitive effort (Westbrook et al., 2020) and the proposed role of prefrontal D2 receptors in destabilising prefrontal representations (according to the dual-state theory of prefrontal dopamine function proposed by Durstewitz & Seamans, 2008), we reasoned that D2 receptor blockade might also boost the ability (or willingness) to keep the mapping between spaceships and planets online while making choices.

      We incorporated these arguments in the revised Introduction (p.2-3):

      “Opiates, psychostimulants, and most other drugs of abuse increase the release of dopamine along the mesolimbic pathway (Chiara, 1999; Koob & Bloom, 1988), a circuit that plays a central role in reinforcement learning (Schultz et al., 1997). On top of this, the reinforcing properties of addictive drugs also depend on their ability to activate the μ opioid receptors (Becker et al., 2002; Benjamin et al., 1993; Le Merrer et al., 2009). This suggests that both the dopamine and the opioid systems might be particularly relevant in model-free reinforcement learning processes that drive the formation of habitual behaviour. Studies in rodents show that activating receptors of both systems across the striatum increases cue-triggered wanting of rewards (Peciña & Berridge, 2013; Soares-Cunha et al., 2016). Conversely, inhibition of both D1-type and D2-type of dopamine receptors (referred to as D1 and D2 from here on) as well as opioid receptors reduces motivation to obtain or consume rewards (Laurent et al., 2012; Peciña, 2008; Soares-Cunha et al., 2016). This data raises the hypothesis that the drift towards habitual control is enabled by dopamine and opioid receptors via a common neural pathway. Recent work in humans provides some evidence in this direction, whereby systemic administration of opioid and D2 dopamine receptor antagonists causes a comparable reduction of cue responsivity and reward impulsivity (Weber et al., 2016) and decreases the effort to obtain immediate primary rewards (Korb et al., 2020). This suggests that when allocating control between the model-based and model-free system, dopamine or opioid receptor antagonists might comparatively disrupt model-free behavioural strategies and increase model-based behaviour. Yet, no study in humans has directly investigated this. 
Furthermore, disrupting habit formation might not in itself lead to increased model-based control, without either increasing the perceived value of applying cognitive control or making it easier to do so. Crucially, there are important differences in how each of the two neurochemical systems relates to the cognitive control that is pivotal for model-based behaviour. Across a wide range of studies using various dosing schemes, opioid receptor antagonists did not have an effect on tasks that require cognitive control, such as working memory (Del Campo, McMurray, Besser, & Grossman, 1992; File & Silverstone, 1981; Volavka, Dornbush, Mallya, & Cho, 1979), sustained attention (Zacny, Coalson, Lichtor, Yajnik, & Thapar, 1994), or mathematical problem-solving (Del Campo et al., 1992) (see van Steenbergen, Eikemo, & Leknes, 2019, for a review). Dopaminergic circuits, on the other hand, play a central role in higher cognitive functions and goal-directed behaviour (Brozoski, Brown, Rosvold, & Goldman, 1979). In particular, D1 dopamine receptors in the prefrontal cortex enable maintenance of goal-relevant information and working memory (Goldman-Rakic, 1997; Sawaguchi & Goldman-Rakic, 1991; van Schouwenburg, Aarts, & Cools, 2010; Williams & Goldman-Rakic, 1995), while D2 dopamine receptor activity disrupts prefrontal representations (Durstewitz & Seamans, 2008). In support of this, decreased working memory performance was observed after blocking prefrontal D1, but not prefrontal D2, receptors (Arnsten, 2011; Sawaguchi & Goldman-Rakic, 1991; Seamans & Yang, 2004). In humans, systemic administration of D2 antagonists increased the ability to maintain and manipulate working memory representations (Dodds et al., 2009; Frank & O’Reilly, 2006) and increased the value of applying cognitive effort (Westbrook et al., 2020). 
This data suggests that blocking D2 receptors, in contrast to blocking opioid receptors, could further facilitate model-based behaviour through enabling or encouraging flexible use of cognitive control.”

      Another important point that the reviewer stresses is that the two-step task we use does not allow us to draw any conclusions about the mechanisms through which amisulpride increases model-based behaviour. Although we base our hypothesis that D2 blockade might promote model-based behaviour (on top of disrupting habit formation) on previous work showing that D2 blockade increases cognitive effort and the ability to manipulate working memory representations, we completely agree that our setup does not give definite answers about which of these cognitive processes mediated the increase in model-based weights. In the Discussion, we try to interpret our findings within the dual-state hypothesis framework and within the framework of striatal control of adaptive behaviour (p.8 §3-4), centring our argumentation on the dopaminergic circuits that subserve each mechanism.

      We agree with the reviewer that the task requires a high degree of flexible planning and that the dual-state theory might not be enough to account for our effects. We mention this in the Discussion (p. 8 §3):

      “The effects of D2 antagonism on model-based/model-free behaviour in our study can be interpreted within this [dual-state] framework to result from increased ability to maintain prefrontal representation of the mapping between the spaceships and the planets online. However, this is difficult to reconcile with the fact that model-based behaviour in dynamic learning paradigms, such as the one used here, also requires flexible updating of action values.”

      We also elaborate on the general limitations of drawing inference about the underlying cognitive/computational mechanisms in the Discussion (p. 14 §2):

      “Importantly, it should also be acknowledged that the behavioural setup in our study does not allow us to draw definite conclusions about the mechanisms that mediate amisulpride’s effects on model-based or model-free behaviour. For example, it is not clear whether amisulpride increases the perceived benefit of applying cognitive control, or whether it increases the participant’s ability to do so through various possible complementary processes, such as goal maintenance or planning abilities. Future studies should further elucidate the mechanistic contributions of dopamine receptors to the distinct coding and utilisation of task relevant representations (Langdon, Sharpe, Schoenbaum, & Niv, 2018; Stalnaker et al., 2019).”

      Related to this, I felt that the introduction was a bit too quiet on the genetic markers. Their discussion in the results was a bit surprising, and it wasn't quite clear why the authors decided to investigate these interaction effects.

      We appreciate this comment, as we were quite uncertain ourselves about how much weight to give to these data. Previous research had indeed shown profound variability in MB/MF behaviour across genotypes related to baseline dopamine function. The main purpose of the genetic analysis was to control for potential baseline differences and to explore drug × genotype interactions. However, including the serum data as a covariate in the analyses, as suggested by the other reviewers, made most of the genetic results disappear, even when using less conservative priors that likely understate the variance of the posterior distributions of group effects. We have therefore opted to keep coverage of the genetic data to a minimum, but we still report the results and make the data available online for future studies.

      I found some of the core results confusing. Most importantly, why does amisulpride make people less likely to stay after a reward when the first-stage state is the same? When first-stage states repeat, both an MB agent and an MF agent will be more likely to stay after a reward. To me, this kind of behavior doesn't seem particularly model-based. Why does this behavior occur under amisulpride? I was surprised that the authors did not really address it.

      We agree that these results have been somewhat difficult to reconcile. However, adding amisulpride serum levels to our analyses now allows us to understand them better. Model-based behaviour appears to be increased in both serum groups, but only in the high-serum group did we additionally observe increased exploration. We also note that increased exploration was related to a reduced effect of previous points in same-first-state trials, whereas the interaction term (effect of previous points in different- vs. same-first-state trials) was more strongly associated with the model-based weight. This is described in the Results and Discussion sections of the manuscript.

      The following text is included in the Results (p.6):

      “We first observed that the more model-based choices the participants made, the more money they earned (r = 0.65, 95% CI [0.53, 0.76]). This serves as a validity check of the task, which was designed to make cognitive control pay off (literally)45. We then looked at how the model parameters relate to the random slopes from the behavioural analysis of staying behaviour and found that the participant-level (random effect) slope for the effect of previous points on staying behaviour in different vs. same first state trials was most strongly related to ω (d = 0.493, P < 10e-3) and negatively related to the inverse temperature parameter η (d = -0.328, P < 10e-3), and the slope for trials with same first states was mostly related to η (d = 0.822, P < 10e-3), and less so to ω (d = 0.235, P < 10e-3).”

      The following text is included in the Discussion (p.8 §2):

      “Interestingly, amisulpride also increased choice stochasticity parametrised by the softmax inverse temperature parameter. In a paradigm with two choice options, it cannot be definitively determined whether this indicates higher decision-noise or increased exploration of alternative choices. We can, however, speculate that increased decision noise would lead to overall detrimental effects on learning in both trial types with same and different consecutive first stage states, which we do not observe in our data. The effect on the choice stochasticity parameter was only present in participants with a higher effective dose75, suggesting that the effect was more likely to be post-synaptic. Similarly, in the same effective dose group, we found some evidence that amisulpride reduces response stickiness, indicating increased switching between actions. This is well in line with a prominent model of the cortico-striatal circuitry implicating post-synaptic D2 receptors in exploration/exploitation65 and supported by empirical data. In animal studies, activation of D2 receptors was shown to lead to choice perseverance and more deterministic behaviour, whereas D2 receptor inhibition increases the probability of performing competing actions and increases randomness in action selection76. In humans, a recent neurochemical imaging study showed that D2 receptor availability in the striatum correlated with choice uncertainty parameters across both reinforcement learning and active inference computational modelling frameworks77. Increased choice uncertainty was also observed in social and non-social learning tasks in a study using 800 mg of sulpiride, a dose that is known to exert post-synaptic effects54,78. We note, however, that the evidence for the difference in exploration between the low and high serum groups was not robust (p=0.066). 
Furthermore, it has been suggested that increased striatal dopamine is also related to a tendency toward stochastic, undirected exploration79,80, arising from overall uncertainty across available options79 or through increasing the opportunity cost of choosing the wrong option68,71. This suggests that the same biological signature that leads to increased cognitive effort expenditure also promotes choice exploration. In line with this, both prior studies that investigated the effect of increasing dopamine availability with L-DOPA on model-based/model-free behaviour observed increased choice exploration as well as increased model-based behaviour (although in one it was only present in individuals with a higher working memory capacity)55,58.”
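      For readers less familiar with these parameters, the following is a minimal Python sketch of how a model-based/model-free weight, a softmax inverse temperature and a stickiness term are often combined in hybrid models of the two-step task. It is an illustration under stated assumptions, not the fitted model from the manuscript: the function and parameter names (hybrid_choice_probs, omega, beta, stickiness, prev_choice) are hypothetical, the inverse temperature is written as beta although the quoted Results text refers to it as η, and placing stickiness inside the softmax bracket is one common convention among several.

```python
import math


def hybrid_choice_probs(q_mb, q_mf, omega, beta, stickiness, prev_choice=None):
    """Softmax choice probabilities for a hypothetical hybrid MB/MF agent.

    Illustrative sketch only (assumed parameterisation, not the authors' model):
      q_net[a] = omega * q_mb[a] + (1 - omega) * q_mf[a]
      P(a) is proportional to exp(beta * (q_net[a] + stickiness * repeat[a]))
    where repeat[a] is 1 if a equals the previous choice, else 0.
    """
    logits = []
    for a in range(len(q_mb)):
        # Weighted mixture of model-based and model-free action values
        q_net = omega * q_mb[a] + (1 - omega) * q_mf[a]
        # Bonus for repeating the previous first-stage choice
        repeat = 1.0 if a == prev_choice else 0.0
        logits.append(beta * (q_net + stickiness * repeat))
    # Subtract the largest logit before exponentiating for numerical stability
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    q_mb = [0.8, 0.2]  # illustrative model-based values
    q_mf = [0.2, 0.8]  # illustrative model-free values
    # Low inverse temperature: choices close to random
    print(hybrid_choice_probs(q_mb, q_mf, omega=0.9, beta=0.5, stickiness=0.0))
    # High inverse temperature: choices close to deterministic
    print(hybrid_choice_probs(q_mb, q_mf, omega=0.9, beta=10.0, stickiness=0.0))
```

      In this sketch, a lower beta flattens the choice probabilities (more stochastic choices), whereas a higher beta makes choices more deterministic; a larger stickiness value makes repeating the previous choice more likely, so reducing it makes switching relatively more likely.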

      With regards to the design, it is unfortunate that the order of drug administration is not counterbalanced. As far as I understand, model-based control is always measured without a drug in the first session, and then with the drug (or placebo) in the second. The change between sessions is then tested for all three conditions. Of course, it is possible that the increase in model-based control in the amisulpride condition is only driven by the drug. However, given the lack of counterbalancing, it's also possible that amisulpride increases model-based control only after the experience with the task. That is, if the authors had counterbalanced the drug effect, they may have found that amisulpride had a different effect if it was administered in the first session. That would have changed their interpretation quite a bit! As it stands, they are unable to verify their (admittedly simpler) hypothesis that there is only a main effect.

      We thank the reviewer for this comment. Indeed, a full within-subject design would have been statistically more powerful and would have enabled us to exclude the possibility that amisulpride’s effect on model-based behaviour is indirect. We have now included the following paragraph in the Discussion to highlight the limitation of not counterbalancing drug administration (p.10):

      “One of the strengths of our design is the inclusion of a baseline measure, and the fact that the participants were all introduced to the task without drug administration, thus avoiding potential effects of the treatment on task training. Although this design allowed us to reduce between-subjects variability, we cannot completely exclude order effects. Although unlikely, it is possible that the effects of the treatment that we observe come indirectly from the effects of the two drugs either on skill transfer from the previous session or simply on the part of the experiment that preceded the task. For instance, participants under amisulpride could be less tired from other tasks and therefore more willing to exert effort in the task presented here. Speaking against this is the observation that we found no differences in mood between amisulpride and placebo regardless of low or high serum levels.”

    1. Author Response

      Reviewer #1 (Public Review):

      This study presents a series of experiments that investigate maternal control over egg size in honey bees (Apis mellifera). Honey bees are social insects in which a single reproductive female (the queen) lays all the eggs in the colony. The first set of experiments presented here explore how queens change their egg size in response to changes in colony size. Specifically, they show that queens have relatively larger eggs in smaller colonies, and that egg size changes when queens are transplanted into colonies of a different size (i.e. confirming that egg size is a plastic trait in honey bee queens). The second set of experiments investigates candidate genes involved in egg size determination. Specifically, it shows that Rho1 plays a role in determining egg size in honey bee queens.

      In principle, we agree with this summary, although we find the experimental demonstration that perceived colony size affects egg size (first set of experiments) and the overall proteomic comparison of ovaries that produce small and large eggs (second set of experiments that indicate the upregulation of metabolism, protein transport, cytoskeleton organization, and a few other processes in large egg-producing ovaries) also important.

      A strength of the study is that it combines both manipulative field (apiary) experiments and molecular studies, and therefore attempts to consider broadly the mechanisms of plasticity in egg size. The link between these two types of dataset in the manuscript, however, is not strong. While the two parts are related, the molecular experiments do not follow from the conclusions of the field experiments but rather run in parallel (both using the same initial treatments of queens from large v small colonies).

      We would welcome suggestions on how to further strengthen the integration between the field experiments and our molecular studies. We sought to explore the molecular basis of the observed plasticity in reproductive behavior and thus focused on samples from the first set of experiments for our proteome comparisons, realizing that every additional field experiment would have entailed a similar molecular follow-up. We attempted to bring molecular studies and field experiments back together with the RNAi-mediated knock-down of Rho1 in queens that produce eggs in differently-sized colonies under realistic apicultural conditions. There may be better, additional opportunities for a closer integration of molecular and field experiments, but we could not conceive of them.

      Another strength of the study is the focus on social cues for egg size control in a social insect. Particularly interesting is data showing that queens suddenly exposed to the cues of a larger colony (even where egg-laying opportunities did not actually increase) will decrease their egg size, in the same way as queens genuinely transplanted to larger colonies. That honey bee queens can control their egg sizes in response to cues in the colony is not unexpected, given that queens are known to vary egg size based on the cell type they are laying into (queen, drone or worker cell). Nevertheless, it is interesting to show that worker egg sizes over time are also mediated by social cues.

      We thank the reviewer for this positive assessment and want to highlight that this experiment not only controls for egg laying opportunities, but also for potentially greater resource availability in larger colonies. These results are therefore important for the key argument that egg size is actively regulated by honey bee queens.

      A weakness of the study is that the consequence of egg size on egg development and survival in honey bees is not made clear. The assumption is that larger egg size compensates for smaller colonies in some way. Do smaller eggs (i.e. those laid in large colonies) fare worse in smaller colonies than they do in large colonies? Showing that the variation in egg size is biologically relevant to fitness is an important piece of the puzzle.

      We agree that the consequences of egg size variation are important to address beyond our previously published data set and the benefits demonstrated in other contexts by other authors. However, to comprehensively resolve the consequences requires considerable additional experiments that exceed the scope of our current study, which is primarily focused on the causes of the queens’ reproductive plasticity.

      Also, the relationship between egg number and egg size in honey bees remains rather murky. Does egg size depend at least in part on daily egg laying rate (which is sure to be greater in larger colonies)? The study makes an effort to explore this by preventing queens from laying for two weeks and then comparing their egg size when they resume to those that did not have a pause in laying. Although egg size did not vary between the groups in this case, it is unclear whether the same effect would be seen if queens had simply been restricted from laying at such high rates (e.g. if available empty brood cells had been reduced rather than removed entirely).

      We agree that the relation between egg number and egg size is complicated. We have added more data showing that egg laying rates can be higher in larger colonies than in smaller colonies. We also now report that egg size is negatively correlated with egg number, although not in all instances, which partially supports (and partially contradicts) our previous findings (Amiri et al. 2020). We have modified the discussion of our results to account for the additional results and to point out the limitation of the experiment with caged queens. It is important to realize, though, that the queens were caged on comb and not restricted in the typical small queen cages used for queen transport. It is not clear whether this treatment resulted in a downregulation of reproductive effort and/or the resorption of eggs.

      Overall this study makes new contributions to our understanding of maternal control over egg size in honey bees. It provides stepping stones for further investigation of the molecular basis for egg size plasticity in insects.

      We agree that we could not resolve everything in this study and that more investigations are needed.

      Reviewer #2 (Public Review):

      This paper builds on recent work showing that honeybee queens can change the size of the eggs they lay over the course of their life. Here the authors identified an environmental condition that reversibly causes queens to change their egg sizes: namely, being in a relatively small or large colony context. Recently published work demonstrated the existence of this egg size plasticity, but it was completely unknown what signaled to the queen. In a series of simple and elegant experiments they confirmed the existence of this egg size plasticity, and narrowed down the set of environmental inputs to the queen that could be responsible for signaling the change in the environment. They also began the work of identifying genes and proteins that might be involved in controlling egg size. They did a comparative proteomic analysis between small-egg-laying ovaries and large-egg-laying ovaries, and then selected one candidate gene (Rho1). They showed that it is expressed during oogenesis, and that when it is knocked down, eggs get smaller.

      This is a good summary, although we think it is fair to add that the expression of Rho1 is specific to the egg growth stage, and that we found an almost perfect correlation between Rho1 mRNA levels and egg size in two separate experiments (in addition to the difference between large and small egg-producing ovaries at the protein level).

      The experiments on honeybee colonies are well-designed, and they provide fairly strong evidence that the queens are reversibly changing egg size and that it is (at least some component of) their perception of colony size that is the signal. One minor but unavoidable weakness is that experiments on honeybees are necessarily done with small sample sizes. The authors were clear about this, however, and it was very effective that they showed all individual data points. Alongside the previous work on which this paper builds, I found their core results to be rather convincing and important.

      We thank the reviewer for this positive evaluation.

      I found the parts of the paper on oogenesis to be useful, but overall less informative in answering the questions that the authors set out for those sections. On balance, I think the best way to interpret the oogenesis results is as "suggestive and exploratory". For instance, the experiment aimed at understanding the relationship between egg-laying rate and egg size does not include a direct measurement of egg-laying rate, but instead puts queens in a place with no suitable oviposition sites. The proteomic analysis was fine, but since they were using whole ovaries, with tissue pooled across all stages of oogenesis including mature oocytes, I would be cautious in interpreting the results to mean that they had identified proteins involved in making larger eggs. These proteins might just as easily be the proteins that are put into larger eggs. In fact, for the one candidate gene that is examined, its transcripts seem as though they are predominantly in the oocyte cell itself rather than in the supporting cells that actually control the egg size (although it is hard to tell from the micrographs without a label for cell interfaces).

      We have added data on the number of eggs produced in the first experiment, which actually show a negative correlation between egg size and egg number. In addition, we have made our wording about the conclusions that can be drawn from the oviposition restriction experiment more cautious. Concerning the expression and role of Rho1, we apologize for the lack of a cell membrane marker. However, we share the reviewer’s interpretation that the mRNA is located in the oocyte. While we also agree that egg loading from the nurse cells is important, transport of vitellogenin from the follicle cells may also be quite significant for egg size (Wu et al. 2021 – doi:10.3389/fcell.2020.593613 and Fleig 1995 - doi:10.1016/0020-7322(95)98841-Z), a process that could be controlled by Rho1 in the documented location. We have added to the discussion to clarify this point.

      On that note, with the caveat that the sample sizes are quite small, I agree that there is some evidence that Rho1 is involved in honeybee oogenesis. If this was the only gene they knocked down, and given that it results in a small size change with such a small sample size, it strikes me as a bit of a stretch to say that these results are evidence that Rho1 plays an important role in egg size determination. It is essential to know if this is a generic result of inhibiting cytoskeletal function or a specific function of Rho1. That is beyond the scope of this study, but until those experiments are done, it is hard to know how to interpret these results. For context, in Drosophila, there are lots and lots of genes such that if you knock them down, you get a smaller or differently shaped egg, including genes involved in planar polarity, cytoskeleton, basement membrane, protrusion/motility, septate junctions, intercellular signaling and their signal transduction components, muscle functions, insect hormones, vitellogenesis, etc. This is helpful, perhaps, for thinking about how to interpret the knockdown of just one gene.

      We thank the reviewer for this perspective and have consequently made our wording more cautious. The role of Rho1 in regulating cytoskeletal function has been established in other organisms, but we do not have the tools to study the corresponding pathways and establish causality in honey bees. We have added to the discussion to alert the reader that additional experiments are necessary.

      Overall, I found the results to be technically sound, and there are several clever manipulations on honeybee colonies that will doubtless be repeated and elaborated in the future to great effect. The core result, that queens can change the size of their eggs quickly and reversibly in response to some perceived signal, was honestly pretty astonishing to me, and it reveals that there are non-nutritive plastic mechanisms in insect oogenesis that we had no idea existed. I look forward to follow-up studies with interest.

      We thank the reviewer for the overall evaluation and encouragement to continue our research.

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, the authors performed single-cell RNA sequencing (scRNA-seq) analysis on bone marrow CD34+ cells from young and old healthy donors to understand the age-dependent cellular and molecular alterations during human hematopoiesis. Using a logistic regression classifier trained on young healthy donors, they identified cell-type composition changes in old donors, including an expansion of hematopoietic stem cells (HSCs) and a reduction of committed lymphoid and myeloid lineages. They also identified cell-type-specific molecular alterations between young and old donors and age-associated changes in differentiation trajectories and gene regulatory networks (GRNs). Furthermore, by comparing the single-cell atlas of normal hematopoiesis with that of myelodysplastic syndrome (MDS), they characterized cellular and molecular perturbations affecting normal hematopoiesis in MDS.

      The present manuscript provides a valuable single-cell transcriptomic resource to understand normal hematopoiesis in humans and the age-dependent cellular and molecular alterations. However, their main claims are not well supported by the data presented. All results were based on computational predictions, not experimentally validated.

      Major points:

      1) The authors constructed a regularized logistic regression trained on young donors with manually annotated cell types and predicted cell type labels of cells from old and MDS samples. As the manual annotation of cell types was implicitly assumed as ground truth in this manuscript, I'm wondering whether the predicted cell types in old and MDS samples are consistent with the manual annotation. They should apply the same strategy used in young samples for manual annotation to old and MDS samples, and evaluate how accurate their classifier is.

      We performed manual annotation independently for each MDS sample and for the integrated dataset of the three healthy elderly donors. To do so, we performed unsupervised clustering with Seurat and annotated the clusters using the same set of canonical marker genes that we used for the young data. We then analyzed the correspondence between the annotated clusters and the GLMnet predictions. Results are shown in Figure 1a. The largest disagreements between methods occur between adjacent identities, such as HSC and LMPP, GMP and GMP with a more prominent granulocytic profile, or MEP and early and late erythroid cells. When we explore these disagreements along the erythroid branch, we see that they occur particularly close to the borders between subpopulations (Figure 1b). This is consistent with the continuous nature of differentiation and the difficulty of establishing boundaries between cell compartments. However, mislabeling between different hematopoietic lineages is rare.

      In addition, unsupervised clustering was not always able to separate the data directly into the expected subpopulations. We see different clusters containing the same cell type (e.g. LMPP1, LMPP2), as well as individual clusters containing cells with different identities (e.g. pDC and monocyte progenitors). This is usually due to sources of variability in the data other than cell identity. Additional supervised fine-tuning by local subclustering and merging would be needed to correct for this. By contrast, we believe that our GLMnet-based method focuses on gene expression related to identity, resulting in a classification that is better suited for our purpose.

      Figure 1 Comparison between GLMnet predictions and manually annotated clusters A) Heatmaps showing percentages of cells in manually annotated clusters (columns) that have been assigned to each of the cell identities predicted by our GLMnet classification method (rows). The analysis was performed independently for the elderly integrated dataset and for every MDS sample. B) UMAP plots showing disagreements in classification between adjacent cell compartments in the erythroid branch. Cells from one erythroid cluster per patient are colored by the identity assigned by the GLMnet classifier. Cells in gray are not in the highlighted cluster, nor labeled as MEP, erythroid early or erythroid late by our classifier.
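      The cluster-versus-prediction comparison described above (the percentage of cells in each manually annotated cluster that were assigned to each predicted identity) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes per-cell labels are available as two plain lists, the cluster and identity names are hypothetical, and `overall_agreement` relies on a hypothetical mapping from each annotated cluster to the identity it is expected to represent.

```python
from collections import Counter, defaultdict


def agreement_table(manual, predicted):
    """Percent of cells in each manual cluster assigned to each predicted identity.

    manual, predicted: equal-length sequences of per-cell labels.
    Returns {cluster: {predicted_identity: percent}}; each cluster sums to 100.
    """
    if len(manual) != len(predicted):
        raise ValueError("manual and predicted must have the same length")
    counts = defaultdict(Counter)
    for cluster, identity in zip(manual, predicted):
        counts[cluster][identity] += 1
    return {
        cluster: {identity: 100.0 * n / sum(c.values()) for identity, n in c.items()}
        for cluster, c in counts.items()
    }


def overall_agreement(manual, predicted, expected):
    """Fraction of cells whose predicted identity equals the identity expected
    for their manual cluster (`expected` is a hypothetical cluster -> identity map)."""
    hits = sum(1 for m, p in zip(manual, predicted) if expected.get(m) == p)
    return hits / len(manual)


# Hypothetical toy example: six cells in three manual clusters.
manual = ["LMPP1", "LMPP1", "LMPP2", "MEP", "MEP", "MEP"]
predicted = ["LMPP", "HSC", "LMPP", "MEP", "MEP", "Erythroid early"]
table = agreement_table(manual, predicted)
print(table["LMPP1"])  # {'LMPP': 50.0, 'HSC': 50.0}
```

      Each inner dictionary corresponds to one column of the heatmaps in Figure 1a; disagreements between adjacent identities show up as split percentages within a cluster.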

      2) The cell-type composition changes in Figures 1 and 4 were descriptively presented without providing the statistical significance of the changes. In addition, the age-dependent cell-type composition changes should be validated by flow cytometry.

      We thank the reviewer for the comment. Significance of the changes is included in Supplementary File 3. In addition, we included the percentage of several cell types we validated by flow cytometry, namely HSCs, GMPs and MEPs, in young and elderly healthy individuals in the manuscript, as Figure 1-figure supplement 3. Similarly to what we detected in our bioinformatic analyses, flow cytometry data demonstrated a significant increase in the percentage of HSCs, as well as an increasing trend in MEPs and a slight decrease in the percentage of GMPs in elderly individuals, corroborating our previous results.

      3) In Figure 2, the authors used two different pseudo-time inference methods, STREAM, and Palantir. It is not clear why they used two different methods for trajectory inference. Do they provide the same differentiation trajectories? How robust are the results of trajectory inference algorithms? It seems to be inconsistent that the pseudotime inferred by STREAM was not used for downstream analysis and the new pseudotime was recalculated by using Palantir.

      We thank the reviewer for the comment. We used two different methods for similar analyses because each provides specific outputs that together allow a more robust and comprehensive analysis. STREAM allows us to unravel the differentiation trajectories in a single-cell dataset with an unsupervised approach, and its visualization (Figure 2C and 2D) makes the results easy for the reader to interpret. Palantir, on the other hand, provides a more robust analysis to dissect how gene expression dynamics interact with and change along differentiation trajectories. For this reason, we decided to use this second method to investigate how specific genes were altered in the monocytic compartment.

      As a resource article, the showcase of different methods can be valuable as it provides examples on how each tool can be used to obtain specific results, which can help any reader to decide which might be the best tool for their specific case.

      To confirm that the pseudotime results are similar, we performed a correlation analysis of the pseudotime values obtained with each method. We observed a correlation coefficient of 0.78 (p < 2.2e-16), confirming the similarity between both tools.
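      The response does not state which correlation coefficient was used. Because STREAM and Palantir report pseudotime on different scales, a rank-based (Spearman) correlation is one natural choice, and the minimal stdlib-only sketch below computes it as the Pearson correlation of average ranks. That choice, and the pseudotime values in the example, are assumptions for illustration only; the p-value is not computed here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pseudotimes for the same five cells from the two tools.
stream_pt = [0.0, 0.1, 0.4, 0.35, 0.9]
palantir_pt = [0.02, 0.15, 0.30, 0.50, 0.95]
print(round(spearman(stream_pt, palantir_pt), 2))  # 0.9
```

      Cells must be matched by barcode before the two vectors are compared; only the ordering of cells along pseudotime affects the Spearman coefficient.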

      Figure 2. Correlation analysis of pseudotime values obtained with STREAM and PALANTIR.

      4) In Figure 2D, some HSCs seem to be committed to the erythroid lineage. The authors should carefully examine whether these HSCs are genuinely HSCs, not early erythroid progenitors.

      We thank the reviewer for the comment. We have performed a detailed analysis of the classification of HSCs (see Figure 3). Our analyses reveal that none of the cells classified as HSCs express early erythroid progenitor markers. We have also used STREAM to show the expression of these markers along the obtained trajectory and observed that erythroid markers are expressed in the erythroid trajectory but not in the HSC compartment (Figure 4).

      Figure 3 Expression of marker genes in the HSC compartment. Dot plot depicting the normalized scaled expression of canonical marker genes by HSC of the 5 young and 3 elderly healthy donors. Marker genes are colored by the cell population they characterize. Dot color represents expression levels, and dot size represents the percentage of cells that express a gene.

      Figure 4. Expression of erythroid markers in STREAM trajectories. Expression of GATA1 and HBB (erythroid markers) in the predicted differentiation trajectories.

      5) It is not clear how the authors draw a conclusion from Figure 3D that the number of common targets between transcription factors is reduced. Some quantifications should be provided.

      We thank the reviewer for the comment. We have updated the manuscript to better reflect our findings and to emphasize that the predicted regulatory network of HSCs in elderly donors appears as an independent network compared with that of young donors (page 6, line 36).

      “Overall, we observed that the predicted regulatory network of elderly HSCs (Figure 3d) appeared as an independent network compared to the young GRN. This finding could result in the loss of co-regulatory mechanisms in the elderly donors.”
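      The reviewer asked for a quantification of common targets between transcription factors. One simple option is to count, for every pair of TFs, how many predicted targets their regulons share, and compare the average between the young and elderly networks. The sketch below is a hypothetical illustration, not the analysis in the manuscript: it assumes each regulon (for example, exported from SCENIC) is available as a set of target genes per TF, and the TF and gene names are placeholders.

```python
from itertools import combinations


def shared_target_counts(regulons):
    """Number of predicted targets shared by every pair of TFs.

    regulons: dict mapping TF name -> set of target genes.
    Returns {(tf_a, tf_b): n_shared} with tf_a < tf_b alphabetically.
    """
    return {
        (a, b): len(regulons[a] & regulons[b])
        for a, b in combinations(sorted(regulons), 2)
    }


def mean_shared_targets(regulons):
    """Average number of shared targets across all TF pairs (0.0 if < 2 TFs)."""
    counts = shared_target_counts(regulons)
    return sum(counts.values()) / len(counts) if counts else 0.0


# Hypothetical regulons; TF and gene names are placeholders.
young = {"TF_A": {"g1", "g2", "g3"}, "TF_B": {"g2", "g3", "g4"}, "TF_C": {"g3", "g5"}}
elderly = {"TF_A": {"g1", "g2"}, "TF_B": {"g4", "g6"}, "TF_C": {"g5"}}
print(mean_shared_targets(young), mean_shared_targets(elderly))
```

      Restricting the comparison to TFs present in both networks, or normalizing each pair by regulon size (e.g. a Jaccard index), would keep differences in regulon size from driving the result.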

      6) The constructed GRNs and related descriptions were based solely on the SCENIC analysis. By providing the results of an orthogonal prediction method for GRNs, the authors should evaluate how robust and consistent their predictions are.

      We thank the reviewer for the comment regarding the method used to build gene regulatory networks. As a resource article, our manuscript describes a complete workflow for different aspects of single-cell analysis, ranging from automated classification through trajectory inference to GRN prediction. All the selected algorithms have already been benchmarked and compared against other tools that perform similar analyses; SCENIC has been benchmarked against other algorithms (11) and by others (12).

      We do agree with the reviewer that these new predictions could provide strength to our findings, however we believe that these orthogonal predictions would better fit if our article was intended for the Research Article category instead of Tools and Resources.

      7) The observed age-dependent cellular and molecular alterations in human hematopoiesis are interesting, but I'm wondering whether the observed alterations are driven by inflammatory microenvironment or intrinsic properties of a subpopulation of HSCs affected by clonal hematopoiesis (CH). To address this, the authors can perform genotyping of transcriptomes (GoT) on old healthy donors with CH. By comparing the transcriptomes of cells with and without CH mutations, we can evaluate the effects of CH on age-associated molecular alterations.

      We thank the reviewer for the comment. Unfortunately, performing GoT (genotyping of transcriptomes) on the healthy donors requires modifying the standard 10x Genomics workflow to amplify the targeted locus and transcript of interest. This would require collecting new samples, optimizing the method, and performing new analyses from scratch (from sequencing through to analysis). We believe this is beyond the scope of the manuscript. In addition, we do not have enough material to create new single-cell libraries; this would require adding new donors and, as a result, performing a completely new integration analysis.

      Reviewer #3 (Public Review):

      The authors have performed a transcriptional analysis of young/aged hematopoietic stem/progenitor cells which were obtained from normal individuals and those with MDS.

      The authors generated an important and valuable dataset that will be of considerable benefit to the field. However, the data appear to be over-interpreted at times (for example, GSEA analysis does not have "functionality", as the authors claim). On the other hand, a comparison between normal-aged HSC and HSC from MDS patients appears to be under-explored in trying to understand how this disease (which is more common in the elderly) disrupts HSC function.

      A more extensive cross-referencing of other normal HSPC/MDS HSCP datasets from aged humans would have been helpful to highlight the usefulness of the analytical tools that the authors have generated.

      Major points

      1) The authors detail methodology for identification of cell types from single-cell data - GLMnet. This portion of the text needs to be clarified as it is not immediately clear what it is or how it's being used. It also needs to be explained by what metric the classifier "performed better among progenitor cell types" and why this apparent advantage was sufficient to use it for the subsequent analysis. This is critical since interpretation of the data that follows depends on the validation of GLMnet as a reliable tool.

      We thank the reviewer for the comment. We have updated the corresponding section to better describe how GLMnet is used and to explain that our decision to use GLMnet for cell type annotation, instead of other available tools such as Seurat, is based on the results of the benchmark described in Figure 1-figure supplement 1. We also describe the main differences between our method and Seurat (see answer to Reviewer 1, question #4).

      2) The finding of an increased number of erythroid progenitors and decreased number of myeloid cells in aged HSPCs is surprising since aging is known to be associated with anemia and myeloid bias. Given that the initial validation of GLMnet is insufficiently described, this result raises concerns about the method. Along the same lines, the authors report that their tool detects a reduced frequency of monocyte progenitors. How does this finding correlate with the published data on aging humans? Is monocytopenia a feature of normal aging?

      We thank the reviewer for this comment, as changes in the output of HSCs as a consequence of aging are of high interest. According to the literature, there is clear evidence of the loss of lymphoid progeny with age (13,14), in agreement with our results. However, in the case of the myeloid compartment, the effects of aging are less clear. Studies in mice have indeed observed that the loss of lymphoid cells is accompanied by increased myeloid output, starting at the level of GMPs (Rossi et al. 2005; Florian et al. 2012; Min et al. 2006), but studies in humans have not found changes in the numbers of these myeloid progenitors (Kuranda et al. 2011; Pang et al. 2011). In addition, in these studies, myeloid production was measured exclusively by its white blood cell fraction. More recent studies have focused on the other myeloid compartments: megakaryocytic and erythroid cells. Results point towards an increase of platelet-biased HSCs with age (Sanjuan-Pla et al. 2013; Grover et al. 2016) and a possible expansion of megakaryocytic and erythroid progenitor populations (Yamamoto et al. 2018; Poscablo et al. 2021; Rundberg Nilsson et al. 2016), which may represent a compensatory mechanism for ineffective differentiation towards this lineage in elderly individuals. This is in line with the accumulation of MEPs we see in our data. Finally, and in accordance with the reduced frequency of monocyte progenitors observed, it has been shown that monocyte counts gradually decline with increasing age (15).

      Regarding the concerns about our classification method raised by the reviewer, we have performed additional validations that we describe in answers to reviewer 1 comment #4 and reviewer 2 comment #1. To further confirm that the changes in cellular proportions we found are real, we applied two additional classification methods: Seurat transfer and Celltypist (16) to the elderly donors dataset. We obtained a similar expansion in MEPs, together with reduction of monocytic progenitors with the three methods (Figure 5).

      Figure 5 Classification of HSPCs from elderly donors. Barplot showing proportions of every cell subpopulation per elderly donor, resulting from three classification methods: GLMnet-based classifier, Seurat transfer and Celltypist. For the three methods, cells with prediction scores < 0.5 were labeled as “not assigned”.
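      The thresholding rule in the Figure 5 legend (scores below 0.5 labeled “not assigned”) and the per-donor proportions can be sketched as follows. This is a minimal illustration under stated assumptions: each method is assumed to yield one score per candidate identity for every cell, the top-scoring identity is kept, and the identities and scores are hypothetical. How GLMnet, Seurat transfer and Celltypist actually expose their scores is not reproduced here.

```python
from collections import Counter


def assign_labels(cell_scores, threshold=0.5):
    """Top-scoring identity per cell, or 'not assigned' if its score is below threshold.

    cell_scores: list of dicts, one per cell, mapping identity -> prediction score.
    """
    labels = []
    for scores in cell_scores:
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else "not assigned")
    return labels


def proportions(labels):
    """Fraction of cells carrying each label (the bar heights for one donor)."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Hypothetical scores for four cells from one donor.
donor_scores = [
    {"MEP": 0.8, "HSC": 0.2},
    {"MEP": 0.4, "HSC": 0.3},  # best score < 0.5 -> "not assigned"
    {"Mono progenitor": 0.6, "MEP": 0.1},
    {"MEP": 0.5},  # exactly 0.5 is kept, since only scores < 0.5 are excluded
]
labels = assign_labels(donor_scores)
print(proportions(labels))
```

      Running the same two functions on each donor's cells, once per classification method, gives the per-donor bars compared in the figure.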

      3) The use of terminology requires more clarity in order to better understand what kind of comparison has been performed, i.e. whether global transcriptional profiles are being compared, or those of specific subset populations. Also, the young/aged comparisons are often unclear, i.e. it's not evident whether the authors are referring to genes upregulated in aged HSC and downregulated in young HSC or vice versa. A more consistent data description would make the paper much easier to read.

      We thank the reviewer for this comment. We have updated the manuscript to provide more clarity in the description of the different comparisons made in our analyses. Most changes are located in the Transcriptional profiling of human young and elderly hematopoietic progenitor systems sub-section within the Results.

      4) The link between aging and MDS is not explored but could be an informative use of the data that the authors have generated. For example, anemia is a feature of both aging and MDS whereas neutropenia and thrombocytopenia only occur in MDS. Are there any specific pathways governing myeloid/platelet development that are only affected in MDS?

      Thank you for raising this comment. We believe that distinguishing events that take place during healthy aging from those associated with MDS will help in understanding this particular disease, as it is so closely related to age. This is why, when analyzing MDS, we have considered young and elderly donors as two separate sets of healthy controls, the elderly donors being the most suitable one for comparisons with MDS samples.

      With regard to the comment on myeloid and platelet development, the GSEA analysis gives potentially useful information. MYC targets and oxidative phosphorylation are significantly enriched in the MEP compartment from MDS patients when compared to elderly donors, indicating that these progenitors may recover a more active profile with the disease. Hypoxia-related genes, on the other hand, are more active in HSCs and MEPs from healthy elderly donors than in MDS. Hypoxia is known to be implicated in megakaryocyte and erythroid differentiation (17).

      5) MDS is a very heterogeneous disorder and while the authors did specify that they were using samples from MDS with multilineage dysplasia, more clinical details (blood counts, cytogenetics, mutational status) are needed to be able to interpret the data.

      We thank the reviewer for the comment. All the clinical details for each MDS patient are included in Supplementary File 5.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Sims et al. evaluate how system-level brain functional connectivity is associated with cognitive abilities in a sample of older adults aged over 85 years. Because the study sample of 146 normal older adults has lived into advanced years of age, the novelty here is the opportunity to validate brain-behavioral associations in aging with reduced concern about the potential influence of undetected incipient neuropsychological pathology. The participants provided resting-state functional magnetic resonance imaging (rs-fMRI) data as well as behavioral neuropsychological test assessments of various cognitive abilities. Exploratory factor analysis was applied to the behavioral cognitive assessments to arrive at summary measures of participant ability in five cognitive domains: processing speed, executive functioning, episodic memory, working memory, and language. The rs-fMRI data were submitted to a graph-theoretic approach that derived underlying functional nodes in brain activity, the membership of these nodes in brain network systems, and indices characterizing the organizational properties of these brain networks. The study classifies the various brain networks into a sensory/motor system of networks and an association system of networks, with further sub-systems in the latter that include the frontoparietal network (FPN), the default-mode network (DMN), the cingulo-opercular network (CON), and the dorsal (DA) and ventral (VA) attention networks. Amongst other graph metrics, the study focused on the extent to which networks in these brain systems were segregated (i.e., separable network communities as opposed to a more singular large community network). Evaluation of the brain network segregation indices and cognitive performance metrics showed that, in general, higher network functional segregation corresponds with higher cognitive performance ability. In particular, this association was seen between the association system as a whole and overall cognition, and between the FPN and both overall cognition and processing speed.

      The results worthy of highlighting include the documentation of oldest-old individuals with detectable brain neural network segregation at the level of the association system and its FPN sub-system and the association of this brain functional state notably with general cognition and processing speed and less so with the other specific cognitive domains (such as memory). This finding suggests that (a) apparently better cognitive aging might stem from a specific level of neural network functional segregation, and (b) this linkage applies more specifically to the FPN and processing speed. These specific findings inform the broader conceptual perspective of how human brain aging that is normative vs. that which is pathological might be distinguished.

      We appreciate this comment and we have added these points to the conclusion more explicitly.

      To show the above result, this study defined functional networks that were driven more by the sample data as opposed to a pre-existing generic template. This approach involves a watershed algorithm to obtain functional connectivity boundary maps in which the boundary brain image voxels separate functionally related voxels from unrelated voxels by virtue of their functional covariance as measured in the immediate data. This is also a notable objective and data-driven approach towards defining functional brain regions-of-interest (ROIs), nodes, and networks that are age-appropriate and configured for a given dataset as opposed to using network definitions based on other datasets used as a generic template.
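      To make the boundary-mapping idea concrete, here is a minimal watershed sketch. It is an illustration under stated assumptions, not the authors' pipeline: a small 2D grid of boundary-map values stands in for the cortical surface (real pipelines operate on surface meshes), parcels are seeded at strict local minima, and basins are flooded in order of increasing boundary value. The helper `watershed_parcels` is hypothetical, and it omits the explicit watershed lines that full implementations draw between basins.

```python
import heapq


def neighbors(r, c, rows, cols):
    """Yield the 4-connected neighbours of grid cell (r, c)."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc


def watershed_parcels(boundary):
    """Assign each cell to the basin (parcel) that floods it first.

    boundary: 2D list of boundary-map values; high values mark likely
    borders between functionally distinct areas (hypothetical input).
    """
    rows, cols = len(boundary), len(boundary[0])
    labels = [[0] * cols for _ in range(rows)]
    heap, next_label = [], 1
    # Seeds: strict local minima of the boundary map act as parcel centres.
    for r in range(rows):
        for c in range(cols):
            v = boundary[r][c]
            if all(v < boundary[rr][cc] for rr, cc in neighbors(r, c, rows, cols)):
                labels[r][c] = next_label
                next_label += 1
                heapq.heappush(heap, (v, r, c))
    # Flood outward from the lowest boundary values first, so each basin
    # grows until it meets the high-boundary ridges around it.
    while heap:
        _, r, c = heapq.heappop(heap)
        for rr, cc in neighbors(r, c, rows, cols):
            if labels[rr][cc] == 0:
                labels[rr][cc] = labels[r][c]
                heapq.heappush(heap, (boundary[rr][cc], rr, cc))
    return labels


# Toy boundary map: two low basins separated by a high ridge in column 2.
grid = [[3, 2, 9, 2, 3],
        [2, 1, 9, 1, 2],
        [3, 2, 9, 2, 3]]
labels = watershed_parcels(grid)
for row in labels:
    print(row)
```

      Cells on the ridge itself end up in whichever basin reaches them first; production implementations typically mark such cells as boundaries instead.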

      The sample size of 146 for this age group is generally sufficient.

      For the analyses testing the effect of the brain network metrics on the cognitive variables, using hierarchical regression to evaluate whether the additional variables in the full model significantly improve model fit relative to the reduced, covariates-only model (data collection site, cortical thickness) is a possible approach, but it might be problematic, particularly when the full model uses many more regressors than the reduced model. In general, adding variables to a regression model never increases the residual sum of squares. As such, comparing a full model with many added regressors to a reduced model with far fewer regressors could yield a significant change in the R² fit index, even if the added regressors do not meaningfully modulate the dependent variable. This may not be an issue for the finding on the FPN segregation effect on overall cognition, but it may be important in interpreting the finding on the association system metrics on overall cognition.
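      The reviewer's point can be illustrated with a small simulation, a sketch under stated assumptions: the two covariates and the five extra regressors below are hypothetical simulated variables (the extra five are pure noise), and `r_squared` and `partial_f` are illustrative helpers. R² never drops when regressors are added, which is why the R² change is usually judged with a partial F statistic that divides the gain by the number of added regressors.

```python
import random


def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X (X must include an intercept column)."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
                 for row, yi in zip(X, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def partial_f(r2_full, r2_reduced, n, k_full, k_reduced):
    """F statistic for the R^2 change between nested models (k excludes the intercept)."""
    df1, df2 = k_full - k_reduced, n - k_full - 1
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)


random.seed(1)
n = 146
site = [random.choice([0, 1]) for _ in range(n)]       # hypothetical site indicator
thick = [random.gauss(0, 1) for _ in range(n)]          # hypothetical covariate
y = [0.5 * s + 0.3 * t + random.gauss(0, 1) for s, t in zip(site, thick)]
noise = [[random.gauss(0, 1) for _ in range(5)] for _ in range(n)]  # unrelated to y

X_red = [[1.0, s, t] for s, t in zip(site, thick)]
X_full = [xr + z for xr, z in zip(X_red, noise)]
r2_red, r2_full = r_squared(X_red, y), r_squared(X_full, y)
F = partial_f(r2_full, r2_red, n, k_full=7, k_reduced=2)
print(f"R2 reduced={r2_red:.3f}, full={r2_full:.3f}, partial F={F:.2f}")
```

      Under the null hypothesis this statistic follows an F(5, 138) distribution here, so a p-value could be obtained with, for example, `scipy.stats.f.sf(F, 5, 138)`.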

      Critically, the correlation effect sizes (justified by the 0.23 value from the reported power analyses) were all small. The largest key brain-behavior correlation was 0.273 (between DMN segregation and processing speed). In the broader perspective, such effect sizes generally suggest that the contribution of this factor is minimal, and the results should be interpreted in this context.
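      The 0.23 figure is consistent with a standard calculation, sketched here under assumptions that the text does not state (a two-sided test at α = 0.05, 80% power, and the Fisher z approximation). Under those assumptions, n = 146 makes the smallest reliably detectable correlation about 0.23; the function names are illustrative.

```python
from math import atanh, ceil, sqrt, tanh
from statistics import NormalDist


def _z_total(alpha, power):
    """Sum of the two-sided critical z and the z for the desired power."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest correlation detectable with n participants (Fisher z approximation)."""
    return tanh(_z_total(alpha, power) / sqrt(n - 3))


def n_required(r, alpha=0.05, power=0.80):
    """Sample size needed to detect correlation r under the same approximation."""
    return ceil((_z_total(alpha, power) / atanh(r)) ** 2 + 3)


print(round(min_detectable_r(146), 3))  # about 0.23
print(n_required(0.23))                 # about 147
```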

      The recent, highly publicized paper from Marek and colleagues (citation below) offers some support for the assertion that these effect sizes are on the order that would be expected for ‘true’ signals in the brain. While the study reported here is not a “BWAS” as described in the Marek article (BWAS is a brain-wide association study, examining, without a priori hypotheses of brain network, all possible associations), and therefore our study does not fall prey to some of the multiple comparisons issues described in that paper, the general expected effect sizes based on that paper should be relevant here.

      Marek and colleagues suggest that 1) effect sizes in the range of 0.273 are on the order of the larger brain-behavior relationships that can be expected to be replicable, and 2) samples that remove some drivers of individual variability improve a study's ability to identify an effect. Relevant to the latter point, by reducing the variability in our sample due to age (our age range is tight) and early signs of neurological disease (these were screened out in our sample), we are left with a sample that is homogeneous along these variables, meaning that brain variability associated with cognitive performance can be more easily pulled from the data.

      Our data have large variability on the behavior end, and large variability on the brain end, allowing better power for seeing effects between them.

      Marek, S., Tervo-Clemmens, B., Calabro, F.J., Montez, D.F., Kay, B.P., Hatoum, A.S., Donohue, M.R., Foran, W., Miller, R.L., Hendrickson, T.J., et al. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature 603, 654–660. 10.1038/s41586-022-04492-9.

      Overall, the hierarchical regressions evaluating how well the network segregation indices account for cognition, together with the small correlation magnitudes, are broadly in line with the notion that more segregated neural networks in the oldest-old support better cognitive performance (particularly processing speed). However, the support these findings provide for this notion is moderate and requires further study.

      The addition of a control analysis (sensorimotor network) in the newer version of the paper showed that these effects are not present in brain networks not thought to relate to cognition. We agree that further study of these questions is necessary for stronger claims to be made, but the current study advances the field by showing clearly that segregation of the association network and its components relates to behavior even in this oldest old cohort.

      Reviewer #2 (Public Review):

      The authors capitalised on the opportunity to obtain functional brain imaging data and cognitive performance measures from a group of oldest-old adults with normative cognitive ability and no severe neurophysiological disorders, arguing that these individuals would be most qualified as having accomplished 'healthy ageing'. Combined with the derivation of a cohort-specific brain parcellation atlas, the authors demonstrated the importance of maintaining brain network segregation for normative cognitive ability, especially processing speed, even at such a late stage of life. In particular, segregation of the frontoparietal network (FPN) was found to be the key network property.

      These results bolstered the findings from studies using younger-old participants and are in agreement with the current understanding of the connectome-cognition relationship. The inclusion of a modest sample size, a power analysis, a cohort-specific atlas, and a fairly comprehensive neuropsychological assessment battery provides optimism that the observed importance of FPN segregation would be a robust and generalisable finding, at least in future cross-sectional studies. The fact that FPN segregation is relatively more important to cognition than other associative networks also provides novel insight into the possible 'hierarchy' between age-related neural and cognitive changes, regardless of what mechanisms lead to such segregation at such an advanced age. It is also interesting that processing speed remains the 'hallmark' metric of age-related cognitive changes, indirectly speaking to its long-assumed fundamental impact on overall cognition.

      As laid out by the authors, if network differentiation is key to normative cognitive ability in old age, intervention and stimulation programs that could maintain or boost network segregation would have high translational value. With the advent of mobile, self-administrable devices that target behavioural and neural modifications, this potential would have increasing appeal.

      However, I feel that a few things have prevented the manuscript from being a simple yet impactful submission:

      1) Interpretation and the major theme of discussion. While the authors attempted to discuss their findings with respect to both the compensation and network dedifferentiation hypotheses, the results and their interpretation do not readily provide any resolution or reconciliation between the two, a common challenge in ageing research. The authors did not further elaborate on how their special cohort may provide further insight into this.

      While the results are certainly in line with the dedifferentiation hypothesis, why 'this finding does not exclude the compensation hypothesis' (Discussion) was not sufficiently elaborated. In particular, the authors seemed to suggest that maintained network specialisation may play such a role, but the results and interpretations regarding network specialisation were not a particular focus of the manuscript. In addition, both upregulation within a network and cross-network recruitment can be potential compensatory strategies (Cabeza et al. 2018, Nat Rev Neurosci). Without longitudinal data or other designs (e.g., task-based), it is quite difficult to evaluate the involvement of compensation. For instance, as rightly suggested by the authors, the two phenomena may not be mutually exclusive (e.g., maintenance of FPN differentiation at such an old age could be a result of 'compensation' that started when the participants were younger).

      The reviewer makes some excellent points that we have taken to heart in this revision. We agree that the data as described do not directly address the compensation hypothesis, and we have therefore de-emphasized our descriptions of that hypothesis in service of a simpler, more impactful manuscript.

      As described above in our response to the essential revisions: “In the original submission, we noted relevant literature that describes both the dedifferentiation hypothesis and the compensation hypothesis of aging. Our original aim was to include more of a literature review of cognitive aging theories in the introduction and discussion, but that choice made it too confusing (and honestly left out much important literature). In responding to the reviews, we realized that bypassing this cursory literature review here is preferable for the readability of the manuscript. Instead, we cite a literature review and focus on the dedifferentiation hypothesis.

      “The data we show here address the dedifferentiation hypothesis specifically, since we are using the segregation metric, a reflection of dedifferentiation of network organization. The reviewers’ comments caused us to do a great deal of thinking on this topic, and we have a forthcoming review with our colleague Ian McDonough that covers this topic in more detail (McDonough, Nolin, and Visscher, 2022). We have substantially rewritten the relevant sections in the discussion (especially section 3.2) to be more clear for readers.”

      As also described in our response to essential revisions 2c, we have added to the discussion regarding the utility of studying the oldest-old; this is in the second paragraph of the discussion, and reproduced above. Additionally, in the Introduction, we also briefly address the importance of this cohort. We state “Prior work has mostly been done in younger-old samples (largely 65-85 years old). Studying the younger-old can be confounded by including pre-symptomatic disease, since it is unknown which individuals may be experiencing undetectable, pre-clinical cognitive disorders and which will continue to be cognitively healthy for another decade. The cognitively unimpaired oldest-old have lived into late ages, and we can be more confident in determining their status as successful agers. A further benefit of studying these successful cognitive agers is that because of their advanced age and the normal aging and plasticity processes associated with it, there is greater variance in both their performance on neurocognitive tasks, and in brain connectivity measures than there is in younger cohorts (Christensen et al., 1994). This increased variance makes it easier to observe across-subject relationships of cognition and brain networks (Gratton, Nelson, & Gordon, 2022). We provide new insight into the relationship between the segregation of networks and cognition by investigating this relationship in an oldest old cohort of healthy individuals.”

      2) Some further clarity about the data and statistical analyses would be desirable. First, since scan length determines the stability of functional connectivity, how long was the resting-state scan? Second, what is the purpose of using both hierarchical regression and partial correlation? While they do consider different variances in the dataset, they are quite similar, and the decision looks quite redundant to me, as not much further insight has been gained. [The main insight from including a regression is the ability to compare the different networks to each other.]

      The resting-state fMRI scan is 8 minutes in length. This has been added to the text. After considering the redundancy the reviewer notes between hierarchical regression and correlations, we have simplified our statistical approach and only included correlations in the main body of the manuscript. We have put the regressions in the supplemental materials so if interested readers would like to be able to see those results, they are still available.

    1. Author Response

      Reviewer #2 (Public Review):

      Zhukin et al. present the structure of the central scaffold component of the NuA4 complex. They hypothesise how the nucleosome-interacting modules not present in the structure could be arranged, based on AlphaFold modelling and comparison of their structure to other complexes that use the same subunits. They show some interesting, albeit fairly preliminary, biochemistry on the binding of the flexible modules, suggesting that acetylation affects H3K4me3 reading.

      While the work builds upon previous structural studies on the Tra1 subunit in isolation and a previous 4.7 Å resolution structure from another group, there are clear differences and novel findings in this study. The data are presented beautifully, and nicely annotated figures make following the many subunits and their interactions simple. What could have been a very complex manuscript is easy to digest. Some of the figures could do with a couple of additional labels and more detailed figure legends to make things a little clearer.

      Overall, this is a nice study and a wonderfully detailed structure of a large multi-subunit assembly, but we would recommend some further experimental validation to bolster the findings.

      Major comments

      1) All 13 subunits of NuA4 are detected by mass spec; however, based on the SDS-PAGE gel (Figure 1-1), components of the TINTIN sub-complex appear less than stoichiometric, with Eaf7 and Eaf3 clearly staining much more weakly. This is particularly important with reference to Figure 3 and the discussion in the text, which assumes that the nucleosome-interacting modules are all equally present but too flexible to be observed in the structure.

      Simple peptide numbers from mass spec cannot be used as a measure of protein abundance as this is sensitive to multiple confounding factors.

      We did not identify the locations of individual modules (HAT, TINTIN and Yaf9) within the diffuse density; we merely indicate that this is a likely location for their presence, based on the location of connection points and the presence of crosslinks in previously published data. We did perform mass photometry analysis of the purified NuA4 sample to better determine the composition of the purified complex (Figure 1-1). We find that the major species peak is centered at 1037 kDa, which is very close to the theoretical mass of 1043 kDa. There are a few other minor peaks, but none of these would indicate a NuA4 complex lacking TINTIN (Eaf3,5,7) or any other distinct subcomplex.

      2) A major novel biological finding and conclusion from the abstract concerns the binding to modified nucleosomes. However, this seemed somewhat preliminary, especially considering the discussion around the role of acetylation affecting binding to H3K4me3 nucleosomes based solely on the dCypher screen used.

      The discussion of the HAT module's preferential binding to acetylated and methylated tails concludes that acetylation liberates the H3 tail from DNA interaction, making H3K4me3 more available for binding by the PHD domain. This is an interesting hypothesis, but it is stated as fact with very little evidence to support it.

      Whilst others have seen similar results (cited in the paper), no data are presented to rule out the alternative hypothesis that there is some additional acetyl-binding activity in the complex. Indeed, in one of the references they cite, the authors do show direct reading of acetylation as well as methylation.

      TINTIN binding shows high background and only a fairly minor effect. The biological relevance of these observations, while intriguing, needs to be established further.

      We have changed the language of this section to better leave open other possibilities. As for the TINTIN dCypher results, we do not try to draw too many conclusions, but the data indicate that there is very little (if any) interaction with the histone tails (at least for the modifications present in the panel). One thing we can say is that the TINTIN module does not seem to have any binding preference for H3K36me3 nucleosomes.

      3) There is a large focus on the cross-linking mass spec study from another group and the previously published structure of the NuA4 complex. The authors are fairly aggressive in suggesting the other structure from Wang et al., is incorrect. It is very nice that their built structure shows a better interpretation of previous XL-MS data, but still many of the crosslinks are outside of the modelled density. One possibility that should be entertained is that the two studies are comparing different structures/states of NuA4. The authors of the Wang et al., paper indeed comment that Swc4 and Yaf9 are missing from their purified complex. It is of course possible that both structures are correct as they appear to be biochemically different, with the crosslinking in the Setiaputra paper better reflecting the complex presented here.

      Response given above.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript seeks to identify the mechanism underlying priority effects in a plant-microbe-pollinator model system and to explore its evolutionary and functional consequences. The manuscript first documents alternative community states in the wild: flowers tend to be strongly dominated by either bacteria or yeast but not both. Then lab experiments are used to show that bacteria lower the nectar pH, which inhibits yeast, thereby identifying a mechanism for the observed priority effect. The authors then perform an experimental evolution experiment which shows that yeast can evolve tolerance to a lower pH. Finally, the authors show that low-pH nectar reduces pollinator consumption, suggesting a functional impact on the plant-pollinator system. Together, these multiple lines of evidence build a strong case that pH has far-reaching effects on the microbial community and beyond.

      The paper is notable for the diverse approaches taken, including field observations, lab microbial competition and evolution experiments, genome resequencing of evolved strains, and field experiments with artificial flowers and nectar. This breadth can sometimes seem a bit overwhelming. The model system has been well developed by this group and is simple enough to dissect but also relevant and realistic. Whether the mechanism and interactions observed in this system can be extrapolated to other systems remains to be seen. The experimental design is generally sound. In terms of methods, the abundance of bacteria and yeast is measured using colony counts, and given that most microbes are uncultivable, it is important to show that these colony counts reflect true cell abundance in the nectar.

      We have revised the text to address the relationship between cell counts and colony counts with nectar microbes. Specifically, we point out that our previous work (Peay et al. 2012) established a close correlation between CFUs and cell densities (r² = 0.76) for six species of nectar yeasts isolated from D. aurantiacus nectar at Jasper Ridge, including M. reukaufii.

      As for A. nectaris, we used a flow cytometric sorting technique to examine the relationship between cell density and CFU (figure supplement 1). This result should be viewed as preliminary given the low level of replication, but this relationship also appears to be linear, indicating that colony counts likely reflect true cell abundance of this species in nectar.

      It remains uncertain how closely CFU reflects total cell abundance of the entire bacterial and fungal community in nectar. However, a close association is possible and may even be likely given the data above, which show a close correlation between CFU and total cell count for several yeast species and A. nectaris, which our data indicate are dominant species in nectar.

      We have added the above points in the manuscript (lines 263-264, 938-932).

      The genome resequencing to identify pH-driven mutations is, in my mind, the least connected and developed part of the manuscript, and could be removed to sharpen and shorten the manuscript.

      We appreciate this perspective. However, given the disagreement between this perspective and reviewer 2’s, which asks for a more expanded section, we have decided to add a few additional lines (lines 628-637), briefly expanding on the genomic differences between strains evolved in bacteria-conditioned nectar and those evolved in low-pH nectar.

      Overall, I think the authors achieve their aims of identifying a mechanism (pH) for the priority effect of early-colonizing bacteria on later-arriving yeast. The evolution and pollinator experiments show that pH has the potential for broader effects too. It is surprising that the authors do not discuss the inverse priority effect of early-arriving yeast on later-arriving bacteria, beyond a supplemental figure. Understandably this part of the story may warrant a separate manuscript.

      We would like to point out that, in our original manuscript, we did discuss the inverse priority effects, referring to relevant findings that we previously reported (Tucker and Fukami 2014, Dhami et al. 2016 and 2018, Vannette and Fukami 2018). Specifically, we wrote that: “when yeast arrive first to nectar, they deplete nutrients such as amino acids and limit subsequent bacterial growth, thereby avoiding pH-driven suppression that would happen if bacteria were initially more abundant (Tucker and Fukami 2014; Vannette and Fukami 2018)” (lines 385-388). However, we now realize that this brief mention of the inverse priority effects was not sufficiently linked to our motivation for focusing mainly on the priority effects of bacteria on yeast in the present paper. Accordingly, we added the following sentences: “Since our previous papers sought to elucidate priority effects of early-arriving yeast, here we focus primarily on the other side of the priority effects, where initial dominance of bacteria inhibits yeast growth.” (lines 398-401).

      I anticipate this paper will have a significant impact because it is a nice model for how one might identify and validate a mechanism for community-level interactions. I suspect it will be cited as a rare example of the mechanistic basis of priority effects, even across many systems (not just pollinator-microbe systems). It illustrates nicely a more general ecological phenomenon and is presented in a way that is accessible to a broader audience.

      Thank you for this positive assessment.

      Reviewer #2 (Public Review):

      The manuscript "pH as an eco-evolutionary driver of priority effects" by Chappell et al. illustrates how a single driver, microbially induced pH change, can affect multiple levels of species interactions, including microbial community structure, microbial evolutionary change, and hummingbird nectar consumption (potentially influencing both microbial dispersal and plant reproduction). It is an elegant study with different interacting parts, from laboratory to field experiments, addressing mechanism, condition, evolution, and functional consequences. It will likely be of interest to a wide audience and has implications for microbial, plant, and animal ecology and evolution.

      This is a well-written manuscript, with generally clear and informative figures. It represents a large body and variety of work that is novel and relevant (all major strengths).

      We appreciate this positive assessment.

      Overall, the authors' claims and conclusions are justified by the data. There are a few things that could be addressed in more detail in the manuscript. The most important weakness in terms of lack of information/discussion is that it looks like there are just as many or more genomic differences between the bacterial-conditioned evolved strains and the low-pH evolved strains than there are between these and the normal nectar media evolved strains. I don't think this negates the main conclusion that pH is the primary driver of priority effects in this system, but it does open the question of what you are missing when you focus only on pH. I would like to see a discussion of the differences between bacteria-conditioned vs. low-pH evolved strains.

      We agree with the reviewer and have included an expanded discussion in the revised manuscript [lines 628-637]. Specifically, to show overall genomic variation between treatments, we calculated genome-wide Fst comparing the various nectar conditions. We found that Fst was 0.0013, 0.0014, and 0.0015 for the low-pH vs. normal, low pH vs. bacteria-conditioned, and bacteria-conditioned vs. normal comparisons, respectively. The similarity between all treatments suggests that the differences between bacteria-conditioned and low pH are comparable to each treatment compared to normal. This result highlights that, although our phenotypic data suggest alterations to pH as the most important factor for this priority effect, it still may be one of many affecting the coevolutionary dynamics of wild yeast in the microbial communities they are part of. In the full community context in which these microbes grow in the field, multi-species interactions, environmental microclimates, etc. likely also play a role in rapid adaptation of these microbes which was not investigated in the current study.
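      For readers unfamiliar with genome-wide Fst, the sketch below shows one common estimator: Hudson's Fst, computed as a ratio of averages across sites. The text does not say which estimator was used, so this choice is an assumption; the sketch also assumes per-site allele frequencies and sample sizes are already available, whereas pooled-sequencing data would call for a pool-aware estimator. The function name is illustrative.

```python
def hudson_fst(p1, p2, n1, n2):
    """Genome-wide Hudson Fst as a ratio of averages over sites.

    p1, p2: per-site allele frequencies in populations (treatments) 1 and 2.
    n1, n2: number of sampled chromosomes per population.
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-sample variance.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele from each population differs.
        den += a * (1 - b) + b * (1 - a)
    return num / den


# Fixed differences at every site give Fst = 1.
print(hudson_fst([1.0, 0.0], [0.0, 1.0], 10, 10))
```

      Values near 0, like the 0.0013 to 0.0015 reported above, mean nearly all variation is shared between treatments; slightly negative values can occur because of the sample-size correction.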

      Based on this overall picture, we have included additional discussion focusing on the effect of pH on evolution of stronger resistance to priority effects. We compared genomic differences between bacteria-conditioned and low-pH evolved strains, drawing the reader’s attention to specific differences in source data 14-15. Loci that varied between the low pH and bacteria-conditioned treatments occurred in genes associated with protein folding, amino acid biosynthesis, and metabolism.

      Reviewer #3 (Public Review):

      This work seeks to identify a common factor governing priority effects, including mechanism, condition, evolution, and functional consequences. It is suggested that environmental pH is the main factor that explains various aspects of priority effects across levels of biological organization. Building upon this well-studied nectar microbiome system, it is suggested that pH-mediated priority effects give rise to bacterial and yeast dominance as alternative community states. Furthermore, pH determines both the strengths and limits of priority effects through rapid evolution, with functional consequences for the host plant's reproduction. These data contribute to ongoing discussions of deterministic and stochastic drivers of community assembly processes.

      Strengths:

      Provides multiple lines of field and laboratory evidence to show that pH is the main factor shaping priority effects in the nectar microbiome. Field surveys characterize the distribution of microbial communities with flowers frequently dominated by either bacteria or yeast, suggesting that inhibitory priority effects explain these patterns. Microcosm experiments showed that A. nectaris (bacteria) showed negative inhibitory priority effects against M. reukaufii (yeast). Furthermore, high densities of bacteria were correlated with lower pH, potentially due to bacteria-induced reduction in nectar pH. Experimental evolution showed that yeast evolved in low-pH and bacteria-conditioned treatments were less affected by priority effects than ancestral yeast populations. This potentially explains the variation of bacteria-dominated flowers observed in the field, as yeast rapidly evolves resistance to bacterial priority effects. Genome sequencing further reveals that phenotypic changes in low-pH and bacteria-conditioned nectar treatments corresponded to genomic variation. Lastly, a field experiment showed that low nectar pH reduced flower visitation by hummingbirds. pH not only affected microbial priority effects but also has functional consequences for host plants.

      We appreciate this positive assessment.

      Weaknesses:

      The conclusions of this paper are generally well-supported by the data, but some aspects of the experiments and analysis need to be clarified and expanded.

      The authors imply that in their field surveys flowers were frequently dominated by bacteria or yeast, but rarely together. The authors argue that the distributional patterns of bacteria and yeast are therefore indicative of alternative states. In each of the 12 sites, 96 flowers were sampled for nectar microbes. However, it's unclear to what degree the spatial proximity of flowers within each of the sampled sites biased the observed distribution patterns. Furthermore, seasonal patterns may also influence microbial distribution patterns, especially in the case of co-dominated flowers. Temperature and moisture might influence the dominance patterns of bacteria and yeast.

      We agree that these factors could potentially explain the presented results. Accordingly, we conducted spatial and seasonal analyses of the data, which we detail below and include in two new paragraphs in the manuscript [lines 290-309].

      First, to determine whether spatial proximity influenced yeast and bacterial CFUs, we regressed the difference in bacterial or fungal abundance between paired plants against the geographic distance between them, for all possible pairs of plants. If plant location affected microbial abundance, one should see a positive relationship between distance and the difference in microbial abundance: plants located farther apart should, on average, differ more in microbial abundance. Contrary to this expectation, we found no significant relationship between distance and the difference in bacterial colonization (A, p=0.07, R²=0.0003) and a small negative association between distance and the difference in fungal colonization (B, p<0.05, R²=0.004). Thus, there was no obvious overall spatial pattern in whether flowers were dominated by yeast or bacteria.

      Next, to determine whether climatic factors or seasonality affected the colonization of bacteria and yeast per plant, we used a linear mixed model predicting the average bacteria and yeast density per plant from average annual temperature, temperature seasonality, and annual precipitation at each site, the date the site was sampled, and the site location and plant as nested random effects. We found that none of these variables were significantly associated with the density of bacteria and yeast in each plant.

      To look at seasonality, we also re-ordered Fig 2C, which shows the abundance of bacteria- and yeast-dominated flowers at each site, so that the sites are now listed in order of sampling date. In this re-ordered figure, there is no obvious trend in the number of yeast-dominated flowers over the sampling period (6/23 to 7/9), giving additional indication that seasonality was unlikely to affect the results.

      Additionally, when plotted, sampling date did not appear to strongly predict bacterial or fungal density within individual flowers.

      These additional analyses, now included (figure supplements 2-4) and described (lines 290-309) in the manuscript, indicate that the observed microbial distribution patterns are unlikely to have been strongly influenced by spatial proximity, temperature, moisture, or seasonality, reinforcing the possibility that the distribution patterns instead indicate bacterial and yeast dominance as alternative stable states.

      The authors exposed yeast to nectar treatments varying in pH. Using experimental evolution approaches, the authors determined that yeast grown in low-pH nectar treatments were more resistant to priority effects by bacteria. The metric used to determine the strength of the bacterial priority effect on yeast does not seem to take into account factors that limit growth, such as the environmental carrying capacity. In addition, yeast evolved in normal (pH 6) and low-pH (pH 3) nectar treatments, but it is unclear how resistance differs across a range of pH levels (from low to high) and how this affects the cost of yeast resistance to bacterial priority effects. The cost of resistance may influence yeast life-history traits.

      The strength of bacterial priority effects on yeast was calculated using the metric we previously published in Vannette and Fukami (2014): PE = log(BY/(-Y)) - log(YB/(Y-)), where BY and YB represent the final yeast density when early arrival (day 0 of the experiment) was by bacteria or yeast, followed by late arrival by yeast or bacteria (day 2), respectively, and -Y and Y- represent the final density of yeast in monoculture when they were introduced late or early, respectively. This metric does not incorporate carrying capacity. However, it does compare how each microbial species grows alone, relative to growth before or after a competitor. In this way, our metric compares environmental differences between treatments while also taking into account growth differences between strains.
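      The metric can be written as a short function. This is a minimal sketch. It assumes natural logarithms, because the base is not stated above. Changing the base rescales PE but does not change its sign or its zero point. The function and argument names are hypothetical.

```python
import math


def priority_effect_strength(by, yb, y_late_alone, y_early_alone):
    """PE = log(BY / (-Y)) - log(YB / (Y-)), using the natural log (assumed).

    by            final yeast density; bacteria arrived first (day 0), yeast second (day 2)
    yb            final yeast density; yeast arrived first, bacteria second
    y_late_alone  final yeast density in monoculture, yeast introduced late  (-Y)
    y_early_alone final yeast density in monoculture, yeast introduced early (Y-)
    """
    if min(by, yb, y_late_alone, y_early_alone) <= 0:
        raise ValueError("densities must be positive for the log ratios")
    # Each ratio compares yeast grown with the competitor to yeast grown alone
    # with the same arrival timing; this is how the metric accounts for
    # growth differences between strains without using carrying capacity.
    return math.log(by / y_late_alone) - math.log(yb / y_early_alone)
```

For example, take made-up densities BY = 1e3, -Y = 1e5, YB = 5e4 and Y- = 1e5. Then PE = ln(0.01) - ln(0.5) = ln(0.02) ≈ -3.91. PE is zero whenever the competition-to-monoculture ratio is the same regardless of which species arrives first.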

      Here we also present additional growth data to address the reviewer’s point about carrying capacity. Our experiments that compared ancestral and evolved yeast were conducted over the course of two days of growth. In preliminary monoculture growth experiments of each evolved strain, we found that yeast populations did reach carrying capacity over the course of the two-day experiment and population size declined or stayed constant after three and four days of growth.

      However, we found no significant difference in monoculture growth between the ancestral strains and any of the evolved strains, as shown in Figure supplement 12B. This lack of significant difference in monoculture suggests that differences in intrinsic growth rate do not fully explain the priority effects results we present. Instead, differences in growth were specific to yeast’s response to early arrival by bacteria.

      We also appreciate the reviewer’s comment about how yeast evolves resistance across a range of pH levels, as well as the effect of pH on yeast life-history traits. In fact, reviewer #2 pointed out an interesting trade-off in life history traits between growth and resistance to priority effects that we now include in the discussion (lines 535-551) as well as a figure in the manuscript (Figure 8).

    1. Author Response

      Reviewer #1 (Public Review):

      This work makes an important contribution to the study of the cell cycle and the attempt to infer mechanism by studying correlations in division timing between single cells.

      Given the importance of circadian rhythms to the ultimate conclusions of the study, I think it would be helpful to clarify the connection between possible oscillatory regulatory mechanisms and the formalism developed in e.g. Equation 3. The treatment appears to be a leading order expansion in stochastic fluctuations of the cell cycle regulators about the mean, but if an oscillatory process is involved, the fluctuations will be correlated in time and need not be small.

      We thank the reviewer for the positive assessment of our work. We have introduced Section S7 in the Supplementary Information to address the connection between our theory and two existing models of circadian modulation of cell division. In the first model, the circadian clock drives the interdivision time, while in the second model, the clock drives cell size control. We find that, while both models satisfy the cousin inequality for comparable parameters, they differ in their interdivision time correlation patterns. The first model yields an alternator-oscillator mixed pattern, while the second gives an aperiodic-oscillator pattern.

      The reviewer is right that our theory presents a leading-order expansion of cell cycle factor fluctuations. To overcome this limitation, we introduced the new Section S2 in the Supplementary Information, which shows how the correlation patterns are altered for moderately strong fluctuations. Interestingly, nonlinearity can be treated within our framework by introducing complexes of cell cycle factors. However, our model selection predicted that two cell cycle factors were enough to fit the present data without the need for complexes.

      Reviewer #2 (Public Review):

      This paper is of broad interest to scientists in the fields of cell growth, cell division, and cell-cycle control. Its main contribution is to provide a method to restrict the space of potential cell-cycle models using observed correlations in inter-division times of cells across their lineage tree. This method is validated on several data sets of bacterial and mammalian cells and is used to determine what additional measurements are required to distinguish the set of competing models consistent with a given correlation pattern.

      The patterns of correlations in the division times of cells within their lineage tree contain information about the inheritable factors controlling cell cycles. In general, it is difficult to extract such information without a detailed model of cell cycle control. In this manuscript, the authors have provided a Bayesian inference framework to determine what classes of models are consistent with a given set of observations of division time correlations, and what additional observations are needed to distinguish between such models. This method is applied to data sets of division times for various types of bacterial and mammalian cells including cells known to exhibit circadian oscillations.

      The manuscript is well-written, the analyses are thorough, and the authors have provided beautiful visualizations of how alternative models can be consistent with a finite set of observed correlations, and where and how extra measurements can distinguish between such models. Known models of growth rate correlations, cell-size regulation, and cell cycle control are analyzed within this framework in the Supplemental Information. A major advantage of the proposed method is that it provides a non-invasive framework to study the mechanism of cell-cycle control.

      We thank the reviewer for the positive response to our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript the authors describe an approach for controlling cellular membrane potential using engineered gene circuits via ion channel expression. Specifically, the authors use microfluidics to track S. cerevisiae gene expression and plasma membrane potential (PMP) in single cells over time. They first establish a small engineered gene circuit capable of producing excitable gene expression dynamics through the combination of positive and negative feedback, tracking expression using GFP (Figure 1). Though not especially novel or complex, the data quality is high in Figure 1 and the results are convincing. Note that the circuit is excitable and not oscillatory; it is being driven periodically by a chemical inducer. I think the authors could have done a better job justifying the use of an excitable engineered gene circuit system, since you could get a similar result by just driving a promoter with the equivalent time course of inducer.

      We restructured the manuscript to present the open-loop version of our synthetic circuit. We show that the closed-loop system integrating feedback performs significantly better than its open-loop version (revised Figures 1 and 3). This open-loop system is based on Mar proteins, which can synchronize gene expression on extended spatiotemporal scales (Perez-Garcia et al., Nat Comm, 2021). Other driven systems (e.g., TetR, AraC, LacI) can temporally synchronize gene expression in single bacterial cells to successive cycles of inducer. However, over time these bacterial systems build up substantial phase delays between cells, partly due to noise. These delays ultimately lead to desynchrony between individual cells, even though the cells tend to follow the common inducer. This is clearly not the case in Mar-based systems (Perez-Garcia et al., Nat Comm, 2021), where eukaryotic cells synchronize to each other under the guidance of common environmental stimuli with negligible phase drift. Furthermore, in the revised version we show that the dual-feedback strategy provides a robust way to control ion channel expression and the associated changes in PMP (see Conclusions, lines 231-237).

      The authors then use a similar approach to produce excitable expression of the bacterial ion channel KcsA, tracking membrane voltage using the voltage-sensitive dye ThT rather than GFP fluorescence (Figure 2). The experimental results in this figure are more novel as the authors are now using the expression of a heterologous ion channel to dynamically control plasma membrane potential. While fairly convincing, I think there are a few experimental controls that would make these results even more convincing. It is also unclear why the authors are now using power spectra to display observed frequencies compared to the much more intuitive histograms used in Figure 1.

      Now we use violin plots with period distributions consistently in all figures to ease the comparison between scenarios.

      Finally, the authors move on to use a similar excitable engineered gene circuit approach to produce inducible control of the K1 toxin which influences the native potassium channel TOK1 rather than the heterologous ion channel KcsA (Figure 3). I have a similar reaction to this figure as with Figure 2: the results are novel and interesting but would benefit from more experimental controls. Additionally, the image data shown in Figure 3b is very unclear and could be expanded and improved.

      In the revised version we have removed the K1 toxin data, because we cannot modulate the K1 degradation rate due to its extracellular nature. Instead, we performed additional experiments in which we directly coupled our circuit to the native potassium channel TOK1. These experiments demonstrate that our feedback-integrating synthetic circuit can control TOK1 dosage and the associated PMP changes (revised Figure 3, and lines 209-220). We believe these new data make a more direct connection between phytohormone-sensing synthetic circuits and native channel expression than the previously presented K1-based scenario.

      Overall, in my opinion the claims in the abstract and title are a bit strong. I would deemphasize global coordination and "synchronous electrical signaling" since the authors are driving a global inducer. To make the claim of synchronous signaling I would want to see spatial data for cells near vs. far from K1 toxin producing cells in Figure 3 along with estimates of inducer/flow timescale vs. expression/diffusion of K1 toxin. As I read the manuscript, I see that most of the synchronicity comes from the fact that all cells are experiencing a global inducer concentration.

      We agree with the Reviewer: synchronicity and global coordination come from the phytohormone-sensing feedback circuit, which is guided by cyclic environmental changes. We have revised our definition of synchronous signaling as suggested. The revised text focuses on the macroscopic synchronization of ion channel expression achieved by external modulation, which is the key message of this work.

      Reviewer #2 (Public Review):

      The authors present a novel method to induce electrical signaling through an artificial chemical circuit in yeast. This unconventional approach could enable extremely interesting future experiments. I appreciate that the authors created a computer model that mathematically predicts how the interaction between their two chemical stimulants and their two chosen receptors, IacR/MarR, could produce such effects. Their experimental validations clearly demonstrated control over phase that is directly related to the chemical stimulation. In addition, the three scenarios in which they tested their circuit showed clear promise, as the phase difference between spatially distant yeast communities was ~10%. Interestingly, indirect TOK1 expression through K1 toxin gives a nice example of inter-strain coupling, although the synchronization was weaker than in the other cases. Overall, the method is sound as a way to chemically stimulate yeast cultures to produce synchronous electrical activity. However, it is important to point out that this synchronicity is not produced by colony-colony communication (i.e., self-organized), but by a global chemical drive of the constructed gene-expression circuit.

      The greatest limitation of the study lies in the presentation (not the science). There are two significant examples of this. First, the authors state that this study 'provides a robust synthetic transcriptional toolbox' towards chemo-electrical coupling. In order to be a toolbox, more effort needs to be put into helping others use this approach. However, little detail is given about methodological choices, about circuit mechanisms in relation to the rest of the cell, or about how this method would be used outside of the demonstrated use case. Second, the authors stress that this method is 'non-invasive'. I fail to see how the presented methodology could be considered non-invasive in in vivo applications, since gene circuits are edited and a reliable way to chemically stimulate a large population of cells would be needed. I may have misunderstood their claim, because the methods and data were not presented in a way that allowed easy comprehension. Either way, this needs to be addressed specifically and described.

      We apologize to the Reviewer for the potential misunderstanding. By ‘non-invasive’ we meant that such systems would not need, for instance, the surgical installation of light components to control ion channel activity. Nonetheless, we have removed these confusing sentences from the revised manuscript.

      The rationale for using the Mar-based system with a feedback strategy is now presented in a more structured and comprehensive way across the revised manuscript. The revision demonstrates the benefits of integrating feedback and the potential of such systems for excitable dynamics with noise-filtering capability and faster responsiveness. We also show how the system can be coupled to native potassium channels, opening ways to integrate synthetic circuits into other organisms.

      In terms of classifying the synchronicity, phase difference among communities was the key indicator of synchronization. However, few data explored other aspects of the coupled waveforms, and potential drawbacks were not discussed. For example, phase may be aligned while other properties, such as amplitude and typical wave-shape measures, differ. As this is presented as a method meant for adoption in other labs, a more rigorous analytical approach was expected.

      In the revised manuscript, we have analyzed synchronicity using several different approaches:

      (1) We calculated cumulative autocorrelations of the responses between communities.

      (2) To complement the autocorrelation analysis, we developed a quantitative ‘synchrony index’, defined as 1 - R. Here R is the ratio of the differences in subsequent ThT peak positions among cell communities (phase) to the expected period. This metric describes how well the fungal colonies are synchronized with each other under the guidance of the common environmental signal.

      (3) We analyzed amplitudes and peak widths for all presented scenarios. We conclude that periods and peak widths are robust across communities, but there is noticeable variation in amplitudes (see Figure 3E).

      We therefore believe that this multistep quantitative approach is rigorous in identifying oscillatory signal characteristics.
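      Point (2) does not specify exactly how the spread of peak positions is summarized. The sketch below adopts one plausible reading as an assumption. For each inducer cycle, the spread is the range (maximum minus minimum) of the matched ThT peak times across communities. R is then the mean spread divided by the expected period. The function name and the peak-matching assumption are hypothetical.

```python
def synchrony_index(peak_times, expected_period):
    """Hypothetical reading of the synchrony index SI = 1 - R.

    peak_times[c][k]: time of the k-th ThT peak in community c; peaks are
        assumed to be already matched to inducer cycles across communities.
    expected_period: period of the external inducer (same time units).
    """
    n_cycles = len(peak_times[0])
    if any(len(p) != n_cycles for p in peak_times):
        raise ValueError("each community needs the same number of matched peaks")
    spreads = []
    for k in range(n_cycles):
        times = [p[k] for p in peak_times]
        spreads.append(max(times) - min(times))  # phase spread in cycle k
    r = sum(spreads) / n_cycles / expected_period
    return 1.0 - r
```

Under this reading, three communities whose peaks are offset by 5 and 10 minutes in every cycle of a 100-minute period have SI = 1 - 10/100 = 0.9. Perfectly aligned peaks give SI = 1.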

      Reviewer #3 (Public Review):

      We are enthusiastic about this paper. It demonstrates controlled expression of ion channels, which itself is impressive. Going a step further, the authors show that through their control over ion channel expression, they can dynamically manipulate membrane potential in yeast. This chemical to electrophysiological conversion opens up new opportunities for synthetic biology, for example development of synthetic signaling systems or biological electrochemical interfaces. We believe that control of ion channel expression and hence membrane potential through external stimuli can be emphasized more strongly in the report. The experimental time-lapse data were also high quality. We have two major critiques on the paper, which we will discuss below.

      First, we do not believe the analyses used supports the authors' claims that chemical or electrical signals are propagating from cell-to-cell. The text makes this claim indirectly and directly. For example, in lines 139-141, the authors describe the observed membrane potential dynamics as "indicative of the effective communication of electrical messages within the populations". There are similar remarks in lines 144 and 154-156. The claim of electrical communication is further established by Figure 2 supplement 3, which is a spatial signal propagation model. As far as we can tell, this model describes a system different from the one implemented in the paper.

      Second, it is not clear why the excitable dynamics of the circuit are so important or if the circuit constructed does in fact exhibit excitable dynamics. Certainly, the mathematical model has excitable dynamics. However, not enough data demonstrates that the biological implementation is in an excitable regime. For example, where in the parameter space of Figure 1 supplement 1 does the biological circuit lie? If the circuit has excitable dynamics, then the authors should observe something like Figure 1 supplement 1B in response to a nonoscillating input. Do they observe that? Do they observe a refractory period? Even if the circuit as constructed is not excitable, we don't think that's a major problem because it is not central to what we believe is the most important part of this work - controlling ion channel expression and hence membrane potential with external chemical stimuli.

      We thank the Reviewer for the encouraging comments and positive evaluation of our work.

    1. Author Response

      Reviewer #1 (Public Review):

      Purkinje cells (PCs) in the cerebellum extend axonal collaterals along the PC layer and within the molecular layer. Previous anatomical studies have shown the existence of these tracts and recently, the existence of functional synapses from PCs to PCs, molecular layer interneurons (MLIs), and other cell types was demonstrated by Witter et al., (Neuron, 2016) using optogenetics. In this manuscript, Halverson et al., first characterize the PC to MLI synapse properties in the slice using optogenetics and dual patch recordings. They then use computer simulations to predict the role of these connections in eyelid conditioning and test these predictions using in vivo recordings in rabbits. Authors claim that PCs fire before their target MLIs and that their activity is anticorrelated. They further suggest that the special class of MLIs receiving inhibitory input from PCs might serve to synchronize PCs during eyelid conditioning.

      Major comments:

      1) The manuscript is quite long with 9 main figure panels and 6 supplementary figures. The flow of the results is not smooth. While the first 4 figures are nicely done in terms of their results and organization, the same cannot be said about the rest of the figures.

      To address this concern, we have revised the Results section extensively. We believe that it is now much more accessible and better integrated.

      In fact, it would make sense to split the manuscript in two, one describing the synaptic properties and circuit mapping of the PC-PC-MLI circuit and the other describing their role in eyelid conditioning. As it stands, this manuscript is a tough read and difficult to get through.

      We acknowledge that our results, which were done in two different labs and employed a variety of different techniques, could have been split into two (or even more) separate papers. However, we believe that there is high value to our readers in providing a comprehensive study that integrates many different types of analyses to attack the same fundamental question. That is why we chose to organize the content in the way that we did and that is why we still prefer to keep the entire story together. However, we do agree with the reviewer’s point that the previous version was unwieldy and too challenging to understand. Therefore, we have invested a lot of effort to improve the readability of the revised version.

      Further, the authors have not connected the initial slice physiology with the later in vivo work to argue for their presence in the same paper. For example, the quantal content measurement, the short-term plasticity, the mobilization rate measurement, etc do not figure in the latter half of the manuscript at all. I strongly suggest carving figures 1-4 out into a separate manuscript.

      The slice work motivated the computational simulation and the in vivo recordings of MLI activity. While it is true that it is hard to correlate every aspect of the slice work (e.g. quantal content measurements, etc.) with the in vivo recordings, and vice versa, there are elements of each that have informed the other. As a result, consistent properties of the PC-to-PC-MLI circuit emerged. We have highlighted the cross-connections more in the revised manuscript, including the following passages:

      1) “Only a subset of MLIs (8.7%) showed clear inverse correlation with eyelid PCs and they were within approximately 120 µm of eyelid PCs. These connectivity rates and distances are comparable to our observations in cerebellar slices, where we found that approximately 5-6% of MLIs receive PC feedback inhibition (Figure 1b) that extends over 200 µm or less (Figure 3).” (p.13, para 1)

      2) “The pattern of cross-correlation between connected PCs and PC-MLIs was qualitatively similar to that observed in slices…” (p.14, para 2).

      3) “This correspondence strengthens the conclusion that putative PC-MLI identified in vivo are equivalent to the PC-MLI identified in slices” (p. 15, para 1).

      4) “The need for relatively large changes in PC activity in vivo highlights the importance of the frequency-independent synaptic transmission at the PC-to-PC-MLI synapse illustrated in Figure 2.” (p. 16, para. 1)

      We have more closely harmonized the style of all figures, to subliminally emphasize the close connection between the slice and in vivo results.

      Above we have addressed the suggestion to split the paper into two. Instead of breaking up the paper, we worked hard to better integrate the two parts and make them easier to read as a whole.

      2) Authors conclude that eyelid PCs and eyelid PC-MLIs are inversely correlated and that PCs precede PC-MLIs during CRs and therefore could drive their activity. Both of these points are insufficiently justified by their analysis. First, it is not clear how eyelid PCs are identified – I’m assuming this is based on negative correlation with CRs just like positively correlated MLIs are assigned as eyelid PC-MLIs.

      We apologize for failing to mention that eyelid PCs were identified by the presence of US-evoked (eyelid stimulation) complex spikes. This criterion is completely independent of the responses of PCs during expression of eyelid CRs. It also provides an in vivo tool for identifying the “eyelid” region of the cerebellar cortex, which should also be where eyelid PC-MLIs are located. To address this omission, we now describe the method used to identify eyelid PCs in the Methods section (p. 31, para. 2) and the Results section (p. 11, para. 2).

      If this is how PCs and PC-MLIs are identified, then the inverse correlation between the two cell types results from this definition itself. And, their activity pattern during CRs, illustrated in many figure panels is hardly surprising.

      Yes, of course this would be circular logic, but it is not at all what we did! Again, we apologize for the confusion.

      Second, to show that PCs fire ahead of PC-MLIs, the authors calculate the difference in fractional change in spike rate before and after the start of the CR (PC-MLI). Their reasoning is that if the bulk of firing rate change happened before the start of CR for PCs, but at the start or later for PC-MLIs, then this value will be positive, else it will be negative. The distribution of these values was shifted to the positive side leading them to conclude that PCs fire ahead of PC-MLIs. However, this is a huge logical jump. The sign of (PC-MLI) is dependent on the depth of modulation in each cell type as well and does not necessarily indicate relative timing. In any case, such caveats have not been ruled out in their analysis. This analysis to establish timing is unconvincing. Would it not be better to look at the timing of the spike modulation start directly rather than the round-about method they are using?

      We agree with the reviewer that PC and PC-MLI activities undergo complex time-dependent changes, particularly during CRs, which makes it challenging to have a single parameter that uniquely represents the differences in timing between the activity of the two cell types. In our revised manuscript, we have addressed this issue by creating a new section that is entirely devoted to analysis of the temporal relationship between PC and PC-MLI activity (pp. 14-17). In brief, here are the main lines of evidence that PCs fire prior to PC-MLIs, both in baseline conditions and during conditioned eyelid responses.

      We have provided 3 types of evidence that PCs fire prior to putative PC-MLIs during baseline activity:

      1) A spike-triggered average of PC and putative PC-MLI activity during baseline firing showed a modest decrease in PC-MLI firing rate in response to a PC action potential (Figure 8a; see also Figure 8-figure supplement 1a and 1b).

      2) A pause in PC activity caused a very substantial rise in activity in putative PC-MLIs (Figure 8c; see also Figure 8-figure supplement 1c).

      3) A burst of PC activity caused a decline in putative PC-MLI activity (Figure 8d; see also Figure 8-figure supplement 1d).
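      A spike-triggered average of this kind is essentially a cross-correlogram. It gives the mean firing rate of the putative PC-MLI in time bins around each PC spike. The minimal sketch below assumes sorted spike times in seconds. The ±20 ms window and 1 ms bins are illustrative choices, not the parameters used for Figure 8a. The function name is hypothetical.

```python
import bisect


def spike_triggered_rate(trigger_times, target_times, window=0.02, bin_width=0.001):
    """Mean target firing rate (spikes/s) around each trigger spike.

    trigger_times: PC spike times (s)
    target_times: putative PC-MLI spike times (s), sorted ascending
    Returns (bin_centres, rates) covering [-window, +window).
    """
    n_bins = int(round(2 * window / bin_width))
    counts = [0] * n_bins
    for t in trigger_times:
        # Only target spikes within +/- window of this trigger are examined.
        lo = bisect.bisect_left(target_times, t - window)
        hi = bisect.bisect_left(target_times, t + window)
        for s in target_times[lo:hi]:
            idx = int((s - t + window) // bin_width)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    norm = len(trigger_times) * bin_width  # convert counts per trigger to spikes/s
    rates = [c / norm for c in counts]
    centres = [-window + (i + 0.5) * bin_width for i in range(n_bins)]
    return centres, rates
```

A modest dip in the rates just after zero lag, relative to the pre-spike bins, is the kind of signature described in point 1.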

      We have an additional 3 lines of evidence showing that PCs fire prior to putative PC-MLIs during CRs:

      1) Simultaneous recordings of the time course in changes in PC and putative PC-MLI activity during CRs indicate that PC activity usually declined prior to the activity of putative PC-MLIs. This is clearly visible in the examples shown in Figure 9c, as well as the averaged data shown in Figure 9-figure supplement 1a.

      2) We measured the delay between the time at which PC activity reached 50% of its minimum during the CS and the time at which the activity of putative PC-MLIs reached 50% of its maximum during single trials. Whenever CRs were observed, PCs reached their half-maximal response before putative PC-MLIs did (Figure 9-figure supplement 1b).

      3) We also measured the collective timing of changes in the activity of putative PC-MLIs and eyelid PCs during conditioning across all of our paired recordings. This was done by calculating a ratio representing the magnitude of changes in activity prior to CR onset, normalized to the peak amplitude of the change during the entire interval. The distribution of differences in the timing of changes in PC and PC-MLI activity has a mean that is greater than zero (Figure 10), indicating that eyelid PCs decreased their activity before putative PC-MLIs increased their activity in a majority of cases.
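      The two timing measures in points 2 and 3 can be written down compactly. The sketch below assumes a trial-aligned, binned firing-rate trace per cell, a known baseline rate, and a known CR onset time. Reading "50% of its minimum/maximum" as half of the peak change from baseline is our interpretation. The function names and the toy traces are hypothetical. In each function, `direction` is -1 for a decrease (eyelid PCs) and +1 for an increase (putative PC-MLIs).

```python
def half_max_time(times, rate, baseline, direction):
    """First time the rate covers half of its peak change from baseline."""
    change = [direction * (r - baseline) for r in rate]
    peak = max(change)
    if peak <= 0:
        return None  # no change in the expected direction
    for t, c in zip(times, change):
        if c >= 0.5 * peak:
            return t


def pre_onset_fraction(times, rate, baseline, direction, cr_onset):
    """Change reached at CR onset, normalized to the peak change in the interval."""
    change = [direction * (r - baseline) for r in rate]
    before = [c for t, c in zip(times, change) if t <= cr_onset]
    if not before:
        raise ValueError("no samples at or before CR onset")
    return before[-1] / max(change)


# Toy traces (ms, spikes/s): the PC rate falls before the PC-MLI rate rises.
times = list(range(0, 101, 10))
pc = [100, 100, 100, 75, 50, 25, 0, 0, 0, 0, 0]
mli = [50, 50, 50, 50, 50, 75, 100, 125, 150, 150, 150]
latency = half_max_time(times, mli, 50, +1) - half_max_time(times, pc, 100, -1)
timing_diff = (pre_onset_fraction(times, pc, 100, -1, 50)
               - pre_onset_fraction(times, mli, 50, +1, 50))
```

In this toy example the PC-MLI half-maximal time lags the PC by 20 ms. The PC - PC-MLI difference in pre-onset fraction is 0.75 - 0.25 = 0.5. This is the positive sign that point 3 interprets as PCs changing first. As the reviewer noted, the fraction is normalized to each cell's own peak change, so it is less sensitive to differences in modulation depth than a raw difference would be.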

      We hope that these improvements have adequately addressed the reviewer’s concern.

      3) Many figure panels make the same point and appear redundant. For example, that PCs and PC-MLIs are inversely correlated with each other in vivo during CRs is shown in Figure 7, figure 8a, S2, S4, and S5. Of course, in each case, the data are sorted differently (according to ISI, CR initiation, cumulative distributions, etc.,) but surely, the point regarding inverse relationship can be conveyed more concisely?

      As mentioned in response to the reviewer’s previous comment, we have made significant changes to this part of the manuscript, including creating a section that addresses the temporal relationship between PC and PC-MLI activity. This has involved removing some of the analyses listed by the reviewer, adding some new analysis and relegating some previous figures to the supplementary materials. We believe that these changes allow the manuscript to efficiently clarify the relationship between PC and PC-MLI activity and highlight the value of each type of analysis that is included.

      4) Several details are missing in the methods section even though parts of it may have been published before. For instance, how are CRs calculated in the simulation? Methods state that 'The averaged and smoothed activity of the eight deep nucleus neurons was used to represent the output of the simulation and the predicted "eyelid response" of the simulation'. It is not clear what the nature of this transform is and if any calibration factors were used. How comparable are the simulated CRs in kinetics and amplitude to experimental CRs?

      In response to this comment, we have revised the Methods section to include much more detail about the simulation methods, including two schematic diagrams. The methods employed for the experimental work in the paper are already described in detail.

      The simulation can produce simulated CRs (smoothed histogram of nucleus activity) with kinematic variables that are comparable to experimental CRs. A detailed account of how this is accomplished is given in Medina and Mauk (2000) and is briefly summarized on p. 39, para. 2: “The averaged and smoothed activity of the eight deep nucleus neurons was used to represent the output of the simulation and the predicted “eyelid response” of the simulation.” Although this approach is not intended to simulate the precise kinematics of an eyeblink, comparison of Figures 11a (simulation) and 6a (rabbit) shows that there is a reasonable concordance. The real value of the simulation is in predicting the relative changes in eyelid responses that occur during conditioning.
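      The transform from simulated deep-nucleus activity to an "eyelid response" is described above only as averaging and smoothing. The hypothetical sketch below shows what such a transform could look like. It assumes binned spike counts per neuron and a Gaussian smoothing kernel. The kernel shape and width are our assumptions and are not taken from Medina and Mauk (2000). No calibration factor is applied.

```python
import math


def simulated_eyelid_response(nucleus_counts, sigma_bins=3.0):
    """Average binned activity across deep-nucleus cells, then Gaussian-smooth it.

    nucleus_counts: one list of spike counts per time bin for each neuron
        (e.g. eight neurons, all lists the same length)
    sigma_bins: kernel standard deviation in bins (hypothetical choice)
    """
    n_cells = len(nucleus_counts)
    n_bins = len(nucleus_counts[0])
    mean = [sum(cell[b] for cell in nucleus_counts) / n_cells for b in range(n_bins)]
    half = int(math.ceil(3 * sigma_bins))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    smoothed = []
    for b in range(n_bins):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = b + k
            if 0 <= j < n_bins:  # truncate the kernel at the edges and renormalize
                num += w * mean[j]
                den += w
        smoothed.append(num / den)
    return smoothed
```

Renormalizing the truncated kernel at the trace edges keeps a constant input constant. Without it, the response would artificially sag at the start and end of the trial.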

    1. Author Response

      Reviewer #1 (Public Review):

      Figures 2 through 6. There is no description of the relationship between the findings and the anatomical location of the electrodes (other than distal versus local). Perhaps the non-uniform distribution of electrodes makes these analyses more complicated and such questions might have minimal if any statistical power. But how should we think about the claims in Figures 2-6 in relationship to the hippocampus, amygdala, entorhinal cortex, and parahippocampal gyrus? As one example question out of many, is Figure 2C revealing results for local pairs in all medial temporal lobe areas or any one area in particular? I won't spell out every single anatomical question. But essentially every figure is associated with an anatomical question that is not described in the results.

      To address the reviewer’s point, we now report the distribution of spike-LFP pairs across anatomical regions for each of Figures 2-6. The results split by anatomical region are reported in Figure 2 – figure supplement 7, Figure 3 – figure supplement 7, Figure 4 – figure supplement 1, Figure 5 – figure supplement 2, and Figure 6 – figure supplement 3. We also used a non-parametric Kruskal-Wallis test to statistically examine the effect of anatomical region on the results shown in each figure. Overall, these new results show that the effects are similar across regions, with two exceptions (Figure 4 – figure supplement 1 and Figure 5 – figure supplement 2). However, we would like to stress that these results should be interpreted with considerable caution because, as the reviewer correctly points out, the electrodes were not evenly distributed across regions (~75% of observations pertain to the hippocampus) or across patients. This sometimes leads to very low numbers of observations per region, making it difficult to disentangle whether any apparent differences are driven by regional differences or by differences between patients. Detailed results are reported below.

      Manuscript lines 207-212: “In the above analysis all MTL regions were pooled together to allow for sufficient statistical power. Results separated by anatomical region are reported in Figure 2 – figure supplement 7 for the interested reader. However, these results should be interpreted with caution because electrodes were not evenly distributed across regions and patients making it difficult to disentangle whether any apparent differences are driven by actual anatomical differences, or idiosyncratic differences between patients.”

      Manuscript lines 255-258: “Finally, we report the distal spike-LFP results separated by anatomical region in Figure 3 – figure supplement 7, which did not reveal any apparent differences in the memory related modulation of theta spike-LFP coupling between regions.”

      Manuscript lines 264-266: “PSI results separated by anatomical regions are reported in Figure 4 – figure supplement 1, which revealed that the PSI results were mostly driven by within regional coupling.”

      Manuscript lines 399-303: “We also analyzed whether the memory-dependent effects of cross-frequency coupling differ between anatomical regions (see Figure 5 – figure supplement 2). This analysis revealed that the results were mostly driven by the hippocampus, however we urge caution in interpreting this effect due to the large sampling imbalance across regions.”

      Manuscript lines 343-346: “As for the above analysis we also investigated any apparent differences in co-firing between anatomical regions. These results are reported in Figure 6 – figure supplement 3 and show that the earlier co-firing for hits compared to misses was approximately equivalent across regions.”
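      A minimal sketch of the kind of region comparison described above, using SciPy's Kruskal-Wallis H test. The per-pair values and group sizes below are invented for illustration, and the helper name `region_effect_test` is hypothetical; the point is only to show the shape of the test.

```python
from scipy.stats import kruskal

# Hypothetical per-pair values (e.g. hit-minus-miss peak-frequency shifts in Hz),
# grouped by anatomical region. Numbers and group sizes are invented.
effects_by_region = {
    "hippocampus": [4.1, 2.3, 5.0, 3.2, 1.8, 4.4, 2.9, 3.7],
    "amygdala": [3.0, 1.2, 4.6],
    "entorhinal cortex": [2.2, 3.9],
    "parahippocampal gyrus": [1.5, 4.0],
}

def region_effect_test(groups):
    """Kruskal-Wallis H test across regions; returns (H statistic, p-value)."""
    samples = [values for values in groups.values() if len(values) > 0]
    result = kruskal(*samples)
    return float(result.statistic), float(result.pvalue)

h, p = region_effect_test(effects_by_region)
print(f"H = {h:.2f}, p = {p:.3f}")
```

      With groups as small as two or three observations, as in this toy example, the test has very little power, which is one reason the region-wise results above are presented with caution.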

      Figure 1

      1A. I assume that image positions are randomized during a cued recall?

      Yes, that was the case. We now added that information in the methods section.

      Manuscript line 526: “Image positions on the screen were randomized for each trial.”

      What was the correlation between subjects' indication of how many images they thought they remembered and their actual performance?

      We did not log how many images the patients thought they remembered. Specifically, if a patient answered that they remembered at least one image, they were shown the selection screen, where they could select the appropriate images. Therefore, we cannot perform this analysis. We now report this in the Methods section. Although such an analysis would be interesting, its results would not affect the main conclusions of our manuscript.

      Manuscript lines 523-524: “The experimental script did not log how many images the patient indicated that they thought to remember.”

      1B. Chance is shown for hits but not misses. I assume that hits are defined as both images correct and misses as either 0 or 1 image correct. Then a chance for misses is 1-chance for hits = 5/6. It would be nice to mark this in the figure.

      Done as suggested (see Figure 1).

      The authors report that both incorrect was 11.9%. By chance, both incorrect should be the same as both correct, hence also 1/6 probability, hence the probability of both incorrect seems quite close to chance levels, right?

      Yes, that is correct. However, across sessions the proportion of full misses (i.e. both incorrect) was significantly below chance (t(39) = -1.9214; p < 0.05). Nevertheless, the proportion of fully forgotten trials appears higher than would be expected if the two associations in a trial were forgotten independently of each other. This is likely driven by a tendency of participants to either fully remember an episode or completely forget it, as demonstrated previously in behavioural work (Joensen et al., 2020; JEP Gen.). We now report this in the manuscript.

      Manuscript lines 132-136: “Across sessions the proportion of full misses (i.e. both incorrect) was significantly below chance (t39=-1.92; p<0.05). However, the proportion of fully forgotten trials appears to be higher than expected purely by chance. This is likely driven by a tendency of participants to either fully remember an episode, or completely forget it, as demonstrated previously in behavioral work (25).”
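      The chance levels discussed above can be checked by enumeration. This sketch assumes, consistent with the 1/6 figures quoted by the reviewer, that the selection screen shows four images of which two are correct, and that a guessing participant picks two of them uniformly at random; the image labels are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def chance_levels(n_images=4, targets=("A", "B")):
    """Chance probability of selecting 2, 1, or 0 correct images when guessing 2 of n_images."""
    lures = [f"lure{i}" for i in range(n_images - len(targets))]
    picks = list(combinations(list(targets) + lures, 2))  # all equally likely guesses
    counts = {2: 0, 1: 0, 0: 0}
    for pick in picks:
        counts[sum(img in targets for img in pick)] += 1
    return {k: Fraction(v, len(picks)) for k, v in counts.items()}

levels = chance_levels()
print(levels)  # {2: Fraction(1, 6), 1: Fraction(2, 3), 0: Fraction(1, 6)}
```

      Under these assumptions a miss (zero or one correct) has a chance probability of 5/6, and the both-incorrect chance level is 1/6 (about 16.7%), the reference against which the observed 11.9% was compared.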

      1C. How does the number of electrodes relate to the number of units recorded in each area?

      The distribution of neurons per region is shown in the new Figure 1D (see above). It approximately matches the distribution of electrodes per region, except for the amygdala, where slightly more neurons were recorded. This is because one patient (P08) had electrodes in both the left and right amygdala and contributed many sessions (9 sessions, compared to an average of 4.44 per patient).

      Line 152. The authors state that neural firing during encoding was not modulated by memory for the time window of interest. This is slightly surprising given that other studies have shown a correlation between firing rates and memory performance (see Zheng et al Nature Neuroscience 2022 for a recent example). The task here is different from those in other studies, but is there any speculation as to potential differences? What makes firing rates during encoding correlate with subsequent memory in one task and not in another? And why is the interval from 2-3 seconds more interesting than the intervals after 3 seconds where the authors do report changes in firing rates associated with subsequent performance? Is there any reason to think that the interval from 2-3 seconds is where memories are encoded as opposed to the interval after 3 seconds?

      Zheng et al. used a movie-based memory paradigm in which they manipulated transitions between scenes to identify event cells and boundary cells. They show that boundary cells (7.24% of all recorded MTL cells), but not event cells (6.2% of all MTL cells), modulate their firing rate around an event depending on later memory. There are quite a few differences between Zheng et al.’s study and ours that need to be considered. Most importantly, we did not use a complex movie-based memory paradigm as in Zheng et al. and therefore cannot identify boundary cells, which would be expected to show the memory-dependent firing rate modulation. This alone could contribute to the absence of significant firing rate differences in the first second following stimulus onset. Such an absence of a memory-dependent difference in neural firing is not unprecedented. In their seminal paper, Rutishauser et al. (2010; Nature) report no significant differences in firing rates between later remembered and later forgotten images 0-1 seconds after stimulus onset, which is similar to our 2-3 seconds time window. This finding is also in line with Jutras & Buffalo (2009; J Neurosci), who likewise show no significant difference in the firing rates of hippocampal neurons during encoding of remembered and forgotten images.

      The 2-3 seconds time interval, which corresponds to 0-1 seconds after the onset of the two associate images, is special because it marks the earliest time point at which memory formation can start, allowing us to investigate the very early neural processes that set the stage for later memory-forming processes. While speculative, these early processes likely capture the initial sweep of information transfer into the MTL memory system, which is arguably reflected in the timing of spikes relative to LFPs. It is conceivable that these initial network dynamics reflect attentional processes, which act as a gatekeeper to the hippocampus (Moscovitch, 2008; Can J Exp Psychol) and thereby set the stage for later memory-forming processes. This interpretation would be consistent with studies in macaques showing that attention increases spike-LFP coupling whilst not affecting firing rates (Fries et al., 2004; Science). We modified the Discussion section to address this issue.

      Manuscript lines 468-474: “Interestingly, these early modulations of neural synchronization by memory encoding were observed in the absence of modulations of firing rates, which is consistent with previous results in humans (16) and macaques (12), but contrasts with (43). Studies in macaques showed that attention increases spike-LFP coupling whilst not affecting firing rates (44). It is therefore conceivable that these initial network dynamics reflect attentional processes, which act as a gate keeper to the hippocampus and thereby set the stage for later memory forming processes (45).”

      Lines 154-157 and relationship to the subsequent analyses. These lines mention in passing differences in power in low-frequency bands and high-frequency bands. To what extent are subsequent results (especially Figures 3 and 4) related to this observation? That is, are the changes in spike-field coherence, correlated with, or perhaps even dictated by, the changes in power in the corresponding frequency bands?

      To address this question, we repeated the SFC analysis on LFP power, for channels whose LFP was locally coupled to spikes in the gamma band and for channels whose LFP was distally coupled to spikes in the theta band. Furthermore, we correlated the hit-minus-miss differences in peak frequency between power and SFC. If power dictated the effects seen in SFC, we would expect similar effects of memory in power as in SFC, that is, an increase in peak frequency for hits compared to misses in gamma and theta. Furthermore, we would expect a correlation between the peak frequency differences in power and SFC. Neither of these predictions was confirmed by the data. These results are now reported in Figure 2 – figure supplement 5 for gamma and Figure 3 – figure supplement 5 for theta.

      Manuscript lines 195-199: “We also tested whether a similar shift in peak gamma frequency as observed for spike-LFP coupling is present in LFP power, and whether memory-related differences in peak gamma spike-LFP are correlated with differences in peak gamma power (Figure 2 – figure supplement 5). Both analyses showed no effects, suggesting that the effects in spike-LFP coupling were not coupled to, or driven by similar changes in LFP power.”

      Manuscript lines 248-253: “As for gamma, we also tested whether a similar shift in peak theta frequency is present in LFP power, and whether there is a correlation between the memory-related differences in peak theta spike-LFP and peak theta power (Figure 3 – figure supplement 5). Both analyses showed no effects, suggesting that the effects in spike-LFP coupling were not coupled to, or driven by similar changes in LFP power.”
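      The logic of this control can be sketched as follows: find each channel's peak frequency within a band separately for hits and misses, take the hit-minus-miss shift for SFC and for power, and correlate the two sets of shifts across channels. The 40-80 Hz band limits, the toy Gaussian-shaped spectra, the use of a Pearson correlation, and all function names are assumptions for illustration, not the authors' code or data.

```python
import numpy as np
from scipy.stats import pearsonr

def peak_frequency(freqs, spectrum, band):
    """Frequency (Hz) of the spectral maximum within the closed interval `band`."""
    freqs = np.asarray(freqs, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[in_band][np.argmax(spectrum[in_band])])

def peak_shift(freqs, hit_spectrum, miss_spectrum, band):
    """Hit-minus-miss difference in peak frequency for one channel."""
    return peak_frequency(freqs, hit_spectrum, band) - peak_frequency(freqs, miss_spectrum, band)

def bump(freqs, center, width=5.0):
    """Toy spectrum: a Gaussian-shaped peak at `center` Hz."""
    return np.exp(-0.5 * ((np.asarray(freqs) - center) / width) ** 2)

freqs = np.arange(1.0, 120.25, 0.25)
gamma = (40.0, 80.0)  # assumed gamma band limits

# Toy channels with made-up peaks: SFC peaks shift up for hits, power peaks jitter randomly.
rng = np.random.default_rng(1)
sfc_shifts, power_shifts = [], []
for _ in range(10):
    sfc_miss = rng.uniform(50, 65)
    sfc_shifts.append(peak_shift(freqs, bump(freqs, sfc_miss + rng.uniform(2, 8)),
                                 bump(freqs, sfc_miss), gamma))
    pow_miss = rng.uniform(50, 65)
    power_shifts.append(peak_shift(freqs, bump(freqs, pow_miss + rng.normal(0, 2)),
                                   bump(freqs, pow_miss), gamma))

r, p = pearsonr(sfc_shifts, power_shifts)
print(f"r = {r:.2f}, p = {p:.2f}")
```

      In this toy setup the two sets of shifts are generated independently, so any correlation is due to chance; a consistent association in real data would instead suggest that the SFC effect tracks power.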

      Do local interactions include spike-field coherence measurements from the same microwire (i.e., spikes and LFPs from the same microwire)?

      Yes, they do. Out of the 53 local spike-LFP couplings found in the gamma frequency range, 11 (20.75%) were from pairs in which the spikes and LFPs were recorded on the same microwire. We assume that the reviewer is asking this question because of a concern that spike interpolation may introduce artifacts which may influence the spectrograms and, consequently, the spike-LFP coupling measures. This was also pointed out by Reviewer #2. To address this concern, we split the data based on whether the spike- and LFP-providing channels were the same or different. The results show that (i) the spectrogram of SFC is highly similar between the two datasets, with a prominent gamma peak present in both and no significant differences between them; (ii) restricting the analysis to data where the LFP- and spike-providing channels are different replicated the main finding of faster gamma peak frequencies for hits compared to misses; and (iii) further limiting the SFC analysis to ‘silent’ channels, i.e. channels where no SUA/MUA activity was present at all, also replicated the main finding of faster gamma peak frequencies for hits compared to misses.

      These analyses suggest that the SFC results were not driven by spike interpolation artefacts.

      Manuscript lines 199-203: “To rule out concerns about possible artifacts introduced by spike interpolation we repeated the above analysis for spike-LFP pairs where the spike and LFP providing channels are the same or different, and for ‘silent’ LFP channels (i.e. channels where no SUA/MUA activity was detected; see Figure 2 – figure supplement 6).”

      60 Hz. It has always troubled me deeply when results peak at 60 Hz. This is seen in multiple places in the manuscript; e.g., Figures 2B, 2E. What are the odds that engineers choosing the frequency for AC currents would choose the exact same frequency that evolution dictated for interactions of brain signals? This is certainly not the only study that reports interesting observations peaking at 60 Hz. One strong line of argument to suggest that this is not line noise is the difference between conditions. For example, in Figure 2B, there is a difference between local and distal interactions. It is hard for me to imagine why line noise would reveal any such difference. Still ...

      The mains (AC) frequency in Europe is 50 Hz, not 60 Hz as in the US. Our SFC effects are therefore well outside the range of the line-noise notch filter.
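      To make the line-noise argument concrete, the sketch below builds a standard second-order IIR notch filter at 50 Hz with SciPy and evaluates its gain at 50, 55, and 60 Hz. The 1 kHz sampling rate and the quality factor of 30 are assumptions, since the notch parameters are not given here; with these values the notch is only about 1.7 Hz wide, so activity at 60 Hz passes essentially unattenuated.

```python
import numpy as np
from scipy.signal import freqz, iirnotch

FS = 1000.0    # assumed LFP sampling rate (Hz)
MAINS = 50.0   # European mains frequency (Hz)
Q = 30.0       # assumed quality factor; -3 dB bandwidth = MAINS / Q (about 1.7 Hz)

b, a = iirnotch(MAINS, Q, fs=FS)

def gain_at(freq_hz):
    """Magnitude of the notch filter's frequency response at freq_hz."""
    _, h = freqz(b, a, worN=np.array([freq_hz]), fs=FS)
    return float(np.abs(h[0]))

for f in (50.0, 55.0, 60.0):
    print(f"{f:.0f} Hz: gain {gain_at(f):.3f}")
```

      Harmonics of 50 Hz mains noise fall at 100 Hz, 150 Hz and so on, so a 60 Hz peak is not explained by the line noise harmonics either.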

      Figure 6. I was very excited about Figure 6, which is one of the most novel aspects of this study. In addition to the anatomical questions about this figure noted above, I would like to know more. What is the width of the Gaussian envelope?

      The width of the Gaussian window used in the original results was 25 ms. We chose this window because, in our view, it represents a good balance between integrating over a long enough time window, thus allowing for some jitter in neural firing between pairs of neurons, and still being temporally specific. Finding the right balance here is not trivial because too short a window underestimates co-firing, whereas too long a window may not provide the temporal specificity necessary to detect co-firing lags (Cohen & Kohn, 2011; Nat Neurosci). To test whether this choice critically affected our results, we repeated the analysis for different window sizes, i.e. 15, 35, and 45 ms. The pattern of results did not change, with hits showing earlier peaks in co-firing than misses. Critically, the difference in co-firing peaks was significant for all window sizes except the shortest, presumably because of the increase in noise due to the smaller time window over which spikes are integrated. These issues are now mentioned in the Methods section, and the results for the different window sizes are reported in Figure 6 – figure supplement 4.

      Manuscript lines 346-347: “The co-firing analyses were replicated with different smoothing parameters (see Figure 6 – figure supplement 4).”

      Manuscript lines 894-898: “We chose this time window because it should represent a good balance between integrating over a long-enough time window and thus allowing for some jitter in neural firing between pairs of neurons, whilst still being temporally specific (57). To test whether this choice critically affected our results, we repeated the analysis for different window sizes, i.e. 15, 35, and 45 ms (see Figure 6 – figure supplement 4).”
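      The co-firing lag analysis and the window-size control can be sketched as follows: both spike trains are binned at 1 ms, smoothed with a Gaussian kernel, and cross-correlated, and the lag of the cross-correlation peak is taken as the co-firing latency. Treating the quoted window width as the kernel's standard deviation, the 1 ms bins, the ±100 ms lag range, and the function names are all assumptions; this is a sketch of the general technique, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(sigma_ms, dt_ms=1.0):
    """Unit-area Gaussian kernel sampled every dt_ms, truncated at +/-4 sigma."""
    half = int(np.ceil(4 * sigma_ms / dt_ms))
    t = np.arange(-half, half + 1) * dt_ms
    k = np.exp(-0.5 * (t / sigma_ms) ** 2)
    return k / k.sum()

def smoothed_train(spike_times_ms, duration_ms, sigma_ms, dt_ms=1.0):
    """Bin spike times (ms) into dt_ms bins and smooth with a Gaussian kernel."""
    n_bins = int(round(duration_ms / dt_ms))
    binned, _ = np.histogram(spike_times_ms, bins=n_bins, range=(0, duration_ms))
    return np.convolve(binned.astype(float), gaussian_kernel(sigma_ms, dt_ms), mode="same")

def cofiring_peak_lag(upstream_ms, downstream_ms, duration_ms, sigma_ms=25.0, max_lag_ms=100):
    """Lag (ms) of the cross-correlation peak; positive = downstream fires after upstream."""
    x = smoothed_train(upstream_ms, duration_ms, sigma_ms)
    y = smoothed_train(downstream_ms, duration_ms, sigma_ms)
    x -= x.mean()
    y -= y.mean()
    lags = np.arange(-max_lag_ms, max_lag_ms + 1)
    # np.roll wraps around the trial; tolerable when max_lag is small relative to the trial
    xc = [np.dot(x, np.roll(y, -lag)) for lag in lags]
    return int(lags[int(np.argmax(xc))])

# Toy pair (made up): every downstream spike follows an upstream spike by 10 ms.
upstream = [200, 400, 600, 800]
downstream = [210, 410, 610, 810]
for sigma in (15.0, 25.0, 35.0, 45.0):
    print(sigma, cofiring_peak_lag(upstream, downstream, duration_ms=1000, sigma_ms=sigma))
```

      For this idealized pair the estimated lag is 10 ms for every kernel width. With real data, narrower kernels integrate fewer spikes per lag and give noisier estimates, which is the trade-off described above.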

      Are these units on the same or different microwires?

      All units used for the analysis shown in Figure 6 come from different microwires. This was naturally the case because the putative upstream neuron was distally coupled to the theta LFP, and the putative downstream neuron was locally coupled to gamma at this same theta LFP electrode. Figure 6 – source data 1 lists the locations and electrode IDs for all neuron pairs shown in Figure 6.

      How do the spike latencies reported here depend on the firing rates of the two units?

      To address this question we first tested whether firing rates (averaged across the putative up-stream and down-stream neurons) differ between hits and misses. If they do, this would be suggestive of a dependency of the spike latency differences between hits and misses on firing rates. No such difference was observed (p>0.3). Second, we correlated the differences between hits and misses in Co-firing peak latencies with the differences in firing rates. Again, no significant correlation was observed (R=-0.06; p>0.7), suggesting that firing rates had no influence on the observed differences in co-firing latencies. These control analyses are now reported in the main text.

      Manuscript lines 347-350: “No significant differences in firing rates between hits and misses were found (p>0.3), and no correlations between firing rates and the co-firing latencies were obtained (R=-0.06; p>0.7), suggesting that firing rates had no influence on the observed co-firing differences between hits and misses.”

      What do these results look like for other pairs that are not putative upstream/downstream pairs?

      As reported in lines 352-355 of the original manuscript, we did not find a memory-dependent effect on co-firing latencies when neuron pairs were selected solely on the basis of distal theta SFC. In this analysis, the distally theta-coupled neuron would be the upstream neuron and the neuron recorded at the site of the coupled theta LFP would be the downstream neuron. This null result suggests that, for the memory-dependent difference in co-firing lags to emerge, the downstream neurons need to be coupled to a local gamma rhythm. However, this previous analysis still retains a notion of upstream and downstream neurons because neuron pairs were selected based on distal theta phase coupling. We therefore repeated the analysis in a completely unconstrained fashion, entering all possible pairs of neurons recorded from different electrodes into the co-firing analysis. This analysis also revealed no difference in co-firing lags, for either positive or negative lags. Instead, it showed a tendency toward more frequent simultaneous or near-simultaneous firing for hits, which is in line with Hebbian learning. These results are now reported in Figure 6 – figure supplement 1.

      Manuscript lines 333-335: “In addition, a completely unconstrained co-firing analysis in which all possible pairings of units were considered also showed no systematic difference in co-firing lags between hits and misses (Figure 6 – figure supplement 1).”
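      The unconstrained pairing described above amounts to enumerating every pair of units recorded on different electrodes, with no upstream/downstream assignment. A minimal sketch with hypothetical unit and electrode labels:

```python
from itertools import combinations

# Hypothetical unit records: (unit_id, electrode_id)
units = [("u1", "e1"), ("u2", "e1"), ("u3", "e2"), ("u4", "e3"), ("u5", "e3")]

def cross_electrode_pairs(units):
    """All unordered pairs of units recorded on different electrodes."""
    return [(a, b) for (a, ea), (b, eb) in combinations(units, 2) if ea != eb]

pairs = cross_electrode_pairs(units)
print(len(pairs))  # 8
```

      Because no direction is imposed, each pair's co-firing peak can fall at a positive or a negative lag, which is why both were examined.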

      Reviewer #2 (Public Review):

      Roux et al. investigated the temporal relationship between spike field coherence (SFC) of locally and distally coupled units in the hippocampus of epilepsy patients to successful and unsuccessful memory encoding and retrieval. They show that SFC to faster theta and gamma oscillations accompany hits (successful memory encoding and retrieval) and that the timing of the SFC between local and distal units for hits comports well with synaptic plasticity rules. The task and data analyses appear to be rigorously done.

      Strengths: The manuscript extends previous work in the human medial temporal lobe which has shown that greater SFC accompanies improved memory strength. The cross-regional analyses are interesting and necessary to invoke plasticity mechanisms. They deploy a number of contemporary analyses to disentangle the question they are addressing. Furthermore, their analyses address limitations or confound that can arise from various sources like sample size, firing rates, and signal processing issues.

      Weaknesses:

      Methodological:

      The SFC coherence measures are dependent in part on extracting LFPs derived from the same or potentially other electrodes that are contaminated by spikes, as well as multiunit activity. In the methods, they cite a spike removal approach. Firstly, the incomplete removal or substitution of a signal with a signal that has a semblance to what might have been there if no spike was present can introduce broadband signal time-locked to the spike and create spurious SFC. Can the authors confirm that such an artifact is not present in their analyses? Secondly, how did they deal with the removal of the multiunit activity? It would be suspected that the removal of such activity in light of refractory period violation might be more difficult than well-isolated units, and introduce artifacts and broadband power, again which would spuriously elevate SFC. Conversely, the lack of removal of multiunit activity would seem to for a surety introduce significant broadband power. One way around this might be that since it is uncommon to have units on all 8 of the BF microwires, to exclude the microwire(s) with the units when extracting the LFP to avoid the need to perform spike removal.

      The reviewer raises a valid concern, which we address as follows. Firstly, an artifact introduced into SFC by linear interpolation would be a problem for those local SFCs where the spike-providing channel and the LFP-providing channel are identical. Out of the 53 local spike-LFP couplings found in the gamma frequency range, only 11 (20.75%) were from pairs in which the spikes and LFPs come from the same microwire. It is unlikely that this minority of data would have driven the results. Furthermore, it is unlikely that the interpolation would introduce a memory-dependent frequency shift in SFC, because interpolation is more likely to cause a general increase in broadband SFC than a frequency-band-specific effect. Nevertheless, to address this concern, we split the data based on whether the spike- and LFP-providing channels were the same or different. The results show that (i) the spectrogram of SFC is highly similar between the two datasets, with a prominent gamma peak present in both and no significant differences between them; and (ii) restricting the analysis to data where the LFP- and spike-providing channels are different replicated the main finding of faster gamma peak frequencies for hits compared to misses.

      Secondly, we followed the reviewer’s suggestion and repeated the SFC analysis for ‘silent’ microwires, i.e. microwires where no single or multi-units were detected. This analysis replicated the same memory effects as observed in the analysis with all microwires. Specifically, we found an increase in the local gamma peak SFC frequency for hits compared to misses, as well as an increase in distal theta peak SFC frequency for hits compared to misses. These results are reported in the main manuscript and in Figure 2 – figure supplement 6 for gamma, and figure 3 – figure supplement 6 for theta.

      Manuscript lines 199-203: “To rule out concerns about possible artifacts introduced by spike interpolation we repeated the above analysis for spike-LFP pairs where the spike and LFP providing channels are the same or different, and for ‘silent’ LFP channels (i.e. channels where no SUA/MUA activity was detected; see Figure 2 – figure supplement 6).”

      Manuscript lines 253-255: “We also repeated the above analysis for spike-LFP pairs using only ‘silent’ LFP channels (i.e. channels where no SUA/MUA activity was detected; see Figure 3 – figure supplement 6) to address possible concerns about artefacts introduced by spike interpolation.”
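      For readers unfamiliar with spike removal, the following sketch shows the general idea of linear interpolation: the LFP samples in a short window around each detected spike are replaced by a straight line between the window's first and last samples. The ±2 ms window, the sampling rate, and the function name are assumptions; the cited spike-removal approach may differ in its details.

```python
import numpy as np

def interpolate_spikes(lfp, spike_samples, fs, pre_ms=2.0, post_ms=2.0):
    """Return a copy of `lfp` in which a window around each spike is replaced
    by a straight line between the window's first and last samples.

    lfp: 1-D raw signal; spike_samples: sample indices of detected spikes;
    fs: sampling rate in Hz; pre_ms/post_ms: window extent around each spike (assumed).
    """
    out = np.asarray(lfp, dtype=float).copy()
    pre = int(round(pre_ms * fs / 1000.0))
    post = int(round(post_ms * fs / 1000.0))
    for s in spike_samples:
        start = max(s - pre, 0)
        stop = min(s + post, len(out) - 1)
        idx = np.arange(start, stop + 1)
        # Straight line between the two window endpoints, which are kept as-is
        out[idx] = np.interp(idx, [start, stop], [out[start], out[stop]])
    return out

# Toy example: a linear trend with a large "spike" added at samples 50-51.
fs = 1000.0  # assumed sampling rate in Hz
trend = np.arange(100, dtype=float)
raw = trend.copy()
raw[50:52] += 50.0
clean = interpolate_spikes(raw, spike_samples=[50], fs=fs)
```

      Because each replaced segment is smoother than real data and is time-locked to spikes, it can in principle bias spike-LFP measures when spikes and LFP come from the same wire; this is the concern that the same/different-channel split and the ‘silent’-channel analyses address.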

      In a number of analyses the spike train is convolved with a Gaussian in places with a window length of 250ms and in others 25ms. It is suspected that windows of varying lengths would induce "oscillations" of different frequencies, and would thus generate results biased towards the window length used. Can the authors justify their choices where these values are used, and/or provide some sensitivity analyses to show that the results are somewhat independent of the window length of the Gaussian used to convolve with the times series.

      The different choices of window length for the Gaussian convolution reflect the different needs of the two analyses in which these convolutions were applied. In one analysis, we wanted a smooth estimate of spike density that could be averaged across trials, similar to a peri-stimulus time histogram. For this analysis we used a window length of 250 ms, which we found to yield a good balance between smooth time courses and temporal sensitivity. Importantly, for the statistical analysis of firing rates, spike densities were averaged over much larger time windows than 250 ms (i.e. 1-2 seconds); therefore, our choice of window length for the spike densities would not have any bearing on the averaged firing rate analysis.
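      The argument that the 250 ms kernel has no bearing on window-averaged rates can be illustrated with a toy spike train: smoothing with a unit-area Gaussian moves spikes in time but does not change their number, so the mean density over a 1 s window differs from the raw count only through spikes near the window edges. The regular 50 Hz train, treating the kernel widths as Gaussian standard deviations, and the function names are assumptions for illustration.

```python
import numpy as np

def spike_density_hz(spike_times_ms, duration_ms, sigma_ms):
    """Gaussian-smoothed spike density in Hz at 1 ms resolution."""
    binned, _ = np.histogram(spike_times_ms, bins=int(duration_ms), range=(0, duration_ms))
    half = int(np.ceil(4 * sigma_ms))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma_ms) ** 2)
    kernel /= kernel.sum()  # unit area: smoothing keeps the spike count
    return np.convolve(binned.astype(float), kernel, mode="same") * 1000.0  # spikes/ms -> Hz

def window_rate(density_hz, start_ms, stop_ms):
    """Mean spike density (Hz) within [start_ms, stop_ms)."""
    return float(np.mean(density_hz[start_ms:stop_ms]))

spikes = np.arange(0, 3000, 20)  # toy regular 50 Hz train lasting 3 s
rates = {sigma: window_rate(spike_density_hz(spikes, 3000, sigma), 1000, 2000)
         for sigma in (25.0, 250.0)}
print(rates)
```

      Both kernels give approximately 50 Hz here. A perfectly regular train is a best case; with real, time-varying firing the two estimates can differ through activity within roughly one kernel width of the window edges.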

      In the other analysis, which is more central to our manuscript, we used a cross-correlation between spike trains to estimate co-firing lags in the range of milliseconds. This analysis therefore required much higher temporal precision. We used a Gaussian window with a width of 25 ms because it represents a good balance between integrating over a long enough time window, thus allowing for some jitter in neural firing between pairs of neurons, and still being temporally specific. Finding the right balance here is not trivial because too short a window would be prone to noise and underestimate co-firing, whereas too long a window may not provide the temporal specificity necessary to detect co-firing lags (Cohen and Kohn, 2011; Nat Neurosci). To test whether this choice critically affected our results, we repeated the analysis for different window sizes, i.e. 15, 35, and 45 ms. The basic pattern of results did not change, with hits showing earlier peaks in co-firing than misses. Critically, the difference in co-firing peaks was significant for all window sizes except the shortest, which is likely due to the increase in noise caused by the smaller time window over which spikes are integrated. These issues are now mentioned in the Methods section, and the results for the different window sizes are reported in Figure 6 – figure supplement 4.

      Manuscript lines 346-347: “The co-firing analyses were replicated with different smoothing parameters (see Figure 6 – figure supplement 4).”

      Manuscript lines 894-898: “We chose this time window because it should represent a good balance between integrating over a long-enough time window and thus allowing for some jitter in neural firing between pairs of neurons, whilst still being temporally specific (57). To test whether this choice critically affected our results, we repeated the analysis for different window sizes, i.e. 15, 35, and 45 ms (see Figure 6 – figure supplement 4).”

      Conceptual:

      The co-firing analyses are very interesting and novel. In table S1 are listed locally and distally coupled neurons. There are some pairs for example where the distally coupled neuron is in EC and the downstream one in the hippo, and then there is a pair that is the opposite of this (dist: hippo, local EC). There appear to be a number of such "reversal", despite the delay between these two regions one would assume them to be similar in sign and magnitude given the units are in the same two regions. It seems surprising that in two identical regions of the hippo the flow of information or "causality", could be reversed, when/if one assumes information flows through the system from EC to hippo. This seems unusual and hard to reconcile given what is known about how information flows through the MTL system.

      The reviewer is correct that the spike co-firing analysis suggests a bi-directional flow of information between the hippocampus and surrounding MTL regions (e.g. entorhinal cortex; see Figure 6 – figure supplement 3). However, this bi-directional flow of information is not incompatible with neuroanatomy and the memory literature. The entorhinal cortex serves as an interface between the hippocampus and the neocortex with superficial layers providing input into the hippocampus (via the perforant pathway), and the deeper layers receiving output from the hippocampus (van Strien et al., 2009; Nat Rev Neurosci). Therefore, on a purely anatomical basis we can expect to see a bi-directional flow of information between the hippocampus and the entorhinal cortex, albeit in different layers. Importantly, reversals as shown in our Figure 6 – source data 1 involved different microwires and therefore different neurons (i.e. the entorhinal unit in row 1 was recorded from microwire 3, whereas the entorhinal unit in row 2 was recorded from microwire 8). It is conceivable that these two neurons correspond to different layers of the entorhinal cortex and therefore reflect input vs. output paths. Moreover, studies in humans demonstrated that successful encoding of memories depends not only on the input from the entorhinal cortex into the hippocampus, but also on the output of the hippocampal system into the entorhinal cortex, and indeed on the dynamic recurrent interaction between these input and output paths (Maass et al. 2014; Nat Comms; Koster et al., 2018; Neuron). Our bi-directional couplings between hippocampal and surrounding MTL regions (such as the EC) are in line with these findings. We have added a discussion of this issue in the discussion section.

      Manuscript lines 447-452: “Notably, the neural co-firing analysis indicates a bidirectional flow of information between the hippocampus and surrounding MTL areas, such as the entorhinal cortex (see Figure 6 – figure supplement 3; Figure 6 – source data 1). This result parallels other studies in humans showing that successful encoding of memories depends not only on the input from surrounding MTL areas into the hippocampus, but also on the output of the hippocampal system into those areas, and indeed on the dynamic recurrent interaction between these input and output paths (43, 44).”

    1. Author Response

      Reviewer #3 (Public Review):

      Canetta et al have characterized the developmental regulation of PV neurons in PFC. The experiments have been carefully conducted and even though this is an area of broad scientific interest, there are several issues that require consideration.

      1) The dosing regime of the CNO that has been employed will not provide persistent inhibition. Inhibition will operate on a 16 hr on/ 8 hr off cycle. Under such circumstances, it will be very difficult to rule out interspersed inhibition-related artifacts.

      Our approach of twice daily injections of CNO is consistent with that of other publications that have used similar chemogenetic approaches to chronically alter activity1-3. However, the reviewer is correct that our twice daily CNO injection protocol may only intermittently inhibit PV cells, and it is possible that persistent inhibition might result in even stronger behavioral and circuit effects. However, it is also possible that more continuous CNO administration could lead to hM4DGi desensitization. Given these caveats, we respectfully submit that repeating all the experiments under conditions that would allow constant chronic dispensation of CNO (such as implantation of minipumps) is an excellent future experiment but currently outside the scope of this manuscript.

      2) The second major issue with the dosing regime is that it is long (35 days). While the development of PFC circuitry is complex, by P90 the animals will have been dosed for more than a third of their lives. How can the authors rule out compensatory changes that have nothing to do with critical periods?

      In future studies we hope to refine the timing of the developmental window mediating these long-term effects by comparing the effects of inhibition during shorter developmental periods. Our current studies demonstrate that a 35-day window of inhibition during development, but not during adulthood, leads to long-lasting effects on behavior and prefrontal network function. If the developmental manipulation were more impactful simply because it represents a longer proportion of the animals' lifetime, we would expect the effect to wane as the animals get older. However, in the rescue experiments that take place at P120 and P130, rather than P90, we still find that the Dev Inhibition animals are impaired, suggesting that what matters most is not the proportion of the animals' lifetime during which inhibition occurs, but its timing.

      3) To this point, in the discussion first para line 8 - please change "transient" to something more suitable to reflect the duration of treatment.

      We have replaced "transient" with "reversible".

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well-performed study demonstrating the antiviral function and viral antagonism of the dynein-activating adaptor NINL. The results are clearly presented and support the conclusions.

      This reviewer has only one minor suggestion to improve the manuscript.

      Add a discussion of (1) why the fold reductions differed among VSV, SinV and CVB3 in the NINL KO cells and (2) why the fold reduction of VSV differed between the NINL KO A549 and U-2 OS cells.

      Thank you for this suggestion. We have amended the results section to include additional information about these observations and possible explanations for these results.

      Reviewer #2 (Public Review):

      This manuscript will interest readers studying host-virus co-evolution. This study has identified a novel human-virus interaction point, NINL-viral 3C protease, where NINL is rapidly evolving under selection pressure from viral infection and viral 3Cpro cleavage. This study demonstrates that viral 3Cpro-mediated cleavage of host NINL disrupts its adaptor function in dynein motor-mediated cargo transport to the centrosome, and that this disruption is both host- and virus-specific. In addition, this paper identifies a role for NINL in the IFN signaling pathway. The data shown in this manuscript support the major claims.

      In this paper, the authors have identified a novel host-virus interaction in which viral 3C proteases (3Cpro) cleave at specific sites in a host activating adaptor of the dynein intracellular transport machinery, ninein-like protein (NINL, or NLP for short), and inhibit its role in the antiviral innate immune response.

      The authors first found that, unlike other activating adaptors of the dynein intracellular transport machinery, NINL (or NLP) is rapidly evolving. They therefore hypothesized that this rapid evolution of NINL was driven by its interactions with viruses. The authors found that viruses replicated to higher levels in NINL knock-out (KO) cells than in wild-type (WT) cells and that, unlike in WT cells, replication was not attenuated upon IFNa treatment in NINL KO cells. Next, the authors investigated the role of NINL in the type I IFN-mediated immune response and found that the induction of Janus kinase/signal transducer and activator of transcription (JAK/STAT) genes upon IFNa treatment was attenuated in NINL KO cells. The authors further showed that the IFNa-induced reduction in replication of an IFNa-sensitive vaccinia virus mutant was smaller in NINL KO A549 cells than in WT cells. They also showed that viruses antagonize NINL function by cleaving it with viral 3Cpro at specific cleavage sites. A NINL-peroxisome ligation-based cargo trafficking visualization assay showed that the redistribution of immobile, membrane-bound peroxisomes was disrupted by NINL cleavage or viral infection.

      This paper has revealed a novel host-virus interaction, and an antiviral function of a rapidly evolving activating adaptor of dynein intracellular transportation machinery, NINL. The major conclusions of this paper are well supported by data, but several aspects can be improved.

      1) It would be necessary to include a couple of other pathways involved in innate immune response besides JAK/STAT pathway.

      We are very interested in this question as well. Our RNAseq data (Supplementary file 4 and Figure 3 – Figure supplement 4) suggest that there are several transcriptional changes that result from NINL KO. Our goal in this manuscript was to focus on IFN signaling in order to understand this specific effect of NINL KO since it might have wide-ranging consequences on viral replication. While we agree that broadening our studies to other signaling pathways, including other pathways involved in innate immune response, is a good idea, we feel that those experiments would take longer than two months to perform and therefore fall outside of the scope of this paper.

      2) The in-cell cleavage of NINL by viral 3Cpros was well demonstrated and supported by high-quality data. A direct biochemical demonstration of cleavage with purified proteins is needed.

      We agree with the reviewer that a direct biochemical cleavage assay would further demonstrate that viral 3Cpros cleave NINL specifically. However, our attempts to purify full-length NINL have been unsuccessful due to solubility issues (see example gel below), which is not surprising given that NINL is a >150 kDa human protein with multiple surfaces that bind other human proteins. We therefore focused our efforts on in-cell cleavage assays with specificity controls. Specifically, we used catalytically inactive CVB3 3Cpro to show that cleavage depends on protease catalytic activity, and a variety of NINL constructs in which the glutamine at the P1 position is replaced by an arginine to show site specificity of cleavage. Notably, the cleavage sites in NINL that we mapped by this mutagenesis were predicted bioinformatically from known sites of 3Cpro cleavage in viral polyproteins, further indicating that cleavage is 3Cpro-dependent. We believe these results demonstrate that cleavage of NINL depends on viral protease activity and occurs in a sequence-specific manner. Because the difficulty of purifying full-length NINL would make biochemical experiments very challenging and likely take longer than two months, we believe that our in-cell data are sufficient to demonstrate activity-dependent, site-specific cleavage of NINL by viral 3Cpros.

      Sypro stained SDS-PAGE gel showing supernatant (S) and insoluble pellet (P) fractions across multiple purifications with altered buffer conditions.

      3) The authors used different cell types in different assays. Please explain the rationale with a sentence for each assay.

      Throughout this work, we chose to use a variety of cell lines for specific purposes. A549 cells were chosen as our main cell line because they are widely used in virology, are susceptible to the viruses we used, are responsive to interferon, and express both NINL and our control NIN at moderate levels. For our virology and ISG expression data, we performed the same experiments with NINL KOs in other cell lines to confirm that the phenotypes we observed in A549 cells could be attributed to the absence of NINL rather than to off-target CRISPR perturbations or cell-line-specific effects. All cleavage experiments were performed in HEK293T cells for their ease of transfection and protein expression. The inducible peroxisome trafficking assays were performed in U-2 OS cells because their morphology is ideal for observing the spatial organization of peroxisomes by confocal microscopy, and because we had recapitulated the virology and ISG expression results in these cells. At the reviewer's suggestion, we have amended the text to include rationales where appropriate.

      4) While cell-based assays support the conclusions in this paper well, further demonstration in vivo would help to indicate the impact of NINL on pathogenesis.

      We agree. However, we believe that examining the impact of the loss of or antagonism of NINL on the pathogenesis of infectious diseases in an in vivo model is outside the scope of this study.

      In summary, this manuscript contributes a novel antiviral target. In addition, it is important for understanding host-virus co-evolution. The use of evolutionary signatures to identify the "conflict point" between host and virus is novel.

    1. Author Response

      Reviewer #3 (Public Review):

      This paper is based on digital reconstruction of a serial EM stack of a larva of the annelid Platynereis and presents a complete 3D map of all desmosomes between somatic muscle cells and their attachment partners, including muscle cells, glia, ciliary band cells, epidermal cells and specialized epidermal cells that anchor cuticular chaetae (chaetal follicle cells) and aciculae (acicular follicle cells). The rationale is that the spatial pattern of desmosomes determines the direction of the forces that muscle contraction exerts on the body wall and its appendages; these forces determine the movement of these structures, which in turn propels the body as part of specific behaviors.

      To go a step further, by connecting this desmosome connectome with the (previously published) synaptic connectome, one may gain insight into how a specific spatio-temporal pattern of motor neuron activity will lead, via a resulting pattern of forces caused by muscles, to a specific behavior. In the authors' words: "By combining desmosomal and synaptic connectomes we can infer the impact of motoneuron activation on tissue movements". This is an interesting idea with the potential to make progress towards understanding in a "holistic" way how a complex neural circuitry controls an equally complex behavior. The analysis of the EM data appears solid; the authors show convincingly that desmosomes can be resolved in their EM dataset, and the technology used to plot and analyze the data is clearly up to the task. My main concern is with the way in which the desmosome pattern is entered into the analysis, which I think makes it very difficult to extract enough relevant information to reach the stated goal.

      1) The context of how different structures of the Platynereis larval body, by changing their position, move the body needs much more introduction than the short paragraph given at the end of the Introduction.

      -My understanding is that the larval body is segmented, and contraction of the segments can cause a certain type of crawling or swimming: does it? Do the longitudinal muscles, for example, insert at segment boundaries, and does alternating left-right contraction cause some sort of "wiggling" or peristalsis?

      Longitudinal muscles do not insert only at segment boundaries, but have desmosomal connections along the entire length of the cell. Individual longitudinal muscle cells can span up to 3 segments. However the cells are staggered in such a way that all longitudinal muscle cells with somas in one segment can collectively cover up to 4 segments. Longitudinal muscles are involved in turning when swimming (Randel et al., 2014). The undulatory trunk movements and parapodial walking movements are due to the contraction of oblique and parapodial muscles. The longitudinal muscles provide support during crawling (via desmosomal links) but it is unlikely that these muscles contract segmentally. Disentangling the distinct contributions of 53 types of muscles during crawling will require further studies.

      -In addition, there are segmental processes (parapodia, neuropodia), and embedded in them are long chitinous hairs (chaetae, aciculae). Do certain types of the muscles described in the study insert at the base of the parapodia/neuropodia (coming from different angles), such that contraction would move the entire process, including the chaetae/aciculae embedded in their tips?

      Yes, acicular muscles insert at the proximal base of the acicula, and by moving the acicula they move the entire noto-/neuropodia. We have presented the anatomy of all acicular and chaetal muscles types in the figures and videos.

      -Or is it that only these chaetae/acicula move, by means of muscles inserting at their base (the latter is clearly part of the story)? Or does both happen at the same time: parapodium moves relative to the trunk, and chaeta/acicula moves relative to the parapodium? How would these movements lead to different kind of behaviors?

      -Diagrams should be provided that shed light on these issues.

      We have extended Video2 to show individual muscles and their relation to the aciculae in one of the parapodia. We also clarified this in the text:

      “Several acicular muscles attach on one end to the proximal base of the aciculae and on the other end to the paratrochs and epidermal cells. Oblique muscles attach to the basal lamina, epidermal and midline cells at their proximal end, run along the anterior edge of parapodia and attach to epidermal and chaetal follicle cells at their distal tips. Both of these muscle groups are involved in moving the entire parapodium. Acicular muscles move the proximal tips of the aciculae, while oblique muscles move the parapodium by moving the tissue around the chaetae and the aciculae. All acicular movements also correspond to parapodial movements. Chaetae are embedded in the parapodium and therefore move with it, but the chaetal sac muscles can also independently retract the chaetae into the parapodium or protract them and make them fan out.”

      2) The main problem I have with the analysis is the way a muscle cell is treated, namely as a "one dimensional" node, rather than a vector.

      -In the current state of the analysis, the authors have mapped all desmosomes of a given muscle cell to its attached "target" cell. But how is that helpful? The principal way a muscle cell acts is by contracting, thereby pulling the cells it attaches to at its two ends closer together. As the authors state (p.4) "...desmosomes..are enriched at the ends of muscle cells indicating that these adhesive structures transmit force upon muscle-cell contraction."

      At the level of the current analysis, our data reveal which cells may be moved by the contractions of the individual muscle cells. The reviewer is right that treating a muscle as a vector (or set of vectors) would be a more accurate description, which would potentially also open up the possibility of computational modelling. We have provided such a vectorised dataset in the revised version, where each muscle-cell skeleton is subdivided into short linear segments (Figure 2–source data 2). This dataset may be useful for approaching the problem in three dimensions, which is beyond the scope of the current analysis. We also included an additional video (Video 7) showing examples of muscles and their partners where the cells and the desmosomes connecting them are highlighted. This reveals that the desmosomes connecting two cells are often at the very end of the muscle cell.
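      To make the idea of a "vectorised" muscle skeleton concrete, the sketch below splits an ordered 3D polyline into straight segments no longer than a chosen length and reports each segment's unit direction, plus the end-to-end axis as a crude proxy for the pulling direction of a cell anchored at both ends. It is a minimal illustration only: it assumes an unbranched, ordered list of (x, y, z) nodes, whereas CATMAID skeletons are trees, and the function names, the `max_len` parameter and the toy coordinates are hypothetical; the actual format of Figure 2–source data 2 is not described here.

```python
from math import ceil, sqrt


def vectorise_skeleton(nodes, max_len):
    """Split an ordered, unbranched 3D polyline into straight segments no
    longer than max_len. Returns (start, end, unit_direction, length) tuples."""
    segments = []
    for p, q in zip(nodes, nodes[1:]):
        d = [q[i] - p[i] for i in range(3)]
        length = sqrt(sum(c * c for c in d))
        if length == 0:
            continue  # skip duplicated nodes
        pieces = max(1, ceil(length / max_len))
        unit = tuple(c / length for c in d)
        for k in range(pieces):
            t0, t1 = k / pieces, (k + 1) / pieces
            start = tuple(p[i] + t0 * d[i] for i in range(3))
            end = tuple(p[i] + t1 * d[i] for i in range(3))
            segments.append((start, end, unit, length / pieces))
    return segments


def end_to_end_axis(nodes):
    """Unit vector from the first to the last node: a crude proxy for the
    pulling direction of a contracting cell anchored at both ends."""
    d = [nodes[-1][i] - nodes[0][i] for i in range(3)]
    norm = sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)


if __name__ == "__main__":
    # Hypothetical L-shaped skeleton (arbitrary units).
    skeleton = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 5.0, 0.0)]
    segs = vectorise_skeleton(skeleton, max_len=4.0)
    print(len(segs), sum(s[3] for s in segs))
    print(end_to_end_axis(skeleton))
```

Keeping per-segment directions, rather than a single node per cell, is what would let desmosome positions be related to force directions, as the reviewer suggests.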

      -For that reason, the desmosomes at the muscle tips should be treated as two special sets (one at each end). Aside from these tip desmosomes there are other desmosomes (between muscles, for example), but they (I would presume) have a very different function: maybe to coordinate muscle fiber contraction, or to augment the force caused by contraction?

      Desmosomes between muscles only occur between muscles of different types, not for homotypic connections. There are other types of junctions (adhaerens-like junctions) that connect individual cells of a muscle bundle together (not analysed here). We clarified this in the text.

      • As far as I understand, for (all of) the desmosome connectome plots no distinction is made between desmosome subsets located at different positions within the muscle fiber. I therefore don't see how the plots help to shed light on how the multiplicity of muscles represented in the graphs causes specific types of movement.

      We would like to point out that the cells and structures that muscles connect to via desmosomes are very likely the parts of the body that will move during the contraction of the muscle or will provide structural support (e.g. basal lamina) for the muscle cell to contract. This is most evident in the parapodial complex. The majority of muscles in the body connect to the acicular follicle cells, and the aciculae are the most actively moving parts of the body during crawling (see Video 4). In any case, since we provide all skeleton reconstructions and the xyz coordinates of all desmosomes, the data could be further analysed following these suggestions by the reviewer.

      • As it stands, these plots "merely" help to classify muscles based on their position and the cell types they target; but that (certainly useful) map could probably also have been achieved by light microscopic analysis.

      This has never been achieved by light microscopy analysis in the hundreds of papers on invertebrate muscle anatomy (e.g. by phalloidin staining). For an LM analysis, it would not be sufficient to label the muscle fibres, but one would also need to label the desmosomes and a multitude of non-muscle cell types including the extent of their cytoplasm. This is technically very challenging (we would nevertheless be happy to hear specific suggestions for markers etc. from the Reviewer). Currently, only EM provides the required depth of structural information and resolution. This is why we believe that our dataset and analysis is unique, despite over a century of research in invertebrate anatomy.

      3) The section "Local connectivity and modular structure of the desmosomal connectome" (pp. 4-7) undertakes an analysis of the structure of the desmosome network, comparing it with other networks.

      -What is the rationale here? How do the conclusions help to understand how the spatial pattern of muscles and their contraction move the body?

      We hope that our analysis may also be of interest to the community of network scientists and we believe that the reconstruction of a quite large and novel type of biological network warrants a more quantitative network analysis, using the standard methods and measures of network science – as we presented e.g. in Figure 4 – even if these mathematical analyses may not directly reveal how muscles move the body. We hope that some readers with an interest in quantitative analyses will also appreciate the broader picture here.

      -On the one hand (given that the position of the desmosomes was apparently not considered), isn't the finding that desmosome networks stand out from random networks by their high level of local connectivity ("with all cells only connecting to cells in their immediate neighbourhood forming local cliques") completely expected?

      We disagree that the result was completely expected. Even if it were, there is a clear difference between stating that a result is expected and thoroughly quantifying certain parameters and mathematically characterising key properties of the desmosomal graph (as we have done). These network analyses help to conceptualise our findings and to think about the muscle system in more global, whole-body terms.
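      One standard way to quantify "local cliques" against a random baseline is the average clustering coefficient: for an Erdős–Rényi random graph it is expected to be close to the edge density, so values far above the density indicate local, clique-like wiring. The sketch below computes both for a toy ring lattice in which every node links only to its nearest neighbours. It is an illustration under stated assumptions: the lattice is a hypothetical stand-in, not the Platynereis desmosomal graph, and the exact measures used in the authors' Figure 4 may differ. Libraries such as NetworkX provide equivalent `average_clustering` and `density` functions.

```python
from itertools import combinations


def ring_lattice(n, k):
    """Toy 'local' network: each node links to its k nearest neighbours on a
    ring (k even), so every node only touches its immediate neighbourhood."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for node, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


if __name__ == "__main__":
    g = ring_lattice(20, 4)
    # A random graph of the same density would have clustering close to density(g).
    print(average_clustering(g), density(g))
```

The comparison is informative in both directions: a graph whose clustering is near its density would argue against purely local wiring, while long-range cells (such as the longitudinal muscles discussed below) lower clustering relative to a pure lattice.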

      -On the other hand, does this reflect reality, given that (many?) muscle cells are quite long, connecting, for example, the anterior border of a segment with the posterior border?

      Indeed, a quantitative analysis helped us to identify cases where the reality deviated somewhat from what was completely expected, and we thank the reviewer for these comments. As we explain in the revised version, some longitudinal muscles show an unexpected position in the force-field layout of the graph, due to their long-range connections. We have added extra clarifications to the text: “To analyse how closely the force-field-based layout of the desmosomal connectome reflects anatomy, we coloured the nodes in the graph based on body regions (Figure 5). In the force-field layout, nodes are segregated by body side and body segment. Exceptions include the dorsolateral longitudinal muscles (MUSlongD) in segment-0. These cells connect to dorsal epidermal cells that also form desmosomes with segment-1 and segment-2 MUSlongD cells. These connections pull the MUSlongD_sg0 cells down to segment-2 in the force-field layout (Figure 5D).”

      4) In the section "Acicular movements and the unit muscle contractions that drive them" the authors record the movement of the acicula and correlate it with the activity (Ca imaging) of specific muscle types. This study gives insightful data and could be extended to all movements of the larva.

      -The fact that a certain muscle is active when the acicula moves in a certain direction can be explained (in part) by the "connectivity": as shown in Fig. 7L, the muscle inserts at an acicular follicle cell on one side, and at an epithelial (epidermal?) cell and the basal lamina on the other side. But how meaningful is a description at this "cell type level" of resolution? The direction of acicula deflection depends on where (relative to the acicula base) the epithelial cell (or point in the basal lamina) is located. This information is not given in the part of the connectome network shown in Fig. 7L, or in any of the other graphs.

      This information is indeed not shown in the graphs, where each cell is treated as a node. However, we provide this information in the detailed anatomical figures in Figure 6 – figure supplement 1-3 and Video 7, where the individual acicular and oblique muscle types are visualised. In principle, one could subdivide aciculae into e.g. proximal and distal halves and derive a more detailed network. We have not done this but since all the EM, anatomical rendering and connectivity data are available in our public CATMAID server (https://catmaid.jekelylab.ex.ac.uk/), we hope that the interested readers will be able to further analyse the data.

      We renamed ‘epithelial’ cells to ‘epidermal’ cells.

    1. Author Response

      Reviewer #1 (Public Review):

      Using a large neonatal dataset from the developing Human Connectome Project, Li and colleagues find that cortical morphological measurements, including cortical thickness, are affected by postnatal experience, whereas cortical myelination and the overall functional connectivity of the ventral cortex developed significantly but were not influenced by postnatal time. The authors suggest that early postnatal experience and time spent in the womb differentially shape the structural and functional development of the visual cortex.

      The use of a large dataset is a major strength of this study; furthermore, the attempt to examine both structural and functional measures and connectivity, and to separate these analyses for preterm and full-term infants, is impressive and strengthens the claims made in the paper. While I find this work theoretically well-motivated and the use of the large dHCP dataset very exciting, there are some concerns that need to be addressed.

      It is not entirely clear whether the authors directly compared the structural and functional measures in the final analysis. If the authors wish to make claims about this relationship, there must be a compelling analysis detailing these findings.

      Thanks for the suggestions. We have added analysis to directly investigate the relationship between the development of homotopic connection and corresponding structural measurements in the area V1 (Page 13 Line 5-16):

      “The above results revealed that structural and functional properties of the ventral visual cortex both developed with PMA, but were differently influenced by the in-utero and external environment (Table 1). We further investigated the relationship between structural and functional development based on area V1, which showed a strong developmental effect in both structural and functional analyses. Mediation analysis was employed to see whether the development (GA or PT) of the homotopic connection between bilateral V1 was mediated by the structural properties (CT or CM). We found that the PT had a significant direct effect on the homotopic function that was not mediated by CT or CM (Fig 6a-b). In contrast, the direct effect of GA on the homotopic connection was not significant but the indirect effect of GA through CM on the connection was significant (Fig 6c-d).”
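      The logic of a single-mediator analysis can be sketched with ordinary least squares: path a is the slope of the mediator M on X, path b and the direct effect c' come from regressing Y on X and M jointly, the indirect effect is a·b, and for OLS the total effect c equals c' + a·b. In the mapping used here, X would be GA (or PT), M would be CM (or CT) and Y the homotopic V1 connection. This is a minimal sketch under assumptions: the data are hypothetical toy values, and the software, covariates and significance test used in the manuscript are not stated in this response, so it is not the authors' pipeline. In practice the indirect effect is usually tested with a bootstrap, which this sketch omits.

```python
from statistics import mean


def _centered(v):
    m = mean(v)
    return [a - m for a in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slope(x, y):
    """OLS slope of y on x (intercept handled by centering)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def two_slopes(x1, x2, y):
    """Joint OLS slopes of y on x1 and x2, from the 2x2 normal equations."""
    a, b, c = _centered(x1), _centered(x2), _centered(y)
    s11, s22, s12 = _dot(a, a), _dot(b, b), _dot(a, b)
    s1y, s2y = _dot(a, c), _dot(b, c)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("x1 and x2 are (nearly) collinear")
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def mediation(x, m, y):
    """Point estimates for a single-mediator model X -> M -> Y."""
    a = slope(x, m)                  # path a: X -> M
    direct, b = two_slopes(x, m, y)  # c' (X -> Y given M) and b (M -> Y given X)
    total = slope(x, y)              # path c: X -> Y ignoring M
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


# Hypothetical toy data, not the dHCP measurements.
x = [1, 2, 3, 4, 5, 6]
m = [0.5 * xi + e for xi, e in zip(x, [0.3, -0.2, 0.1, -0.4, 0.2, 0.0])]
y = [1.0 * xi + 2.0 * mi for xi, mi in zip(x, m)]

est = mediation(x, m, y)
print(est)
```

A "direct effect not mediated by CT or CM" corresponds to a significant c' with a non-significant a·b; the GA result described above corresponds to the reverse pattern.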

      There is also some confusion in the terminology used in the study regarding ages: gestational age, postmenstrual age, and postnatal time. I think clarifying and simplifying it down to GA and postnatal time will help the reader and avoid confusion.

      Thank you for the suggestion. We have made extensive revisions to the terminology throughout the paper and simplified it down to GA and PT. Please see the response to the 1st major concern in the Essential Revisions (for the authors) section above.

      Reviewer #2 (Public Review):

      The authors utilize the publicly available dHCP dataset to ask an interesting question: how does postnatal experience and prenatal maturation influence the development of the visual system. The authors report that experience and prenatal maturation differentially contribute to different aspects of development. Namely, the authors quantify cortical thickness, myelination, and lateral symmetry of function as three different metrics of development. The homotopy and preterm infant analyses are strengths that, on their own, could have justified reporting. However, I have concerns about the analytic approaches that were used and the conclusions that were drawn. Below I list my major concerns with the manuscript.

      PMA vs. GA vs. PT

      The authors seek to understand the contribution of experience and prenatal development, yet I am unsure why the authors focused on the variables they did. There are three variables of interest used throughout this study: Gestational age at birth (GA), postnatal time (PT), and postmenstrual age at the time of scan (PMA). The last metric, PMA, is straightforwardly related to GA and PT since PMA = GA + PT. In most (but not all) of the manuscript, the authors use PMA and PT, with GA used without justification in some cases but not in others.

      It is unclear why PMA is used at all: PMA is necessarily related to PT and GA, making these variables non-independent. Indeed, the authors show that PMA and PT are highly correlated. The authors even say that "the contribution of postnatal experience to the development was not clarified because PMA reflects both prenatal endogenous effect and postnatal experience." So, why not use GA at birth instead of PMA? Clearly, GA is appropriate in some cases (e.g., Figure S4 or in some of the ANOVA applications), and to me, it seems to isolate the effect the authors care about (i.e., duration of prenatal development). Perhaps there is some theoretical justification for using PMA, but if so, I am unaware.

      That said, I expect that replacing all analyses involving PMA with GA will substantially change the results. I do not see this as a bad thing as I think it will make the conclusions stronger. As is, I am left unsure about what the key takeaways of this paper are.

      We appreciate the suggestions, and we have replaced the related analyses involving PMA with GA in the manuscript. Please see the Response to the 1st major concern in the Essential Revisions (for the authors) section above for more detail.

      Using GA instead of PMA will have several benefits: 1) It will be much simpler to think of these two variables since they contrast the duration of fetal maturation and time postnatally. 2) This will help the partial correlation analyses performed since the variance between the variables is more independent. It will also mean that the negative relationships observed between PT and cortical thickness when controlling for PMA (e.g., Figure 2h) might disappear (reversed signs for partial correlations are common when two covariates are correlated). 3) this will allow the authors to replace Figure 1a with a more informative plot. Namely, they could use a scatter of GA and PT, giving insight into the descriptive statistics of both dimensions.

      We have revised the manuscript thoroughly following the reviewer's suggestion. However, we thought it necessary to show the overall development of CT and CM across the general age (PMA) in Figure 1. Therefore, we did not replace Figure 1a but added a scatter plot of GA against PT in Figure 2-figure supplement 1 and added their descriptive statistics to the manuscript: “The mean GA of the neonates was 39.93 weeks (SD = 1.26) and the mean PT was 1.21 weeks (SD = 1.25), the correlation between them was not significant (r = -0.08, p > 0.1; Figure 2-figure supplement 1).” Moreover, the negative relationships between PT and CT when controlling for PMA disappeared in the revised results, as the reviewer predicted.
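      The reviewer's point about sign reversals can be illustrated with a noise-free toy example. Because PMA = GA + PT, holding PMA fixed means that any increase in PT comes with an equal decrease in GA. If an outcome rises with both GA and PT but more steeply with GA, its partial correlation with PT is positive when controlling for GA and negative when controlling for PMA. The numbers below are hypothetical, chosen only to make this algebra visible (the weights 2.0 and 1.0 are arbitrary), and the partial correlation is computed by the residual method with a single covariate; this is not the authors' analysis.

```python
from math import sqrt
from statistics import mean


def residualize(y, z):
    """Residuals of y after OLS regression on one covariate z (with intercept)."""
    mz, my = mean(z), mean(y)
    szz = sum((a - mz) ** 2 for a in z)
    b = sum((a - mz) * (v - my) for a, v in zip(z, y)) / szz
    return [v - my - b * (a - mz) for a, v in zip(z, y)]


def pearson(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def partial_corr(y, x, z):
    """Correlation of y and x after removing the linear effect of z from both."""
    return pearson(residualize(y, z), residualize(x, z))


# Hypothetical ages in weeks; PMA is GA + PT by definition.
ga = [37, 38, 39, 40, 41, 42, 38, 40, 39, 41]
pt = [0.5, 2.0, 1.0, 3.0, 0.0, 1.5, 2.5, 0.5, 3.5, 1.0]
pma = [g + p for g, p in zip(ga, pt)]
# Toy outcome that increases with BOTH GA and PT, more steeply with GA.
ct = [2.0 * g + 1.0 * p for g, p in zip(ga, pt)]

print(partial_corr(ct, pt, ga))   # close to +1 in this noise-free toy
print(partial_corr(ct, pt, pma))  # close to -1: fixing PMA ties higher PT to lower GA
```

With real, noisy data the magnitudes would be smaller, but the sign flip follows from the same identity, which is why GA and PT, rather than PMA and PT, are the natural pair of covariates.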

      I suspect that one motivation for the use of PMA over GA is for the analysis in Figure 6. In this analysis, the authors pick a group of term infants with a PMA equal to the preterm infants. Since PMA is the same, the only difference between the groups (according to the authors) is the amount of postnatal experience. However, this is not the only difference between the groups since they also vary in GA (and now PT and GA are negatively correlated almost perfectly). I don't know how to interpret this analysis since both the amount of prenatal maturation and postnatal experience vary between the groups.

      We agree with the reviewer that both GA and PT differ between preterm and term-born neonates. Any differences between the two groups in our results might therefore come from the combined effect of GA and PT, and unfortunately we may not be able to separate them in this analysis. However, the preceding results indicated that CT was significantly influenced by PT and GA while CM was significantly influenced by GA, so we discuss the preterm and term-born comparison in the context of these findings (Page 19 Line 26-29 and Page 20 Line 1-5): “We found CT in the ventral cortex was generally lower in the term-born than preterm-born infants, while the CM showed the opposite trend in the two groups. Since the preterm babies have longer PT but shorter GA compared to full-term infants at the same PMA, this result supported the above analysis that CT was preferentially influenced by PT while CM was largely dependent on GA during the neonatal period”. Furthermore, we added a description in the limitations section to stress this caveat (Page 20 Line 17-19): “Meanwhile, both GA and PT were different between preterm and term-born neonates. Any differences between the two groups might therefore come from the combined effect of GA and PT, and unfortunately, we were not able to separate them in this study.”

      Justification of conclusions and statistical considerations

      I had concerns about some of the statistical tests and conclusions that the authors made. I refer to some of these in other sections (e.g., the homotopy analyses), but I raise several here.

      I am not sure what evidence the authors are using to make this claim: "we found that the cortical myelination and overall functional connectivity of ventral cortex developed significantly with the PMA but was not directly influenced by postnatal time." Postnatal time is significantly correlated with cortical myelination, as shown in Figures 2g, 2h, 3b, 3c, and postnatal time is significantly correlated with functional connectivity, as shown in Figures 4h, 5c, 5d, and 5e. Hence, this general claim that "the development of CT was considerably modulated by the postnatal experience while the CM was heavily influenced by prenatal duration" doesn't seem to be supported: both myelination and thickness are affected by postnatal experience and prenatal duration (as measured by PMA). A similar sentiment is expressed in the abstract. Perhaps the authors suggest different patterns in the strength of change for PMA vs. PT across these metrics, but if so, then statistical tests need to support that conclusion, and the claims need to reflect that sentiment.

      Interestingly, Figure S4 presents a compelling ANOVA that does support this conclusion. Still, this result is relegated to the supplement, and it also uses GA, rather than PMA, making it hard to reconcile with the other claims made in the main text. Moreover, it uses ANOVAs, which dichotomize a continuous variable. Here and elsewhere in the manuscript (e.g., Figures 3d, 3e), the authors split the infants into quartiles and compare them with ANOVAs. Their use for visualization is helpful, but it is unclear what the statistical motivation for this is rather than treating these as continuous variables, as is possible with linear mixed-effects models. Moreover, it is unclear why the authors excluded half the data from the study (i.e., quartiles 2 and 3) in this ANOVA when all four quartiles could be used as factors.

      We appreciate the reviewer’s comments. We have clarified our results and conclusions in the revised manuscript based on new analyses that replaced PMA with PT and GA (see the response to the 1st major concern in the Essential Revisions). The previous claims have been changed as follows: “the postnatal time could modulate the cortical thickness in ventral visual cortex and the functional circuit between bilateral primary visual cortices. But the cortical myelination, particularly that of the high-order visual cortex, developed without significant influence of postnatal time in such an early period” (Page 2, Lines 8-12). This claim is supported by the results in Figure 2. Moreover, to support the claims comparing the influence of GA and PT on structural development, we replaced the ANOVA analysis with a linear mixed-effect model, as the reviewer suggested.

      1) To compare the influence of GA against PT on the structural development of the whole ventral visual cortex (Page 7 Lines 15-19): “We applied a linear mixed-effect model to test whether the CT (or CM) of the whole ventral cortex was differently influenced by GA vs. PT, and found that GA had a significantly stronger effect on CM than PT (interaction between GA and PT, p < 0.05), but no significant difference was found between the effects of the two ages on CT (p > 0.6).”

      2) To compare the influence of GA against PT on structural development in area V1 and VOTC, we applied a similar linear mixed-effect model analysis to the two ROIs (Page 8 Lines 17-18 and Page 9 Lines 1-4): “Moreover, we applied a linear mixed-effect model to test the developmental influence of GA vs. PT on the cortical structure, and the results showed that the influences of GA and PT on CT did not differ significantly in either ROI (p > 0.3), whereas CM showed at least marginally significant differences in both ROIs (V1: p < 0.01 and VOTC: p < 0.09).”
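      Because PMA = GA + PT, one common way to compare the GA and PT slopes is to refit the model with PMA and GA as predictors. The GA coefficient then directly estimates the difference between the two slopes. The minimal sketch below illustrates this reparameterization using ordinary least squares on synthetic, noise-free data. It is an illustrative assumption only: it omits the subject-level random effects and the significance test of a linear mixed-effect model, and it is not necessarily the model specification used in the manuscript. All function names and values are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def ga_vs_pt_contrast(ga, pt, y):
    """Fit y ~ 1 + PMA + GA, with PMA = GA + PT.

    Since c_pma*(GA + PT) + c_ga*GA = (c_pma + c_ga)*GA + c_pma*PT,
    c_pma is the PT slope and c_ga is (GA slope - PT slope).
    """
    design = [[1.0, g + p, g] for g, p in zip(ga, pt)]
    _, c_pma, c_ga = ols(design, y)
    return c_pma, c_ga


# Hypothetical example: GA and PT in weeks, outcome with GA slope 0.5 and PT slope 0.2.
ga = [37, 38, 39, 40, 36, 41, 35, 39]
pt = [1, 3, 2, 5, 4, 0, 6, 2]
y = [2.0 + 0.5 * g + 0.2 * p for g, p in zip(ga, pt)]
pt_slope, slope_difference = ga_vs_pt_contrast(ga, pt, y)
```

      With these inputs, pt_slope recovers 0.2 and slope_difference recovers 0.3, up to floating-point error. In a real analysis, the question of interest is whether the GA coefficient in this reparameterized model differs from zero.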

      It is unclear what the evidence is to support the following claim: "Both CT and CM show higher correlation with PMA in the posterior than anterior region, and higher correlation in the medial than lateral part within the anatomical mask (Figure 2a and Figure S2b-c [sic])" From Figure 2 or Figure S2, I don't see a gradient. From Figure S3, there might be a trend in some plots, but it is hard to interpret since it is non-monotonic. More generally, is there a statistical test to support this claim?

      We added a correlation analysis between spatial direction (x: lateral to medial; y: posterior to anterior) and the measurements (CT and CM) in the ventral visual cortex, and the resulting coefficients were all significant (r = 0.7/-0.8 for CT along the x/y axis, and r = 0.91/-0.83 for CM along the x/y axis; p < 0.001). See Figure 1-figure supplement 2. However, the reviewer's concern remains valid: this significance might be driven by a subset of areas, and the gradient was non-monotonic. Therefore, we replaced the original claim with the following sentence (Page 6 Lines 3-8): “In addition, we found distinct spatial variation along the ventral cortex, e.g., in the posterior-anterior and medial-lateral directions (Figure 1-figure supplement 2a-b). Generally, both CT and CM showed higher correlation with PMA in the posterior than anterior region (r = -0.8 and -0.83; p < 0.001), and higher correlation in the medial than lateral part of the ventral visual cortex (r = 0.7 and 0.91; p < 0.001; Figure 1-figure supplement 2c-d).”
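      The spatial-gradient analysis described above amounts to correlating each vertex's position along an axis with that vertex's value in a map (for example, its correlation with PMA). The minimal sketch below uses hypothetical toy values to compute both a Pearson r and a rank-based Spearman rho. The Spearman version is included as an assumption-labeled alternative: it is high only for monotonic trends, which bears on the reviewer's point that the gradient may not be monotonic. It is not the authors' code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical toy data: posterior-to-anterior position vs. a per-vertex value
# that decreases toward anterior cortex, but not linearly.
position_y = [0, 1, 2, 3, 4, 5]
map_value = [0.60, 0.58, 0.40, 0.38, 0.10, 0.05]
r = pearson_r(position_y, map_value)
rho = spearman_rho(position_y, map_value)
```

      Here the map decreases strictly, so rho is exactly -1, while the Pearson r is somewhat weaker (about -0.97) because the decrease is not linear.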

      "and the interaction [sic] was more prominent in CM (simple effect: t = 10.98, p < 10-9) that in than CT (t = 2.07, p < 0.05)." Does 'more prominent' mean it is 'significantly stronger'? If not, then the authors should adjust this claim

      The phrase ‘more prominent’ did mean ‘significantly stronger’, since we found that the interaction between CM and CT along PMA or PT was significant in the ANOVA analysis. However, we have removed this analysis because we think the comparison between the two structural measurements is not very relevant to the conclusion of the paper. We now apply a linear mixed-effect model to compare the influence of GA against PT on specific structural development, so this result and claim have been removed from the revised manuscript.

      Are the authors Fisher Z transforming their correlations? In numerous places, correlation values seem to be added together or used as the input to other correlation analyses. It is unclear from the methods whether the authors are transforming their correlation values to make that use appropriate.

      We are sorry for the confusion. All correlation coefficients were Fisher Z-transformed before any statistical analysis involving them. We have added a clear description of the Fisher Z transformation to the manuscript (Page 25 Lines 16-18).
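      For readers unfamiliar with the procedure, the minimal sketch below shows what Fisher Z-transforming correlations involves when they are averaged. Each r is mapped to z = atanh(r), the arithmetic is done on z, and the result is mapped back with tanh. The function names and the clipping constant eps are hypothetical choices for illustration and are not taken from the manuscript.

```python
import math


def fisher_z(r, eps=1e-7):
    """Fisher r-to-z transform, z = atanh(r).

    r is clipped away from +/-1 because atanh(+/-1) is infinite
    (e.g., the unit diagonal of a correlation matrix).
    """
    r = max(min(r, 1.0 - eps), -1.0 + eps)
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform z to the correlation scale, r = tanh(z)."""
    return math.tanh(z)


def mean_correlation(rs):
    """Average correlations in z-space, then map the mean back to r."""
    zs = [fisher_z(r) for r in rs]
    return inverse_fisher_z(sum(zs) / len(zs))


# Hypothetical per-subject correlations for one connection.
naive_mean = sum([0.9, 0.1]) / 2
z_mean = mean_correlation([0.9, 0.1])
```

      Averaging in z-space gives about 0.656 here, rather than the naive 0.5, because atanh stretches values near ±1. Any subsequent t-test or signed-rank test would operate on these same z values.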

      Homotopy analyses

      The homotopy section is a strength of the paper, but I have doubts about the approach taken to analyze this data and some of the conclusions drawn. I don't expect any of my suggestions to change the takeaway of this section, but I do think they are essential criticisms to address.

      I do not think that the non-homotopic control condition is appropriate. In Arcaro & Livingstone (2017), the authors had 3 categories for this analysis: homotopic pairs (e.g., left V1 vs. right V1), adjacent pairs (e.g., left V1 vs. right V2), and distal pairs (e.g., left V1 vs. right PHA1). In the homotopy analysis performed by Li and colleagues, they compare homotopic pairs with all other pairs. I don't think that is generous to the test since non-homotopic pairs include adjacent pairs that should be similar and distal pairs that shouldn't be similar. This may explain why some non-homotopic distribution overlaps with the homotopic distribution in Figure 4c.

      Thanks for these suggestions. In the revised manuscript, we reanalyzed the data by dividing the connections into three groups for each subject. See Page 26 Lines 24-29: “For each subject, Pearson correlations were carried out on the ROI-averaged time series within and across the left and right ventral cortex. The resulting connections were divided into three groups, namely homotopic connections (the connection between two paired areas in the two hemispheres, e.g., right and left V1), adjacent connections (e.g., right V1 and left V2, since V1 and V2 are adjacent) and distant connections (between two areas that were neither paired nor adjacent)”.
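      The three-way grouping described above can be sketched as follows for a single subject. The ROI names and the adjacency map are hypothetical placeholders (loosely following the V1/V2/PHA1 examples in the review), and the connectivity values are toy numbers. In practice, adjacency would come from the parcellation used.

```python
def classify_pair(left_roi, right_roi, adjacency):
    """Label an interhemispheric ROI pair as homotopic, adjacent, or distant.

    adjacency maps each ROI name to the set of ROIs that border it
    (assumed symmetric and identical in both hemispheres).
    """
    if left_roi == right_roi:
        return "homotopic"
    if right_roi in adjacency.get(left_roi, set()):
        return "adjacent"
    return "distant"


def group_means(fc, rois, adjacency):
    """Mean connectivity per group for one subject.

    fc[(l, r)] is the (Fisher-z) connectivity between left ROI l and right ROI r.
    """
    groups = {"homotopic": [], "adjacent": [], "distant": []}
    for left in rois:
        for right in rois:
            groups[classify_pair(left, right, adjacency)].append(fc[(left, right)])
    return {name: sum(vals) / len(vals) for name, vals in groups.items() if vals}


# Hypothetical ROIs and adjacency (not the actual parcellation).
rois = ["V1", "V2", "V3", "PHA1"]
adjacency = {"V1": {"V2"}, "V2": {"V1", "V3"}, "V3": {"V2"}, "PHA1": set()}
toy_values = {"homotopic": 0.4, "adjacent": 0.2, "distant": 0.05}
fc = {(l, r): toy_values[classify_pair(l, r, adjacency)] for l in rois for r in rois}
subject_means = group_means(fc, rois, adjacency)
```

      Computing these group means per subject, rather than on a group-averaged matrix, yields one value per subject and group. That is the input a paired across-subject test requires.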

      Regardless of this decision, I think the authors should reconsider their statistical test. I think the authors are using a between samples t-test to compare the 34 homotopic pairs with the hundreds of non-homotopic pairs. This is statistically inappropriate since the items are not independent (i.e., left V1 vs. right V1 is not independent of left V1 vs. right V2, which is also not independent of left V3 vs. right V2). This means the actual degrees of freedom are much lower than what is used. Moreover, I am unsure how the authors do this analysis across participants since this test can be done within participants. The authors should clarify what they did for this analysis and justify its appropriateness.

      Thank you for the suggestion. In the previous manuscript, we first averaged the connection matrix across subjects and then calculated the homotopic (or non-homotopic) connections between areas; therefore, no statistical test across subjects could be performed. In the revised paper, we calculated the three groups of connections for each subject before averaging. We then compared them with a non-parametric paired test across subjects (Wilcoxon signed-rank), rather than treating the non-independent connections as independent samples. We found that the homotopic connections were significantly stronger than the adjacent or distant connections.

      See Page 26 Line 29 and Page 27 Lines 1-3: “A one-sample t-test was used to test whether the homotopic correlation was significantly greater than zero across subjects. To compare the correlation among the three types of connections, we applied a non-parametric statistical test (Wilcoxon signed-rank) across subjects”.

      The results showed that (Page 9 Lines 17-21) “the homotopic connections in all ROIs of the ventral cortex were significant (mean r = 0.13–0.43, t > 12.87, p < 10⁻⁹; Fig. 4a-b), and were significantly higher than adjacent connections (0.29 ± 0.12 vs. 0.19 ± 0.10, Wilcoxon signed-rank test on the Fisher-Z transformed r values: z = 16.32, p < 10⁻⁹) and distal connections (0.04 ± 0.06, z = 16.32, p < 10⁻⁹; Fig. 4c)”.
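      The paired comparison described above can be illustrated with a minimal Wilcoxon signed-rank sketch that uses the large-sample normal approximation. It takes one value per subject for each connection type (for example, Fisher-z homotopic vs. adjacent means) and drops zero differences. It also omits the tie correction to the variance. These simplifications are assumptions, and in practice a library routine such as scipy.stats.wilcoxon would normally be preferred. The toy values are hypothetical.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Paired Wilcoxon signed-rank test, normal approximation.

    Returns (W_plus, z). Zero differences are dropped; tied |differences|
    get average ranks, but the variance is not tie-corrected.
    """
    d = [a - b for a, b in zip(x, y) if a - b != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mean) / sd


# Hypothetical per-subject means (Fisher-z) for two connection types.
homotopic = [0.30, 0.25, 0.40, 0.35, 0.28]
adjacent = [0.20, 0.22, 0.30, 0.10, 0.27]
w_plus, z = wilcoxon_signed_rank_z(homotopic, adjacent)
```

      All five toy differences are positive, so W_plus equals the maximum, 15, and z is about 2.02. With so few subjects, the normal approximation is rough and is used here only to show the computation.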

      Could the authors speculate on why the correlations in homotopic regions are so much lower than what Arcaro and Livingstone (2017) found? I can think of a few possibilities: higher motion in infants, less rfMRI data per participant, different sleep/wake states, and different parcellation strategies. Regarding the last explanation, I think this is a real possibility: the bilateral correlation may be reduced if the Glasser atlas combines functionally heterogeneous patches of the cortex. Hence, the authors should consider this and other possible explanations.

      Thank you for the suggestion. The neonates included in this study were all in natural sleep during the scan, so sleep/wake state would not be one of the causes. We added some possible reasons for this difference following the related results (Page 19 Lines 9-13): “However, the present homotopic connections in human neonates were lower than those in neonatal macaques (Macaca mulatta; Arcaro and Livingstone, 2017). This difference might relate to higher motion in human infants, less r-fMRI data in the present study, the coarser parcellation of the visual cortex used in this work, and developmental differences between macaques and humans in the neonatal period.”

      The authors assume that the homotopic analyses mean that there are lateral connections between hemispheres (e.g., "Furthermore, the connections among the ventral visual cortex have developed during this early stage. Specifically, the homotopic connections between bilateral V1 and between bilateral VOTC both increased with GA, indicating an increased degree of functional distinction"). While this might be true, it doesn't need to be. Functional connectivity can be observed between regions that lack anatomical connectivity. Instead, two regions could both be driven by another region. In this case, the thalamus might drive symmetrical activity in the visual cortex.

      We agree with the reviewer that the development of functional connectivity might be driven by other regions, such as the thalamus. We therefore added this interpretation to the Discussion (Page 19 Lines 23-25): “It is worth noting that the increased homotopic connection can be direct or indirect, e.g., the effect might be driven by external regions with enhanced connections to both areas (e.g., the thalamus)”.

      Miscellaneous

      I am not sure what the motivation of this line is: "Moreover, those studies did not fully control the visual experience in the first few weeks of the subjects, thus cannot give a clear conclusion whether the innate functional connectivity is unrelated to postnatal visual experience." Arcaro, Schade, Vincent, Ponce, & Livingstone (2017) did control the visual experience of subjects. Moreover, the research here doesn't control infant experience in the way this sentence implies: it implies an experimental manipulation (i.e., fully control) rather than the statistical control that is done here. Consider rephrasing.

      We have rephrased this sentence in the Introduction (Page 5 Lines 2-5): “Moreover, the human infants participating in a previous study (Kamps et al., 2020) were around one month old (mean age: 27 d; range 6 to 57 d) and might already have acquired some visual experience; thus, this study could not exclude an effect of postnatal visual experience on the innate functional connectivity”.

      I am not sure why this claim is made: "Area V1 was selected because this region is the most basic region for visual processing and probably is the most experience-dependent area during early development". Is there evidence supporting this claim? Plasticity is found throughout the visual cortex, and I think which region is most plastic depends on the definition of plasticity. For instance, most people have the same tuning properties to gabor gratings (e.g., a cardinality bias), but there is enormous variability in face tuning across cultures.

      We have removed this claim in the manuscript.

      The abstract says 783 infants were included in this study, but far fewer are actually used. The authors should report the 407 number in the abstract if any number at all.

      We have revised the number accordingly.

      Any comparisons of preterms and terms ought to be given the caveat that the preterm environment can be very different than the term environment: whereas a term infant goes home and sees friends and family without restriction, the preterm environment can be heavily regulated if they are in a NICU. Authors should either provide details about the environments of the preterms in their study, or they should consider how differences in the richness of visual experience - regardless of quantity - may affect visual development.

      We agree with the reviewer’s concern and added a paragraph to the limitations section to stress this caveat (Page 20 Lines 12-16): “One limitation of this study is that the comparison between preterm and term-born infants did not consider the different visual experiences of these infants. Preterm-born neonates may experience a very different environment from that of term-born neonates, e.g., the preterm environment can be heavily regulated if they are in a NICU, but we did not have detailed information about the postnatal environment to control for it.”

      Reviewer #3 (Public Review):

      The authors use a large neonatal dataset to examine how development may occur differently based on whether or not the neonate spent that time in gestation or out of the womb, potentially accruing visual experience. In this manner, the authors hope to tease apart those aspects of development that are biologically programmed versus those that occur in response to experience within the visual cortex. They show structurally that cortical thickness is affected by postnatal experience while cortical myelination is not, and functionally they find regional differentiation present between visual areas at birth and that their connectivity changes with development and postnatal experience. The conclusions seem well supported by the data and analyses and provide some insight into which aspects of brain structure at birth are sculpted more by postnatal experience and which are more determined by endogenous developmental timelines.

      The analyses are based on a large sample of infants, and the authors were careful to statistically separate which aspects of an infant's age, gestational or postnatal, are driving brain development, providing a deeper picture of infant brain development than previous publications. Overall, the findings seem well supported by the data as the analyses are relatively straightforward.

      Visualization of the data and findings could be improved, as a few figures are difficult to interpret without having to read the methods.

      We have extensively revised the figures in the manuscript to improve the readability. See updated Figures 2-7.

      The acronyms regarding gestation, postnatal, and post-menstrual time are a little distracting. Please consider explicitly writing "gestational time" etc when referring to these numbers to improve readability.

      We have replaced the analyses involving PMA with gestational age (GA) or postnatal time (PT) in the revised manuscript to simplify the terminology. Please see the Response to the 1st major concern in the Essential Revisions (for the authors) section above. We believe this change makes the paper easier to follow even with the abbreviations.

      Because the cortical ribbon of infants is so thin at birth, there seems to be a possibility that partial-volume effects could be more prevalent in less-developed infants and impact myelin metrics. If not modeled or estimated, it should at least be discussed.

      In fact, the neonatal cortex is not thinner than the adult cortex. In particular, the average cortical thickness of infants aged 0-5 months is around 2-2.5 mm (Wang et al., 2019), which is similar to that of adults (Fjell et al., 2015). Therefore, the partial-volume effect in cortical gray matter is not a special concern for infants.

      Nevertheless, we agree that partial-volume effects might differ between infants of different ages. We added this consideration to the limitations section (Page 20 Lines 20-24): “Another concern was the partial-volume effect on the cortical measurements. The changing thickness of the cortical ribbon during development may change the degree of the partial-volume effect, and thus may affect the cortical myelination measurement and contribute to the myelination difference observed between preterm and term-born groups.”

      Structural and functional development could be more formally compared using quantitative models if the authors want those points more strongly related; the two are only qualitatively discussed at present.

      We have added a formal analysis to investigate the relationship between structural and functional development. Please see the Response to the 1st concern of Reviewer 1 (public review).

    1. Author Response

      Reviewer #1 (Public Review):

      The previous study as the authors stated showed a weaker expression of DMP1 in skeletal muscle. The authors provide a clear justification that sarcopenia-like phenotype was unlikely caused by DMP1-cre expression in muscle cells given there is no change of muscle cell numbers. It would be helpful to provide some quantification data of muscle cells to further preclude this possibility.

      To define how partial osteocyte ablation was achieved, we quantified the ratio of empty lacunae in DTAhet mice at 13 weeks. About 80% of lacunae were empty in DTAhet mice at 13 weeks, an increase of about 20% compared with 4 weeks (Lines 127-131, Figure 1 – figure supplement 1B). This indicates that diphtheria toxin (DT) has a cumulative effect with age in DTAhet mice. We speculate that osteocytes are ablated once DT accumulates to a threshold.

      The underlying molecular mechanism is not shown in the current study, but it might be worthwhile to provide some more-depth discussions and hypotheses concerning how osteocytes could influence cell lineage commitment in bone marrow.

      We thank the reviewer for this suggestion and have updated the Discussion in the revised version accordingly (Lines 424-433).

      Reviewer #3 (Public Review):

      The finding that osteocyte reduction induced senescence in osteoprogenitors and myeloid lineage cells is intriguing. However, further validation of cellular senescence in bone/bone marrow is lacking. Additional approaches, such as immunostaining of key senescence markers in bone tissue sections, are needed to validate the phenotype.

      Following the reviewer’s suggestion, we performed senescence-associated β-galactosidase (SA-βGal) staining of frozen femur sections from WT and DTAhet mice (Figure 6 - figure supplement 1D). Details are given in the Response to Essential Revision 2.

      It is interesting that partial osteocyte ablation alters mesenchymal lineage commitment, i.e. increased adipogenesis and impaired osteogenesis. The authors should perform further analysis of their scRNA-Seq data and conduct trajectory analysis to confirm the phenomenon. Additional functional assays of bone marrow mesenchymal stem/progenitor cells, such as CFU-F and tri-lineage differentiation assays, are needed to claim the lineage commitment change of the cells.

      Because we used total bone marrow cells for scRNA-seq, there were too few MSCs for further re-clustering and trajectory analysis. Instead, we performed GO enrichment analysis of the MSC cluster. It showed that genes downregulated after osteocyte ablation were enriched in ossification and biomineral tissue development (Figure 6 - figure supplement 1E), consistent with the finding of impaired osteoblast differentiation (Figure 4H-J). Further, as the reviewer suggested, we performed qPCR to verify related gene changes during tri-lineage differentiation. The mRNA levels of osteogenic markers, including Alp, Ocn and Runx2, were decreased (Figure 4J), indicating impaired osteogenesis after osteocyte ablation. Meanwhile, the mRNA levels of adipogenic markers, including Adipoq, Fabp4, Pparγ and Cebpa, were significantly increased (Figure 6 - figure supplement 1F), indicating promoted adipogenesis and altered MSC commitment. In addition, the mRNA levels of cartilage anabolism-related genes (Col1a2, Acan, Sox9 and Prg4) and catabolism-related genes (Mmp3, Mmp13, Adamts1 and Adamts5) were not significantly changed (Figure 6 - figure supplement 1G), indicating that chondrogenesis was not altered after osteocyte ablation. We have updated this in the revised version (Lines 324-333), and the tri-lineage differentiation methods and primer information have been added to the Materials and methods (Lines 579-590, Lines 623-637).

      The mechanism by which osteocyte reduction causes cellular senescence of the surrounding cells is an interesting question. It would be helpful if the authors provided evidence or gave an explanation on this point. Does the phenotype recapitulate age-associated bone impairment? The laboratories of Sundeep Khosla (Mayo Clinic) and Maria Almeida (University of Arkansas for Medical Sciences) reported that osteocytes are a major cell type in bone that become senescent during aging. Although most osteocytes were eliminated in the mouse model used in this study, were the remaining osteocytes undergoing cellular senescence?

      We thank the reviewer for this suggestion and have updated the Discussion in the revised version (Lines 424-433). Details are given in the Response to Essential Revision 4.

      We think that the phenotypes after osteocyte ablation are similar to age-associated bone impairment and, to a certain degree, recapitulate it. This further indicates the important role of osteocytes in maintaining bone homeostasis during aging. In SA-βGal-stained frozen femur sections of WT and DTAhet mice, we observed SA-βGal+ cells in the cortical bone region of DTAhet mice (Figure 6 - figure supplement 1D). As cortical bone mainly contains osteocytes and matrix, we inferred that the remaining osteocytes may also undergo cellular senescence.

    1. Author Response

      Reviewer #3 (Public Review):

      In invertebrates, learning-dependent plasticity was reported to take place predominantly in presynaptic neurons. In Drosophila appetitive olfactory learning, cholinergic synapses between presynaptic Kenyon cells and postsynaptic MBONs undergo behaviourally relevant associative plasticity, and it was shown to reside largely in Kenyon cell output sites. This study provided several lines of evidence for postsynaptic plasticity in MBONs. The authors nicely showed the requirement of Kenyon cell output during training, strongly suggesting that behaviourally relevant associative plasticity also resides downstream of Kenyon cell output. This is further supported by impaired appetitive memory by downregulating nAChR subunits (α2, α5) and scaffold protein Dlg in specific MBONs. Live imaging experiments demonstrated that the learning-dependent depression in M4-MBON was reduced upon knocking down the α2 nAChR subunit. Using in-vivo FRAP experiments, the authors showed recovery rates of nAChR-α2::GFP were altered by the co-application of olfactory stimulation and DA. All these lines of evidence point to the significance of nAChR subunits in MBONs for postsynaptic plasticity.

      On the technical side, this study achieved a very high standard, such as the measurement of low-expressed receptor mobility by in-vivo FRAP. The authors conducted a wide array of experiments for collecting data supporting postsynaptic mechanisms. The downside of this multitude is somewhat compromised coherence. To give an example, the authors duplicated many behaviour and imaging experiments in different MBONs for non-associative learning (Fig. 7 and 8), which is primarily out of the scope of this paper (cf. title).

      We thank the reviewer for their positive assessment and constructive criticism. We have thought a lot about removing the data on non-associative learning (Figs. 7 and 8), but feel that they add important experiments that are not feasible for the other MBONs due to technical constraints (complexity of training protocols and localization of the imaging area). As reviewer 1 was happy with these experiments, we also decided that it is important to show that receptor plasticity is not confined to associative appetitive memory but is also important for other postsynaptic memory storage mechanisms. In response to this reviewer, we have adjusted the title to:

      Postsynaptic plasticity of cholinergic synapses underlies the induction and expression of appetitive and familiarity memories in Drosophila

      We also now include familiarity learning in the abstract. Moreover, we have expanded our explanation of why we conducted these additional experiments, and now state:

      line 436ff: ‘Our data so far suggest that regulation of α2 subunits downstream of α5 is involved in postsynaptic plasticity mechanisms underlying appetitive, but not aversive, memory storage. Besides associative memories, non-associative memories, such as familiarity learning, a form of habituation, are also stored at the level of Drosophila MBs. We next asked whether postsynaptic plasticity expressed through α5 and α2 subunit interplay was exclusive to appetitive memory storage, or would represent a more generalizable mechanism that could underlie other forms of learning represented in the MBs. We turned to the α’3 compartment at the tip of the vertical MB lobe that has previously been shown to mediate odor familiarity learning. This form of learning allows the animal to adapt its behavioral responses to new odors and permits assaying direct odor-related plasticity at the level of a higher order integration center. Importantly, this compartment follows different plasticity rules, because the odor serves as both the conditioned (activating KCs) and unconditioned stimulus (activating corresponding dopaminergic neurons)15. While allowing us to test whether the so far uncovered principles could also be relevant in a different context, it also provides a less complex test bed to further investigate whether α5 functions upstream of α2 dynamics.’

      We also would like to emphasize that - if the reviewer feels that keeping these data / this information as part of our manuscript would prevent publication - we are prepared to remove these data from the manuscript, and submit these data in their own right (potentially as a research advance subsequently).

    1. Author Response

      Reviewer #1 (Public Review):

      1. Probably the shortest review I've ever written! Most birds today can lift the upper beak independently of the brain case. This is made possible by a series of mobile joints and bending zones in the skull. To investigate the evolution of this phenomenon, the authors successfully CT-scanned the thoroughly squished skull of the Early Cretaceous stem-bird Yuanchuavis. The detailed description and illustration of the shapes and positions of the skull bones leave no doubt about the conclusion that the toothed snout was unable to move independently of the brain case. They also show, however, that the loss of a few extensions from specific skull bones would have made mobility possible. This plugs a major gap in our understanding of the evolution of mobility within the skull in birds (and by extension elsewhere, notably in the similarly diverse lizards & snakes).

      Yes, we are delighted that this work will further advance our understanding of avian skull evolution.

      Reviewer #2 (Public Review):

      1. Wang et al. present a detailed description and analysis of the previously reported cranial remains of enantiornithine bird Yuanchuavis. The authors use X-ray CT scan data to reconstruct the cranial elements and retro-deform the facial and palatal skeleton. The authors also use principal component analysis with geometric morphometrics data to investigate where Yuanchuavis falls in palatine phylomorphospace. The authors use these data to make inferences about the kinetics of the Yuanchuavis skull as well as the evolution of cranial kinesis across birds. Generally, I find the authors' direct interpretation of their anatomical and PCA data to be convincing and compelling. The anatomical description is thorough and accurate. The methods used for the geometric morphometrics and PC analyses are appropriate. I find compelling the authors' interpretations that Yuanchuavis largely retained the ancestral non-avialan akinetic skull.

      One of the greatest strengths of this paper is the extremely attractive figures. In particular, I find figure 4 to be exceptionally useful - this is easily the most effective illustration I have yet seen of avian cranial kinesis and the shifts in cranial morphology that underlie its evolution. I applaud whoever designed this figure. My one major concern with this paper's methodology is that the palatine used for Ichthyornis is incorrect. Torres et al. (2021) published the correct palatines, which were very different from those incorrectly (but understandably) identified in Field et al. (2018) and used here. I strongly urge the authors to rerun their GMM analysis with corrected data.

      We thank the reviewer for supporting this study. For the palatine of Ichthyornis, we have now used the palatine reconstruction in Torres et al. (2021) and rerun the GMM analyses. This does change the GMM results, but the main conclusion is not strongly affected. We are grateful for this comment.

    1. Author Response

      Reviewer 2 (Public Review):

      1) The hypothesis that the genes responsible for the Mendelian traits are also the causal genes for the cognate complex traits does not seem to hold, given the prior work and the data shown in the study. For example, if this hypothesis is true, it is unexplained why the candidate genes were not even enriched in the GWAS regions for height and breast cancer.

      Following the removal of a data artifact from our breast cancer analysis and the inclusion of Backman et al.’s larger list of genes implicated in height, every phenotype in our analysis displays enrichment in proximity to GWAS peaks. Enrichment is present not only in genes selected based on cognate Mendelian phenotypes, but also in those from Backman et al., which examined the same complex trait phenotypes that were used for GWAS. In that work, the enrichment of GWAS signal near genes selected on the basis of coding variants was as high as 59.3-fold.

      Our use of Mendelian-trait-causing genes does not depend on GWAS. Short of large-scale experimental work, we know of no better way to confirm the genes’ broad relevance to GWAS phenotypes than their enrichment near peaks. This enrichment has been persuasively demonstrated by previous research. Freund et al. (2019) tested 20 Mendelian disorder gene sets for enrichment against 62 complex phenotypes. Phenotypically non-matched Mendelian genes showed no statistically significant overlap with GWAS peaks (2% of non-matched pairs), whereas the overlap of matched Mendelian genes with GWAS peaks was significant (54% of matched pairs).

      We have included additional evidence and references for this relationship in Supp. Note 1.

      2) The only evidence supporting their hypothesis appears to be the enrichment of the candidate genes in the GWAS regions for seven out of the nine traits. However, significant enrichment of the candidate genes in the GWAS regions does not necessarily mean that a large proportion of the candidate genes are the causal genes responsible for the GWAS signals. Analogously, we cannot use the strong enrichment of eQTLs in GWAS regions as evidence to claim that a large proportion of the GWAS signals are driven by eQTLs.

      Our gene sets were selected by considering two criteria: whether they are relevant to each complex trait, and whether they are biologically interpretable.

      The genes identified in Backman et al. have a strong case for relevance. They are evaluated for association, not with cognate Mendelian phenotypes, but with the exact same complex traits used for GWAS.

      Our genes, selected based on cognate Mendelian traits, are less obviously relevant, but have advantages for interpretation. Many have well-understood biological roles and are part of pathways that have been studied in great detail. Because most of these genes can cause dramatic phenotypic changes with one variant, their direction of effect is easier to understand than that of genes identified through burden testing. In fact, loss-of-function coding variants that cause autosomal dominant traits can be thought of as large-effect, context-independent eQTLs: they cause phenotypic change by decreasing gene expression by roughly 50% across cell types, developmental stages, etc.

      Ideal genes for our analysis would combine the advantages of both sets. They would have individual coding variants that could be tied to complex traits using exome sequences. However, natural selection creates tradeoffs between variant frequencies and variant effect sizes. Large-effect variants (such as those responsible for Mendelian traits) are generally too rare to be detected in population sequencing. Coding variants that reach frequencies detectable in databases such as UK Biobank typically have smaller effect sizes, requiring them to be aggregated in order to implicate genes.

      We believe that our original gene set is plausible both because of its collective enrichment in GWAS signal and because each gene is individually known to cause cognate phenotypes. Enrichment is not proof, but can serve as strong evidence when backed up by known biology. Though selection precludes a perfect gene set, the enrichment in both our Mendelian gene set and the set from Backman et al. addresses each criterion—interpretability and relevance—individually, and, taken together, provides an argument for the relevance of genes selected based on coding variants.

      3) Considering the large numbers of GWAS signals, we would expect a substantial number of genes in the GWAS regions by chance. It would be interesting to quantify the number of genes in the GWAS regions if the 143 genes are randomly selected. Correcting the observed number of genes for that expected by chance (e.g., subtracting the observed number by that expected by chance), the proportion of the candidate genes in the GWAS regions would be small.

      The proportion of the candidate genes whose eQTL signals were colocalized with the GWAS signals or in close physical proximity with the fine-mapped GWAS hits was small. However, I would not be surprised if they are significantly enriched, compared with that expected by chance (e.g., quantified by repeated sampling of the 143 genes at random).

      Sampling random gene sets of the same size as ours, or using the entire set of non-putatively-causative genes, shows that we would expect 43 randomly selected genes to fall within 1 Mb of a peak (95% confidence interval: 31.5-54.5). Instead, we find 147 peak-adjacent genes. The enrichment increases at shorter distances: within 100 kb, we find 104 putatively causative genes, whereas the null model predicts only 11 (95% CI: 4.5-17.0), a roughly ten-fold difference.
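      The random-gene null described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the paper. It assumes that each gene is represented by a single (chromosome, position) coordinate, such as its TSS, and that GWAS peaks are given the same way. It also assumes that null gene sets are drawn uniformly from a background gene list. The function names (`genes_near_peaks`, `null_counts`, `summarize`, `empirical_p`) are hypothetical.

```python
import bisect
import random
import statistics


def genes_near_peaks(genes, peaks, window):
    """Count genes whose (chrom, pos) lies within `window` bp of any GWAS peak."""
    peaks_by_chrom = {}
    for chrom, pos in peaks:
        peaks_by_chrom.setdefault(chrom, []).append(pos)
    for positions in peaks_by_chrom.values():
        positions.sort()
    count = 0
    for chrom, pos in genes:
        positions = peaks_by_chrom.get(chrom, [])
        # First peak at or after pos - window; the gene is "near" if that peak is also <= pos + window.
        i = bisect.bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            count += 1
    return count


def null_counts(background, n_genes, peaks, window, n_draws=1000, seed=0):
    """Peak-adjacent counts for random gene sets of the same size as the test set."""
    rng = random.Random(seed)
    return [genes_near_peaks(rng.sample(background, n_genes), peaks, window)
            for _ in range(n_draws)]


def summarize(counts):
    """Mean and empirical 95% interval (2.5th and 97.5th percentiles) of the null counts."""
    cuts = statistics.quantiles(counts, n=40)
    return statistics.mean(counts), (cuts[0], cuts[-1])


def empirical_p(observed, counts):
    """One-sided permutation p-value with the usual +1 correction."""
    return (1 + sum(c >= observed for c in counts)) / (1 + len(counts))
```

The observed number of peak-adjacent genes in the real gene set would then be compared with the null mean, its percentile interval, and the empirical p-value, separately for each window size (e.g. 1 Mb and 100 kb).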

      Enrichment remains significant even under a more conservative null. Genes like ours, which are important to some phenotype, may be more likely than random genes to fall near GWAS peaks even when their phenotype does not correspond to the GWAS phenotype. In that case, we might see enrichment even in the absence of a relationship between our Mendelian and complex traits. To account for this, we also tested significance by testing gene sets against mismatched phenotypes (e.g. testing our LDL genes with a UC GWAS, and our height genes with a T2D GWAS). The results of this permutation, shown in Supp. Fig. 1, further confirm the enrichment.
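      The mismatched-phenotype control can be sketched in the same spirit. In this hypothetical example, each trait has a gene set and a list of GWAS peaks, both given as (chromosome, position) pairs. Each gene set's peak-adjacent fraction for its own GWAS is then compared with its fractions for every other trait's GWAS. This is an illustrative assumption about how such a permutation could be organized, not the authors' code. The names `fraction_near`, `matched_vs_mismatched`, and `cross_trait_p` are invented for the example.

```python
from itertools import product


def fraction_near(genes, peaks, window):
    """Fraction of genes (chrom, pos) lying within `window` bp of any peak (chrom, pos)."""
    near = sum(any(gc == pc and abs(gp - pp) <= window for pc, pp in peaks)
               for gc, gp in genes)
    return near / len(genes)


def matched_vs_mismatched(gene_sets, gwas_peaks, window):
    """Peak-adjacent fractions for every (gene-set trait, GWAS trait) pair."""
    matched, mismatched = {}, {}
    for gene_trait, gwas_trait in product(gene_sets, gwas_peaks):
        frac = fraction_near(gene_sets[gene_trait], gwas_peaks[gwas_trait], window)
        if gene_trait == gwas_trait:
            matched[gene_trait] = frac
        else:
            mismatched[(gene_trait, gwas_trait)] = frac
    return matched, mismatched


def cross_trait_p(gene_trait, matched, mismatched):
    """Rank of the matched fraction among the same gene set's mismatched fractions."""
    null = [v for (g, _), v in mismatched.items() if g == gene_trait]
    return (1 + sum(v >= matched[gene_trait] for v in null)) / (1 + len(null))
```

With only a handful of traits, this null is small and coarse. Its value is that it controls for the possibility that phenotypically important genes sit near GWAS peaks in general, regardless of which trait was studied.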

      Finally, a non-expression-based analysis found that Mendelian genes were strongly enriched for heritability. As in our study, it included Mendelian genes for diabetes and LDL: the Mendelian diabetes genes were enriched 65-fold for common-variant heritability and the Mendelian LDL genes were enriched 212-fold (Weiner et al. 2022).

      Though it is true that the number of colocalizations and TWAS hits likely represents a statistically significant enrichment over all genes, we feel that this does not affect the conclusions of the paper. The model in which noncoding variants identified by GWAS act as eQTLs certainly has some truth; colocalization and TWAS studies have found, in total, many associations. But the model’s success has not lived up to expectations. This has been suggested, albeit inconclusively, by the failure of most GWAS peaks to colocalize. By evaluating not the portion of loci that can be tied to a gene, but the portion of already-implicated genes that can be tied to a locus, we believe the model’s deficiencies become both clearer and more puzzling.

      4) It is unclear how the authors selected the breast cancer genes. If the genes were selected based on tumor somatic mutations, it is a problem because there is no evidence supporting that somatic mutation target genes are also cancer germline risk genes.

      Genes for breast cancer were selected using the MutPanning method (Dietlein et al. 2020), which takes somatic mutations found in tumors, and evaluates them in the context of known mutation patterns. The relationship between somatic and germline variants in cancer is little studied. We believe it is meaningful that, as explained in our response to overall comment 2ii, we do now find an enrichment of our breast cancer genes near GWAS peaks. Though these genes are very unlikely to be a perfect set, the conclusions of our paper remain true with or without the inclusion of this phenotype.

      5) The authors observed no enrichment of the candidate genes in height and breast cancer GWAS regions. In this case, should these traits and the corresponding genes be removed from the subsequent analyses?

      The reviewers’ notes about enrichment, and its absence in height and BC, prompted us to review our analysis. The enrichment for five of our phenotypes remained significant, and the lack of enrichment for breast cancer genes proved artifactual. After accounting for the artifact, breast cancer genes follow the same pattern as most other phenotypes, showing highly significant enrichment compared with both the genomic background and a permutation analysis. Supplementary figure 1 has been updated to reflect this change and to add the enrichments found in Backman et al.

      Because our original analysis of height shows nominal, but not multiple-testing-corrected, significance for enrichment, the problem may be one of power. The set of height genes identified by Backman et al. is larger than our original set and displays significant enrichment in proximity to GWAS signal. This enrichment is also present when the two gene sets are combined, as shown in the updated Supp. Figure 1.

      Reviewer 3 (Public Review):

      1) The positive results are substantially reduced when restricting the analyses to a set of selected tissues of relevance to the trait. Doesn't this imply that the selection of relevant tissues in this study is not comprehensive and, further, that tissue specificity is common in mediating genetic effects through gene expression? First, some apparently relevant tissues are not selected (Table 2), such as bone for height (Finucane et al. 2015 NG). One approach to assess the relevant tissues for the predefined set of putatively causative genes is to see if these genes are enriched in the differentially expressed gene sets for those tissues. Second, among 84 putatively causative genes overlapping GWAS signals, they identified 39 genes by TWAS, 11 genes by fine mapping with linear distance to chromatin modification features, and 41 genes by fine mapping with ChromHMM enhancer annotations, but these numbers dropped substantially, to 9, 5 and 27, when restricting the same analysis to the selected tissues for each trait. If genes function only in the relevant tissues, I think using bulk expression data would lose power but is unlikely to give false positives. Thus, it is possible that for the traits analysed, not all relevant tissues are selected, so that only a fraction of genes identified in bulk expression analysis can be replicated in the tissue-specific analysis. This appears to me a notable piece of evidence supporting the biological-context hypothesis, about which the authors express reservations in the discussion.

      Testing for colocalizations or TWAS hits in all tissues may increase power for several reasons. First, it is possible that some GTEx tissues have unrealized relevance to our phenotypes. Secondly, in the event that a tissue is not present in GTEx, we may still detect relevant eQTLs in a tissue that is not itself involved in the trait, but which has similar patterns of expression. Finally, some tissues may be correct, but underpowered due to their small sample size. In this case, we may better detect the colocalization in tissues that are “irrelevant,” but are well-powered and have correlated expression.

      However, this creates problems of interpretation. Say we find, for example, a colocalization of an APOE eQTL with an LDL GWAS peak in skin tissue. Does this mean that skin tissue contributes to LDL levels? Is it simply because skin tissue has more samples than liver? Are we uncovering a strange, unexpected pleiotropy?

      We believe we can achieve both objectives—power and interpretability—with our use of MASH (Urbut et al. 2019) as described in response 3 of the first section. Briefly, MASH is a Bayesian tool that we use to update the estimates of eQTLs in GTEx data. Each tissue is adjusted to incorporate signals detected in other tissues with similar expression. This mitigates the danger of ignoring the correct tissue, and increases the power of tissues with small sample sizes. Its benefit is demonstrated by the substantial increase in the number of expression-GWAS colocalizations identified by coloc—however, the number of genes identified that fall within our putatively causative gene sets remains strikingly small.

      2) How much do both LD differences between GWAS and eQTL samples and the presence of allelic heterogeneity contribute to the observed low colocalization rate? One of their main findings is the low colocalization between trait-associated variants and eQTL in non-coding regions, which accounts for only 7% of the putatively causative genes. In discussion, the authors believe that this finding cannot be explained by lack of statistical power and is directly supported by a Bayesian analysis which reported high posterior probabilities of distinct signals for GWAS and eQTL. I agree that power is probably not a big issue. However, my concern is that given the large difference in sample size between GWAS and GTEx datasets, any small differences in LD between the two samples might cause a statistical separation of the signals even when trait phenotype and gene expression truly share a causal variant. Moreover, the presence of more than one causal variant with allelic heterogeneity in the locus may also play a part in the failure of colocalization. Consider two causal variants for the complex trait, one regulating the target gene and the other regulating another gene in co-expression. Potentially, the presence of the second causal variant would diminish the colocalization probability at the target gene.

      The ability of our statistical tools to detect true colocalizations is critical to this project. Small sample size increases the variance of the estimated LD matrix, but it is only one of many factors that influence power, which also include LD differences between study populations and eQTL effect sizes.

      Though we restricted both GWAS and GTEx samples to subjects of European ancestry and used PCs as covariates, the reviewers are correct that there are likely to be LD differences between samples, due both to slight variations in populations and to the smaller sample sizes of GTEx. Analyses of colocalization tools under mismatched LD have shown that the resulting decreases in power are small. Chun et al. (2017) tested JLIM in simulated conditions of modest population mismatch, using CEU haplotypes to create the GWAS, and haplotypes from all non-Finnish Europeans for eQTL associations. They then attempted to distinguish shared vs. distinct causative variants for GWAS and eQTL, finding no decrease in sensitivity or specificity (Supp. Fig. 6 of Chun et al. 2017).

      The case in which two genes are co-regulated by nearby variants, both causative for the GWAS trait, creates a condition of allelic heterogeneity for the GWAS trait (as opposed to the expression trait). Chun et al. evaluated JLIM’s loss of power as a result of AH, and found that the power loss is small, except in cases in which the two variants have equal effects (Supp. Fig. 10). Testing cases in which the AH occurs for the expression trait returned a similar result (Supp. Fig. 9).

      Hukku et al. (2021) performed similar analyses on coloc, eCAVIAR, and fastENLOC. Allelic heterogeneity was found to reduce the power of coloc (by about a factor of 2). Testing different pairs of populations, they conclude that extreme LD mismatches (e.g. Finnish vs. Yoruba samples) can lead to substantial power loss, but moderate LD mismatches (e.g. Finnish vs. British samples) do not. Though a factor of two is substantial, it would not change the qualitative conclusions of this paper. Overall, given the variety of methods we employ (including those, such as JLIM, that are more robust to AH), we are confident that, taken together, they have been shown to be robust to the concerns raised.

      Finally, TWAS should, by design, be less vulnerable to LD differences and allelic heterogeneity. The same design can, however, produce false positives when genes with correlated expression are identified together even though only one is causative. It can also prioritize non-causative genes over causative ones, although generally both genes will be identified (Wainberg et al. 2019).

      3) Perhaps the authors can perform some simulations to quantify the influence of tissue-specific expression effects, LD differences between eQTL and well-powered GWAS, and allelic heterogeneity, as discussed above, on their analyses. I understand that the authors may not be willing to do as it would involve a lot of work. But I'd like to see at least some discussion on how these questions can be better addressed in the future research.

      These are nuanced technical questions, and to address them by simulation in our paper would, as noted, involve a lot of work. We have summarized previous work that evaluated the effects of LD differences and AH in our response to essential revision 4. We discuss our concerns about the possibility of an overly broad tissue search in essential revisions 3 and 5, and our decision to address this question using MASH in essential revision 3.

      4) It looks quite striking that only 6% of the putatively causative genes are identified by TWAS with the correct effect direction. But I think this number is slightly misleading as one may interpret it as only 6% of the functionally relevant genes are regulated by trait-associated variants. In fact, 46% of the genes are detected by TWAS but only 11% are confirmed in their selected tissues, among which about half (5/9) have correct effect direction. First, the result could be limited by the selection of relevant tissues, as discussed above. Second, the fact that half of the genes do not show correct effect direction may reflect a nonlinear relationship between expression and trait, or the presence of cell-type heterogeneity within a tissue. These may not necessarily overturn the assumption that these genes are regulated by trait-associated variants in the causal tissues or cell types.

      In our initial submission, we had been reluctant to expand the list of tissues for two reasons. First, expanding from the small number of tissues with known biological relevance to all tissues (or all non-brain tissues) increases the multiple-testing correction burden. Second, and in our eyes more important, colocalizations in tissues without clear biological relevance are not biologically interpretable. Such hits can be the result of complicated genetic architecture (e.g. shared eQTLs), power differences between tissues with correlated expression, or biology not directly related to the trait in question.

      That said, the tissue data we have access to are incomplete, and we are without question missing some relevant tissues. Additionally, some relevant tissues have lower sample sizes, and thus lower power, than tissues that are not relevant but may still share eQTLs. To overcome these problems, we applied Multivariate Adaptive Shrinkage (MASH), a Bayesian method that detects correlations between different conditions (in this case, tissues) and uses them to produce posterior estimates of summary statistics in each tissue (Urbut et al. 2019). Unlike meta-analysis, which produces a single result, MASH produces distinct effect size estimates for each tissue, each informed by the others.

      Using MASH has a pronounced effect on colocalization results. The number of non-putatively causative genes colocalizing increases from 389 to 489, while the number of putatively causative genes in our Mendelian set is unchanged, remaining at 2. The number of genes from the Backman et al. set increases from 2 to 5. Though this is a proportionally large increase, it still represents a small fraction of genes. We have updated our paper to use these results—which should be less dependent on the tissues we selected—but the message has not changed.

      5) While they highlight the roles of alternative regulatory mechanisms, few testable hypotheses are put forward for the field, which is somewhat disappointing but understandable given how little we know about the human genome at the mechanistic level.

      We have added a set of models that may explain the “missing heritability” to Table 4 in the discussion. Though we do not propose experiments, we have included citations for research relevant to confirming or disproving these models.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Kowalczyk and colleagues report the identification of coding and non-coding genetic determinants of hairlessness in mammals using an approach they developed called RERconverge. The approach has been employed to examine several different traits in previous publications from this group. The authors determine that hairlessness is associated with relaxed evolutionary constraint at genetic loci and identify both coding genes and non-coding sequences associated with this phenotype. Several known hair-associated and novel genes and microRNAs are observed.

      This is a strong manuscript with interesting results. It is remarkable how robust this method is. There are a few places where I was not fully convinced of the choice to highlight a gene as "significant" however.

      In Figure 4 and the associated text and figure legend the claim is made that non-coding regions exhibit accelerated evolution of matrix and dermal papilla elements. However, the enrichment, even prior to multiple testing correction is not significant. Should this be reported on?

      We agree that some of the results that we displayed had borderline significance, and we have clarified this in the text so that the reader is aware. Our rationale for highlighting tissue annotations from borderline-significant enrichment results in noncoding analyses (matrix p=0.078 adj.p=0.18, dermal sheath p=0.059 adj.p=0.16, dermal papilla p=0.049 adj.p=0.16) is that we believe these are an honest depiction of the trends we see in this scan (with alpha=0.2 for adjusted p-values), particularly when supported by effect sizes as reported through AUC. We prefer to set a generous threshold to avoid missing meaningful results rather than a more stringent one. A more generous alpha is also more forgiving of the noise involved in identifying noncoding regions and assigning them to genes.

      Related to the above, Table 1 includes just one 'significant gene,' with the remainder of the genes highlighted because they have a Bayes Factor ratio >5. Should a gene with a BF HvM be highlighted as a gene "whose evolutionary rates are significantly associated with the hairless phenotype?" Perhaps I am incorrect, but the hypothesis that is being tested by this approach seems distinct from "is the gene associated with hair loss."

      Similar to pathway enrichment analyses, we also used generous significance thresholds for gene-specific results to show our top, most significant results from protein-coding analyses. Significance of noncoding enrichment was not a criterion for inclusion/exclusion of genes in Table 1. Generally, some genes with significant convergent evolutionary rate shifts in protein coding sequence also have significant enrichment of convergent rate shifts in nearby noncoding regions (like PTPRM in Table 1), but many do not, which is also shown in Figure 6A. We have clarified column titles in Table 1 by adding (Gene) or (Noncoding) to indicate which sequences the values refer to.

      Bayes factors (BF) are a complementary Bayesian approach to analyze statistical associations that we use here to supplement information we get from our more traditional Kruskal-Wallis test. BF are easy to interpret because they directly describe the amount of support for our alternative hypothesis rather than indirectly describing support as p-values do. For example, in Table 1, the hypothesis that the evolution of FGF11 is associated with the evolution of mammalian hairlessness has 6,354.7 times more support than the null hypothesis that phenotype and gene evolution are not related. These large values are interpreted as supporting the alternative, which is equivalent to what we want to be able to interpret from p-values (i.e. low p-values allow us to reject the null and implicitly support the alternative).

      BF values in Table 1 are calculated using evolutionary rates in protein coding sequence and so are not expected to match values in the “Noncoding” columns. “BF Hairless” is directly related to the “Statistic” and “p-adj” columns, which is why the “BF Hairless” values are all quite large, indicating a large amount of support for an association between gene and phenotype evolution.

      The hypotheses that are tested with the “Statistic” and “p-adj” columns and the “BF HvM” column are colloquially the same: they both test to determine if the evolutionary rate of the gene is different in hairy mammals compared to hairless mammals. Only the details are different. The traditional statistics test for an association without accounting for marine mammals as a potential confounder. The BF tests check for a significant association that is driven more strongly by hairlessness than by marine habitat.

      Slightly more description of the Bayes factor calculation would be beneficial in the supplement, e.g. is the R package BayesFactor being used here, or something else?

      We agree that a clearer description of Bayes factors is appropriate and have modified the methods description as follows:

      “In addition to calculating element-specific association statistics, Bayes factors were calculated for each gene for the marine and hairless phenotypes using the BayesFactor R package (Morey & Rouder, 2021). These values were calculated to disentangle the two phenotypes, which are heavily confounded since nearly all marine mammals in the genome alignment used for this work are hairless. Briefly, Bayes factors are a Bayesian approach complementary to more standard statistical tests. Instead of returning statistics and p-values, Bayes factors directly quantify the amount of support for an alternative hypothesis. For example, a Bayes factor value of 5 for a particular statistical test would indicate 5 times more support for the alternative hypothesis than the null hypothesis. Bayes factors can also be used to compare different alternative hypotheses by calculating the ratio of two Bayes factors. When considering the hairless phenotype, we use Bayes factors to quantify the support for a linear model predicting phenotype using evolutionary rate information from each gene, with a higher Bayes factor indicating greater support. We perform this calculation for two alternative hypotheses: 1) a gene shows different evolutionary rates in hairless versus hairy species, and 2) a gene shows different evolutionary rates in marine species versus non-marine species. The ratio of Bayes factors between the hairless and marine phenotypes quantifies the level of support for one phenotype over the other and thus can be used to tease apart the intricacies of the two heavily confounded phenotypes. When the Bayes factor for the hairless phenotype is much larger than the Bayes factor for the marine phenotype, that indicates stronger support for signal driven by hairlessness.”
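      The ratio described in this methods text can be written out explicitly. In the notation below, which is ours and not the manuscript's, $D$ is a gene's data, $\mathcal{M}_H$ and $\mathcal{M}_M$ are the hairless and marine models, and $\mathcal{M}_0$ is the null model each is compared against. The numbers in the worked example are hypothetical and are not taken from Table 1.

$$
\mathrm{BF}_{H} = \frac{p(D \mid \mathcal{M}_{H})}{p(D \mid \mathcal{M}_{0})}, \qquad
\mathrm{BF}_{M} = \frac{p(D \mid \mathcal{M}_{M})}{p(D \mid \mathcal{M}_{0})}, \qquad
\mathrm{BF}_{HvM} = \frac{\mathrm{BF}_{H}}{\mathrm{BF}_{M}}
$$

$$
\text{e.g. } \mathrm{BF}_{H} = 200,\ \mathrm{BF}_{M} = 20 \;\Rightarrow\; \mathrm{BF}_{HvM} = \frac{200}{20} = 10 > 5
$$

If both Bayes factors were computed on the same data against the same null model, $p(D \mid \mathcal{M}_0)$ would cancel, and $\mathrm{BF}_{HvM}$ would reduce to $p(D \mid \mathcal{M}_H)/p(D \mid \mathcal{M}_M)$. Whether that condition holds depends on how the models are parameterized.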

      Why are the qq-plot distributions of non-coding elements so distinct compared to coding? Some comment on this would be appreciated in the main text, even if briefly.

      We have added the following text as a tentative speculation about why noncoding elements seem to show more signal than coding elements:

      “Interestingly, noncoding regions appeared to show even stronger deviation from uniformity than coding regions, perhaps because regulatory changes more strongly underlie the convergent evolution of hairlessness.”

      Reviewer #3 (Public Review):

      The authors present a phylogenetic analysis of evolutionary rates as they correlate with independently derived "hairlessness" across mammals. This is a very good paper, well written and very carefully analyzed. This paper makes a number of interesting biological insights, including the identification of protein coding as well as noncoding regions that appear to evolve in correlated fashion with hairlessness.

      I have several recommendations:

      1) The main assumption behind this experiment is that species "use" the same genes to accomplish hairlessness. Only then would one predict correlated rate shifts along hairless lineages. If, on the other hand, each hairless species used a unique gene to accomplish hairlessness, then one might only see a rate shift on that species' lineage. Therefore, a complementary approach might be to i) define all genes with known involvement in hair morphology (i.e., genes in the categories listed in Fig. 1C). ii) test how many of those genes show a significant rate shift in at least one hairless lineage. iii) test whether hair genes are more likely to show at least one rate shift compared to genomic background. This complementary analysis would relax the assumption that all hairless species show similar rate shifts compared to haired species.

      Our analyses focus on convergently evolving genomic elements associated with hairlessness for two reasons. First, species-specific analyses may detect genomic changes associated with any unique phenotype of a particular species, and it is difficult to distinguish which of those genomic changes are associated with hairlessness. Second, we are seeking genomic elements associated with hair growth in all mammals, and species-specific adaptations will not be shared across all mammals.

      Nevertheless, we conducted a complementary analysis to test for rate shifts specific to each hairless species compared with all of the non-hairless species. We then tested for enrichment of hair follicle genes among genes with significant rate shifts in different numbers of hairless species. For example, among all genes with significant rate shifts in at least one hairless species, is there an enrichment of hair follicle genes? Then, among all genes with significant rate shifts in at least two hairless species, is there an enrichment of hair follicle genes? We continued in this way until we tested for enrichment only in genes with rate shifts in all ten hairless species. As expected, the signal of enrichment gets stronger as more species share the rate shift (the “convergent signal”). This happens because the genes with shared rate shifts are more hair-specific than the genes with unshared rate shifts.
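      The cumulative enrichment test described above can be illustrated with a short sketch. It assumes, hypothetically, that per-species rate-shift significance has already been computed and summarized as the number of hairless species in which each gene shows a significant shift. For each threshold k, it reports the fold enrichment of hair follicle genes among genes shifted in at least k species, along with a one-sided hypergeometric p-value. The hypergeometric test is our assumption; the authors' exact enrichment statistic may differ. The function names are invented for the example.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrichment_by_sharing(shift_counts, hair_genes, n_species):
    """shift_counts: gene -> number of hairless species with a significant rate shift.

    Returns (k, n_selected, n_hair_selected, fold_enrichment, p_value) for each
    threshold k = 1..n_species, stopping early if no gene reaches the threshold.
    """
    genes = list(shift_counts)
    N = len(genes)
    K = sum(g in hair_genes for g in genes)  # hair follicle genes in the background
    results = []
    for k in range(1, n_species + 1):
        selected = [g for g in genes if shift_counts[g] >= k]
        n = len(selected)
        if n == 0:
            break
        x = sum(g in hair_genes for g in selected)
        # Fold enrichment = (x / n) / (K / N), written to keep integer arithmetic until the end.
        fold = x * N / (n * K) if K else float("nan")
        results.append((k, n, x, fold, hypergeom_sf(x, N, K, n)))
    return results
```

Under this framing, a convergent signal appears as a fold enrichment that rises with k, as the shared-shift gene sets become smaller and more hair-specific.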

      We also performed another analysis to test for enrichment of hair follicle genes among genes with significant rate shifts in each hairless species. For example, in orca, are the genes with significant rate shifts enriched for hair follicle genes? For comparison, we also repeated the procedure for non-hairless species. Only two of the ten hairless species show species-specific hair follicle enrichments, which indicates that most hairless species alone are insufficient to detect hair signal at all. Even in the two species with significant enrichment, thousands of genes are identified in total, many of which are likely related to characteristics of those species other than hairlessness, and it is impossible to distinguish the hair-related genes from the others without additional information.

      All of these results are now reported in the manuscript, in the text below and the accompanying figures:

      Species-Specific Analyses

      In addition to conducting convergent evolution analyses to identify genetic elements evolving at different rates across all hairless species, we also conducted complementary analyses to detect elements evolving at different rates in individual hairless species to demonstrate the importance of convergent evolution in our analyses. Indeed, the strength of enrichment for hair follicle-related genes among top hits steadily increases as more hairless species share rate shifts in those genes, an indicator of the power of the convergent signal (Figure 2). Further, analyses on single species alone only show enrichment for hair follicle-related genes among top hits in two hairless species out of ten – armadillo and pig (Figure 2 Supplement 1). Together, these results demonstrate the importance of testing for convergent evolutionary rate shifts across all hairless mammals to best detect hair-related elements.

      Also of note, every individual hairless species has thousands of genes with significant rate shifts in that species (Supp. File 10). It is impossible to tell which of those rate shifts are associated with hairlessness specifically, because the species have many unique phenotypes other than hairlessness that could be responsible for rate shifts in their respective genes. Convergent analyses allow for more concrete identification of hair-related elements by weeding out rate shifts that are not shared across species with the convergent hairless phenotype.

      2) It would be interesting to break up noncoding regions into additional strata. For example, one might predict that rate shifts in predicted transcription factor binding sites would have a larger functional impact than rate shifts in noncoding regions with no known function. Or that rate shifts in highly conserved noncoding regions would differ in impact from those in less conserved noncoding regions.

      We have performed extensive analyses to investigate the roles of TFBSs in the convergent evolution of hairlessness and found little enrichment of specific TFBSs in our top noncoding regions from RERconverge. Perhaps because these noncoding regions are highly conserved, they contain many potential TF binding locations, so it may be more reasonable to consider their full sequence as functional than it would be if they were less conserved.

      We have calculated conservation scores for noncoding regions and found no global association between RERconverge results and sequence conservation score.

      3) Why is aardvark considered a haired species? Aardvarks have as much (or as little) hair as pigs.

      Body hair is a difficult phenotype to categorize in mammals because all mammals have hair. To create a binary distinction between hairy and hairless mammals, we needed to choose where to draw that line. We were particularly concerned about the impact of classifying some of the hairier mammals, such as pig, armadillo, and human, as hairless, so we performed the drop-out tests shown in Figure 4 to demonstrate that removing individual hairless species from our analyses does not change the overall signal. Indeed, removing pig affects detection of genes in the two hair-related pathways shown less than removing clearly hairless species such as killer whale or dolphin does. We believe these results are sufficient to demonstrate that subtle differences in phenotyping decisions will not substantially change the findings stated in our manuscript.

      4) The primary goal of the paper is to identify coding/noncoding regions that show shifts in evolutionary rate that are correlated with hairless vs. haired lineages. I was left wondering... when these correlations are found, how often are they due to the same mutations hitting the regions vs. mutations randomly hitting the same regions? If the former, this would suggest that there are only limited ways in which species can achieve "hairlessness".

      In general, we do not expect amino acid convergence (for genes) or nucleotide convergence (for noncoding regions) to drive much of the signal we detect using RERconverge. For species separated by millions of years of evolutionary time, it is highly unlikely that a change in a single amino acid (or nucleotide) would drive exactly the same phenotypic change for a highly complex phenotype like hairlessness. However, we argue that there do appear to be some limited ways that species become hairless, albeit at the scale of evolutionary rates across a length of sequence rather than individual bases.

      Related to this point is the distinction between positively selected regions compared to regions under reduced constraint, which we would expect to accumulate mutations randomly.

      For genes, we believe that accelerated evolution of specific genomic regions in hairless species is caused by an accumulation of random mutations, not positive selection or specific targeted mutations. As stated in the manuscript, we performed branch-site tests for positive selection on our top genes, all KRTs, and all KRTAPs, and we found little indication that quickly evolving genes are undergoing positive selection specific to hairless species. This conclusion is also consistent with the hypothesis that genes under relaxation of evolutionary constraint will have rate shifts that are easier to detect over long periods of evolutionary time compared to genes under more subtle and short-lived periods of positive selection in association with the establishment of a new phenotype.

      For noncoding regions, it is much more difficult to distinguish positive selection from relaxation of evolutionary constraint because it is difficult to estimate neutral evolution for those sequences. Modeling positive selection in regulatory sequence is an emerging area of research, and current models are not yet reliable enough to distinguish positive selection from the accumulation of random mutations.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors took advantage of an existing protein-trap resource in zebrafish to identify genes important for normal pacemaker function in adults. They generated a collection of lines with mutations in genes expressed at reasonably high levels in the heart and assessed their ECGs. They identified 3 candidates with increased incidence of sinus arrest and focused on validation of dnajb6b. The dnajb6b mutant fish display other defects, including enhanced responses to atropine and carbachol, and bradycardia. They show that dnajb6b is expressed in a subset of cells in the zebrafish sinus node. In the mouse sinus node, DNAJB6-expressing cells have low expression of TBX3 and its target HCN4. In addition, Dnajb6+/- mice display similar phenotypes. Analysis of pacemaker function in ex vivo mouse hearts by high-resolution fluorescent optical mapping of action potentials revealed that the number of leading pacemakers in Dnajb6+/- hearts is decreased in the sinus node, with a concomitant increase in auxiliary pacemakers. RNAseq analysis of right atrial tissues detected expression changes in ion channels and in genes involved in Ca2+ handling and Wnt signaling. Overall, the results support the conclusion that DNAJB6 is important for proper sinus node function, adding it to the short list of sick sinus syndrome genes. However, the manuscript has several weaknesses.

      Weakness:

      The manuscript does not address the mechanism by which decreased DNAJB6B causes sick sinus syndrome. For example, it is unknown whether DNAJB6B functions cell-autonomously or non-cell-autonomously in the sinus node. The RNAseq analysis identified changes in ion channels in the right atrial tissues of 1-year-old mice; however, the cellular electrophysiology of sinus node cells was not assessed.

      The main goal of this research is to demonstrate the feasibility of discovering novel SSS genes in adults via a forward genetic approach in zebrafish. Thus, the key milestone is to prove the causality and specificity of the candidate genes identified in this screen, such as Dnajb6. A comprehensive mechanistic study will be the focus of future work.

      Nevertheless, we carried out the following experiments to address the mechanisms. Based on these data, a new section was added to the Discussion (Lines 424-465).

      (1) In mice, we performed additional antibody immunostaining and confirmed a negative correlation between the expression intensities of the Dnajb6 and Tbx3 proteins. We further detected a significantly increased Tbx3 immunostaining signal in the SAN tissues of Dnajb6 heterozygous mice compared to WT controls (new Figure 3D-F).

      (2) In zebrafish, we compared the expression patterns of the sqET33-mi59B conduction system reporter line between GBT411/dnajb6b heterozygous and homozygous mutants. We found that the atrio-ventricular canal (AVC) signal became diffuse in GBT411/dnajb6b homozygous adult hearts. In addition, the ring-like structure usually seen in the SAN region of WT controls and GBT411/dnajb6b heterozygotes was largely lost in 3 of the 9 GBT411/dnajb6b homozygous adult hearts examined (new Figure 2).

      Together with the ectopic pacemaker activity detected in the Dnajb6 heterozygous mice (new Figure 5A and 5B), these data lead us to speculate that Dnajb6 might act as a suppressor of the transcription factor Tbx3 in specifying the SAN pacemaker myocyte fate. Since Tbx3 has been reported to suppress chamber myocardial differentiation (Mommersteeg et al., Circ Res. 2007;100(3):354-62), upregulation of Tbx3 may contribute to the enhanced atrial ectopic activity in Dnajb6 heterozygous mice.

      Furthermore, TBX3 has recently been identified as a component of the Wnt/β-catenin-dependent transcriptional complex (Zimmerli et al., eLife. 2020;9:e58123), and Wnt signaling is significantly affected in Dnajb6 heterozygous mice (see new Figure 7B-C). This further supports a possible role of TBX3 in both SAN and atrial remodeling.

      (3) Finally, in collaboration with Drs. Grandi, Morotti, and Ni from the University of California, Davis, we used a population-based computational modeling approach to determine the cellular/ionic mechanisms that could underlie the SSS phenotype observed ex vivo in the Dnajb6 heterozygous mice (new Figure 6). We used our previously published model of the mouse SAN myocyte (Morotti et al. Int J Mol Sci. 2021; 22(11):5645) and extended it with both sympathetic and parasympathetic stimulation to model the isoproterenol- and carbachol-induced changes in pacemaker activity (i.e., firing rate), respectively. We generated a population of 10,000 mouse SAN myocyte models by randomly modifying selected parameters of the baseline model (maximal ion channel conductances and ion transport rates) and assessed the isoproterenol- and carbachol-induced effects in each model variant. We then separated this population into two subpopulations representing the WT and Dnajb6+/- phenotypes: namely, we extracted the model variants that recapitulate the changes observed in Dnajb6+/- vs. WT mice, including a reduced baseline firing rate, an increased response to isoproterenol, and a decreased response to carbachol (new Figure 6). This filtering yielded n=438 models corresponding to the Dnajb6+/- phenotype and n=6,995 models corresponding to the WT phenotype. Comparing parameter values between these two subgroups revealed several crucial parameters that are significantly correlated with the observed electrophysiological changes. In the Dnajb6+/- vs. WT model variants, the analysis revealed significant decreases in the maximal conductances of the fast (Nav1.5) sodium current, the L-type Ca2+ current (ICa,L), the transient outward, sustained, and acetylcholine-activated K+ currents, and the background Na+ and Ca2+ currents, as well as in the maximal release flux of the ryanodine receptor. We also found significant increases in the maximal transport rate of the Na+/Ca2+ exchanger (NCX) and in the conductances of the T-type Ca2+ current and the slowly activating delayed rectifier K+ current.

      These new studies provide novel mechanistic insights into the SSS phenotype observed in Dnajb6+/- mice. Importantly, these in silico experiments add another conceptual level to the phenotype-based screening approach introduced in the current study to identify new genetic factors associated with SAN dysfunction. Direct testing of these mechanisms would require extensive single SAN cell patch-clamp and confocal microscopy experiments, which are beyond the scope of the current manuscript and will be pursued in a follow-up study.
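      To make the population-of-models workflow concrete, here is a minimal Python sketch of its three steps: random parameter scaling, phenotype-based filtering into WT-like and mutant-like subgroups, and comparison of parameter values between subgroups. Everything model-specific is hypothetical: `toy_san_model` is an arbitrary algebraic surrogate, not the Morotti et al. SAN myocyte model, and the parameter names, the log-normal scaling with sigma = 0.2, and the +/-20% WT tolerance are illustrative assumptions, not the criteria used in the study.

```python
import math
import random
from statistics import mean

# Hypothetical parameter names (scaling factors relative to a baseline model).
PARAMS = ["g_Na", "g_CaL", "g_KACh", "k_NCX"]

def toy_san_model(s):
    """Hypothetical algebraic surrogate (NOT the Morotti et al. model).

    s: dict of scaling factors (1.0 = baseline). Returns (baseline rate in bpm,
    isoproterenol-induced rate change, carbachol-induced rate change).
    """
    rate = 300.0 * s["g_CaL"] ** 0.5 * s["g_Na"] ** 0.2 / s["k_NCX"] ** 0.3
    iso = 0.3 * rate * s["g_CaL"]    # positive chronotropic response
    cch = -0.4 * rate * s["g_KACh"]  # negative chronotropic response
    return rate, iso, cch

def build_population(n_models, sigma=0.2, seed=0):
    """Draw log-normally distributed scaling factors (median 1) for every parameter."""
    rng = random.Random(seed)
    return [{p: math.exp(rng.gauss(0.0, sigma)) for p in PARAMS} for _ in range(n_models)]

def classify(population, tol=0.2):
    """Split models into WT-like and mutant-like groups using assumed criteria."""
    ref_rate, ref_iso, ref_cch = toy_san_model({p: 1.0 for p in PARAMS})
    wt, mut = [], []
    for s in population:
        rate, iso, cch = toy_san_model(s)
        if rate < ref_rate and iso > ref_iso and abs(cch) < abs(ref_cch):
            # slower baseline, larger iso response, smaller carbachol response
            mut.append(s)
        elif all(abs(x / r - 1.0) <= tol
                 for x, r in ((rate, ref_rate), (iso, ref_iso), (cch, ref_cch))):
            # all outputs within +/- tol of the baseline model
            wt.append(s)
    return wt, mut

def compare_groups(wt, mut):
    """Mean log scaling factor per parameter, mutant-like minus WT-like.

    Raises statistics.StatisticsError if either group is empty.
    """
    return {p: mean(math.log(s[p]) for s in mut) - mean(math.log(s[p]) for s in wt)
            for p in PARAMS}

if __name__ == "__main__":
    wt, mut = classify(build_population(10_000))
    print(len(wt), len(mut), compare_groups(wt, mut))
```

In a real study each variant would be a full ODE simulation; only the sampling, filtering, and comparison logic carries over. Models matching neither criterion are simply discarded, which is why the two subgroups need not add up to the full population. Comparing mean log scaling factors is one simple choice; a per-parameter rank-based test would be a natural alternative.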

      The manuscript does not address why the zebrafish homozygous mutants are viable as adults while the mouse homozygotes are embryonic lethal. The GBT411 insertion disrupts dnajb6b(L) but not dnajb6b(S), while the mouse mutation deletes the entire gene. Does this difference partially explain the discrepancy?

      Indeed, the difference between zebrafish and mouse can be partially explained by the fact that only the long isoform of the dnajb6b gene, dnajb6b(L), was disrupted in the GBT411 mutant, whereas both the long (Dnajb6(L)) and short (Dnajb6(S)) isoforms of the Dnajb6 gene were largely deleted in the Dnajb6 knockout mice. However, we think the main reason is probably functional redundancy in zebrafish, which is absent in mouse: zebrafish has two dnajb6 homologues, dnajb6b and dnajb6a, whereas mouse has only one, Dnajb6. We added these points to the paper (Lines 377-379).

      Reviewer #2 (Public Review):

      In this manuscript, the authors expand upon previous work describing the development of a protein trap library made with the gene-break transposon. This library was screened to identify lines displaying gene trap expression in the heart (zebrafish insertional cardiac mutant collection). A pilot screen of these lines using adult ECG phenotypes identifies dnajb6b as a new gene important for cardiac rhythm. Using the GBT/dnajb6b zebrafish line, Ding et al. find that a proportion of aged homozygous mutant fish (1.5-2 years) present sinus arrest episodes and reduced heart rate. Treating GBT411/dnajb6b mutant adults with compounds revealed aberrant responses to autonomic stimuli, and sinus arrest episodes were induced following verapamil exposure, providing evidence that GBT411/dnajb6b is an arrhythmia mutant. This conclusion could be better supported by presenting specific ECG parameters to characterize the conduction defect more thoroughly. The authors then report that Dnajb6+/- adult mice recapitulate some of the phenotypes observed in zebrafish, including sinus arrest and AV blocks, as well as impaired (although different) responses to autonomic stimuli. The authors describe these as features of sick sinus syndrome in the absence of cardiomyopathy phenotypes in either the zebrafish or mouse lines. However, overall cardiac morphology is not well described for either the GBT411/dnajb6b or Dnajb6+/- models.

      We carried out additional experiments to examine left ventricular (LV) structure in Dnajb6 heterozygous mice at 1 year of age, using H&E staining, Masson's trichrome staining, and transmission electron microscopy (TEM). We now show clearly that there are no significant structural changes in the LV myocardium or in the atrial and SAN tissues of Dnajb6 heterozygous mice (new Supplemental Figures 3 and 5) at an age when the SSS phenotype is already noticeable. However, in GBT411/dnajb6b heterozygous mutants at ~2 years of age, we detected severe sarcomere structural abnormalities in 1 of the 3 fish hearts examined (see Response-only Figure 1). In addition, in a previous publication (Ding et al., Circ Res. 2013;112(4):606-17), we reported evident cardiac remodeling phenotypes in GBT411/dnajb6b homozygous fish at 12 months of age.

      Together, we have obtained more experimental evidence to strengthen the claim that arrhythmia is not due to cardiomyopathy/structural remodeling in the Dnajb6+/- mice. However, the evidence from fish remains weak. Therefore, we removed the claim that “when structural remodeling/cardiac dysfunction have not yet occurred” in fish and modified our statement in mice accordingly (Lines 372-377, 385-386).

      To further support a role for Dnajb6 in sinoatrial node dysfunction, the authors performed optical mapping of action potentials from isolated mouse atrial tissue. These data reveal that Dnajb6+/- cultures exhibit ectopic pacemakers outside of the sinoatrial node, including within the atrial wall and inter-atrial septum. These data also show prolongation of SAN recovery time at baseline and following autonomic stimulation, further suggesting SAN dysfunction. RNA-sequencing of Dnajb6+/- adult right atrial tissue revealed differentially expressed genes encoding Ca2+ handling-related proteins, ion channels, and WNT pathway-related proteins. As these genes are involved in the cardiac conduction system, the authors suggest these pathways as molecular mechanisms underlying SSS phenotypes in Dnajb6 models.

      Sick sinus syndrome is a relatively rare arrhythmia most commonly found in older populations. Therefore, it has been challenging to establish clinically relevant models and there is a limited understanding of mechanisms of SSS pathogenesis. One particular strength of this manuscript is the ECG phenotype-based forward screen of the gene-breaking transposon (GBT)-based gene trap library in aged animals. This pilot study provides proof-of-concept that this screening approach is well suited to identify regulators of cardiac function in adults and genes linked to adult diseases like SSS.

      Thank you very much for recognizing the major strength of our manuscript!

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the role of a long noncoding RNA, VPS9D1-AS1 (VPS), in colorectal cancer (CRC). They found that a high level of VPS was negatively associated with T cell infiltration in CRC patients; in cell line-derived xenograft models or a conditional knock-in mouse model, VPS overexpression enhanced tumor growth and suppressed the infiltration of CD8+ T cells, which was reverted by VPS antisense oligonucleotide (ASO) treatment. They also investigated the molecular mechanisms underlying VPS function and revealed a VPS/TGF-β/ISG signaling cascade in tumor cells and IFNAR1-dependent crosstalk between tumors and T cells.

      The authors have performed extensive analyses of the functions of VPS using patient samples, CRC cell lines, xenograft tumors, and drug-induced tumors, and the data are of relatively good quality. They targeted VPS overexpression in cell line-derived xenografts or mouse tumors by ASO treatment as a potential therapeutic, although the overexpression level may not be physiologically relevant. The authors also made great efforts to explore the mechanisms in vitro and proposed a very interesting model of a ribosome/VPS/TGF-β/ISG signaling axis in tumor cells with opposing regulation of IFNAR1 in tumor and T cells; however, the mechanistic model was tested only in vitro, not in the cell line-derived xenografts or mouse tumors used in the study, which undermines the authors' claims.

      We thank Reviewer #1 for these positive comments, and we take the critical comments about the in vivo data very seriously.

      Reviewer #2 (Public Review):

      In this paper, Yang et al. seek to show the importance of the lncRNA VPS9D1-AS1 in the biology and pathology of colorectal cancer (CRC). They start with the analysis of patient data and proceed to cellular and animal cancer models.

      Specifically, the authors report higher VPS9D1-AS1 levels in tumor tissues in two independent cohorts of CRC patients. There was a positive association between VPS9D1-AS1 levels and molecules involved in TGFb signaling, yet a negative association between VPS9D1-AS1 levels and levels of tumor-infiltrating CD8+ T cells (and a negative correlation of these levels of tumor-infiltrating CD8+ T cells and protein expression of molecules involved in TGFb signaling). Cell line studies revealed a positive feedback loop between VPS9D1-AS1 and TGFb signaling molecules, with a cell-intrinsic, pro-proliferative, and pro-survival effect of VPS9D1-AS1 on CRC cancer cells. VPS9D1-AS1 also controls the expression of several genes in the IFN pathway, in particular the ISGs IFI27 and OAS1. In addition, IFI27 and OAS1 expression are controlled by TGFb, TGFBR1, and SMAD1, and the promoter of OAS1 is targeted by SMAD4 (but also TGFb), which binds to it. VPS9D1-AS1 expression in tumor cells promotes PD1 expression and negatively affects IFNAR1 on T cells to reduce their effector functions. In vivo, MC38 CRC cells overexpressing VPS9D1-AS1 show increased tumor growth in mice, and animals with transgenic VPS9D1-AS1 expression in the intestine develop larger CRC lesions upon AOM/DSS treatment. Finally, in vivo targeting of VPS9D1-AS1 using anti-sense oligo reduced tumor size. The data indicate a series of intricate molecular and cellular interactions and suggest that VPS9D1-AS1 can help with patient stratification, improving prognostic prediction and allowing for personalized treatment.

      Taken together, there is a multitude of datasets and several complementary experiments using patient-derived samples, genetically engineered cell lines, and mouse models. The paper includes many avenues of inquiry that cover the broad fields of cancer molecular biology, biochemistry, and pathogenesis. However, this broad approach makes the paper difficult to follow at times and also leads to numerous typographical and interpretive (but largely not methodological) mistakes. In addition, the quality of some of the figures needs to be improved before they can be properly evaluated.

      In methodology, the authors are largely successful, and I would not recommend major changes to the work, other than to recommend a "focusing" of the manuscript objectives, or a paring of the data to better convey the desired story.

      The experiments presented herein, particularly those that test the efficacy of targeting the lncRNA as a cancer therapeutic, are important for the field and should be of high interest to other cancer biologists.

      We thank the reviewer for these constructive comments. We have addressed all of your concerns.

      Reviewer #3 (Public Review):

      The authors have accomplished a large amount of work to demonstrate the role of VPS9D1-AS1 in promoting immune escape from cytotoxic T cells, and the mechanistic exploration, together with the in vivo experiments showing the translational significance of this target, is sufficient to support the conclusions. However, the logic of the diagram needs improvement, and several revisions are warranted.

      We thank the reviewer for the positive comments. We have revised our manuscript according to your suggestions.

    1. Author Response

      Reviewer #2 (Public Review):

      This study has investigated the pathway for degradation of the inner nuclear membrane protein SUN2. Based on earlier studies that had searched for bTrCP substrates and interactors, the authors postulated that SUN2 might be a target of this ligase. They found two potential bTrCP recognition sites and showed that the second of these, Site2, is important for SUN2 turnover. A phospho-mimetic mutant is turned over faster, and a phospho-resistant mutant is turned over slower. The degradation is slowed by inhibitors of Neddylation (and therefore, of Cullin Ring Ligases), inhibitors of p97, and proteasome inhibitors. Using a genetic screen, they find bTrCP, components of the Cullin ring ligase, p97, the proteasome, and subunits of CK2. They use inhibitors to show that CK2 is needed for maximal SUN2 degradation, and a phosphatase called CTDNEP1 antagonises CK2-mediated SUN2 degradation. Using a non-degraded variant of SUN2, the authors show that its overexpression can influence nuclear morphology and various nuclear functions. In sum, the authors outline a pathway for regulated degradation of the inner nuclear membrane SUN2.

      The study is generally sound in its logic, well written, and appropriately interpreted for the most part. The data are of high quality. The findings are new and will provide a foundation to now examine how LINC complex abundance is regulated. I have a number of suggestions for improvement, listed in order of importance. Only the first two should require any experimental work, and the second item is potentially optional depending on the authors' response. The remaining items can be handled with adjustments to the manuscript.

      1) It is surprising that nowhere in the paper is an experiment directly and rigorously establishing that bTrCP is required for SUN2 degradation. I realise this is quite plausible from the shown experiments, but it seems to be a rather glaring oversight (apologies if I have missed it somewhere). At present, the current evidence for its role is the similarity of Site2 to a bTrCP recognition motif, the physical interaction of SUN2 with bTrCP, and the modulation of this interaction by mutants intended to mimic or eliminate phosphorylation. The inhibitor experiment is not strong evidence because it inhibits all CRLs. I would therefore recommend, at the least, to present an experiment knocking down or out bTrCP2 (i.e., FBXW11, which nicely showed up in the genetic screen). This simple experiment could be included in the validation experiments in Fig. S4b. It would be worth also including FBXW1A for comparison, and if needed, the double-knockdown. This seems essential to complete the study.

      We thank the reviewer for this suggestion. These experiments have now been included in the manuscript. For further information, see our response to the Editor's point 1.

      2) The experiments with TBCA are not complemented with knockdown experiments of CK2 subunits. I realise CK2 is essential, but cells can evidently tolerate acute knockdown sufficiently well to do experiments given that this came up in the CRISPR screen. I would think such knockdown experiments would strengthen the argument and mitigate any concern about the off-target effects of TBCA. Kinase inhibitors are often only partially specific, so arguments about the involvement of any kinase are stronger if inhibitor studies are complemented with genetic perturbations.

      We thank the reviewer for this suggestion. We have now included knockdown experiments with independent sgRNAs that validate our conclusion on the role of CK2 in SUN2 degradation (Figure Supplement 4C). In addition, we would like to point out that besides the essential CK2 regulatory subunit CSNK2B, our genome-wide screen also identified the catalytic subunit CSNK2A2 (non-essential, as it is redundant with CSNK2A1) (see Figure S3A). Since our library contains 4 sgRNAs per gene, this makes a total of 8 sgRNAs targeting subunits of CK2. Importantly, no other kinase was identified in our screen. Moreover, TBCA is well established as a specific CK2 inhibitor. Altogether, these observations make us quite confident that CK2 is the prime kinase controlling SUN2 stability.

      3) Lines 173-183: MLN4924 is used interchangeably with inhibition of SCFbTrCP. But MLN4924 is an NAE inhibitor that indirectly inhibits all CRLs. It seems premature to invoke SCFbTrCP as being involved because the experiments have not yet established a role for this specific CRL (see point 1 above). Instead, the conclusion should be that the data indicate a role for one or more CRLs. At this point in the narrative, the only evidence that bTrCP is involved is the sequence similarity of site1 and site2 to canonical bTrCP recognition sites. However, this is not enough evidence as no experiments knocking down or knocking out bTrCP, or experiments showing a physical interaction, have been presented yet. That comes in the subsequent section.

      We thank the reviewer for pointing this out. The text was modified and additional data on the depletion of βTrCP has been included in the revised manuscript to support our conclusions.

      4) Line 195 - At this point in the narrative, there is no evidence that SUN2 is ubiquitinated by SCFbTrCP. This needs to be rephrased. I would think one can conclude at this point that SUN2 is degraded by a pathway that relies on a CRL, p97, and the proteasome. The degradation is controlled by Site2, potentially by phosphorylation (again, this has not really been established at this point in the story, even if it seems plausible based on the mutagenesis).

      The sentence has been modified.

      5) I think the discussion needs to include some thoughts on what the authors believe happens to the rest of the SUN2 trimer or more broadly, the LINC complexes. In other words, what is the consequence of degrading a single protein of a much larger complex? In this vein, the model shows monomeric SUN2. Is it worth showing that it is part of a trimer and part of the LINC complexes? Regardless of how the authors depict the model, discussing this issue seems worthwhile.

      We thank the reviewer for the suggestion. We observe that the turnover rate of endogenous SUN2 is affected by the exogenous expression of SUN2 and primarily of its derivatives Site 2A and Site 2D (Figure 1E). These effects are likely due to the assembly of trimers containing both endogenous and exogenous SUN2. This observation also suggests that degradation of one subunit of the trimer leads to the degradation of the other two. However, in the current manuscript we do not directly test these models or analyse SUN2 complexes.

      6) Lines 225-226 - again, MLN4924 is not an inhibitor of SCFbTrCP, but rather a CRL inhibitor. The evidence for bTrCP being the key ligase is still missing at this point in the narrative.

      We now present evidence in an earlier figure that βTrCP is the F-Box involved in SUN2 degradation. In this context, the sentence appears correct.

      7) Fig. 5G is not especially convincing - to my eye, the effect on endogenous SUN2 is very similar to the effect on the transgene SUN2-site2A mutant, but simply a fainter exposure. Can the authors provide some numbers to allay this concern? It might well be that there is little difference between the behaviour of the endogenous and exogenous SUN2 in this experiment because they engage in heterotrimeric complexes. Also, why is the transgenic SUN2 not detected on the SUN2 blot? Would it not be evident at ~100 kD?

      We have consistently seen that SUN2 Site 2A is refractory to CTDNEP1 regulation. The blot has been replaced to better convey this result.

      The transgenic SUN2 is not detected in this blot because, although the same cell lines were used for this experiment, doxycycline was not added to these cells in order to visualise the endogenous SUN2. Thus, two sets of lysates were collected: one from cells treated with doxycycline (transgene) and one from cells without doxycycline (endogenous). This is explained in the figure legend.

      8) In panel 1E, the heterologously expressed SUN2 protein has two bands, with the upper band being more readily degraded than the lower band in some cases. Is the upper band the phosphorylated product? Might be worth a comment if anything is known about what the two bands represent.

      We believe that the two bands do not correspond to different phosphorylated forms of SUN2. This is based on the analysis of SUN2 by SDS-PAGE in the presence of Phos-tag reagent and on the fact that the two bands are also seen for both the non-phosphorylatable and phospho-mimetic SUN2 derivatives. The appearance of two bands has been observed for other ERAD substrates characterized in our lab (for example, Weijer et al. 2020) and appears to depend on the lysis conditions (see, for example, Figures 2 and 3).

      9) Worth mentioning in the main text that FBXW11 is bTRCP2. Also, it is worth noting whether bTRCP1 (FBXW1A) was a hit on the screen or not.

      Thanks for the suggestion. We have now included this information.

      Reviewer #3 (Public Review):

      The manuscript by Krshnan et al. reports a cellular mechanism akin to endoplasmic reticulum-associated degradation (ERAD) that degrades SUN2, an inner nuclear membrane protein. The authors previously identified the Asi ubiquitin ligase complex that mediates the degradation of inner nuclear membrane proteins in budding yeast. In this manuscript, they identify SCFβTrCP as another ligase that regulates the ubiquitination and degradation of SUN2 in mammalian cells. The key findings include the identification of a substrate recognition motif that appears to undergo casein kinase (CK)-dependent phosphorylation. Mutagenesis studies show that mutants defective in phosphorylation are stabilized, while a phospho-mimetic mutant is less stable. They further show that the degradation of SUN2 requires the AAA ATPase p97, which allows them to draw an analogy between SUN2 degradation and the Vpu-induced degradation of CD4, which occurs on the ER membrane via the ERAD pathway. Lastly, they show that the stability of endogenous SUN2 is regulated by a phosphatase and that overexpression of a non-degradable SUN2 variant disrupts nuclear envelope morphology, cell cycle kinetics, and DNA repair efficiency. Overall, the study dissects another example of inner nuclear envelope protein turnover and the involvement of a kinase/phosphatase pair in this regulation. The data are of extremely high quality and the manuscript is clearly written. That being said, the following questions should be addressed to improve the robustness of the conclusions and to avoid potential misinterpretation of the data.

      1) Since SUN2 is normally incorporated into a SUN2-SYNE2-KASH2 LINC heterohexamer complex, the authors should be cautious with the use of over-expressed SUN2 in this study. Over-expressed SUN2 is expected to stay mostly as unassembled molecules and thus is likely degraded by a protein quality control mechanism that targets unassembled proteins. Consistent with this possibility, CK2 has been implicated in the regulated turnover of aggregation-prone proteins (Watabe, M. et al., JCS 2011). This mechanism would be potentially distinct from the one proposed for endogenous SUN2 degradation.

      We thank the reviewers for the suggestion to provide further genetic evidence of the involvement of the βTrCP1 and βTrCP2 F-box proteins in the degradation of SUN2. We now show that maximum stabilization of endogenous (Figure Supplement S4D) and transgenic (Figure S2) SUN2 is observed upon simultaneous depletion of βTrCP1 and βTrCP2, indicating that these F-box proteins are redundant. Depletion of βTrCP1 alone did not affect SUN2 levels, whereas depletion of βTrCP2 increased SUN2 steady-state levels, with the effect being more pronounced for overexpressed SUN2. Depletion of other F-box proteins did not affect SUN2 levels, indicating that the effect observed upon βTrCP depletion is specific (Figure S2B). These results are in line with the results of our genome-wide screen (Figures 4 and S3) and with the literature. The differences between the effects of βTrCP1 and βTrCP2 depletion likely reflect the relative abundance of the two F-box proteins in the HEK cells used in this study.

      2) Certain conclusions appear to be an overstatement. This is particularly the case for the title, which implies that SUN2 is a protein that undergoes regulated turnover (under certain physiological conditions). Given that CK2 is a constitutive kinase and that the authors have not identified the conditions under which the activity of CTDNEP1 is regulated, it is premature to make such a conclusion.

      We disagree with the reviewer on this point. We present clear evidence that the turnover rate of SUN2 (both overexpressed and endogenous) is regulated by opposing kinase/phosphatase activities. This per se implies a mode of regulation. Similar kinase/phosphatase balances regulate a plethora of physiological processes (from cell cycle progression to DNA repair), and the term "regulation" is commonly used in these contexts. We agree with the reviewer that the upstream events controlling SUN2 remain elusive; however, we do present evidence that the balance of CK2 and CTDNEP1 activities regulates SUN2 degradation.

      3) Likewise, the demonstration of the impact of SUN2 accumulation on different cellular pathways mainly relies on the over-expression of a non-degradable SUN2 mutant. Whether similar defects could be seen when the degradation of endogenous SUN2 is blocked remains an open question.

      Ideally, we would gene-edit the SUN2 locus to introduce the desired mutations. However, as the reviewer points out, this is not trivial, particularly because the mutations would need to be introduced into both chromosomal copies.

    1. Author Response

      Reviewer #1 (Public Review):

      “Overall this is an interesting study of the function of ATP6AP2 in the osteoblastic lineage. This gene is unstudied in the osteoblast, despite its known role in WNT signaling. In this study, the authors first show that loss of this gene in mature osteoblasts results in a strong cortical bone phenotype, with reduced osteocyte numbers and disorganized collagen. This phenotype is not present at birth but progressively worsens as the animals reach weaning age. In the compact bone, they show that loss of ATP6AP2 results in osteocytes largely devoid of dendritic processes. Loss of this gene starting at the osteocyte stage results in a milder phenotype. They then show that the remaining osteocytes have reduced MMP14 and that partial restoration of MMP14 attenuates the severity of the cortical phenotype.”

      Strengths

      “This study uses cutting-edge microscopy to thoroughly characterize how and where the loss of ATP6AP2 in either the mature osteoblast or the osteocyte results in disorganized bone. Innovative proteomics techniques are used to identify cell surface proteins, including MMP14 that may mediate this phenotype. Two cre-drivers are used to determine when in the osteoblast-osteocyte lineage this gene has the maximum effect. Lastly, in vivo lentivirus replacement is used to test if the replacement of MMP14 can rescue the phenotype. This latter experiment solidifies the importance of MMP14 as a major player in the downstream sequela of ATP6AP2 action.”

      Weaknesses

      “Unfortunately, all of the histology is conducted on demineralized bone, and counts of osteoblasts and osteoclasts on the bone surface are not presented. This reduces the ability to interpret all downstream work. As such, the extent of the mineralization defects is difficult to interpret. Much of this paper is focused on the osteocyte, which is curious as the phenotype of the mature osteoblasts ATP6AP2 knockout mice is so much more severe than that of the osteocyte ATP6AP2 knockout mice. While it is clear how MMP14 was identified as being deficient in the mature osteoblasts ATP6AP2 knockout cells, it is not obvious how this gene became the sole focus of the remainder of this paper. This phenotype progresses as the mice become ambulatory and therefore weight bearing on their limbs. This could partially explain the presentation of the mouse phenotype, but this is not discussed.”

      Good suggestions! We have performed the suggested experiments on mineralized bone sections, and quantified both osteoblasts and osteoclasts on the bone surface.

      The results, shown in Fig. 1A-B above, demonstrated increased osteoclast numbers on both trabecular and endocortical bone surfaces in ATP6AP2 mutant mice, accompanied by elevated bone resorption (see Fig. 1C, measured by serum PYD levels). However, treatment with the bisphosphonate alendronate, an inhibitor of osteoclast activity, restored trabecular bone mass but had little effect on the cortical bone phenotype of the mutant mice (Fig. 2A-F). These results suggest that the cortical bone phenotype of the mutant mice is independent of osteoclast activity.

      We therefore further investigated the cortical bone phenotype and its osteoclast-independent underlying mechanisms. Whereas no significant change in the number of osteoblasts was detected in the metaphyseal region of the femur in ATP6AP2Ocn-cre mice (see Fig. 6A-B below), we did detect a mineralization deficit in the mutant mice in both in vivo and in vitro experiments (see Fig. 5 and Fig. 7). These results suggest that the increased cortical woven bone in the mutant mice is likely due to impaired replacement of woven bone with mineralized cortical bone matrix.

      Additionally, the expression levels of ATP6AP2 in osteocytes appeared to be similar to those in osteoblasts and BMSCs (Fig. 8). The weaker phenotype of osteocyte ATP6AP2 knockout mice (ATP6AP2DMP-Cre) compared with BMSC/osteoblast ATP6AP2 knockout mice (ATP6AP2OCN-Cre) led us to speculate that ATP6AP2 in Ocn-Cre+ osteoblastic cells (e.g., immature osteocytes) may play a more critical role than in DMP1-Cre+ mature osteocytes in regulating cortical bone matrix remodeling and osteocyte development.

      These results and points, which are in line with our model, will be included in a revised manuscript.

      Reviewer #3 (Public Review):

      “In this work, the authors have assessed the bone phenotype of a mouse with targeted ablation of the vacuolar ATPase accessory protein ATP6AP2 in the osteoblast lineage. They observe a clear increase in cortical thickness, but the cortex is highly porous and contains remnant cartilage as well as extensive woven bone. They then follow this by suggesting that one cause of this phenotype may be a change in the surface expression of the protein MMP14, a matrix metalloproteinase, known to be involved in bone matrix degradation, at least in osteoclasts. They provide evidence that this protein may also regulate matrix degradation surrounding osteocytes and an increase in this protein in osteocytes lacking ATP6AP2 may be a cause of the initial phenotype described.”

      While the phenotype described is very dramatic, the interpretation that it reflects a defect in osteoblast to osteocyte transition is questioned by this reviewer. The phenotype appears to be an osteopetrosis, including a lack of remodelling of the cortex. Cartilage and woven bone are not replaced effectively by lamellar bone. The bone contains ample osteocytes, but they are the osteocytes typical of woven bone, with rounded cell bodies, disordered organisation, low sclerostin expression, and short dendritic processes. The defect in the ATP6AP2 mice is a lack of cortical remodelling during cortical consolidation (for review see PMID: 34196732). Cartilage and woven bone remnants, which are normally remodelled as cortical bone matures, remain in the cortex until adulthood. It is not clear whether this results from reduced or increased remodelling of the cortex, but it is not because the osteoblasts cannot form osteocytes.

      Some of the data is very challenging to interpret because of low sample numbers (n=4 for much of the analysis), and lack of detail as to the sex of the animals. Regions used for imaging, histomorphometry, and dynamic histomorphometry all need to be defined throughout the work. Since the cortex differs dramatically by site, and by distance from the growth plate (due to the different stages of maturation) this is critical. Some methods are not defined, although they could be of great use to the field (e.g. the method for assessing bone degradation by MMP14).

      Good suggestions! We will describe the results more precisely in a revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, Bentley et al. describe the development and use of a novel microfluidic platform to study motility of green algae. By confining algae to circular corrals of various diameters (and with a height that renders the system quasi-two-dimensional), the authors gather extremely long time series of the swimming trajectories under various degrees of lateral confinement, in the presence of several different kinds of perturbations.

      The data is presented in a number of ways, most importantly by means of transitions between the three characteristic states of motion for these algae. This allows contact to be made with ideas from nonequilibrium dynamical systems by examining the transition probabilities between those states and identifying nonequilibrium characteristics of the fluxes between them.

      Overall the work is extremely impressive in terms of the data acquisition and careful time series analysis. The work falls short, though, in not following through on the many interesting observations that can be deduced from the data to reach precise conclusions about the biology and physics. For example, we see in Figs. 2 and 3 the effects of confinement on the trajectories, leading to clearly chiral motion at the strongest confinement. I would have expected the next step of the analysis to be a study of this problem in the context of, say, a Fokker-Planck equation for the probability distribution function of orientations, complete with boundary conditions that encode the scattering laws that we know from prior work by Kantsler et al. and others. Similar comments can be made about the other observations, which are not followed up with any clear mechanistic analysis or comparison with theory.

      The example above suggests that this paper, in its current form, is more akin to a "Methodology" paper than one that discovers new phenomena and explains them.

      We thank the reviewer for their summary of our work and for these pertinent comments. As discussed above, we performed new experiments and modelling that answer the main question of why chiral circling appears in the smallest traps (highest confinement), and also why the chirality depends on light. As demonstrated in the earlier study by Cammann et al. (PNAS) and, to some extent, by Ostapenko et al. (PRL), encoding the scattering laws measured by Kantsler et al. for a basic swimmer produces chiral (circular) motion. An analytical treatment in terms of Fokker-Planck equations already appears in these prior studies. The novel question here is why this circular movement should remain chiral in the time-averaged sense.

      In the revision, we restrict ourselves to a conceptual level, showing how a small internal asymmetry at the cellular level suffices to produce macroscopic chirality and how this depends on the size of the trap. Our new explanation reaches a precise conclusion about how a fundamentally biological phenomenon (slightly asymmetric flagellar beating in a phototactic swimmer) leads to a confinement-induced physical phenomenon (a preferred sense of circular swimming).

      In a separate follow-up study, we will extend our model to incorporate more realistic parameters from the current dataset (e.g. time-dependent speed, stochastic reorientations, shock-responses, softness of the potential etc) to understand more subtle aspects of the high-resolution data we acquired.

      Reviewer #2 (Public Review):

      The authors use microfluidic devices to follow single swimmers for long periods, measuring their movement in detail and allowing detailed statistics and machine learning at a level that has never been possible before.

      Its strength is the extraordinary detail and the doors opened by the quality of the resultant data. As such it makes a substantial contribution to a narrow field and adds slightly more subtly to an important field of full mathematically accessible descriptions of migration phenotypes.

      Its weakness is that these tools are not yet used for any particularly enlightening tests. The directed probability fluxes are interesting, but not surprising. The strength of this paper is in the method, the analysis, and the ability to generate rigorous datasets.

      We thank the reviewer for highlighting the quality and detail of our datasets, and we agree with the criticisms raised. We hope these weaknesses are now rectified in the revision. There is clearly scope to do much more now that we have access to this data, and we demonstrate this in the revision with our new model/interpretation.

      We highlight three innovations of our work that may not have been made clear in the previous draft.

      1. We have suggested a new paradigm for the analysis of microbial motility and behaviour. The extraction of state transition probabilities from single-cell trajectories reveals exactly how motility changes at the subcellular level, which is much more informative than whether an organism speeds up or slows down on average. It tells us how a given individual modulates the balance of possible behavioural states in response to its environment and over time. These concepts apply not just to microbes but to any behaving organism.

      2. The emphasis on keeping track of the ‘arrow of time’ in the analysis of movement trajectories is important and can again be applied to any organism. As discussed above, while circling behaviour or symmetry breaking in confinement may not be surprising in itself (though that has not prevented a flurry of experimental and theoretical papers on this topic), we argue that the emergence of chirality in the time-averaged trajectory is surprising and does require more subtle treatment. We now suggest that this is down to a very small amount of internal symmetry breaking; it is interesting that such a small amount of symmetry breaking at the subcellular scale can manifest as robust symmetry breaking at the macroscopic scale.

      This kind of insight has broad implications for understanding how (even simple) organisms can dramatically alter how they interact with their physical environment by effecting even minute internal adjustments. This could also motivate the design of novel biomimetic artificial devices or microswimmers.

      3. Our approach of fusing droplets to investigate rapid motility responses to chemicals has plenty of potential for drug screening and also for investigating cellular transduction pathways (e.g. functional assays of mutants). We demonstrate its operation here as a proof of concept on one species for one chemical only, but there are clear advantages over traditional approaches that set up chemical gradients or similar, where it is impossible to get a handle on either instantaneous cell reactions or individual-level responses.
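      To make points 1 and 2 concrete, the following is a minimal sketch, not the authors' pipeline, of how first-order transition probabilities and net directed probability fluxes can be estimated from a single cell's sequence of discrete motility states. The state labels "A", "B", "C" are hypothetical placeholders for the three characteristic states, and a per-frame state sequence is assumed. Under detailed balance the expected net flux between any pair of states vanishes, so a persistent cyclic flux is one signature of nonequilibrium (time-irreversible) dynamics.

```python
from collections import Counter


def transition_analysis(states, labels):
    """Estimate transition probabilities and net directed fluxes.

    states: sequence of discrete state labels, one per frame (assumed input).
    labels: all possible state labels.
    Returns (probs, flux) where
      probs[i][j] = N_ij / sum_k N_ik   (row-normalised transition counts)
      flux[(i, j)] = (N_ij - N_ji) / T  (net flux per observed transition)
    """
    # Count every observed transition i -> j between consecutive frames.
    counts = Counter(zip(states, states[1:]))
    n_transitions = len(states) - 1

    probs = {}
    for i in labels:
        row_total = sum(counts[(i, j)] for j in labels)
        probs[i] = {
            j: (counts[(i, j)] / row_total if row_total else 0.0) for j in labels
        }

    # Antisymmetric net flux: zero in expectation when detailed balance holds.
    flux = {
        (i, j): (counts[(i, j)] - counts[(j, i)]) / n_transitions
        for i in labels
        for j in labels
        if i != j
    }
    return probs, flux


# Usage: a strictly cyclic sequence A -> B -> C -> A has a clear arrow of time.
probs, flux = transition_analysis(list("ABCABCABCA"), ["A", "B", "C"])
print(probs["A"]["B"], flux[("A", "B")])
```

      Counting self-transitions in the row totals means the probabilities describe frame-to-frame persistence as well as switching; an analysis that only considers state changes would drop the diagonal first.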
    1. Author Response

      Reviewer #1 (Public Review):

      This paper follows several innovative articles from the authors exploring the molecular mechanisms of insulin and IGF1 receptors activation by their ligands using cryo-electron microscopy. Here the authors explore the role of an alpha helical C-terminal segment (called the alpha-CT motif) of a disordered disulfide-linked insert domain in the FnIII-2 module of the insulin and IGF1 receptors (at the end of the alpha subunit), in the mechanism of ligand binding, negative cooperativity and receptor activation.

      Biochemical data gathered over several decades have suggested that insulin and IGF1 use two separate binding sites, site 1 and site 2, to bind to two distinct domains (sites 1 and 2, and 1' and 2') on each protomer of the homodimeric receptors, arranged with antiparallel symmetry. This arrangement was corroborated by the early X-ray crystallographic studies of the unliganded insulin receptor ectodomain (apo-receptor). A subsequent, somewhat surprising finding was that insulin receptor site 1 is in fact a composite, made of the beta surface of the L1 module of one protomer and the alpha-CT motif of the other protomer, which binds perpendicularly to the L1 surface (a "tandem binding element"), with insulin binding more to the alpha-CT motif than to L1.

      Previous work from the authors showed that the subsaturated insulin receptor has an asymmetric configuration, while the receptor saturated with four insulins has a symmetric T-shaped configuration. In contrast, the IGF1R shows only one IGF1 bound in an asymmetric configuration, indicating, according to the authors, stronger negative cooperativity. This is attributed to a more rigid and elongated conformation of the alpha-CT motifs that restricts the structural flexibility of the alternate binding site.

      To test this hypothesis, the authors determined the cryo-EM structure of IGF1 bound to IGF1R with a mutated alpha-CT motif elongated by four glycine residues. Strikingly, a portion of these constructs adopt a T-shaped symmetric structure.

      Conversely, they show that an insulin receptor whose alpha-CT-containing insert domains are no longer covalently linked (through mutation of the cysteines to serines) adopts asymmetric conformations when bound to insulin, even at saturating insulin concentrations. They conclude that the disulfide-linked alpha-CT insert domains of the insulin receptor play an important role in the structural transition from the asymmetric to the symmetric state during insulin-induced insulin receptor activation.

      All in all, this is a very interesting and well-designed study that represents an advance in the knowledge of the insulin/IGF1 receptor systems, although the details of the structural interpretations deserve some discussion.

      This is a very clear and succinct summary of our work. We thank Dr. Pierre De Meyts for the positive assessment of our manuscript, and we greatly appreciate the constructive comments, which we have addressed.

      Reviewer #2 (Public Review):

      Li et al build upon recent observations that the alphaCT peptide is a key element in the IGF-1R and IR that regulates negative cooperativity and receptor activation. The use of IGF-1R and IR mutants builds upon previous observations with these mutants by Li et al (IGF-1R) and Weis et al. (IR).

      Here they determined the structure of the IGF-1R mutant IGF-1R-P673G4, which has a four-glycine motif inserted at residue P673, at 4 Å resolution. With structural flexibility introduced into the alphaCT, the IGF-1R is able to bind two IGFs and adopts a symmetric T conformation, in contrast to the single-IGF-bound WT IGF-1R, which adopts an asymmetric conformation. The ability to bind two IGFs is taken as a sign that negative cooperativity has been affected and confirms the importance of the alphaCT in constraining the IGF-1R to the asymmetric conformation. The increased flexibility of the alphaCT linkage between the two receptor monomers results in a reduced ability of IGF-I to activate the IGF-1R and Erk, and in reduced IGF-1R internalisation. This is consistent with previous reports that effective Erk signalling depends on endosomal signalling. A second mutant, IGF-1R-3CS, in which a cysteine triplet in the alphaCT is mutated to serine to perturb the disulfide bonding between the alphaCTs of the two monomers, was also used. IGF-1R activation and signalling by this mutant were also reduced.

      In addition, the structure of an equivalent insulin receptor mutant, IR-3CS, was determined in complexes formed with excess insulin. Again, the increased flexibility of the alphaCT altered the structural rearrangement upon ligand binding. Three conformations with four insulins bound were detected, two unique asymmetric ones (4.5 Å and 4.9 Å) and one symmetric one (3.7 Å), whereas the WT IR:insulin complex predominantly forms a symmetric T structure with four insulins bound. This suggests that the IR alphaCT is important in stabilising the active T structure. In contrast to IGF-1R-3CS, IR-3CS has the same affinity for insulin as WT IR and is more potently activated (pY1150/Y1151) by insulin, but has a reduced signalling response. This demonstrates the role of the alphaCT in activation.

      Whilst the symmetric IR-3CS:insulin complex structure is compared with the WT IR:insulin complex, no comparisons were made between the asymmetric conformations described here and those previously reported. Is the ligand binding in the asymmetric conformations different from the asymmetric binding seen when WT IR:insulin complexes were generated at low insulin concentrations? It would be interesting to see these overlaid. How do these asymmetric conformations relate to the existing asymmetric conformations reported by Nielsen et al. (DOI 10.1016/j.jmb.2022.167458) and Xiong et al. (DOI 10.1038/s41589-022-00981-0)?

      Thank you for the good suggestions. We have now prepared a new Figure 4-supplement 2 that compares the structure of asymmetric IR-3CS/insulin with that of asymmetric IR bound with subsaturated insulin previously published by us and others. All asymmetric structures of IR bound with subsaturated insulin have similar structural features, i.e., in one half of the complex, one insulin bound at site-1 also contacts site-2 from adjacent protomer, or vice versa. However, in the asymmetric structure of IR-3CS/insulin, two insulins were bound at the hybrid site in the middle of the IR-3CS/insulin complex. To accommodate the binding of two insulins, the L1/αCT together with bound site-1 insulin move outward as compared to the asymmetric structure of IR bound with subsaturated insulin. This is the major structural difference between these asymmetric structures. We have discussed this in the revised manuscript.

      What is the distance between the FnIII-3 domains of the IR:insulin asymmetric conformations and in the symmetric structure? Does this correlate to activity as is seen for the IGF-1R-P673G4? It would be good to comment on this, particularly as there is an interesting disconnect between the receptor activation and downstream signalling activity. Why is there greater pY1150/Y1151 activation than for the WT IR and how can the lower downstream signalling activity be explained?

      We thank Dr. Briony Forbes for raising this point. Asymmetric IR-3CS/insulin, asymmetric IR/insulin and symmetric IR/insulin have similar distances between their membrane-proximal regions (approximately 30 – 35 Å). This indicates that the distances between the membrane-proximal regions within these complexes are all short enough to allow the intracellular kinase to undergo efficient autophosphorylation, in contrast to IGF1R.

      As noted by Dr. Forbes, our cellular functional assays showed that IR-3CS has higher levels of autophosphorylation but lower levels of downstream signaling activity and a defect in endocytosis. Although the distances between the membrane-proximal regions are similar, the relative positions and orientations of the two membrane-proximal regions differ significantly between asymmetric IR-3CS and symmetric IR. Given that the FnIII-3 domain is connected to the transmembrane domain by a short four-residue linker, we speculate that the structural differences in the extracellular domains may lead to a different dimeric assembly of the transmembrane and intracellular domains and may also affect the stability of interactions between the intracellular IR domains and downstream adaptors and effectors. This could in part explain why IR-3CS still undergoes robust autophosphorylation while its downstream signaling is defective. A similar hypothesis has been proposed for EGF- and TGF-α-induced activation of EGFR (PMID: 34846302). The endocytosis defect of IR-3CS might result from reduced IR signaling, but it is tempting to speculate that reduced endocytosis of IR-3CS may instead cause defective downstream signaling. The structure of the transmembrane and intracellular domains in the context of the entire full-length IR/insulin complex needs to be further investigated. We have included new analysis and expanded the discussion.

      It would be good to reword the opening statement that "IGF1 only has one type of ligand binding site (site-1)" to acknowledge that two binding sites on IGF-I have been detected through analysis of competition binding studies which are fitted to a two-site sequential model and detect both high affinity and low affinity binding (Kiselyov). Site directed mutagenesis studies of both IGFs have detected two binding surfaces analogous to insulin's site 1 and site 2 (Gaugin et al and Alvino et al). Furthermore, binding assays with mini-IGF-1R (L1, CR, L2 fused to alphaCT, ie site 1 only) clearly demonstrated that IGF-II site 2 residues do contribute to overall binding affinity (Alvino et al). Perhaps we are yet to capture site 2 of IGF-1R as it is not in the same location as IR site 2? It would be good to comment on this.

      Point accepted. Gauguin L. et al. demonstrated that alanine mutations in IGF1, including E9A, D12A, F16A, D53A, L54A, and E58A, markedly reduced IGF1R-binding affinity. With the exception of IGF1 E9 (site-1b of IGF1R; the equivalent position in IGF2 is E12), none of IGF1 D12, F16, D53, L54, and E58 is involved in binding to site-1, suggesting that IGF1 has an additional site that maximizes binding to the receptor. However, despite saturating IGF1 concentrations, our previous and current structural studies did not reveal the putative site-2 of IGF1R for IGF1 binding. We speculate that IGF1 binds to site-2 transiently, which might be important for IGF1-induced activation of IGF1R. We have revised the manuscript and expanded the discussion.

      Reviewer #3 (Public Review):

      Li et al. present cryo-EM structures of the insulin receptor (IR) and insulin-like growth factor-1 receptor (IGF1R), exploring the functional roles of the disulfide-linked alphaCT regions in ligand binding and receptor activation.

      Cryo-EM structures of mutants of IGF1R and IR designed to increase the flexibility between disulfide-linked alphaCT regions revealed conformational states that were distinct from those of the wild-type (WT) receptors. Mutant (P673G4) IGF1R displayed conformations in which two IGF1 molecules were bound, rather than the 1:1 ligand:receptor state observed previously for WT IGF1R. Mutant (3CS) IR displayed asymmetric conformations with four insulin molecules bound, as well as the symmetric T conformation with four insulin molecules bound observed previously for WT IR. In each case, the mutant receptor was shown in cells to be poorly activated by its respective ligand.

      This study demonstrates the importance of the disulfide-coupled alphaCT regions in the IR and IGF1R for ligand binding and receptor activation. What is not resolved in this study is whether differences in the alphaCT regions of these two highly related receptors contribute to their disparate active states - asymmetric for IGF1R (and 1:1 IGF1:IGF1R) vs. symmetric (T) for IR (and 4:1 insulin:IR).

      We thank Dr. Stevan Hubbard for the positive assessment of our manuscript, and we greatly appreciate the constructive comments which we have addressed.

    1. Author Response

      Reviewer #1 (Public Review):

      In one of the most creative eDNA studies I have had the pleasure to review, the authors have taken advantage of an existing program several decades old to address whether insect declines are indeed occurring - an active area of discussion and debate within ecology. Here, they extracted arthropod environmental DNA (eDNA) from pulverized leaf samples collected from different tree species across different habitats. Their aim was to assess the arthropod community composition within the canopies of these trees during the time of collection to assess whether arthropod richness, diversity, and biomass were declining. By utilizing these leaf samples, the greatest shortcoming of assessing arthropod declines - the lack of historical data to compare to - was overcome, and strong timeseries evidence can now be used to inform the discussion. Through their use of eDNA metabarcoding, they were able to determine that richness was not declining, but there was evidence of beta diversity loss due to biotic homogenization occurring across different habitats. Furthermore, their application of qPCR to assess changes in eDNA copy number temporally and associate those changes with changes to arthropod biomass provided support to the argument that arthropod biomass is indeed declining. Taken together, these data add substantial weight to the current discussion regarding how arthropods are being affected in the Anthropocene.

      Thank you very much for the positive assessment of our work.

      I find the conclusions of the paper to be sound and mostly defensible, though there are some issues to take note of that may undermine these findings.

      Firstly, I saw no explanation of the requisite controls for such an experiment. An experiment of this scale should have detailed explanations of the field/equipment controls, extraction controls, and PCR controls to ensure there are no contamination issues that would otherwise undermine the entirety of the study. The presence of controls is mentioned just once in the manuscript, so I surmise they must exist. Until such evidence is clearly outlined, the results should be treated with caution. Furthermore, the plate layout, including these controls, would help assess the extent of tag-jumping, should the plate plan proposed in Taberlet et al., 2018 be adopted.

      Second, without the presence of adequate controls, filtering schemes would be unable to determine whether there were contaminants and also be unable to remove them. This would also prevent samples from being filtered out should there be excessive levels of contamination present. Without such information, it makes it difficult to fully trust the data as presented.

      Finally, there is insufficient detail regarding the decontamination procedures for the equipment used to prepare the samples (e.g., the cryomill). Without clear explanations of the steps the authors took to ensure samples were handled and prepared correctly, there is yet more concern that there may be unseen problems with the dataset.

      We are well aware of the potential issues and consequences of contamination in our work. However, we are also confident that our field and laboratory procedures adequately rule out these issues. We agree with the reviewer that we should expand more on our reasoning. Hence, we have now significantly expanded the Methods section outlining controls and sample purity, particularly under “Tree samples of the German Environmental Specimen Bank – Standardized time series samples stored at ultra-low temperatures” (lines 303-304), “Test for DNA carryover in the cryomill” (lines 448-464) and “Statistical analysis” (lines 570-575).

      We ran negative control extractions as well as negative control PCRs with all samples. These controls were sequenced along with all samples and used to explore the effect of experimental contamination. With the exception of a few reads of abundant taxa, these controls were mostly clean. We report this in more detail now in the Methods under “Sequence analysis” (lines 570-575). This suggests that our data are free of experimental contamination and tag-jumping issues.

      We have also expanded on the avoidance of contamination in our field sampling protocols. The ESB was set up to monitor even the tiniest trace amounts of chemicals, and carryover between samples would render the samples useless. Hence, highly clean and standardized protocols are implemented. All samples are collected only with sterilized equipment under sterile conditions, and each piece of equipment is thoroughly decontaminated before sampling.

      The cryomill is another potential source of cross-contamination. The mill is disassembled and thoroughly cleaned after each sample. Milled samples have already been tested for chemical carryover, and none was found. We have now added an additional analysis to rule out DNA carryover, using the milling schedule of samples for the past years. If samples were contaminated by carryover between milling runs, consecutively milled samples should show signatures of this carryover, i.e., they should be more similar to each other than expected. We tested this for single-taxon carryover as well as for community-wide beta diversity, but did not find any signal of contamination. This gives us confidence that our samples are very pure. The results of this test are now reported in the manuscript (Suppl. Fig 12 & Suppl. Table 3).
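      The logic of this carryover test can be illustrated with a small permutation sketch. It is not the authors' analysis: it assumes each milled sample is reduced to a set of detected taxa, uses Jaccard dissimilarity as the beta-diversity measure, and asks whether samples milled back to back are more similar than samples in a randomly shuffled milling order. All function names and the toy schedule are hypothetical.

```python
import random

def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity 1 - |A & B| / |A | B| between two sets of detected taxa."""
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(a & b) / len(union)

def mean_consecutive_dissimilarity(samples):
    """Mean dissimilarity between each sample and the sample milled just before it."""
    values = [jaccard_dissimilarity(a, b) for a, b in zip(samples[:-1], samples[1:])]
    return sum(values) / len(values)

def carryover_permutation_test(samples, n_perm=999, seed=1):
    """One-sided permutation test: are consecutively milled samples unusually similar?

    `samples` is a list of taxon sets in milling order. The order is shuffled
    n_perm times; the p-value is the share of shuffled orders whose mean
    consecutive dissimilarity is as low as or lower than the observed one.
    """
    rng = random.Random(seed)
    observed = mean_consecutive_dissimilarity(samples)
    shuffled = list(samples)
    as_low = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_consecutive_dissimilarity(shuffled) <= observed:
            as_low += 1
    return observed, (as_low + 1) / (n_perm + 1)

# Toy schedule with built-in carryover: each sample shares one taxon with the next.
schedule = [{f"taxon{i}", f"taxon{i + 1}"} for i in range(40)]
observed, p = carryover_permutation_test(schedule)
print(f"mean consecutive dissimilarity = {observed:.3f}, p = {p:.3f}")
```

      A small p-value would be consistent with carryover; with real data showing no carryover, consecutive pairs should look like any other pair and the p-value should not be small.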

      Reviewer #2 (Public Review):

      Krehenwinkel et al. investigated the long-term temporal dynamics of arthropod communities using environmental DNA (eDNA) remaining in archived leaf samples. The authors first developed a method to recover arthropod eDNA from archived leaf samples and carefully tested whether the method could reasonably reveal the dynamics of the arthropod communities from which the leaf samples originated. Then, using this eDNA method, the authors analyzed 30 years of well-archived tree leaf samples from Germany and reconstructed the long-term temporal dynamics of the arthropod communities associated with the tree species. The reconstructed time series includes several thousand arthropod species belonging to 23 orders, and the authors found interesting patterns in the time series. Contrary to some previous studies, the authors did not find widespread temporal declines in α-diversity (OTU richness and haplotype diversity). Instead, β-diversity among study sites gradually decreased, suggesting that the arthropod communities have become more spatially homogenized in recent years. Overall, the authors suggested that the temporal dynamics of arthropod communities may be complex and involve changes in both α- and β-diversity, and demonstrated the usefulness of their unique eDNA-based approach.

      Strengths:

      The authors' idea of using eDNA remaining in archived leaf samples is unique and potentially applicable to other systems. For example, different types of specimens archived in museums may be used to reconstruct the long-term community dynamics of other organisms, which would be beneficial for understanding and predicting ecosystem dynamics.

      A great strength of this work is that the authors tested their method very carefully. For example, they tested the effects of the input weight of powdered leaves, sampling methods, storage methods, PCR primers, and days from last precipitation to sampling on the eDNA metabarcoding results. The results showed that the tested variables did not significantly impact the eDNA metabarcoding results, which convinced me that the proposed method reasonably recovers arthropod eDNA from the archived leaf samples. Furthermore, the authors developed a method that can separately quantify the 18S DNA copy numbers of arthropods and plants, which enables estimation of relative arthropod eDNA copy numbers. While most eDNA studies provide relative abundance only, the DNA copy numbers measured in this study provide valuable information on arthropod community dynamics.

      Overall, the authors' idea is excellent, and I believe that the developed eDNA methodology reasonably reconstructed the long-term temporal dynamics of the target organisms, which are major strengths of this study.

      Thank you very much for the positive assessment of our work.

      Weaknesses:

      Although this work has major strengths in the eDNA experimental part, there are concerns about the DNA sequence processing and statistical analyses.

      Statistical methods to analyze the temporal trend are too simplistic. The methods used in the study did not consider possible autocorrelation and other structures that the eDNA time series might have. It is well known that the applications of simple linear models to time series with autocorrelation structure incorrectly detect a "significant" temporal trend. For example, a linear model can often detect a significant trend even in a random walk time series.
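      The reviewer's point about random walks can be checked with a short simulation. The sketch below, with hypothetical function names and a 30-point series chosen only to loosely mirror a 30-year time series, fits an ordinary least-squares trend to series that contain no deterministic trend and counts how often the slope is declared significant at the 5% level. For independent noise this should happen in roughly 5% of runs; for a random walk it happens far more often, which is why autocorrelation has to be accounted for.

```python
import math
import random

def slope_t_statistic(y):
    """t statistic of the OLS slope when y is regressed on the time index 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    rss = sum((yi - intercept - slope * x) ** 2 for x, yi in enumerate(y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope / standard_error

def white_noise(rng, n):
    """Independent N(0, 1) values: no trend and no autocorrelation."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def random_walk(rng, n):
    """Cumulative sum of N(0, 1) steps: no deterministic trend, but strongly autocorrelated."""
    level, series = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        series.append(level)
    return series

def false_trend_rate(make_series, n_sims=2000, seed=42):
    """Share of simulated 30-point series in which OLS declares a trend at p < 0.05.

    2.048 is the two-sided 5% critical t value for 28 degrees of freedom
    (30 time points minus 2 fitted parameters).
    """
    rng = random.Random(seed)
    hits = sum(abs(slope_t_statistic(make_series(rng, 30))) > 2.048 for _ in range(n_sims))
    return hits / n_sims

print("white noise:", false_trend_rate(white_noise))
print("random walk:", false_trend_rate(random_walk))
```

      The sketch only demonstrates the problem; remedies include modeling the residual autocorrelation (for example with AR(1) errors) or analyzing differenced series.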

      We have now reanalyzed our data controlling for autocorrelation and for non-linear changes of abundance and recover no change to our results. We have added this information to the manuscript under “Statistical analysis” (lines 629-644).

      Also, there are some issues regarding the DNA sequence analysis and the subsequent use of the results. For example, read abundance was used in the statistical model, but read abundance cannot serve as a proxy for species abundance or biomass. Because the total 18S DNA copy numbers of arthropods were quantified in the study, multiplying the sequence-based relative abundance by the total 18S DNA copy number may produce a better proxy of arthropod abundance, and using such a proxy would be more appropriate here. In addition, coverage-based rarefaction enables a more rigorous comparison of diversity (OTU diversity or haplotype diversity) than read-based rarefaction does.
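      The reviewer's suggested abundance proxy, and the quantity on which coverage-based rarefaction is built, can both be written in a few lines. In the sketch below, the function names and toy counts are hypothetical; `arthropod_copies` stands for a sample's arthropod 18S copy number (absolute, or relative to plant DNA as measured in this study), and sample coverage is estimated with Good's estimator, 1 minus the number of singleton OTUs divided by the total number of reads. Full coverage-based rarefaction then standardizes samples to a common coverage level rather than a common read depth, as implemented for example in the iNEXT R package.

```python
def copy_number_scaled_abundance(read_counts, arthropod_copies):
    """Turn relative read abundances into an abundance proxy.

    Each OTU's share of the sample's reads is multiplied by the sample's total
    arthropod 18S copy number (absolute or relative, e.g. from qPCR).
    """
    total_reads = sum(read_counts.values())
    return {otu: arthropod_copies * reads / total_reads for otu, reads in read_counts.items()}

def goods_coverage(read_counts):
    """Good's sample-coverage estimate: 1 - (number of singleton OTUs) / (total reads)."""
    total_reads = sum(read_counts.values())
    singletons = sum(1 for reads in read_counts.values() if reads == 1)
    return 1.0 - singletons / total_reads

sample = {"otu_a": 6, "otu_b": 3, "otu_c": 1}  # toy read counts
print(copy_number_scaled_abundance(sample, arthropod_copies=2000.0))
print(goods_coverage(sample))
```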

      We did not use read abundance as a proxy for abundance, but used our qPCR approach to measure the relative copy number of arthropods. While there are biases to this (see our explanations above), the assay proved very reliable and robust, so we believe it should provide a rough estimate of biomass. As biomass is very commonly discussed in the context of insect decline (in fact, the first study on insect decline relies entirely on biomass; Hallmann et al. 2017), we feel it is important to include a proxy for it as well. However, we also discuss the alternative that a turnover of diversity affects the measured biomass. A pattern of abundance loss for common species has been described in other works on insect decline.

      We liked the reviewer’s suggestion to use copy number information to perform abundance-informed rarefaction. We have done this now and added an additional analysis rarefying by copy number/biomass. A parallel analysis using this newly rarefied table was done for the total diversity as well as single species abundance change. Details can be found in the Methods and Results section of the manuscript. However, the result essentially remains the same. Even abundance-informed rarefaction does not lead to a pattern of loss of species richness over time (see “Statistical analysis”).

      The overall results support a scenario of no overall loss of species richness over time, but a loss of abundance for common species. We indeed see this pattern of declining abundance for once-common species in our data, for example the loss of the Green Silver-Line moth, once a very common species in the beech canopy (Suppl. Fig. 10). We have added details on this to the Discussion (lines 254-260).

      These points may significantly impact the conclusions of this work.

      Reviewer #3 (Public Review):

      The aim of Weber and colleagues' study was to generate arthropod environmental DNA extracted from a unique 30-year time series of deep-frozen leaf material sampled at 24 German sites that represent four different land use types. Using this dataset, they explore how the arthropod community has changed through time at these sites, using both conventional metabarcoding to reconstruct the OTUs present and a new qPCR assay developed to estimate the overall arthropod diversity on the collected material. Their results show that while no clear changes in alpha diversity are found, β-diversity dropped significantly over time at many sites, most notably in the beech forests. Overall, I believe their data support these findings, and thus their conclusion that diversity is becoming homogenized through time is valid.

      Thank you for the positive assessment.

      While overall I do not doubt the general findings, I have a number of comments. Firstly, while I agree this is a very nice study on a unique dataset, other temporal insect datasets that have been used for eDNA studies do exist, and it would perhaps be relevant to place the findings (or even the study design) in the context of other work done on such datasets. One example that comes to mind is Thomsen et al. 2015 https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2656.12452, but I am sure there are others.

      We have expanded the introduction and discussion on this citing this among other studies now (lines 71-72, 276-278).

      From a technical point of view, the conclusions of course rely on several assumptions, including (1) that the biomass assay is effective and (2) that the reconstructed levels of OTU diversity are accurate.

      With regard to biomass, although the manuscript states that "Relative eDNA copy number should be a predictor for relative biomass", this is in fact only true under a number of assumptions, e.g., a similar 18S rDNA copy number per species, similar numbers of mtDNA per cell, a similar number of cells per individual across species, etc. On the positive side, it is gratifying to see that the authors performed a validation assay on 7 mock controls, and these seem to indicate that the assay works well. Given how critical this is, I recommend discussing its details a bit more in the main text, including why the authors are convinced the assay is effective, so that readers can fully decide whether they agree. On the negative side, I am concerned that the strategy taken to perform the qPCR may not have been ideal. Specifically, the assay is based on nested PCR: the authors first perform a 15-cycle amplification, purify this product, and then use it in a subsequent qPCR. Given that PCR is notorious for introducing amplification biases in general (especially when performed on low levels of DNA), and that nested PCRs are notoriously contamination-prone, this approach seems to be asking for trouble. This raises the question of why the qPCR was not done directly on the extracts (one can still dilute the plant DNA 100x prior to qPCR if needed). Further, given that the qPCRs were run in triplicate, I think the full data (Ct values) should be released, rather than just stating in the paper that the average values were used. In this way, readers will be able to judge how replicable the assay was, something I think is critical given how noisy the patterns in Fig S10 seem to be.

      We agree with this point, and this is why we do not want to overstate the decline in copy number. This is an additional source of data next to genetic and species diversity. We have added to our discussion of turnover as another potential driver of copy number change (lines 257-260). We have also added text addressing the robustness of the mock community assay (lines 138-141).

      However, we are confident in the reliability and robustness of our qPCR assay for detecting relative arthropod copy number. We performed several validations and optimizations before using the assay, and we have added further details on this to the manuscript (see “Detection of relative arthropod DNA copy number using quantitative PCR”, lines 548-556). We took the idea for the nested qPCR from a study (Tran et al.) showing its high accuracy and reproducibility. Our assay shows very high replicability across triplicates of each qPCR, and we will now include these triplicate data in the supplementary data on Dryad. The SD of Ct values is very low (~0.1 cycles on average). No-template controls (NTCs) were run with all qPCRs to rule out contamination as an issue in the experiments. We also find a very high efficiency of the assay: even at dilutions far outside the copy numbers observed in our actual leaf data, the assay remains accurate. We found very comparable abundance changes across our highly taxonomically diverse mock communities. This also suggests that abundance changes are a more likely explanation than simple turnover for the observed drop in copy number. A biomass loss for common species is well in line with recent reports on insect decline. We can also rely on several other mock community studies (Krehenwinkel et al. 2017 & 2019) in which we used read abundance of 18S and found it to be a relatively good predictor of relative biomass.
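      To put the reported replicate spread in perspective, a Ct difference can be converted into a fold difference in starting template. The sketch below assumes 100% amplification efficiency by default, so that each cycle doubles the product; the function names and Ct values are hypothetical. Under that assumption, a replicate SD of about 0.1 cycles corresponds to a difference of only about 7% in estimated template (2^0.1 ≈ 1.07).

```python
import statistics

def summarize_replicates(ct_values):
    """Mean and sample standard deviation of replicate Ct values."""
    return statistics.mean(ct_values), statistics.stdev(ct_values)

def fold_difference(delta_ct, efficiency=1.0):
    """Fold difference in starting template implied by a Ct difference.

    Each cycle multiplies the product by (1 + efficiency); efficiency = 1.0 means
    perfect doubling, so fold = 2 ** delta_ct.
    """
    return (1.0 + efficiency) ** delta_ct

mean_ct, sd_ct = summarize_replicates([24.9, 25.0, 25.1])  # hypothetical triplicate
print(f"mean Ct = {mean_ct:.2f}, SD = {sd_ct:.2f} cycles")
print(f"an SD of {sd_ct:.2f} cycles is a {fold_difference(sd_ct):.3f}-fold spread")
```

      If the measured efficiency is below 100%, passing it as `efficiency` gives a correspondingly smaller fold difference per cycle.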

      The pattern in Fig. S10 is not really noisy. It just reflects typical population fluctuations for arthropods. Most arthropod taxa undergo very pronounced temporal abundance fluctuations between years.

      Next, regarding the observation that the results reveal an overall decrease in arthropod biomass over time: the authors suggest one alternative to their theory, that the dropping DNA copy number may reflect taxonomic turnover of species with different eDNA shedding rates. Could there be another potential explanation, namely that leaves are simply getting denser or larger? Can this be ruled out in some way, e.g., via data on leaf mass through time for these trees (from this dataset or indeed any other source)?

      This is a very good point. However, we can rule out this hypothesis, as the ESB performs intensive biometric data analysis: the average leaf weight and water content have not changed significantly at our sites. We have addressed this in the Methods section (see “Tree samples of the German Environmental Specimen Bank – Standardized time series samples stored at ultra-low temperatures”, lines 308-311).

      With regards to estimates of OTU/zOTU diversity. The authors state in the manuscript that zOTUs represent individual haplotypes, thus genetic variation within species. This is only true if they do not represent PCR and/or sequencing errors. Perhaps therefore they would be able to elaborate (for the non-computational/eDNA specialist reader) on why their sequence processing methods rule out this possibility? One very good bit of evidence would be that identical haplotypes for the individual species are found in the replicate PCRs. Or even between different extractions at single locations/timepoints.

      We have repeated the analysis of genetic variation with much more stringent filtering criteria (see “Statistical analysis”, lines 611-615). Among other filtering steps, this includes using only those zOTUs that occur in both technical replicates, as suggested by the reviewer. Another reason we believe we are dealing with true haplotypic variation is that haplotypes show geographic variation, e.g., some haplotypes are more abundant at some sites than at others. NUMTs, in contrast, would consistently show a simple correlation in abundance with the most abundant true haplotype.
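      The replicate filter described here can be expressed as a set intersection. In this minimal sketch, which does not reproduce the authors' full filtering pipeline, each PCR replicate is a mapping from zOTU identifier to read count; the `min_reads` threshold and all names are hypothetical additions for illustration.

```python
def replicated_zotus(replicate_a, replicate_b, min_reads=1):
    """Keep only zOTUs detected (>= min_reads reads) in both technical PCR replicates.

    Returns a dict mapping each retained zOTU to its read count summed over both replicates.
    """
    detected_a = {z for z, reads in replicate_a.items() if reads >= min_reads}
    detected_b = {z for z, reads in replicate_b.items() if reads >= min_reads}
    return {z: replicate_a[z] + replicate_b[z] for z in sorted(detected_a & detected_b)}

rep1 = {"zOTU1": 120, "zOTU2": 3, "zOTU3": 1}
rep2 = {"zOTU1": 95, "zOTU3": 4, "zOTU4": 7}
print(replicated_zotus(rep1, rep2))               # zOTU1 and zOTU3 pass
print(replicated_zotus(rep1, rep2, min_reads=2))  # zOTU3 is dropped
```

      Requiring detection in both replicates removes sequence variants that arise from a random PCR or sequencing error in a single reaction, since such errors are unlikely to recur identically in an independent replicate.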

      With regard to the bigger picture, one thing I found very interesting from a technical point of view is that the authors explored how modifying the mass of plant material used in the extraction affects the overall results, and found that using more than 200 mg provides no real advantage. In this regard, I draw the authors' and readers' attention to an excellent paper by Mata et al. (https://onlinelibrary.wiley.com/doi/full/10.1111/mec.14779), which compares the effect of increasing the amount of bat faeces used in a bat diet metabarcoding study on the OTUs generated. Essentially, Mata and colleagues report that as the amount of faeces increases, rare taxa (e.g., those found at a low level in a single faecal sample) get lost: they are simply diluted out by the common taxa (e.g., those in all faeces). In contrast, increasing biological replicates (in their case, more individual faecal samples) increased diversity. I think these results are relevant in the context of the experiment described in this new manuscript, as both seem to show the same thing: there is no benefit to considerably increasing the amount of tissue used. If so, this seems to point to a general principle relevant to the design of metabarcoding studies, and thus one of likely wide interest.

      Thank you for pointing out this interesting study, which we were not aware of before. Cryomilling is an extremely efficient approach for dispersing even traces of chemicals evenly throughout a sample. This was established for trace chemicals early in the operation of the ESB, and it also seems to hold true for eDNA in the samples. We have recently done more replication experiments with different ESB samples (different terrestrial and marine samples, targeting different taxonomic groups) and find that replicating extractions provides little more benefit than replicating PCRs. Already after 2 replicates, diversity approaches saturation. This can be seen in the plot below, which shows recovered eDNA diversity for different ESB samples and different taxonomic groups from 1-4 replicates. A single extract of a small volume contains DNA from nearly all taxa in the community, and rare taxa can be enriched with more PCR replicates.

    1. Author Response

      Reviewer #1 (Public Review):

      Previous studies have linked several lifestyle-related factors, such as body mass index and smoking, alcohol use with accelerated biological aging measured using epigenetic clocks, however, most of them focused on single lifestyle factors based on cross-sectional data from older adults. The current study has a couple of major strengths: it has a decent sample size, lifestyle was measured longitudinally during puberty and adolescence, it looked at the effect of multiple lifestyle measures collectively, it looked at multiple epigenetic clocks, and due to the data from twins, it could examine the contribution of genetic and environmental influences to the outcomes. I have a couple of comments that are mainly aimed at improving the clarity of the methods (e.g. how was multiple testing correction done, how did the association model account for the clustering of twin data, how many samples were measured on 450k vs EPIC and were raw or pre-QC'd data supplied to the online epigenetic age calculator), and interpretation of findings (why were 2 measures of Dunedin PACE of aging used, how much are results driven by BMI versus the other lifestyle factors, and the discussion on shared genetic influences should be more nuanced; it includes both pleiotropic effects and causal effects among lifestyle and biological ageing).

      Thank you for the encouraging comments and important suggestions.

      Reviewer #2 (Public Review):

      Kankaanpää and colleagues studied how lifestyle factors in adolescence (e.g., smoking, BMI, alcohol and exercise) associate with advanced epigenetic age in early adulthood.

      Strengths:

      The manuscript is very well written. Although the analyses and results are complex, the authors manage very well to convey the key messages.

      The twin dataset is large and longitudinal, making this an excellent resource to assess the research questions.

      The analyses are advanced, including latent class analysis (LCA) that capitalizes on the strength of these data.

      The authors also include a wider range of epigenetic age measures (n=6) as well as a broader range of lifestyle habits. This provides a more comprehensive view that also acknowledges that associations were not uniform across all epigenetic age measures.

      Weaknesses:

      The accuracy of the epigenetic age predictions was moderate with quite large mean absolute errors (e.g., +7 years for Horvath and -9 years for PhenoAge). Also, no correlations with chronological age are presented. With these large errors it is difficult to tease apart meaningful deviations (between chronological and biological age) from prediction error.

      The authors claim that 'the unhealthiest lifestyle class, in which smoking and alcohol use co-occurred, exhibited accelerated biological aging...'. However, this is only partially true. For example, PhenoAge was not accelerated in lifestyle class C5. Similarly, all classes showed some degree of deceleration (not acceleration) with respect to DunedinPACE (Figure 3D). The large degree of heterogeneity across different epigenetic age measures needs to be acknowledged.

      The authors claim that 'Practically all variance of AAPheno and DunedinPACE common with adolescent lifestyle was explained by shared genetic factors'. However, Figure 4 suggests that most of the variation (up to 96%) remained unexplained and that genetics explained only around 10-15% of the total variation. The large amount of unexplained variation should be acknowledged.

      Thank you for the encouraging comments and important notes.

      We have now acknowledged that the standard deviations of epigenetic age estimates were high (lines 409-418). Due to the narrow age range of this study, the correlations between chronological age and epigenetic age estimates were weak. We aimed to overcome these weaknesses and calculated the epigenetic age estimates using recently developed principal component (PC)-based clocks, which are shown to improve the reliability and validity of epigenetic clocks (Higgins-Chen et al., 2022). In our data, the standard deviations of epigenetic age estimates were similar or even higher compared with those obtained with the original clocks, but the correlations between epigenetic age acceleration measures assessed with different clocks were consistently higher when PC-based epigenetic clocks were used. Importantly, the observed associations with the adolescent lifestyle behavior patterns did not substantially change.

      Moreover, we have now more carefully reported and interpreted the results obtained using different epigenetic aging measures and acknowledged their heterogeneity (lines 459-467).

      Figure 4 presents the genetic and environmental influences on biological aging that are shared between adolescent lifestyle and biological aging. There are also genetic and environmental influences unique to biological aging that are not shown in the figure. Therefore, the unexplained variation in biological aging was not that large: most of the total variation in biological aging was explained by the genetic factors unique to biological aging. We have now clarified the description of the estimation of genetic and environmental influences (lines 283-300) and the presentation of the results (lines 437-449).

      References:

      Higgins-Chen, A. T., Thrush, K. L., Wang, Y., Minteer, C. J., Kuo, P.-L., Wang, M., Niimi, P., Sturm, G., Lin, J., Moore, A. Z., Bandinelli, S., Vinkers, C. H., Vermetten, E., Rutten, B. P. F., Geuze, E., Okhuijsen-Pfeifer, C., van der Horst, M. Z., Schreiter, S., Gutwinski, S., … Levine, M. E. (2022). A computational solution for bolstering reliability of epigenetic clocks: implications for clinical trials and longitudinal tracking. Nature Aging, 2(7), 644–661. https://doi.org/10.1038/s43587-022-00248-2

    1. Author Response

      Reviewer #1 (Public Review):

      This report describes evidence that the main driving force for stimulation of glycolysis in cultured DGC neurons by electrical activity comes from influx of Na+ including Na+ exchanging into the cell for Ca2+. The findings are presented very clearly and the authors' interpretations seem reasonable. This is important and impactful because it identifies the major energy demand in excited neurons that stimulates glycolysis to supply more ATP.

      Strengths are the highly rigorous use of fluorescent probes to directly monitor the concentrations of NADH/NAD+, Ca2+ and Na+. The strategies directly test the roles of Na+ and Ca2+.

      A weakness is an ambiguity about the effects of ouabain to inhibit the Na+/K+ ATPase directly and the absence of biochemical controls to validate the interpretation of the ouabain experiment.

      We appreciate the reviewer's comments about the work. While we cannot rule out non-specific effects of ouabain at the concentrations needed to block the Na+/K+ ATPase in these experiments, we think that we can rely on the prior biochemical work characterizing the multiple components of ouabain binding in fresh mouse brain tissue, which is a close match to the acute mouse brain slice tissue used here.

      Reviewer #2 (Public Review):

      This study seeks to determine how neuronal glycolysis is coupled to electrical activity. Previous studies had found that glycolytic enzymes cluster within nerve terminals (in C. elegans) during activity. Furthermore, the glucose transporter GLUT4 is recruited to the synaptic surface during activity. The authors previously showed that Ca2+ does not stimulate glycolysis in active neurons. Here, the authors show that cytosolic Na+, not Ca2+, and the activity of the Na+/K+ pump drive glycolysis. However, it is important to note that in this study glycolysis was examined in the soma, not in nerve terminals, where some of the previous studies were conducted. A few other caveats in the interpretation of the findings are listed below:

      1) The NADH/NAD+ ratio is used throughout as the only measurement reflecting glycolytic flux.

      In this and previous work, we have validated that increased cytosolic NADH production (whose major sources are related to glycolysis), rather than altered NADH reoxidation, produces the changes in NADH/NAD+ ratio.

      2) It has been hypothesized that the close association of glycolytic enzymes with ion transporters (such as the Na+/K+ pump) is meant to provide localized ATP to power these pumps. How does bulk glycolysis (monitored with NADH/NAD+ ratio) relate to localized/compartmentalized glycolysis?

      Even if glycolysis is indeed localized to the plasma membrane (an interesting and difficult-to-address hypothesis), we believe that because the mitochondrial shuttles are the main pathway for NADH re-oxidation, and most mitochondria are not localized to the plasma membrane, changes in glycolytic NADH production are likely to be reflected in changes of the bulk cytosolic NADH/NAD+.

      3) Related to point 2, most of the Peredox measurements in the paper have been made at baseline, in the absence of electrical activity. Therefore, it is not clear how the findings relate to activity-driven glycolysis.

      The ion exchange experiments and even the faster Ca2+ puff experiments can mimic but indeed cannot match the speed of activity-driven changes in ion concentrations. Unfortunately, it is impossible to induce normal electrical activity in neurons in the absence of extracellular Na+. We believe that the complete inability of Ca2+ elevation alone (without Na+-Ca2+ exchange) to stimulate glycolysis, combined with the substantial Ca2+ contribution to activity-driven glycolysis, makes a good argument that Ca2+ entering during activity is likely to stimulate glycolysis via Na+ entry and the Na+/K+ ATPase.

      4) The finding that inhibition of SERCA during stimulation actually elevates cytosolic NADH level argues against Na+ being the only ion that regulates glycolysis.

      The ability of SERCA inhibition to produce a small increase in activity-driven glycolysis is consistent with the simple argument that reduced SERCA-driven uptake of Ca2+ into the ER results in additional Ca2+ removal via Na+/Ca2+ exchange (which can then affect glycolysis via Na+ levels).

      5) The finding that "SBFI ΔF/F transients were longer in duration than the RCaMP LT transient" does not necessarily mean that Na+ elevation lasts longer than Ca2+ elevation in the cell. This could be an artefact of the SBFI on/off rate relative to that of RCaMP. In fact, prolonged elevation of cytosolic Na+ would make neurons refractory to depolarization in AP trains.

      The rates of Na+ binding and unbinding to SBFI are likely to occur on the microsecond timescale (based on the known properties of crown ether molecules), much faster than the observed transient duration of approximately one minute. Prolonged elevation of cytosolic Na+ alone (to the levels seen here) should not cause neurons to be refractory to firing; refractoriness typically occurs in the setting of prolonged depolarization and consequent inactivation of NaV channels.

      Reviewer #3 (Public Review):

      Meyer et al. have studied the mechanisms of glycolysis activation in the hippocampus during neuronal activity. The study is logically laid out and uses sophisticated fluorescence lifetime imaging technology and smart experimental designs. The support for a rise in intracellular [Na+] rather than [Ca2+] driving glycolysis is strong. The evidence for the direct involvement of the Na+/K+ pump is based only on pharmacology using ouabain, but the Na+/K+ pump is admittedly not an easy subject for specific perturbations. I still think that the authors should strengthen the support for this pathway.

      We are happy that the reviewer feels that the evidence for Na+ rather than Ca2+ as the effector of glycolysis is strong. The tools for investigating the role of the Na+/K+ pump (NKA) are indeed limited to pharmacology because, as the reviewer says, there are few other options. The requirement for Na+ elevation (which stimulates NKA activity) to trigger glycolysis, together with the ability of ouabain, a specific NKA inhibitor, to prevent this, seems to strongly implicate NKA in the mechanism of glycolysis activation. Genetic manipulation of NKA may be unable to change the level of pump activity because of compensation through altered expression of other subunits (PMID 17234593); it is also unclear how any chronic manipulation would shed light on the role of NKA in triggering glycolysis. Perhaps future studies of knock-in mice in which the α1 isoform of NKA has been made more sensitive to ouabain (PMIDs 15485817; 34129092) might make the identification of NKA as the target of ouabain in this situation even more secure.

      Also, there is a long list of publications on the connection between the Na+/K+ pump and glycolysis. It might be useful to highlight the role of the NCX- Na+/K+ pump coupling in the activation of glycolysis in the title.

    1. Author Response

      Reviewer #1 (Public Review):

      Dotov et al. took joint drumming as a model of human collective dynamics. They tested interpersonal synchronization across progressively larger groups composed of 1, 2, 4 and 8 individuals. They conducted several analyses, generally showing that the stability of group coordination increases with group numerosity. They also propose a model that nicely mirrors some of the results.

      The manuscript is very clear and very well written. The introduction covers a lot of relevant literature, including animal models that are very relevant in this field but often ignored by human studies. The methods cover a wide range of distinct analyses, including modelling, giving a comprehensive overview of the data. There are a few small technical differences across the experiments conducted with small vs. large groups, but I think this is to some extent unavoidable (yet, future studies might attempt to improve this). Furthermore, the currently adopted model accounts well for behaviors where all individuals produce a similar output and therefore are "equally important". However, it might be interesting to test to what extent this can be generalized to situations where each individual produces a distinct sound (as in a small orchestra) and therefore might selectively adapt to (more clearly) distinguishable individuals.

      We agree that this is important. We discuss this in a new section (4.1) at the end of the discussion. We suggest that heterogeneity makes it possible for other modes of organization to compete with the attractive tendency towards the global average. We also point out that factors such as individual skill, task difficulty, delays, and selective attention enable such heterogeneity in the ensemble.

      Similarly, it would be interesting to test to what extent the current results (and model) can be generalized to interactions that more strongly rely on predictive behavior (as there is not much to predict here given that all participants have to drum at a stable, non-changing tempo).

      We can only speculate that the present results are less relevant to interactions that rely strongly on predictive behavior, as behavior in our simple task could be modeled well by our hybrid single-oscillator Kuramoto model. We added the idea that the presence of a group rhythm can diminish the demands on individuals to predict each other’s notes at the end of paragraph 1, page 27.

      An important implication of this study is that some well-known behaviors typically studied in dyadic interaction might be less prominent when group numerosity increases. I am specifically referring to "speeding up" (also termed "joint rushing") and "tap-by-tap error correction" (Wolf et al., 2019 and Konvalinka et al., 2010, also cited in the manuscript, are two recent examples). I am not sure whether this depends on how the data is analyzed (e.g. averaging the behavior of multiple drummers), yet this might be an important take-home message.

      Thank you for the suggestion. We edited the text to emphasize that the relevant part of the analysis of the drumming data was performed at the individual level, using the same methods as are typically used in dyadic tapping (first sentences of Section 2.7.2). Speeding up was the only variable for which we used group averages. For consistency, and to avoid confusion, in the present version we redid the statistics (the changed statistical parameters are highlighted) and figures using the individual data points, and we did not observe major changes.

      I am confident that this study will have a significant impact on the field, bringing more researchers close to the study of large groups, and generally bridging the gap between human and animal studies of collective behavior.

      Reviewer #2 (Public Review):

      In this manuscript, Dotov et al. study how individuals in a group adjust their rhythms and maintain synchrony while drumming. The authors recognize correctly that most investigations of rhythm interaction examine pairs (dyads) rather than larger groups, despite the ubiquity of group situations and interactions in humans as well as non-human animals. Their study is both empirical, using human drummers, and modeling-based, evaluating how well variations of the Kuramoto coupled-oscillator model describe the timing of grouped drummers. Based on temporal analyses of drumming in groups of different sizes, it is concluded that this coupled-oscillator model provides a 'good fit' to the data and that each individual in a group responds to the collective stimulus generated by all neighbors, the 'mean field'.
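      For readers less familiar with the mean-field picture, the textbook (continuous) Kuramoto model can be written as dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i), which is algebraically identical to dθ_i/dt = ω_i + K r sin(ψ − θ_i), where r e^{iψ} = (1/N) Σ_j e^{iθ_j} is the group's mean field. The Python sketch below is illustrative only and assumes this textbook form; it is not the hybrid continuous-pulsed model developed by the authors, and the function names, natural frequencies (around 2 Hz), coupling strength K, and Euler step size are hypothetical choices.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Return (r, psi): magnitude and angle of the mean field (1/N) * sum(exp(i*theta_j))."""
    z = sum(cmath.exp(1j * th) for th in phases) / len(phases)
    return abs(z), cmath.phase(z)


def simulate_kuramoto(freqs, coupling, dt=0.01, steps=5000, seed=0):
    """Euler-integrate d(theta_i)/dt = omega_i + K * r * sin(psi - theta_i).

    Textbook continuous Kuramoto model (an illustrative assumption, not the
    authors' hybrid continuous-pulsed variant). Each oscillator is driven only
    by the group's mean field (r, psi), which equals the all-to-all pairwise sum.
    `freqs` are natural frequencies in rad/s; returns the final order parameter r.
    """
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    for _ in range(steps):
        r, psi = order_parameter(phases)
        phases = [
            th + dt * (w + coupling * r * math.sin(psi - th))
            for th, w in zip(phases, freqs)
        ]
    return order_parameter(phases)[0]


if __name__ == "__main__":
    # Hypothetical natural frequencies: eight "drummers" spread around 2 Hz (converted to rad/s).
    freqs = [2 * math.pi * (2.0 + d) for d in (-0.05, -0.03, -0.01, 0.0, 0.01, 0.02, 0.04, 0.05)]
    print("coupled r:", simulate_kuramoto(freqs, coupling=2.0))
    print("uncoupled r:", simulate_kuramoto(freqs, coupling=0.0))
```

With coupling well above the spread of natural frequencies, r should settle close to 1 (a synchronized group); with coupling set to zero, each oscillator keeps its own frequency and r no longer reflects a shared rhythm. Testing the reviewer's "selective attention" alternative would amount to replacing the single global mean field with a per-individual, neighbor-weighted one.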

      I have concerns about (1) the overall analysis and testing in the study and (2) specific aspects of the model and how it relates to human cognition. Because the study is largely empirical, it would be most critical for the authors to propose two or more alternative hypotheses for achieving and maintaining synchrony in a group. Ideally, these alternatives would make different predictions, which could be tested by appropriate analyses of drummer timing. For example, in non-human animals, where the problem of rhythm interaction in groups has been examined more thoroughly than in humans, many acoustic species organize their timing by attending largely to a few nearby neighbors and ignoring the rest. Such 'selective attention' is known to occur in species where dyads (and triads) keep time with a Kuramoto oscillator, but the overall timing of the group does not arise from individual responses to the mean field. Can this alternative be evaluated in the drumming data? Would this alternative fit the drumming data as well as, or better than, the mean-field, 'wisdom of the crowd' model?

      These are very important points. The present paper is restricted to a simple task where participants are instructed to synchronize with each other. However, we now more explicitly acknowledge the limitations of our study and include a new section, “Beyond the group average,” at the end of the Discussion that is dedicated to this issue and discusses other organizing tendencies that are particularly relevant in larger and more diverse ensembles. In the context of the present task, the relative difference between local and global interactions was likely negligible because of the small differences in timing, from 4 to 16 ms, between the closest and most distant pairs.

      It will be interesting in future studies to introduce acoustic heterogeneity by varying the timbre of the instruments, for example. In the present study, the instruments had the same timbre with narrowly varying fundamental frequencies (117-129 Hz in the duets/quartets and 249-284 Hz in the octets), a situation that encourages integration of all the acoustic information. We do point out that the present approach needs to be expanded to be able to account for competitive pressure and selective attention.

      The well-known Vicsek model (discussed briefly in paragraph 2, page 15), related to the Kuramoto under certain assumptions, can account for a variety of dynamic behaviors in flocking animals. The ability for selective attention in the form of a heterogeneous coupling matrix, combined with the existence of competitive pressure in the form of negative coupling terms can result in spontaneous formation of clusters and spatiotemporal patterns of movement. This is consistent with prior research in chorusing animals (insects and anurans). Large musical ensembles also involve groupings of instruments such as separate sections that change their relative loudness across time. Typically these are not spontaneous but composed and conducted, yet they may satisfy the same constraints.

      We also point out that we see these as complementary organizing principles. Even in the Vicsek model, there is a notion of a ‘local order parameter’ whereby individuals are coupled to a group average within a narrow interaction radius. The relative importance of other organizing tendencies depends on the layout of the acoustic environment and the competitive and collaborative aspects of the task. Hence, parameters such as delay and individual heterogeneity could act as symmetry-breaking terms that enable different stabilities from the basic global group synchrony.

      A second concern arises from relying on a hybrid, continuous-pulsed version of the Kuramoto coupled oscillator. If the human drummers in the test could only hear but not see their neighbors, this hybrid model would seem appropriate: each drummer only receives sensory input at the exact moment when a neighbor's drumstick strikes the drum. But the drummers see as well as hear their neighbors, and they may be receiving a considerable amount of information on their neighbors' rhythms throughout the drum cycle. Can this potential problem be addressed? In general, more attention should be paid to the cognitive aspects of the experiment: what exactly do the individual drummers perceive, and how might they perceive the 'mean field'?

      This is all very relevant. We instructed participants to focus on the X’s in the centers of their drums and not to look at their peers (we now mention this at the end of Section 2.4, page 9). Additionally, the pattern of results for tempo change, cross-correlations, and variability in the dyadic condition was consistent with previous studies that involved purely auditory tapping tasks (emphasized at the beginning of paragraph 2, page 26). The best way to address this limitation would be to repeat the study while blocking visual contact among participants, and to include a condition emphasizing visual contact.

      It is beyond the scope of the present paper to make model-based predictions of the effects of coupling and information availability, but this should be done in future work. For the present paper, we now include a simulation involving continuous coupling (end of Section 2.9.2, page 16, and Supplementary Figure 8A), which fails to reproduce the results for variability; these results are well captured by the hybrid continuous-pulsed model we developed (see the Supplementary Materials).

      Reviewer #3 (Public Review):

      The contribution provides approaches to understanding group behaviour using drumming as a case of collective dynamics. The experimental design is interestingly complemented with the novel application of several methods established in different disciplines. The key strengths of the contribution seem to be concentrated in 1) the combination of theoretical and methodological elements brought from the application of methods from neurosciences and psychology and 2) the methodological diversity and creative debate brought to the study of musical performance, including here the object of study, which looks at group drumming as a cultural trait in many societies.

      Even though the experimental design and object of study do not represent an original approach, the proposed procedures and the analytical approaches shed light on elements poorly addressed in music studies. The performers' relationships, feedback, differences between solo and ensemble performance, and interpersonal organization convey novel ideas to the field and will most probably offer new methodological insights.

      It must be mentioned that the authors accepted the challenge of leaving behind the nauseating no-frills dyadic tests and tapping experiments in favor of more culturally comprehensive (and complex) setups. This represents a very important strength of the paper and greatly improves communication with performers and music studies, which have been affected by the poor impact of predictable non-musical experimental tasks (which can easily generate statistically significant measurements). More specifically, the originality of the experiment-analysis approach provides a novel framework to observe how the axis from individual to collective unfolds in interaction patterns. In particular, the emergence of mutual prediction in large groups is quite interesting, although similar results might be found elsewhere.

      Thank you for these comments.

      On the other hand, important issues regarding the literature review, experimental design, and assumptions should be addressed.

      I miss an important part of the literature that reports similar experiments under the thematic framework of musical expressivity/expression, groove, microtiming, and timing studies. From the participatory discrepancies proposed by Keil (1987) to the work of Benadon et al. (2018), Guy Madison, colleagues, and others, this literature presents formidable studies that could help explain how timing and interactions are structured and conceptualized in music studies and by musicians and experts. (I declare that I have no recent collaborations with the authors I mentioned throughout the text and that I don't feel comfortable suggesting my own contributions to the field.) This is important because there are significant ontological concerns in applying methods from the sciences to cultural performances.

      Thank you for the suggestions. We included a brief discussion in the newly added “Beyond the group average” section at the end of the Discussion, specifically the first paragraph, pages 27-8. We think that expressive timing naturally fits in continuation with the other reviewers’ concerns about how much the idea of the group average generalizes to real musical situations. By design and instruction, we stripped individual expression from the present task. Specific cultural contexts and performance styles may want to escape or at least expressively tackle this constraint of our task, and we believe that now that we have established the mean field as one factor affecting group behaviour, further studies can take on the challenge of developing models that make predictions in more complex situations closer to real musical interactions – and testing those models empirically.

      One ontological issue is that cultural phenomena differ from, say, animal behaviour. For example, the authors consider timing and synchrony in a way that does not comply with cultural concepts: p.4 "Here we consider a musical task in which timing consistency and synchrony is crucial". A large part of the literature mentioned above, as well as evidence found in the ethnographic literature, indicates that the ability to modulate timing and synchrony-asynchrony elements is part of explicit cultural processes of meaning formation (see, for example, Lucas, Glaura and Clayton, Martin and Leante, Laura (2011) 'Inter-group entrainment in Afro-Brazilian Congado ritual.', Empirical musicology review., 6 (2). pp. 75-102.). Without these idiosyncrasies, what you listen to can't be considered a musical task in context, and it lacks basic expressive elements that represent musical meaning on different levels (see, for example, Swanwick's work on layers/levels of musical discourse formation).

      Indeed, this is an important issue. We often use cultural phenomena merely as a motivation but do not dive into the relevant details. Here, in addition to the previous discussion, we now reiterate that the tendency towards the group average is one organizing tendency but there are additional ones, enabled by individual heterogeneity and context. For example, marching bands and chanting crowds probably impose different constraints than individual artistic expression by skillful musicians.

      Such plain ideas about the ontology of musical activities (e.g. that musical practice is oriented by precision or synchrony) generate superficial constructs such as precision priority, dance synchrony, imaginary internal oscillators, and strict predictive motor planning that are not present in cultural reports, except in some cultures of classical European music based on notation and shaped by industrial models. The lack of proper cultural framing of the drumming task might also have led the authors to instruct the participants to minimize "temporal variability" (musical timing) and maintain the rate of the stimulus (musical tempo), even though such limiting tasks are mostly part of musical training in some societies (examples of social drumming in non-western societies rarely represent isochronous tempo or timing in any linguistic or conceptual way). The authors should examine how this instruction affects the validity of the results that describe variability, since variability was affected by the imposed conditions and the instruction might have limited the observed behaviour. The reporting of the results in the graphs must also allow the effect of timing to be diagnosed within such small time windows of action.

      We agree totally. We made changes and tried to be more specific about the cultural framing, delineating contexts where the present ideas are more relevant and where they are less relevant, or at least incomplete (the bottom of page 3, and pages 27-8).

    1. Author Response

      Reviewer #1 (Public Review):

      Mitotic spindles are macromolecular machines that accurately segregate duplicated chromosomes between two daughter cells during cell division. To perform this task, spindles exert forces that are orchestrated in space and time. Conversely, malfunctioning spindles can generate chromosome segregation errors, which are present in cancers, miscarriages, and Down syndrome. Therefore, understanding spindle mechanics is a major biological challenge. In this elegant study, the authors explore the mechanical properties of the mitotic spindle. They combine a variety of experimental biophysical approaches, including microneedle manipulation and quantitative imaging, with theoretical modeling. By systematically exploring the shapes of kinetochore fibers that are not manipulated, they infer the forces and moments that exist in native spindles. Analyzing previously published data obtained by microneedle manipulation, in which kinetochore fibers were mechanically perturbed, the authors observe a dramatic change in the shape of the kinetochore fibers. By comparing this observation with theoretical predictions, they discover a lateral anchorage near the chromosome. Taken together, this paper nicely demonstrates the existence of lateral anchorage near chromosomes, offering exciting ideas about the balance of forces in the entire mitotic spindle.

      We appreciate the reviewer’s enthusiasm about the work and their thoughtful questions and suggestions to improve the manuscript.

      Major points:

      (1) In order to describe the shape of unmanipulated kinetochore fibers, the authors use a simple physical model in which they describe these fibers as a single elastic rod. They find that the observed shape is a consequence of compressive forces, or a combination of bending moments and perpendicular forces. However, it is well known that kinetochores are under tension. For this reason, the plus end of kinetochore fibers should be under tension rather than under compression. In order to describe the forces that shape unmanipulated kinetochore fibers, the authors should revise the model by setting a tensile force at the plus end of the kinetochore fiber.

      We thank the reviewer for their comment on this important point.

      (2) The authors compare the shapes of inner and outer kinetochore fibers. By using the model, they find that the forces and moments are similar for both the inner and outer kinetochore fibers, whereas the difference arises because these fibers have different lengths. In classical beam theory, we distinguish between buckling (caused by a compressive force) and bending (caused by a bending moment). In the case of buckling, which is caused by the same critical force, different curvatures can be obtained, whereas in the case of bending the curvature is proportional to the bending moment. Based on the data presented by the authors, it seems that their model operates in the buckling regime. It would be important to elaborate on this more systematically. Also, one should warn the reader that in the case of bending, the inner and outer kinetochore fibers will be characterized by different bending moments.

      We thank the reviewer for raising this nuanced point on the shape generation mechanisms in inner and outer k-fibers. We believe that the mechanisms that the reviewer suggested are valid ways to generate varying k-fiber deflections in the scenario where the k-fiber end-to-end length is held fixed. However, we argue that the natural variability in the lengths of inner vs. outer k-fibers is alone sufficient to give rise to diverse k-fiber shapes without requiring the end-forces to change.

      We added a new Appendix section 1.4 (pages S4-S5 in the revised appendix) in our revised submission where we provide the details of our argument. We demonstrate analytically that when only a moment at the pole is present and held at a fixed value, then the normalized maximum deflection scales linearly with the k-fiber’s end-to-end length (Appendix 1 – figure 3a,b). And in the case where both a moment at the pole and an axial force are present and held at fixed values, the dependence on k-fiber length is stronger (faster than linear), thereby allowing for a wide range of k-fiber deflections created with identical end-forces (Appendix 1 – figure 3c,d).
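      As a point of intuition only, and not the derivation in the authors' appendix, the textbook Euler-Bernoulli result for the simplest boundary conditions one might assume (a rod of length L and hypothetical flexural rigidity EI, pinned at both ends, with a moment M applied at one end) shows the same linear scaling of normalized deflection with length:

```latex
% Illustrative assumption: pinned-pinned Euler-Bernoulli rod, length L, rigidity EI, end moment M at x = L
y(x) = \frac{M\,x\,(L^{2} - x^{2})}{6\,EI\,L}, \qquad
y_{\max} = y\!\left(\tfrac{L}{\sqrt{3}}\right) = \frac{M L^{2}}{9\sqrt{3}\,EI}, \qquad
\frac{y_{\max}}{L} = \frac{M L}{9\sqrt{3}\,EI} \;\propto\; L .
```

      Adding a compressive axial force P amplifies these deflections by a factor of roughly 1/(1 − P L²/(π² EI)) for the same pinned ends; because this factor itself grows with L, the dependence of deflection on length becomes faster than linear, in line with the argument made in the response.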

      Reviewer #2 (Public Review):

      Suresh and co-workers apply classical beam bending theory to analyze shapes of the microtubule bundles that push and pull on mitotic chromosomes and drive chromosome separation in dividing cells. The bundles attach at one end to chromosomes via specialized protein assemblies called kinetochores, and at the other end they are associated with spindle poles. The shapes of these k-fiber bundles are analyzed in unperturbed control cells and in cells where the bundles have been forcibly deformed using microneedles. From their analysis, the authors infer the extent and nature of mechanical anchorage at each end of the bundles, finding that anchorage is more extensive and more restrictive at the kinetochore-attached ends compared to the pole-proximal ends. Anchorage at the pole-proximal ends is apparently limited to the bundle tips, allowing some swiveling of the bundles around the poles. In contrast, the kinetochore-attached ends appear to have "lateral anchorage", i.e. force-bearing connections to the sides of the bundles, that extend several micrometers away from the kinetochores. This lateral anchorage resists swiveling of the bundles around their kinetochore-attached ends.

      A major strength of this study is its high degree of novelty. The microneedle data on which the analyses are based have been published previously, but are entirely unique - based on classic, groundbreaking experiments performed nearly half a century ago on cells from grasshoppers and mantids, and now being done only in the Dumont lab, in mammalian cells for the first time, and with the benefit of modern fluorescence and molecular perturbation techniques. Such a unique and interesting dataset certainly deserves careful analytical scrutiny, which is the focus of this new paper.

      The application here of classical beam theory to analyze k-fiber shapes is also clever, apparently well done, and well described. The unique approach provides a direct way to assess the extent to which k-fiber bundles are mechanically linked to surrounding material, including to non-k-fiber microtubules and potentially to neighboring k-fibers. The main conclusion that lateral anchorage of the k-fibers in the local vicinity (within a few micrometers) of kinetochores is needed to explain the shapes that the k-fibers adopt during manipulations seems well justified by the data and analyses - particularly by the negative curvatures measured near the kinetochore-attached ends, and the tendency for the orientations of the kinetochore-proximal portions to be maintained even 1 to 3 micrometers away from the kinetochore-attached ends. The assumptions of the analysis also seem mostly reasonable and are clearly explained. Under these assumptions, the analysis shows convincingly that forces and moments applied only at kinetochore-attached ends would be insufficient to explain the observed shapes.

      We appreciate the reviewer’s enthusiasm about the work and their thoughtful questions and suggestions to improve the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper primarily assessed host/phage interactions in bacteria of the order Corynebacteriales to identify novel host factors necessary for phage infection, focusing on genes responsible for bacterial envelope assembly. Bacteria in this order, such as Mycobacterium tuberculosis and Corynebacterium diphtheriae, have unique, complex envelopes composed of peptidoglycan, arabinogalactan, and mycolic acids. This barrier is a potent protector against the therapeutic effects of antibiotics. Phages can be used to discover novel aspects of bacterial envelope assembly because they engage with cell surface receptors. To uncover new factors, the researchers challenged a high-density transposon library of Corynebacterium glutamicum (called Cglu in the paper) with the phages Cog and CL31. Transposon sequencing identified loci whose disruption led to phage resistance. This study implicated the importance of the Cglu genes ppgS, cgp_0658, cgp_0391, and cgp_0393. The authors also identified a new gene, cgp_0396, necessary for arabinogalactan modification and recognized a conserved host factor called AhfA (Cgp_0475) that plays a crucial role in Cglu mycolic acid synthesis. Ultimately, this work implicated mycomembrane porins and the arabinogalactan and mycolic acid synthesis pathways in the assembly of the Corynebacteriales envelope.

      Strengths of the research:

      • Language choice: A major strength of the paper is that this could easily be given to an undergraduate student with introductory knowledge of biology and they would still be able to get the gist of this paper. The language is written in a clear, concise fashion with explanations of terms not everyone would immediately know unless they worked in the field specifically.

      • The figures are generally explained in a direct manner, clearly stating the major conclusions the reader should draw after carefully analyzing the presented data.

      We thank the reviewer for the enthusiasm for our work and our description of it.

      How the research could be strengthened:

      • It could be worthwhile to describe some of your results quantitatively, for example, the differences in phage infection in terms of orders of magnitude (log differences). Bar graphs should also be described quantitatively: when "something is lower compared to the WT," how much lower is it?

      To keep the text streamlined, we refrained from adding descriptions of the results mathematically in the text. The reader can refer to the figures to get the magnitudes of any changes observed.

      • No p-values were reported for the statistical significance of any of the data presented; this should be addressed in the final manuscript to support the importance of this work.

      We added the p-values as requested.

      • The conclusions drawn from Figure 8 were not entirely supported by the data, especially Figure 8A, which could be improved with better images that support the authors' claims.

      We do not understand why the reviewer believes that Figure 8A does not support our conclusions. The mutant cells do not label with the 6-TMR-Tre dye whereas the WT control does. The dye labels mycolic acids, so our conclusion that AhfA is involved in mycolic acid synthesis is valid. In any case, we have included an additional supplementary source data file with the uncropped image of the 6-TMR-Tre-treated cells to show a larger number of mutant cells that fail to stain, further supporting our conclusion.

      Reviewer #2 (Public Review):

      In this manuscript, McKitterick and Bernhardt use genetic approaches to investigate genes in Corynebacterium glutamicum that are required for efficient phage infection. They make use of a high-density transposon library that was recently generated in the Bernhardt lab. They challenged the library with two phages, CL31 and Cog. Importantly, they had elegantly adapted the phages to the laboratory strain MB001 beforehand. The MB001 strain is ideal for genetic experiments since all prophage elements have been removed from this strain. The evolved phages are likely to be a very useful tool for further investigations aiming to understand host/virus interactions in this model. The phage-infected libraries were plated and the collected colonies were sequenced. Genes involved in efficient phage infection had multiple transposon insertions. Using this method, the authors identified specific genes required for infection with Cog and CL31. The Cog phage apparently needs the porin proteins in the mycolic acid membrane for efficient infection, and the authors speculate that the porins may act as auxiliary receptors for phage adsorption. Furthermore, genes involved in putative arabinogalactan modification were found to be important. Mutations in these genes did not abolish phage adsorption, suggesting that they play a role in viral genome injection. For phage CL31, the authors show that genes involved in mycolic acid synthesis in particular are essential. The genes identified include one coding for a protein involved in protein mycoloylation. A candidate for such a lipidation is the porin protein complex PorAH. The trehalose-6-phosphate synthase OtsA was also identified as important for phage infection. Although OtsA is strictly required for the establishment of the mycomembrane, otsA deletions are viable in C. glutamicum. As part of their analysis, the authors also identified a previously unknown factor in mycolic acid synthesis in C. glutamicum. Analysis of a spontaneous mutant resistant to CL31 revealed a mutation in cg_0475 (renamed ahfA). Deletion of ahfA drastically reduced mycolic acid production. This was shown by thin-layer chromatography and fluorescent staining. Interestingly, deletion of ahfA also results in a cell morphology defect, indicating the importance of a correct mycolic acid layer for cell shape.

      In summary, the authors provide an excellent paper that is clearly written, with nicely conducted experiments.

      We thank the reviewer for their kind words and enthusiasm for the work.

      Reviewer #3 (Public Review):

      In their manuscript, McKitterick and Bernhardt perform a screen to determine host factors, such as receptors, that are important for bacterial viruses (phages) to infect Corynebacterium glutamicum, an organism that shares the unique membrane of mycobacteria (mycomembrane) with M. tuberculosis. To do so, they challenge a previously described Tn-seq library with a high MOI of two phages, CL31 and Cog. The surviving strains are those in which genes important for phage infection (such as receptors) are disrupted. The authors' screen is successful, and the authors identify and validate several factors important for infection by each phage, providing the first such screen in Corynebacterium. Moreover, the authors perform a suppressor screen to identify additional factors and experimentally follow up on several genes of interest. Finally, the authors use the newly determined host specificity of the phages to implicate new genes in mycolic acid synthesis. As a whole, this is strong work that paves the way to a deeper understanding of Corynebacterial and (by extension) Mycobacterial phages and should be of broad interest.

      Below, we suggest additional analyses, context, and elaboration that will help the manuscript fully realize its impact.

      Major points:

      1. Although the authors' experimental design is fundamentally sound, I am worried about the possibility of "jackpotting" shaping their results, particularly in the uninfected control experiment. If the authors' Tn-seq library comprises ~200,000 strains and they do not plate at least 10-100 times that many colonies, then any given strain (regardless of its phenotype) may or may not be represented in the output of the experiment, causing false phenotypes to be ascribed to genes by chance. This is particularly a problem for the uninfected control, where the authors chose to dilute the culture 1000-fold to mimic the number of colonies that survive infection. They may be better served by plating the whole culture to ensure adequate representation of the library. Part of the reason for this concern is that an overwhelming majority of statistically significant hits (something like 80-90%) appear to confer susceptibility rather than resistance (source data Fig 2), something the authors' experimental design should not be able to measure. The lack of accurate representation of the distribution of strains in the starting culture also calls into question the quantitative differences presented in the results.
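      The coverage concern can be made concrete with a back-of-the-envelope sketch. It assumes that every mutant in a ~200,000-member library is equally abundant and that each plated colony is an independent draw, so the number of colonies per mutant is approximately Poisson-distributed with mean equal to the fold coverage. Real libraries are unevenly represented, so these numbers are optimistic; the function names are illustrative.

```python
import math


def fraction_missing(library_size: int, colonies: int) -> float:
    """Expected fraction of mutants with zero colonies on the plate.

    Assumes equally abundant mutants and independent draws, so colonies per
    mutant ~ Poisson(colonies / library_size) and P(count == 0) = exp(-coverage).
    """
    coverage = colonies / library_size
    return math.exp(-coverage)


def sampling_cv(library_size: int, colonies: int) -> float:
    """Coefficient of variation of a mutant's colony count (Poisson: 1 / sqrt(mean))."""
    return 1.0 / math.sqrt(colonies / library_size)


LIBRARY = 200_000  # approximate library size quoted in the review

for fold in (0.1, 1, 10, 100):
    n = int(fold * LIBRARY)
    print(f"{fold:>5}x coverage: {fraction_missing(LIBRARY, n):.1e} of mutants missing, "
          f"per-mutant CV {sampling_cv(LIBRARY, n):.2f}")
```

Under these assumptions, plating only as many colonies as there are mutants (1x) leaves about 37% of mutants (e^-1) entirely unsampled, with a per-mutant count CV of 1, whereas 10-100x coverage makes dropout negligible and shrinks the CV to roughly 0.32-0.10. This is why chance alone could produce apparent depletion in an under-sampled control.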

      We thank the reviewer for their thorough analysis of our experimental design. The Tn-Seq experiments were repeated with the uninfected controls plated at a density that maintains the representation of the original library. The overall results are largely unchanged because we maintain our focus on hits that become greatly enriched following phage infection, not those that become depleted. The vast majority of these hits were validated by constructing mutant strains, indicating the robustness of the current and previous analyses. As we noted in the original submission, the depletion of insertion mutants is unlikely to be biologically meaningful.

      a. L138. Where the authors describe their initial experimental design it would be helpful to add more details. What is the size of the Tn library? What is the coverage in their experiment? Approximately how many colonies are recovered on the plates after phage infection and in the uninfected control?

      This information has been added (Fig. 2 table supplement 1).

      b. It is important to know how the number of colonies on the plates compares to the number of reads in the experiment. In the analysis of most high-throughput (HT) screens, one implicitly assumes that each read corresponds to one cell, so that each read can be treated as statistically independent. This assumption is critical to the statistical methods used to analyze these data. By scraping a plate of colonies (which may be required for efficient phage infection), the authors potentially violate this assumption, since the number of cells greatly exceeds the number of colonies, and colonies are the actual statistically independent entities in the experiment. Does this assumption hold (or approximately hold) for the screen? If not, a different statistical method should be used to determine p-values.

      We respectfully disagree with the reviewer on this point. In our view, a slurry of colonies from a plate is no different than a culture. Both contain a mixture of cells containing an array of different transposon mutants each represented multiple times in the population due to replication of the original mutant. We do not think there is any meaningful difference to the analysis whether this replication occurs in liquid or on a plate. In both cases, a read corresponds to a single cell/molecule of purified genomic DNA from the population.

      2. The authors' Tn-seq methodology differs from previously published high-throughput phage screens (e.g. Mutalik et al., 2020 and Rousset et al., 2018). Based on my knowledge of classical phage biology, I agree that plating the infected cells has advantages. However, the rationale will not be clear to most people performing such experiments. Please explain the rationale for the experimental protocol.

      Although the authors of the Mutalik et al. paper did perform competition experiments in liquid over several infection cycles, they also made use of a solid plate-based assay in which they adsorbed their phages to the library cells for 15 minutes before plating. These plates were incubated overnight, and resistant colonies were scraped, pelleted, and DNA-prepped in a similar manner to the approach we took.

      We prefer plating over liquid growth because colony formation is an easy way to ensure that the mutant population has undergone numerous rounds of doubling under a given condition before the analysis is performed.

      a. Why did the authors plate the cultures after initial phage adsorption instead of keeping them in liquid?

      We were concerned that some potential receptor-related mutants would be less fit and would therefore be lost in a competition experiment. Plating after phage adsorption reduces competition among phage survivors. Furthermore, we reasoned that plating would ensure that the sequenced bacteria are true survivors rather than remnant DNA in the culture.

      b. How reproducible are the authors' Tn-seq results? The SRA accession shows multiple replicates, but this is not described in the manuscript nor reflected in the supplementary data. Given the potential for bottleneck and jackpotting effects in this assay, some measure of reproducibility is important for interpreting the results (see point 1).

      We performed completely new Tn-seq experiments for each phage in duplicate. The hit lists remained largely unchanged from our initial analysis and those that were investigated further were enriched for insertions in both new data sets. Thus, the results are highly reproducible.

      c. L587 "Significant hits with fewer than 10 insertions on each strand were manually removed." Why did the authors choose this criterion? Almost all of the genes they removed have very asymmetric distributions (e.g. in the Cog experiment, cgp3051 has 47853 fwd reads and 6 rev reads). An asymmetric distribution of insertions suggests that overexpression of downstream genes has an important (positive or negative) effect. This is worth pursuing, and many automated analysis pipelines can disambiguate these effects, including those developed in the Walker Lab (e.g. doi: 10.1038/s41589-018-0041-4). These genes shouldn't be thrown away when they are arguably some of the most informative hits!

      We have updated the criteria we used for selecting the most impactful insertion enrichments. Our concern in this report was to investigate mutants that affect phage infection when inactivated. We will pursue genes that affect phage infection when overexpressed (as indicated by asymmetric insertion orientation distributions) in a follow-on study. We think such a study would best be carried out with a different transposon containing a strong outward facing promoter.

      3. There is a fairly extensive phylogeny of M. smegmatis phages (phagesdb.org). Are the phages that the authors work on related to any of these phages? If so, which cluster do they map to? What is the host range of other phages in that cluster? If not, it may be worthwhile to mention that these are quite distinct from other studied phages.

      We agree that the phylogenetic history of corynephages is quite interesting. Very few phages that infect Cglu have been isolated and sequenced, let alone studied. Neither Cog nor CL31 share significant nucleotide identity with other sequenced phages, thus they do not have assigned clusters at the moment.

      4. Given that cgp_0475 was a strong hit in the Tn-seq, why was it not identified in the previous chemical genomics experiments from the lab (https://doi.org/10.7554/eLife.54761)?

      We appreciate the reviewer’s interest in previous work from the lab. In the prior phenotypic analysis, cgp_0475 was identified as having severe fitness defects across many conditions. However, it was not possible to correlate its phenotype with other genes involved in mycolic acid synthesis like pks and fadD2 because they were found to be so sick in the phenotypic outgrowth that they were classified as essential.

      5. Is there any relationship between the growth rate of the mutants and their phage susceptibility? This could be analyzed using the authors' previous studies of this library.

      While some of the phage-resistant mutants are associated with poor fitness (namely those involved in mycolic acid synthesis), not all were associated with decreased growth. For example, there were minimal fitness defects associated with deletions of either porAH or the genes involved in GalN decoration. However, loss of these genes greatly inhibited the ability of Cog to infect.

    1. Author Response

      Reviewer #1 (Public Review):

      Main concerns:

      1) Validation of the MCS reporters is not shown. This is particularly important for pCLIP and GoPo, which have not been reported before. Fluorescence complementation between two proteins that normally localize to different organelles is far from demonstrating the existence of an MCS between those organelles. It would be important to demonstrate the existence of the mentioned MCS, and the suitability of the fluorescent reporter, using marker proteins and ideally electron microscopy/CLEM.

      We thank the reviewer for pointing this out and have now added supplementary characterization of the pCLIP and GoPo contact sites. The pCLIP has been previously described by us (Shai et al. 2018 Nat Commun 9, 1761. doi:10.1038/s41467-018-03957-8) and so we have only added one new figure (Figure 2 S1A) which shows the co-localization of the contact site reporter with a LD marker (MDH) and a cell periphery marker (TRITC-ConA). For the GoPo, since this is the first demonstration of a reporter for this contact site, we have rigorously characterized it by looking at the frequency of co-localization between peroxisomes and the Golgi in the absence of the reporter (Figure 1 S1B), the co-localization of the contact site reporter with a peroxisome marker (CFP-SKL) and a Golgi marker (Sec7-mCherry) (Figure 1 S1C), and by identifying a condition where this contact site is increased (Figure 1 S1D).

      Since all of these experiments support their function as bona fide reporters, and since performing electron microscopy experiments on these reporters was not possible for us at this time and has not been the standard in the field for other reporters, we hope that this is satisfactory.

      2) As pointed out above, the identification of a phenotype in ergosterol distribution for Ypr097W/Lec1 is very interesting. However, it is unclear how this observation relates with the localization of Lec1 to LDs, which is observed only upon over-expression.

      We would like to clarify that at endogenous levels Lec1 also localizes to LDs. However, this localization is less pronounced. To clarify this in the text and show this experimentally we have now added an example of the endogenous GFP-tagged protein with the LD marker Faa4-mCherry (Figure 3 S1B), and added a section in the text.

      Instead, further characterization of the Ypr097w phenotype in ergosterol distribution (via mutagenesis, modulation of the ergosterol biosynthetic pathway, testing its ability to bind ergosterol, etc.) would be a plus.

      To further characterize the Lec1 phenotype, we looked at changes in ADHpr-GFP-Lec1 localization in cells treated with 40 µg/ml of fluconazole for 3h (Figure 5 S2B-C). Fluconazole is a known inhibitor of Erg11, and treatment with this drug strongly reduces the overall levels of cellular ergosterol, which can be clearly observed by the loss of binding of mCherry-D4H to the plasma membrane (cytosolic signal) (Figure 5 S2B lower right panels). After 3h of treatment with fluconazole, there is a small increase in the number of cells with bud/bud neck localization of GFP-Lec1. The GFP-Lec1 signal in these cells generally appears brighter than in untreated cells, suggesting that loss of ergosterol potentiates Lec1 accumulation at the bud/bud neck. This result suggests that Lec1 cellular localization is affected by the levels of ergosterol. However, since treatment with high concentrations of fluconazole leads to growth arrest (Zhang et al. 2010. PLOS Pathogens 6:e1000939. doi:10.1371/journal.ppat.1000939), it is also possible that this signal increase is the result of Lec1 accumulation at the bud due to a stalling in budding. We now discuss this in the text.

      We have also extensively mutagenized Lec1 as requested in an attempt to find a mutant that is still localized to LDs and stable yet not causing sterol redistribution. However, despite great efforts this has proven to be challenging (See below in detailed response to this request from reviewer #2).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have employed a variety of techniques (single-molecule fluorescence kinetic and steady-state measurements, cryo-EM structure determination, and in vivo measurements of protein synthesis and cell proliferation) to investigate the mechanism of action of two compounds: Didemnin B and Ternatin-4. Both molecules have previously been shown to target eEF1A and have potential as cancer therapeutics. In addition, structures of Didemnin B bound to eEF1A and to an elongation complex have previously been solved.

      The authors here show that both compounds disrupt the dynamic accommodation of tRNA driven by eEF1A and its activation by the GTPase activation center of the ribosomal large subunit, relying on previous assignment of the FRET intensities observed in pre-steady state single-molecule fluorescence experiments in which peptide-tRNA and incoming aminoacylated tRNA are labeled with donor and acceptor dyes, respectively. They further show that this inhibition is dose dependent for both compounds and sensitive to the A399V eEF1A mutant, which creates a steric clash with didemnin B in its usual binding site. Subsequent analysis of steady-state single-molecule FRET experiments shows that didemnin B more strongly inhibits transitions between the intermediate (0.45) FRET state and the high (0.8) FRET states (though the authors choose to focus only on the effect of transitions from 0.45 to 0.8) previously assigned to the GTPase activated and fully accommodated conformations of the ternary complex, respectively. Further single-molecule experiments provide initial evidence that Didemnin B remains more stably bound to elongation complexes than does Ternatin-4.

      The authors then turn to cryo-EM structures of each compound bound to elongation complexes purified either from lysate or assembled from purified components. The structure of the Ternatin-4 complex shows additional density in the same binding cleft observed for Didemnin B in a prior structure reported elsewhere, with which the Didemnin B structures reported here also agree. This binding location provides structural evidence for both compounds' effects on ternary complex dynamics, as well as their previously described effects on tRNA accommodation and elongation. Further comparison of the Didemnin B and Ternatin-4 structures reveals decreased electron density in the Ternatin-4 structure for elements of eEF1A (switch loops 1 and 2 and helix alpha2), as compared to the Didemnin B structures. The authors interpret this as evidence for greater mobility of these elements, which might explain the more modest restriction of A-site tRNA dynamics they observe in the presence of Ternatin-4 (as opposed to Didemnin B). Certainly this decreased density (which might be more convincingly demonstrated using difference maps of the two structures) is consistent with that interpretation. That said, it is certainly not a smoking gun.

      We have worked to soften the language pertaining to this point and have updated Fig. 4 to more accurately highlight the observed differences between the didemnin and ternatin-4 structures.

      Finally, the authors turn to in vivo measurements of protein synthesis and effects on cellular proliferation or survival in the presence of both compounds. Consistent with their single-molecule experiments, they observe more severe and durable inhibition of protein synthesis in the presence of Didemnin B, whereas Ternatin-4 exhibits more modest effects that are more rapidly restored upon removal of the drug in solution. Interestingly, Ternatin-4 appears to elicit similar, and perhaps more rapid, effects on cellular survival, increasing apoptosis more rapidly than Didemnin B, though these effects (like those on protein synthesis rates) are once again more sensitive to removal of the drug. The authors describe these results as evidence that Didemnin-B "irreversibly inhibits" protein synthesis in cells. I find this assertion strange, given that the authors have previously measured a dissociation rate for this molecule from elongation complexes and they have not performed measurements to ensure that activity is not simply restored at timescales longer than their initial measurements. That said, I concede that this might be a semantic distinction if the vast majority of cells perish prior to dissociation of the drug. In either case, I would suggest the authors apply a somewhat more nuanced interpretation of these results lest they be misunderstood.

      We thank Reviewer 1 for bringing this point to our attention. We have changed the title of this section to “Protein synthesis inhibition by ternatin-4, but not didemnin, can be reversed in cells,” and have softened the interpretation.

      Overall, this is a rigorous and well reasoned study that employs multiple complementary techniques to investigate the mechanism of action of compounds of potential therapeutic interest. In places, the higher order interpretation of the experimental data leaks into the results section (as opposed to being fully explored in the discussion) and is at times somewhat aggressive. Nonetheless, the results presented here illuminate important questions at the intersection of translational mechanism, cell proliferation, and cancer.

      We are grateful to Reviewer 1 for their assessment of this work as rigorous and well-reasoned. We have made significant updates to the text and figures, and we hope they find that we have addressed all concerns.

      Reviewer #2 (Public Review):

      The manuscript of Juette et al presents a combined structural and dynamic view of how a class of inhibitors (didemnins) blocks human ribosomal elongation. Prior work had shown that these cyclic peptide drugs bind to eEF1A in the ternary complex on the ribosome, between Domains I and III of the factor, blocking the dissociation of the elongation factor from elongator tRNA and the ribosome during decoding. Here the authors use beautiful single-molecule and structural approaches to probe the mechanisms of two related drugs, Didemnin B and Ternatin-4. Their results expand on prior observations of drug mechanism and clarify the similarities and differences in how the two drugs work both in vitro and in vivo. Using single-molecule tRNA-tRNA FRET, the authors show that the drugs (at saturating concentrations) block progression of the tRNA from a mid-FRET (GTPase-activating) state to the fully accommodated (high-FRET) state; they observe slightly more transitions to high FRET in the presence of Ternatin-4 than of Didemnin B (more below on this). These results are consistent with the idea that the drugs trap the ternary complex on the ribosome after GTP hydrolysis. Using the fraction of ribosomes that proceed to accommodation, the authors performed a titration to determine the apparent Ki for the drugs (which were similar, in the range of 5-10 nM). They also performed clever washout experiments (always in the presence of cycloheximide to block further conformational dynamics once a tRNA accommodates). These experiments probed the drug dissociation rate and showed marked differences between Didemnin B (slow rate) and Ternatin-4 (faster rate). The authors then recapitulate the prior structural work (at lower resolution in RRL) using a reconstituted system. Their results show a structure similar to that solved previously, but with more disordered loops in the presence of Ternatin-4, although the resolution here is moderate (3.2 and 3.8 Å for the two drug complexes).

      Finally, the authors perform in vivo analyses of drug action on protein synthesis using clickable amino acid incorporation. They show that the two drugs block protein synthesis in a dose-dependent manner, and that the effect of Ternatin-4 can be reversed by washout of the drug, whereas that of Didemnin B is poorly reversed, explaining differences in drug action despite the similar binding site.

      Overall, this is a rigorous and well performed study probing the mechanisms of drug action in human translation elongation. The combination of dynamics measurements and structure are particularly novel, and will complement ongoing investigations (and publications) by the Blanchard lab on human elongation in general.

      We thank Reviewer 2 for their assessment of this work as rigorous and novel.

      Reviewer #3 (Public Review):

      In this article, Juette et al employed single-molecule FRET, cryo-EM, and Hpg incorporation (in-cell translation assays) to compare the mechanisms by which Didemnin B and Ternatin-4 inhibit translation elongation. They found that, while both bind to the same pocket of eEF1A and block accommodation after GTP hydrolysis, Didemnin B had an irreversible effect on protein synthesis, whereas Ternatin-4, while still a potent inhibitor, allowed more flexibility in complexes (increased disorder of regions in cryo-EM structures), which permitted increased sampling of on-pathway accommodated states (observed by smFRET) and reversibility of effects on protein synthesis in cultured cells (by Hpg incorporation). This is a straightforward study, and the conclusions are well supported by the data using appropriate techniques. The work will be of impact to the ribosome field, which may use these drugs in other mechanistic studies, and to researchers wanting to employ the drugs to combat cancer and other diseases.

      We are thankful to Reviewer 3 for their assessment of this work as well-supported and impactful.

    1. Author Response

      Reviewer #2 (Public Review):

      I have only one concern with the study. I am not fully convinced that the disruption of behavioral updating is specifically due to NA signaling within OFC. In the first two studies, they observed non-specific anatomical effects, likely due to the ablation of fibers of passage through OFC. The DREADD experiment is claimed to allay this concern. However, the DCZ was injected systemically. This means that any collaterals of LC NA neurons outside OFC will also be suppressed. While the lack of effect with the mPFC projection is interesting, this does not preclude an effect mediated in other target regions. Overall, I believe that none of the experiments truly demonstrate a specific effect of NA in OFC. A few experimental options that could be considered are injection of DCZ directly into OFC, optogenetic inhibition of fibers in OFC, or pharmacological disruption of NA signaling in OFC.

      The other options are to measure the effect of the toxin ablations from experiments 1 and 2 not just in mPFC but in other regions. If the non-specific effect is truly only in mPFC outside of OFC, that would lead to more confidence that mPFC projection is the only other viable pathway mediating the effect.

      As requested, we have quantified the effect of toxin ablations in neighbouring cortical regions known to be involved in goal-directed behavior, namely the insular cortex (IC, e.g., Balleine & Dickinson, 2000; Parkes & Balleine, 2013), the medial orbitofrontal cortex (MO, e.g., Bradfield et al., 2015; Gourley et al., 2016) and the secondary motor cortex (M2, Gremel et al., 2016). Briefly, we found that injection of the saporin toxin in the VO and LO (Experiment 1) led to a significant decrease in NA fiber density in all examined regions. Injection of 6-OHDA also produced significant loss of NA fibres in MO and M2 but not in insular cortex. These results are presented in Suppl. Figures 1 and 3 (pages 28 and 30), and the statistics are reported in the main text (pages 6 and 11).

      We have also added the following to our discussion on the reason for the off-target depletions that we observed and acknowledged the potential role of collateral LC neurons:

      Page 21, line starting 374: “The use of the saporin toxin led to a dramatic decrease of NA fiber density in all analysed cortical areas (Suppl Fig 1). This may be due to diffusion of the toxin from the injection site, the existence of collateral LC neurons and/or fibers passing through the ventral portion of the OFC but targeting other cortical areas (Cerpa et al 2019). However, injection of 6-OHDA led to much less offsite NA depletion, suggesting that a large part of the previous observation is toxin-specific. Indeed, no significant loss of NA fibers was visible in the insular cortex, which has been previously implicated in goal-directed behaviour (Balleine & Dickinson, 2000; Parkes et al., 2013; 2015; 2017). We did nevertheless observe an offsite depletion in more proximal prefrontal areas (prelimbic and medial orbitofrontal cortices), albeit a more modest depletion than what was observed using the saporin toxin. Several studies have described the projection pattern of LC cells. These studies, using various techniques, indicate that LC cells mainly target a single region, and that only a small proportion of LC neurons collateralize to minor targets (Plummer et al., 2020, Kebschull et al 2016, Uematsu et al 2017, Chandler et al 2014). Therefore, even if the OFC noradrenergic innervation is presumably specific (Chandler et al 2013), we cannot rule out a possible collateralization of some neurons toward neighbouring prefrontal areas (PL and MO). We have previously discussed that the posterior ventral portion of the OFC is an entry point for LC fibers en passant, which ultimately target other prefrontal areas (Cerpa et al 2019).

      To achieve a greater anatomical selectivity we used a CAV-2 vector carrying the noradrenergic promoter PRS to target either the LC:A32 or the LC:OFC pathways (Hayat et al., 2020; Hirschberg et al., 2017). It has been shown that the CAV-2 vector can infect axons-of-passage, however the vector does not spread more than 200 µm from the injection site (Schwarz et al 2015). Therefore, when targeting the OFC we injected anteriorly to the level where the highest density of fibers of passage is expected (Cerpa et al 2019) in order to minimize infection of such fibers and restrict inhibition to our pathway of interest.

      Overall, the current behavioural results are in line with our previous work showing that the ability to associate new outcomes to previously acquired actions is impaired following chemogenetic inhibition of the VO and LO (Parkes et al., 2018) or disconnection of the VO and LO from the submedius thalamic nucleus (Fresno et al 2019). These results point to a necessary role of the ventral and lateral parts of the OFC and its noradrenergic innervation for updating A-O associations. However, it is worth mentioning that different subregions of the OFC, both along the medio-lateral and antero-posterior axes of OFC, display clear functional heterogeneities (Dalton et al 2016, Izquierdo 2017, Panayi & Killcross, 2018, Bradfield et al 2018, Barreiros et al 2021). Therefore, while we have previously focused on the anatomical heterogeneity of the noradrenergic innervation in these prefrontal subregions (Cerpa et al 2019), a thorough characterization of its functional role in each of these subregions still needs to be addressed.”

      One last concern is that the lack of effect following disruption of the mPFC projection may still reflect experimental issues. If the authors have some evidence that disrupting the mPFC projection produced some other behavioral effect, that would make the lack of effect in this case more convincing.

      Unfortunately, we do not provide evidence in the current paper that disrupting the LC:mPFC (now termed LC:A32 in the current study, based on the recommendation of reviewer 1) projection produces some other behavioural effect. However, in an ongoing series of experiments, using the same tools as the current study, we found that inhibiting the LC:A32, but not LC:OFC, pathway impairs Pavlovian contingency degradation, as shown in the figure below. We therefore believe that the failure of LC:mPFC pathway inhibition to affect outcome identity reversal in the present study is not due to experimental issues. Please note that in the figure below mPFC is referred to as area 32 (A32), as requested by reviewer 1.

      Figure 1. A) Experimental timeline for the Pavlovian contingency degradation procedure. Prior to behavioural training, rats were injected with CAV2-PRS-hM4D-mCherry into either the vlOFC or area 32 (A32). Number of food port entries during the non-degraded CS and degraded CS for rats injected with vehicle and rats injected with DCZ during degradation training (B, D) and the test in extinction (C, E). Inhibition of the LC:vlOFC had no effect on Pavlovian contingency degradation, whereas inhibition of LC:A32 during degradation training rendered rats insensitive to the change in the causal relationship between the CS and the US.

      Reviewer #3 (Public Review):

      I would be curious about the authors' thoughts regarding the recent Duan ... Robbins Neuron paper (https://pubmed.ncbi.nlm.nih.gov/34171290/), in which marmosets displayed paradoxical responses to VLO inactivation and stimulation in contingency degradation tasks. Are there ways to reconcile these reports?

      We previously argued that the updating processes underlying changes in causal contingency versus outcome identity may be supported by different prefrontal regions (Cerpa et al., 2021, Behav Neurosci). Unfortunately, the tasks used in the current study do not allow us to test if our rats are sensitive to changes in the action-outcome contingency. In fact, the effect of inactivation (or overactivation) of the ventral and lateral regions of OFC on an instrumental contingency degradation task similar to that used in Duan et al (2022) has not yet been examined in rats.

      Indeed, while it is stated in Duan et al (2022) that rats with lesions of lateral OFC are insensitive to contingency degradation, none of the citations provided support this conclusion (Balleine & Dickinson, 1998; Corbit & Balleine, 2003; Ostlund and Balleine, 2007; Yin et al., 2005). Balleine and Dickinson (1998) assessed the effect of prelimbic and insular cortex lesions (insular anteroposterior coordinate +1.2), with only the former affecting instrumental contingency degradation. Ostlund and Balleine (2007) assessed the effect of orbitofrontal lesions on Pavlovian contingency degradation (degradation of the S-O contingency) not instrumental contingency degradation. Finally, Corbit and Balleine (2003) and Yin et al (2005) assessed the effect of prelimbic and dorsomedial striatum lesions, respectively. Nevertheless, there are some reports on the effect of chemogenetic inhibition of VO/LO on degradation in a nose-poke response task but the results are conflicting (e.g., Whyte et al., 2019; Zimmerman et al., 2017; 2018). It would be very interesting to study the impact of both inactivation and overactivation of VO and LO in rats to compare with the results found in marmosets, using comparable tasks.

      We have added the following to our discussion, which cites Duan et al (2022) and the need to better understand the role of VO and LO in contingency degradation.

      Page 24, line starting 450: “However, it is not yet clear if the NA-OFC system is also involved in detecting the causal relationship between an action and its outcome (see Cerpa et al., 2021 for a discussion). Some have reported impaired adaptation to contingency changes following inhibition of VO and LO or BDNF-knockdown in these regions (Whyte et al., 2019; Zimmerman et al., 2017), while another study shows that inhibition of VO/LO leaves sensitivity to degradation intact, at least during an initial test (Zimmerman et al., 2018). Interestingly, a recent paper in marmosets demonstrates that inactivation of anterior OFC (area 11) improves instrumental contingency degradation, whereas overactivation impairs degradation (Duan et al., 2022). The potential role of the rodent ventral and lateral regions of OFC, and the NA innervation of OFC, in adapting to degradation of instrumental contingencies requires further investigation.”

  2. Sep 2022
    1. Author Response

      Reviewer #2 (Public Review):

      The role of cMAF in the formation of iNKT10 cells is only suggested by the transcriptional signatures analyzed here. There is no direct evidence that cMAF is indeed needed to generate iNKT10 cells. This should be investigated.

      We thank the reviewer for their comments on the link between IL-10, NKT10 cells, and cMAF. We agree that our study provides evidence that cMAF is a promising candidate regulator of IL-10 production by iNKT cells, and we attempted to address this using gene-specific knockout mice. Because mice lacking cMAF expression exhibit post-natal mortality and severe developmental defects [13], we attempted to breed Maf fl/fl Cd4-cre mice, which have previously been used to study the role of cMAF in T cell function [14]. However, we were not able to breed enough of these mice to assay whether cMAF is required for the production of IL-10 by iNKT cells. Therefore, our study can only suggest that cMAF is a promising candidate regulator of NKT10 cells, based on our transcriptomic data and on flow cytometry data showing that IL-10 production is associated with cMAF expression. We do, however, present further correlative, indirect evidence to this effect. It has previously been demonstrated that restimulation of activated iNKT cells at 72 hours post-⍺GalCer results in increased production of IL-10 compared with stimulation of iNKT cells at steady state [15]. We found that the frequency of splenic cMAF+ iNKT cells was greatly increased at 72 hours post-⍺GalCer compared with steady state (Figure S3B, Figure S3D), and this increase in cMAF expression correlated with increased IL-10 production (Figure S3E-S3F). Therefore, we believe that cMAF is a promising candidate for future work examining the functional landscape of NKT10 cells, and we anticipate that our study will be a useful transcriptomic reference for such studies.

      The Kronenberg group recently published a similar analysis, using RNAseq and ATACseq. Although I don't believe the cMAF signature was highlighted at the time, one could argue that this previously published study dampens the originality of this manuscript. Although this study (Murray et al.) is clearly acknowledged, the similarities and differences in both the methodology and findings should be clearly discussed.

      As the reviewer stated, the excellent study by Murray et al. (2021) did not identify or highlight a population of cMAF+ iNKT cells expressing a regulatory gene signature, as presented in our study, and, as the reviewer mentions, we cite and discuss the Murray et al. study in our manuscript. We believe that the two studies together provide a comprehensive transcriptomic analysis of iNKT cells after activation, and that ours provides unique insight not found in Murray et al. Our study uses scRNA-Seq rather than bulk RNA-Seq or bulk ATAC-Seq, enabling us to study transcriptomic characteristics of activation among heterogeneous iNKT cell subsets without needing to sort pre-identified iNKT cell populations or subsets. It is the use of unbiased scRNA-Seq that allowed us to identify cMAF+ iNKT cells, since this population has not been previously described in the literature. Notably, to the best of our knowledge, we also sequenced the largest number of iNKT cells to date (48,813 cells), which provides deeper insight. We also characterized the transcriptome of iNKT cells at stages of activation different from those characterized by Murray et al. Importantly, we profiled the phenotype of iNKT cells at 4 hours and 72 hours post-⍺GalCer, when iNKT cells engage in rapid cytokine production or undergo proliferation and expansion, respectively. This revealed several novel transcriptional insights, including rapid metabolic gene reprogramming that occurs twice during this activation timeline. By contrast, Murray et al. focused on analysis of iNKT cells at steady state and 6 days post-⍺GalCer. Finally, we performed transcriptional characterization of adipose iNKT cells, which are known to represent an unusual regulatory population of iNKT cells at steady state [16], whereas Murray et al. (2021) did not study adipose iNKT cells. Therefore, we propose that our study complements the excellent work performed by Murray et al. (2021) but provides novel insight in terms of focus, discovery, and scope.

      The authors should clearly describe the genes that were used to define iNKT1/2/17 identity in their study. This is important for tracking that identity over time following activation, as it is well known that the expression of some of the typically used markers changes following activation. This would bring clarity to the manuscript.

      We agree with the reviewer. We had originally omitted two clarifying figures on iNKT cell subset identification for reasons of space, but we have now included them as supplemental figures (Figure S4, Figure S5) to illustrate how we identified NKT1, NKT2, and NKT17 cell subsets in our scRNA-Seq data. We have also added further details to the Methods section (please see the “Downstream scRNA-Seq data analysis” section), and we have changed the title of the activated iNKT cell data in Figure 2A-2D and Figure 4C from “4 hours post-⍺GalCer” to “Activated (4 hours post-⍺GalCer)” to reflect our subset identification protocol as accurately as possible (please see below).

      Steady state and activated splenic NKT1, NKT2 and NKT17 cell subset identification was performed as follows. We identified spatial separation and graph-based clustering of five main populations of cells at steady state (Figure S4A). We then used the expression of the published marker genes Tbx21, Zbtb16, Rorc and Mki67 to identify NKT1, NKT2, NKT17 and cycling cells (Figure S4B). We identified spatial separation and graph-based clustering of three main populations at 4 hours post-αGalCer (Figure S4D). However, we found that Tbx21 and Zbtb16 expression was increased across multiple clusters and did not effectively demarcate NKT1 and NKT2 cells at the RNA level (Figure 2B), so we instead used the flagship cytokines Ifng, Il4, Il13, Il17a and Il17f to demarcate NKT1, NKT2 and NKT17 cells. We then combined the identified NKT1, NKT2 and NKT17 cell populations from steady state and 4 hours post-⍺GalCer (i.e., cells from the same subset at the two time points were pooled) and performed reclustering of the cells within each subset (Figure S5). It has previously been shown that different splenic iNKT cell subsets, for example NKT1 versus NKT2 cells, can differ in their activation kinetics, which may be due in part to physical localization, for example in the red pulp versus the white pulp of the spleen (see Lee et al. 2015) [17]. We observed a similar phenotype for NKT2 cells in our data, whereby a proportion of NKT2 cells at 4 hours post-⍺GalCer clustered with NKT2 cells from mice that received no ⍺GalCer (Figure S5A). To prevent differences in activation kinetics from biasing our analysis of transcriptional signatures of iNKT cell subset activation, we performed low-level graph-based reclustering within each iNKT cell subset to accurately segregate activated and steady state iNKT cells (Figure S5B). We validated our reclustering using the expression of activation markers and flagship cytokines (Figure S5C-S5D). Finally, these reclustered subset data were recombined and renormalized to generate the final analysis shown in Figure 2 of the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Champer et al. evaluate two homing drives that have been developed in the Anopheles mosquito. Variants of one of these (zpg) are possibly being further investigated for an eventual release. Work with the other has seemingly been discontinued because of unintended fitness costs. The authors argue that this second drive may in fact be better if the experimental results are interpreted more favourably. This is an important point if true, but somewhat separate from the findings in the paper. To a large extent, this point could be made without any of the results in the paper. However, the authors do show through modelling that this difference may in fact be relevant.

      This careful justification of the model parameters increases its relevance to the evaluation of those specific gene drives. The zpg drive will likely be extensively investigated, and the specific relevance of this work is a valuable contribution. While a range of parameters is tested for each expression pattern, there are no step-by-step investigations of how the drive outcomes are affected by changes to the underlying DNA-repair/deposition/fitness parameters. So while a reader may learn that one drive is better than the other, the ability to gain a deeper understanding of the underlying relationship is limited. This means this work has a more limited scope and relies on the relevance of the chosen parameters. In that regard, there may be room for improvement. The chosen parameters for zpg and nos may not be completely fair with regard to the target site, and I believe this needs to be addressed.

      The second aspect of this paper is the comparison between the commonly used panmictic modelling approach and spatial models. This also somewhat relies on the drive parameters being chosen well, as a more comprehensive evaluation of the spatial approach has been done in prior work by this group. However, showing that these particular, extremely efficient drives may still struggle when additional spatial factors are considered is useful and relevant. That a second, Anopheles-specific spatial model further reduces drive performance is a relevant finding. This is helped by a specific analysis of the effect of changes to the migration rates and the low-density growth rate. This spatial modelling also has relevant findings for the homing X-shredder design.

      In our previous study (Champer, Kim, et al, 2021 in Molecular Ecology) that we reference, we varied some of these drive performance parameters, which may address some of the reviewer’s concerns. We view this study as building off that one, but with a more specific focus (mosquitoes and existing drives). We also now discuss how using parameters for a different target site may have affected our results (see below - nos may actually have been shortchanged since zpg performs better at dsx than at nudel).

      Reviewer #2 (Public Review):

      Champer and colleagues present forward simulations of several gene drive systems that have been designed to suppress the malaria mosquito, Anopheles gambiae. These gene drives have all been validated in laboratory cage experiments but have not yet progressed to field trials. The authors are particularly concerned with the phenomenon of "chasing," in which local success of the drive leads to continuous cycles of recolonization by wildtype mosquitoes, preventing complete suppression of the population. In addition to their spatially explicit model, they present results from a model whose parameters are tuned to the ecology of the mosquito.

      Though there are a few additions that would improve the manuscript, the authors achieved these aims and their conclusions are supported by their modeling, which appears to be technically sound and well executed.

      Strengths:

      The work represents a useful, model-based comparison of the various Anopheles gene drives that have had success in laboratory conditions. By incorporating spatial dynamics, the authors are able to focus on the problem of chasing, a fluctuating equilibrium state that is impossible to study in laboratory colonies. Through a comparative framework, the authors also provide key information on the differences in predicted success between the various gene drive systems. For these reasons, the work will be a useful addition to the other published forecasts of gene drive success. Given the importance of the topic to diverse stakeholders who vary in their familiarity with gene drives and ecological modeling, I was glad to see the authors summarize their findings cogently and accessibly.

      We thank the reviewer for these kind comments.

      Weaknesses:

      The main area in which the manuscript could be strengthened is the description of the Anopheles-specific model. Based in part on the differences observed between their discrete generation model and Anopheles-specific model, the authors correctly note that "the outcome of a drive release could be very sensitive to the precise ecological characteristics of the targeted population." It was sometimes unclear which model parameter choices were informed by literature and how much confidence was had in each. A more explicit summary or perhaps a table of parameters, references, and estimates with confidence ranges informed by the authors' knowledge of literature would strengthen this section.

      We now have added Table S1 showing all model parameters. The parameters themselves often require more text for justification than just references, but we have improved our methods section throughout to increase the visibility and clarity of these sections. Note that mosquito ecology is a fairly understudied field, resulting in widely varying parameter ranges throughout different studies, so it is difficult to provide confidence ranges for our parameters, other than that they are designed to fall within estimates from different studies (see “Anopheles-specific spatial model” methods subsection).

      In the framing of the work, the authors imply that their modeling study "suggest(s) an alternative interpretation of [the] performance [of homing gene drives]" from recent studies (e.g., Simoni et al. 2020 and Kyrou et al. 2018). I am not certain this framing is justified, given the original authors' circumspection in correctly noting their drives had success in the cage experiments, without claiming they would be successful in the wild. I would prefer this study be presented as building on those previous studies and extending their work.

      In crafting this sentence, we had in mind very specific technical interpretations of drive performance mechanisms (paternal deposition vs. a larger somatic fitness cost, and the existence of a somatic fitness cost in nos males) rather than general performance in any given environment. To convey our intended meaning more clearly, we have adjusted our wording. The sentence now reads: “Here, we analyze data associated with each of these gene drives and consider both the original and alternative interpretations of these drives’ characteristics and performance parameters.”

      Likely impact:

      This work will be of interest to research scientists whose interests range from transgenic mosquitoes to ecological modelers to post-release assessment. The authors correctly note that additional refinement of the ecological parameters will increase the utility of the model, but the framework as it stands will be an important contribution to the literature. Given the timeliness of this topic, the subject is of interest to other stakeholders in the regulatory or policymaking realm, as well as governmental and funding agencies deciding between gene drive systems.

      Reviewer #3 (Public Review):

      This is a computational modeling study to evaluate the merits (likely success) of different 'suppression' gene drive systems. Gene drives offer a possible simple and low-effort means of suppressing or even extinguishing pest populations. Using CRISPR technology, several gene drive systems have been developed in the last decade for key mosquito vector species. As no gene drive has been approved for release in the wild, efforts to evaluate their likely success are limited to cage trials and modeling, the latter as done here. In contrast to some modeling studies, the effort here is to develop and analyze models that match the gene drive and mosquito biology closely. The models are thus parameterized with values representative of what is known about mosquito biology and of the various gene drive constructs that have been developed for lab studies.

      In these models, gene drive success or failure in population suppression largely depends on (i) how well the drive spreads throughout the population, and (ii) whether the population persists because of a type of ongoing spatial 'group selection' in which local pockets invaded by the drive die out and are then repopulated by migrants lacking the drive. Formal evolution of functional resistance is not allowed. The numerical results show striking differences in suppression success with different gene drive constructions, and these differences are likely to be of use when designing drives for actual releases.

      The basic group selection outcome that allows population persistence amid a suppression gene drive has been shown before, as cited in the ms. The novelty provided by the present study is to tie the models to the biology of known gene drive constructions. Given the high specificity of the models, the audience for this work is likely to be somewhat narrow, confined to those involved in gene drive design. The work is nonetheless significant in view of the strong potential of gene drives in global public health efforts.

      The software used to generate the trials is freely available from one of the authors for anyone wishing to repeat the simulations. There is an extensive supplement of results referenced (but not otherwise included) in the main text.

      We thank the reviewer for these comments, and we note that an analysis of functional resistance was performed in our previous study. Because the results of that study are likely to apply fully to our new results, we did not repeat it with our mosquito model and our parameterized drives. However, it is certainly an important topic, and in the Discussion we explicitly mention that the possibility of chasing must be taken into account when considering the acceptable rate of functional resistance allele formation. The text reads, “We did not consider the possibility of.... functional resistance, which can evolve more readily during lengthy chases [11].”

    1. Author Response

      Reviewer #1 (Public Review):

      Overall, it is an interesting work exploring stochastic and deterministic aspects of embryonic cell division in plants. The power of the authors' approach lies in the quantitative analysis of 3D cell geometries combined with quantitative computer modelling.

      I am a bit confused about how the authors frame stochasticity as an emergent property of a deterministic process. Typically, stochasticity is the low-level process resulting in variation of subcellular components, including those related to the positioning of the cell division plane. Perhaps a more elaborate and clearer connection between stochasticity at the subcellular level and phenotypic variability should be provided.

      Actually, our interpretation does not directly relate variability in division patterns to a deterministic selection of division plane orientation. A key intermediate between these two scales is the variability in cell geometry. Based on our results, we propose that the selection of division plane orientation would obey a deterministic principle based on geometrical constraints. Variability in cell division patterns would ensue from the expression of this deterministic rule in the variable context of cell geometries. Variability in cell geometry would itself result from noise in the precise positioning of the division plane along the optimal, deterministic orientation. We have added a new summary Figure 10 to illustrate and clarify this interpretation.

      I have a number of specific questions/concerns that I would like the authors to address as listed below:

      1) Major variability of cell shapes is observed in the apical domain as opposed to the basal domain. What would be an underlying principle to asymmetric shaping of the apical-basal domain? The authors describe beautifully the observations but give relatively little discussion on this matter, leaving the reader guessing.

      We added a new paragraph (before the last) in the Discussion on this point. The origin for a larger variability in the apical than in the basal domain can be found in part in the different cell shapes present in the two domains at stage 16C. Along the path tetrahedron -> triangular prism -> cuboid, the apical domain indeed appears farther from the final absorbing state represented by the cuboid shape, hence more time will be spent in the intermediate shapes in this domain. In addition, the tetrahedral and triangular prism shapes are closer to rotational symmetry and thus represent a larger source of variability in division plane orientation. Lastly, apical and basal cells have distinct environments, the basal cells being constrained between the suspensor, on one side, and apical cells, on the other. We can only speculate about the functional significance, if any, of the larger variability in the apical domain. For example, we can relate it to the future morphological transition that will characterize the apical domain with the emergence of the cotyledons at the heart stage. A variable pattern of cell walls could be required to establish a specific mechanical pattern at the tissue scale to favor this shape transition.

      2) The authors used graph theory to explain variability in cell division for the same topological feature. In light of the considerable discrepancy between predictions and observations (e.g., Figure 4C), the question arises of how this prediction could be affected by ongoing cell expansion, as I believe this element is neglected in their graph-theoretical approach.

      It is indeed the case that cell edge lengths are ignored in the graph-theoretical approach described in Section 2.4. In this part of the paper, our aim was to test objectively whether non-topological factors were implicated in the determination of division orientations. The rationale was thus to compare observed patterns with predictions obtained using topological information only. The strong discrepancy between observations and predictions (Figure 4C) confirmed that topology alone was not sufficient to predict the observed patterns. The integration of cell geometry (including edge lengths) into the predictions is considered in the next sections (Sections 2.5 and 2.6).

      We believe the modifications we made in Section 2.4 to answer Reviewer 3’s comment on this part (see below) should make this point clearer.

      3) Tetrahedron shape repeats in only 4% of embryos at the 16-cell stage. What could be the criticality of this shape for the entire embryo patterning?

      At the 16C stage, there are four domains (apical/basal x inner/outer) represented each by exactly one cell shape. The triangular prismatic shape is observed in two domains (outer apical and inner basal). The two other cell shapes, cuboid and tetrahedron, are specific to the two other domains, respectively outer basal and inner apical. Hence, the tetrahedron shape represents 25%, not 4% of cell shapes at the 16-cell stage, a proportion large enough to potentially impinge on embryo patterning.

      Our graph analysis shows that, under a topologically random regime of cell division, the tetrahedron shape should progressively vanish because, due to 4-way junction avoidance, a tetrahedron cannot divide into two tetrahedra and because divisions of triangular prisms and cuboids generate a minor proportion of tetrahedra only in comparison with the other cell shapes (Figure 4C). In addition, our analysis also shows that the triangular prismatic shape is a necessary intermediate to transit from a tetrahedron to a cuboid shape.

      Altogether, the presence of the tetrahedron shape in the inner apical domain at 16-cell stage could be responsible for the large variability in cell shape subsequently observed in this domain. (See also our answer to Comment 1 of the Reviewer).

      4) It is not clear to me whether stochastic cell division modelling takes into account the mechanical influence of adjacent cells? In any case, authors should discuss how this could potentially affect their analysis.

      Indeed, our stochastic cell division model only takes into account the geometry of the mother cell, ignoring the possible influence of the environment of the cell within the tissue (through mechanical signals or other, such as hormonal signals) - except of course for indirect effects that the environment could exert on cell shape. The possible mechanical influence of the cell environment was already discussed in the original version of our manuscript (last paragraph of the Discussion). In particular, we mentioned that the specific localization deep inside the embryo of the inner basal cells could mechanistically influence the positioning of the division plane, thus explaining the strong discrepancy between observations and predictions in this domain. This negative result illustrates how the cell-autonomous model can be useful in pointing to possible environmental influence (or alternative geometrical rules yet to be identified) and in suggesting future directions of investigation.

      To remove any ambiguity, we have made more explicit the cell-autonomous nature of the model (Section 2.5 and Material and Methods).

      5) Authors should perform model parameter sensitivity analysis (i.e., position of surfaces) to confirm the convergence and robustness of their approach.

      In the first version, we already reported in Supplementary Figures S8 to S15, for each domain and each orientation of division, the distribution plots (surface area x distance to cell center) of simulations performed in different cells. In each of these figures, the cells shared the same shape (cuboid, triangular prism or tetrahedron) but differed in their exact geometry. As can be seen from these graphs, similar point distributions were obtained for different cells and, more importantly, the simulations matching best with observed patterns shared the same relative localization within these distributions (except of course when the geometrical rule was not valid, as in the basal inner domain). Therefore, these results already provide a sensitivity analysis to shape fluctuations. To make this point more explicit, we now show in a new Supplementary Figure S12 the 3D shapes associated with the first set of graphs (basal outer domain; Figure S11) and added reference to this figure in Section 2.5.

      In addition to biological variability, possible minor errors and uncertainties at the image processing and segmentation steps may also affect mother cell geometry. To illustrate the robustness of our approach to this potential source of geometrical variability, we have added a new Supplementary Figure S10 showing the distribution plots of simulations performed within a raw mother cell mask and within the same mask after filtering by a mathematical morphological opening with a radius of 1, 2, or 3 voxels. Morphological opening is an image processing operation that smooths binary objects, removing protrusions with a radius smaller than the prescribed radius. The results show that our conclusions are robust to such alterations of mother cell geometry. In all four conditions (R=0,1,2,3), the simulated patterns matching best with the observed pattern are located at the same bottom-left position of the plot, corresponding to the geometrical rule.

      Lastly, we have also added a new Supplementary Figure S9 to illustrate the reproducibility of simulations results obtained within a given mother cell. In this figure, we show the distribution plots for two independent sets of 1000 simulations each. The graphs show similar distributions, with identical locations at the bottom left of the distribution of the simulations that best matched with the observed division plane.

      Concerning the convergence of the 3D cell division computer model, we have added to Section 4.3 (Material and Methods: Computer modeling of cell divisions) a justification of the number of Monte Carlo cycles. We have also added a new Supplementary Figure S7 illustrating the convergence of the algorithm over different independent runs. We also corrected a typo in the number of Monte Carlo cycles (which was 500, not 5000 as initially written).

      Reviewer #2 (Public Review):

      This is an interesting manuscript aiming at identifying minimal rules that account for cell divisions in early Arabidopsis embryos. This research has two main strengths. The authors consider cell division in 3 dimensions, whereas most other studies on the orientation of cell divisions are restricted to 2 dimensions. Based on their observations, the authors proposed that the previously proposed probabilistic rule for cell division can be replaced by a deterministic rule, with sources of stochasticity coming from irregularities/imperfections in cell geometry. The manuscript is overall well-written. I nevertheless have a few concerns.

      1) What is the effect of embryo fixation on cell geometry? Could the irregularities be an artefact due to fixation? How robust are the conclusions to numerical perturbations of the position of cell surfaces?

      We used the fixation and staining protocol developed by one of us (JCP) (Truernit et al., 2008). Yoshida et al. (Developmental Cell, 2014) used this same protocol, which they validated by comparison with live imaging data. The fixation and subsequent treatment could have an impact on cell geometry. For this reason, from among a thousand embryo acquisitions, we selected the embryos that were undamaged or only minimally damaged by this treatment. The robustness of our results and conclusions to variability and alterations of cell geometry was also questioned by Reviewer #1. Please see our in-depth answer to this point above.

      2) Section 2.7 on attractor patterns is essentially descriptive and the conclusions seem to be based on qualitative observation of a few cases. Can the authors support them with quantitative measures? Or with simulations?

      We have supplemented this section with quantitative data where they were missing. In the apical outer domain, we had 135 observations at G6, which had been reached from G4 through one or the other of the two main pathways shown in Figure 9A. These two pathways accounted for 40% and 42% of observations, respectively.

      The other attractors shown in Figure 9 are rare cases (Fig. 9B: 1 case over 173; Fig. 9C: less than 9 cases over 309). The case shown in Figure 9B was previously documented in Scheres et al 1995 (cited in the manuscript).

      The lower frequency in the basal domain of alternative sequences leading to a same attractor pattern is consistent with the lower variability in this domain. However, the conclusion is the same as in the apical domain where the distribution between alternative sequences is more balanced: different sequences of division over several generations can lead to similar cell patterns.

    1. Author Response

      Reviewer #2 (Public Review):

      Fibular hemimelia (FH) is a rare genetic disorder with unknown mechanisms. In this study, the authors generated Axin1 conditional knockout (cKO) mice by deleting the Axin1 gene specifically in Prx-1-expressing mesenchymal cells and demonstrated that Axin1 cKO mice developed an FH phenotype of varying severity. The FH phenotype in Axin1 cKO mice can be rescued by either β-catenin or BMP inhibition if the inhibition is applied to the pregnant mother from E9.5 to E12.5. For the mechanistic study, the authors showed elevated expression of BMP signaling molecules in limb tissue of Axin1 cKO mice and showed that Axin1 regulates the degradation of pSmad5 in mesenchymal cells.

      The study has many strengths. 1) The study was performed with high rigor. Various Cre lines were used to conditionally knock out Axin1 in different cell types, formally demonstrating that expression of Axin1 in Prx-1-expressing mesenchymal cells, but not in Sox9-, Col2-, or Osx-expressing cells, is required for normal fibular development. 2) Axin1 cKO mice were treated with β-catenin and BMP inhibitors at different time points, demonstrating that the inhibitors must be given during early embryonic development, a very important point when considering the translational potential of the study. 3) Detailed in vitro experiments were performed to investigate the molecular mechanisms by which Axin1 affects Smad5 stability. 4) The β-catenin and BMP signaling pathways are both important in many processes, including skeletal development; this study used Axin1 cKO mice to integrate these two pathways, which is an important and new contribution.

      Weaknesses of the study are described below:

      1) The authors need to first report/describe findings/phenotypes in bones other than the fibula in Axin1 cKO mice (4-8 weeks old), and then focus on fibular development. From the X-ray data shown in Figure 1D-E, it appears that Axin1 cKO mice have high bone mass or osteopetrosis. Thus, histology of bones other than the fibula (femur, tibia, knee joint) should be provided.

      We have performed histology in femur, tibia and knee joint in Axin1 KO mice as the reviewer suggested.

      2) Fig. 2 describes Axin1/2 dKO mice. I suggest removing Figure 2 or moving it to the supplemental data. Including Axin1/2 dKO mice in the main text makes the story complicated and difficult to explain, because most of the figures in this manuscript are on Axin1, such as the rescue experiments and the molecular mechanistic study. Further, the varying severity of FH in Axin1 cKO mice is closer to human FH cases (which also vary in severity) than the phenotype of Axin1/2 dKO mice, which show a complete loss of the fibula. The title is also on Axin1. If the Axin1/2 dKO mouse data are included in the main text, the authors need to provide a molecular explanation for why Axin1/2 dKO mice have more severe phenotypes.

      To make the entire story more straightforward, we have removed the Axin1/2 double KO data (Fig. 2) as the reviewer suggested.

      3) Please include a paragraph in the Discussion regarding the limitations of the study. Are there any human reports of FH patients with mutations in Axin1 or its related downstream signaling proteins, such as β-catenin and BMP? Can FH be detected before birth and treated by treating the pregnant mother? Do the authors plan to use unbiased approaches, such as RNA-seq or proteomics, to discover new genes/proteins that are regulated by Axin1 in mesenchymal cells?

      As the reviewer suggested, we have added a paragraph discussing the limitations of the study to the Discussion section. In collaboration with Dr. Qinglin Kang, we collected 9 samples from patients with FH and identified a mutation in the β-catenin gene at a potential phosphorylation site, which may lead to upregulation of β-catenin protein levels. In the future, we will investigate whether this mutation affects β-catenin function in mesenchymal cells. We are currently planning RNA-seq and proteomics experiments to identify novel downstream target gene(s) of Axin1.

    1. Author Response

      Reviewer #1 (Public Review):

      The present study by Zander et al. aims to improve our understanding of CD4+ T cell heterogeneity in response to chronic viral infections. The authors use the murine LCMV c13 infection model and perform single-cell RNA-seq analysis on day 10 post infection to identify multiple, previously unappreciated T cell subsets. The authors then verify these analyses using multi-color flow cytometry before comparing the transcriptome of CD4 T cells from chronic infection to a previously generated data set of CD4 T cells obtained from acutely resolved LCMV infection.

      The analyses are very well done and provide some interesting novel insights. In particular, the comparison of CD4 T cell subsets across acute and chronic infections is very exciting, as it provides a very valuable platform for answering a long-standing question: do CD4 T cells in chronic infection undergo exhaustion similar to CD8 T cells? While this has been proposed for an extended period, this new dataset by Zander et al. can provide some novel insights by comparing individual cell subsets across infections. The manuscript would, however, benefit from a more extensive analysis and focus on this interesting point.

      We thank the reviewer for their time and careful assessment of our manuscript. We were happy to hear that the reviewer found our work interesting.

      On that note, the authors should take advantage of more accurate and current gene datasets to compare the 'dysfunctional' state of CD4 T cells in chronic versus acute infection. Also, a different illustration of the module score analyses would be more intuitive.

      We have now included T cell “exhaustion” gene sets from recently published data (Zander et al. 2019 Immunity), and we have also displayed the relative expression of select signature genes from these gene sets in an updated supplemental figure 3.

      Also, in multiple sections of the manuscript the accurate citations are missing, and references still appear as '(Ref)'.

      We apologize for this oversight and have corrected these citations.

      Nevertheless, this study does not require major revisions.

      Reviewer #2 (Public Review):

      In their study "Delineating the transcriptional landscape and clonal diversity of virus-specific CD4+ T cells during chronic viral infection" Zander and co-workers analyze the phenotypic and clonotypic distributions of T cells specific to an LCMV epitope following infection with a chronic LCMV strain in mice. The paper largely follows an earlier study from the same group (Khatun JEM 2021) that used a similar experimental strategy to analyze T cells responding to an LCMV strain establishing acute infection, and it adds a scTCRseq component to another earlier study of chronic LCMV (Zander Immunity 2022). The main contributions of the paper are to demonstrate that there are interesting differences between the gene expression profiles in chronic and acute LCMV infection, and to identify a new T cell subset (of unknown functional significance).

      While the paper is framed around differences between T cell responses to acute and chronic infections, all analysis is done on T cells at day 10 post primary infection. At such an early time point, even the acute LCMV strain is likely not completely cleared, or at the very least viral antigens are still presented. The relevance of the presented phenotypic differences to other settings with long-term chronic infection is thus questionable. Additionally, there are a number of methodological concerns regarding the robustness of the statistical and bioinformatic analyses that put some of the conclusions in doubt. Most notably, the analysis of fate biases needs to be substantiated by tests against baseline expectations from random assortment to establish statistical significance.

      We thank the reviewer for their careful review of our manuscript as well as their helpful comments.

      Regarding the day 10 time point after LCMV Armstrong infection, several groups have previously reported that LCMV viral load is undetectable by day 10 post-infection (see one published example below). We completely agree with the reviewer, however, that viral antigens are still likely to be presented at this time point, along with ongoing inflammation. We believe (as discussed further below) that this is actually a strength of the study, as it allows a fairer comparison of the transcriptional state of recently stimulated virus-specific CD4 T cells in different contexts (acute vs. chronic LCMV infection).

      We chose day 10 post LCMV Cl13 and LCMV Armstrong infection as the time point for analysis because this is approximately the peak of the endogenous Gp66-77 CD4+ T cell response (see previously published data below). It is also when Th1, Tfh, and T central memory precursor (Tcmp) or memory-like cells are more evenly distributed in these settings, which provides sufficient numbers of cells per cluster for an in-depth analysis and high-resolution comparison of these subsets between the two infections. Further, as some degree of TCR stimulation is likely still occurring at this time point during LCMV Armstrong infection, we believe that this is a more useful comparison than a memory time point (when CD4 T cells are quiescent). It gives a better picture of the differentially expressed genes at the peak of the CD4 T cell response and provides insight into how chronic viral infection perturbs the transcriptional program of CD4 T cells.

    1. Author Response

      Reviewer #1 (Public Review):

      Several questions have remained regarding the characteristics of these cells:

      1) Based on the transcriptome data in Figure 2, the authors inferred that thymic macrophages are "specialized in lysosome degradation of phagocytosed material and antigen presentation" yet did not show functional data to support these claims. Functional assays such as phagocytosis and antigen presentation are desirable, especially in comparison to other well characterized macrophage populations.

      We agree with the reviewer that additional functional characterization of thymic macrophages will strengthen the conclusions of our manuscript. We have performed antigen presentation and in vitro phagocytosis assays to functionally characterize the thymic macrophages. Indeed, thymic macrophages seem to be quite good antigen-presenting cells: not as good as thymic DCs, but much better than peritoneal macrophages. This is documented in Fig. 3A and B. They were also good phagocytes both in vitro and in vivo, as demonstrated in Fig. 3C-G. Surprisingly, peritoneal macrophages were better in the in vitro phagocytosis assay. We attribute this result to the poor survival of thymic macrophages during sorting and in vitro culture.

      2) Do transcriptomes of CX3CR1+ thymic macrophages in old mice significantly differ from those of young mice?

      This is a very interesting question that we plan to explore in the future, but we feel it is beyond the scope of the current manuscript.

      3) It would be helpful to better graphically show the compositions (both cell number and cell ratio) of thymic macrophage subsets (TIM4+, CX3CR1+, and others) in mice at different ages (1 week, 6 weeks, and 4 months old). It is not straightforward to deduce all the information based on the current data presentation.

      We thank the reviewer for the suggestion! Plotting the cell numbers revealed a peak at a young age followed by a significant decline in the number of Tim4+ cells, and a trend toward accumulation of Tim4- cells with age. Unfortunately, older mice show great variability in thymus size, which prevented the Tim4- result from being statistically significant. We have added these data to Fig. 8F.

      4) The description of the gating strategy of thymic macrophages for Figure 1 is quite verbose. Adding a step-wise gating strategy of thymic macrophages as a figure panel would be helpful for readers to follow the experimental details.

      We thank the reviewer for the suggestion. The description of the gating strategy has been condensed into two panels that capture its essence (Fig. 1B).

      Reviewer #2 (Public Review):

      This work provides by far the most thorough characterization of thymic macrophages. The authors used bulk RNA-seq, single-cell seq and fate mapping animal models to demonstrate the phenotype, origin and diversity of thymic macrophages. Overall the manuscript is well written and the conclusions of the paper are mostly well supported by data.

      Some aspects of data acquisition and data analysis need to be clarified.

      1) The authors should state what "row min" and "row max" in Figure 2b,d refer to. Are these expression values on a log scale? In Figure 2d, the authors compared their own RNA-seq data with ImmGen sequencing data; what kind of normalization did the authors apply?

      We apologize for not making this clear. The values in Fig. 2b and d (current Fig. 2A and C) are expression values on a log scale. We have included this information in the figure.

      Our data is part of the IMMGEN dataset. We sorted the cells and sent them to the US for RNA sequencing. That is why we referred to it as “our” data. However, to avoid confusion we changed the wording to clearly reflect that the data are from IMMGEN.

      2) The authors used immunofluorescence to identify the localization of two populations of macrophages, using MerTK staining to indicate all macrophages. However, MerTK expression may not be restricted to immune cells. The authors are encouraged to confirm that MerTK labels only macrophages in the thymus by co-staining with F4/80 or CD45. Tim4 can also be used in immunofluorescence.

      We agree that staining with additional macrophage markers will strengthen our conclusions about ThyMacs localization. We have performed staining with CD64 together with MerTK or Tim4. CD64 and MerTK almost completely overlapped and so did CD64 and Tim4 in the cortex. We could not stain MerTK and Tim4 together because the antibodies are raised in the same species (rat). Additional evidence for the specificity of these markers for thymic macrophages comes from Fig. 3E and F showing the high degree of co-localization of apoptotic cells (TUNEL+) with MerTK or Tim4. Finally, Fig. 4 figure supplement 1 also clearly shows the distribution of TIM4 and CD64 in the whole thymus.

      3) The data showing accumulation of Cx3cr1+ cells in the thymus with age are very interesting and, as the authors have discussed, might indicate their contribution to thymus involution. However, the authors only showed the change in percentage. As the total number of macrophages decreases with age, it is not clear whether these cells actually "accumulate" with age. It would help to assess whether this increased percentage of Cx3cr1+ cells reflects an actual increase in "influx" or is due to a decrease in the self-maintaining Tim4+ macrophage subset.

      The reviewer is raising a very important point. As the changes in the proportions of Tim4+ and Tim4- thymic macrophages with age occur against the background of thymic involution, it is difficult to judge whether Tim4+ cells self-maintain and whether Tim4- cells accumulate. Plotting the cell numbers revealed a peak at a young age followed by a significant decline in the number of Tim4+ cells, and a trend toward accumulation of Tim4- cells with age. Unfortunately, older mice show great variability in thymus size, which prevented the Tim4- result from being statistically significant. We have added these data to Fig. 8F.

      Reviewer #3 (Public Review):

      This study by Zhou et al. focuses on thymic macrophages and shows that two populations can be distinguished with different identities, localization and origin. Authors use several murine reporter and fate-mapping models, coupled with flow cytometry and transcriptomics approach to support their claims.

      Overall, the question tackled by this study is interesting, since thymic macrophages have been somewhat forgotten over the last decade, which has seen many studies similar to the one presented here in other organs. So the stated aim of closing this gap is relevant. But the current version of the study suffers from many defects of varying severity, which affect its clarity and persuasiveness.

      • Regarding the plan, the authors study the origin of the thymic population and provide data in Figs. 2, 3 & 4 assuming that thymic macs form a homogeneous population. But from Fig. 5 onward, they distinguish two populations and study them separately. The end of the paper thus renders the beginning obsolete, which calls for a revision of the whole plan.

      We agree with the reviewer that there is more than one way to tell this story and we have been agonizing over our plan. However, we respectfully disagree that the beginning of the paper is made obsolete by the ending for several reasons:

      1) The initial figures in our manuscript contain very fundamental characterization of ThyMacs. Just as the discovery of heterogeneity in liver or lung macrophages (ref) does not render all prior research on these cells obsolete, the initial figures in our manuscript are an essential part of the story. Such data are available for all other studied tissue-resident macrophage populations. Removing them would be a disservice to the community.

      2) Another reviewer asked for deeper characterization of ThyMacs based on the data in Fig. 2. Accommodating this request will be very difficult if we remove this part.

      Nevertheless, we agree that ThyMac heterogeneity is the central claim of the manuscript and should be introduced earlier. The original Figure 5 (current Fig. 4), which describes the heterogeneity, has now been moved before the original Figures 3 and 4 (current Figs. 5 and 6). Additional analyses distinguishing Tim4+ and Tim4- ThyMacs have been incorporated into current Figs. 5 and 6.

      • Figure 1 is not very clear. The backgating should be added in 1a. Or why not use the color map axis mode in FlowJo to show three parameters at a glance? The gating strategy should be displayed more clearly in the figure. In Fig. 1S3, there are clearly two populations in the CX3CR1-GFP mice. Why not start from this to introduce the two populations?

      We thank the reviewer for the suggestion. We have included a color map axis to show MerTK, CD64, and F4/80 in one plot. The description of the gating strategy has been condensed into two panels that capture its essence. We agree that there are several indications of heterogeneity among thymic macrophages, starting with Fig. 1E (the expression of Tim4) and Fig. S4c (the expression of CX3CR1-GFP). We have added text at the beginning of the paragraph describing current Fig. 4 to point out these facts.

      • Figure 2 could also be revised. First, panel 2a is not useful and should be removed; a PC analysis of all the macs would be more useful here. Also, the color code used for the genes is confusing. Why are all genes upregulated in ThyMacs red in 2b, but only half of them in 2d? The information can be found in the legend, but it should be clearer graphically.

      We have revised Fig. 2 according to the reviewer’s suggestions. The PCA analysis is consistent with the hierarchical clustering and shows that splenic and liver macrophages are most closely related to ThyMacs. We agree that the presence of red in both heatmaps is confusing and have changed the color code: color was removed from current Fig. 2A but retained in Fig. 2C.

      • For Figure 3, what is the time point in panel 3b? Here, the authors should show microglia and ThyMacs at both time points and draw conclusions from the comparison: if ThyMacs are as stable as microglia, there is no replacement; if not, there is replacement. For panel 3f, n=3 is too low to be convincing, particularly given the standard deviation here. And displaying the dot plot with 11% donor-derived blood monocytes when the median is around 20% is not fair; the authors should present the most representative plot. In panel 3h, GFP levels (in terms of MFI) are higher for TECs and ThyMacs than for total cells. How is this possible? Shouldn't TECs and ThyMacs be included in the total cells? Or is the gating not clear enough?

      We thank the reviewer for pointing out our omissions. Fig. 3b (current Fig. 5B) is from E19.5, and we have added this information to the figure. We also agree that in Fig. 3f (current Fig. 5F) the sample number is too small and the variation too large to draw solid conclusions. We have therefore repeated the partial chimera experiment, irradiating as much of the mouse as possible without affecting the thymus, and substituted the data in Fig. 3e and 3f with the new data. For Fig. 3h, we apologize for not labeling the data clearly. The panels labeled “single, live cells” should be labeled “thymocytes”, as they were obtained without the enzymatic digestion that is essential for recovering both TECs and ThyMacs.

      However, we found an important caveat in the thymus transplant experiment. It appeared that some of the thymic macrophages were GFP positive not because they express GFP but because they had engulfed GFP+ cells. As a result, our experiments with embryonic GFP+ thymus transplants overestimated the percentage of donor-derived ThyMacs (all of them were GFP+). We have repeated the thymus transplantation experiments with congenically marked thymuses (CD45.2 donor and CD45.1 host). While this setup did not allow us to use the thymic epithelial cells as a positive control because they are CD45-, we did identify host-derived ThyMacs, consistent with Tim4- cells originating from adult HSCs. Thus, we have replaced the previous data in Fig. 3H and 3I with current Figs. 5H and 5I.

      • For Figure 4, the EdU staining (4e) is not convincing at all. The signal is very low (as compared to 4c, for example).

      We agree that the signal after a 21d chase is much weaker than after 2 h (Fig. 4c) or 21d (Fig. 4e) of EdU pulse. We decided to keep these data because: 1) the thymocytes also have much lower EdU staining after the 21d chase compared to the 2 h and 21d EdU pulses; 2) the EdU results are very consistent with the data from Ki67 staining, cell cycle analysis, and scRNA-Seq, which reveal a small population (~5%) of cycling ThyMacs.

      • For Figure 7, the interpretation of the data and the way they are presented are not clear. The authors use an inducible fate-mapping model. The fact that Tim4- cells lose their signal over time argues for replacement by non-labelled cells (blood monocytes), whereas Tim4+ cells are stable, meaning they self-maintain. This is what the authors claim. But how does this fit with the previous data indicating that Tim4+ cells derive from CX3CR1+ cells? The explanation, which is implied here but not shown clearly enough, is that CX3CR1+ cells give rise to Tim4+ cells during embryonic development but this stops afterward; Tim4+ cells then self-renew independently, and CX3CR1+ cells are slowly replaced by monocytes. As this is the central claim of the paper, it should be reported more clearly, and for this a substantial change of the whole plan is required.

      We thank the reviewer for pointing out the need for a better explanation. The maintenance of the different ThyMac populations is indeed complex and proceeds differently in different periods of life. We have added data to Fig. 7 (current Fig. 8) that we hope will clarify the maintenance of thymic macrophages with age. The new Fig. 8F shows the dynamics of Tim4+ and Tim4- macrophage numbers with age. Tim4+ cells reach a peak in young mice and decline significantly as mice age. So we do not think that they are self-maintaining; instead, they undergo slow attrition with very limited replacement. These results are consistent with Fig. 6I, which shows low levels of Mki67 in Tim4+ cells. Tim4- cells are a different story: they progressively accumulate with age. Although the variability in thymus size and in Tim4- macrophage numbers in very old mice is too great for the data to reach significance, the trend is clear.

      As for the dynamics of the populations in the embryonic period, we added data formally demonstrating by fate mapping that TIM4+CX3CR1- cells are derived from CX3CR1+ cells (Fig. 7E-G). We induced recombination at E15.5, when almost all ThyMacs are Cx3cr1+ (Fig. 7A), in pregnant ROSA26LSL-GFP females mated with Cx3cr1CreER males. Just before birth, at E19.5, we found a substantial proportion of TIM4+CX3CR1- cells among the fate-mapped GFP+ macrophages, indicating that Cx3cr1+ cells indeed give rise to TIM4+CX3CR1- cells. As pointed out before, this pathway is exhausted by the first week after birth: at d7, all ThyMacs are TIM4+.

    1. Author Response

      Reviewer #1 (Public Review):

      High resolution mechanistic studies would be instrumental in driving the development of Cas7-11 based biotechnology applications. This work is unfortunately overshadowed by a recent Cell publication (PMID: 35643083) describing the same Cas7-11 RNA-protein complex. However, given the tremendous interest in these systems, it is my opinion that this independent study will still be well cited, if presented well. The authors obviously have been trying to establish a unique angle for their story, by probing deeper into the mechanism of crRNA processing and target RNA cleavage. The study is carried out rigorously. The current version of the manuscript appears to have been rushed out. It would benefit from clarification and text polishing.

      We thank the reviewer for the positive and helpful comments that have made the manuscript more impactful.

      To summarize the revisions, we have resolved the metal-dependence issue, updated the maps in both main and supplementary figures that support the model, reorganized the labels for clarity, and added a comparison between our structure and that of Kato et al.

      In addition, we describe a new result with an isolated C7L.1 fragment that retains the processing and crRNA binding activities.

      Reviewer #2 (Public Review):

      In this manuscript, Gowswami et al. solved a cryo-EM structure of Desulfonema ishimotonii Cas7-11 (DiCas7-11) bound to a guiding CRISPR RNA (crRNA) and target RNA. Cas7-11 is of interest due to its unusual architecture as a single polypeptide, in contrast to other type III CRISPR-Cas effectors, which are composed of several different protein subunits. The authors have obtained a high-quality cryo-EM map at 2.82 Å resolution, allowing them to build a structural model of the protein, crRNA and target RNA. The authors used the structure to clearly identify a catalytic histidine residue in the Cas7-11 Cas7.1 domain that is important for crRNA processing activity. The authors also investigated the effects of metal ions and crRNA-target base pairing on target RNA cleavage. Finally, the authors used their structure to guide engineering of a compact version of Cas7-11 in which an insertion domain that is disordered in the cryo-EM map was removed. This compact Cas7-11 appears to have cleavage activity comparable to the full-length protein.

      The cryo-EM map presented in this manuscript is generally of high quality and the manuscript is very well illustrated. However, some of the map interpretation requires clarification (outlined below). This structure will be valuable as there is significant interest in DiCas7-11 for biotechnology. Indeed, the authors have begun to engineer the protein based on observations from the structure. Although characterization of this engineered Cas7-11 is limited in this study and similar engineering was also performed in a recently published paper (PMID 35643083), this proof-of-principle experiment demonstrates the importance of having such structural information.

      The biochemistry experiments presented in the study identify an important residue for crRNA processing, and suggest that target RNA cleavage is not fully metal-ion dependent. Most of these conclusions are based on straightforward structure-function experiments. However, some results related to target RNA cleavage are difficult to interpret as presented. Overall, while the cryo-EM data presented in this work is of high quality, both the structural model and the biochemical results require further clarification as outlined below.

      We thank the reviewer for the positive and helpful comments that have made the manuscript more impactful.

      To summarize the revisions, we have resolved the metal-dependence issue, updated the maps in both main and supplementary figures that support the model, reorganized the labels for clarity, and added a comparison between our structure and that of Kato et al.

      In addition, we describe a new result with an isolated C7L.1 fragment that retains the processing and crRNA binding activities.

      1. The DiCas7-11 structure bound to target RNA was also recently reported by Kato et al. (PMID 35643083). The authors have not cited this work or compared the two structures. While the structures are likely quite similar, it is notable that the structure reported in the current paper is for the wild-type protein and the sample was prepared under reactive conditions, resulting in a partially cleaved target. Kato et al. used a catalytically dead version of Cas7-11 in which the target RNA should remain fully intact. Are there differences in the Cas7-11 structure observed in the presence of a partially cleaved target RNA in comparison to the Kato et al. structure? Such a comparison is appropriate given the similarities between the two reports. A figure comparing the two structures could be included in the manuscript.

      We have added a paragraph on page 12 that describes the differences in the preparation of the two complexes and in their structures. We observed minor differences in the overall protein structure (r.m.s.d. 0.918 Å for 8114 atoms) but quite different interactions between the protein and the first 5’-tag nucleotide (U(-15) vs. G(-15)) due to the different pre-crRNA constructs, which suggests that U(-15) is important for forming the processing-competent active site. We added Figure 2-figure supplement 3, which illustrates the similarities and the differences.

      2. The cryo-EM density map is of high quality, but some of the structural model is not fully supported by the experimental data (e.g. protein loops from the AlphaFold model were not removed despite the lack of cryo-EM density). Most importantly, there is little density for the target RNA beyond the site 1 cleavage site, suggesting that the RNA was cleaved and the product was released. However, this region of the RNA was included in the structural model. It is unclear what density this region of the target RNA model was based on. Further discussion of the interpretation of the partially cleaved target RNA is necessary. Were 3D classes observed in various states of RNA cleavage and with varied density for the product RNAs?

      We should have made it clear in the Methods that multiple maps were used in building the structure; only the post-processed map was submitted to reviewers. In the map generated by Relion 4.0’s local resolution estimation, we observed sufficient density for some of the regions the reviewer is referring to. For instance, the density at site 1 supports the model for the two nucleotides beyond the site 1 cleavage site (see the revised Figure 1 & Figure 1-figure supplement 3).

      However, some protein loops still lack convincing density. These include residues 134-141 and 1316-1329, which have now been removed from the final coordinates.

      The phrase “partially cleaved target RNA” reflects the weak density for nucleotides downstream of site 1 (+2 and +3) but clear density flanking site 2. This feature indicates that cleavage had likely taken place at site 1 but not at site 2 in most of the particles that went into the reconstruction. To further clarify this phrase, we added “The PFS region plus the first base paired nucleotide (+1*) are not observed.” on page 4 and now indicate more clearly in Figure 1 which nucleotides are or are not built in our model.

      3. The authors argue that site 1 cleavage of target RNA is independent of metal ions. This is a potentially interesting result, but it is difficult to determine whether it is supported by the evidence provided in the manuscript. The Methods section only describes a buffer containing 10 mM MgCl2, but does not describe conditions containing EDTA. How much EDTA was added, and was MgCl2 omitted from these samples? In addition, it is unclear whether the site 1 product is visible in Figures 2d and 3d. To my eye, the products that are present in the EDTA conditions on these gels migrated slightly slower than the typical site 1 product. This may suggest an alternate cleavage site or chemistry (e.g. the cyclic phosphate is maintained following cleavage). Further experimental details and potentially additional experiments are required to fully support the conclusion that site 1 cleavage may be metal independent.

      As we pointed out in response to Reviewer 1’s #8 comment, this conclusion may have been a result of using an older batch of DiCas7-11 that contains degraded fragments.

      As shown in the attached figure below, “batch Y” was an older prep from our in-house clone and “batch X” is a newer prep from the clone purchased from Addgene (gel on right), and they consistently produce metal-independent (batch Y) or metal-dependent (batch X) cleavage (gel on left). It is possible that the degraded fragments in batch Y carry a metal-independent cleavage activity that is absent from the purer batch X.

      We further performed mass spectrometry analysis of two of the degraded fragments from batch Y (indicated by arrows below) and discovered that these are indeed part of DiCas7-11. We, however, cannot rationalize, without more experimental evidence, why these fragments might have generated metal-independent cleavage at site 1. Therefore, we simply updated all our cleavage results from the new and cleaner prep (batch X) (For instance, Figure 3c). As a result, all references to “metal-independence” were removed.

      With regard to the nature of the cleavage products, we found that cleavage at both sites could be inhibited by specific 2’-deoxy modifications, consistent with the previous observation that Type III systems generate a 2’,3’-cyclic product despite the metal dependence (for instance, see Hale, C. R., Zhao, P., Olson, S., Duff, M. O., Graveley, B. R., Wells, L., ... & Terns, M. P. (2009). RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex. Cell, 139(5), 945-956).

      We added this rationale based on the new results and believe that these characterizations are now thorough and conclusive.

      1. The authors performed an experiment investigating the importance of crRNA-target base pairing on cleavage activity (Figure 3e). However, negative controls for the RNA targets in the absence of crRNA and Cas7-11 were not included in this experiment, making it impossible to determine which bands on the gel correspond to substrates and which correspond to products. This result is therefore not interpretable by the reader and does not support the conclusions drawn by the authors.

      Our original gel image (below) does contain these controls, but we did not include them in the figure due to space considerations (we should have included it as a supplementary figure). We have now completely updated Figure 3e with much better quality and controls. Both the older and the updated experiments show the same results.

      Original gel for Figure 3e containing controls.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper showing that postsynaptic bursts in the presence of dopamine produce input-specific LTP in hippocampal synapses 10 minutes after they were primed with negatively coincident pre- and postsynaptic activity. LTP requires NMDAR activation during priming and involves a cAMP-PKA cascade and protein synthesis. When this synaptic rule is incorporated into a computational model, reinforced learning is possible through selective reactivation of neurons. Experiments in behaving mice confirmed that neurons reactivated after an exploratory period display more activity than non-reactivated neurons.

      We thank the Reviewer for their positive comments on our manuscript. We have incorporated the Reviewer’s suggestions.

      Reviewer #2 (Public Review):

      Building on their previous 2015 study with Brzosko, Fuchsberger et al. propose a potential solution for how the brain associates memory events that are separated in time. The authors find that in the presence of dopamine, postsynaptic bursts produce input-specific LTP at hippocampal CA3-CA1 synapses ten minutes after priming with a post-before-pre spike-pairing protocol. They explore the signalling somewhat, for example showing a need for postsynaptic NMDARs as well as for protein synthesis. Using a computer model, they find that this form of plasticity enables reinforcement learning. A few key predictions were verified using an in-vivo spatial learning model.

      This is a strong study that addresses a long-standing fundamental problem in modern neuroscience research, namely the temporal credit assignment problem of how temporally well-separated signals can be meaningfully associated and learned in the brain. The experiments are carefully executed, the rationale is clearly explained, and, excepting Fig 6-8, the figures are for the most part easy to understand. The study ranges from in-vitro electrophysiology across computer modelling to awake-behaving in-vivo experiments to persuasively argue that their novel findings may provide a candidate solution to the temporal credit assignment problem. Taken at face value, this work is likely to be highly impactful; however, some control experiments were missing or are perhaps just not shown (e.g., stability, stability in the presence of anisomycin, the effect of anisomycin on firing, and similar), which makes the validity of the findings a bit hard to evaluate at times.

      We thank the Reviewer for their positive evaluation of our study and address all the points raised below.

      Reviewer #3 (Public Review):

      Fuchsberger et al. demonstrate that an otherwise LTD-inducing STDP protocol can produce LTP if followed by burst reactivation of post-synaptic neurons in the presence of dopamine. Using computational modeling and single-photon imaging in the CA1 in mice, they propose these findings are relevant to spatial over-representation at a reward location.

      This is a follow-up to two previous studies from the same group (Brzosko et al., 2015 and Andrade-Talavera et al., 2016), in which they showed that a post-before-pre STDP protocol, which by default induces (pre-synaptic) LTD, induces synaptic potentiation in the presence of dopamine and continuous synaptic activity. The main conceptual difference between this manuscript and these previous studies is that continuous synaptic activity can be replaced by a post-synaptic burst. This means that reactivation of post-synaptic neurons without any further pre-synaptic instruction is sufficient for successful LTP induction.

      Mechanistically, the two protocols (continuous vs burst activation) appear to be similar (but not identical). For example, both require the activation of post-synaptic NMDAr during STDP pairing, and both depend on the AC/PKA pathways. Additionally, there are two new observations here: The activity of voltage-gated calcium channels during bursting is required for potentiation; the burst-induced potentiation also requires protein synthesis.

      The evidence provided at this stage is strong.

      Major point:

      It is not clear to me how the STDP studied here relates to the next part of the study, the reward-based navigation task. My interpretation is that the authors consider the activity before reaching the reward location (approaching time) as resembling the STDP priming protocol, the activity at the reward location as equivalent to the bursting protocol, and consumption of the reward as similar to dopamine application. If so, what is the circumstantial evidence that the activity during the approach induces any form of plasticity?

      The link between the two is not obvious and I see the manuscript as two interesting but not naturally linked stories.

      The Reviewer’s interpretation is correct. We considered the activity during navigation on the maze, as the animal approaches the reward, as resembling the STDP priming protocol. Substantial evidence supports a role of NMDAR-dependent STDP in the formation of place fields during navigation (Mehta, Hippocampus 2015; Moore et al., 2021). It has been postulated that both LTP and LTD are involved in place field formation. This was based on the observation that place fields shift backwards with experience (Mehta & McNaughton PNAS 1997), and a computational model predicted that without LTD, place field broadening would occur (Mehta et al. Neuron 2000). Thus, LTP is required when the animal enters the place field, and LTD when it exits the place field (Mehta et al. Neuron 2000). This is specific to navigation, as opposed to just walking on a linear track without a task, and place field plasticity is predictive of navigational performance (Moore et al. Nature 2021).

      We have added this to the Discussion section (page 13, line 344).

      Mehta MR. 2015. From synaptic plasticity to spatial maps and sequence learning. Hippocampus 25:756–762.
      Mehta MR, Quirk MC, Wilson MA. 2000. Experience-dependent asymmetric shape of hippocampal receptive fields. Neuron 25:707–715.
      Moore JJ, Cushman JD, Acharya L, Popeney B, Mehta MR. 2021. Linking hippocampal multiplexed tuning, Hebbian plasticity and navigation. Nature 599:442–448.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors ask a key question in the field of adult plasticity, and in particular, amblyopia treatment: whether transient dark exposure followed by light re-introduction disrupts neural representation for basic stimulus attributes in a manner that could negatively impact vision. Prior work by Rose and colleagues using calcium imaging showed that closing one eye in adult mice leaves the responsiveness of V1 neurons unchanged but alters their orientation preference and pairwise correlations; such representational drift may require downstream areas to adjust how they readout V1 signals. The question posed here is whether binocular visual deprivation in adult mice does the same. The authors use 2-photon calcium imaging in 6 awake, head-fixed [transgenic - GCaMP6f driven by the EMX1 promoter] mice before and after transient dark exposure to record ensemble responses of layer 2/3 excitatory V1 neurons to oriented gratings of varying spatial frequencies. Data were acquired twice at baseline (allowing for an assessment of representational drift during exposure to the natural [cage] environment), once immediately after 8 days of dark exposure and once about 8 days after animals were once again exposed to their natural [cage] environment.

      The study appears to be generally well designed, with multiple analytical approaches trained on the same questions. Major strengths include the ability to analyze a large number of neuronal responses simultaneously in the awake-behaving state using calcium imaging in transgenic mice, and the ability to record activity in the same neurons across several weeks and following different behavioral manipulations. A relative weakness was the implication of only being able to elicit relevant visual responses from a small fraction of V1 neurons for comparison purposes. This raises the question of what may have happened to the neurons that were not tracked, and whether this in fact may have been significant.

      A consistent finding across laboratories is that 30–50% of the neural population in rodent V1 is visually responsive to grating stimuli, and that drifting gratings recruit neurons to a greater extent than static gratings (1–5). This is unrelated to tracking, as it is also the case for single-session analysis. The reviewer raises an interesting question: because we track neurons across sessions, we are in a unique position to gain insight into properties that might correlate with responsiveness. To that end, we performed additional analysis to determine whether low trial reliability predicts whether a specific neuron will ‘drop out’ from being visually responsive in a subsequent session. The new analysis shows that under control conditions, trial reliability is correlated with reliability in the subsequent session. Consistent with our observation that reliability across the population decreases following dark exposure and then improves during light reintroduction, the new analysis also shows that the change in reliability for individual neurons is significantly skewed toward lower values in the DE condition (single-sample KS test), whereas in the light reintroduction condition values are significantly skewed in the positive direction (Figure 3 – figure supplement 3A).

      1. Ohki, K., Chung, S., Ch’ng, Y. H., Kara, P. & Reid, R. C. Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex. Nature 433, 597–603 (2005).
      2. Montijn, J. S., Meijer, G. T., Lansink, C. S. & Pennartz, C. M. A. Population-Level Neural Codes Are Robust to Single-Neuron Variability from a Multidimensional Coding Perspective. Cell Rep. 16, 2486–2498 (2016).
      3. Ko, H., Mrsic-Flogel, T. D. & Hofer, S. B. Emergence of feature-specific connectivity in cortical microcircuits in the absence of visual experience. J. Neurosci. 34, 9812–9816 (2014).
      4. Jeon, B. B., Swain, A. D., Good, J. T., Chase, S. M. & Kuhlman, S. J. Feature selectivity is stable in primary visual cortex across a range of spatial frequencies. Sci. Rep. 8, 15288 (2018).
      5. de Vries, S. E. J. et al. A large-scale standardized physiological survey reveals functional organization of the mouse visual cortex. Nat. Neurosci. 23, 138–151 (2020).

      Reviewer #3 (Public Review):

      This paper uses transient dark exposure to induce plasticity in the adult visual cortex. It shows that transient dark exposure in the adult mice has opposing effects at the single neuronal level versus the population level. At the population level, the stimulus representation is degraded following dark exposure but rebounds back to normal within 8 days of light re-introduction. Thus, dark exposure does not have a lasting negative impact on the visual cortex. Unexpectedly, at the single neuronal level, following dark exposure a fraction of neurons show more stable responses and higher correlations among pairs of neurons. It is inspiring to hypothesize that this fraction of neurons may form a plastic substrate for representation of complex natural scenes.

      Strengths:

      The paper uses a combination of single neuron and population analyses to identify the effects of transient dark exposure on visual responses in the adult mouse visual cortex. It succeeds in identifying degradation of stimulus representation at the population level following dark exposure, and stabilization of visual stimulus preference at the single neuron level as well as stabilization of stimulus correlations among pairs of neurons. This success is in part due to an impressively large set of simple visual stimuli used (180 different stimuli). This large set allows the authors to identify even small changes in stimulus preferences at the single neuronal level. This paper uses transient dark exposure to induce plasticity. An alternative and commonly used method to induce plasticity is monocular deprivation. This paper shows that at the single neuron level, the effects of transient dark exposure are different from the previously reported effects of monocular deprivation. This is an important finding for the field.

      Weaknesses:

      The analysis methods used are thoughtful and complementary. The statistical tests are mostly performed on visual responses pooled across 6 mice. These statistical tests support the claims of the paper. However, we are left wondering whether the effects identified would also be significant for visual responses of each individual mouse.

      Further analysis of individual mice is now included. From this analysis, we can verify that the observed effects are not driven by one or two animals but rather are representative of the majority of the animals included in the study.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors' results revolutionize our understanding of the mechanism of arrestin-mediated GPCR internalization. They identified previously unknown elements on the non-receptor-binding side of arrestins participating in the process. The findings are ground-breaking and very important to the large field of GPCR signaling.

      We are pleased that the reviewer appreciates the significance of our findings. We appreciate the important critiques and corrections, and have done our best to address them.

      Reviewer #2 (Public Review):

      This manuscript from the Von Zastrow laboratory proposes that an additional site on Beta-arrestin2 (arrestin 3), beyond the well-characterised C-tail (AP-2 + clathrin binding), is responsible in significant part for the downregulation, and likely onward signalling from endosomes, of a range of GPCRs. The cell biology appears to me to be thoroughly carried out and the data presented in a statistically appropriate manner.

      The conclusions made seem appropriate and justified, although I think considerably more information could be extracted with little extra effort, including formally proving that internalisation is by CME by using CME-specific inhibitors or inhibitory constructs.

      The major weakness is the lack of mechanistic information, most specifically: what does the C-lobe bind to in order to allow Beta-arrestin2 incorporation into CCVs?

      The referencing of the relevant literature is sometimes careless or inappropriate, especially with respect to CME.

      We are pleased that the reviewer found our conclusions generally appropriate and well-justified. The reviewer is correct that we presently do not know the interaction(s) responsible for CLB activity. We have addressed the reviewer’s critiques with new data and / or changes to the text as follows.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors ask an interesting question as to whether working memory contains more than one conjunctive representation of multiple task features required for a future response, with one of these representations being more likely to become relevant at the time of the response. With RSA, the authors use a multivariate approach that seems to be becoming the standard in modern EEG research.

      We appreciate the reviewer’s helpful comments on the manuscript and their encouraging comments regarding its potential impact.

      I have three major concerns that are currently limiting the meaningfulness of the manuscript: For one, the paradigm uses stimuli with properties that could potentially influence involuntary attention and interfere in a Stroop-like manner with the required responses (i.e., 2 out of 3 cues involve the terms "horizontal" or "vertical" while the stimuli contain horizontal and vertical bars). It is not clear to me whether these potential interactions might bring about what is identified as conjunctive representations or whether they cause these representations to be quite weak.

      We agree it is important to rule out any effects of involuntary attention that might have been elicited by our stimulus choices. To address the Reviewer’s concern, we conducted control analyses to test if there was any influence of Stroop-like interference on our measures of behavior or the conjunctive representation. To summarize these analyses (detailed in our responses below and in the supplemental materials), we found no evidence of the effect of compatibility on behavior or on the decoding of conjunctions during either the maintenance or test periods. Furthermore, we found that the decoding of the bar orientation was at chance level during the interval when we observe evidence of the conjunctive representations. Thus, we conclude that the compatibility of the stimuli and the rule did not contribute to the decoding of conjunctive representations or to behavior.

      Second, the relatively weak conjunctive representations are making it difficult to interpret null effects such as the absence of certain correlations.

      The reviewer is correct that we cannot draw strong conclusions from null findings. We have revised the main text accordingly. In certain cases, we have also included additional analyses. These revisions are described in detail in response to the reviewer’s comments below.

      Third, if the conjunctive representations truly are reflections of working memory activity, then it would help to include a control condition where memory load is reduced so as to demonstrate that representational strength varies as a function of load. Depending on whether these concerns or some of them can be addressed or ruled out this manuscript has the potential of becoming influential in the field.

      This is a clever suggestion for further experimentation. We agree that observing an adverse effect of memory load would be a robust way for future studies to assess the contribution of the working memory system. However, given that decoding is noisy during the maintenance period (particularly for the low-priority conjunctive representation) even with a relatively low set size, we expect that further manipulating load would require substantially altering the research design. As the main goal of the current study is to study prioritization and post-encoding selection of action-related information, we focused on the minimum set size required for this question (i.e., load 2). However, we now note this load manipulation as a direction for future research in the Discussion (pg. 18).

      Reviewer #2 (Public Review):

      Kikumoto and colleagues investigate the way visual-motor representations are stored in working memory and selected for action based on a retro-cue. They make use of a combination of decoding and RSA to assess at which stages of processing sensory, motor, and conjunctive information (consisting of sensory and motor representations linked via an S-R mapping) are represented in working memory and how these mental representations are related to behavioral performance.

      Strengths

      This is an elaborate and carefully designed experiment. The authors are able to shed further light on the type of mental representations in working memory that serve as the basis for the selection of relevant information in support of goal-directed actions. This is highly relevant for a better understanding of the role of selective attention and prospective motor representations in working memory. The methods used could provide a good basis for further research in this regard.

      We appreciate these helpful comments and the Reviewer’s positive comments on the impact of the work.

      Weaknesses

      There are important points requiring further clarification, especially regarding the statistical approach and interpretation of results.

      • Why is there a conjunction RSA model vector (b4) required, when all information for a response can be achieved by combining the individual stimulus, response, and rule vectors? In Figure 3 it becomes obvious that the conjunction RSA scores do not simply reflect the overlap of the other three vectors. I think it would help the interpretation of results to clearly state why this is not the case.

      Thank you for the suggestion; we have now added the theoretical background that motivated us to include the RSA model of the conjunctive representation (pg. 4 and 5). In particular, several theories of cognitive control have proposed that, over the course of action planning, the system assembles an event (task) file that binds all task features at all levels, including the rule (i.e., context), stimulus, and response, into an integrated, conjunctive representation that is essential for an action to be executed (Hommel 2019; Frings et al. 2020). Similarly, neural evidence from non-human primates suggests that cognitive tasks that require context dependency (e.g., flexible remapping of inputs to different outputs based on the context) recruit nonlinear conjunctive representations (Rigotti et al. 2013; Parthasarathy et al. 2019; Bernardi et al. 2020; Panichello and Buschman, 2021). Supporting these views, we previously observed that conjunctive representations emerge in the human brain during action selection and uniquely explain behavior such as the costs of action transitions (Kikumoto & Mayr, 2020; see also Rangel, Hazeltine & Wessel, 2022) or the successful cancelation of actions (Kikumoto & Mayr, 2022). In the current study, using the same set of RSA models, we attempted to extend the role of conjunctive representations to the planning and prioritization of future actions. As in the previous studies (and as noted by the reviewer), the conjunction model makes a unique prediction about the similarity (or dissimilarity) pattern of the decoder outputs: each specific instance of an action is distinct from all other actions. This contrasts with the RSA models of low-level features, which predict similar patterns of activity for instances that share the same feature (e.g., S-R mappings 1 to 4 share the diagonal rule context). Here, we generally replicate the previous studies, showing the unique trajectories of conjunctive representations (Figure 3) and their unique contribution to behavior (Figure 5).
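      To make the contrast between feature models and the conjunction model concrete, the following is a minimal Python sketch of RSA model similarity matrices. It assumes a hypothetical toy design (two rules, two stimuli, and an invented S-R mapping); the labels and helper names (`toy_response`, `feature_model`, `conjunction_model`) are illustrative and are not the study's actual conditions or analysis code. A feature model marks two action instances as similar whenever they share that feature, whereas the conjunction model marks each instance as similar only to itself.

```python
from itertools import product

# Hypothetical toy design: these labels are illustrative, not the study's conditions.
RULES = ["horizontal", "vertical"]
STIMULI = ["s1", "s2"]

def toy_response(rule, stim):
    # Hypothetical S-R mapping: the rule determines which stimulus maps to "left".
    return "left" if (rule == "horizontal") == (stim == "s1") else "right"

# Each action instance is a (rule, stimulus, response) tuple.
INSTANCES = [(r, s, toy_response(r, s)) for r, s in product(RULES, STIMULI)]

def feature_model(instances, index):
    """Model matrix: 1 where two instances share the feature at position `index`."""
    return [[1 if a[index] == b[index] else 0 for b in instances] for a in instances]

def conjunction_model(instances):
    """Conjunction model: each instance is similar only to itself (identity matrix)."""
    return [[1 if a == b else 0 for b in instances] for a in instances]

rule_model = feature_model(INSTANCES, 0)
stimulus_model = feature_model(INSTANCES, 1)
response_model = feature_model(INSTANCES, 2)
conj_model = conjunction_model(INSTANCES)

for name, matrix in [("rule", rule_model), ("stimulus", stimulus_model),
                     ("response", response_model), ("conjunction", conj_model)]:
    print(name, matrix)
```

      In this sketch, instances 0 and 3 share a response and therefore look similar under the response model, but they remain distinct under the conjunction model. In a real analysis, such model matrices would typically be fitted to the similarity pattern of the decoder outputs; that step is not shown.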

      • One of the key findings of this study is the reliable representation of the conjunction information during the preparation phase while there is no comparable effect evident for response representations. This might suggest that two potentially independent conjunctive representations can be activated in working memory and thereby function as the basis for later response selection during the test phase. However, the assumption of the independence of the high and low priority conjunction representations relies only on the observation that there was no statistically reliable correlation between the high and low priority conjunctions in the preparation and test phases. This assumption is not valid because non-significant correlations do not allow any conclusion about the independence of the two processes. A comparable problem appeared regarding the non-significant difference between high and low-priority representations. These results show that it was not possible to prove a difference between these representations prior to the test phase based on the current approach, but they do not unequivocally "suggest that neither action plan was selectively prioritized".

      We appreciate this important point. We have taken care in the revision to state that we find evidence of an interference effect for the high-priority action and do not find evidence for such an effect from the low-priority action. Thus, we do not intend to conclude that no such effect could exist. Further, although we do not intend to draw a strong conclusion from the null effect (i.e., no correlations), we performed an exploratory analysis in which we tested the correlation in trials where we observed strong evidence of both conjunctions. Specifically, we median-split trials within each time point and individual subject and performed the multilevel model analysis using trials in which both the high- and low-priority conjunctions were above their medians. Thus, we selected trials in a way that is independent of the effect we are testing. The figure below shows the coefficient associated with the low-priority conjunction predicting the high-priority conjunction (uncorrected). Even when we focus on trials in which both conjunctions are detected (i.e., a high signal-to-noise ratio), we observed no tradeoff. Again, we cannot draw strong conclusions from the null result of this exploratory analysis. Yet, we can rule out some causes of the absent correlation between the high- and low-priority conjunctions, such as a poor signal-to-noise ratio of the low-priority conjunction. We have further clarified this point in the Results (pg. 14).

      Fig. 1. Trial-to-trial variability between high- and low-priority conjunctions, using above-median trials. Coefficients of the multilevel regression model predicting trial-to-trial variability in the high-priority conjunction from the low-priority conjunction.
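      The trial-selection step of the exploratory analysis described above can be sketched as follows. This is a minimal Python illustration, assuming per-trial RSA scores for the high- and low-priority conjunctions at a single time point for a single subject; the function name `select_high_signal_trials` and its inputs are hypothetical, and the subsequent multilevel regression is not shown.

```python
from statistics import median

def select_high_signal_trials(high_scores, low_scores):
    """Return indices of trials where BOTH conjunction scores exceed their medians.

    Hypothetical inputs: per-trial RSA scores for the high- and low-priority
    conjunctions at one time point for one subject. Medians are computed
    separately for each conjunction, mirroring a within-subject median split.
    """
    if len(high_scores) != len(low_scores):
        raise ValueError("score lists must have equal length")
    high_med = median(high_scores)
    low_med = median(low_scores)
    return [i for i, (h, l) in enumerate(zip(high_scores, low_scores))
            if h > high_med and l > low_med]

# Toy usage: only trial 1 is above both medians (0.4 and 0.4).
print(select_high_signal_trials([0.1, 0.5, 0.3, 0.9], [0.2, 0.8, 0.6, 0.1]))
```

      Repeating this selection for every time point and subject yields the subset of trials on which the regression of one conjunction on the other would then be run.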

      • The experimental design used does not allow for a clear statement about whether pure motor representations in working memory only emerge with the definition of the response to be executed (test phase). It is not evident from Figure 3 that the increase in the RSA scores strictly follows the onset of the Go stimulus. It is also conceivable that the emergence of a pure motor representation requires a longer processing time. This could only be investigated through temporally varying preparation phases.

      We agree with the reviewer. Although we detected no evidence of response representations of both high and low priority action plans during the preparation phase, t(1,23) = -.514, beta = .002, 95% CI [-.010 .006] for high priority; t(1,23) = -1.57, beta = -.008, 95% CI [-.017 .002] for low priority, this may be limited by the relatively short duration of the delay period (750 ms) in this study. However, in our previous studies using a similar paradigm without a delay period (Kikumoto & Mayr, 2020; Kikumoto & Mayr, 2022), response representations were detected less than 300ms after the response was specified, which corresponds to the onset of delay period in this study. Further, participants in the current study were encouraged to prepare responses as early as possible, using adaptive response deadlines and performance-based incentives. Thus, we know of no reason why responses would take longer to prepare in the present study. But we agree that we can’t rule this out. We have added the caveat noted above, as well as this additional context in the discussion (pg. 16-17).

      • Inconsistency of statistical approaches: In the methods section, the authors state that they used a cluster-forming threshold and a cluster-significance threshold of p < 0.05. In the results section (Figure 4) a cluster p-value of 0.01 is introduced. Although this concerns different analyses, varying threshold values appear as if they were chosen in favor of significant results. The authors should either proceed consistently here or give very good reasons for varying thresholds.

      We thank the reviewer for noting this oversight. All reported significant clusters, along with their cluster p-values, were identified using a cluster-forming threshold of p < .05. We have corrected the description accordingly.
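      For readers less familiar with cluster-based statistics, the following minimal Python sketch illustrates the cluster-forming step: contiguous time points whose p-value falls below the cluster-forming threshold (here p < .05) are grouped into clusters, and each cluster is summarized by the sum of its test statistics (its "mass"). The function name `find_clusters` and its inputs are hypothetical, and this is not the authors' pipeline. In a typical cluster-based permutation test, each observed cluster mass would then be compared against a null distribution of maximum cluster masses obtained from permuted data to yield the cluster-level p-value; that step is omitted here.

```python
def find_clusters(p_values, stats, threshold=0.05):
    """Group contiguous time points with p < threshold into clusters.

    Returns a list of (start, end, mass) tuples, where `end` is inclusive and
    `mass` is the sum of the test statistics inside the cluster.
    """
    if len(p_values) != len(stats):
        raise ValueError("p_values and stats must have equal length")
    clusters = []
    start = None
    for t, p in enumerate(p_values):
        if p < threshold:
            if start is None:
                start = t  # a new cluster begins
        elif start is not None:
            # The cluster ended at the previous time point.
            clusters.append((start, t - 1, sum(stats[start:t])))
            start = None
    if start is not None:
        # Close a cluster that runs to the last time point.
        clusters.append((start, len(p_values) - 1, sum(stats[start:])))
    return clusters

# Toy usage: two clusters, time points 1-2 and time point 4.
print(find_clusters([0.2, 0.01, 0.03, 0.5, 0.04], [1.0, 2.5, 2.1, 0.3, 2.0]))
```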

      • Interpretation of results: The significant time window for the high vs. low priority by test-type interaction appeared quite late for the conjunction representation. First, it does not seem reasonable that such an effect appears in a time window overlapping with the motor responses. But more importantly, why should it appear after the respective interaction for the response representation? When keeping in mind that these results are based on a combination of time-frequency analysis, decoding, and RSA (quite many processing steps), I find it hard to really see a consistent pattern in these results that allows for a conclusion about how higher-level conjunctive and motor representations are selected in working memory.

      Thank you for raising this important point. First, we fixed the reported methodological inconsistencies (such as the cluster p-value and cluster-forming threshold). Further, we fully agree that the difference in the time course of the response and conjunctive representations in the low-priority, tested condition is unexpected and would complicate the view that the conjunctive representation contributes to efficient response selection. However, additional analysis indicates that this apparent pattern in the stimulus-locked result is misleading and that there is a more parsimonious explanation. We wish to caution that the data are relatively noisy and are likely influenced by different frequency bands for different features. Thus, fine-grained temporal differences should be interpreted with caution in the absence of positive statistical evidence of an interaction over time. Indeed, although Figure 4 in the original submission shows a quantitative difference in the timing of the interaction effect (priority by test type) between the conjunctive and response representations, the direct test of this four-way interaction [priority x test type x representation type (conjunction vs. response) x time interval (1500 ms to 1850 ms vs. 1850 ms to 2100 ms)] is not significant, t(1,23) = 1.65, beta = .058, 95% CI [-.012 .015]. The same analysis using response-aligned data is also not significant, t(1,23) = -1.24, beta = -.046, 95% CI [-.128 .028]. These observations did not depend on the choice of time interval, as other time intervals were also not significant. Therefore, we do not have strong evidence of a true timing difference between these conditions and believe this pattern is likely driven by noise.

      Further, we believe the apparent late emergence of a difference between the two conjunctions when the low priority action is tested is more likely due to a slow decline in the strength of the untested high priority conjunction than to a late emergence of the low priority conjunction. This pattern is clearer when the traces are aligned to the response. The low priority conjunction emerges early, is sustained when it is the tested action, and declines when it is untested (-226 ms to 86 ms relative to response onset, cluster-forming threshold p < .05). These changes eventually result in a significant difference in strength between the tested and untested low priority conjunctions just prior to the commission of the response (Figure 4 - figure supplement 1, right-column panel of the middle row, black bars at the top of the panel). Importantly, the high priority conjunction also remains active in its untested condition and declines later than the untested low priority conjunction does. Indeed, the untested high priority conjunction does not decline significantly relative to trials when it is tested until after the response is emitted (Figure 4 - figure supplement 1, right-column panel of the middle row, red bars at the top of the panel). This results in a late-emerging interaction effect of priority and test type, but this is not due to a late-emerging low priority conjunctive representation.

      In summary, we do not have statistical evidence of a time-by-effect interaction that would allow us to draw strong inferences about timing. Nonetheless, even the patterns we observe are inconsistent with a late-emerging low priority conjunctive representation; if anything, they support a late decline in the untested high priority conjunctive representation. This pattern, in which the high priority conjunction is sustained until late even when it is untested, is also notable in light of our observation that the strength of the high priority conjunctive representation interferes with behavior when the low priority item is tested, but not vice versa. We now address this point about timing directly in the Results (pg. 15-16) and the Discussion (pg. 21), and we include the response-locked results in the main text along with the stimulus-locked results, including the exploratory analyses reported here.

      Reviewer #3 (Public Review):

      This study aims to address the important question of whether working memory can hold multiple conjunctive task representations. The authors combined a retro-cue working memory paradigm with their previous task design that cleverly constructed multiple conjunctive tasks with the same set of stimuli, rules, and responses. They used advanced EEG analysis techniques to characterize the temporal dynamics of concurrent working memory representations of multiple tasks and task features (e.g., stimulus and responses) and how their representation strength changes as a function of priority and task relevance. The results generally support the authors' conclusion that multiple task representations can be simultaneously manipulated in working memory.

      We appreciate these helpful comments, and were pleased that the reviewer shares our view that these results may be broadly impactful.

    1. Author Response

      Reviewer #2 (Public Review):

      Zuber, et al. report structural and thermodynamic properties of 6 domains from the NusG superfamily of transcription factors, conserved in all kingdoms of life. This superfamily is characterized by an N-terminal NGN domain that binds RNA polymerase, affecting its activity. NGN domains are covalently linked to C-terminal domains (CTDs) that typically assume a single completely beta-sheet (KOW) fold. Recent work has shown, however, that one such domain, from E. coli RfaH, can switch from a completely alpha-helical fold into the all-beta-sheet KOW fold. Here, the authors identify a second fold-switching member of the NusG superfamily and investigate the physical basis of the dramatic switching transition by comparing thermodynamic and structural properties of fold-switching and single-folding CTDs.

      Strengths:

      To my knowledge, this is the first in-depth thermodynamic analysis of fold switching in the NusG protein family. One striking result is the stability difference between E. coli NusG (single-folding) and E. coli RfaH (fold-switching). It can be difficult to compare stabilities across organisms since their environments differ. For example, a fold-switching domain from a thermophile and single-folding domain from a mesophile might have similar stabilities. Clearer stability differences can be seen by comparing variants from the same species, which the authors show.

      The NMR experiments showing minor species in both fold-switching CTDs and one single-folding CTD suggest that the unfolded state plays an important role in fold switching. The 13C-alpha CEST experiments showing that the minor species of the E. coli RfaH CTD has helical character hint at a mechanism for how the RfaH CTD is poised to assume two different folds.

      Weaknesses:

      The thermodynamic and structural properties of one single-fold domain (hSpt5-KOW) do not differ appreciably from those of a fold-switching domain, suggesting an incomplete mechanistic explanation of fold switching. Specifically, both the thermostabilities and the folding free energies of hSpt5-KOW (single-folding) and VcRfaH-KOW (fold-switching) were comparable. Furthermore, their 15N shift differences from CEST experiments (Figure 5 supplement 1B&C) appear similar. Thus, it is possible that the minor species of hSpt5-KOW has helical character like Ec- and VcRfaH. Furthermore, the secondary structure predictions showing that hSpt5-KOW has largely beta-sheet propensities are suspect because the secondary structure predictions of MtNusG-KOW (single-folding) are inaccurate: they show helical propensities comparable to Ec- and VcRfaH (fold-switching, Figure 5 supplement 3). These propensities are not experimentally supported for MtNusG-KOW, indicating that predicted secondary structures are not always reliable.

      It is not clear why the authors state that the minor species of EcRfaH-KOW is in exchange between helical and completely unfolded conformations. The chemical shift differences in Figure 6A appear comparable, indicating one population.

      We agree that the chemical shift differences are similar (for both 15N and 13C). However, the increased 15N R2 values of the minor species indicate further exchange processes. Together with the NMR-based chemical denaturation experiments, our interpretation of this finding is that the minor species is an ensemble of largely unfolded species, some states of which are completely unfolded and some of which exhibit helical elements in regions 1 and 2 (a detailed explanation is given in our reply to “Essential revisions #3”). The manuscript has been modified to make clear which conclusions are experimentally proven and where we hypothesize.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well-done analysis using the very robust Swedish national population registry.

      The study strengths include large size, prolonged follow-up, and use of two comparison populations.

      Thank you for the encouraging comments on our study.

      The main limitations that need to be addressed by the authors concern reverse causality, namely whether a psychiatric illness (PI) developed before or at about the same time as the CVD. The much steeper risk relationships early after a CVD event are highly suggestive of this. Some further analyses to tease out those with clearly NO PI before CVD would be in order.

      Thank you for the comment. Previous studies have consistently reported an association between psychiatric disorders and CVD [1,2]; thus, we agree that reverse causality may, in principle, explain some of the observed results indicating a rise in incident psychiatric disorders after incident CVD, particularly during the immediate period. Yet it is reasonable to assume that a diagnosis of a life-threatening disease, such as CVD, is in many cases a traumatic experience resulting in an immediate rise in the risk of psychiatric disorders. Others have reported such associations, e.g., after natural disasters, and we have indeed observed such a pattern in our previous work, e.g., after cancer diagnosis [3]. However, we agree that reverse causality cannot be excluded and may partly contribute to the highly increased risk of psychiatric disorder immediately after CVD diagnosis. Indeed, some of these patients may have been treated for their psychiatric disorders in primary care before the incident CVD. As the Patient Register only captures inpatient and outpatient hospital care, we have conducted an additional analysis that also excludes individuals with previous prescriptions of psychotropic drugs (ATC codes: N05, N06) before their incident CVD, thereby additionally capturing patients with prevalent mental health problems treated in primary care. The results show similar point estimates (Supplementary Appendix Table S5), thus not supporting the notion that reverse causality is a major concern. Furthermore, the association is observed up to 28 years after CVD diagnosis, which is unlikely to be explained by reverse causality.

      We have now added our motivation for this additional analysis to the Methods (page 9), as follows: “Because the Swedish Patient Register includes only information related to specialist care, we might have misclassified patients with a history of milder psychiatric disorders diagnosed before the index date and attended only in primary care. To account for reverse causality due to undetected psychiatric disorders or symptoms before the incident CVD, we performed a sensitivity analysis additionally excluding study participants with prescribed use of psychotropic drugs before the index date (ascertained from the Swedish Prescribed Drug Register, which includes information on all prescribed medication use in Sweden since July 2005), and followed the remaining participants from 2006 to 2016.”

      Second, for the robust matched cohort design, the authors age- and sex-matched each patient with 10 individuals from the general population and then also stratified their model by the matching variables. Could adjusting for matched factors in such cohort studies re-introduce bias into these estimates?

      Thank you for the comment. Adjusting for matching factors should provide estimates with the same validity as using a stratified model. In our study, we matched individuals diagnosed with a CVD with their unaffected full siblings as well as with 10 randomly selected unexposed individuals without such a diagnosis, on age and sex. As controlling for matching variables is recommended when there are additional confounders [1,2], we used a stratified Cox model, which is commonly applied in family-based studies [3,4].

      References:

      1. Sjölander A, Greenland S. Ignoring the matching variables in cohort studies - when is it valid and why? Stat Med. 2013 Nov 30;32(27):4696-708.
      2. Mansournia MA, Hernán MA, Greenland S. Matched designs and causal diagrams. Int J Epidemiol. 2013 Jun;42(3):860-9.
      3. D'Onofrio BM, Lahey BB, Turkheimer E, Lichtenstein P. Critical need for family-based, quasi-experimental designs in integrating genetic and social science research. Am J Public Health. 2013 Oct;103 Suppl 1(Suppl 1):S46-55.
      4. Song H, Fang F, Arnberg FK, Mataix-Cols D, Fernández de la Cruz L, Almqvist C, et al. Stress related disorders and risk of cardiovascular disease: population based, sibling controlled cohort study. BMJ. 2019;365:l1255.

      Third, the range of PIs associated with CVD is much broader than would be expected (e.g., eating disorders!).

      Thank you for the comment. We agree with the reviewer that the strong association between CVD and incident eating disorders is somewhat surprising, although links between cardiovascular risk factors (e.g., obesity) and binge eating have indeed been reported [1,2]. We have now performed the analysis of the association between first-onset CVD and subsequent incident eating disorder, additionally excluding individuals with a history of psychotropic medication use. We found that the associations became even stronger after this exclusion (Supplementary Table S5). It is possible that individuals suffering their first CVD drastically alter their lifestyle, in some cases resulting in dysfunctional eating, and may therefore be vulnerable to eating disorders. Given that the evidence on the risk of eating disorders among CVD patients is still limited, our study adds a valuable piece of knowledge in this regard and calls for further investigations to better understand this association.

      References:

      1. Mitchell JE. Medical comorbidity and medical complications associated with binge-eating disorder. Int J Eat Disord. 2016 Mar;49(3):319-23.
      2. Bulik CM, Sullivan PF, Kendler KS. Medical and psychiatric morbidity in obese women with and without binge eating. Int J Eat Disord. 2002;32:72-78.

      Lastly, the authors should try to account for secular changes in smoking, alcohol consumption, or BMI over the study period. In particular, while Sweden never had very high smoking rates (due to snus), alcohol use within specific cohorts might have affected both CVD risk (particularly stroke) and PI risk. Examining trends in, for example, liver cirrhosis over the study period might help, as might sales/consumption data. The authors do recognize a limitation in being unable to adjust for smoking, alcohol, and adiposity.

      Some additional analyses to address these points and some more caution in the discussion are required.

      Thank you for the comment. As the reviewer points out, we do recognize the potential unmeasured influence of lifestyle factors (e.g., smoking and alcohol consumption) on the studied associations, as these data are not collected in the Swedish registers. However, the associations between CVD and psychiatric disorders were quite stable across calendar time, although somewhat stronger by the end of the observation period. The evidence does not suggest a drastic change in lifestyle factors in Sweden during the latter part of the observation period, except for a slight increase in alcohol consumption [1,2] and liver cirrhosis [3]. Although we find it implausible that such underlying secular trends in lifestyle are a major contributor to the reported associations, we have now conducted additional analyses excluding individuals with alcoholic cirrhosis of the liver (ICD-10 code: K70.3) or COPD (chronic obstructive pulmonary disease, ICD-10 code: J44) as a proxy for heavy drinking or smoking. The results remained virtually unchanged.

      We have now added the rationale for the analysis stratified by calendar year to the Methods (pages 8-9), as follows:

      “We performed subgroup analyses by sex, age at index date (<50, 50-60, or >60 years), age at follow-up (<60 or ≥60 years), history of somatic diseases (no or yes), and family history of psychiatric disorder (no or yes). We also performed subgroup analysis by calendar year at index date (1987-1996, 1997-2006, or 2007-2016) to check for potentially different associations over time (i.e., due to lifestyle factors that changed over time, including smoking and alcohol use).”

      We found a somewhat higher risk of psychiatric disorder in recent calendar years than in earlier years (as shown in Supplementary Table S3).

      We found similar associations between first-onset CVD and incident psychiatric disorder with and without excluding individuals with a history of alcoholic cirrhosis of the liver or COPD, used as a proxy for heavy drinking or smoking. This table has now been added as Supplementary Table S8.

      We have now added justifications to the Methods (page 10) and the Discussion (page 21), as follows. In the Methods, page 10:

      “To account for the potential impact of unmeasured confounding due to lifestyle factors, we performed a sensitivity analysis excluding individuals with a history of alcoholic cirrhosis of the liver (ICD-10 code K70.3) or chronic obstructive pulmonary disease (COPD, ICD-10 code J44), as proxies for heavy drinking or smoking.”

      In the Discussion (page 21):
      “although we found similar results with and without excluding individuals with a history of liver cirrhosis or COPD, as proxies for heavy drinking or smoking (Supplementary Table S8). We did not have direct data on hazardous behaviors that could potentially modify this association, and therefore cannot exclude the possibility of residual confounding not fully controlled for in the sibling comparison.”

      References:

      1. Statista. https://www.statista.com/statistics/693505/per-capita-consumption-of-alcohol-in-the-nordic-countries/. Retrieved on 19 Aug.
      2. Alcohol and Drug Report. Nordic Baltic Region. https://www.nordicalcohol.org/sweden-consumption-trends. Retrieved on 19 Aug.
      3. Gunnarsdottir SA, Olsson R, Olafsson S, Cariglia N, Westin J, Thjódleifsson B, Björnsson E. Liver cirrhosis in Iceland and Sweden: incidence, aetiology and outcomes. Scand J Gastroenterol. 2009 Jan 1;44(8):984-93.

      Reviewer #2 (Public Review):

      Shen et al. investigated the associations between CVD and subsequent risk of psychiatric disorders using a prospective study design. The authors also performed subgroup analyses by sex, age at cohort entry and at follow-up, calendar year, history of somatic diseases, and family history of psychiatric disease, and finally assessed the potential role of psychiatric comorbidity in cardiovascular mortality in CVD patients. The main takeaway of the analyses is the increased risk of psychiatric disorders in CVD patients compared to the different comparison groups.

      Though the study uses nationwide registers in a prospective study design setting, there are some methodological flaws with respect to study design.

      For assessing the primary aim, the authors chose a rather unusual starting point by preselecting the exposure (CVD) group, rather than following a nationwide cohort of the general population for a disease outcome with each category having exposed and unexposed individuals. Assuming that the population comparison group comes from the same study population as the CVD patients, it is not clear why a study design similar to those cited in the manuscript (Zhang et al. 2015, Kivimäki et al. 2012, Godin et al. 2012) was not followed. Similarly, one would expect the sibling comparison group to be defined with respect to the outcome (psychiatric disorders) and not the exposure (CVD).

      Thank you for the comment. As correctly pointed out by the reviewer, we used a matched cohort design, both in the population- and sibling comparison. We firstly identified a nationwide cohort of general population who were born after 1932 and were residing in Sweden 1987-2016. We then identified all exposed individuals with first-ever diagnosis of CVD and matched population controls from this same nationwide population.

      A matched cohort design is applied here because of the strong confounding effects of some variables, e.g., age and sex, on the studied association between CVD and risk of psychiatric disorder. Exact matching on age and sex in our study makes the exposed and unexposed groups comparable and removes the confounding effects of the matching factors in the design phase. A further practical reason for using a matched cohort is that the comparison between exposed and unexposed groups is always made at the same time, providing measures (such as risks and rates) during the follow-up period that are easily interpreted. Further, we have used this matched cohort design in many of our previous works [1,2] to maintain an identical design in both the sibling and population comparisons, so that the point estimates can be directly compared. The matched cohort design generates results of equal validity to the more conventional cohort design suggested by the reviewer [3] but has the additional quality of making the results from the various cohorts (here: population and sibling comparisons) more comparable. Our study therefore takes advantage of a sibling-controlled matched cohort, which is indeed a cohort design recommended for family-based studies [4] and provides results with similar validity to a full cohort.

      We have now added a sentence and a reference in Method to motivate the use of matched cohort design (Page 7).

      “We constructed a sibling-controlled matched cohort to control for the familial confounding according to guidelines for designing family-based studies.24”

      We have now updated the flowchart to add a box at the top reflecting the source population from which both groups were identified, shown in Supplementary Figure S1.

      References:

      1. Song H, Fang F, Arnberg FK, Mataix-Cols D, Fernández de la Cruz L, Almqvist C, Fall K, Lichtenstein P, Thorgeirsson G, Valdimarsdóttir UA. Stress related disorders and risk of cardiovascular disease: population based, sibling controlled cohort study. BMJ. 2019 Apr 10;365:l1255.
      2. Song H, Fang F, Tomasson G, Arnberg FK, Mataix-Cols D, Fernández de la Cruz L, Almqvist C, Fall K, Valdimarsdóttir UA. Association of Stress-Related Disorders With Subsequent Autoimmune Disease. JAMA. 2018 Jun 19;319(23):2388-2400.
      3. Sjölander A, Greenland S. Ignoring the matching variables in cohort studies - when is it valid and why? Stat Med. 2013 Nov 30;32(27):4696-708.
      4. D'Onofrio BM, Lahey BB, Turkheimer E, Lichtenstein P. Critical need for family-based, quasi-experimental designs in integrating genetic and social science research. Am J Public Health. 2013 Oct;103 Suppl 1(Suppl 1):S46-55.

      Reviewer #3 (Public Review):

      Shen et al. investigated the relationship between the diagnosis of cardiovascular disease (CVD) and subsequent diagnosis of psychiatric disorders using national databases and health records over a 30-year period in Sweden. They also investigated the association between the diagnosis of psychiatric disorder and subsequent CVD-related mortality. Comparisons were made between participants diagnosed with CVD and siblings without CVD, and between the CVD participants and random age- and sex-matched controls from the general population.

      They show that diagnosis of all types of CVD investigated was associated with increased risk of all types of psychiatric disorders considered, both in comparison to non-CVD siblings and to general population controls. They also showed that a psychiatric diagnosis subsequent to CVD diagnosis was associated with greater CVD-related mortality.

      A key strength of this study is the use of national databases and populations, as it has allowed for sufficiently large numbers for important subgroup analyses investigating specific types of CVD and psychiatric disorders. In addition to disease and disorder subtypes, the authors have investigated many other factors that may be important for understanding these relationships, including time of diagnosis during follow-up, year of diagnosis, age of participant, and various comorbidities. The duration of follow-up is another important strength of this study, as is the use of sibling controls to mitigate the potential confounding effect of genetic and early-life environment.

      However, while it is acknowledged as a limitation by authors, the lack of lifestyle data is a notable weakness of the study. The authors allude to causal inference in the abstract and discuss controlling for important confounding factors, but this is somewhat undermined by not being able to account for lifestyle factors, particularly since there are shared biological pathways such as inflammation linked to both CVD and many psychiatric disorders. As such, the associations reported in this study are potentially influenced substantially by unmeasured confounding related to lifestyle factors.

      Overall, this is important data, and the conclusions around these findings supporting surveillance of psychiatric disorders in individuals diagnosed with CVD due to its association with increased risk of mortality may be of interest to clinical settings.

      Thank you for the very positive comments.

    1. Author Response

      Reviewer #1 (Public Review):

      The paper of De Agro et al. proposes a new paradigm to measure wanting (binary choices) and liking (pheromone deposition) in ants in order to test bundling and segregation effects on reward processing.

      By using three different treatments (A: rewards (sugar drops) and costs (runway segments) are segregated; B: rewards are segregated and costs bundled; C: rewards and costs are bundled), the authors observed that the main predictor of pheromone release was the segregation of the runway segments rather than segregation of the reward. Furthermore, no effect of treatment was observed on preferences for the odor associated with the treatment.

      The authors interpret their findings as a clear demonstration of segregation effects on liking, but not wanting, with the effect present only for costs and not for rewards.

      Strengths: I appreciated the creativity and effort in conducting complex experiments and measurements in insects. Overall, the paper is the first of its kind to propose a method to test reward processing in insects. The design is well thought and the results are straightforward. The analyses seem to be appropriate.

      Weaknesses: My main concern relates to the interpretation of the pheromone release as an index of liking. I am not an expert in the field, but I would probably go for a more parsimonious explanation: the effect could be simply due to the quantity of liquid ingested (and therefore corresponding caloric intake). Did you check whether, in the conditions showing the biggest pheromone release, the ants consumed the biggest quantity?

      First, this could explain, for example, the puzzling difference observed in the 3 cohorts and the sequence effects.

      Second, a reduced overall caloric intake could also explain why segregated costs seem to drive the results. Digestive processes may kick in at different times in the segregated-all condition compared to the other two, due to the more time-delayed ingestion of food (i.e., we tend to eat less if we have longer intervals between meals).

      Finally, this account may also explain the reported difference between wanting and liking, as here the release of pheromone is simply the byproduct of how much sugar has been ingested (and possibly nothing to do with reward processes).

      Whether pheromone is released in proportion to sugar intake, and whether sugar intake differed between conditions, are important points that should be clarified in the manuscript in order to guarantee the interpretability of the results.

      We understand the reviewer’s position, and agree with the need to reserve “high-level” processes as explanations for situations that have no alternative, more parsimonious explanation. Indeed, we cannot be certain of what is happening in the ant’s mind, or whether its hedonic experience is indeed separate from its memory evaluation process. To this end, we propose a mechanism that can explain this difference in terms of the memories formed for food quality and path length.

      We have now toned down our claims regarding the Liking vs Wanting framework. Regarding the possibility that pheromone deposition is driven by caloric consumption, we believe this is not the case.

      Reviewer #2 (Public Review):

      Only a few decades ago ants were considered little machines without learning capabilities or personality. Ever since then, we have been able to attribute more and more personality to them. In this study, De Agro et al. have been able to use psychological tricks to manipulate the decision-making process of an ant species. By bundling or segregating costs (distance) and gains (food) they were able to demonstrate that ants, just as humans, experience gains and costs (in most cases) on a logarithmic scale. Moreover, they suggest a quantitative way to disentangle "wanting" and "liking" in ants, allowing for further interesting scientific designs to test theories long applied by behavioral economists on humans.

      The strength of this study clearly lies in the simplicity of its design and its strong foundation on current theories and models. It is clearly written and easily followed even by a non-specialist reader.

      I also particularly liked the exhaustive discussion and the interdisciplinary links it proposes, including (but not limited to) the potential ecological implications for plant-pollinator interactions, with flowering plants potentially exploiting the segregation of flower rewards to manipulate the pollinator.

      The weaknesses:

      The statistics seem to lack any control for random factors like individual ant or colony of origin. While the results are quite clear and will likely not change with these additions they could add a little bit more resolution in some cases or help explain certain trends better. Especially since apparently a result with a highly significant p-value of 0.0036 is considered a false positive due to a lack of rational explanations. Individual experience, age, or fitness/size of the colony of origin could all affect the decision-making processes in individuals and should be controlled for (and discussed).

      We thank the reviewer for the comment. All the models we used in the analysis did include a random factor, always specified as ants nested in colony of origin, as can be seen in the full R script available in ESM2. However, we failed to mention and discuss this in the paper, as the reviewer rightly points out. We have now added this specification in L261-262 and L271-274.

      Moreover, based also on the comment of another reviewer, we reconsidered our random-effects structure. See point 5 for the full explanation.

      In line with my previous comment, I would also have liked to see a bit more data on individual variation to better appreciate cross-condition comparisons. For instance, the fact that in Figure 4 ants that experienced the "segregated all" effect laid overall more pheromones than the ones that experience "bundled" first is barely acknowledged in the manuscript. These kinds of variations in pheromone deposition rate (not just relative, but in absolute numbers) need to be better discussed.

      Yes, there do indeed seem to be interesting patterns in Figure 4, and we spent many hours exploring the data in great depth, looking into visit-level pheromone depositions (see updated supplementary figure), to try to understand the patterns we see. We then discussed in detail how to present our findings. The main finding to explain is in condition 3: the overall pheromone deposited in the “good (in this case, segregated all) encountered first” group is lower, whereas it was higher for all the other conditions.

      We did develop an explanation for the pattern of findings (see below). However, we freely admit to being unsatisfied with it – the explanation is ad-hoc, and there are no strong biological or psychological reasons for it to be true, apart from fitting the data. Ultimately, we decided not to discuss these patterns very extensively, since we felt it added greatly to the length and complexity of the discussion, while not adding a lot of biological insight. Nonetheless, we crafted a manuscript-ready addition outlining our current best guess (and it is a guess) explaining the patterns in absolute pheromone deposition level. The text would be added directly to the end of the paragraph ending at line 428. If the editor and reviewer agree that this is a worthwhile addition, we would be happy to add it.

      Note that there are no significant differences between any of the groups or interactions in condition 1, although a purely visual inspection of the figure might suggest one.

      "In addition, the absolute amount of pheromone deposited (independently of the currently experienced option) varies depending on the first-encounter treatment. This effect is puzzling, as it appears to reverse the absolute pheromone deposition between conditions 2 and 3: the lower amount corresponds to the “bad” option being encountered first in one condition, but to the “good” option being encountered first in the other. Indeed, we do not have a fully satisfying explanation for it. Our best guess is that it may be due to an inertia effect in pheromone deposition: when little to no pheromone is deposited for two or more visits in a row (the first visit always being low, and the second being of a non-preferred treatment), the ants never subsequently raise their pheromone deposition. This pattern is visible in Condition 3, as both available options are of low value, while in Condition 2 the Reward Segregated option is marginally better, and so at least some pheromone is deposited even in visit 1. We provide a visit-by-visit analysis and graphs in Supplement S2. Similar patterns have been reported for this species (Beckers et al., 1993)."

      I would also have liked to see a graph or results focusing on the pheromone deposition rate ONLY at the first experience trial, rather than always in combination with the subsequent trials.

      We would too! Again, we extensively discussed including one, as the first visit is very valuable – it is the only visit in which we can exclude contrast effects, making the results in principle much easier to interpret.

      However, the problem is that these ants deposit a lot less pheromone on their very first visit. This makes biological sense: they may be lost, and do not yet know how reliable the food source is (see e.g. Czaczkes et al., 2013 figure 5A, where this pattern can be clearly seen). The same is true in the current dataset, which is why data from visit 1 are excluded from the figure (we repeated the analysis with and without visit 1 and found no differences in the results; see supplement).

      As a visual demonstration, we provide two (ugly, sorry) figures below: the pheromone deposition per treatment for only visit 1, and for visits 2-8. Note the massive zero inflation in visit 1.

      Pleasingly, the broad pattern (considering the mean) in just visit 1 follows our expectations. However, any reasonable statistical test on data from just the first visit would find no significant difference.

      In addition, even though the study focuses strongly on differences between "wanting" and "liking" it barely touches upon the data looking at "wanting". A graphical illustration of the Y-maze experiment and the binomial decision would have helped appreciate this result better (even if it is non-significant).

      We thank the reviewer for the comment. We generally try not to overburden our manuscripts with figures, as we aim to maintain the message of the paper focused on what we believe to be the most important finding. For this reason, we believe that a figure for the binomial response would be somewhat wasted, as all it would show are 3 points for each of the 3 conditions, all around 0.5 probability of choosing the predicted option. Below is an example of what such a figure would look like:

      On the other hand, we agree that a graphical illustration of the Y-maze may be of use. We now added Figure 2, showing both the Y maze and the pheromone deposition behavior, as the two main behaviours recorded.

      I also believe that the authors are overstating their claims of showing for the first time that ants prefer closer food sources. The cost of distance has already been demonstrated indirectly in Frank & Linsenmair 2017: "Individual versus collective decision making: optimal foraging in the group-hunting termite specialist Megaponera analis" for instance. While the current study does more directly imply the preference for closer food in a controlled experimental design I would argue that there is sufficient knowledge with indirect observations in natural settings, making the claim of showing it here for the first time unnecessarily hyperbolic.

      We agree with the reviewer. We have now added a reference to Frank and Linsenmair 2017 and weakened our claim. L534-535

      While the results of this study are novel and very interesting to a broad readership, I would suggest including in the discussion and introduction also a newer study on "food wanting is mediated by transient activation of dopaminergic signaling in the honey bee brain" by Huang et al. 2022 in Science and also recommend the accompanying perspective article by Garcia and Dyer on "Why do animals want what they like?".

      Thank you for the comment. We are aware of this new paper, but we could not reference it in the earlier version of this manuscript, as we submitted to eLife on the 10th of April, prior to the publication of Huang et al. Considering the suggestions of the other reviewers, we have now reduced our claims about the liking vs wanting framework (see point 1). The reference has now been added in L518-532.

      Reviewer #3 (Public Review):

      This work aims at testing hypotheses derived from the field of behavioral economics (Kahneman's theories), related to subjective value perception in ants foraging for food. The work was conceived to test how ants react to a specific feature which is the segregation or the bundling of food resources. Behavioral economics posits that individuals value more segregated resources than the same amount of resources presented in a bundled way. At the same time, if accessing the segregated resources implies an increase in energetic costs to access them (i.e. longer displacements), then costs would be also perceived as higher in the segregated-resource case than in the bundled-resource case.

      Whether ants conform or not to this model is an interesting question, and irrespective of the results obtained, the experiments presented by the authors have been conceived to address this model as the experimental parameters varied refer to resource separation (drops of sucrose solution with different degrees of spacing between them) and to walking distances.

      Yet, the manuscript suffers from various serious deficits that preclude being enthusiastic with respect to its present form. Various problems are listed below, which reduce the quality of this work. Hopefully, the authors can amend some of these problems to reach a more consistent version.

      1. The inconsistent and unjustified "wrapping" with a "wanting vs liking" framework

      While it is unquestionable that the question raised by the authors revolves around behavioral-economic hypotheses on value perception and is fully addressed by the experiments performed, the "extra wrapping" of the "wanting/liking" framework added, probably to make the manuscript more attractive, is unjustified and excessively speculative. The use of a "wanting vs liking" interpretation framework is inappropriate, as neither were the experiments conceived to address this topic, nor do the results allow any robust conclusion on this point. These concepts originate in neuroscience analyses of neural-circuit activation in the mammalian brain upon situations that allow distinguishing several components related to reward: 1) the hedonic effect of pleasure itself (liking); 2) motivation to obtain the reward (wanting or incentive salience); and 3) reward-related learning(1-3). These components refer to different identified neural circuits and brain areas: wanting for reward is generated by a large and distributed dopaminergic brain system including the frontal cortex, while liking is generated by a smaller set of hedonic hot spots within limbic circuitry, which are not dopamine-dependent.

      Clearly, the use of the wanting vs liking terminology requires accuracy and appropriate studies to support it. This is not the case in the present manuscript, which was not conceived to tackle this issue. Moreover, inconsistent testing procedures (see below point 3) undermine the use and interpretation of choice data as wanting. The authors have no proof of the involvement of wanting vs. liking systems in their design and, even more, cannot disentangle these components based on their behavioral data. The assumption that pheromone deposition after food experience expresses "liking" can be questioned, as it does not dissociate individual liking from social information transfer (the liking and wanting systems are individually based systems). Moreover, the assignment of a choice in a binary-choice test to a wanting system is also questionable, as the experiments cannot disentangle eventual individual wanting from reward-related learning: animals are making choices based on odorant cues they have learned during their previous foraging bouts. In the absence of neurobiological data, the hypotheses of wanting vs. liking remain on shaky, highly speculative ground.

      Thus, the whole "wanting vs liking interpretation" (which attains alarming speculative levels in the Discussion section) should be omitted entirely from the manuscript if the authors want to provide a solid convincing framework articulated exclusively around the bundling vs. the segregation effects, which is precisely what their experiments tested. The rest is speculation in the absence of analyses supporting the wanting vs liking dissociation. An example of the kind of analysis necessary to go in this direction is provided by a recent work in which a dopamine-based wanting system was shown in honey bees(4), a work that the authors did not consider. We are clearly far from this kind of analysis in the present manuscript. As the authors wrote, "the present study is the first to examine bundling vs. segregation in an animal (line 99)", yet not liking vs. wanting.

      The reviewer makes a very well-argued case for this study not being sufficient evidence for distinct “wanting” and “liking” systems in an insect – a point echoed by the other reviewers. Their comments were helpful and insightful, and we fully agree with them. We have thus omitted the concepts of “wanting” and “liking” from the title, introduction, methods, and results.

      However, we feel that, especially given the results of Huang et al. (which were not published when we submitted the manuscript), the idea that the mismatch between the choice and pheromone data is driven by the two measures acting on separate systems is reasonable: while it is not well supported, it is certainly consistent with the data. The discussion seems to us to be the appropriate place to speculate about the meaning of results, especially results we do not fully understand. We would thus like to maintain a short discussion of this hypothesis in the discussion. Perhaps other researchers will be inspired to collect the necessary data to test whether such segregation effects really do affect “liking” but not “wanting”, something which is beyond the capabilities of our strictly behavioral lab.

      2. Some experimental assumptions are not substantiated by data

      The experimental procedure relies on separating or aggregating reward (drops of sucrose solution) and determining the impact of this variation on pheromone deposition while returning to the nest and subsequent choice in a dual test situation in which two of the three treatments designed - distinguished by the odorant experienced en route to reward - were presented. While the "Segregated All Treatment" (Fig. 2A) managed to space the 0.2 µl reward drops by significant 25-cm segments, thus enhancing potentially both reward appreciation (segregated food drops) and cost appreciation (successive segments to be negotiated), the "Segregate Reward Treatment" (Fig. 2B) raises doubts about its validity.

      In this case, three drops were offered at the end of three consecutive 25-cm segments, with the assumption that drops spaced by 5 mm should be perceived as being segregated (two of 0.2 µl and one ad libitum). Yet, there is no proof, at least in the manuscript, that spacing two food drops by 5 mm induces a segregated perception in ants. The first experience with the first drop may induce both sensitization and a local search that may last until the very close next drop is detected, so that for the ant these drops would be perceived as belonging to the same resource rather than as segregated resources. The same applies to the proximity between the 0.2 µl drop and the ad libitum drop.

      This raises the question of the real volume of the ad libitum drop, which is not mentioned (it is just described as being "large"; line 205). One could argue that if drops separated by 5 mm were bound together, the results would be similar to those of the "Bundled Treatment" (Fig. 2C). Strictly speaking, this cannot be assessed unless the volume of the large drop is known: if it were the same in both treatments, the Bundled Treatment would have offered a volume 0.4 µl smaller than the total food provided in the "Segregate Reward Treatment".

      Overall, further controls are needed to support the assumptions of the different treatments chosen.

      See detailed response to main concern 4 – “The segregation effect”. In brief: we agree that the current experiment cannot distinguish between ants not perceiving a difference between one big drop and three little drops, and ants perceiving a difference but not responding to it. However, the “segregated reward” treatment was only included to aid result interpretation in the event of reward segregation fully balancing out cost segregation. Since the responses of ants to the “all bundled” and “all segregated” treatments were different, the “segregated reward” treatment is in fact not needed to support our claim that segregation affects perceived value in these ants.

      3. Unclear design in the testing procedures

      The authors did not specify in the methods if a reward was provided in the tests in which a Y maze was presented to the ants having experienced a succession of short and long segments. This information was provided later, in the Results section (line 309) and, as expected, no reward was provided during the tests, thus raising the question of the necessity of the three consecutive tests, with no refreshment trials in between. This procedure is puzzling because it induces extinction of the odor-length association - as verified by the authors (see lines 306-309) - and makes the design questionable. Only the results of the very first test should be kept and analyzed in the manuscript.

      The same remark applies to the three tests performed after comparing the experimental treatments, which - one discovers only in the Results Section - were also performed in the absence of refreshment trials. In fact, the absence of coherence in the results of these tests (e.g. lines 328-332) could be precisely due to a change of strategy between the tests following the absence of reward in the first test. This underlines the necessity of focusing exclusively on the first test and dismissing the data of the 2nd and 3rd tests in which performance may have been affected by extinction and strategy change. This again shows why speaking about "wanting" in this inconsistent framework makes no sense at all.

      We thank the reviewer for the comment. Please see point 2, where we provide the full answer. We initially included the subsequent testing in our experimental design so as to gather as much information as possible. A change in preference linked to the absence of the reward is indeed expected. However, the rapidity and direction of change can give valuable information that would be lost if the data were not collected. We agree with the reviewer that in this specific experiment the data were not particularly useful, but we believe it would be wrong of us not to report them. As we wrote above, please note that in ESM2 we report the choice probability of the first choice only, showing exactly the same result as when all three choices are considered.

      Reviewer #4 (Public Review):

      The manuscript reports an experiment testing how the distribution of rewards and costs influences perceived reward value in ants. Using a bundling manipulation where rewards and costs were presented either in small separated amounts (segregated) or together in a larger amount (bundled), the results show that ants deposited a greater quantity of pheromones (which was used as an index of "liking") when rewards were segregated and costs bundled compared to when both rewards and costs were bundled (although that difference was statistically significant only in ants experiencing the segregated reward condition first during training) and when both rewards and costs were segregated. By contrast, no evidence was found for a bundling effect in terms of choice behaviour (which was used as an index of "wanting"). The authors suggest that these findings demonstrate a bundling effect and a dissociation between "wanting" and "liking" in ants.

      Overall, the experiment provides a worthy contribution to the study of the biases that affect the perceived value of rewards in a translational perspective from humans to invertebrate animals. The experimental manipulation is clever, and the results clearly indicate that manipulating bundling affected pheromone deposition in ants. However, the data reported do not appear to fully support the conclusions of an increased "liking" of the segregated rewards and bundled costs compared to bundled rewards and costs. In addition, more evidence (along with stronger justifications) would be needed to establish that choice behaviour and pheromone deposition are appropriate and sensitive measures of "wanting" and "liking", respectively. This aspect renders any claim of a dissociation between "wanting" and "liking" in ants somewhat premature and speculative at this stage. I describe these concerns in more detail below.

      1. The main hypothesis tested is that segregated rewards with bundled costs should be the most "liked" option relative to bundled rewards and costs and segregated rewards and costs. The results are interpreted as fully in line with this hypothesis. However, the data reported do not suggest this is the case: The difference between the 'segregated rewards' condition and the 'bundled' condition is not statistically significant when all ants are considered (that difference being statistically significant only for ants that first experienced the 'segregated rewards' condition during training). Although this point is briefly acknowledged in the discussion, more nuance and extra caution are needed in the overall interpretation of the findings, so that this statistically nonsignificant result does not appear as being treated as if it were statistically significant.

      We thank the reviewer for the comment. Indeed, our initial hypothesis was the one described here. However, the lack of a significant difference between the segregated rewards and bundled conditions forced us to consider an alternative hypothesis. We believe that our current experiment managed only to bundle and segregate costs, not gains. Given this, we would expect segregated rewards vs bundled to perform at chance level, since in both the cost is equally bundled. As such, we are ultimately treating the result as non-significant. We are, however, clearly stating our initial hypothesis and discussing how the data fit it, as we feel it would be dishonest of us to give the impression that we had the second hypothesis from the start, or, on the other hand, to treat a p-value so near 0.05 as definitely random.

      2. An important requirement to adequately evaluate the findings from the choice behaviour test is to ensure that ants did learn the associations between the reward conditions and the runway scents. Ruling out potential learning confounds is in fact essential to interpret the results as reflecting the operation of motivational processes such as "wanting". Whereas the results from the pilot experiment suggest that ants learned the contingencies between the runway length and its associated scent, the pilot experiment and the main experiment differ in significant ways. Therefore, it is unclear whether the ants learned the contingencies in the main experiment, which could be advanced as an alternative explanation for the lack of preferences between the two scented arms of the Y-maze during the choice test. Another important aspect to consider is that the reward still has to be valued by the organism to appropriately assess "wanting" processes. Indeed, "wanting" is generally conceptualised as conjointly determined by the associative history between the cue or context (scent) and the reward (sucrose solution) on one hand, and the organism's homeostatic or physiological needs such as hunger on the other hand (e.g., Zhang et al., 2009. https://doi.org/10.1371/). In the main experiment, the question arises as to whether reward devaluation could have occurred, resulting in the reward having a diminished value as the ants were able to consume the sucrose solution to satiation multiple times across the experiment. For these reasons, it would be important to provide information showing that (a) the ants learned with which condition the scent was associated and (b) that the reward was still valued during the choice test. These points are key preconditions that need to be fulfilled for ruling out potential confounds that could explain the findings of the choice test as well as for suggesting a dissociation between "wanting" and "liking".

      We thank the reviewer for the comment. As stated in point 1, the “liking” vs “wanting” framework has been greatly reduced in the paper, only being raised as a possible explanation of the observed results, with the lack of learning being raised as a reasonable alternative explanation. We have reason to believe that the ants are actually learning the association presented, as we detail in point 2.1. Of course, we cannot be completely certain, as it is impossible to disentangle preference from learning in such experiments. As such, the possibility is mentioned in the manuscript.

      3. Relatedly, a strong justification needs to be formulated to substantiate that the choice test provides a reliable indicator of "wanting". This is critical to conclude that the results can be interpreted as reflecting a dissociation between "wanting" and "liking". In rodents and humans, "wanting" is typically measured as an increased effort mobilisation during the presentation of a cue associated with a reward (e.g., Pool et al., 2016. https://doi.org/10.1016/j.). It remains, however, unclear how choice can capture such effects. This questions the extent to which choice represents an adequate operationalisation and measure of "wanting" as described in the incentive salience hypothesis (Berridge & Robinson, 2016. https://doi.org/10.1037/). Moreover, it should be clearly explained and motivated whether, and if so how, choice purely measures "wanting" without being contaminated or influenced by liking-based processes, such as preferences or expected pleasantness for instance.

      We agree with the reviewer. Indeed, our linking of choice with “wanting” and pheromone with “liking” is highly speculative. In line with point 1, we have strongly reduced our claims and propose the association only as one of several potential explanations.

      4. Little information is provided on how pheromone deposition was measured and on the specificities of this measure, such as its physiological bases, timing properties, and granularity. However, detailed information about this measure is of high relevance to be able to assess if pheromone deposition represents a sensitive measure of "liking". "Liking" is typically measured as hedonic reactions during reward consumption across the rodent and human literature (e.g., Pool et al., 2016. https://doi.org/10.1016/j.). Accordingly, a good index of "liking" should be specifically responsive to reward consumption. By extension, an increased pheromone deposition should be particularly evident after the ants consumed the sucrose drop. As it stands, it is unclear whether this is the case as the pilot experiment showed no statistically significant difference in pheromone deposition between the way towards the sucrose drop or back. If the measure of pheromone deposition allows for distinguishing between pheromones deposited on the way towards the drop and pheromones deposited on the way back in the main experiment, a further test that could be run would be to compare the pheromone deposition on the way towards the drop in the 'segregated all' condition versus the 'segregated rewards' and 'bundled' conditions. A higher pheromone deposition on the way towards the sucrose drop in the 'segregated all' condition could provide converging evidence that pheromone deposition is a sensitive indicator of "liking".

      Unfortunately, in our current setup it was impossible to collect pheromone deposition data on the way towards the drop. Pheromone deposition has to be collected by eye, and the experimenter needed to maintain attention on the delivery of the successive rewards. A camera was not an option either, as the distance and resolution needed to record the whole runway would be insufficient to notice deposition, which instead requires the experimenter to follow the ants and count the individual stereotyped behaviours. We do observe the effect of higher deposition near the drop relative to further down the runway on the way back, which seems to be congruent with the response to consumption. We are however aware this is not sufficient, as it may just be linked with the distance. Regardless, as per point 1, we are decreasing our claims for the liking vs wanting framework.

    1. Author Response

      Reviewer #1 (Public Review):

      I'm curious about whether the microscopy provided any information about when secretory vesicles leave the TGN. Do they leave throughout the lifetime of a TGN structure, or do they leave in a burst when a TGN structure disperses as marked by loss of Sec7? This information might take us a step closer to understanding how secretory vesicles are made.

      Given the limitations of our current imaging set-up with regard to high-speed 3D two-color microscopy, we were unable to capture a large number of these events and therefore cannot make concrete statements about this. However, the quantified events did not appear to be preceded or followed by additional events, suggesting some temporal separation.

      Reviewer #2 (Public Review):

      The authors are encouraged to integrate their data together better with published biochemistry and structural work into more complete mechanisms for vesicle trafficking, tethering and fusion. The manuscript would be improved by a clearer model(s) of how these factors come together to carry out exocytosis.

      This suggestion has been addressed by the addition of a new model figure (Figure 9).

      Moreover, many conclusions (especially as they appear in the Results and Figures) are written as if they are well supported by the data (or others' data), when they are often speculative, or reasonable alternative explanations exist. The authors should be clear about which conclusions are well supported, and which are hypotheses. (e.g. Fig 6I, which is a terrific figure, but some of the "conclusions/statements" are speculations).

      We have made textual changes to make clearer distinctions between conclusions that are supported by the data, and which are more speculative.

      The mechanistic and experimental definitions for the start/end of "tethering" and "fusion" are not clearly stated in the main text, which leads to confusion when examining the arrival of different factors (and seems to lead to circular arguments about what is defining what). Are these definitions well supported by the previously published and current data? E.g. is the disappearance of GFP-Sec4 really equal to the fusion event? Without data showing membrane-merger or content delivery, this needs to be described as an assumption that is being made.

      Early in the results, we now define precisely what we interpret as the start of tethering and the time of fusion. Unfortunately, all our attempts so far at designing a cargo marker suitable for defining membrane fusion have not succeeded; however, we believe the observations in Figure 4 strongly support the assumption that loss of GFP-Sec4 signal coincides with fusion.

      The Sro7 results and conclusions are complicated, and not always carefully supported, for several reasons: there is a functionally redundant paralog Sro77, and data shows Sro7 can bind to Sec4, Sec9 and Exo84 in exocyst (Brennwald, Novick and Guo labs). The authors should be clearer, as they seem to pick and choose which interactions they think are relevant for different observations.

      We did not intend to “pick-and-choose” relevant interactions and now more clearly state what our Sro7 results mean.

      The assumption that yeast Sec1 behaves similarly to other Sec1/Munc18 proteins for "templating" SNARE complex assembly, e.g. Vps33 in Baker et al, is unlikely, given the binding studies from a number of labs (Carr, McNew, Jantti). Furthermore, the evidence for Sec1 interaction with exocyst suggests that they may work together (Novick, Munson labs). Previous data from the Guo lab (Yue et al 2017) and new BioRxiv data from the Munson/Yoon labs suggest that exocyst may play key roles in SNARE complex assembly and fusion.

      We did not mean to imply that the exocyst does not play a meaningful and critical role in SNARE complex assembly and fusion. This was an unintentional omission, which we have now addressed in the text. Our interpretation of the published meaning of SM-protein “templating” is that SMs facilitate the alignment of the critical zero-layer ionic residues in the SNARE motifs, which may be possible regardless of affinity for single SNARE motifs. Indeed, for Sec1 specifically, this exact function may be of lower importance than, perhaps, the stabilization and protection of trans-SNARE complexes prior to membrane fusion. Future studies may clarify this.

      There is concern as to whether the number of molecules of each of the factors measured is accurate, and how the authors really know that they are visualizing single vesicle events (especially with data showing that "hot-spots" may exist). For example, why is the number of exocyst molecules ~double or more than that previously observed (Picco et al; Ahmed et al with mammalian exocyst)?

      Estimating the numbers of molecules is subject to some variation due to fluorescent tags used and to some extent where the protein is tagged. Since different tags were used in the earlier studies, being within a factor of two is not that surprising.

      For puncta of exocyst subunits in the mother or moving towards the plasma membrane, what is the evidence that they are actually on vesicles? The clearest argument seems to be the velocity at which they move, but this could be due to the direct interaction of exocyst with the myosin (which is a tighter interaction in vitro than exocyst-Sec4 binding), rather than being on vesicles. Furthermore, do all the exocyst complexes in the cell show this behavior, or could these be newly synthesized/assembled complexes?

      Transport of the exocyst by myosin alone without a vesicle seems very unlikely, as this myosin V needs to be activated by binding vesicle-associated Sec4 (Donovan et al., 2012, 2015). Moreover, transport of just two exocyst complexes by a myosin dimer would be very hard to detect. Nonetheless, we have added a supplementary figure (Figure 1 Supplement 5C) illustrating a clear example of exocyst complex colocalization with a secretory vesicle in the mother cell, which we hope will dispel doubts about whether the exocyst complex is on secretory vesicles, albeit in small numbers, during this stage of transport.

      With regard to the exocyst octamer leaving at the time of "fusion," the authors should discuss Ahmed et al.'s finding of Sec3 leaving prematurely in mammalian cells, as well as data from the Toomre lab.

      We did reference this earlier work in mammalian cells and indicated that it differs from the situation in yeast. We are unable to draw further insight from these differences.

      Reviewer #3 (Public Review):

      In this context, it is notable that dual-channel images appear to be acquired sequentially rather than simultaneously, which deserves a comment that is currently missing. Moreover, given the weight that image acquisition carries in this project, it could be described and justified better.

      As noted above, we have expanded our description of the microscopy. We took two-color images sequentially as our microscope is not configured with a beam-splitter for simultaneous imaging.

      This referee could not fully understand the image acquisition routine, specifically the continuous movement of the stage along the Z-axis as images are streamed (to RAM or to disk? the latter takes time, line 177). Does this mean that Z-stepping is governed solely by the exposure time? The CCD camera sacrifices pixel size (16 µm) in exchange for outstanding quantum efficiency. The optical path includes a 100x objective and a 2x magnification lens to compensate for the large camera pixel size, thereby achieving 0.085 µm/pixel, but these lenses 'waste' part of the fluorescent signal. One wonders whether a CMOS camera (6.5 µm pixel size) coupled with a 63x objective would be more appropriate. A brief discussion of this choice would be helpful for readers.

      We now discuss the microscopy in more detail and explain why we use an EMCCD rather than a CMOS camera.

      It is remarkable that Sec2 and Sec4 are recruited to membranes even before a vesicle is formed (Fig 6I). I find the evidence that RAB11s 'mark' the TGN somewhat weak, and the fact that RAB11 reaches the PM disturbing (does GFP tagging prevent GAP access?). I strongly recommend that the authors integrate into the introduction/discussion information on the late steps of exocytosis available for Aspergillus nidulans, another ascomycete that is particularly well suited for studying this process. Here RAB11 is not a late Golgi resident but is transiently (20 s) recruited to TGN cisternae in the late stages of their 120 s maturation cycle to drive the transition between Golgi and post-Golgi (Pantazopoulou MBoC, 2014). Recruitment of RAB11 to the TGN is preceded by the arrival of its TRAPPII GEF (Pinar, PNAS 2015; Pinar PLOS Gen 2019), a huge complex that is incorporated en bloc into the TGN (Pinar JoCS, 2020). Upon RAB11 acquisition, RAB11 membranes engage molecular motors (Penalva, MBoC 2017) to undertake a several-micron journey that transports them to a vesicle supply center located underneath the apex (review, Pinar & Penalva, 2021). This is where Sec4 is located, strongly indicating a division of labor between two Rabs, each mediating one of the two stages between the TGN and the membrane (Pantazopoulou, 2014, MBoC).

      In the general comments above, we discuss the possible artifact of tagged Ypt31 on the PM. In the Discussion, we now compare our results in S. cerevisiae with the findings in A. nidulans.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, the authors set out to develop an approach for microindentation-based spatial mapping of the articular cartilage of the mouse femur. Because mouse cartilage in articulating joints is extremely thin and difficult to indent repeatably and reliably, there is a need to increase the resolution of indentation spacing on very small surfaces, improve the sensitivity of indentation (e.g., surface detection), and reduce error and improve the accuracy of indentation measurements. Using a relatively new multiaxis material test stand with repositioning capabilities and multiaxis load cells, the authors developed a spatial indentation test protocol and used this array-based approach to measure cartilage thickness via needle probing. They then validated the thickness measurements generated by needle probing against high-resolution 3D x-ray imaging with contrast enhancement using phosphotungstic acid (PTA). Finally, the authors compared cartilage thickness and indentation mechanical properties between wild-type (C57BL6J) and Prg4 mutant mice.

      This work is rigorous and includes new techniques that are validated using orthogonal approaches. Some of the techniques used in this work, especially indentation-based mapping of cartilage stiffness in small mouse joints, have been challenging for the field. This is especially relevant given the rapidly growing number of small-animal studies investigating cartilage health in transgenic mouse strains and injury models. Although the work is innovative and important, a few key experiments would help validate the data acquired here.

      Specifically, a general rule of thumb for indentation testing is to indent to no more than one-tenth of the thickness of the indented material. Because the cartilage thickness of the medial condyles (~0.04 mm) was only about twice the indentation depth used for automated indentation mapping (0.02 mm), substrate effects from the subchondral bone may influence the indentation data in this thin region. It is unclear whether the indentation measurements characterize cartilage or substrate properties. This may not be a major issue for healthy, intact cartilage (including in the mutant strains), but it will likely have a major impact on the interpretation of results following cartilage degeneration and loss.

      It is unclear whether the 0.02 mm indentations caused damage, because XRM scanning occurred after the needle probing tests. The "bands" observed in the 3D XRM imaging following both indentation and needle probing (Fig 2A2) suggest that the indentation probes and individual needle probings at each site do not overlap perfectly. The surface congruency of the cartilage suggests valley formation at indentation sites.

      We thank the reviewer for the enthusiastic comments on our work and its importance to the field. We agree that this is the established rule of thumb; however, because we use a microindentation rather than a nanoindentation approach (such as AFM) in order to also obtain spatial resolution, we cannot reliably probe the cartilage within a 1 µm amplitude range. We also had to accommodate thickness variations across the cartilage surface (thinner and thicker regions) during indentation testing, which cannot be known before the needle-probing thickness measurements. We fully agree that substrate effects can influence indentation data, as is well described in the field. Therefore, to mitigate such effects, the instantaneous modulus was calculated at 20% strain for all positions and presented alongside thickness maps of the same surfaces to avoid misinterpretation of cartilage loss. The repeatability of indentation peak forces during test-retest suggests that the indentations did not damage the cartilage surfaces. This can be further corroborated by XRM imaging of femurs subjected to indentation testing only. Nevertheless, we would like to clarify (and will do so in the Methods section) that indentation and needle probing were not performed at exactly the same positions, precisely because of this possibility. We apologize for any misunderstanding.
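      The depth-versus-thickness argument above reduces to simple arithmetic. The sketch below is an illustrative check only, not the authors' analysis code; it uses the approximate values quoted in the review (~0.04 mm thickness, 0.02 mm indentation depth) and assumes nominal strain is indentation depth divided by local cartilage thickness. The function names are hypothetical.

```python
# Illustrative arithmetic only; values are the approximate figures quoted in
# the review, and strain is assumed to be depth / local thickness.

def depth_to_thickness_ratio(depth_mm: float, thickness_mm: float) -> float:
    """Indentation depth expressed as a fraction of cartilage thickness."""
    return depth_mm / thickness_mm


def max_depth_rule_of_thumb(thickness_mm: float, fraction: float = 0.1) -> float:
    """Largest depth allowed by the 'no more than 1/10 of thickness' rule."""
    return fraction * thickness_mm


def depth_at_strain(thickness_mm: float, strain: float) -> float:
    """Depth at which a given nominal strain is reached (assumed linear)."""
    return strain * thickness_mm


if __name__ == "__main__":
    thickness = 0.04  # mm, approximate medial condyle cartilage thickness
    depth = 0.02      # mm, indentation depth used for automated mapping
    print(f"depth/thickness: {depth_to_thickness_ratio(depth, thickness):.2f}")
    print(f"rule-of-thumb max depth: {max_depth_rule_of_thumb(thickness):.3f} mm")
    print(f"depth at 20% strain: {depth_at_strain(thickness, 0.2):.3f} mm")
```

      Under these assumptions, the 0.02 mm depth reaches half the thickness of the thinnest regions, whereas evaluating the modulus at 20% strain corresponds to about 0.008 mm of indentation there, which is why the strain-based analysis partly addresses the substrate concern.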

    1. Author Response

      Reviewer #3 (Public Review):

      The study by Randzavola and colleagues follows up their previous publication (Thomas DC et al, J Exp Med 2017) describing EROS (Essential for Reactive Oxygen Species, or C17Orf62) as a novel chaperone required to support the phagocyte Nox NADPH oxidase respiratory burst and bacterial killing. Here, the authors extend the investigation of the mechanism underlying the effect of EROS and show that it binds early in the endoplasmic reticulum and interacts with immature, partially glycosylated forms of gp91phox (the catalytic subunit of the Nox complex), allowing the incorporation of heme and subsequent binding of p22phox, after which the complex follows the usual maturation steps. A novel finding was the association of EROS with the OST component of the N-glycosylation machinery. An extended proteome analysis confirmed that EROS is quite specific for the gp91phox/p22phox complex and also for the purinergic P2X7 receptor, which also interacts with EROS (as shown previously by the authors and further investigated by Ryoden et al. J Immunol 2020). The authors further validate EROS binding to P2X7 and provide evidence that EROS loss of function impairs P2X7-associated functions. In particular, mice with genetically ablated EROS show improved survival after influenza infection.

      A major strength of this line of investigation is the clear functional importance of EROS in regulating the protein expression of the Nox complex components. Previous work has clearly shown that human EROS deficiency is associated with the severe immunodeficiency Chronic Granulomatous Disease, which is usually caused by genetic deficiency of the Nox complex components. Indeed, loss or gain of function of EROS is very clearly associated with major changes in the expression of those components, indicating the functional relevance of EROS. Moreover, the interplay between the P2X7 receptor and EROS is also relevant, given that this receptor mediates an important arm of innate immunity, namely nucleotide-driven inflammasome activation. Thus, the authors are dealing with important novel information that may have broad impact on our understanding of several aspects of innate and even adaptive immunity.

      Enthusiasm for this article, however, is somewhat decreased by some aspects, as follows:

      1) While there is a substantial amount of new data, the corresponding progress in mechanistic insight has not been commensurate, bearing in mind the authors' previous work. The novel findings are the clearer documentation of the EROS/gp91phox interaction and its time course during nascent gp91phox protein processing in the ER, as well as their interplay with the OST complex. The extended list of proteins associating with EROS essentially confirms previous findings. Likewise, the work with P2X7 mostly confirms previous findings, while the novel and interesting experiment with EROS-silenced mice and viral infection needs further work, as commented below.

      We thank the reviewer for this comment and for seeking clarity on novelty. We have addressed this above and in the discussion section. We have not reported the EROS interactome by mass spectrometry in previous work.

      2) Some aspects of these results are less than clear-cut. The association between gp91phox and EROS is generally convincing, but for many experiments the authors make wide use of transfections of tagged protein constructs. One can clearly understand that this is possibly the only feasible approach at this time; however, these constructs carry the intrinsic problem of possible protein misfolding, which would make them a potentially artificial target of any endoplasmic reticulum chaperone-like protein such as EROS. This would bear directly on the very mechanism the authors propose for EROS effects, i.e., early protein processing.

      We understand Reviewer 3’s concerns about using tagged constructs. However, all transfection experiments depicted in Figure 1 were done with untagged constructs and in different cell types in both mouse and human systems. The approach is also validated by extensive previous work showing the ability of transfected p22phox to augment gp91phox expression (Yu et al., J Biol Chem 1997; PMID: 9341176). All our experiments showed the same result, namely the stabilisation of the 58kDa gp91phox precursor. We have now included data showing that we can immunoprecipitate endogenous gp91phox in PLB985 cells and detect endogenous EROS (Figure 3, figure supplement 1A), which confirms the specificity of the association between gp91phox and EROS. In the same sample, we can also detect endogenous p22phox (our positive control), which is well known to associate with gp91phox as a heterodimer. Furthermore, transfection of our constructs does not induce significant ER stress in HEK293 cells. Based on our own data and those of other investigators, we argue that this is a valid and useful approach for demonstrating the ability of EROS to increase gp91phox abundance. Moreover, this is just one of many orthogonal techniques used in the manuscript.

      3) The same consideration applies to the experiments in Figure 3 with the OST complex subunit STT3A. The co-localizations shown by the authors are technically acceptable, but their meaning is unclear, given that EROS and OST are expected to occupy the same compartment as ER-located proteins, especially if transfected as constructs (tagged or not).

      The experiment was designed to assess the localisation of gp91phox relative to EROS and STT3A, which are known to occupy the ER compartment, as pointed out by the reviewer. Since HEK293 cells do not express gp91phox, this microscopy analysis allowed us to determine whether some population of gp91phox could be detected with EROS and STT3A in the ER, as opposed to its localization as a mature protein at the plasma membrane and within granules in phagocytic cells.

      4) It would be important to assess whether cells receiving such constructs depict markers of endoplasmic reticulum stress and/or show impaired survival.

      This has been addressed in our response to Reviewer 3’s recommendations for the authors, point 2.

      5) The experiments with co-transfection of EROS, Nox1 and Nox4 in HEK293 cells provide results at variance with the authors' data in their previous work, in which endogenous Nox1 (intestine) and Nox4 (kidney) showed no changes in expression in genetically silenced EROS mice.

      We thank the reviewer for this comment and acknowledge that this introduces some ambiguity. In showing the augmentation of NOX1 and NOX4 but not p22phox or NOX5, we demonstrate that EROS can likely bind and stabilise NOX proteins that also require p22phox. In the case of NOX4, this is also supported by our yeast two-hybrid data. Thus, these data suggest that EROS can bind p22phox-dependent NOX proteins. The key question is whether EROS has a physiological role in controlling the expression of other NOX proteins. Although we addressed this in our previous study, we have done so more extensively in this manuscript. In particular, we note the subsequent publication by Diebold et al. (Methods Mol Biol 2019; PMID: 31172474), which points out that many commercially available antibodies are non-specific. Detailed examination showed this to be the case for the antibody we used in Thomas et al. (J Exp Med, 2017; PMID: 28351984). We therefore undertook a specific analysis using the anti-mouse NOX1 antibody clone from Dr C. Yabe-Nishimura and Dr. Misaki Matsumoto.

      Similarly, our work on NOX4 in Thomas et al. 2017 (J Exp Med, 2017; PMID: 28351984) showed that NOX4 is certainly present in the kidneys of EROS-/- mice. However, this was a limited analysis because it was not the main focus of the paper, and the conclusion was only that there was no drastic effect on NOX4 expression comparable to that observed for NOX2. For the revisions to this paper, we examined a cohort of 4 control and 4 EROS-/- mice and showed that EROS does not physiologically regulate NOX4 in the kidney.

      Thus, the use of HEK293 cells, which do not express NOX proteins, as a reductionist system may favour the effect of EROS on NOX1 and NOX4 abundance upon transfection of the constructs. One possible explanation is that EROS binds to a conserved motif present in NOX1, NOX2 and NOX4 that is readily accessible in the system we are using.

      6) The article is conceptually divided into two parts. However, there is no clear cross-fertilization between them and they essentially do not integrate.

      The reviewer notes that the manuscript appears to tell two separate stories. This reflects the fact that we have extensively characterised the function of EROS and found that it specifically and profoundly affects only two distinct pathways in immunity, which is significant in itself. A strength of our manuscript is our extensive, granular mass spectrometry approach, which shows the specificity of EROS in two different cell types in which up to 8000 proteins were detected. We have therefore placed the control of P2X7 and gp91phox-p22phox in the context of the entire proteome. Our paper defines just how specific EROS is in its physiological effects, and we therefore focus on the two major pathways that are affected by EROS deficiency. We integrate these in the final figure by showing how the combined lack of gp91phox and P2X7 leads to resistance to influenza A, in contrast to the susceptibility to certain bacterial infections.

      7) While the authors claim that "the loss of both ROS and P2X7 signalling leads to resistance to influenza infection", this was not in fact shown in this work. It is known that P2X7 deficiency protects against influenza infection. Thus, it follows naturally that EROS deficiency, which essentially eliminates the expression of P2X7, would have the same effect. However, the role of ROS and gp91phox, i.e. whether or not they add to this equation, remains unclear.

      We thank the reviewer for this comment. The role of phagocyte NADPH oxidase-derived ROS has been explored in gp91phox deficiency and we apologise if this is not made clear in our manuscript. We have now added the following text to the discussion section of the manuscript:

      “A particular strength of our study is that we show marked in vivo sequelae of the lack of P2X7. EROS deficiency leads to profound susceptibility to bacterial infection but protects mice from infection with influenza A. This is likely to reflect the fact that mice that are (i) deficient in gp91phox, (ii) deficient in P2X7, or (iii) treated with P2X7 inhibitors have improved outcomes following infection with influenza A, and it raises intriguing questions about the physiological role of EROS. Snelgrove et al showed that gp91phox deficiency improved outcomes in influenza A. gp91phox knockout mice exhibited a reduced influenza titre in the lung parenchyma. Inflammatory infiltrate into the lung parenchyma was markedly reduced and lung function significantly improved (Snelgrove et al., 2006). To et al then showed that the phagocyte NADPH oxidase is activated by single-stranded RNA and DNA viruses in endocytic compartments. This causes endosomal hydrogen peroxide generation, which suppresses antiviral and humoral signalling networks via modification of a highly conserved cysteine residue (Cys98) on Toll-like receptor-7. In this study, targeted inhibition of endosomal reactive oxygen species production using cholestanol-conjugated gp91dsTAT (Cgp91ds-TAT) abrogated influenza A virus pathogenicity (To et al., 2017). This group went on to explore infection with a more pathogenic influenza A strain, PR8. Using the same specific inhibitor, they showed that Cgp91ds-TAT reduced airway inflammation, including neutrophil influx and alveolitis, and enhanced the clearance of lung viral mRNA following PR8 infection (To et al., 2019). This group has also shown that NOX1 (Selemidis et al., 2013) and NOX4 (Hendricks et al., 2022) can drive pathogenic inflammation in influenza A, emphasising the importance of clarifying the roles of EROS in controlling the expression of these proteins.

      In studies on P2X7, Rosli et al showed that mice infected with 10⁵ PFU of influenza A HKx31 had improved outcomes if they were treated with a P2X7 inhibitor at day 3 post infection and every two days thereafter. Survival was also improved even when the inhibitor was given on day 7 post infection following a lethal dose of the mouse-adapted PR8. This was associated with reduced cellular infiltration and pro-inflammatory cytokine secretion in bronchoalveolar lavage fluid, but viral titres were not measured (Rosli et al., 2019). Leyva-Grado et al examined influenza A infection in P2X7 knockout mice. They infected mice with both influenza A/Puerto Rico/08/1934 virus and influenza A/Netherlands/604/2009 H1N1pdm virus, and showed that P2X7 receptor deficiency led to improved survival and less weight loss after infection with both viruses (Leyva-Grado et al., 2017). Production of proinflammatory cytokines and chemokines was impaired, and there were fewer cellular hallmarks of severe infection, such as infiltration of neutrophils and depletion of CD11b+ macrophages. It is worth noting that the P2X7 knockout strain used in this study was the Pfizer strain, in which some splice variants of P2X7 are still expressed (Bartlett et al., 2014). Hence, the dual loss of the phagocyte NADPH oxidase and P2X7 in EROS-/- mice likely confers protection from IAV infection. By reducing the expression of both NOX2 and P2X7, EROS regulates two pathways that may be detrimental in influenza A, and we speculate that EROS may physiologically act as a rheostat controlling certain types of immune response.”

    1. Author Response

      Reviewer #1 (Public Review):

      This well-written paper combines a novel method for assaying ubiquitin-proteasome system (UPS) activity with a yeast genetic cross to study genetic variation in this system. Many loci are mapped, and a few genes and causal polymorphisms are identified. A connection between UPS variation and protein abundance is made for one gene, demonstrating that variation in this system may affect phenotypic variation.

      The major strength of the study is the power of yeast genetics, which makes it possible to dissect quantitative traits down to the nucleotide level. The weakness is that it is not clear whether the observed UPS variation matters on any level; however, the claims are suitably moderate and generally supported.

      We agree with the reviewer that understanding how causal variants for ubiquitin-proteasome system (UPS) activity affect other molecular, cellular, and organismal phenotypes is an important area of future research.

      The paper provides a nice example of how it is possible to genetically dissect an "endo-phenotype", and learn some new biology. It also represents a welcome attempt to put the function of a mechanism that is heavily studied in molecular cell biology in a broader context.

      We thank the reviewer for these kind words.

      Reviewer #2 (Public Review):

      In this manuscript, the authors developed an elegant quantitative reporter assay to identify quantitative trait loci that regulate the N-end rule pathway, a major quality control mechanism in eukaryotes. By crossing two yeast strains with divergent proteostasis activity, they generated a population that showed broad variation in proteostasis activity. By sequencing and mapping the underlying loci, they identified several genes that regulate N-end rule activity. They then verified these using precise genetic tools, validating the power of their approach.

      Overall, it is a very solid manuscript that would be highly interesting for the quality control field.

      In general, I really liked this manuscript for these reasons:

      • Uses fluorescent timers elegantly to quantitatively measure protein degradation.

      • Validates the approach in depth, showing the readers how the tool works.

      • Uses the power of yeast genetics and bulk segregant analysis to map loci that may have small effects.

      • Validates the mapped loci using precise genetic tools.

      In a field that is dominated by biochemistry, this manuscript will be a breath of fresh air…

      We thank the reviewer for their thoughtful evaluation of our work and these kind words.

      Reviewer #3 (Public Review):

      This manuscript, "Variation in Ubiquitin System Genes Creates Substrate-Specific Effects on Proteasomal Protein Degradation" studies the genetic basis of differences in protein degradation. The authors do so by screening natural genetic variation in two yeast strains, finding several genes and often several variants within each gene that can affect protein degradation efficiency by the Ubiquitin-Proteasome system (UPS). Many of these variants have "substrate-specific effects" meaning they only affect the degradation of specific proteins (those with specific degrons). Also, many variants located within the same genes have conflicting effects, some of which are larger than others and can mask others. Overall, this study reveals a complex genetic basis for protein degradation.

      Strengths: Revealing the genetic basis for any complex trait, such as protein degradation, is a major goal of biology. The results of this paper make a significant step towards the goal of mapping the genes and variants involved in this specific trait. Fine mapping methods are used to home in on the specific variants involved and to measure their effects. This is very nicely done and provides a detailed view of the genetic basis of protein degradation. Further, the GFP/RFP system used to quantify the efficiency of the protein degradation system is a very elegant system. Also, the completeness of the analysis, meaning that all 20 N-degrons were studied, is impressive and leads to very detailed findings. It is interesting that some genetic variants have larger and opposite effects on the degradation of different N-degrons.

      We thank the reviewer for these positive comments.

      Weaknesses: Some of the results discussed in this paper are not surprising. For example, the finding that both large effect and small effect genetic variants contribute to this complex trait is not at all surprising. This is true of many complex traits.

      We agree with the reviewer that the number and patterns of QTLs we observe are perhaps not unexpected given that most traits are genetically complex. However, we also note that our results stand in stark contrast to previous efforts to understand how natural genetic variation affects the UPS, which have focused almost exclusively on large-effect mutations in UPS genes that cause rare Mendelian disorders. We have therefore chosen to retain our discussion of the complex genetic architecture of the UPS.

      The discussion of human disease is also a bit extensive given this study was performed on yeast. It might be more productive to use these findings to understand the UPS better on a mechanistic level. Why does the same genetic variant have opposite effects on the degradation of different degrons, even in cases where those degrons are of the same type?

      Following the reviewer’s suggestion, we have removed multiple references to human disease from the introduction. We retained paragraph 3 of the introduction (previously lines 43-55; now pg. 2, para. 2 in the revised manuscript), which discusses disease-causing mutations in UPS genes, because the examples presented highlight two important motivations for our work: (1) individual genetic differences create variation in UPS activity and (2) much of our knowledge of how natural genetic variation affects the UPS comes from these rare, limited examples. However, we have rewritten the paragraph to focus on these points and removed descriptions of the clinical manifestations of the disorders mentioned.

      We agree with the reviewer that understanding the mechanistic basis of substrate-specific variant effects on distinct N-degrons is important. However, doing so would require additional experiments that we argue are outside the scope of the current study.

      Overall, this manuscript excels at mapping the genetic basis of variation in the UPS. It demonstrates a very complex mapping from genotype to phenotype that calls for further mechanistic explanation. These results are important to the UPS field because they may help researchers interrogate this highly conserved essential system. The manuscript is weaker when it comes to the broader conclusions drawn about the relative importance of large- vs. small-effect variants on complex traits, the amount of heritability explained, and the effects of genetic variation on protein abundance vs. transcript abundance. In the case of protein vs. transcript, however, I feel the cursory examination of the trends is perhaps at an appropriate level for the study, as it is mainly meant to show that these differ rather than exactly how and why they differ.

      We state that the distribution of QTL effect sizes for UPS activity consists of many QTLs with small effects and few QTLs of large effects. While this result is similar to patterns observed for other complex traits, it differs dramatically from the results of previous studies of genetic influences on the UPS, which have been largely confined to large-effect variants. Given these differences, we think it is appropriate and worthwhile to emphasize the complex genetic architecture of UPS activity.

      We agree that estimating the fraction of heritability explained by our QTLs and variants would be valuable. However, as noted in our response to Reviewer 1, the QTL mapping method we used does not permit ready calculation of heritability estimates due to its pooled nature.

      The reviewer is correct in noting that the primary goal of our RNA-seq and proteomics experiments was to provide an initial exploration of the effects of causal variants for UPS activity on global gene expression at the protein and mRNA levels. While a comprehensive dissection of the effects of this and other causal variants is an important area of future work, our results here show broad changes in global gene expression and establish that the causal UBR1 variant affects gene expression at the protein and mRNA levels.

      Reviewer #4 (Public Review):

      Overall the paper is clear and well-written. The experimental design is elegant and powerful, and it's a stimulating read. Most QTL mapping has focused on directly measurable phenotypes such as expression or drug response; I really like this paper's distinctive approach of placing bespoke functional assays for a specific molecular mechanism into the classical QTL framework.

      We thank the reviewer for their thoughtful evaluation of the work and positive comments.

    1. Author Response

      Reviewer #2 (Public Review):

      Recent advances in the investigation of functional brain connectivity have allowed the identification of the main connectivity gradient, running from unimodal to transmodal brain regions. Gao et al. aimed to test whether this connectivity gradient changes according to task demands and, if so, whether this change is also related to the complexity of brain signals evoked by events with various task demands. Their results are threefold. 1) They first compared the connectivity gradient obtained during a semantic relatedness judgment task with those obtained during a purely visual detection task and at rest. a) They found that the same main gradient could be extracted from the three conditions, making it suitable for investigating the effect of word relatedness. b) Additionally, they showed that word relatedness modulates the main gradient: when words were close, the gradient was strengthened, i.e., the dissociation between unimodal and transmodal areas was sharpened. 2) The authors found that the strength of word associations modulates the complexity of brain signals: the closer the words, the more convergent brain signals were across participants and trials, particularly in the transmodal areas of the main gradient. 3) They found that transmodal brain regions in the gradient were similarly activated in participants with similar relatedness judgments. Finally, they tested the link between these three results using mediation analysis. They showed that the dimensionality difference (result 2) mediated the link between the gradient in the semantic task (result 1a) and the interindividual similarities in semantic judgment and brain activation (result 3). Altogether, this study demonstrates that the state of the main gradient is predictive of both task variations and inter-individual similarities in task responses.
Those results suggest that gradients are a relevant measure of functional connectivity for investigating the variation of connectivity within a task and between individuals. The results overall support conclusions.

      • Strengths:

      1. The main strength of the article is the methods used to obtain the results. Gradients of functional connectivity are a new measure that goes beyond classical brain network functional connectivity. Investigating the dynamics of gradients during a semantic task allows us to better understand how different brain regions (unimodal, transmodal, belonging to some specific networks, etc.) adapt to variability in a task.

      The second strength is the topic: the question is relevant to researchers interested in semantic memory or processing and to any researcher interested in brain dynamics within and between individuals. The demonstration is elegant, and the behavioral task is simple; it compensates for the complexity of the methods.

      • Weaknesses:

      1. The main weakness of the article is the lack of details about the performed analyses, which prevents a clear understanding of the results. The complexity of their methods calls for a crystal-clear description of them. The reader is not informed about how statistics are computed. New terms are sometimes used to describe already mentioned results, making reading the article particularly difficult.

      Thank you very much for the suggestions on statistics. We have now substantially updated the manuscript; please see our detailed reply under Essential Revisions.

      2. Conceptually, the authors assumed that during the task, participants generated a word linking the pair of words displayed on the screen, and that the neural and cognitive processes vary solely with the distance between the two words of the pair. However, when words are close, it is not obvious that individuals will generate a third word to link them, and it might be even more challenging to find a linking word in that case than when the words are quite distant from each other. Considering these potential confounds, the results could be interpreted differently. Because the authors always contrast very high versus very low distance, the observed results could also be interpreted as "observing a link" versus "generating a link word"; the first scenario is much simpler cognitively, and this could also explain the differences they observed.

      We apologize for not explaining the task instructions clearly in our initial submission. Participants were not specifically instructed to generate a linking word; the link was typically expressed in multiple words and could involve imagery as well as words. For this reason, we are not sure that a simple recognition/generation distinction captures the different neural effects associated with high and low associations. However, the text now acknowledges that multiple cognitive processes could contribute to the differences we observe, including recognition vs. generation, more automatic vs. more controlled retrieval, and processes associated with creativity. The Discussion now acknowledges multiple ways in which the neural patterns could be interpreted. Please see page 29.

      ‘Though our results are in line with the controlled semantic cognition framework in general, multiple cognitive processes could contribute to the differences we observe between strong and weak associations, including observing vs. generating semantic links, more automatic vs. more controlled retrieval, imagery, and processes associated with creativity.’

      Reviewer #3 (Public Review):

      With resting-state fMRI data, recent work has mapped the organisation of the cortex along a continuous gradient, and regions that share similar patterns of functional connectivity are located at similar points on the gradient (Margulies et al., 2016). In the current study, the authors investigate how this dimension of connectivity changes during conceptual retrieval with different levels of semantic association strength. Specifically, they perform gradient analysis on task-fMRI informational connectivity data and reveal a similar principal gradient to the previous study, which captures the separation of heteromodal memory regions from the unimodal cortex. More importantly, by comparing the gradient generated with data from different experimental conditions (i.e., strong vs. weak association), the authors find the separation of the regions at the two ends of the gradient can be regulated by the association strength, with more separation for stronger association. They also examine the relationships between the gradient values and dimensionality and brain-semantic alignment measures, to explore the nature of this shifting gradient as well as the corresponding brain areas.

      Strengths:

      1. The aim of this study is clear and the relevant background literature is covered at an appropriate level of detail. With the cortical gradient analysis approach, this study has the potential to make a contribution to the understanding of the topographical neural basis of semantics in a fine-grained manner.

      2. The methodology in the current study is novel. This study validates the feasibility of performing gradient analysis on task-fMRI data, which is enlightening for future research. Using the number of PCs generated by PCA as a measure of dimensionality is also an interesting approach.

      3. The authors have conducted multiple control analyses, which tested the validity of their results. Specifically, a control task without engaging semantic processing was built in the experimental design (i.e., the chevron task), and the authors conducted multiple parallel control analyses with the data from this control task as a comparison with their main results. Other control analyses were also performed to validate the robustness of their methodological choices. For example, varied thresholds were used during the calculation of dimensionality and similar results were obtained.
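      Strengths 2 and 3 describe dimensionality as the number of principal components (PCs) produced by PCA, with robustness checked across thresholds. A minimal Python sketch of one such estimate follows. It assumes, as an illustration rather than the authors' exact criterion, that dimensionality is the number of PCs needed for the cumulative explained variance to reach a threshold; the function name `pca_dimensionality` and the synthetic samples-by-voxels matrices are hypothetical.

```python
import numpy as np

def pca_dimensionality(data, threshold=0.9):
    """Number of PCs whose cumulative explained variance reaches `threshold`.

    `data` is a hypothetical (samples x voxels) array; the threshold rule is an
    illustrative assumption, not the authors' exact criterion.
    """
    centered = data - data.mean(axis=0)               # PCA works on mean-centred columns
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2                   # proportional to PC eigenvalues
    cumulative = np.cumsum(variance) / variance.sum()
    # first index where the cumulative share reaches the threshold, +1 to get a count
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(variance))

rng = np.random.default_rng(0)
# low-dimensional case: a rank-2 signal plus very small noise (50 samples, 200 voxels)
signal = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 200))
data = signal + 0.01 * rng.standard_normal((50, 200))
for t in (0.8, 0.9, 0.95):                            # vary the threshold as a robustness check
    print(t, pca_dimensionality(data, t))
```

      Fewer components needed to reach the threshold means the variance is concentrated in a smaller number of dimensions; varying `threshold` mirrors the robustness check mentioned above.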

      Weaknesses:

      1. As a major manipulation in the experiment, it is not very clear how the authors split/define their stimuli into strong and weak semantic association conditions. If I understood correctly, word2vec was used to measure the association strength in each pair of words. Then the authors grouped the top 1/3 association strength trials as a "strong association" condition and the bottom 1/3 as "weak association" (Line 689), and all analyses comparing the effect of "strong vs. weak association" were conducted with data from these two subsets of stimuli. However, in multiple places, the authors indicate that the association strength of their stimuli ranges from completely unrelated to weakly related to highly related (Line 612, Line 147, Line 690, and the examples in Figure 1B). This makes me wonder whether the trials with bottom 1/3 association strength (i.e., those used in the current study) are actually "unrelated/no association" trials (more like a baseline condition), instead of "weak association" trials as the authors claimed. These two situations could differ in how they engage semantic knowledge and control processing. Besides, I would be very interested to see what the authors find if they compare all three conditions (i.e., unrelated vs. weak association vs. strong association).

      Thank you very much for raising this point. We have conducted an additional analysis of the intermediate bin, comparing it against the bottom 1/3 for the gradient analysis and against the top 1/3 for the dimensionality analysis (i.e., against the baseline condition of each analysis). These comparisons showed a pattern similar to the contrast between strong and weak associations, but with a smaller effect, consistent with the expected intermediate profile. The correlation between the principal gradient difference (middle minus weak association) and the principal gradient values derived from resting state was also significant (see Figure S10C), but its magnitude was smaller than that reported in the main text (r = 0.235 vs. r = 0.369). Because the strongest effect is expected between the top and bottom thirds, we have included these results in the supplementary materials. Please see Figure S10 on page 7.
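      The binning discussed in this exchange (bottom 1/3 of trials by word2vec association strength as "weak", top 1/3 as "strong", and the intermediate bin in between) can be sketched as follows. The quantile cut points, the function name `tertile_bins`, and the nine similarity scores are illustrative assumptions, not the study's stimuli or code.

```python
import numpy as np

def tertile_bins(strength):
    """Label each trial 'weak', 'middle' or 'strong' by association-strength tertile."""
    strength = np.asarray(strength, dtype=float)
    low_cut, high_cut = np.quantile(strength, [1 / 3, 2 / 3])
    labels = np.full(strength.shape, "middle", dtype=object)
    labels[strength <= low_cut] = "weak"      # bottom third
    labels[strength > high_cut] = "strong"    # top third
    return labels

# hypothetical word2vec cosine similarities for nine word pairs
scores = [0.05, 0.12, 0.20, 0.31, 0.38, 0.44, 0.55, 0.63, 0.71]
print(list(tertile_bins(scores)))
# -> ['weak', 'weak', 'weak', 'middle', 'middle', 'middle', 'strong', 'strong', 'strong']
```

      With ties at a cut point, the `<=` and `>` comparisons decide which bin a trial falls into, so bin sizes may differ slightly from exact thirds.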

      2. Following the previous point, because the comparison between the weak and strong association conditions is key to the current study, I feel it might be better to say more about the stimuli in these two conditions. Specifically, the authors only indicated that the word pairs falling into these two conditions varied in association strength, but what about other psycholinguistic properties that could potentially confound this manipulation? For example, words with higher frequency and concreteness may engage more automatic, richer long-term semantic information, whereas words with lower frequency and concreteness may need more semantic control. I feel the effect of semantic association may have been partly driven by differences in these measures between conditions.

      Thank you for raising this point. Following the reviewer's suggestion, we have performed an additional control analysis examining the relationship between association strength and psycholinguistic features. Association strength did not correlate significantly with word frequency (r = -0.010, p = 0.392), concreteness (r = -0.092, p = 0.285) or imageability (r = 0.074, p = 0.377). A direct comparison of these psycholinguistic features between strongly and weakly associated word pairs also revealed no significant differences: frequency (t = 0.912, p = 0.364), concreteness (t = 1.576, p = 0.119), imageability (t = 1.451, p = 0.153). Please see page 32:

      ‘The association strength did not show significant correlation with word frequency (r = -0.010, p = 0.392), concreteness (r = -0.092, p = 0.285) or imageability (r = 0.074, p = 0.377).’

      3. The dimensionality analysis in the current study is novel and interesting. In this section, the authors linked decreasing dimensionality with more abstract and less variable representations. However, most results here were based on the comparison between the dimensionality effects for the strong and weak association conditions. I wonder whether these conclusions can be generalised to results within each condition and across different regions (i.e., whether regions with lower dimensionality perform more abstract and cross-modal processing). If so, I am curious why the ATL (a semantic "hub") in Figure 3A has higher dimensionality than the sensory-motor cortices (which are closely tied to experience) and the AG (another semantic "hub").

      The dimensionality and its relationship to the cortical gradient were also examined for each condition. We assessed whether this relationship was influenced by associative strength, averaging dimensionality estimates for sets of four trials with similar word2vec values using a ‘sliding window’ approach. There was a negative correlation between overall dimensionality (averaged across all trials) and the principal gradient, and the magnitude of this negative relationship increased as a function of association strength. We therefore believe our conclusion generalizes across conditions. We did observe higher dimensionality in the ATL/frontal orbital cortex than in sensory-motor cortices, which seems to contradict our conclusion. However, these areas are subject to severe distortion and signal loss in functional MRI, and their lower tSNR leads to higher dimensionality estimates in PCA. We therefore conducted a control analysis in which regions in the limbic network were removed because of their low tSNR, and the pattern remained significant (r = -0.346, p = 0.038).

      Please see the Discussion on page 30.

      ‘It is worth noting that not all brain regions showed the expected pattern in the dimensionality analysis – especially when considering the global dimensionality of all semantic trials, as opposed to the influence of strength of association in the semantic task. In particular, the limbic network, including regions of ventral ATL thought to support a heteromodal semantic hub, showed significantly higher dimensionality than sensory-motor areas – these higher-order regions are expected to show lower dimensionality corresponding to more abstract representations. However, this analysis does not assess the psychological significance of data dimensionality differences (unlike our contrast of strong and weak associations, which are more interpretable in terms of semantic cognition). Limbic regions are subject to severe distortion and signal loss in functional MRI, which might strongly influence this metric. Future studies using data acquisition and analysis techniques that are less susceptible to this problem are required to fully characterize global dimensionality and its relation to the principal gradient.’
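      The ‘sliding window’ step described in this response averages dimensionality over sets of four trials with similar word2vec values. A minimal sketch, assuming per-trial dimensionality estimates are already available and that the window advances one trial at a time along the strength-sorted order (the step size is not stated here); `sliding_window_average` and the toy values are hypothetical.

```python
import numpy as np

def sliding_window_average(strength, dimensionality, window=4):
    """Average per-trial values over windows of `window` trials adjacent in association strength.

    Returns (mean strength, mean dimensionality) for each window position.
    """
    strength = np.asarray(strength, dtype=float)
    dimensionality = np.asarray(dimensionality, dtype=float)
    order = np.argsort(strength)                       # sort trials by association strength
    s_sorted, d_sorted = strength[order], dimensionality[order]
    kernel = np.ones(window) / window
    # 'valid' keeps only windows that contain `window` real trials (step of one trial)
    mean_strength = np.convolve(s_sorted, kernel, mode="valid")
    mean_dim = np.convolve(d_sorted, kernel, mode="valid")
    return mean_strength, mean_dim

strength = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]   # hypothetical word2vec values
dims = [3, 9, 6, 8, 4, 8]                   # hypothetical per-trial dimensionality
s, d = sliding_window_average(strength, dims)
print(s, d)
```

      Each window yields one (strength, dimensionality) pair; such pairs could then be related to the principal gradient separately for low- and high-association windows.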

      4. I am not sure about the meaning/representational content underlying the semantic similarity matrix in the semantic-brain alignment analysis. According to the authors, this matrix was built based on the correlation of participants' ratings of associative strength (0, no link; 1~4, weak to strong) across trials. The authors indicate that this matrix reflects the global similarity of semantic knowledge between participants (Line 403). However, even if two participants share very similar ratings of association strength across trials, they could still interpret the meaning/knowledge underlying the associations very differently. For example, one participant may interpret the link between "man" and "car" as a man owns a car but another participant may interpret it as a man is hit by a car, although both associations could be rated as strong for this trial. This situation may be even more obvious for those pairs with weak association. Therefore, I am not confident this is a measure of similarity of semantic knowledge.

      Thank you very much for raising this point. Our experimenter carefully evaluated the links generated on each trial by each participant and found that the weaker the association, the less consistent the links that were formed. We therefore agree with the reviewer that even when two participants give similar ratings of association strength, they could still interpret the word pairs very differently, especially weakly associated pairs. Although the retrieved content or meaning may differ (e.g., a man owns a car vs. a man is hit by a car), both scenarios are coherent and involve no strong semantic conflict. We therefore argue that the semantic-brain alignment may reflect the similarity of neural states during retrieval rather than of general semantic content. We have now updated this point in the manuscript. Please see page 20: ‘A semantic similarity matrix, based on the correlation of participants’ ratings of associative strength across trials (reflecting the global similarity of neural states of retrieval between participants; left-hand panel of Figure 4A), was positively associated with neural pattern similarity in inferior frontal gyrus, posterior middle temporal gyrus, right anterior temporal lobe, bilateral lateral and medial parietal cortex, pre-supplementary motor area, and middle and superior frontal cortex (right-hand panel of Figure 4A).’
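      The semantic-brain alignment in point 4 compares two participant-by-participant similarity matrices: one from correlating participants' association-strength ratings across trials, and one from neural pattern similarity. A minimal representational-similarity sketch is given below. It assumes Pearson correlation at both levels (the manuscript's exact statistic and any region-wise details may differ), and the 20 participants, 100 trials, and simulated neural data are hypothetical.

```python
import numpy as np

def between_subject_similarity(features):
    """Participant x participant Pearson correlation matrix; `features` is (participants x trials)."""
    return np.corrcoef(features)

def alignment(rating_features, neural_features):
    """Correlate the off-diagonal entries of the rating and neural similarity matrices."""
    rating_sim = between_subject_similarity(rating_features)
    neural_sim = between_subject_similarity(neural_features)
    upper = np.triu_indices_from(rating_sim, k=1)   # each participant pair once, diagonal excluded
    return np.corrcoef(rating_sim[upper], neural_sim[upper])[0, 1]

rng = np.random.default_rng(1)
ratings = rng.integers(0, 5, size=(20, 100)).astype(float)  # hypothetical 0-4 ratings
neural = ratings + rng.standard_normal(ratings.shape)       # synthetic patterns partly tracking ratings
print(round(alignment(ratings, neural), 3))
```

      As the response notes, a high alignment of this kind shows that participants who rate trials similarly also show similar neural states; it does not by itself show that they retrieved the same semantic content.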

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Vides et al. performed a functional analysis of the Parkinson's disease-associated leucine-rich repeat kinase 2 (LRRK2). In particular, the authors sought to address how membrane recruitment of LRRK2 leads to an increase in its kinase activity. Briefly, the authors showed that LRRK2 utilizes two distinct binding sites for Rab GTPases (site #1, residues 350-550; site #2, residues 17/18) within its N-terminal Armadillo domain to achieve membrane association. Intriguingly, these two sites differ substantially in their preference for binding phosphorylated (Rab8a, Rab10) and non-phosphorylated (Rab8a, Rab10, Rab29, Rab32, Rab39) substrates. In cells, a LRRK2 site #2 mutant showed significantly reduced colocalization with phosphorylated Rab10. Using LRRK2 inhibitor washout experiments, the authors demonstrate that disrupting site #2 led to slower re-phosphorylation kinetics. Lastly, the authors employed an elegant in vitro system to demonstrate that LRRK2 membrane association and Rab phosphorylation are coupled in a feed-forward reaction. Overall, the work of Vides and colleagues provides compelling mechanistic insights into the spatial regulation of LRRK2.

      Nevertheless, a few critical points remain.

      Major points:

      1) Since LRRK2 is reported to form dimers and multimers, the authors should perform their colocalization studies (Figure 6) in cells lacking endogenous LRRK2.

      Co-localization with wild type LRRK2 is not seen with the mutant in question, so dimerization/oligomerization with endogenous protein appears not to be an issue for this construct.

      2) To what extent does modification of K17 and/or K18 (e.g., acetylation or ubiquitylation) play a role in regulating LRRK2 pRab binding?

      Phosphosite indicates LRRK2 ubiquitylation at K1118, K1129, K1833, K1963, K2091, with none in the ARM domain. We have not looked at either acetylation or ubiquitylation directly but now mention that this could regulate interaction with pRabs.

      3) In their lipid bilayer-based in vitro assay, the authors should also examine the effect of an LRRK2 variant that lacks site #1.

      We have included the opposite mutant, with a similar impact on the model: we show that lack of the pRab binding site at the N-terminus removes the cooperativity of the otherwise wild-type protein.

      Reviewer #2 (Public Review):

      Vides and colleagues describe a novel feed-forward mechanism of LRRK2-mediated phosphorylation of Rab8a and Rab10. The work underlines the importance of the N-terminal armadillo domain in the binding of different Rabs. They further characterized the Rab29 binding epitope, which is involved in the membrane targeting of LRRK2 mediated by Rab29 (site #1). Going beyond previous work, the authors demonstrate that one point mutation (K499E) is sufficient to abolish Rab29 binding. Furthermore, they show that this binding site also binds the substrate Rabs Rab8a and Rab10. In addition to this binding site (#1), the authors identified one additional site (site #2) particularly involved in the specific binding of Rab8a and Rab10 but not of Rab29 nor the non-LRRK2 substrate Rab7, providing an explanation for the LRRK2 substrate specificity observed in vivo. While the Rab29 binding site binds non-phosphorylated Rabs, the newly identified site around the N-terminal Lysine 18 shows increased binding to phosphorylated Rabs and provides support for a feed-forward mechanism in substrate phosphorylation.

      The authors provide a sound biochemical characterization of critical steps of LRRK2 activation, which is of broad interest to the field. Beyond scientific interest, a well-characterized activation mechanism might guide future drug development strategies.

      We thank the reviewer for noting that we should document the identity of the bound nucleotide. Rab8 and Rab10 are not the easiest Rabs to work with: it is much harder than for other Rabs to retain full nucleotide exchange capacity, and preparations show at best 50% active molecules in terms of their ability to exchange nucleotide. We maintain Mg-GTP throughout all purification steps and assays and use Q mutants in vitro to stabilize GTP binding. Even so, we have now monitored the nucleotide state of the purified Rabs by mass spectrometry and found that our routine preparations of Rab8A-Q and Rab10-Q each show a 50:50 ratio of bound GTP to GDP. We have noted this caveat in the text: our work will underestimate affinities, since GTP-bound forms likely predominate in these interactions.

      Major concerns:

      • The nucleotide states of the different Rabs (after nucleotide exchange), need to be experimentally confirmed, i.e. by HPLC.

      • It is not always clear, which Rab variants (i.e. WT or Q63L) have been used for a particular experiment (information provided in the main text vs material and methods). While irrelevant for in vitro experiments, for studies in cells it should be considered that the use of Rab Q63L constructs (Q60L in Ras), does not necessarily imply that the GAP catalyzed GTP hydrolysis is completely abolished. In contrast to Ras GAPs, some RAB GAPs can provide the water-coordinating glutamine residue, critical for hydrolysis (see: Müller and Goody, 2018; PMID: 28055292).

      All studies within cells were done with endogenous Rab GTPases (WT). We have also clarified the text throughout as to which Rab form is used.

      Reviewer #3 (Public Review):

      Vides et al. present new insights into the interactions between LRRK2 and Rab GTPases. They identified two distinct Rab-binding sites in the N-terminal Armadillo (ARM) domain of LRRK2, which they named Site #1 and Site #2. One of the main findings is the striking effect of Rab GTPase phosphorylation on LRRK2's recruitment to and activation on membranes; both unmodified and phosphorylated Rabs (pRab) bind to the N-terminus of LRRK2, but to different regions. Site #1, located closer to the C-terminus of the ARM domain, binds unmodified Rab8A, Rab10, and Rab29, with Rab29 showing the highest affinity. Site #2, located at the extreme N-terminus of LRRK2, binds to the modified pRab8A and pRab10. Combining structure prediction and conservation analysis they identified the potential interaction interfaces of Site #1 and Site #2, including two conserved lysine residues (K17 and K18) in Site #2 that are critical for pRab binding. The authors propose a model where initial membrane association is mediated by binding unphosphorylated Rab8A, 10, or 29 to the lower-affinity Site #1. Membrane-associated LRRK2 then phosphorylates one of its substrates, which can now engage the higher-affinity Site #2, starting a cascade of phosphorylation events (the feed-forward mechanism).

      Overall, the authors present clear and convincing data showing the interaction between LRRK2's N-terminal ARM domain and Rab/pRab, and supporting their feed-forward mechanism. The main shortcoming in the manuscript is the absence of data directly addressing two important features of their feed-forward model: (1) The proposal that the increased activity of LRRK2 upon recruitment to membranes is only the result of its increased local concentration (without any contributions from a potential Rab-dependent activation); and (2) The ability of LRRK2 to simultaneously bind Rab and pRab. Despite this shortcoming, this manuscript presents an important contribution to our understanding of LRRK2 function, providing an elegant model for LRRK2's recruitment to and activation on membranes. This paper will be of much interest to a broad readership.

      We have fully addressed the “shortcoming”: we now demonstrate that phosphoRab10 can bind LRRK2 Armadillo domain simultaneously with Rab8 and also that pRab8 can activate kinase activity on Rab10. We thank the reviewer for these terrific suggestions.

    1. Author Response

      Reviewer #2 (Public Review):

      This study evaluates the causal relationship between childhood obesity on the one hand, and childhood emotional and behavioral problems on the other. It applies Mendelian Randomization (MR), a family of methods in statistical genetics that uses genetic markers to break the symmetry between correlated traits, allowing inference of causation rather than mere correlation. The authors argue convincingly that previous studies of these traits, both those using non-genetic observational epidemiology methods and those using standard MR methods, may be confounded by demographic effects and familial effects. One possible example of this kind of confounding is the idea that obesity in parents may contribute to emotional and behavioral problems in children; another is the idea that adults with emotional and behavioral issues may be more likely to have children with partners who are obese, and vice-versa. They then make use of a recently proposed "within-family" MR method, which should effectively control for these confounders, at the cost of higher uncertainty in the estimated effect size, and therefore lower power to detect small effects. They report that none of the previously reported associations of childhood BMI with anxiety, depression, or ADHD are replicated using the within-family MR method, and that in the case of depression the primary association appears to be with maternal BMI rather than the child's own BMI.

      The argument that these confounders may affect these phenotypes is fairly sound, and within-family MR should indeed do a good job of controlling for them. I do not see any major issues with the cohort itself or the choice of genetic instruments. I also do not see any major issues with the definitions or ascertainment of the phenotypes studied, though I am not an expert on any of these phenotypes in particular. I am especially satisfied with the series of analyses demonstrating that the results are robust to many variations of MR methodology. Overall, I think the positive result this study reports is very credible: that the known association between childhood BMI and depression is likely primarily due to an effect of maternal BMI rather than the child's own BMI (though given that paternal BMI has a similar effect size with only a slightly wider confidence interval, I would instead say that the effect is from parental BMI generally, not specifically maternal.)

      In the updated results based on the larger genetic data release, the estimates for the association of maternal BMI and paternal BMI with the child’s depressive symptoms are more clearly different than they were in the smaller dataset (for maternal BMI, beta = 0.11, CI: 0.02, 0.19, p = 0.01; for paternal BMI, beta = 0.02, CI: -0.09, 0.12, p = 0.71). Therefore, in this version, it makes sense to note an association with maternal BMI specifically.

      The main weakness of the study comes from its negative results, which the authors emphasize as their primary conclusion: that previously reported associations of childhood BMI with anxiety, depression, and ADHD are not replicated using within-family MR methods. These claims do not seem justified by the evidence presented in this study. In fact, in every panel of figures 2 and 3, the error bars for the within-family MR analysis encompass the estimates for both the regression analysis and the traditional MR analysis, suggesting that the within-family analysis provides no evidence one way or another about which of these analyses is more accurate. More generally, in order to convincingly claim that there is no causal relationship between two traits, an MR study must argue that the study would be powered to detect a relationship if one existed. Within-family MR methods are known to have less power to detect associations and less precision to estimate effect sizes than traditional MR methods or traditional observational epidemiology methods, so it is not sufficient to show that these other methods have power to detect the association. To make this kind of claim, it is necessary to include some kind of power analysis, such as a simulation study or analytic power calculations, and likely also a positive control to show that this method does have power to detect known effects in this cohort.

      We agree that it is imperative that negative (i.e. “non-significant”) results are correctly interpreted: it is just as important to discover what is unlikely to affect emotional and behavioural outcomes as what does affect them. Negative results (non-significant estimates) are neither a weakness nor a strength of the study, but simply reflect the estimation error in our analysis of the data. The key question is whether our within-family MR estimates are sufficiently powered to detect effect sizes of interest or rule out clinically meaningful effect sizes, or whether they are simply too imprecise to draw any conclusions. As the reviewer suggests, one way to address this is via a post-hoc power calculation. We consider post-hoc power calculations redundant, since all the information about the power of our analysis is reflected in the standard errors and reported confidence intervals. Moreover, any post-hoc power calculation will be necessarily approximate compared to using the standard errors and confidence intervals which we report.

      Despite these methodological reservations, we have conducted simulations to estimate the power of our within-family models (the R code is included at the end of this document). These simulations indicate that we do have sufficient power to detect the size of effects seen for depressive symptoms and ADHD in models using the adult BMI PGS. They also indicate that we cannot rule out smaller effects for non-significant associations (e.g., for the impact of the child’s BMI on anxiety). Naturally, this is entirely consistent with the width of the confidence intervals reported in results tables and in Figures 1 and 2. However, although power calculations are important when planning a study, they make little contribution to interpretation once a study has been conducted and confidence intervals are available (e.g., https://psyarxiv.com/tcqrn/). For this reason, we comment on these simulations in this response to reviewers but do not include them in the manuscript or supplementary materials. At the same time, we have changed the language used in the manuscript to be clearer that the results were imprecise and that values contained within the confidence limits cannot be ruled out.

      For example, the discussion now includes the following:

      ‘However, within-family MR estimates using the childhood body size PGS are still consistent with small effects of the child’s BMI on all outcomes, with upper confidence limits around a 0.2 standard-deviation increase in the outcome per 5kg/m2 increase in BMI.’

      And the conclusion of the paper now reads:

      ‘Our results suggest that genetic variation associated with BMI in adulthood affects a child’s depressive and ADHD symptoms, but genetic variation associated with recalled childhood body size does not substantially affect these outcomes. There was little evidence that BMI affects anxiety. However, our estimates were imprecise, and these differences may be due to estimation error. There was little evidence that parental BMI affects a child’s ADHD or anxiety symptoms, but factors associated with maternal BMI may independently influence a child’s depressive symptoms. Genetic studies using unrelated individuals, or polygenic scores for adult BMI, may have overestimated the causal effects of a child’s own BMI.’

      Regarding a positive control: for analyses of BMI in adults, suitable positive controls would include directly measured biomarkers such as fat mass or blood pressure or reported medical outcomes like type 2 diabetes. In adolescents and younger adults, age at menarche or other measures of puberty can be used, as these are reliably influenced by BMI. However, the age of the participants for whom within-family effects are being estimated (8 years), together with the lack of any biomarkers such as fat mass (due to the questionnaire-based survey design), means that no suitable measures are available.
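      The response refers to simulations of within-family power; the authors' R code is not reproduced in this excerpt. As an independent illustration only, the Python sketch below shows the general shape of such a simulation. It assumes hypothetical trios with standardized polygenic scores (PGS), an unmeasured confounder shared by BMI and the outcome, a dynastic path from maternal PGS, and a just-identified two-stage least squares (2SLS) estimator that uses the child's PGS as the instrument while adjusting for both parents' PGS. The sample size, effect sizes (`gamma`, `dynastic`, `beta`), and number of replicates are placeholders, not values from the study.

```python
import numpy as np
from math import erfc, sqrt

def simulate_trios(n, beta, rng, gamma=0.3, dynastic=0.2):
    """Simulate hypothetical trios; all effect sizes are illustrative placeholders."""
    mother = rng.standard_normal(n)
    father = rng.standard_normal(n)
    # child PGS = parental mean plus segregation noise (unit variance overall)
    child = (mother + father) / 2 + sqrt(0.5) * rng.standard_normal(n)
    confounder = rng.standard_normal(n)
    bmi = gamma * child + dynastic * mother + confounder + rng.standard_normal(n)
    outcome = beta * bmi + dynastic * mother + confounder + rng.standard_normal(n)
    return child, mother, father, bmi, outcome

def within_family_2sls(child, mother, father, bmi, outcome):
    """2SLS estimate of BMI -> outcome: child PGS instruments BMI, parental PGS are covariates."""
    n = len(child)
    covariates = np.column_stack([np.ones(n), mother, father])
    z = np.column_stack([covariates, child])
    first, *_ = np.linalg.lstsq(z, bmi, rcond=None)          # first stage
    bmi_hat = z @ first
    x_hat = np.column_stack([bmi_hat, covariates])
    second, *_ = np.linalg.lstsq(x_hat, outcome, rcond=None)  # second stage
    # residuals use observed BMI, as the 2SLS variance formula requires
    resid = outcome - np.column_stack([bmi, covariates]) @ second
    sigma2 = resid @ resid / (n - x_hat.shape[1])
    se = sqrt(sigma2 * np.linalg.inv(x_hat.T @ x_hat)[0, 0])
    return second[0], se

def power(beta, n=2000, reps=200, alpha=0.05, seed=0):
    """Share of simulated datasets in which a two-sided Wald test rejects beta = 0."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        est, se = within_family_2sls(*simulate_trios(n, beta, rng))
        p_value = erfc(abs(est / se) / sqrt(2))   # two-sided normal p-value
        hits += p_value < alpha
    return hits / reps

print(power(0.0), power(0.5), power(1.0))
```

      Under these placeholder settings, the rejection rate at `beta = 0` should be close to the nominal 5% and should rise with the true effect; the exact values depend on the assumed parameters. As the response argues, the same information is already conveyed by the width of the reported confidence intervals.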

      Reviewer #3 (Public Review):

      Higher BMI in childhood is correlated with behavioral problems (e.g. depression and ADHD) and some studies have shown that this relationship may be causal using Mendelian Randomization (MR). However, traditional MR is susceptible to bias due to population stratification, assortative mating, and indirect effects (dynastic effects). To address this issue, Hughes et al. use within-family MR, which should be immune to the above-listed problems. They were unable to find a causal relationship between children's BMI and depression, anxiety, or ADHD. They do, however, report a causal effect of mother's BMI on depression in their children. They conclude that the causal effect of children's BMI on behavioral phenotypes such as depression and anxiety, if present, is very small, and may have been overestimated in previous studies. The analyses have been carried out carefully in a large sample and the paper is presented clearly. Overall, their assertions are justified but given that the conclusions mostly rest on an absence of an effect, I would like to see more discussion on statistical power.

      1) The authors show that the estimates of within-family MR are imprecise. It would be helpful to know how much power they have for estimating effect sizes reported previously given their sample size.

      As discussed in our response to a comment from Reviewer 2, the power of our analyses is already conveyed by our standard errors and confidence intervals, so we consider post-hoc power calculations redundant. Nevertheless, we conducted simulations to estimate the size of effects that we had 80% power to detect. The results, presented below, are consistent with our main results. For the reason given above, we include this information in the response to reviewers but not in the manuscript itself.
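      The point that standard errors already convey power can be made concrete with the usual normal-approximation formula. For a two-sided test at level alpha, the smallest true effect detectable with a given power is roughly (z_{1-alpha/2} + z_{power}) x SE, which is about 2.8 SE for alpha = 0.05 and 80% power. The sketch below assumes approximately normal estimates and a symmetric confidence interval. The function names and the example interval are hypothetical, and this is not the simulation procedure the authors ran.

```python
from statistics import NormalDist


def minimum_detectable_effect(se: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest true effect detectable with the given power by a two-sided z-test.

    Assumes the estimate is approximately normally distributed with standard error `se`.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    return (z_alpha + z_power) * se


def se_from_ci(lower: float, upper: float, alpha: float = 0.05) -> float:
    """Recover the standard error from a symmetric (1 - alpha) confidence interval."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return (upper - lower) / (2 * z_alpha)


# Hypothetical 95% confidence interval, not taken from the manuscript.
se = se_from_ci(-0.10, 0.30)
print(round(se, 4), round(minimum_detectable_effect(se), 4))
```

      Under these assumptions, a reported 95% CI fixes the detectable effect size directly: the effect is about 1.43 times the CI half-width (2.80 / 1.96). This is why a separate post-hoc power figure adds little beyond the interval itself.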

      2) They used the correlation between PGS and BMI to support the assertion that the former is a strong instrument. Were the reported correlations calculated across all individuals? Since we know that stratification, assortative mating, and indirect effects can inflate these correlations, perhaps a more unbiased estimate would be the proportion of children's BMI variance explained by their PGS conditioned on the parents' PGS. This should also be the estimate used in power calculations.

      The manuscript has been updated to quote Sanderson-Windmeijer conditional R² values: the proportion of BMI variance explained by the BMI PGS for each member of a trio, conditional on the PGS of the other members of the trio, and all genetic covariates included in within-family models. Similarly, we now show Sanderson-Windmeijer conditional F-statistics for a model including the child, mother, and father’s BMI instrumented by the child, mother, and father’s PGS.
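      The reviewer's suggested quantity, the share of the child's BMI variance explained by the child's PGS after conditioning on the parents' PGS, is a partial R². The following minimal sketch uses ordinary least squares on synthetic trio data. The data-generating coefficients are invented for illustration and are not estimates from this cohort. The sketch is also a simple partial R², not the Sanderson-Windmeijer statistic reported in the manuscript. It shows how parental (indirect) effects inflate the unconditional R² relative to the conditional one.

```python
import numpy as np


def residual_sum_of_squares(y, X):
    """RSS from an ordinary least squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return float(residuals @ residuals)


def partial_r2(y, focal, covariates):
    """Share of the covariate-adjusted variance in y explained by `focal` (intercept included)."""
    intercept = np.ones((len(y), 1))
    X_reduced = np.hstack([intercept, covariates])
    X_full = np.hstack([X_reduced, focal.reshape(-1, 1)])
    rss_reduced = residual_sum_of_squares(y, X_reduced)
    rss_full = residual_sum_of_squares(y, X_full)
    return (rss_reduced - rss_full) / rss_reduced


# Synthetic trio data (illustrative only): the child's score is the parental mean plus
# segregation noise, and the parents' scores also affect the child's outcome indirectly.
rng = np.random.default_rng(0)
n = 5000
mother = rng.standard_normal(n)
father = rng.standard_normal(n)
child = 0.5 * (mother + father) + np.sqrt(0.5) * rng.standard_normal(n)
bmi = 0.3 * child + 0.2 * (mother + father) + rng.standard_normal(n)

parents = np.column_stack([mother, father])
marginal = partial_r2(bmi, child, np.empty((n, 0)))   # no covariates: ordinary R^2
conditional = partial_r2(bmi, child, parents)          # conditional on parental scores
print(round(marginal, 3), round(conditional, 3))
```

      In this toy set-up, the unconditional R² is close to 0.19, but the conditional R² is only about 0.04. The gap is the reviewer's concern: an unconditional figure overstates how strong the instrument is within families.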

      3) In testing the association of mothers' and fathers' BMI with children's symptoms, the authors used a multivariable linear regression conditioning on the child's own BMI. Was the other parent's BMI (either by itself or using the polygenic score) included as a covariate in the multivariable and MR models? This was not entirely clear from the text or from Fig. 2. I suspect that if there were assortative mating on BMI in the parent's generation, the effect of any one parent's BMI on the child's symptoms might be inflated unless the other parent's BMI was included as a covariate (assuming both mother's and father's BMI affect the child's symptoms).

      Non-genetic models include both the mother and father’s phenotypic BMI as well as the child’s, allowing estimation of conditional effects of all three. This controls for assortative mating, as noted by the reviewer. This was not previously clear; all relevant text and figure captions have been updated to clarify this.

      4) They report no evidence of cross-trait assortative mating in the parents generation. The power to detect cross-trait assortative mating in the parents' generation using PGS would depend on the actual strength of assortative mating and the respective proportions of trait variance explained by PGS. Could the authors provide an estimate of the power for this test in their sample?

      We have updated the discussion of assortative mating (in both the results and the discussion section) to note possible limitations of power and to clarify that this approach to examining assortment may not capture its full extent.

      The relevant part of the results section now reads:

      “In the parents’ generation, phenotypes were associated within parental pairs, consistent with assortative mating on these traits (Appendix 1 – Table 5). Adjusted for ancestry and other genetic covariates, maternal and paternal BMI were positively associated (beta: 0.23, 95%CI: 0.22,0.25, p<0.001), as were maternal and paternal depressive symptoms (beta: 0.18, 95%CI: 0.16,0.20, p<0.001), and maternal and paternal ADHD symptoms (beta: 0.11, 95%CI: 0.09,0.13, p<0.001). Consistent with cross-trait assortative mating, there was an association of mother’s BMI with father’s ADHD symptoms (beta: 0.03, 95%CI: 0.02,0.05, p<0.001) and mother’s ADHD symptoms with father’s depressive symptoms (beta: 0.05, 95%CI: 0.05,0.06, p<0.001). Phenotypic associations can reflect the influence of one partner on another as well as selection into partnerships, but regression models of paternal polygenic scores on maternal polygenic scores also pointed to a degree of assortative mating. Adjusted for ancestry and genotyping covariates, there were small associations between parents’ BMI polygenic scores (beta: 0.01, 95%CI: 0.00,0.02, p=0.02 for the adult BMI PGS, and beta: 0.01, 95%CI: 0.00,0.02, p=0.008 for the childhood body size PGS), and of the mother’s childhood body size PGS with the father’s ADHD PGS (beta: 0.01, 95%CI: 0.00,0.02, p=0.03). We did not detect associations with pairs of other polygenic scores, which may be due to insufficient statistical power.”

      And the relevant part of the discussion section now reads:

      “We found some genomic evidence of assortative mating for BMI, and cross-trait assortative mating between BMI and ADHD, but not between other traits. However, associations between polygenic scores, which only capture some of the genetic variation associated with these phenotypes, may not capture the full extent of genetic assortment on these traits.”

      5) Are the actual phenotypes (BMI, depression or ADHD) correlated between the parents? If so, would this not suffice as evidence of cross-trait assortative mating? It is known that the genetic correlation between parents as a result of assortative mating is a function of the correlation in their phenotypes and the heritabilities underlying the two traits (e.g., see Yengo and Visscher 2018). An alternative way to estimate the genetic correlation between parents without using PGS (which is noisy and therefore underpowered) would be to use the phenotypic correlation and heritability estimated using GREML or LDSC. Perhaps this is outside the scope of the paper, but I would like to hear the authors' thoughts on this.

      Associations between maternal and paternal phenotypes are consistent with a degree of assortative mating (shown below). These results have been added to Appendix 1 - Table 5, which also shows associations between maternal and paternal polygenic scores, and the methods and results have been updated accordingly (see quoted text in response to the comment above). For comparability, both sets of results are based on regression models adjusting for the mother’s and father’s ancestry PCs and genotyping covariates. We agree that analysis of assortative mating using GREML or LDSC is beyond the scope of this paper. As noted above, we have updated the discussion to acknowledge the limitations of the approach taken:

      ‘We found some genomic evidence of assortative mating for BMI, and cross-trait assortative mating between BMI and ADHD, but not between other traits. However, associations between polygenic scores, which only capture some of the genetic variation associated with these phenotypes, may not capture the full extent of genetic assortment on these traits.’

      6) It would be helpful to include power calculations for the MR-Egger intercept estimates.

      As with our responses to the comments above, we consider post-hoc power calculations redundant, as all the information about the power of our analyses, including the MR-Egger analyses, is conveyed by the standard errors and confidence intervals. MR-Egger is less precise than other estimators, as the wide confidence intervals reported in the relevant tables make clear (Appendix 1 - Tables 8 and 9). Nevertheless, we have now updated the discussion to give more weight to this limitation. The discussion of pleiotropy in the final paragraph of the discussion now reads:

      ‘While robustness checks found little evidence of pleiotropy, these methods rely on assumptions. Moreover, MR-Egger is known to give imprecise estimates (Burgess and Thompson 2017), and confidence intervals from MR-Egger models were wide. Thus, pleiotropy cannot be ruled out.’

      Similarly, we have updated the relevant line of the results section, which now reads:

      ‘MR-Egger models found little evidence of horizontal pleiotropy, although MR-Egger estimates were imprecise (Appendix 1 - Tables 8 and 9).’
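      To see why MR-Egger estimates tend to be imprecise, it helps to recall the generic MR-Egger set-up. Variant-outcome associations are regressed on variant-exposure associations, with inverse-variance weights and an unconstrained intercept, after orienting each variant so that its exposure association is non-negative. The intercept estimates directional pleiotropy. The following is a minimal sketch of that textbook form, not the authors' exact specification. The function name is hypothetical, and the standard errors assume a residual variance of 1, whereas practical implementations usually rescale them by the estimated residual standard error.

```python
from math import sqrt


def mr_egger(beta_x, beta_y, se_y):
    """Inverse-variance weighted MR-Egger regression.

    Returns (intercept, slope, se_intercept, se_slope). Standard errors assume a
    residual variance of 1 (no over-dispersion correction).
    """
    # Orient each variant so that its association with the exposure is non-negative.
    oriented = [(-bx, -by) if bx < 0 else (bx, by) for bx, by in zip(beta_x, beta_y)]
    weights = [1.0 / s**2 for s in se_y]

    sw = sum(weights)
    swx = sum(w * bx for w, (bx, _) in zip(weights, oriented))
    swy = sum(w * by for w, (_, by) in zip(weights, oriented))
    swxx = sum(w * bx * bx for w, (bx, _) in zip(weights, oriented))
    swxy = sum(w * bx * by for w, (bx, by) in zip(weights, oriented))

    # Determinant of the weighted normal equations; it is small when the
    # exposure associations have little spread, which inflates both standard errors.
    det = sw * swxx - swx**2
    slope = (sw * swxy - swx * swy) / det
    intercept = (swxx * swy - swx * swxy) / det
    return intercept, slope, sqrt(swxx / det), sqrt(sw / det)


# Hypothetical associations lying exactly on intercept 0.02, slope 0.5 after orientation.
beta_x = [0.05, -0.10, 0.15, 0.20]
beta_y = [0.045, -0.07, 0.095, 0.12]
se_y = [0.01] * 4
print(mr_egger(beta_x, beta_y, se_y))
```

      With a fixed outcome precision, the intercept's standard error grows as the exposure associations cluster together. Since the intercept is an extrapolation to a zero exposure association, confidence intervals for it are typically wide.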

      7) Finally, what is the correlation between PGS and genetic PCs/geography in their sample? A correlation might provide evidence to support the point that classic MR effects are inflated due to stratification.

      Figures presenting the association of the child’s BMI polygenic scores with their PCs have been added to the supplementary information as Appendix 1 - Figure 2 and Appendix 1 - Figure 3. Consistent with an influence of residual stratification, a regression of the child’s BMI polygenic scores against their ancestry PCs (adjusting for genotyping centre and chip) found that 7 of the 20 PCs were associated at p<0.05 with the adult BMI PGS, and 8 of 20 with the childhood body size PGS (under the null hypothesis, we would expect one association in each case). When parental polygenic scores were added to the models, these associations attenuated towards the null.
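      The expectation of one association under the null follows from 20 tests x 0.05. The following sketch goes one step further and asks how surprising 7 or 8 hits would be if every PC were truly unassociated. It treats the 20 tests as independent Binomial(20, 0.05) trials, which is an approximation, and the helper name is hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_pcs, alpha = 20, 0.05
expected_hits = n_pcs * alpha                # expected number of p < 0.05 results under the null
p_adult = prob_at_least(7, n_pcs, alpha)     # 7 of 20 PCs, adult BMI PGS
p_child = prob_at_least(8, n_pcs, alpha)     # 8 of 20 PCs, childhood body size PGS
print(expected_hits, p_adult, p_child)
```

      Under this independence assumption, both counts have tail probabilities well below 1 in 10,000. This supports reading them as residual stratification rather than chance.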

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript shows that bone is resorbed during the early steps of limb regeneration in urodeles and that osteoclasts are required for this process. When resorption is impaired, integration of newly formed tissue with the original bone shaft is compromised. The manuscript further shows that the wound epithelium is required for bone resorption and suggests that it induces osteoclastogenesis or osteoclast migration. Furthermore, the authors showed that the formation of new skeletal elements is initiated while resorption of the old one is still actively ongoing.

      The study is well designed, conclusions are relatively well supported, and data are presented in a clear way. Two new models of transgenic axolotls have been created. The strongest and most important finding is that partial bone resorption is required for tissue reintegration. My main concern is the novelty of this study, which is quite limited in my opinion.

      Specifically, resorption of bone stump during limb regeneration has been shown before in various model organisms.

      The role of osteoclasts in this process has not been well characterized in urodeles but has been shown during the regeneration of a mouse digit.

      It is reasonable to anticipate that similarly, osteoclasts are resorbing bone in salamanders, especially since this is the only cell type known for bone resorption.

      Thus, this observation, despite being nicely and thoroughly done, is of limited interest.

      The role of the wound epithelium in bone histolysis is well demonstrated via skin flap experiments in this manuscript. However, no limb regeneration occurs after skin flap surgery, implying that the wound epithelium is a key tissue triggering all the processes of limb regeneration. Accordingly, the absence of bone histolysis under such conditions could be secondary to the absence of any other part of the regenerative process, e.g., blastema formation, macrophage M1 to M2 transition, reinnervation, etc. The proposed link between the wound epithelium and osteoclastogenesis (i.e., Sphk1, Ccl4, Mdka) is superficial and only suggestive.

      No functional evidence was provided to confirm these connections. Finally, the authors showed that new bone formation occurs while resorption of the bone stump is still ongoing. This is a nice observation, but again, rather indirect as it is based on the dynamics of bone resorption and bone formation in different animals. Due to high variability among animals, direct evidence, like double staining for osteoclasts and blastema markers would address this point more precisely.

      We consider that our work provides the first evidence that skeletal resorption in the early stages of regeneration has a durable impact by affecting tissue integration. We show that this process occurs within a short and conserved time window, which offers an opportunity for comparative research with other models and for interventional therapies. To our knowledge, limb regeneration is studied mainly in amphibians, as they are the only established laboratory model with this ability. Some lizards (geckos and possibly iguanas) have been reported to regrow an appendage, albeit without the regenerative fidelity of amphibians. Even in an established regeneration model such as the axolotl, regeneration-induced resorption has been little studied.

      During murine digit tip regeneration, osteoclasts are recruited to the amputation site and resorb the bone in a similar time frame to the one we show here in the axolotl. Ablating osteoclasts delays regeneration; however, no study has examined the impact on tissue integration. Additionally, a key difference between mouse digit and adult axolotl limb regeneration is that the new skeletal elements are built in fundamentally different ways: direct ossification (bone on top of bone) in the mouse, versus endochondral ossification (cartilage on top of osteo-cartilage elements) in the axolotl limb. Tissue integration in the latter may present different challenges that are worth exploring to understand how it is regulated. What this work adds is a characterization of the temporal and cellular dynamics of regeneration-induced resorption, of the interaction of osteoclasts with skeletal cells and, lastly, of its impact on tissue integration.

      Based on previous studies in mammals, it is reasonable to anticipate the presence and role of osteoclasts in salamanders. However, the growing body of work in the field, as well as our own work in the axolotl, have shown that extrapolations of mammalian skeletal biology to other species come with their risks.

      We agree that the role of the wound epithelium (WE) in skeletal histolysis will require further, extensive work. The evidence shown here provides a glimpse of the complex response of the WE and its crosstalk with the underlying tissue, and we hypothesize that this response is tailored to the tissue composition exposed by the injury.

      Finally, following the reviewer’s advice, we have conducted new experiments to prove the temporal connection between skeletal resorption and regeneration, showing that these processes occur simultaneously.

      Reviewer #3 (Public Review):

      This study outlines the role of osteoclast-mediated resorption in integrating the skeletal elements during limb regeneration, using axolotls that can regenerate the entire limb upon amputation. Using calcium-binding vital dyes (calcein and alizarin red), the authors first demonstrated that a large portion of amputated skeletal elements is resorbed prior to blastema formation. They further show that 1) inhibiting bone resorption by zoledronic acid impairs proper integration of the pre-existing and regenerating skeletal elements, 2) removing the wound epithelium using the full skin flap surgery inhibits bone resorption, and 3) bone resorption and blastema formation are correlated. The authors reached the major conclusion that bone resorption is essential for successful skeletal regeneration. Notably, this study applies a well-established and elegant axolotl limb regeneration model and transgenic reporter strains to reveal the potential roles of resorption in limb regeneration.

      Strengths:

      1. The authors utilized a well-established axolotl limb regeneration model and applied elegant vital mineral dyes and transgenic reporter lines for sequential in vivo imaging. The authors also provided quantitative assessment by examining multiple animals, particularly in the early sections, ensuring the rigor and the reproducibility of the study.

      2. The authors further performed important interventions that can impinge upon successful limb regeneration, including inhibition of bone resorption by zoledronic acid and impairment of the wound epithelium by full skin flap surgery. These procedures gave rise to useful insights into the relationship between bone resorption and successful limb regeneration.

      3. The imaging presented in this manuscript is of exceptionally high quality.

      Weaknesses:

      1. Despite the high quality of the work, many analyses in this study are incomplete and therefore insufficient to support the major conclusion. For example, in Figure 4, the authors did not provide any quantitative assessment to show how zol affects the integration of the skeletal elements (angulation?), which seems to be essential for supporting the conclusion. Likewise, in Figure 7, the analyses of EdU+ cells and Sox9 reporter expression were not included for zol-treated animals. Similarly, in Figure 5, quantification of osteoclasts was not performed for the full skin flap surgery group. Analyses of only normally regenerated animals are not sufficient to support many of the conclusions.

      2. The phenotype of zol-treated animals in limb regeneration is somewhat disappointing. Although zol-treated animals show decreased blastema formation and unresorbed pre-existing skeletal elements, limb regeneration still occurs and the only phenotype is a relatively minor defect in skeletal integration. It is possible that zol-induced defect in blastema formation is not directly linked to the failure of integration at a later stage. I find this “weakness” a bit subjective.

      3. As an integration failure of the newly formed skeleton still occurs in untreated animals, it is not entirely clear how the authors can attribute this defect to a lack of bone resorption. More quantitative analyses would be necessary to demonstrate the correlation between zol treatment and lack of integration.

      Taking into consideration the reviewer’s concerns, we have improved our analysis of integration phenotype. The assessment of integration success was carried out using a score matrix and with it, we correlated the extent of resorption with integration efficiency more accurately. We believe our results provide sufficient evidence to support this correlation.

      When we first saw the phenotype of zol-treated animals, we were far from disappointed; we were intrigued that we could observe a significant failure in tissue integration after removing osteoclast function in an early phase of regeneration. All-or-nothing results are exciting, but subtle results can prove more informative, and we think this is the case here. Our treatment does not inhibit regeneration but disrupts tissue integration, opening another fascinating question in regeneration: how is old tissue able to functionally integrate newly formed tissue?

      The integration phenotypes observed in the unresorbed limbs do not resemble anything reported in the field so far. Moreover, the range of phenotypes observed allowed us to better determine their correlation with resorption. Importantly, the presence of integration failures in untreated animals allowed us to examine ECM organization at this old-new tissue interface, while highlighting the normal occurrence of imperfect regeneration in the axolotl limb.

      Finally, we have included new results to complement the conclusions presented at the end of our work. Although we observed differences in blastema size in zol-treated animals, we did not observe a difference in the number of EdU+ cells, indicating that the skeleton cannot be used as a reference for assessing blastema location. This conclusion is complemented by our in vivo assays, in which we observed cartilage condensation while resorption was still occurring. We consider our conclusions to be justified and supported by the assays presented in our work.

    1. Author Response

      Reviewer #1 (Public Review):

      Khan et al. describe how two important transcription factors functionally cooperate to activate a few of the CRP-dependent genes in Mycobacterium tuberculosis. CRP is a global regulator in eubacteria needed to activate a number of genes, while PhoP is an acid stress response regulator required for expression of a specific set of genes. The authors delineate the interaction between these two key regulators of the bacterial pathogen and show that at a subset of CRP-dependent promoters, PhoP binding recruits CRP to activate transcription.

      The experiments are well designed and executed, and the manuscript is presented coherently. While the data are well organized and presented with clean phosphorimages and blots that are easy to understand, the interpretation could have been more robust (see comments below).

      We thank the reviewer for these extremely encouraging comments. We have now included substantial changes throughout the ‘Results’ section to improve interpretation of the results (please see below our responses).

      Obviously, the strength of the paper is the description of hitherto unknown stress-specific cooperation between two well-studied transcription factors with most evidence supporting the claims. In E. coli (and in other bacteria) studies CRP mediated control of genes have led to the identification of different classes of CRP-dependent promoters with their own specific regulators. Such a description was lacking in M. tuberculosis and the PhoP - CRP collaboration described is likely to have implications on pathogenesis. The weakness (or possibly what remains to be explored) is that the precise mechanism of the cooperative transcription regulation is yet to be understood.

      We agree with the reviewer’s comment that the precise mechanism of cooperative transcription regulation is yet to be fully understood. While we briefly mention this as future work in the concluding part of the ‘Discussion’ section, we have now included a new paragraph describing a schematic model that summarizes a possible mechanism of cooperative transcription regulation.

      From the data presented, it is apparent that PhoP binds efficiently to the whiB1 upstream promoter region on its own. It is also evident that CRP is recruited to its site as a result of PhoP binding. This is reminiscent of the bacteriophage lambda paradigm of positive cooperativity. Thus, it is not reciprocal synergy (as stated in the paper in one place) but PhoP-mediated recruitment, as claimed elsewhere. Indeed, PhoP null mutants nicely support the latter interpretation.

      The reviewer raises an important and interesting point about positive cooperativity resembling the bacteriophage lambda paradigm. We agree, and we have now modified the text of the ‘Results’ section to clarify this matter.

      A discussion of why and how CRP binds on its own at other CRP-dependent promoters would help readers appreciate the need for PhoP sites next to CRP sites for cooperative interaction at this subset of promoters. As in E. coli, CRP sites could be located at varying distances from the promoter.

      Again, this is an interesting point. We thank the reviewer for bringing this point to our attention. As recommended by the reviewer, we have now included the following text in the ‘Discussion’ section of the revised manuscript.

      “Notably, the subset of genes that undergo differential expression in Δcrp-H37Rv conforms to a pattern largely resembling the canonical CRP regulon of E. coli, with CRP binding sites either proximal to transcription start sites, leading to repression, or distal to transcription start sites, leading to promoter activation (Kahramanoglou et al., 2014). It is noteworthy that CRP has been suggested to function as a general chromosomal organizer (Grainger et al., 2005). Strikingly, in this study we find that PhoP binding sites lie next to CRP binding sites located only distally upstream of promoters, and are therefore associated with activation. We propose that for these co-regulated promoters, additional stability of the transcription initiation complex is derived from protein-protein interaction between CRP and PhoP. These two interacting proteins remain bound to their cognate sites away from the start site and contribute to the stability of the transcription initiation complex, enabling mycobacterial RNA polymerase (RNAP) to bind and transcribe the genes. A schematic model is shown in Fig. 6C. Together, these molecular events mitigate stress by controlling expression of numerous genes and perhaps contribute to better survival of the bacilli in cellular and animal models.”

      Reviewer #2 (Public Review):

      In this manuscript by Khan et al., the authors set out to characterize how the cAMP receptor protein, CRP, and PhoP function to coregulate a subset of virulence genes in Mycobacterium tuberculosis. To this end, the authors use a wide variety of molecular techniques to monitor gene regulation, DNA-binding activity, and protein-protein interactions between phosphorylated PhoP and CRP. The authors conclude that phosphorylated PhoP functions to recruit CRP to promoter regions, where together the two regulators function synergistically to control gene expression. In general, the conclusions of the manuscript appear to be justified by the data, however, the text is difficult to follow. The current version of the paper is likely of interest to scientists within the field of mycobacterial signal transduction.

      The major strength of the paper is that the authors test their hypothesis using a variety of complementary approaches. The authors demonstrate a genetic interaction between CRP and PhoP in vivo and reconstitute the phenomenon in vitro, providing compelling evidence that the coregulation by these well-studied regulators does take place. The major weakness is that the logic of the manuscript is difficult to follow as a reader, at times making an evaluation of results and interpretations difficult. The majority of the experimentation involves the whiB1 promoter while conclusions are extrapolated broadly.

      We would like to thank the reviewer for her/his constructive comments and suggestions. In the revised manuscript, we have included numerous changes throughout the ‘Results’ and ‘Discussion’ sections to improve the logic of the manuscript and the interpretation of the results (please see our responses below). We have also included the experiments requested by the reviewers and provided additional data and explanations that address their concerns.

    1. Author Response

      Reviewer #3 (Public Review):

      1) Information is missing about the regions of interest in which calcium responses were measured. Judging from Fig. 1E, calcium signals were measured in the somata, and this should be specified. Also judging from this figure, calcium signals seem to be largely confined to the somata and virtually absent from dendritic arbors. Fig. 6a shows very faint signals in the dendrites, yet those signals seem to have been measured rather far from the point of force application (a scale bar is shown but undefined) and, for some unknown reason, not between the soma and the force application point. Should there be detectable calcium signals in the dendrites, the respective image gains should be adjusted so that those signals can be appreciated by the reader. If there are no clear signals in the dendrites, this would affect interpretations concerning e.g. Ca-α1D.

      Calcium responses can be observed in the soma and dendrites, which was presented in the original manuscript (Figure 6). Inspired by the 2nd suggestion from this reviewer, we went through our data and refined our measurement of the dendritic signal in the revised manuscript (see revised Figure 6). In addition, we also showed that the dendritic response was dependent on Ca-α1D (see revised Figure 6 and Figure 6-figure supplement 1). Finally, in the revised manuscript, we made it clear that all F/F0 were measured from the soma unless otherwise stated (see Figure 2, legend).

      2) Along this line, analyzing the spatial distribution of dendritic calcium responses to the pokes would also provide a much more detailed picture of how the dendritic tree responds to the various pokes. The beauty of the imaging approach chosen here is that it provides such information. Rather than ignoring this possibility, it should be exploited in this study, especially as such data might provide much deeper insights into the relation between the mechanosensory function of the cell and its dendritic tree (and bolster the modelling results in Fig. 4 experimentally).

      In the original manuscript, we included the data on the dendritic calcium signal and showed that the dendritic signal was reduced when the activity of VGCCs was inhibited or in the Ca-α1D knockdown mutant (see Fig. 6 A-B in the original manuscript). Inspired by the suggestion from the reviewers, we took a closer look at our data and performed additional experiments. In the revised Figure 6 A-B, we show that the mechanical stimuli could evoke calcium responses not only in the soma but also in the homolateral (i.e. between the soma and the force probe) and contralateral (i.e. on the opposite side of the force probe) dendrites, suggesting that the dendritic signals propagate within the dendritic arbors. Moreover, in the revised Figure 6 A-B and Figure 6-figure supplement 1, we show that these dendritic signals were reduced in the Ca-α1D mutant strains or when the fillet preparation was treated with nimodipine, demonstrating a clear dependence on the activity of VGCCs. However, because our imaging speed is not fast enough to capture the flow of calcium signals through the dendrites, the dynamics of signal propagation remain undefined. This would be an interesting issue to study in the future. Along with the revised Figure 6, we have also revised the text and legends accordingly.

      3) When showing response functions as in e.g. Figs. 2C, G, H, 3D, 5C-E, etc., the y-axis should have a logarithmic scaling; receptor potentials of receptor cells usually scale proportionally to the logarithm of the stimulus amplitude. Only then, the reader will be able to fully appreciate the sensitivity differences. This will also alter interpretation of response function slopes.

      We thank the reviewer for the suggestion. However, the stimulation force is actually a distal stimulus for the cell, while the proximal stimuli (e.g. local deformation) are difficult to measure or estimate. We are therefore not sure that the cellular responses necessarily scale with the logarithm of the macroscopic force (i.e. the distal stimulus). Moreover, inspection of the data showed that the response was proportional to the force; for conciseness, we therefore fitted the plot with a linear function.

      4) The knockdown and mutant data is interesting, yet important controls are missing. For the RNAi lines used, qPCR data on the knockdown-efficiency should be added. For the channel mutations, available genetic rescue lines should be used as controls. Data on protein localization is presented for the mechanosensitive channels, but not for voltage-gated calcium channel subunit. Should antibodies be available, respective stainings should be included. If not, the authors should at least check whether Ca-α1D is expressed in the cell using e.g. Mi{ET1}Ca-α1D[MB06807] that is available at Bloomington.

      First, we did not use an RNAi line for Piezo. The PiezoKO line is a genomic mutant strain.

      Second, for Ca-α1D, because there are only a small number of c4da neurons in each animal and Ca-α1D is quite broadly expressed in various types of neurons (see our revised Figure 6-figure supplement 2), we expected that a reduction in the expression level of Ca-α1D in c4da neurons would be very difficult to detect. Therefore, we knocked down the expression of Ca-α1D in the whole animal using the same uas-Ca-α1Di strain and the tub-gal4 strain. Using RT-PCR, we showed that the expression level of Ca-α1D was significantly reduced (revised Figure 6-figure supplement 2). The same RNAi strain has also been used in other functional studies.

      5) The statistics used are not entirely convincing. T-tests are used throughout, though I do not feel that all the data are normally distributed. Moreover, some figures include multiple comparisons, apparently without statistical correction. The data should be re-analyzed using appropriate statistical procedures.

      We thank the reviewer for this suggestion. We have now used the Mann-Whitney U test or the Kruskal-Wallis test for all data that were not shown to follow a normal distribution. For multiple comparisons, we used one-way ANOVA. We have now included the relevant information in the revised figure legends.
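      For readers less familiar with the rank-based test mentioned above, the sketch below computes the Mann-Whitney U statistic, assigning tied values the average of their ranks, and a two-sided p-value from the normal approximation without tie or continuity correction. It is an illustrative sketch with hypothetical function names. In practice a library routine such as scipy.stats.mannwhitneyu, which also provides exact p-values for small samples, would be preferable.

```python
import math

def midranks(values):
    """Assign 1-based ranks, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p) using the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

      For example, mann_whitney_u([1, 2, 3], [4, 5, 6]) returns U = 0 and p ≈ 0.05, because every value in the first group ranks below every value in the second. With only three values per group, the normal approximation is crude, which is one reason to prefer exact tests for small samples.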

    1. Author Response

      Reviewer #3 (Public Review):

      1) Validation of reagents: The authors generated a pY1230 Afadin antibody, claiming (page 6) that "this new antibody is specific to tyrosine phosphorylated Afadin, and that pY1230 is targeted for dephosphorylation by PTPRK, in a D2-domain dependent manner". The WB in Fig 1B shows a lot of background; two main bands are visible, both of which diminish in intensity in ICT WT pervanadate-treated MCF10A cell lysates. The claim that the developed peptide antibody is selective for pY1230 in Afadin would need to be substantiated, for instance by pull-down studies analysed by pY-MS. However, for the current study it would be sufficient to demonstrate that pY1230 is indeed the dephosphorylated site. I therefore suggest including a site-directed mutant (Y1230F) that would confirm dephosphorylation at this site and the ability of the antibody to recognize the phosphorylation state at this position.

      We would like this antibody to be a useful and freely accessible tool in the field and have taken on board the request for additional validation. To this end, we have significantly expanded Supplementary Figure 2 (now Figure 1 - figure supplement 2) and included a dedicated section of the results, as follows:

      1. We have now included information about all of the Afadin antibodies used in this study, since Afadin(BD) appears to be sensitive to phosphorylation (Figure 1 - figure supplement 2A).

      2. We have demonstrated that the Afadin pY1230 antibody detects an upregulated band in PTPRK KO MCF10A cells, consistent with our previous tyrosine phosphoproteomics (Figure 1 - figure supplement 2B). This indicates that the antibody can be used to detect endogenous Afadin phosphorylation.

      3. We have included two new knockdown experiments demonstrating the recognition of Afadin by our antibody (Figure 1 - figure supplement 2C). Both the BD and pY1230 antibodies appear to recognise two Afadin isoforms in HEK293T cells, consistent with previous reports (Umeda et al. MBoC, 2015). We have highlighted these in the figure.

      4. We have performed mutagenesis to demonstrate the specificity of the antibody. We tagged Afadin with a fluorescent protein, reasoning that the tag would cause a shift in molecular weight that could be resolved by SDS-PAGE, as is indeed the case. We noted that the phosphopeptide used spans an additional tyrosine, Y1226, which has been detected as phosphorylated on PhosphoSitePlus (although to a much lower extent than Y1230). The data clearly show that Afadin cannot be phosphorylated when Y1230 is mutated to phenylalanine (compared with the CIP control), indicating that this is the predominant site recognised by the antibody. In addition, the endogenous pervanadate-stimulated signal is completely abolished by CIP treatment (Figure 1 - figure supplement 2D).

      5. We have included densitometric quantification of the dephosphorylation assay shown in Figure 1B, which was part of a time course and shows preferential dephosphorylation by the PTPRK ICD compared with PTPRK D1. The signal stops declining over time, which could indicate antibody background or an inaccessible pool of Afadin-pY1230 (Figure 1 - figure supplement 2E).

      6. To further demonstrate that this site is modulated by PTPRK in post-confluent cells, we have used the doxycycline (dox)-inducible cell lines generated in Fearnley et al., 2019. Upon treatment with 500 ng/ml dox for 48 hours, PTPRK is induced to lower levels than wild-type; however, quantification of Afadin pY1230 normalised to the Afadin (CST) signal clearly indicates downregulation by PTPRK WT, but not by the catalytically inactive mutant (Figure 1 - figure supplement 2F and 2G).

      Together, these data strengthen our assertion that this antibody recognises endogenously phosphorylated Afadin at Y1230, which is modulated in vitro and in cells by PTPRK phosphatase activity. For clarity, we have highlighted and annotated the relevant bands in the figures. We have also indicated which total Afadin antibody was used in each experiment.

      2) The authors claim that a short, 63-residue predicted coiled-coil (CC) region is both necessary and sufficient for binding to the PTPRK-ICD. The region is predicted to have alpha-helical structure, and as a consequence a helical structure has been used in the docking model. Considering that the authors recombinantly expressed this region in bacteria, it would be experimentally simple to confirm the alpha-helical structure of the segment by CD or NMR spectroscopy.

      To clarify, the helical structure in the docking model was independently predicted by several sequence and structural analysis programmes, including AlphaFold2, RoseTTAFold and NetSurfP, and is annotated in UniProt (as a coiled coil). We did not stipulate prior to the AF2 prediction that it was helical. Isolated short peptides frequently adopt helical structure; therefore, prediction of a helix within the context of the full Afadin sequence is, in our opinion, stronger evidence than CD of an isolated fragment.

      3) Only two mutants have been introduced into PTPRK-ICD to map the Afadin interaction site. One of the mutations changes a possibly structurally important residue (glycine) into a histidine. Even though this residue is present in PTPRM, this does not exclude that the D2 domain no longer folds functionally. The second mutation also represents a large change in chemical properties, and the other two predicted residues have not been investigated.

      The residues that were selected for mutation are all localised to the protein surface and therefore are unlikely to be involved in stable folding of PTPRK. In support of the correct folding of the mutated PTPRK, we include in Figure 1 below SEC elution traces for wild-type and mutant D2 showing that they elute as single symmetric peaks at the same elution volume as the WT protein. This is consistent with them having a similar shape and size, and not being aggregated or unfolded.

      Figure 1. PTPRK-D2 wild-type and mutant preparative SEC elution profiles. A280nm has been normalised to help illustrate that the different proteins elute at the same volume. The main peak from these samples was used for binding assays in the main paper.

      Furthermore, the yield for the double mutant was very high (4 mg of pure protein from a 2 L culture, see A280 value in graph below), whereas poorly folded proteins tend to have significantly reduced yields. This protein was also very stable over time whereas unfolded proteins tend to degrade during or following purification.

      Figure 2. Analytical SEC elution profile for the PTPRK-D2 DM construct showing the very high yield consistent with a well-folded, stable protein.

      Finally, we have carried out thermal melt curves of the WT and mutant PTPRK D2 domains showing that they all possess melting temperatures between 39.3°C and 41.7°C, supporting that they are all equivalently folded. We include these data as an additional Supplementary Figure (Figure 4 - figure supplement 3) in the paper.

      4) The interface on the Afadin substrate has not been investigated apart from deleting the entire CC or a central charge cluster. Based on the docking model the authors must have identified key positions of this interaction that could be mutated to confirm the proposed interaction site.

      We have now made and tested several additional mutations within both the Afadin-CC and PTPRK-D2 domains to further validate the AF2 predicted model of the complex.

      For Afadin-CC we introduced several single and double mutations along the helix including residues predicted to be in the interface and residues distal from the interface. These mutations and the pulldown with PTPRK are described in the text and are included as additional panels to a modified Figure 3. All mutations have the expected effect on the interaction based on the predicted complex structure. To help illustrate the positions of these mutations we have also included a figure of the interface with the residues highlighted.

      For PTPRK-D2, we have also introduced two new mutations: one buried in the interface (F1225A) and one at the edge of the interface, encompassing a loop that differs in PTPRM (labelled the M-loop). GST-Afadin WT protein was bound to GSH beads and tested for its ability to pull down WT and mutated PTPRK. These new mutations (illustrated in the new Figure 4 – figure supplement 2) further support the model prediction: F1225A almost completely abolishes binding, as predicted, while the M-loop mutant retains binding. These mutations and their effects are now described in the main text, and the pull-down data, including controls and retesting of the original DM mutant, are included as panel H in a newly modified Figure 4 focused solely on the PTPRK interface.

      5) A minor point is that the ITC experiments have not been run long enough to determine the baseline of interaction heats. In addition, as large and polar proteins were used in this experiment, a blank titration would be required to rule out that dilution heats affect the determined affinities.

      All control experiments including buffer into buffer, Afadin into buffer and buffer into PTPRK were carried out at the same time as the main binding experiment and are shown below overlaid with the binding curve. These demonstrate the very small dilution heats consistent with excellent buffer matching of the samples.

      We were able to obtain excellent fits to the titration curves by fitting 1:1 binding with a calculated linear baseline (see Figure 2B,D). Very similar results were obtained by fitting to the sum (‘composite’) of fitted linear baselines obtained for the three control experiments for each titration.
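      A 1:1 binding model with a linear baseline can be sketched as follows. This is a simplified illustration under our own assumptions: the complex concentration comes from the exact quadratic solution for P + L ⇌ PL, the heat of injection i is taken as ΔH × V0 × the increase in complex concentration, and a baseline a + b·i is added per injection. It ignores the cell-volume displacement and dilution corrections that instrument software applies, and the function names are hypothetical.

```python
import math

def complex_conc(p_tot, l_tot, kd):
    """Exact [PL] for 1:1 binding P + L <-> PL; all concentrations and kd in the same units."""
    s = p_tot + l_tot + kd
    return (s - math.sqrt(s * s - 4.0 * p_tot * l_tot)) / 2.0

def injection_heats(p_tot, l_totals, kd, dh, v0, base_a, base_b):
    """Predicted heat per injection: dh * v0 * increase in [PL], plus a linear baseline a + b*i.

    l_totals lists the total ligand concentration in the cell after each injection.
    Simplified: no correction for displaced volume or dilution of the macromolecule.
    """
    heats = []
    prev = 0.0
    for i, l_tot in enumerate(l_totals, start=1):
        pl = complex_conc(p_tot, l_tot, kd)
        heats.append(dh * v0 * (pl - prev) + base_a + base_b * i)
        prev = pl
    return heats
```

      A fit would then minimise the squared differences between these predicted heats and the measured, integrated injection peaks over kd, dh, base_a and base_b, for example with a general-purpose least-squares routine.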

    1. Author Response

      Public Evaluation Summary:

      This work presents a series of enhancements to the PhIP-seq method of autoantibody discovery, with the goal of improving scaling to larger cohorts and increasing disease specificity. The strength of the paper is the validation of the high throughput format, although results from screening patient samples confirm or only modestly extend previous data.

      We thank the reviewers for their feedback and agree that the validation of our high throughput, easily accessible approach is a strength of this work. We appreciate that the reviewers expressed uncertainty about whether there were sufficient advances to qualify this paper as a Research Advance. In addition to a point-by-point rebuttal, we quantify and enumerate the advances, improvements, and novel findings disclosed in this manuscript, relative to our original eLife paper.

      1. Demonstration of the importance of adequate healthy control cohorts in PhIP-Seq design. Using scaled protocols, we demonstrate the importance of using large control cohorts to filter out non-specific hits, as well as to detect rare but specific disease-associated antigens such as PDYN. To our knowledge, we are the first to demonstrate and discuss the consequences of PhIP-Seq dataset interpretation in the absence of sufficient controls. These findings are especially important in light of recent, high-impact papers using few to no controls (Mina et al. Science 2019, Gruber et al. Cell 2020, among others) to make conclusions about novel autoantibodies in the context of specific diseases.
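      The arithmetic behind the control-cohort argument can be illustrated with a short sketch. Assuming, purely for illustration, that a non-specific peptide is enriched in a fraction f of healthy donors and that control samples are independent, the chance that such a peptide is enriched in none of n controls is (1 - f)^n. The filter below then keeps only peptides that are frequent in cases and rare in controls; the function names, the cohort-level hit counts and the thresholds are hypothetical choices of ours, not the pipeline used in this work.

```python
def miss_probability(background_rate, n_controls):
    """Chance that a peptide enriched in a fraction `background_rate` of healthy donors
    is enriched in none of `n_controls` independent control samples."""
    return (1.0 - background_rate) ** n_controls

def specific_candidates(case_hits, control_hits, n_cases, n_controls,
                        min_case_frac=0.05, max_control_frac=0.0):
    """case_hits / control_hits map peptide -> number of positive samples in each cohort.

    Keep peptides positive in at least min_case_frac of cases and in at most
    max_control_frac of controls (illustrative thresholds).
    """
    keep = []
    for peptide, k in case_hits.items():
        ctrl_frac = control_hits.get(peptide, 0) / n_controls
        if k / n_cases >= min_case_frac and ctrl_frac <= max_control_frac:
            keep.append(peptide)
    return sorted(keep)
```

      With f = 0.05, such a peptide escapes detection in 4 controls about 81% of the time (0.95^4 ≈ 0.81), but essentially never in 400 controls (0.95^400 ≈ 1e-9), so a small control cohort lets many non-specific hits pass the filter.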

      2. Design, validation and documentation of accessible, benchtop protocols for scaled PhIP-Seq. These protocols enable parallel testing of 600-800 samples without contamination or batch effects. Using a substantially expanded, multi-cohort set of patients with APS1, we validate the quality of the protocol and apply this protocol to numerous other disease contexts. Importantly, our protocols are documented (protocols.io) with each step tested for optimal quality, and are easily accessible without the need for robotics or specialized equipment.

      3. Machine Learning for disease classification using phage-based immunoprofiling. We show that large, well-controlled PhIP-Seq datasets lend themselves well to machine learning approaches and enable unsupervised classification of disease status. To our knowledge, this is the first successful application of an unsupervised machine learning approach to phage-based immunoprofiling data. We demonstrate that PhIP-Seq data enable APS1 disease classification in 97% of cases (compared with the 95% sensitivity of current testing for anti-IFN antibodies in the setting of suspected APS1). This finding, while applied to only one large cohort, demonstrates that PhIP-Seq data, when appropriately controlled, can have substantial value beyond serving as a single-antigen discovery platform. The combination of machine learning and phage-based immunoprofiling will likely have extensive applications beyond APS1, including the discovery of novel diagnostic tests and biomarkers.

      4. Novel IPEX antigen BTNL8. We discovered and validated anti-BTNL8 antibodies in 42% of IPEX patients, suggesting that this may be a major autoantigen in IPEX. BTNL8 is a cell surface-expressed protein in intestinal gamma-delta T-cells, raising the novel question of a possible role for autoantibodies in directly regulating gut epithelial immune homeostasis (see discussion, lines 540-551). This is the first report, not only of BTNL8, but of any antigen discovery by PhIP-seq immunoprofiling in IPEX patients. Given the importance of this discovery, we sought to validate the presence of these autoantibodies in an additional validation cohort. We were successful, and present these findings in the new Figure 5, highlighting the generalizability of our findings to IPEX patients.

      5. BEST4 autoantibodies in IPEX and RAG-hypomorphic patients. We discovered anti-BEST4 antibodies in 15% of patients with IPEX, as well as in 2 patients with RAG1/2 mutations, demonstrating a connection between the intestinal autoimmunity seen in both IPEX and RAG1/2 deficiency. Of note, one of the 2 RAG1/2-deficient patients with anti-BEST4 antibodies is known to have very-early-onset IBD (VEO-IBD), a rare sub-phenotype in RAG hypomorphs (and other primary immune deficiencies). Given the severity of VEO-IBD and how little is known about why certain patients with immune dysregulation develop this phenotype, these findings mark an important scientific advance and provide an essential clue to its etiology. Furthermore, given that IPEX is driven by dysfunctional Treg cells, the commonality of these findings in both IPEX and hypomorphic RAG indicates a potential role for Treg dysfunction in hypomorphic RAG.

      6. Expansion of scaled PhIP-Seq to interrogate severe COVID-19 pneumonia, Kawasaki disease (KD), and Multisystem Inflammatory Syndrome in Children (MIS-C). Importantly, in MIS-C we find no evidence for any of the previously reported autoantigens described in Gruber et al. (Cell, 2020), a study that drew strong conclusions about autoantibodies despite featuring only 4 PhIP-Seq control samples. Our results highlight the importance of scaling and appropriate control groups, and caution against overinterpretation of reported disease-specific autoantigens in PhIP-Seq (or other expanded antigen screening technologies such as near-proteome-wide fixed protein arrays) studies that use smaller control cohorts, often without orthogonal validation experiments.

      7. Anti-CGNL1 antibodies in KD/MIS-C. We discovered and validated autoantibodies to CGNL1 in KD and MIS-C. It is possible that these antibodies represent a subset of specificities within anti-endothelial cell antibodies, given the endothelial expression of CGNL1 as well as its implications in cardiovascular disease.

      Reviewer #2 (Public Review):

      The authors update PhIP-seq into a high throughput format with the goal to accommodate screening of large numbers of human patient sera for the presence of novel autoantibodies and screening of more control sera to better determine standards for positivity of experimental samples. The high throughput protocol is detailed in an associated web-based format and validated in the paper using sera from patients with inherited immunodeficiencies and patients with MIS-C, Kawasaki syndrome, and COVID19. These are strengths of the work, and the high throughput PhIP-seq format will be useful to other investigators doing similar screenings. Yet, the findings do not significantly extend our knowledge of the range of autoantibodies in these illnesses, and many of the autoantibodies detected using PhIP-seq linear epitopes are not validated with other strategies, limiting significance of the results. The data from MIS-C and Kawasaki cohorts are confounded by an undetermined number of IVIG treated subjects, and limited numbers of control samples, including sera from patients with febrile illnesses that contain autoantibodies that are not discussed in the context of findings from the experimental groups.

      In summary, the paper is solid technically, with the high throughput strategy seemingly well validated; however, the advance here is primarily a technical one.

      We thank the reviewer and agree that the technical advance here is substantial and will be of value to other investigators doing similar screenings – as well as to investigators who previously did not have access to this technology due to high requirements for robotics and specialized equipment in previous iterations of the protocol. As such, we feel that this, combined with the demonstration of how to appropriately control PhIP-Seq experiments, should be considered a valuable research advance alone -- even in the absence of the extensive validation and novel findings on 5 additional disease contexts, summarized in greater detail above.

      IVIG status is discussed in lines 417-423. Briefly, the large majority of MIS-C samples are confirmed to be IVIG-free at the time of blood draw. All of our KD samples are confirmed IVIG-free.

      While pediatric febrile illness samples could conceivably contain autoantibodies, we believe that this is the best group for comparison, given that these samples are taken from age-matched, acutely ill patients, thus providing a control group that is as clinically similar to MIS-C as possible. In addition, we included adult healthy sera and adult COVID-19 sera as secondary control groups. Of note, this matching is much more extensive (and substantially larger in number) than in the recent study in Cell (Gruber et al. 2020), which for PhIP-Seq used only 4 healthy, COVID-19-negative samples for comparison with 9 MIS-C samples.

      Reviewer #3 (Public Review):

      This paper presents a rigorously performed series of studies to improve the ability of the PhIP-seq method to discover autoantibodies against peptide antigens that span the whole peptidome at scale, and to increase the ease of validation and definition of disease specificity. The paper is an extension of a recent paper from the DeRisi and Anderson groups on APS1 patients, which defined and validated a novel series of tissue-specific autoantigens in APS1. The current studies show that the authors can find the antibodies they previously defined and, using larger numbers of disease and control samples, can expand somewhat on what they detect. They then use the new method to look at multiple additional processes in which autoimmunity has been demonstrated or postulated.

      The dataset may be of use to others interested in defining novel autoantibodies. The findings did not really provide significant new insights into the processes they studied. As the authors note, they were unable to detect the antibodies recognizing type I IFNs in severe COVID-19 (~10% of patients), which had previously been demonstrated effectively using ELISA. Unlike APS1, where their findings about uncommon tissue-specific autoantibody responses across a population with a known genetic deficiency and heterogeneous phenotypes could really illustrate the power of the method and approach, that elegance and powerful, novel conclusion is not as evident here.

      The trade-off between sensitivity, specificity, and screening power of antigen discovery tools is present in every assay. We do not feel that the comparison of our assay to a single protein ELISA assay is appropriate (nor particularly relevant for the conclusions drawn in this manuscript) given the inherent difference in nature and goals of the two assays. It has long been understood that PhIP-Seq does not have sensitivity for all protein antigens, including post-translationally modified and conformational antigens, which we state for readers in lines 190-193, within the discussion section, as well as in our previous work.

    1. Author Response

      Reviewer #2 (Public Review):

      Silberberg et al. present a series of cryo-EM structures of the ATP dependent bacterial potassium importer KdpFABC, a protein that is inhibited by phosphorylation under high environmental K+ conditions. The aim of the study was to sample the protein's conformational landscape under active, non-phosphorylated and inhibited, phosphorylated (Ser162) conditions.

      Overall, the study presents 5 structures of phosphorylated wildtype protein (S162-P), 3 structures of phosphorylated 'dead' mutant (D307N, S162-P), and 2 structures of constitutively active, non-phosphorylatable protein (S162A).

      The true novelty and strength of this work is that 8 of the presented structures were obtained either under "turnover" or at least 'native' conditions without ATP, i.e., in the absence of any non-physiological substrate analogues or stabilising inhibitors. The remaining 2 were obtained in the presence of orthovanadate.

      Comparing the presented structures with previously published KdpFABC structures, there are 5 structural states that have not been reported before, namely an E1-P·ADP state, an E1-P tight state captured in the autoinhibited WT protein (with and without vanadate), two different nucleotide-free 'apo' states, and an E1·ATP early state.

      Of these new states, the 'tight' states are of particular interest, because they appear to be 'off-cycle', dead end states. A novelty lies in the finding that this tight conformation can exist both in nucleotide-free E1 (as seen in the published first KdpFABC crystal structure), and also in the phosphorylated E1-P intermediate.

      By EPR spectroscopy, the authors show that the nucleotide-free 'tight' state readily converts into an active E1·ATP conformation when provided with nucleotide, leading to the conclusion that the E1-P·ADP state must be the true inhibitory species. This claim is backed by structural analysis, which supports the hypothesis that phosphorylation at Ser162 could stall the KdpB subunit in an E1P state unable to convert into E2P. This is further supported by the fact that the phosphorylated sample does not readily convert into an E2P state when exposed to vanadate, as would otherwise be expected.

      The structures are of medium resolution (3.1 - 7.4 Å), but the key sites of nucleotide binding and/or phosphorylation are reasonably well supported by the EM maps, with one exception: in the 'E1·ATP early' state determined under turnover conditions, I find the map for the gamma phosphate of ATP not overly convincing, leaving the question whether this could instead be a product-inhibited, Mg-ADP bound E1 state resulting from an accumulation of MgADP under the turnover conditions used. Overall, the manuscript is well written and carefully phrased, and it presents interesting novel findings, which expand our knowledge about the conformational landscape and regulatory mechanisms of the P-type ATPase family.

      We thank the reviewer for their comments and helpful insights. We have addressed the points as follows:

      However in my opinion there are the following weaknesses in the current version of the manuscript:

      1) A lack of quantification. The heart of this study is the comparison of the newly determined KdpFABC structures with previously published ones (of which there are already 10). Yet, there are no RMSD calculations to illustrate the magnitude of any structural deviations. Instead, the authors use phrases like 'similar but not identical to', 'has some similarities', 'virtually identical', 'significant differences'. This makes it very hard to appreciate the true level of novelty/deviation from known structures.

      This is a very valid point, and we thank the reviewers for bringing it up. To provide a better overview and appreciation of conformational similarities and significant differences, we have calculated RMSDs between all available structures of KdpFABC. They are summarised in the new Table 1 – Table Supplement 2. We have included individual RMSD values, whenever applicable and relevant, in the respective sections of the text and figures. We note that the RMSDs were calculated only between the cytosolic domains (KdpB N, A and P domains) after superimposition of the full-length protein on KdpA, which is rigid across all conformations of KdpFABC (see the description in Materials and Methods, lines 1184-1191, or the caption to Table 1 – Table Supplement 2). We opted not to report the RMSD calculated between the full-length proteins, as the largest part of the complex does not undergo large structural changes (see Figure 1 – Figure Supplement 1: the transmembrane region of KdpB, as well as KdpA, KdpC and KdpF, show relatively small or no rearrangements compared with the cytosolic domains), and including it would obscure the relevant RMSD differences discussed here.
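      The procedure of superimposing on one subset of atoms and measuring RMSD on another can be sketched with the Kabsch algorithm. This is an illustrative sketch, assuming matched N x 3 coordinate arrays (e.g. equivalent Cα atoms in the same order) for the two structures, plus index arrays selecting KdpA atoms for the fit and KdpB N, A and P domain atoms for the RMSD. The function names are hypothetical, and this is not necessarily the software used to produce the RMSD table.

```python
import numpy as np

def kabsch(mobile, target):
    """Rotation r and translation t minimising |mobile @ r.T + t - target| for matched N x 3 arrays."""
    mc = mobile.mean(axis=0)
    tc = target.mean(axis=0)
    h = (mobile - mc).T @ (target - tc)          # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tc - mc @ r.T
    return r, t

def rmsd_after_fit(mobile_all, target_all, fit_idx, rmsd_idx):
    """Superimpose on the atoms in fit_idx, then report RMSD over the atoms in rmsd_idx only."""
    r, t = kabsch(mobile_all[fit_idx], target_all[fit_idx])
    moved = mobile_all @ r.T + t
    diff = moved[rmsd_idx] - target_all[rmsd_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

      Fitting on KdpA and measuring on the cytosolic domains means that any rigid-body displacement of those domains relative to KdpA contributes fully to the RMSD, which is the quantity of interest here. A fit on all atoms would instead spread the difference across the whole complex and make it look smaller.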

      Also the decrease in EPR peak height of the E1 apo tight state between phosphorylated and non-phosphorylated sample - a key piece of supporting data - is not quantified.

      EPR distance distributions have been quantified by fitting and integrating a Gaussian distribution curve, and the results have been added to the corresponding results section (lines 523-542) and the methods section (lines 1230-1232).
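      One simple way to fit and integrate a Gaussian component of a distance distribution is a quadratic fit to ln P(r) (the log-parabola method), followed by the closed-form area A·σ·√(2π). The sketch below is illustrative and rests on our own assumptions: P(r) is strictly positive within the fitted window, and the function names are hypothetical. The authors' fitting procedure may differ, and nonlinear least squares is more robust to noise and baseline offsets.

```python
import math
import numpy as np

def fit_gaussian_logparabola(r, p):
    """Fit p(r) = A * exp(-(r - mu)^2 / (2 sigma^2)) via a quadratic fit to ln p (needs p > 0)."""
    c2, c1, c0 = np.polyfit(r, np.log(p), 2)     # ln p = c2 r^2 + c1 r + c0, with c2 < 0
    sigma = math.sqrt(-1.0 / (2.0 * c2))
    mu = -c1 / (2.0 * c2)
    amp = math.exp(c0 - c1 ** 2 / (4.0 * c2))
    return amp, mu, sigma

def gaussian_area(amp, sigma):
    """Integral of A * exp(-(r - mu)^2 / (2 sigma^2)) over all r."""
    return amp * sigma * math.sqrt(2.0 * math.pi)
```

      The population of a given state can then be expressed as the area of its Gaussian component divided by the integral of the full distance distribution. This makes peak heights from differently treated samples directly comparable.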

      2) Perhaps as a consequence of the above, there seems to be a slight tendency towards overstatements regarding the novelty of the findings in the context of previous structural studies. The E1-P·ATP tight structure is extremely similar to the previously published crystal structure (5MRW), but it took me three reads through the paper and a structural superposition (overall RMSD less than 2Å), to realise that. While I do see that the existing differences, the two helix shifts in the P- and A- domains - are important and do probably permit the usage of the term 'novel conformation' (I don't think there is a clear consensus on what level of change defines a novel conformation), it could have been made more clear that the 'tight' arrangement of domains has actually been reported before, only it was not termed 'tight'.

      As indicated above, we have now included an extensive RMSD table covering all available KdpFABC structures. To ensure a meaningful comparison, the RMSDs are calculated only between the cytosolic domains after superimposition of the full-length protein on KdpA, as the transmembrane region of KdpFABC is largely rigid (see figure below, panel B). However, we note that in the X-ray structure the transmembrane region of KdpB is displaced relative to the rest of the complex when compared with the arrangement found in any of the other 18 cryo-EM structures, which all align well in the TMD (see figure below, panel C). These deviations make the crystal structure somewhat of an outlier and might be a consequence of crystal packing (see figure below, panel A). For completeness in our comparison with the X-ray structure, we have included an RMSD calculated after superimposition on KdpA and an additional RMSD calculated after alignment on the TMD of KdpB (see figure below, panels D and E). The RMSD of less than 2 Å that the reviewer mentions was probably obtained by superimposing the entire complexes on each other (see figure below, panel F). However, we do not believe that this is a reasonable comparison, as the TMD of the complex is significantly displaced, which stands in strong contrast to all other RMSDs calculated between the remaining structures, where the TMD aligns well (see figure below, panel B).

      From the resulting comparisons, we conclude that the E1P tight state and the X-ray structure share a certain similarity but are not identical, in particular in the relative orientation of the cytosolic domains to the rest of the complex. We hope that including the RMSDs in the text and separately highlighting the important features of the E1P tight state in the section “E1P tight is the consequence of an impaired E1P/E2P transition” makes the argument more conclusive.

      Likewise, the authors claim that they have covered the entire conformational cycle with their 10 structures, but this is actually not correct, as there is no representative of an E2 state or functional E1P state after ADP release.

      This is correct, and we have adjusted the phrasing to “close to the entire conformational cycle” or “the entire KdpFABC conformational cycle except the highly transient E1P state after ADP release and E2 state after dephosphorylation.”

      3) A key hypothesis this paper suggests is that KdpFABC cannot undergo the transition from E1P tight to E2P and hence gets stuck in this dead end 'off cycle' state. To test this, the authors analysed an S162-P sample supplied with the E2P inducing inhibitor orthovanadate and found about 11% of particles in an E2P conformation. This is rationalised as a residual fraction of unphosphorylated, non-inhibited, protein in the sample, but the sample is not actually tested for residual unphosphorylated fraction or residual activity. Instead, there is a reference to Sweet et al, 2020. So the claim that the 11% E2P particles in the vanadate sample are irrelevant, whereas the 14% E1P tight from the turnover dataset are of key importance, would strongly benefit from some additional validation.

      We have added an ATPase assay that shows the residual ATPase activity of WT KdpFABC compared with KdpFABS162AC, both purified from E. coli LB2003 cells, identical to the production and purification of the cryo-EM samples (see Figure 2-Suppl. Figure 5). The residual ATPase activity is ca. 14% of that of the uninhibited sample, which correlates with the E2-P fraction in the orthovanadate sample.

      Reviewer #3 (Public Review):

      The authors have determined a range of conformations of the high-affinity prokaryotic K+ uptake system KdpFABC, and demonstrate at least two novel states that shed further light on the structure and function of these elusive protein complexes.

      The manuscript is well-written and easy to follow. The introduction puts the work in a proper context and highlights gaps in the field. However, I am missing an overview of the currently available structures/states of KdpFABC. This could also be implemented in Fig. 6 (highlighting new vs available data). This is also connected to one of my main remarks: the lack of comparisons and RMSD estimates relative to available structures. Similarity/resemblance to available structures is indicated several times throughout the manuscript, but this is not quantified or shown in detail, and hence it is difficult for the reader to grasp how unique or alike the structures are. Linked to this, I am somewhat surprised by the lack of considerable changes within the TM domain and the overlapping connectivity of the K indicated in Table 1 - Figure Supplement 1. According to Fig. 6, the uptake pathway should be open in early E1 states but not in E2 states, in contrast to Table 1 - Figure Supplement 1, which shows connectivity in all structures. Furthermore, the release pathway (to the inside) should be open in the E2-P conformation, but no K ions are shown along a release pathway in any of the structures in Table 1 - Figure Supplement 1. Overall, there seem to be rather small shifts between the shown structures (are the structures changing from closed to inward-open?). Or is only KdpA shown?

      We thank the reviewer for their positive response and constructive criticisms. We have addressed these comments as follows:

      1. The overview of the available structures has been implemented in Fig. 6, with the new structures from this study highlighted in bold.

      2. RMSD values have been added to all comparisons, with a focus on the deviations of the cytosolic domains, which are most relevant to our conformational assignments and discussions.

      3. To highlight the (comparatively small) changes in the TMD, we have expanded Table 1 - Figure Supplement 1 to include panels showing the outward-open half-channel in the E1 states, with a constriction at the KdpA/KdpB interface, and the inward-open half-channel in the E2 states. The largest observable rearrangements do, however, take place in the cytosolic domains. This is in full agreement with previous studies, which focused more on the transitions occurring within the transmembrane region during the transport cycle (Stock et al., Nature Communications 2018; Silberberg et al., Nature Communications 2021; Sweet et al., PNAS 2021).

      4. The ions observed in the intersubunit tunnel are all before the point at which the tunnel closes, explaining why there is no difference in this region between E1 and E2 structures. Moreover, as we discussed in our last publication (Silberberg, Corey, Hielkema et al., 2021, Nat. Comms.), the assignment of non-protein densities along the entire length of the tunnel is contentious and can only be certain in the selectivity filter of KdpA and the CBS of KdpB.

      5. The release pathway from the CBS does not feature any defined K+ coordination sites, so ions are not expected to stay bound along this inward-open half-channel.
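      The RMSD values mentioned in point 2 are, in general, computed after optimally superposing matched atoms of two models. As the exact protocol (software, atom selection, superposition region) is not stated here, the following is only a minimal sketch assuming one-to-one matched Cα coordinates and Kabsch superposition; the function name `kabsch_rmsd` is hypothetical and this is not presented as the authors' pipeline. It computes RMSD = sqrt((1/N) Σ |R p_i + t − q_i|²), with the rotation R and translation t chosen to minimize the deviation.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched N x 3 coordinate sets after optimal rigid superposition.

    Assumes row i of P and row i of Q are the same atom (e.g. the same C-alpha)
    in two models; atom matching is not done here.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must be N x 3 arrays of matched atoms")

    # Remove translation by centring each set on its centroid
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Cross-covariance matrix and its singular value decomposition
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q (row vectors, hence R.T) and average the squared deviations
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    # Synthetic check: a rotated and translated copy should give an RMSD of ~0
    rng = np.random.default_rng(0)
    model_a = rng.normal(size=(50, 3))
    theta = 0.5
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    model_b = model_a @ rot.T + np.array([5.0, -2.0, 1.0])
    print(kabsch_rmsd(model_a, model_b))  # ~0, up to floating-point error
```

      Restricting the comparison to the cytosolic domains, as emphasized in the response, amounts to passing only the matched atoms of those domains; note that the choice of superposition region itself affects the value obtained.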

      My second key remark concerns the "E1-P tight is the consequence of an impaired E1-P/E2-P transition" section, and the associated discussion, which is very interesting. I am not convinced though that the nucleotide and phosphate mimic-stabilized states (such as E1-P:ADP) represent the high-energy E1P state, as I believe is indicated in the text. Supportive of this, in SERCA, the shifts from the E1:ATP to the E1P:ADP structures are modest, while the following high-energy Ca-bound E1P and E2P states remain elusive (see Fig. 1 in PMID: 32219166, from 3N8G to 3BA6). Or maybe this is not what the authors claim, or the situation is different for KdpFABC? Associated, while I agree with the statement in rows 234-237 (that the authors likely have caught an off-cycle state), I wonder if the tight E1-P configuration could relate to the elusive high-energy states (although initially counter-intuitive as it has been caught in the structure)? The claims on rows 358-360 and 420-422 are not in conflict with such an idea, and the authors touch on this subject on rows 436-450. Can it be excluded that it is the proper elusive E1P state? If the state is related to the E1P conformation it may well have bearing also on other P-type ATPases and this could be expanded upon.

      This is a good point, particularly since the E1P·ADP state is the most populated state in our sample, which is also counterintuitive for a “high-energy unstable state”. One possible explanation is that this state already has some of the E1-P strain (which we can see in the clash of D307-P with D518/D522), but the ADP, and its associated Mg2+ in particular, help to stabilize it. Once ADP dissociates and takes the Mg2+ with it, the full destabilization takes effect in the actual high-energy E1P state. Nonetheless, we consider it fair to compare the E1P tight structure with the E1P·ADP structure to look for electrostatic relaxation. We have clarified the sequence of events and the hypothesized role of ADP/Mg2+ in stabilizing the E1P·ADP state that we observe (lines 609-619): “Moreover, a comparison of the E1P tight structure with the E1P·ADP structure, its most immediate precursor in the conformational cycle obtained, reveals a number of significant rearrangements within the P domain (Figure 5B,C). First, Helix 6 (KdpB538-545) is partially unwound and has moved away from helix 5 towards the A domain, alongside the tilting of helix 4 of the A domain (Figure 5B,C – arrow 2). Second, and of particular interest, are the additional local changes that occur in the immediate vicinity of the phosphorylated KdpBD307. In the E1P·ADP structure, the catalytic aspartyl phosphate, located in the D307KTG signature motif, points towards the negatively charged KdpBD518/D522. This strain is likely to become even more unfavorable once ADP dissociates in the E1P state, as the Mg2+ associated with the ADP partially shields these clashes. The ensuing repulsion might serve as a driving force for the system to relax into the E2 state in the catalytic cycle.”

      We believe it is highly unlikely that the reported E1-P tight state represents an on-cycle, high-energy E1P intermediate. First, we observe a relaxation of electrostatic strain in this structure, in particular when compared with the E1P·ADP state we obtained. By contrast, E1P should be the most energetically unfavourable state possible, to ensure a rapid transition to the E2P state. As such, it should be transient, making it unlikely to accumulate to a degree that allows its structure to be determined. Additionally, the association of the N domain with the A domain in the tight conformation, which would have to be reversed, would be a surprising intermediate step in the transition from E1P to E2P. Altogether, the E1P tight state reported here most likely represents an off-cycle state.

    1. Author Response

      Reviewer #1 (Public Review):

      A novel approach is introduced for targeting protein-RNA interactions. The approach (presented in Figure 1) integrates computational techniques with cellular assays and is applicable, in principle, whenever the protein-RNA complex has a druggable binding pocket. It is demonstrated with the discovery of inhibitors of YB-1's interaction with its mRNA target. Of 22 putative hits discovered by the virtual screen, 11 come out as very strong hits, far beyond the 5-10 percent success rate that one often sees in drug discovery. The main strength here is the proof of concept that protein-RNA interactions are targetable.

      We agree with the reviewer that large computational screens to identify potential inhibitors generally lead to dead ends. This is why we have rationally designed this integrative approach where predictions are experimentally validated with different tools and the obtained results feed/orient the computational approach. The workflow illustrated in Figure 1 creates a vivid exchange between computational and experimental data and allows a back-and-forth between both to enhance and refine the computational screen. We have also put in place a refined physics-based computational approach to increase our chances in avoiding these dead-end screens (details are in Computational Methods and in Appendix 2). The high predictive power of our computational approach comes from a rationally designed workflow combining the following:

      1- Understanding the dynamic behavior of the target, the binding pocket, and identification of key residues using MD simulations.

      2- The choice of starting 3D structures and their refinement using MD simulations.

      3- The prior identification and validation of the binding site and the identification of F1 and F4 as hits by NMR spectroscopy. F1 was then used in the pharmacophore screen.

      4- The statistical mechanics-based filter played an important role in orienting and refining this selection. For example, the use of ligand-water interactions to qualitatively estimate the residence of the ligand in the binding site.

      Nevertheless, the high success rate also stems from human intervention: visual inspection and rational (sometimes intuition-driven) selection of structurally promising candidates also played an important role in selecting the 111 molecules obtained from the static virtual screen (pharmacophore screens). We now clarify this point on pages 5 and 6 of the revised manuscript and give more details on the selection criteria used. We also specify that the large computational screen we implemented was necessary to validate the MT bench.

      Reviewer #2 (Public Review):

      In the manuscript "Targeting RNA-Protein Interactions with an Integrative Approach Leads to the Identification of Potent YB-1 Inhibitors" the authors have tried to integrate computational, structural, and cellular imaging approaches to identify small molecule inhibitors of RNA-protein interactions. They take up as their target YB-1, an abundant RNA-binding protein (RBP) involved in regulating the translation and/or processing of multiple mRNAs, many of which encode genes involved in tumorigenesis and tumor progression. Firstly, the authors find a binding pocket in the cold shock domain (CSD) of YB-1, for the flavonoid fisetin, and more so for the analog quercetin, by NMR spectroscopy, which they name the "quercetin pocket". They then delineate and refine the RNA-binding characteristics of this pocket by MD simulations. Further, they conduct a computational screen of a large library of small molecules to find candidates which bind to this pocket. They then check the selected candidates as inhibitors of YB1-mRNA interaction using the microtubule bench (MT-bench) method. They find 11 molecules as significant hits with this approach, including one FDA-approved PARP-inhibitor drug (P1). P1 is shown to bind YB-1 by MD-simulation and NMR spectroscopy and was also shown to interfere with YB-1-mRNA interaction by NMR and in cells by the MT-bench assay. Finally, they showed that the molecule P1 reduced cellular translation by a puromycin incorporation assay and this effect was not observed in cells depleted of YB-1.

      Together, these multifarious approaches appear to establish a workflow useful for scoring for inhibitors of RNA-protein interactions. The workflow is rationally designed, moving from the identification of a binding pocket to the identification of binding molecules and then selecting molecules that inhibit protein-mRNA interactions. This workflow may be useful for other researchers attempting to screen libraries of compounds targeting RNA interactions by other RNA-binding proteins. However, as many RNA-binding proteins have large intrinsically disordered regions or no recognizable RNA-binding domains, it is to be seen whether such a structural "binding-pocket"-based approach can be generalizable to all RNA-binding proteins.

      We agree with the reviewer that this is not sufficient to generalize to all RBPs. Performing a complete study for other RBPs would require a separate paper. In the current work, we did show that we can detect mRNA-RBP interactions with two other RBPs, HuR and FUS, and used them as controls to show the specificity of the tested small molecules towards YB-1 (Figures 3d and 4b,c). We have now toned down the statements about the generality of the method (page 20).

      In the discussion, we now also explain that YB-1, because it has a single cold-shock domain and a druggable pocket, is an “ideal” target. We also explain that many RBPs harbor multiple RNA-binding domains, which may reduce the sensitivity of our method when a specific domain is targeted by small molecules, because the other domains would still contribute to mRNA binding. However, a single RNA-binding domain may be isolated and used as bait in the MT bench assay to overcome this obstacle. Developing molecules that target a specific domain may be sufficient to modulate the biological function exerted by the full-length protein.

      While the data presented in the paper is coherent and generally supports the demonstration of an inhibition of RNA-binding by YB-1, what appears to be lacking is evidence that the observed effect is specific to inhibition of YB-1-mediated regulation of translation and whether the expression of transcripts specifically regulated by YB-1 is affected. Secondly, it is not clear what is the effect of the putative inhibitor on cellular activity and behaviour, which is important to judge both specific phenotypic effects as well as non-specific cytotoxic effects.

      Overall the work is interesting and instructive, but the lack of the above observations detracts from its significance.

      We thank the reviewer for this feedback and for raising these interesting points. As indicated in the manuscript, it is very difficult to find functional cellular assays that would reveal a phenotype specific to a general RBP such as YB-1. This is even more difficult for YB-1, since it binds nonspecifically to most mRNAs, as shown by CLIP analysis [1]. This was one of the reasons for developing a specific cellular assay such as the MT bench assay. YB-1 originates from bacterial cold shock proteins, which preserve global mRNA translation during cold stress, presumably by removing secondary structures. In contrast with many RBPs, YB-1 has only a single structured RNA-binding domain, which does not favor specific binding to particular mRNA sequences/structures. As noted by the reviewers, YB-1 is indeed not a general translation factor, but it is a general protein that binds to most non-polysomal mRNAs [2]. mRNAs, even highly translated ones, switch from time to time between a polysomal (active) state and a non-polysomal (dormant) state. In a recent work, we showed that YB-1 prepares non-polysomal mRNAs in a way that facilitates the transition from the dormant to the active state. We also showed that, accordingly, decreasing YB-1 expression reduces global mRNA translation rates in HeLa cells [3]. Consistent with this, a global decrease in mRNA translation, as observed with Niraparib (P1), which targets YB-1, is to be expected. We are not aware of any established 3’UTRs that are highly specific to YB-1; YB-1 binds nonspecifically to both mRNA coding sequences and 3’UTRs (YBX1 data [1], YBX3 data [4]). A large-scale, in-depth analysis would be needed to determine whether specific structures/sequences significantly increase the YB-1 dependency of mRNA translation. However, the expression of some malignancy-associated proteins, notably Vimentin and E-cadherin, has been linked to YB-1 expression levels [3].

      We therefore performed a new experiment in which we measured the expression levels of these two proteins after silencing YB-1 expression in HeLa cells, in the absence and presence of Niraparib (P1) and Olaparib (P2, used as a negative control). The results show that P1, but not P2, decreases the YB-1 dependence of Vimentin expression (significant) and of E-cadherin expression (non-significant). Other proteins, such as eIF5a and RPL36, used here as negative controls, did not show similar behavior. These results are thus consistent with a specific effect of Niraparib on YB-1-mediated translation. In line with these results, we now cite a recent report showing the downregulation of Vimentin expression in ovarian cancer cells treated with Niraparib [5]. This is now discussed on pages 16 and 17 of the revised manuscript, and the new data are included as new Figure 8-Figure supplement 3.

      1. Wu, S.-L. et al. Genome-wide analysis of YB-1-RNA interactions reveals a novel role of YB-1 in miRNA processing in glioblastoma multiforme. Nucleic Acids Research 43, 8516-8528 (2015).

      2. Singh, G., Pratt, G., Yeo, G.W. & Moore, M.J. The clothes make the mRNA: past and present trends in mRNP fashion. Annual Review of Biochemistry 84, 325 (2015).

      3. Budkina, K. et al. YB-1 unwinds mRNA secondary structures in vitro and negatively regulates stress granule assembly in HeLa cells. Nucleic Acids Research 49, 10061-10081 (2021).

      4. Van Nostrand, E.L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711-719 (2020).

      5. Zeng, Z., Yu, J., Jiang, Z. & Zhao, N. Oleanolic acid (OA) targeting UNC5B inhibits proliferation and EMT of ovarian cancer cell and increases chemotherapy sensitivity of Niraparib. Journal of Oncology 2022 (2022). https://doi.org/10.1155/2022/5887671

      Regarding the effect of the putative inhibitor on cellular activity and behaviour, which is important for judging both specific phenotypic effects and non-specific cytotoxic effects, we agree with the reviewer. YB-1 is associated with the high proliferation rate of cancer cells (and silencing YB-1 does not induce apoptosis). Therefore, we performed cell proliferation assays using cells treated with siRNA or siNEG, which allowed us to manipulate the endogenous YB-1 expression level rather than relying on a more artificial rescue experiment. These assays were performed in the presence of three PARP-1 inhibitors at low concentrations: our hit Niraparib (P1) and two negative controls, Olaparib (P2) and Talazoparib (P3). We used a 48 h incubation time, which allows effects to be observed at lower compound concentrations. All PARP-1 inhibitors decreased cell proliferation, to a greater extent with P3. However, P2 and P3 further decreased cell proliferation in siRNA-treated cells compared with siNEG-treated cells (significant differences at 5 µM). In contrast, Niraparib instead further decreased cell proliferation in siNEG-treated cells, where YB-1 levels are high (non-significant variations, but opposite to those observed with P2 and P3). This new result is now presented as new Figure 8a. In addition, we show that the separation distance between cells increases significantly in YB-1-rich cells treated with P1, in contrast to P2 and P3 (significant differences) (new Figure 8-Figure supplement 1). A short separation distance between cells may be due to colony formation when cells are plated at low density and allowed to grow for 48 h. Again, this indicates that Niraparib inhibits cell proliferation more effectively in YB-1-rich cells than the two other PARP inhibitors, Talazoparib and Olaparib. The text on page 17 has been rewritten to include these new results and highlight this point.

      Reviewer #3 (Public Review):

      The authors introduce an integrative platform for identifying small molecule ligands that can disrupt RNA-protein interactions (RPIs) in vitro and in cells. The screening assay is based on prior work establishing the MT bench assay (Boca et al. 2015) for evaluating protein-protein interactions in cells by utilizing microtubules as a platform to recruit and detect PPIs in cells. In the current manuscript, the authors adapted this methodology to evaluate small molecules targeting RNA-binding protein (RBPs) interactions with mRNA in cells. By combining the MT bench assay with computational docking/screening and ligand-binding evaluations by NMR, the authors discover inhibitors of the RBP YB-1, which included FDA-approved PARP-1 inhibitors. The impact of this work could be high given the critical roles of RNA-binding proteins in regulating the function and fate of coding and non-coding RNA. While the presented data are promising, the ability to generally apply this method beyond YB-1 and to RBPs in general remains to be addressed.

      We agree with the reviewer's comments. In the revised version of the manuscript, we have toned down the statements about the generality of the method. In addition, we elaborate on the potential of our assays and on how to deal with RBPs, which often have more than one RNA-binding domain. If several RNA-binding domains participate in the binding of a given RBP to mRNA, the MT bench assay may lose sensitivity. However, one option is to use an isolated RNA-binding domain as bait, as targeting it could be enough to impair/correct the function of the full-length RBP. A statement has been added on page 20 of the revised manuscript to discuss this point.

    1. Author Response

      Reviewer #1 (Public Review):

      GCaMP indicators have become common, almost ubiquitous tools used by many neuroscientists. As calcium buffers, calcium indicators have the potential to perturb calcium dynamics and thereby alter neuronal physiology. With so many labs using GCaMPs across a variety of applications and brain regions, it's remarkable how few have documented GCaMP-related perturbations of physiology, but there are two main contexts in which perturbations have been observed: after prolonged expression of a high GCaMP concentration (common several weeks after infection with a virus using a strong promoter); and when cytoplasmic GCaMP is present during neuronal development. As a result, GCaMP studies are often designed to avoid these two conditions.

      Here, Xiaodong Liu and colleagues ask whether GCaMP-X series indicators are less toxic than GCaMPs. GCaMP-X indicators are modified GCaMPs with an additional N-terminal calmodulin binding domain that reduces interactions of the calmodulin moiety of GCaMP with other cellular proteins. Xiaodong Liu and colleagues document effects of GCaMP expression on neuronal morphology in vitro, calcium oscillations in vitro, and sensory responses in vivo, in each case showing that GCaMP-X indicators are less toxic. Their results are compelling.

      Unfortunately, the paper suffers two main weaknesses. Firstly, the results demonstrate that GCaMP is toxic during development, after prolonged expression via viruses in vivo, and in cell culture where maturation of the culture likely recapitulates key steps in development. GCaMPs are known to be toxic in these circumstances, such toxicity is readily circumvented by driving expression in the adult, and there are countless examples of studies in which adequate GCaMP expression was achieved without toxicity. These new results are of little relevance to the majority of GCaMP experiments. That GCaMP-X indicators are less toxic during development is a new result and may be of interest to those who wish to deploy calcium indicators during development, but this is a relatively small number of neuroscientists.

      We thank the reviewer for providing valuable opinions on these critical matters. Here, we would like to clarify:

      1. In our work, the status of neurites (length, branching, etc.) is indeed one main aspect to monitor, and neuritogenesis during the early stages of development is known to follow temporal trajectories with an ample dynamic range, which makes it helpful for quantitatively comparing GCaMP-X with GCaMP. However, the key factor is the actual time and level of probe expression in neurons, and the starting timepoint of expression could vary. We have conducted additional experiments using virus-infected neurons (Figure 5—figure supplement 1) and transgenic neurons with inducible expression (Figure 7—figure supplement 3), both starting to express the probes at the mature stage. Thus, GCaMP-X imaging is not necessarily limited to developing neurons. In the original reports of the GCaMP probes that show toxicity, virus injection was performed in both immature (2-3 weeks, Tian 2009 PMID: 19898485) and mature mice (~2 months, Chen 2013 PMID: 23868258). Following the protocol of Huber 2012 (PMID: 22538608), GCaMP virus injection was done in adult mice (>2 months), which exhibited functional and morphological deficits in nucleus-filled neurons beyond OTW (Figure 2, Figure 5 and Figure 6). Collectively, the central principles of GCaMP-X versus GCaMP apply to both immature and mature neurons.

      2. Chronic GCaMP-X imaging has a broad spectrum of potential applications, not limited to neural development (Resendez 2016 PMID: 26914316). As mentioned, GCaMP-X resolves the problem of longitudinal expression, thus making chronic imaging more feasible. We agree with the reviewer that a large body of our data in the original version focused on the characteristics of calcium signals during the early stage of neuronal development, which served as an exemplary scenario for comparing GCaMP-X with GCaMP. Indeed, the importance of Ca2+ oscillations in neural development is commonly accepted (Kamijo 2018 PMID: 29773754; Gomez 2006 PMID: 16429121). In vivo Ca2+ imaging (Figure 2 and Figure 5) and morphological analyses (e.g., Figure 6) have extended the major conclusions to mature neurons, where dysregulated Ca2+ oscillations are also tightly coupled with neuronal health or death/damage. Importantly, GCaMP-X opens directions previously impeded or discouraged by GCaMP perturbations, e.g., chronic imaging of cultured neurons to concurrently monitor Ca2+ activity and cell morphology, as in this study.

      3. Circumventing GCaMP toxicity is not trivial with viral infection. The expression levels need to be carefully adjusted experimentally, e.g., by dilution studies (Resendez 2016 PMID: 26914316). A delicate balance of GCaMP expression is critical: a low level (or short time) of expression results in weak signals and poor SNR, whereas a high level (or long time) of expression causes nuclear filling and neural toxicity. Even under work-around conditions (restricted time window and diluted dosage), nucleus-filled neurons are not uncommon, judging by the expression/fluorescence patterns, e.g., in the original reports of GCaMP6 (Supplementary Figure 7, Chen 2013 PMID: 23868258) and GCaMP3 (Supplementary Figure 11, Tian 2009 PMID: 19898485). Under particular conditions (neuronal subtype, imaging time window, viral dosage, etc.), many neurons can be found without apparent perturbation/nuclear filling, allowing calcium imaging to proceed. With GCaMP-X, dosage is less restricted (10-fold higher concentration for GCaMP-X, with improved SNR and overall performance, in Figure 2, Figure 5 and Figure 6). In practice, GCaMP-X is a simple solution to the issues related to excessive/prolonged expression. GCaMP-X is also expected to help maintain the total number of healthy neurons and thus the general health of the brain. Some transgenic GCaMP mouse lines reportedly exhibit epileptic activity (Steinmetz 2017 PMID: 28932809); whether GCaMP-X could help in this case awaits future studies.

      4. As the reviewer pointed out, the key to GCaMP-X is eliminating the unwanted binding of (apo)GCaMP to endogenous proteins in neurons. We agree with the reviewer that, according to empirical observations, the following factors appear to increase the severity of GCaMP perturbations: prolonged time, high concentration and nuclear accumulation. GCaMP-X protects GCaMP from unwanted binding and prevents the consequent damage to neurons, as validated by various tests thus far (in vitro and in vivo). In this context, prolonged expression results in a higher GCaMP concentration while accumulating the effects of GCaMP interactions; a higher GCaMP concentration interferes with more binding events and targets of endogenous CaM; and enhanced/prolonged GCaMP expression is directly correlated with nuclear accumulation, a hallmark of neuronal damage.

      Secondly, the authors extend their claims to conclude that GCaMP indicators are toxic under other circumstances, claims supported by neither their results nor the literature. To provide one example, at the end of the introduction is the statement, 'chronic GCaMP-X imaging has been successfully implemented in vitro and in vivo, featured with long-term overexpression (free of CaM-interference), high spatiotemporal contents (multiple weeks and intact neuronal network) and subcellular resolution (cytosolic versus nuclear), all of which are nearly infeasible if using conventional GCaMP.' The statement's inaccurate: there are many chronic imaging studies in vitro and in vivo using GCaMP indicators without nuclear accumulation of GCaMP or perturbed sensory responses. There are more examples throughout the paper where the conclusions overreach the results and are inaccurate. The results are simply insufficient to support many of the strong statements in the paper.

      Overall, the criticisms and suggestions of the reviewer have been well taken, and we have revised the text accordingly. Regarding the particular paragraph mentioned by the reviewer, we want to clarify that it summarized the results of the whole manuscript, with each claim referring to the data and analyses shown in the corresponding figures. Specifically, these figures were: 'free of CaM-interference' (Figure 1); 'multiple weeks and intact neuronal network' (in vitro: Figure 3 and Figure 4; in vivo: Figure 2, Figure 5 and Figure 6; transgenic neurons: Figure 7); and 'cytosolic versus nuclear' (Figure 1 and the previous Figure 8). The last sentence, 'all of which are nearly infeasible if using conventional GCaMP', was meant to summarize the results comparing GCaMP with GCaMP-X in our experimental settings of chronic imaging with prolonged/excessive probe expression. Again, we agree that for particular experimental settings and purposes the toxicity of GCaMP can be circumvented empirically. To avoid miscommunication, we have revised this paragraph and moved it to the Discussion (after all the data), also ensuring that the statements on GCaMP are backed up with data or literature. Please also see Essential Revisions, Item 3.

      Reviewer #2 (Public Review):

      Geng and colleagues provide further evidence for the lower neuronal toxicity of their improved GECI, GCaMP-X, which allows improved recordings of Ca2+ signals in neurons. As reported previously and studied in more detail here, the improved properties are primarily due to a lower tendency of GCaMP-Xc (reporting cytosolic Ca2+) to enter the nucleus. They present a systematic comparison of their cytosolic or nucleus-targeted GCaMP-Xc (and Xn) with the corresponding "conventional" GCaMPs (jGCaMP7b, GCaMP6m). They again confirm the absence of apoGCaMP-X binding to the CaM-binding domain of Cav1.3 L-type Ca2+ channels, suggesting that this is the main interaction, or one of several GCaMP interactions, leading to altered intracellular signaling that affects neuronal survival, development and architecture. Evidence for more (likely) physiological Ca2+ responses was obtained from a battery of experiments, including in vivo recordings of acute sensory responses after viral expression of GCaMPs, monitoring of long-term calcium oscillations in cultured neurons, correlation of the measured Ca2+ oscillations with hallmarks of neuronal development (soma size, neurite outgrowth/arborization), and long-term recordings of spontaneous Ca2+ activity in vivo in the primary somatosensory cortex (S1). The latter experiments also showed that much higher doses of AAV-GCaMP6m-Xc could be administered than of GCaMP6m. They also show unfavorable effects of GCaMPs on neurons of adult GCaMP-expressing transgenic mice, both in slices and in cultured neurons. While most experiments aim at demonstrating the improved performance of GCaMP-X, one finding also provides potential novel insight into the role of neuronal activity patterns during neuronal development in culture. Assuming more undisturbed physiological Ca2+ signaling even over longer time periods, they can follow different Ca2+ activity patterns during neuronal development. Oscillation amplitudes and the level of synchrony correlated with neurite length, and frequency inversely correlated with neurite outgrowth.

      They provide convincing experimental evidence for the improvements claimed for their novel GCamP-X constructs. Some aspects should be clarified.

      A key finding explaining the construct differences is the nuclear localization. The authors should also provide numbers for the N/C ratio for Ca2+ imaging of sensory-evoked responses in vivo (Fig. 2; pg 6: nuclear accumulation was barely noticeable from GCaMP6m-Xc even beyond OTW). Also, for chronic experiments in brain slices they state for GCaMP6m-Xc in the text that (pg 12) "meanwhile the N/C ratio remained ultra-low", yet Fig. 6 shows an N/C ratio of 0.2. This does not appear to be "ultra low".

      We thank the reviewer for raising the matter of the N/C ratio (indicative of nuclear accumulation). We have added the N/C ratio values for the in vivo experiments (revised Figure 2). Following the previous report, the N/C ratio criterion was set to 0.8 to divide the neurons into two subpopulations. A significant fraction of GCaMP neurons were nucleus-filled (N/C ratio>0.8), whereas almost no neurons expressing GCaMP-XC had an N/C ratio greater than 0.8 when examined 8-13 weeks post injection. Because of its higher resolution, confocal microscopy generally provides a more precise estimate of the N/C ratio than two-photon in vivo imaging. In Figure 6, an even clearer difference in nuclear distribution was observed between GCaMP and GCaMP-X, which we had described as “ultralow” for GCaMP-X. Of note, the N/C ratio of YFP itself was ~1.3. The N/C ratio for GCaMP-XC was not close to zero, consistent with measurements from other NES-tagged peptides (Yang 2022 PMID: 35589958); GCaMP-XC was not completely excluded from cell nuclei and thus produced some nuclear fluorescence. In light of this comment, we have revised the relevant text, including the phrase “ultralow” (Page 14, Line 393). Figure 5 was also revised accordingly.
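      To illustrate how the 0.8 cut-off described above splits neurons into two groups, here is a minimal Python sketch. It assumes, for illustration only, that the N/C ratio is the background-subtracted mean nuclear fluorescence divided by the background-subtracted mean cytosolic fluorescence from per-neuron ROIs; the helper names nc_ratio and fraction_nucleus_filled are hypothetical, and this is not the authors' analysis code.

```python
from statistics import mean

# Cut-off quoted in the response: N/C ratio > 0.8 means "nucleus-filled".
NUCLEUS_FILLED_THRESHOLD = 0.8


def nc_ratio(nuclear_pixels, cytosolic_pixels, background=0.0):
    """Assumed definition: background-subtracted mean nuclear / cytosolic intensity."""
    nuc = mean(nuclear_pixels) - background
    cyto = mean(cytosolic_pixels) - background
    if cyto <= 0:
        raise ValueError("cytosolic signal must exceed background")
    return nuc / cyto


def fraction_nucleus_filled(ratios, threshold=NUCLEUS_FILLED_THRESHOLD):
    """Fraction of neurons whose N/C ratio strictly exceeds the threshold."""
    if not ratios:
        return 0.0
    return sum(r > threshold for r in ratios) / len(ratios)
```

      The strict inequality means a neuron sitting exactly at 0.8 is counted as not nucleus-filled, matching the "N/C ratio>0.8" wording. For example, ratios of 0.25, 0.3, 1.1, 0.9 and 0.2 give a nucleus-filled fraction of 0.4.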

      Along these lines, since nucleus-filled neurons were observed in their experiments with GCaMP-Xc, the authors should comment on whether altered Ca2+ signals were also seen for the few neurons expressing GCaMP-Xc in the nucleus.

      During 2-photon imaging experiments in vivo, GCaMP-XC neurons occasionally appeared to have some level of nuclear expression, especially in blurred, low-quality images. Judged by the N/C ratio criterion (0.8), these neurons rarely fell into the nucleus-filled group (Figure 2B and Figure 5C; see also confocal imaging in Figure 1B). On the other hand, a small fraction of GCaMP-XC could "leak" into the nucleus. GCaMP-XN also eliminated toxic (apo)GCaMP interactions in neurons, as it shares the same design principle with GCaMP-XC (Figure 1). Therefore, nuclear GCaMP-XC is expected to behave like GCaMP-XN. Experimentally, with GCaMP-XC or GCaMP-XN present in the nucleus, no significant change in neuronal Ca2+ or neurite morphology has been observed. This comment also points to an important direction for future research, i.e., confining GCaMP-X more precisely within the targeted organelles, e.g., by improving or replacing the localization tags.

      Since they performed a systematic comparison of two constructs to demonstrate an (expected) superiority of one of them, the experiments, or at least the analysis, should ideally be performed in a blinded way. The authors should clarify how they avoided experimental bias.

      For in vitro experiments, multiple independent trials, including analyses, were performed by two (or more) researchers to ensure reproducibility and to minimize bias. The results and conclusions have been highly consistent across trials and researchers. Following the suggestion, we have ensured that in vivo experiments and data analyses were conducted separately by researchers from two different labs. For long-term expression/imaging, the differences between GCaMP-X and GCaMP were often discernible directly in the images even without further calculations or statistics (e.g., Figure 3B). Related information can be found in the Methods (Page 32, Line 799).

      In their chronic Ca2+ fluorescence imaging of autonomous Ca2+ oscillations in cultured cortical neurons, ultra-long-lasting signals could be observed (Fig. 3B, DIV 17, GCaMP6m). It would be helpful to further describe the nature of these transients, ideally by adding them to the video collection.

      As suggested by the reviewer, the video for Figure 3B (DIV 17, GCaMP6m) has been included in this revision (Figure 3—video supplement 2). In contrast to the oscillatory signals normally observed in healthy neurons, pronounced and sustained Ca2+ signals are associated with apoptosis and other pathological conditions in neurons (Khan 2020 PMID: 32989314; Nicotera 1998 PMID: 9601613; Harr 2010 PMID: 20826549). The broadened width (FWHM) of the Ca2+ waves indicated neuronal damage caused by GCaMP (Figure 3F), rather than (altered) sensing characteristics of GCaMP. We agree that this is a notable and interesting phenomenon, worth following up in future studies.
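      Since the argument above rests on the full width at half maximum (FWHM) of Ca2+ transients, the minimal Python sketch below shows one common way to compute it from a sampled trace. It assumes a single dominant transient that returns to a known baseline (e.g. a ΔF/F trace with baseline 0) and uses linear interpolation between samples; the function name fwhm is hypothetical and this is not the authors' analysis pipeline.

```python
def fwhm(t, y, baseline=0.0):
    """Full width at half maximum (same units as t) of the largest transient.

    Assumes one dominant peak that rises from, and returns to, `baseline`
    within the trace. Half-maximum crossings are linearly interpolated.
    """
    if len(t) != len(y) or len(t) < 3:
        raise ValueError("t and y must be equal-length, with >= 3 samples")
    i_peak = max(range(len(y)), key=lambda k: y[k])
    if y[i_peak] <= baseline:
        raise ValueError("no transient above baseline")
    half = baseline + (y[i_peak] - baseline) / 2.0

    # Rising edge: step left until the trace is at or below half maximum.
    i = i_peak
    while i > 0 and y[i] > half:
        i -= 1
    if y[i] > half:
        raise ValueError("trace does not fall to half maximum before the peak")
    t_left = t[i] + (half - y[i]) * (t[i + 1] - t[i]) / (y[i + 1] - y[i])

    # Falling edge: step right until the trace is at or below half maximum.
    j = i_peak
    while j < len(y) - 1 and y[j] > half:
        j += 1
    if y[j] > half:
        raise ValueError("trace does not fall to half maximum after the peak")
    t_right = t[j - 1] + (half - y[j - 1]) * (t[j] - t[j - 1]) / (y[j] - y[j - 1])

    return t_right - t_left
```

      At the same sampling rate, a sustained plateau of the kind seen in damaged neurons yields a much larger FWHM than a brief oscillatory transient, which is why the width is a convenient summary of this difference.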

      The discussion is very long. In my opinion it would benefit from shortening, avoiding redundancies and focusing only on the key findings of this paper. This includes the chapter on design and application guidelines for CaM-based GECIs. The main message about the advantages of their GCaMP-X modifications has already been made earlier in the discussion. A more detailed discussion of this topic appears more suitable for a review article.

      In response to this suggestion, we have made the discussion as concise as possible by simplifying or removing several topics, including the design and application guidelines for CaM-based GECIs.

      It may be worthwhile to include another aspect in the discussion: does the improved GCaMP-Xc cause no change in neuronal function or morphology, or is it just less damaging than other GCaMPs? How can this issue be addressed experimentally?

      We have revised the discussion accordingly (Page 21, Line 588). We agree that additional experiments would help evaluate how closely GCaMP-X data reflect reality, considering the Ca2+-buffering effect intrinsic to Ca2+ probes as well as other factors. In light of this suggestion and those from Reviewer #1, we have incorporated more experimental controls, including Ai140 mice (GFP, Figure 7—figure supplement 2) and Fluo-4 AM (Ca2+ dye, Figure 3—figure supplement 4). The results have been encouraging in that GCaMP-X neurons were nearly indistinguishable morphologically and functionally from GFP or Fluo-4 AM controls. Feedback from GCaMP-X users should continue to help clarify this matter, which we intend to follow up.

    1. Author Response

      Reviewer #1 (Public Review):

      This study uses the mouse calyx of Held synapse as a model to explore the presynaptic role of rac1, a regulator of actin signaling in the brain. Many of the now-classical methods and theory pioneered by Neher and colleagues are brought to bear on this problem. Additionally, the authors were able to make a cell-specific knockout of rac1 by developing a novel viral construct to express cre in the globular bushy cells of the cochlear nucleus; by doing this in a rac1 floxed mouse, they were able to KO rac1 in these neurons starting at around P14. The authors found that KO of rac1 enhanced EPSC amplitude, vesicle release probability, quantal release rates, EPSC onset time and jitter during high-frequency activity, and fast recovery rates from depression. Because calyx synapses are among the largest and most reliable of central nerve terminals, none of these effects altered suprathreshold transmission during 'in vivo-like' stimulus protocols. Moreover, synapse morphology was unaffected. Through some unavoidably serpentine reasoning, the authors suggest that loss of rac1 affects the so-called molecular priming of vesicles, possibly due to a restructuring of actin barriers at the active zone. The experimental analysis is at a very high level, and the work is definitely an important contribution to the field of presynaptic physiology and biophysics. It will be important to test the effects of the KO on other synapses that are not such high performers as the calyx, and this direction might reveal significant effects of altered rac1 expression on information processing.

      We thank the reviewer for their comments and view that our work is an important contribution to the field of presynaptic physiology and biophysics.

      Major points:

      1) The measurement of onset delay was used to test whether rac1-/- affects positional priming. While there is a clear effect of the KO on the latency to EPSC onset, no single interpretation is possible, owing to the ambiguity of the 'onset delay'. Note that in the Results the authors state (lines 201-203): "The time between presynaptic AP and EPSC onset (EPSC onset delay) is determined by the distance between SVs and VGCC which defines the time it takes for Ca2+ to bind to the Ca2+ sensor and trigger SV release (Fedchyshyn and Wang, 2007)." However, the Methods state: "The duration between stimulus and EPSC onset was defined as EPSC onset delay." Thus the 'onset' measured is not between presynaptic spike and EPSC but between axonal stimulus and EPSC. KO of rac1 might also affect spike generation, spike conduction, calcium channel function, etc. Indeed, some additional options are offered in the Discussion. Since the change in onset is ~100 µs at most, a number of small factors could all contribute here. Moreover, the authors conclude that the KO does NOT affect positional priming, since they would have expected the onset to shorten given the other enhancements observed in earlier sections.

      It seems to me that all the authors can really conclude is that the onset shifted and they do not know why. If onset is driven by multiple factors, and differentially affected in the KO, then all bets are off. Thus, data in this section might be removed, or at least the authors could further qualify their interpretations given this ambiguity.

      We have further qualified and clarified our interpretation of the EPSC onset measurements by adding text to the Discussion (see lines 475-491). We would like to emphasize that we do not see a statistically significant change in EPSC1 onset delay, or in EPSC onset delays during 50 Hz train stimulation, between Rac1+/+ and Rac1−/− synapses, but rather an activity-dependent increase in EPSC onset delays in Rac1−/− synapses during 500 Hz stimulation. Based on these data, it is less likely that changes in spike generation, spike conduction, or calcium channel function are responsible for the change in EPSC onset delay. If SVs were closer to CaV2.1 channels, we would expect a shorter initial EPSC onset delay or shorter EPSC onset delays during 50 Hz stimulation. However, changes in spike generation, spike conduction or calcium channel function could contribute to the increase in EPSC onset delay at 500 Hz. Finally, EPSC onset delays increased during 50 Hz and 500 Hz stimulation in Rac1+/+ synapses, indicating activity-dependent regulation, and this activity-dependent increase was more pronounced in Rac1−/− synapses during both 50 Hz and 500 Hz stimulation (Fig 4B1-B3).

      2) If the idea is that the loss of Rac1 leads to a reduced actin barrier at the active zone, is there an ultrastructural way to visualize this, labeling for actin for example? Authors conclude that new techniques are needed, but perhaps this is 'just' an EM question.

      We are not aware of a method for ultrastructural visualization of actin and SV distributions relative to the plasma membrane. This would require specific labeling and detection of actin filaments while visualizing SVs by EM. While EM on samples prepared by high-pressure freezing with freeze substitution allows detection of filamentous structures near the AZ, the molecular identity of these structures would remain uncertain. Super-resolution microscopy is amenable to immunohistochemical labeling of actin, but visualizing SVs in 3D by super-resolution microscopy is a major technical challenge. Furthermore, changes in SV docking on the scale of 1-2 nanometers are correlated with severe changes in SV release, so we would need to quantify structural changes at this level of resolution. We are not aware of any study that has analyzed SV docking or reported changes on the scale of 1-2 nm using super-resolution light microscopy. It might be possible to achieve such resolution with expansion microscopy, but the respective protocols would need to be established for the calyx synapse. In addition, the regulation of actin filaments is proposed to be transient and to occur on very fast time scales, which complicates its investigation by conventional methods (O'Neil et al., 2021). Thus, even if we solved all these technical hurdles and were able to label actin, we could well miss potential differences. Therefore, while we agree that such ultrastructural data would substantially strengthen our hypothesis, developing the techniques and protocols needed for these experiments would likely require many months if not years.

      3) Authors use 1 mM kynurenic acid in the bath to avoid postsynaptic receptor saturation. But since this is a competitive antagonist and since the KO shows a large increase in release, could saturation or desensitization have been enhanced in the KO? This would affect the interpretation of recovery rates in the KO, which are quite fast.

      We agree with the reviewer that differences in saturation or desensitization could potentially affect the measured recovery time course in Rac1−/− synapses. However, we think this is unlikely for the following reasons. Desensitization and saturation of synaptic AMPARs are strongly reduced during calyx synapse maturation (Taschenberger et al., 2002; Taschenberger et al., 2005). We recorded from >P28 calyx synapses, which exhibit a claw-like, fenestrated terminal morphology offering many diffusional exits for released glutamate; this is expected to speed up transmitter clearance and therefore reduce postsynaptic effects (Taschenberger et al., 2005; Yang et al., 2021). We used 1 mM kynurenic acid in the external bath solution, which reduced EPSC amplitude by ~90% in both Rac1+/+ and Rac1−/− synapses, comparable to previous reports (e.g. Lipstein et al., 2021). In our study, we performed all experiments in 1.2 mM Ca2+ and at body temperature, which further reduces EPSC amplitudes and minimizes potential receptor saturation and desensitization compared with 2 mM Ca2+ at room temperature. Time constants of recovery from desensitization at the calyx range from 30 ms at P14-P16 (Joshi et al., 2004) to 16 ms at P21 (Koike-Tani et al., 2008), both measured at room temperature. It is conceivable that these time constants are considerably shorter at P30 and at physiological temperature. Since we observed the largest effect on recovery at intervals between 1 and 4 seconds, one to two orders of magnitude longer than even the room-temperature time constants, recovery from desensitization is unlikely to account for it. Finally, our numerical simulations are consistent with the faster recovery in Rac1−/− synapses being a direct consequence of changes in SV priming. This faster pool replenishment likely also enabled the increased steady-state EPSC amplitudes at 50 Hz in Rac1−/− synapses. The fact that we were able to measure enhanced steady-state release in Rac1−/− synapses argues against steady-state EPSC amplitudes being limited by AMPAR desensitization.

      Reviewer #2 (Public Review):

      The aim of the study is an improved understanding of the role of the RhoGTPase Rac1 in neurotransmitter release beyond the known roles in synaptogenesis and postsynaptic function. To this end, Rac1 is ablated at P12 (when synapse development has largely progressed to maturation) and transmission is studied at the adult stage (P28 onwards). The study reports a number of interesting findings, in particular, a large increase in synaptic strength, which is interpreted as an '... increased release probability, which results in faster SV replenishment'. It is not clear whether this statement is supposed to suggest a causal relationship or just a correlation between the two parameters. By and large, the discussion of results is somewhat fuzzy with respect to the distinction between release itself (as characterized by release probability) and priming steps, which precede release.

      Besides, the authors present valuable data on Rac1-dependent timing and synchronicity of neurotransmitter release, which point towards a role of Rac1 in 'positional priming', i.e. the proper localization of synaptic vesicles relative to Ca2+ channels.

      We thank the reviewer for pointing out that our study presents valuable data on Rac1-dependent timing and synchronicity of neurotransmitter release.

    1. Author Response

      Reviewer #1 (Public Review):

      Redox signaling is a dynamic and concerted orchestra of interconnected cellular pathways. There is an ongoing debate about whether ROS (reactive oxygen species) are friend or foe. Continued research is needed to dissect how ROS generation and progression diverge in physiological versus pathophysiological states. Similarly, there are several paradoxical studies (both animal and human) in which the health benefits of exercise were reported to be accompanied by increases in ROS generation. It is in this context that the present manuscript deserves attention.

      Using in vitro studies as well as a mouse model, this manuscript illustrates the different regulatory mechanisms of exercise and antioxidant intervention on redox balance and blood glucose level in diabetes. The manuscript does have some limitations and might need additional experiments and explanation.

      The authors should consider addressing the following comments with additional experiments.

      1) Although hepatic AMPK activation appears to be a central signaling element for the benefits of moderate exercise and glucose control, additional signals (in hepatic tissue) related to hepatic gluconeogenesis, such as Forkhead box O1 (FoxO1), phosphoenolpyruvate carboxykinase (PEPCK), and GLUT2, need to be profiled to present a holistic approach. The authors should consider this and revise the manuscript.

      We appreciate the constructive suggestion. Besides glycolysis, gluconeogenesis and glucose uptake are critical in maintaining liver and blood glucose homeostasis.

      FoxO1 has been tightly linked with hepatic gluconeogenesis through promoting the transcription of the gluconeogenesis-related genes encoding PEPCK and G6Pase (1, 2). Here, we found that FoxO1 expression increased in the diabetic group but was reduced in the CE, IE and EE groups (Fig. X1A, Fig.5E-F in manuscript). Meanwhile, the mRNA levels of Pepck and G6PC (one of the three G6Pase catalytic-subunit-encoding genes) also decreased in the CE, IE, and EE groups (Fig. X1B-1C, Fig.5H-I in manuscript). These results indicate that all three modes of exercise inhibited gluconeogenesis by down-regulating FoxO1.

      For glucose uptake, we measured the protein expression of GLUT2 in liver tissue. GLUT2, a glucose sensor in the liver, mediates glucose uptake by hepatocytes for glycolysis and glycogenesis. We found that GLUT2 was up-regulated in diabetic rats but down-regulated by the CE and IE interventions. However, GLUT2 did not decrease in the EE group, which is consistent with the lack of improvement in blood glucose after EE intervention (Figure X1A, Fig.5E and 5G in manuscript).

      Taken together, moderate exercise could benefit glucose control by increasing glycolysis and decreasing gluconeogenesis. We have added this part on Page 9, lines 251-263 and in Figure 5E-5I of this version.

      Figure X1. A. Representative protein level and quantitative analysis of FOXO1 (82 kDa), GLUT2 (60-70 kDa) and Actin (45 kDa) in the rats in the Ctl, T2D, T2D + CE, T2D + IE and T2D + EE groups. B-C. Expression of hepatic Pepck and G6PC mRNA in the Ctl, T2D, T2D + CE, T2D + IE and T2D + EE groups, evaluated by real-time PCR analysis. Values represent mean ratios of Pepck and G6PC transcripts normalized to GAPDH transcript levels.
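      The legend states that transcript levels were normalized to GAPDH but does not say how the ratios were computed. One widely used approach is the 2^-ΔΔCt (Livak) method, sketched below in Python purely as an illustration. It assumes near 100% amplification efficiency for both target and reference genes and uses replicate Ct values; the function name relative_expression is hypothetical and this may not be the procedure the authors used.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target transcript versus the control group (2^-ΔΔCt).

    Each argument is a list of replicate threshold cycles (Ct). Assumes near
    100% amplification efficiency for both the target (e.g. Pepck, G6PC) and
    the reference gene (e.g. GAPDH).
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)  # normalize to reference gene
    d_ct_control = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** -dd_ct  # each extra cycle halves the estimated abundance
```

      With a target Ct one cycle later than in controls at an unchanged reference Ct, the estimated expression is half the control level (fold change 0.5).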

      2) Sestrin2 signaling has recently attracted significant attention in relation to exercise and antioxidant responses. Therefore, the authors should profile sestrin2 levels, as it is linked to several targets such as mTOR, AMPK and Sirt1. Additionally, the levels of Nrf2 should be reported, as this is the central regulator of the threshold mechanisms of oxidative stress and ROS generation.

      We appreciate the reviewer’s expert comments. Nrf2 is an important mediator of antioxidant signaling and plays a fundamental role in maintaining cellular redox homeostasis. Under unstressed conditions, Nrf2 activity is suppressed by its innate repressor Kelch-like ECH-associated protein 1 (Keap1) (3). As ROS levels increase during the development of diabetes, Nrf2 is activated and induces the transcription of several antioxidant enzymes (4, 5).

      Nrf2 expression has been reported to increase in HFD mice or diabetic patients (6, 7). In vitro studies have shown that Nrf2 activation is achieved with acute exposure to high glucose, whereas longer incubation times or oscillating glucose concentrations failed to activate Nrf2 (8, 9). This suggests that the increase in ROS in diabetes can cause compensatory upregulation of Nrf2. In our study, Nrf2 increased in diabetic rats, which can further initiate the expression of antioxidant enzymes. As shown in Fig.X2A (Fig.2H-2K in manuscript), Grx and Trx, which are involved in thiol redox metabolism, were up-regulated along with Nrf2. After CE intervention, the level of Nrf2 increased further (Fig.2E-2F), suggesting that CE intervention could activate the antioxidant system to achieve a high-level redox balance. We have added these new results to Figure 2.

      On the other hand, the expression levels of Sestrin2 and Nrf2 decreased after antioxidant supplementation. Our results suggest that antioxidant treatment improved diabetes by lowering ROS levels to achieve a low-level redox balance, whereas moderate exercise enhanced ROS tolerance to achieve a high-level balance (Fig.X2D-F, Fig.3E-3G in manuscript).

      We have added the new data to “Page 5 line 147-153 and Page 7 line 183-186” and to Figures 2-3 in the current version.

      Figure X2. A-C. Representative protein level and quantitative analysis of Nrf2 (97 kDa), Sestrin2 (57 kDa) and Actin (45 kDa) in the rats in the Ctl, T2D and T2D + CE groups. D-F. Representative protein level and quantitative analysis of Nrf2 (97 kDa), Sestrin2 (57 kDa) and HSP90 (90 kDa) in the rats in the Ctl, T2D and T2D + APO groups.

      3) Authors should discuss the exercise-associated hormesis curve. They should discuss whether moderate exercise could decrease the sensitivity to oxidative stress by altering the bell-shaped dose-response curve.

      We thank the reviewer for this valuable comment. Radak et al. proposed a bell-shaped dose-response curve relating physiological function to ROS level in healthy individuals, and suggested that moderate exercise can extend or stretch the range of ROS levels tolerated while increasing physiological function (10). Our results support this hypothesis and further suggest that, consistent with the bell-shaped curve, moderate exercise produces ROS while increasing antioxidant enzyme activity to maintain a high-level redox balance, whereas excessive exercise generates higher levels of ROS, leading to reduced physiological function. In this study, we found that the state of diabetic individuals is better described by an S-shaped curve, owing to the high level of oxidative stress and the decreased reducing capacity in diabetic individuals (Fig.8B). As ROS increase, the physiological function of diabetic individuals gradually declines and enters a state of redox imbalance. Moderate exercise shifts the S-shaped curve into a bell-shaped dose-response curve, thus reducing the sensitivity of diabetic individuals to oxidative stress and restoring redox homeostasis. However, with excessive exercise, ROS production increases beyond the threshold range of redox balance, resulting in decreased physiological function (Fig.8B, see the descending portion of the bell curve to the right of the apex).

      Antioxidant intervention, by contrast, increased physiological activity by reducing ROS levels in diabetic individuals, restoring a bell-shaped dose-response curve at a low level of ROS (Fig.8B). Redox balance can therefore be achieved either at a low level of ROS, mediated by antioxidant intervention, or at a high level of ROS, mediated by moderate exercise, both of which are regulated by AMPK activation. Thus, both high- and low-level redox balance can support high physiological function as long as they fall within the threshold range of redox balance, and AMPK activation is an important marker of the dynamic redox balance that exercise or antioxidant intervention establishes to restore physiological function. Accordingly, we speculate that antioxidant intervention on top of moderate exercise might offset the effect of exercise, whereas antioxidants could be beneficial during excessive exercise. A human study also supports the view that antioxidant supplementation may preclude the health-promoting effects of exercise (11). Personalized interventions that take redox balance into account will therefore be crucial for the effective treatment of patients with diabetes.

      We have added this part to the Discussion in this version (Page 13-14 line 389-418).

      4) It would not be ideal to single out AMPK as the sole biomarker in this manuscript. Instead, the authors should consider AMPK activation and associated signaling in relation to redox balance. This should also be presented in Fig 7.

      We thank the reviewer for these critical comments. Accordingly, we have discussed AMPK signaling in the Discussion (Page 13, line 373-384) and added the AMPK signaling pathway to Fig.8A.

      Reference:

      1. R. A. Haeusler, K. H. Kaestner, D. Accili, FoxOs function synergistically to promote glucose production. J Biol Chem 285, 35245-35248 (2010).
      2. J. Nakae, T. Kitamura, D. L. Silver, D. Accili, The forkhead transcription factor Foxo1 (Fkhr) confers insulin sensitivity onto glucose-6-phosphatase expression. J Clin Invest 108, 1359-1367 (2001).
      3. M. McMahon, K. Itoh, M. Yamamoto, J. D. Hayes, Keap1-dependent proteasomal degradation of transcription factor Nrf2 contributes to the negative regulation of antioxidant response element-driven gene expression. J Biol Chem 278, 21592-21600 (2003).
      4. R. S. Arnold et al., Hydrogen peroxide mediates the cell growth and transformation caused by the mitogenic oxidase Nox1. Proc Natl Acad Sci U S A 98, 5550-5555 (2001).
      5. J. M. Lee, M. J. Calkins, K. Chan, Y. W. Kan, J. A. Johnson, Identification of the NF-E2-related factor-2-dependent genes conferring protection against oxidative stress in primary cortical astrocytes using oligonucleotide microarray analysis. J Biol Chem 278, 12029-12038 (2003).
      6. T. Jiang et al., The protective role of Nrf2 in streptozotocin-induced diabetic nephropathy. Diabetes 59, 850-860 (2010).
      7. X. H. Wang et al., High Fat Diet-Induced Hepatic 18-Carbon Fatty Acids Accumulation Up-Regulates CYP2A5/CYP2A6 via NF-E2-Related Factor 2. Front Pharmacol 8, 233 (2017).
      8. T. S. Liu et al., Oscillating high glucose enhances oxidative stress and apoptosis in human coronary artery endothelial cells. J Endocrinol Invest 37, 645-651 (2014).
      9. Z. Ungvari et al., Adaptive induction of NF-E2-related factor-2-driven antioxidant genes in endothelial cells in response to hyperglycemia. Am J Physiol Heart Circ Physiol 300, H1133-1140 (2011).
      10. Z. Radak et al., Exercise, oxidants, and antioxidants change the shape of the bell-shaped hormesis curve. Redox Biol 12, 285-290 (2017).
      11. M. Ristow et al., Antioxidants prevent health-promoting effects of physical exercise in humans. Proc Natl Acad Sci U S A 106, 8665-8670 (2009).
    1. Author Response

      Reviewer #2 (Public Review):

      Klein et al. have developed a high-throughput tracker to evaluate operant conditioning in Drosophila larvae. Employing this device, they train larvae to prefer bending towards one specific side (left or right), using as unconditioned stimulus (US) the optogenetic activation of dopaminergic and serotoninergic neurons, and demonstrate that larvae are able to perform this behaviour. Furthermore, they show that serotoninergic neurons alone are sufficient to mediate the reward signal, and that serotoninergic neurons in the VNC specifically are required for this behaviour. However, they do not show whether serotoninergic VNC neurons are sufficient. The results are interesting and novel. Operant conditioning had been shown for adult Drosophila. Furthermore, the existence of VNC circuits sufficient for operant conditioning had been shown for other species, as the authors point out in the discussion. Nonetheless, the genetic dissection identifying serotonin-expressing neurons as mediators of operant conditioning in the Drosophila larva, and the identification of VNC serotonin cells as necessary, are new. Furthermore, given the experimental advantages of the Drosophila larva, including genetic accessibility and a full connectome, the findings open the door to future research into the circuit mechanisms of operant conditioning. I have some comments that I think would be important to address.

      The high-throughput tracker is impressive. However, there is insufficient documentation to ensure that an expert would be able to reproduce it easily. All of the hardware assembly files, the list of materials, the electronic circuit maps and all of the required software need to be appropriately documented and uploaded to a public repository. This is a basic requirement when publishing new hardware/software, particularly in an open journal such as eLife.

      We have now included all the documentation and CAD files for the high-throughput tracker. The software is publicly available in the following Github repository (https://github.com/ZlaticLab/multi-larva-tracker-scripts-public). The CAD files are available in the Supplementary materials of the paper.

      • The differences observed in the results of operant conditioning are very subtle (see for example figure 3c), which means that it is extremely important that the statistical analyses are performed correctly. The sample number (n) for these experiments is very high (n>100) and, as I understand it, is not equivalent to the number of animals, because the same animal can contribute n>1, e.g. n = 2 or n = 3 if it collides once or twice, as the larva is given a new identity after each collision. This means that the datapoints collected are not independent, and I think that in this case a Wilcoxon rank-sum test is not the appropriate test. I recommend that the authors and eLife editors consult an expert in this type of statistics. Alternatively, the authors could, for each experiment, take into account only the data from larvae that did not collide, and for those that collided only the data before the collision. This can be calculated easily, as they just need to exclude from the analysis of each experiment all larval IDs larger than the initial number of larvae identified by the software.

      We apologise if we did not clarify sufficiently that we only took into account (for each time bin) larvae that did not collide. Within the Materials and methods, we describe how objects retained for analysis had to satisfy several criteria. The first criterion is that the object needed to be detected in every frame of the given 60 s bin. In this way, the object identity is stable throughout the bin - a reflection that the object did not collide with another object. In other words, within a single time bin, the same animal only contributes once. Text has been added to the Materials and methods to clarify that this first criterion is selecting for larvae that did not collide.
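      The first inclusion criterion described above (an object must be detected in every frame of a 60 s bin, so that its identity, and hence the animal, is stable) can be expressed compactly. The Python sketch below is a hypothetical illustration: it assumes the tracker output is a list of (frame_index, object_id) pairs and that a frame rate is known; neither the data layout nor the function name larvae_retained is taken from the authors' repository.

```python
from collections import defaultdict


def larvae_retained(detections, bin_start_s, bin_len_s, frame_rate_hz):
    """IDs of objects detected in every frame of one time bin.

    `detections` is an iterable of (frame_index, object_id) pairs. Because the
    tracker issues a new ID after a collision, an ID present in every frame
    of the bin belongs to an animal that did not collide during that bin.
    """
    first = round(bin_start_s * frame_rate_hz)
    n_frames = round(bin_len_s * frame_rate_hz)
    frames_seen = defaultdict(set)
    for frame, obj_id in detections:
        if first <= frame < first + n_frames:
            frames_seen[obj_id].add(frame)
    return sorted(obj for obj, frames in frames_seen.items() if len(frames) == n_frames)
```

      For a 60 s bin one would call, for example, larvae_retained(detections, bin_start_s=0, bin_len_s=60, frame_rate_hz=...) with the tracker's actual frame rate. Within one bin, each retained ID then contributes a single data point.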

      The reviewer mentions that the Wilcoxon rank-sum test is not the appropriate nonparametric test for dependent samples. We agree. Accordingly, the test used for within-bin comparisons was the Wilcoxon signed-rank test, which is also nonparametric but is designed for dependent (paired) samples. We therefore believe that there is no need to reconsider the statistical tests used.

      • The finding that serotonergic neurons in the VNC, which, with the line they used, amount to only 2 neurons per VNC hemisegment, are required for operant conditioning is very interesting. It would be great if they could also test whether they are sufficient. It seems that they would just need to make two split-Gal4 lines, one for tsh and one for tph, so the experiment does not seem too difficult and would significantly add to their findings.

      Generating new intersections is beyond the scope of this already large study, which has been significantly impacted by the pandemic. We have therefore added the following text explaining that we have identified candidate serotonergic neurons that are required for operant learning, and that identifying specific single neuron types that may be sufficient would be an exciting avenue for future follow-up work.

      In the Results section entitled “Serotonergic VNC neurons may play a role in operant conditioning of bend direction”, we have added:

      “The Tph-Gal4 expression pattern contains two neurons per VNC hemisegment (with the exception of a single neuron in each A8 abdominal hemisegment; Huser et al., 2012). Future experiments exclusively targeting a single serotonergic neuron per VNC hemisegment could be valuable in determining whether they are sufficient for operant learning.”

      In the Discussion section entitled: “Automated operant conditioning of Drosophila larvae”

      “Furthermore, developing sparser lines that target single serotonergic and dopaminergic neuron types will enable the identification of the smallest subsets of neurons that are sufficient for providing the operant learning signal. Behavioural experiments with these genetic lines may have the added benefit of mitigating conflicting or non-specific reinforcement signalling.”

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript is clear and well-written and provides a novel and interesting explanation of different illusions in visual numerosity perception. However, the model used in the manuscript is very similar to Dehaene and Changeux (1993) and the manuscript does not clearly identify novel computational principles underlying the number sense, as the title would suggest. Thus, while we were all enthusiastic about the topic and the overall findings, the paper currently reads as a bit of a replication of the influential Dehaene & Changeux (1993)-model, and the authors need to do more to compare/contrast to bring out the main results that they think are novel.

      Major concerns:

      1) The model presented in the current manuscript is very similar to the Dehaene and Changeux 1993 model. The main difference is in the implementation of lateral inhibition in the DoG layer where the 1993 model used a recurrent implementation, and the current model uses divisive normalization (see minor concern #1). The lateral inhibition was also identified as a critical component of numerosity estimation in the 1993 model, so the novelty in elucidating the computational principles underlying the number sense in the current manuscript is not evident.

      If the authors hypothesize that the particular implementation of lateral inhibition used here is more relevant and critical for the number sense than the forms used in previous work (e.g., the recurrent implementation of the 1993 model or the local response normalization of the more recent models), then a direct comparison of the effects of the different forms is necessary to show this. If not, then the focus of the manuscript should be shifted (e.g., changing the title) to the novel aspects of the manuscript such as the use of the model to explain various visual illusions and adaptation and context effects.

      Thank you for bringing up these issues. We acknowledge that clear explanations of the key differences between the proposed model and that of Dehaene & Changeux (hereafter D&C) were lacking. Please see our revisions below, where we: 1) explain the D&C model and its limitations in more detail; 2) describe our critical changes to the D&C model; and 3) show how those critical changes allow a novel way to explain numerosity perception.

      The paragraph in the Introduction where we first introduce D&C is modified to read:

      “The computational model of Dehaene and Changeux (1993) explains numerosity detection based on several neurocomputational principles. That model (hereafter D&C) assumes a one-dimensional linear retina (each dot is a line segment), and responses are normalized across dot size via a convolution layer that represents combinations of two attributes: 1) dot size, as captured by difference-of-Gaussian contrast filters of different widths; and 2) location, by centering filters at different positions. In the convolution layer, the filter that matches the size of each dot dominates the neuronal activity at the location of the dot owing to a winner-take-all lateral inhibition process. To indicate numerosity, a summation layer pools the total activity over all the units in the convolution layer. While the D&C model provided a proof of concept for numerosity detection, it has several limitations as outlined in the discussion. Of these, the most notable is that strong winner-take-all in the convolution layer discretizes visual information (e.g., discrete locations and discrete sizes yielding a literal count of dots), which is implausible for early vision. As a result, the output of the model is completely insensitive to anything other than number in all situations, which is inconsistent with empirical data (Park et al., 2021).”

      The revised Discussion describes our critical modifications to D&C and their consequences.

      “At first blush, the current model might be considered an extension of Dehaene and Changeux (1993). However, there are four ways in which the current model differs qualitatively from the D&C model. First, the D&C model is one-dimensional, simulating a linear retina, whereas we model a two-dimensional retina feeding into center-surround filters, allowing application to the two-dimensional images used in numerosity experiments (Fig. 1A). Second, extreme winner-take-all normalization in the convolution layer of the D&C model implausibly limits visual precision by discretizing the visual response. For example, the convolution layer in the D&C model only knows which of 9 possible sizes and 50 possible locations occurred. In contrast, by using divisive normalization in the current model, each dot produces activity at many locations and many filter sizes despite normalization, and a population could be used to determine exact location and size. Third, extreme winner-take-all normalization also eliminates all information other than dot size and location. By using divisive normalization, the current model represents other attributes such as edges and groupings of dots (Fig. 1B), and these other attributes provide a different explanation of number sensitivity as compared to D&C. For example, the D&C model as applied to the spacing effect between two small dots (Fig. 4A) would represent the dots as existing discretely at two close locations versus two far locations, with the total summed response being two in either case. In contrast, the current model gives the same total response for a different reason. Although the small filters are less active for closely spaced dots, the closely spaced dots look like a group as captured by a larger filter, with this addition for the larger filter offsetting the loss for the smaller filter. Similarly, as applied to the dot size effect (Fig. 4B), the D&C model would only represent the larger dots using larger filters. In contrast, the current model represents larger dots with larger filters and with smaller filters that capture the edges of the larger dots, and yet the summed response remains the same in each case owing to divisive normalization (again, there are offsetting factors across different filter sizes). The final difference is that the D&C model does not include temporal normalization, which we show to be critical for explaining adaptation and context effects.”
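      The contrast drawn above between winner-take-all and divisive normalization can be sketched for a single location with four filter sizes. The exponent n, the semi-saturation constant sigma, the rectification, and the restriction of the normalization pool to one location are illustrative assumptions; the model described here normalizes over spatial neighborhoods (and over time), and the argmax below is only a shorthand for the extreme outcome of D&C's recurrent lateral inhibition.

```python
def winner_take_all(drives):
    """Keep only the strongest filter at this location (extreme, all-or-none normalization)."""
    best = max(range(len(drives)), key=lambda i: drives[i])
    return [1.0 if i == best else 0.0 for i in range(len(drives))]


def divisive_normalization(drives, sigma=1.0, n=2.0):
    """r_i = d_i^n / (sigma^n + sum_j d_j^n), with half-wave rectification (an assumption).

    sigma and n are hypothetical values chosen for illustration.
    """
    powered = [max(d, 0.0) ** n for d in drives]
    denom = sigma ** n + sum(powered)
    return [p / denom for p in powered]


# Hypothetical center-surround drives for four filter sizes at one dot location.
drives = [0.5, 2.0, 1.5, 0.2]
print(winner_take_all(drives))          # only the best-matching size survives
print(divisive_normalization(drives))   # graded profile across all sizes
```

Winner-take-all keeps only the best-matching filter, whereas divisive normalization preserves a graded profile across filter sizes, so information beyond a single discrete size (for example, smaller filters responding to edges) remains available downstream.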

      In sum, the current model explains a wider range of effects by using representations and processes that more closely reflect early vision. The change to two-dimensions allows application to real images. The inclusion of temporal normalization allows application to temporal effects. The change from winner-take-all to divisive normalization might appear to be a parameter setting, but it’s one that produces qualitatively different results and explanations (e.g., representations of edges and groupings that are part of the explanation of selective sensitivity to number). These behaviors are consistent with empirical data and are qualitatively different from that of the D&C model. Now that we’ve highlighted the ways in which this model differs qualitatively from the D&C model, we hope that our original title still works.

      Reviewer #2 (Public Review):

      This is a very interesting and novel model of numerosity perception, based on known computational principles of the visual system: center-surround mechanisms at various scales, combined with divisive normalization (over space and time). The model explains, at least qualitatively, several of the important aspects of numerosity perception.

      Firstly, the model makes major and minor predictions. Major: the effect of adaptation (at least 30%), as well as independence from density and dot size; minor: tiny effects such as irregularity (around 6%). I think it would make sense to separate these. To my knowledge, it is the first model to account for adaptation, which was the major effect that brought numerosity into the realm of psychophysics, and it explains it effortlessly, using an intrinsic component of the model (divisive normalization) rather than an ad hoc add-on. This should be highlighted more. Perhaps the fit could also be more quantitative. Murphy and Burr (who they cite) showed that the adaptation is rapid. How does this fit the model? Very well, I would have thought.

      Thanks for the positive evaluation of our work. In the revised manuscript, we followed the reviewer’s suggestion to highlight the novelty of the model in its explanation of numerosity adaptation. As the reviewer says, one significant aspect of our work is that the model can explain a relatively large effect of numerosity adaptation with minimal effort. To be clear, even though we call it “numerosity” adaptation, the model does not know number in any explicit way. One way to highlight this aspect, we thought, is to compare the current adaptation results to a simulation where the adaptor and target are defined along the dimensions of size or spacing. In such cases (which are now reported in Fig. S6 and S7), no reliable under- or over-estimation was observed. These results suggest that numerosity adaptation is a natural byproduct of divisive normalization working across space and time.

      The question about the rapidity of adaptation is indeed an interesting one. However, the current model is not designed to simulate the effect of exposure duration on neural activity. More specifically, the current model operates across trials and stimuli (e.g., one response per stimulus), using a single parameter that captures the temporal gradient of divisive normalization from prior trials (e.g., the influence of two trials ago as compared to one trial ago). As currently formulated, the model does not address adaptation at the level of milliseconds, as would be necessary to model adaptor duration. Modeling adaptation at the millisecond level requires a dynamic model that specifies not only the rate of adaptation but also the rate of recovery from adaptation, such as the visual orientation adaptation model of Jacob, Potter, and Huber (2021), which includes the dynamics of synaptic depression and synaptic recovery. In future work we hope to make such modifications to the model to expand the range of explained effects. Nevertheless, a dynamic version of the model should encompass this simpler trial-by-trial version as a special case. Our goal in this study was a clear demonstration of the neural mechanisms underlying numerosity in early vision, and so we have attempted to keep the model as simple as possible while still capturing neural behavior.
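      A minimal sketch of the trial-by-trial temporal normalization described above, assuming that earlier trials enter the normalization pool with geometrically decaying weights lam**k (k trials back). The functional form, lam, sigma, and the use of a single summed drive per trial are assumptions for illustration, not the model's published equations.

```python
def temporally_normalized_response(drive_history, sigma=1.0, lam=0.5):
    """Normalize the current trial's drive by a pool that also includes earlier trials.

    drive_history: summed drives per trial, oldest first; the last entry is the current trial.
    lam: hypothetical single decay parameter; the trial k steps back gets weight lam**k.
    sigma: hypothetical semi-saturation constant.
    """
    current = drive_history[-1]
    pool = sum(lam ** k * d for k, d in enumerate(reversed(drive_history)))
    return current / (sigma + pool)


# Same test stimulus (drive 10) after a high-drive adaptor, a low-drive adaptor, or none.
after_many = temporally_normalized_response([20.0, 10.0])
after_few = temporally_normalized_response([5.0, 10.0])
no_adaptor = temporally_normalized_response([10.0])
print(after_many, after_few, no_adaptor)
```

The same test drive yields a smaller response after a high-drive adaptor than after a low-drive one, which is the direction of numerosity adaptation, without any explicit representation of number. Millisecond-level dynamics such as synaptic depression and recovery are outside this sketch.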

      We have elected not to fit data; instead, we explored the model's behavior qualitatively, asking whether the commonly observed numerosity effects emerge from the model in the qualitatively correct direction regardless of its parameter values (e.g., as reported in Fig S2). This approach follows from our central aim, which is to explain the neurocomputational principles of the number sense rather than to produce a detailed model with specific parameter values fit to data. Our aim was to show that the correct qualitative behaviors naturally emerge from these principles without requiring specific parameter values (and, more importantly, to show how these behaviors emerge from these principles).

      Jacob, L. P., Potter, K. W., & Huber, D. E. (2021). A neural habituation account of the negative compatibility effect. Journal of Experimental Psychology: General, 150(12), 2567.

      Among the tiny predicted effects (visually indistinguishable bar graphs) is the connectedness effect. But this is in fact large, up to 20%. I would say they fail here, by predicting only 6%. And I would say this is to be expected, as the illusion relies on higher-order properties (grouping), which would not immediately result from normalization. Furthermore, the illusion varies with individual personality traits (Pomè et al, JAD, 2021). The fact that it works with very thin lines suggests that it is not the physical energy of the lines that normalizes, but the perceptual grouping effect. I would either drop it, or give it as an example of where the predictions are in the right direction, but clearly fall short quantitatively. No shame in saying that they cannot explain everything with low-level mechanisms. A future revised model could incorporate grouping phenomena.

      Thank you for the suggestion. We agree that trying to explain the connectedness illusion with center-surround filters is not ideal. As the reviewer says, the main driver of the connectedness illusion is likely to be groupings of dots. The current model captures groupings of dots, but it does so in a circularly symmetric way, which is not ideal for capturing the oblong groupings (barbells) that are likely to play a role in the connectedness illusion. It is probably because of this mismatch (between the shape of the groupings and shape of the filters) that the model produces a smaller magnitude connectedness illusion. If the model included a subsequent convolution layer in which the filters were oriented lines of different sizes, it would likely produce a larger connectedness illusion. Following the reviewer’s suggestion, we have placed the connectedness illusion in the supplementary materials and only refer to this in the future directions section of the discussion, writing:

      “Another line of possible future work concerns divisive normalization in higher cortical levels involving neurons with more complex receptive fields. While the current normalization model with center-surround filters successfully explained visual illusions caused by regularity, grouping, and heterogeneity, other numerosity phenomena such as topological invariants and statistical pairing (He et al., 2015; Zhao and Yu, 2016) may require the action of neurons with receptive fields that are more complex than center-surround filters. For example, another well-known visual illusion is the effect of connectedness, whereby an array with dots connected pairwise with thin lines is underestimated (by up to 20%) compared to the same array without the lines connected (Franconeri et al., 2009). This underestimation effect likely arises from barbell-shaped pairwise groupings of dots, rather than the circularly symmetric groupings of dots that are captured with center-surround filters. Nonetheless, a small magnitude (6%) connectedness illusion emerges with center-surround filters (Fig. S10). Augmenting the current model with a subsequent convolution layer containing oriented line filters and oriented normalization neighborhoods of different sizes might increase the predicted magnitude of the illusion.”

      In short, I like the model very much, but think the manuscript could be packaged better. Bring out the large effects more, especially those that have never been explained previously (like adaptation). And try to be more quantitative.

      Thank you. We now highlight the novel computational demonstrations of adaptation to a greater degree and—as also suggested by Reviewer 1—provide more quantitative reports of the illusory effects that the model naturally produces.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors leverage novel computational tools to detect, classify and extract information underlying sharp-wave ripples, and synchronous events related to memory. They validate the applicability of their method to several datasets and compare it with a filtering method. In summary, they found that their convolutional neural network detection captures more events than the commonly used filter method. This particular capability of capturing additional events which traditional methods don't detect is very powerful and could open important new avenues worth further investigation. The manuscript in general will be very useful for the community as it will increase the attention towards new tools that can be used to solve ongoing questions in hippocampal physiology.

      We thank the reviewer for the constructive comments and appreciation of the work.

      Additional minor points that could improve the interpretation of this work are listed below:

      • Spectral methods could also be used to capture the variability of events if used properly or run several times through a dataset. I think adjusting the statements where the authors compare CNN with traditional filter detections could be useful as it can be misleading to state otherwise.

      We thank the reviewer for this suggestion. We would like to emphasize that we do not advocate at all for abandoning filters. We feel that a combination of methods is required to improve our understanding of the complex electrophysiological processes underlying SWR. We have adjusted the text as suggested. In particular, a) we removed the misleading sentence from the abstract and instead stated the need for new automatic detection strategies; b) we edited the introduction similarly and clarified the need for improved online applications.

      • The authors show that their novel method is able to detect "physiological relevant processes" but no further analysis is provided to show that this is indeed the case. I suggest adjusting the statement to "the method is able to detect new processes (or events)".

      We have corrected text as suggested. In particular, we declare that “The new method, in combination with community tagging efforts and optimized filter, could potentially facilitate discovery and interpretation of the complex neurophysiological processes underlying SWR.” (page 12).

      • In Fig.1 the authors show how they tune the parameters that work best for their CNN method and from there they compare it with a filter method. In order to offer a fairer comparison, analogous tuning of the filter parameters should be tested alongside, to show that filters can also be tuned to improve the detection of "ground truth" data.

      Thank you for this comment. As explained before, we ran a parameter study for the filter on the very same sessions used to train the CNN (results below). The parameters chosen (100-300 Hz band, order 2) provided maximal performance on the test set. Therefore, both methods were similarly optimized during training. This is now included (page 4): “In order to compare CNN performance against spectral methods, we implemented a Butterworth filter, whose parameters were optimized using the same training set (Fig.1-figure supplement 1D).”
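      For readers unfamiliar with the spectral baseline, a generic offline band-pass detector using the reported settings (100-300 Hz, order 2) might look like the sketch below. The envelope threshold in standard deviations, the minimum duration, and the use of SciPy's zero-phase sosfiltfilt are assumptions; the authors' exact detection pipeline is not specified here. Zero-phase filtering is not causal, so an online detector would need a causal filter such as scipy.signal.sosfilt instead.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def detect_ripples(lfp, fs, band=(100.0, 300.0), order=2, threshold_sd=3.0, min_dur_s=0.015):
    """Return (start_s, end_s) intervals where the ripple-band envelope exceeds a z threshold.

    threshold_sd and min_dur_s are hypothetical values, not the study's settings.
    """
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, lfp)              # zero-phase (offline only)
    envelope = np.abs(hilbert(filtered))          # instantaneous amplitude
    z = (envelope - envelope.mean()) / envelope.std()
    above = (z > threshold_sd).astype(int)
    edges = np.diff(above, prepend=0, append=0)   # +1 at run starts, -1 one past run ends
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    min_len = int(min_dur_s * fs)
    return [(s / fs, e / fs) for s, e in zip(starts, ends) if e - s >= min_len]


# Synthetic LFP: white noise plus a 50 ms, 150 Hz burst at t = 1.0 s.
fs = 2000.0
rng = np.random.default_rng(0)
t = np.arange(int(2 * fs)) / fs
lfp = rng.normal(0.0, 1.0, t.size)
burst = (t >= 1.0) & (t < 1.05)
lfp[burst] += 5.0 * np.sin(2 * np.pi * 150.0 * t[burst])
print(detect_ripples(lfp, fs))
```

Every parameter here (band, order, threshold, minimum duration) changes what counts as an event, which is why the threshold sensitivity discussed in this response matters in practice.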

      • Showing a manual score of the performance of their CNN method detection with false positive and false negative flags (and plots) would be clarifying in order to get an idea of the type of events that the method is able to detect and fails to detect.

      We have added information of the categories of False Positives for both the CNN and the filter in the new Fig.4F. We have also prepared an executable figure to show examples and to facilitate understanding how the CNN works. See new Fig.5 and executable notebook https://colab.research.google.com/github/PridaLab/cnn-ripple-executable-figure/blob/main/cnn-ripple-false-positive-examples.ipynb

      • In fig 2E the authors show the differences between CNN with different precision and the filter method, while the performance is better the trends are extremely similar and the numbers are very close for all comparisons (except for the recall where the filter clearly performs worse than CNN).

      This refers to the external dataset (Grosmark and Buzsaki 2016), which is now in the new Fig.3E. To address this point and to improve the statistical report, we have added more data, resulting in 5 sessions from 2 rats. The data confirm the better performance of the CNN model versus the filter. The purpose of this figure is to show the effect of the definition of the ground truth on the performance of different methods, and also the proper performance of the CNN on external datasets without retraining. Please note that in Grosmark and Buzsaki, SWR detection was conditioned on the coincidence of both population synchrony and LFP definition, thus providing a “partial ground truth” (i.e. SWR without population firing were not annotated in the dataset).
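      Precision, recall, and F1 against a ground truth can be computed from event intervals as sketched below. The matching rule (any temporal overlap, greedy one-to-one) is an assumption; other criteria, such as a minimum overlap fraction, would change the counts. The sketch also makes the point about a partial ground truth concrete: a correctly detected SWR that is absent from the annotations is counted as a false positive and lowers precision.

```python
def match_events(detected, truth):
    """Score detections against ground-truth intervals (start, end) in seconds.

    A detection is a true positive if it overlaps a not-yet-matched true event
    (greedy, one-to-one; a hypothetical rule chosen for illustration).
    Returns (precision, recall, f1).
    """
    unmatched = list(truth)
    tp = 0
    for d_start, d_end in detected:
        for i, (t_start, t_end) in enumerate(unmatched):
            if d_start < t_end and t_start < d_end:  # intervals overlap
                tp += 1
                del unmatched[i]
                break
    fp = len(detected) - tp
    fn = len(truth) - tp
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = [(1.0, 1.05), (2.0, 2.06), (3.0, 3.04)]
detected = [(1.01, 1.04), (2.05, 2.10), (4.0, 4.02)]  # the last one is unannotated
print(match_events(detected, truth))  # 2 TP, 1 FP, 1 FN
```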

      • The authors acknowledge that various forms of SWRs not consistent with their common definition could be captured by their method. But theoretically, it could also be the case that, due to the spectral continuum of the LFP signals, noisy features of the LFP could also be passed as "relevant events"? Discussing this point in the manuscript could help with the context of where the method might be applied in the future.

      As suggested, we have mentioned this point in the revised version. In particular: “While we cannot discard noisy detection from a continuum of LFP activity, our categorization suggests they may reflect processes underlying buildup of population events (de la Prida et al., 2006). In addition, the ability of CA3 inputs to bring about gamma oscillations and multi-unit firing associated with sharp-waves is already recognized (Sullivan et al., 2011), and variability of the ripple power can be related with different cortical subnetworks (Abadchi et al., 2020; Ramirez-Villegas et al., 2015). Since the power spectral level operationally defines the detection of SWR, part of this microcircuit intrinsic variability may be escaping analysis when using spectral filters” (page 16).

      • In Fig. 5 the authors claim that there are striking differences in firing rate and timing of pyramidal cells when comparing events detected in different layers (compared to the SP layer). This is not very clear from the figure, as plots 5G and 5H show that the main differences are when compared with SO and SLM.

      We apologize for the confusion. We meant that the analysis was performed by comparing properties of SWR detected at SO, SR and SLM using values z-scored by those of SWR detected at SP only. We clarified this point in the revised version: “We found larger sinks and sources for SWR that can be detected at SLM and SR versus those detected at SO (Fig.7G; z-scored by mean values of SWR detected at SP only).” (page 14).

      • Could the above differences be related to the fact that the performance of the CNN could have different percentages of false-positive when applied to different layers?

      The rate of FP is similar across layers: 0.52 ± 0.21 for SO, 0.50 ± 0.21 for SR and 0.46 ± 0.19 for SLM. This is now mentioned in the text: “No difference in the rate of False Positives between SO (0.52 ± 0.21), SR (0.50 ± 0.21) and SLM (0.46 ± 0.19) can account for this effect.” (page 12)

      Alternatively, could the variability be related to the occurrence (and detection) of similar events in neighboring spectral bands (i.e., gamma events)? Discussion of this point in the manuscript would be helpful for the readers.

      We have discussed this point: “While we cannot discard noisy detection from a continuum of LFP activity, our categorization suggests they may reflect processes underlying buildup of population events (de la Prida et al., 2006). In addition, the ability of CA3 inputs to bring about gamma oscillations and multi-unit firing associated with sharp-waves is already recognized (Sullivan et al., 2011), and variability of the ripple power can be related with different cortical subnetworks (Abadchi et al., 2020; Ramirez-Villegas et al., 2015).” (Page 16)

      Overall, I think the method is interesting and could be very useful to detect more nuance within hippocampal LFPs and offer new insights into the underlying mechanisms of hippocampal firing and how they organize in various forms of network events related to memory.

      We thank the reviewer for constructive comments and appreciation of the value of our work.

      Reviewer #2 (Public Review):

      Navas-Olive et al. provide a new computational approach that implements convolutional neural networks (CNNs) for detecting and characterizing hippocampal sharp-wave ripples (SWRs). SWRs have been identified as important neural signatures of memory consolidation and retrieval, and there is therefore interest in developing new computational approaches to identify and characterize them. The authors demonstrate that their network model is able to learn to identify SWRs by showing that, following the network training phase, performance on test data is good. Performance of the network varied by the human expert whose tagging was used to train it, but when experts' tags were combined, performance of the network improved, showing it benefits from multiple inputs. When the network trained on one dataset is applied to data from different experimental conditions, performance was substantially lower, though the authors suggest that this reflected erroneous annotation of the data, and once corrected performance improved. The authors go on to analyze the LFP patterns that nodes in the network develop preferences for and compare the network's performance on SWRs and non-SWRs, both providing insight and validation about the network's function. Finally, the authors apply the model to dense Neuropixels data and confirm that SWR detection was best in the CA1 cell layer but could also be achieved at more distant locations.

      The key strengths of the manuscript lie in a convincing demonstration that a computational model that does not explicitly look for oscillations in specific frequency bands can nevertheless learn to detect them from tagged examples. This provides insight into the capabilities and applications of convolutional neural networks. The manuscript is generally clearly written and the analyses appear to have been carefully done.

      We thank the reviewer for the summary and for highlighting the strengths of our work.

      While the work is informative about the capabilities of CNNs, the potential of its application for neuroscience research is considerably less convincing. As the authors state in the introduction, there are two potential key benefits that their model could provide (for neuroscience research): 1. improved detection of SWRs and 2. providing additional insight into the nature of SWRs, relative to existing approaches. To this end, the authors compare the performance of the CNN to that of a Butterworth filter. However, there are a number of major issues that limit the support for the authors' claims:

      Please see below our answers to the specific questions, which we hope clarify the validity of our approach.

      • Putting aside the question of whether the comparison between the CNN and the filter is fair (see below), it is unclear if even as is, the performance of the CNN is better than a simple filter. The authors argue for this based on the data in Fig. 1F-I. However, the main result appears to be that the CNN is less sensitive to changes in the threshold, not that it does better at reasonable thresholds.

      This comment now refers to the new Fig.2A (offline detection) and Fig.2C,D (online detection). Starting with offline detection: yes, the CNN is less sensitive to the threshold than the filter, and that has major consequences both offline and online. For the filter to reach its best performance, the threshold has to be tuned, which is a time-consuming process. Importantly, this is only doable when you know the ground truth. In practical terms, most labs run a semi-automatic detection approach where they first detect events and then manually validate them. The fact that the filter is more sensitive to thresholds makes this process very tedious. The CNN, instead, is more stable.

      In trying to be fair, we also tested the performance of the CNN and the filter at their best (i.e. using the threshold that provides the best match with the ground truth). This is shown in Fig.3A. There are no differences between methods, indicating that the CNN meets the gold standard provided the filter is optimized. Note again that this is only possible if you know the ground truth, because optimization is based on looking for the best threshold per session.

      Importantly, both methods reach their best performance at the expert’s limit (gray band in Fig.3A,B). They cannot be better than the individual ground truth. This is why we advocate for community tagging collaborations to consolidate sharp-wave ripple definitions.

      Moreover, the mean performance of the filter across thresholds appears dramatically dampened by its performance at particularly poor thresholds (Fig. 1F, I, weak traces). How realistic these poorly tested thresholds are is unclear. The single direct statistical test of difference in performance is presented in Fig. 1H, but it is unclear if there is a real difference there, as graphically it appears that animals and sessions from those animals were treated as independent samples (and comparing only animal averages or only sessions clearly does not show a significant difference).

      Please note that this refers to online detection. We are not sure we understand the comment on whether the thresholds are realistic. To clarify, we detect SWR online using thresholds that we similarly optimize for the filter and the CNN over the course of the experiment. This is reported in Fig.2C both per session and per animal, reaching statistical significance (we added more experiments to increase statistical power). Since online-defined thresholds may still not be the best, we then annotated these data and ran an additional post hoc offline optimization analysis, which is presented in Fig.2D. We hope this is now clearer in the revised version.

      Finally, the authors show in Fig. 2A that for the best threshold the CNN does not do better than the filter. Together, these results suggest that the CNN does not generally outperform the filter in detecting SWRs, but only that it is less sensitive to usage of extreme thresholds.

      We hope this is now clarified. See our response to your first bullet point.

      Indeed, I am not convinced that a non-spectral method could even theoretically do better than a spectral method to detect events that are defined by their spectrum, assuming all other aspects are optimized (such as combining data from different channels and threshold setting).

      As can be seen in the responses to the editor's synthesis, we have optimized the filter parameters in the same way (new Fig.1-supp-1D), and there is no improvement from using more channels (see below). In any case, we would like to emphasize that we do not at all advocate abandoning filters. We feel that a combination of methods is required to improve our understanding of the complex electrophysiological processes underlying SWR.

      • The CNN network is trained on data from 8 channels but it appears that the compared filter is run on a single channel only. This is explicitly stated for the online SWR detection and presumably, that is the case for the offline as well. This unfair comparison raises the possibility that whatever improved performance the CNN may have may be due to considerably richer input and not due to the CNN model itself. The authors state that a filter on the data from a single channel is the standard, but many studies use various "consensus" heuristics, e.g. in which elevated ripple power is required to be detected on multiple channels simultaneously, which considerably improves detection reliability. Even if this weren't the case, because the CNN learns how to weight each channel, to argue that better performance is due to the nature of the CNN it must be compared to an algorithm that similarly learns to optimize these weights on filtered data across the same number of channels. It is very likely that if this were done, the filter approach would outperform the CNN as its performance with a single channel is comparable.

      We appreciate this comment. Using one channel to detect SWR is very common for offline detection followed by manual curation. In some cases, a second channel is used either to veto spurious detections (using a non-ripple channel) or to confirm detection (using a second ripple channel and/or a sharp-wave) (Fernandez-Ruiz et al., 2019). Many others use detection of population firing together with the filter to identify replay (such as in Grosmark and Buzsaki 2019, where ripples were conditioned on the coincidence of both population firing and LFP-detected ripples). To address this comment, we compared performance using different combinations of channels, from the standard detection at the SP layer (pyr) up to 4 and 8 channels around SP, using the consensus heuristic. As can be seen, filter performance is consistent across configurations, and using 8 channels does not improve detection. We clarify this in the revised version: ”We found no effect of the number of channels used for the filter (1, 4 and 8 channels), and chose that with the higher ripple power” (see caption of Fig.1-supp-1D).
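
      For readers unfamiliar with the multi-channel "consensus" heuristic discussed above, the following is a minimal sketch of one common variant: each channel is band-pass filtered, its Hilbert envelope is z-scored, and a sample is flagged only when at least `min_channels` channels exceed the threshold at the same time. The 150-250 Hz band, filter order, z-threshold and channel count are illustrative assumptions, not the parameters used in the manuscript.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def ripple_zscore(lfp, fs, band=(150.0, 250.0), order=3):
    """Z-scored ripple-band envelope for each channel of lfp (n_channels, n_samples)."""
    b, a = butter(order, band, btype="bandpass", fs=fs)  # assumed ripple band
    filtered = filtfilt(b, a, lfp, axis=-1)               # zero-phase band-pass per channel
    envelope = np.abs(hilbert(filtered, axis=-1))         # instantaneous amplitude
    mean = envelope.mean(axis=-1, keepdims=True)
    std = envelope.std(axis=-1, keepdims=True)
    return (envelope - mean) / std

def consensus_detection(lfp, fs, z_thresh=3.0, min_channels=2):
    """Boolean mask of samples where at least min_channels channels exceed z_thresh together."""
    votes = (ripple_zscore(lfp, fs) >= z_thresh).sum(axis=0)
    return votes >= min_channels
```

      Requiring agreement across channels mainly suppresses crossings caused by noise on a single channel; whether it helps in a given dataset is an empirical question, as the comparison above addresses.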

      • Related to the point above, for the proposed CNN model to be a useful tool in the neuroscience field it needs to be amenable to the kind of data and computational resources that are common in the field. As the network requires 8 channels situated in close proximity, the network would not be relevant for numerous studies that use fewer or spaced channels. Further, the filter approach does not require training and it is unclear how generalizable the current CNN model is without additional network training (see below). Together, these points raise the concern that even if the CNN performance is better than a filter approach, it would not be usable by a wide audience.

      Thank you for this comment. To handle different input channel configurations, we have developed an interpolation approach, which transforms any data into 8-channel inputs. We are currently applying the CNN without retraining to data from several labs using different electrode numbers and configurations, including tetrodes, linear silicon probes and wires. Results confirm the performance of the CNN. Since we cannot disclose these third-party data here, we have looked for a new dataset from our own lab to illustrate the case. See below results from 16-channel silicon probes (100 µm inter-electrode separation), where the CNN performed better than the filter (F1: p=0.0169; Precision: p=0.0110; 7 sessions from 3 mice). We found that the performance of the CNN depends on the laminar LFP profile, as the Neuropixels data illustrate.
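
      The response does not specify how the interpolation works. Purely as an illustration, one simple option is linear interpolation of the LFP along the probe depth axis onto 8 evenly spaced virtual channels. The function name, the even spacing and the use of linear interpolation are all assumptions here, not a description of the authors' method.

```python
import numpy as np

def interpolate_to_n_channels(lfp, depths_um, n_out=8):
    """Linearly resample lfp (n_channels, n_samples) onto n_out evenly spaced depths.

    depths_um must be strictly increasing (one depth per input channel).
    """
    depths = np.asarray(depths_um, dtype=float)
    if np.any(np.diff(depths) <= 0):
        raise ValueError("depths must be strictly increasing")
    out_depths = np.linspace(depths[0], depths[-1], n_out)
    # Index of the input channel just above each virtual channel.
    idx = np.searchsorted(depths, out_depths, side="right") - 1
    idx = np.clip(idx, 0, len(depths) - 2)
    # Fractional position between the two neighbouring input channels.
    w = (out_depths - depths[idx]) / (depths[idx + 1] - depths[idx])
    return (1 - w)[:, None] * lfp[idx] + w[:, None] * lfp[idx + 1]
```

      For the 16-channel probes mentioned above (100 µm spacing), this would map 16 input rows onto 8 virtual channels spanning the same depth range.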

      • A key point is whether the CNN generalizes well across new datasets as the authors suggest. When the model trained on mouse data was applied to rat data from Grosmark and Buzsaki, 2016, precision was low. The authors state that "Hence, we evaluated all False Positive predictions and found that many of them were actually unannotated SWR (839 events), meaning that precision was actually higher". How were these events judged as SWRs? Was the test data reannotated?

      We apologize for not explaining this better in the original version. We chose Grosmark and Buzsaki 2016 because it provides an “incomplete ground truth”, since (citing their Methods) “Ripple events were conditioned on the coincidence of both population synchrony events, and LFP detected ripples”. This means there are LFP ripples not included in their GT. This dataset provides a very good example of how the experimental goal (examining replay and thus relying on population firing plus LFP definitions) may limit the ground truth.

      Please, note we use the external dataset for validation purposes only. The CNN model was applied without retraining, so it also helps to exemplify generalization. Consistent with a partial ground truth, the CNN and the filter recalled most of the annotated events, but precision was low. By manually validating False Positive detections, we re-annotated the external dataset and both the CNN and the filter increased precision.

      To make the case clearer, we now include more sessions to increase the data size and test for statistical effects (Fig.3E). We also changed the example to show more cases of re-annotated events (Fig.3D). We have clarified the text: “In that work, SWR detection was conditioned on the coincidence of both population synchrony and LFP definition, thus providing a “partial ground truth” (i.e. SWR without population firing were not annotated in the dataset).” (see page 7).

      • The argument that the network improves with data from multiple experts while the filter does not requires further support. While Fig. 1B shows that the CNN improves performance when the experts' data is combined and the filter doesn't, the final performance on the consolidated data does not appear better in the CNN. This suggests that performance of the CNN when trained on data from single experts was lower to start with.

      This comment refers to the new Fig.3B. We apologize for not having included a between-method comparison in the original version. To address this, we now include a one-way ANOVA for the effect of the type of ground truth on each method, and an independent one-way ANOVA for the effect of the method on the consolidated ground truth. To increase statistical power, we have added more data. We also detected a mistake involving duplicated data in the original figure, which has been corrected. Importantly, the rationale behind the experts' consolidated data is that there is about 70% consistency between experts, so many SWR remain unannotated in the individual ground truths. These are typically ambiguous events that may generate discussion between experts, such as sharp-waves with population firing and few ripple cycles. Since the CNN is better at detecting them, its performance improves when data from multiple experts are integrated.

      Further, regardless of the point in the bullet point above, the data in Fig. 1E does not convincingly show that the CNN improves while the filter doesn't as there are only 3 data points per comparison and no effect on F1.

      Fig.1E shows an example, so we guess the reviewer refers to the new Fig.2C, which shows data on online operation, where we originally reported the analysis per session and per animal separately with only 3 mice. We have run more experiments to increase the data size and test for statistical effects (8 sessions, 5 mice; per session p=0.0047; per mouse p=0.033; t-test). This is now corrected in the text and in the Fig.1C caption. Please note that a post hoc offline evaluation of these online sessions confirmed better performance of the CNN versus the filter for all normalized thresholds (Fig.2D).
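
      The reviewer's concern that sessions from the same animal were treated as independent samples can be addressed by reporting both levels of analysis. The sketch below runs a t-test on session-level scores and again on per-animal means. The response only states "t-test"; using a paired test (`scipy.stats.ttest_rel`), on the grounds that both methods are scored on the same sessions, is an assumption, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import ttest_rel

def session_and_animal_tests(score_cnn, score_filter, animal_ids):
    """Paired t-tests at the session level and on per-animal means."""
    score_cnn = np.asarray(score_cnn, dtype=float)
    score_filter = np.asarray(score_filter, dtype=float)
    animal_ids = np.asarray(animal_ids)
    per_session = ttest_rel(score_cnn, score_filter)
    animals = np.unique(animal_ids)
    # Average sessions within each animal so that each animal contributes one pair.
    cnn_means = np.array([score_cnn[animal_ids == a].mean() for a in animals])
    filter_means = np.array([score_filter[animal_ids == a].mean() for a in animals])
    per_animal = ttest_rel(cnn_means, filter_means)
    return {
        "n_sessions": score_cnn.size,
        "p_session": float(per_session.pvalue),
        "n_animals": animals.size,
        "p_animal": float(per_animal.pvalue),
    }
```

      The animal-level test has fewer degrees of freedom, so it is the more conservative of the two and avoids pseudo-replication.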

      • Apart from the points above regarding the ability of the network to detect SWRs, the insight into the nature of SWRs that the authors suggest can be achieved with CNNs is limited. For example, the data in Fig. 3 is a nice analysis of what the components of the CNN learn to identify, but the claim that "some predictions not consistent with the current definition of SWR may identify different forms of population firing and oscillatory activities associated to sharp-waves" is not thoroughly supported. The data in Fig. 4 is convincing in showing that the network better identifies SWRs than non-SWRs, but again the insight is about the network rather than about SWRs.

      In the revised version, we have now included validation of all false positives detected by the CNN and the filter (Fig.4F). To help the reader examine examples of True Positive and False Positive detections, we also include a new figure (Fig.5), which comes with executable code (see page 9). We also include comparisons of the features of TP events detected by both methods (Fig.2B), which show that SWR events detected by the CNN exhibited features more similar to those of the ground truth (GT) than those detected by the filter. We feel the entire manuscript provides support for these claims.

      Finally, the application of the model on Neuropixels data also nicely demonstrates the applicability of the model on this kind of data but does not provide new insight regarding SWRs.

      We respectfully disagree. Please note that the application to ultra-dense Neuropixels recordings not only applies the model to an entirely new dataset without retraining, but also shows that some SWR with larger sinks and sources can actually be detected at input layers (SO, SR and SLM). Importantly, those events result in different firing dynamics, providing mechanistic support for heterogeneous behavior underlying, for instance, replay.

      In summary, the authors have constructed an elegant new computational tool and convincingly shown its validity in detecting SWRs and applicability to different kinds of data. Unfortunately, I am not convinced that the model convincingly achieves either of its stated goals: exceeding the performance of SWR detection or providing new insights about SWRs as compared to considerably simpler and more accessible current methods.

      We thank you again for your constructive comments. We hope you are now convinced of the value of the new method in light of the newly added data.

    1. Author Response

      We thank the reviewers for their very thorough, detailed, and fair reviews, which will help us improve the manuscript. We have two minor comments. First, we emphasize that the evidence is for pervasive positive selection being the main driver of the genetic diversity of Atlantic cod. Second, we comment on the application of the Moran process to model the reproduction of high-fecundity organisms. In the Moran process, a single individual is chosen at random to reproduce at any time, and another individual is chosen to die. However, the parent also persists in the population and can generate a large number of offspring over its lifetime. Hence, the Moran process does not imply an especially low level of fecundity. The multiple mergers seen in coalescent models of highly fecund organisms arise from a combination of high fecundity and reproductive skew; models of high fecundity without skew are consistent with genealogies with binary mergers only. Hence, the Durrett-Schweinsberg model we employ can be thought of as a model for a highly fecund organism in which reproductive skew manifests through selective sweeps.
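
      The point that a Moran parent persists and can accumulate offspring over its lifetime can be checked with a toy simulation. The sketch below runs a neutral Moran process (no selection and no sweeps, so it is not the Durrett-Schweinsberg model) with a hypothetical population size and records each individual's lifetime offspring count. In this formulation the individual that dies is drawn uniformly and may be the parent itself.

```python
import random
from collections import Counter

def moran_lifetime_offspring(n: int, steps: int, seed: int = 0) -> Counter:
    """Simulate a neutral Moran process of constant size n for `steps` events.

    Returns a Counter mapping individual ID -> number of offspring it produced.
    """
    rng = random.Random(seed)
    population = list(range(n))       # IDs of the n individuals currently alive
    next_id = n                       # ID given to the next newborn
    offspring = Counter()
    for _ in range(steps):
        parent = rng.choice(population)   # one individual chosen to reproduce
        dead_slot = rng.randrange(n)      # one individual chosen to die (may be the parent)
        offspring[parent] += 1            # the parent stays alive unless it was chosen to die
        population[dead_slot] = next_id   # the newborn takes the dead individual's place
        next_id += 1
    return offspring
```

      Each event produces exactly one birth, so the counts sum to the number of steps; the spread of counts across individuals shows that those surviving longer accumulate more offspring.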

    1. Author Response

      Public Evaluation Summary:

      This is potentially an interesting paper in which extensive MD simulations are used to probe the effect of phosphorylation of a tyrosine residue on the conformational ensemble of Ras GTPase. The insights form the basis for a screen of small molecule(s) that disrupt interaction with its target Raf kinase, and predictions are tested experimentally. Overall, the integrated approach is of interest to a wide range of biochemists and protein scientists and could potentially be used to modulate the activities of other proteins.

      We would like to thank the reviewers for their valuable comments/suggestions. We provided detailed responses to the questions raised by the reviewers and also submit the revised manuscript where the modified parts are highlighted in yellow. We believe that the original manuscript is improved in light of these changes.

      In the revised version, we (i) increased the number of replicates of MD simulations to four per system studied, (ii) extended previous simulations, which were presented in the original submission, up to 1 µs to test the statistical significance of the main results, and (iii) increased the number of SMDs to 70 per system. We provided time-line data for each replicate of the classical MD simulation in the SI and showed the results obtained from these combined trajectories in the main text along with respective statistical error values. We also repeated calculations such as RMSF, PCA, and the number of waters including the new trajectories and provided updated values/distribution plots in the revised version.

      In general, we obtained results similar to those presented in the original submission, except for the flexibilities of G60 and Q61. Upon inclusion of the new replicates, these two residues displayed similar behavior across the systems studied, as presented in Table 1. On the other hand, the two residues reached relatively higher RMSF values in the phosphorylated RAS when the calculated error values are considered. We presented these values in Table 1 and revised the text accordingly.

      Also, we revised a part of the original submission pertaining to the criterion used for describing the opening of the nucleotide binding pocket in HRASG12D. We noticed that Q61 was not considered for describing the width of the nucleotide binding pocket in the references provided. It is also important to mention that the opening of the nucleotide binding pocket, described by the distance between the Cα atoms of D12 and D34, did not change with the distance measured between the side chain of Q61 and the γ-phosphate atom of GTP. Therefore, we dropped the respective distribution of Q61 in the revised version.

      In the application of the PSP methodology, we increased the number of SMD simulations for each of the ligand-bound and ligand-free systems to 70. We also made a more detailed analysis of the results, and we can now rely on not just the qualitative features of the PMFs, but also on the quantities obtained. In particular, the large barrier to cavity opening (ca. 30 kcal/mol) in the ligand-bound form is now clearly shown, and the fact that cerubidine binding leads to a barrierless transition that requires about 1/3 of the energy is demonstrated.

      Reviewer #3 (Public Review):

      In their manuscript "Inhibition of mutant RAS-RAF interaction by mimicking structural and dynamic properties of phosphorylated RAS", Ilter, Kasmer, et al. search for druggable sites in the RAS mutant G12D in computer calculations, and verify their results by experiments. RAS is a major oncogene for various types of cancer and is notoriously hard to target with drugs. Any significant insight into how to find drugs targeting RAS mutants is therefore of high interest. The present manuscript tries to provide such insight, and the connection between simulation and theory appears sound, as the identified compound cerubidine apparently indeed blocks mutant RAS activity.

      As I am an expert in simulations, but not in experiments, I will only focus on the presented computational part. In this function, however, I see some significant problems with the results: The data basis that the authors base their analysis on is quite small (only two simulations of 2.5 µs total simulation time), and from the presented data set I do not see any information on whether the results on Y32 dynamics are anecdotal or reproducible. All presented distance distribution plots lack error bars/error ranges, and time-course plots showing that the simulations have indeed converged are missing. So I cannot confirm whether the presented results are valid or if the authors were just lucky in their small data set.

      We would like to thank the reviewer for sharing his comments pertaining to the inadequacy of the data used. During the revision period, we performed additional simulations to have four replicates, each of about 1 µs, per system. For ligand-bound RAS systems, we ran the simulations until Switch I was displaced from the nucleotide-binding pocket and extended them for an additional ca. 200-300 ns to check whether it comes back to its original position. Time-line plots of the replicates of both ligand-bound and non-liganded systems are provided in Figures S4–S6 and S11–S14 of the SI of the revised MS. We also provided error values in the captions of the corresponding figures in the main text. The updated simulation times are provided in the Methods section. We presented the total simulation times of each ligand-bound RAS system in the SI.

      To show the convergence of the systems, we provided RMSD profiles for each replicate of the system studied in panel A of Figures S4–S6 and S11-S14. For HRASWT, HRASG12D, and HRASPY32, RMSDs reached a plateau after some time while those of ligand-bound systems did not, as Switch I was highly fluctuating. Importantly, we observed similar behavior in each replicate of the systems so it can be said that the results presented in the original MS are reproducible.
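
      RMSD profiles like those referred to above are typically computed after optimal superposition of each frame onto a reference structure. The sketch below uses the Kabsch algorithm with NumPy; the (n_atoms, 3) array layout and the function names are assumptions, and in practice an MD analysis package would normally be used for this step.

```python
import numpy as np

def kabsch_rmsd(coords, ref):
    """RMSD between coords and ref, both (n_atoms, 3), after optimal rigid superposition."""
    p = coords - coords.mean(axis=0)          # remove translation
    q = ref - ref.mean(axis=0)
    h = p.T @ q                               # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against an improper rotation (reflection)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p @ rotation.T                    # rotate every atom of the frame
    return float(np.sqrt(np.mean(np.sum((p_rot - q) ** 2, axis=1))))

def rmsd_series(trajectory, ref):
    """RMSD of each frame in trajectory (n_frames, n_atoms, 3) against ref."""
    return np.array([kabsch_rmsd(frame, ref) for frame in trajectory])
```

      A plateau in such a series is only a necessary sign of convergence; agreement between independent replicates, as reported here, is the stronger check.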

      Interestingly, Switch I was displaced in one of the four replicates of HRASG12D, which might lead to the release of the nucleotide from the pocket, thus triggering a transition towards the apo state. In fact, this observation does not contradict the findings in the literature.

      It has been shown that mutant RAS can also adopt the apo state albeit with low probability due to its low intrinsic GTPase activity. Therefore, except for KRASG12C, which has a relatively higher intrinsic GTPase activity, either the GDP or GTP-bound state of RAS mutants have been targeted for therapeutic purposes. This information is now included in the manuscript on page 2 of the current version.

      Furthermore, it might be that I have overlooked this information, but this work is not the first finding of druggable sites in RAS (see e.g. review of Moor et al., Nat. Rev. Drug Discov. 2020). The authors should include such a comparison in their manuscript.

      We would like to thank the reviewer for suggesting this comprehensive review. We included it along with the sentence below in the revised version of the manuscript (page 2):

      ‘In these studies, the mutant RAS was targeted directly or in combination with other proteins including SOS, tyrosine kinase, SHP2, and RAF. Also, except for the KRASG12C mutant, the GTP-bound state has been targeted, as RAS mutants either lose their intrinsic or GAP-mediated GTPase activity. However, the intrinsic GTPase activity of KRASG12C is relatively higher than the other mutants which enables targeting the GDP-bound state of KRAS (Moor et al., Nat. Rev. Drug Discov. 2020).’

      We would also like to clarify that we do not claim our study is the first in the field presenting druggable sites in RAS but rather we claim that the study provides a perspective for mimicking the impact of phosphorylation in targeting undruggable mutant RAS.

      Especially the PMF presented in Figure 9 is erroneous, and all arguments based on this plot need to be discarded from the manuscript. From the Methods and Eq. (9), I assume the authors indeed use only the first two cumulants to calculate the PMF. The artificially low PMF with a difference of up to ~800 kcal/mol is a well-understood artefact (see Jäger et al., J. Chem. Mol. Model. 2022) that indicates the breakdown of the second-order approximation in Eq. (9) due to the presence of different pathways in the steered MD data set. This artefact overlays the PMF and obfuscates any information on the true free energy profile.

      We thank the reviewer for these details. The pulling directions remain the same. We indeed found that an insufficient number of samples, together with the breakdown of the second-order approximation due to the presence of different pathways in the SMD data set, led to this behavior. We have also included a more detailed error analysis by implementing block averaging (this information now appears on page 18). We hope that the conclusions we draw from the updated PMF curves support the findings to the satisfaction of the reviewer.
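
      For context on the artefact the reviewer describes, the sketch below contrasts the exponential Jarzynski average with its second-order cumulant approximation, ΔG ≈ <W> - (β/2)(<W²> - <W>²), at a single point along the pulling coordinate, and adds a simple block-averaging error estimate. That this matches the manuscript's Eq. (9) is an assumption based on the reviewer's description; the temperature (300 K), kcal/mol units and function names are also assumptions.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def jarzynski_exponential(work, temperature=300.0):
    """Exponential Jarzynski estimate of dG from work samples (kcal/mol)."""
    beta = 1.0 / (KB_KCAL * temperature)
    w = np.asarray(work, dtype=float)
    w_min = w.min()  # shift by the minimum for numerical stability
    return w_min - np.log(np.mean(np.exp(-beta * (w - w_min)))) / beta

def jarzynski_cumulant2(work, temperature=300.0):
    """Second-order cumulant approximation: <W> - (beta/2) Var(W)."""
    beta = 1.0 / (KB_KCAL * temperature)
    w = np.asarray(work, dtype=float)
    return w.mean() - 0.5 * beta * w.var(ddof=1)

def block_average_error(values, n_blocks=5):
    """Standard error of the mean estimated from n_blocks contiguous block means."""
    blocks = np.array_split(np.asarray(values, dtype=float), n_blocks)
    means = np.array([b.mean() for b in blocks])
    return means.std(ddof=1) / np.sqrt(n_blocks)
```

      With a bimodal work distribution, as produced when pulls follow two different pathways, the cumulant estimate can fall far below every sampled work value, whereas the exponential average can never drop below the minimum sampled work.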

    1. Author Response

      Reviewer #1 (Public Review):

      A limitation here is that this colony morphology only seems to manifest strongly in mutants lacking flagella, which I don't think is common among wild P. aeruginosa isolates. To the extent that groups of P. aeruginosa cells have been imaged in situ, e.g. in the sputum of CF patients, this kind of channel formation does not occur in more realistic conditions. See DePas et al. (2015) https://journals.asm.org/doi/epub/10.1128/mBio.00796-16. I think it's more likely that this colony morphology is idiosyncratic to the agar growth substrate on which the cells are growing in this case, so the more interesting thing here is the physics of the system rather than its applications to clinical or ecological settings.

      We thank the Reviewer for appreciating the novelty of our work. We have revised the third paragraph of the Discussion section to limit the generality of our findings in clinical or ecological settings (lines 440-456). Our results are compared with in situ imaging of P. aeruginosa cells in sputum samples from cystic fibrosis patients, and the limitation of using flagellum mutants is highlighted.

      The authors have established that flgK-null P. aeruginosa forms colonies with channels in this agar growth and incubation environment, and made a strong case for the physics underlying the spontaneous formation of this morphology. The idea that this morphology reflects a multicellular developmental program for P. aeruginosa is not strong, though, as this morphology is not found in the wild. In general, the idea that groups of microbes on agar are analogous to multicellular organisms with circulatory systems has little support from in-situ imaging experiments, or from fundamental evolutionary theory. So, I would advise shifting the introduction and discussion away from the multicellular organism focus toward a greater focus on the physics of the system and its potential for synthetic systems. See for example Yan et al. (2019) https://elifesciences.org/articles/43920

      We thank the Reviewer for the suggestion. We now focus more on the physics of canal formation in Introduction and Discussion (revising/adding texts in lines 93-99 and restructuring the paragraphs in Discussion section). We also put greater emphasis on the application of our findings for engineering living materials based on synthetic microbial consortia (lines 58-64, 428-438), while deleting the texts related to the implication for multicellularity in Introduction/Discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The study presented by AL Seufert et al. follows the trajectory of trained immunity research in the context of sterile inflammatory diseases such as gout, cardiovascular disease and obesity. Previous studies in mice have shown that a 4 week Western-type diet is sufficient to induce systemic trained immunity, with gross reorganization of the bone marrow to support a potentiated inflammatory response [PMID: 29328911]. The current study demonstrates that mice on a Western-type diet (WD) and the more extreme Ketogenic diet (KD; where carbohydrates are essentially eliminated from the diet) for 2 weeks results in a state of increased monocyte-driven immune responsiveness when compared to standard chow diets (SC). This increased immune responsiveness after high-fat diet resulted in a deadly hyper-inflammatory response in the mice to endotoxin (LPS) challenge in vivo.

      These initial findings as displayed in Figure 1 are made difficult to interpret because the authors use a mix of male and female mice coupled with very small sample sizes (n = 5 - 9). Male and female mice are shown to have dimorphic responses to LPS exposure in vivo, with males having elevated cytokine levels (TNF, IL-6, IL1β, and, interestingly, IL-10) and increased rates of severe outcomes to LPS challenge [PMID: 27631979]. As a reader it is impossible to discern from their methodological description what the proportion of the sexes was in each group, and therefore to determine whether their data are skewed or biased due to sexually dimorphic responses to LPS rather than diet. Additionally, due to the very small sample sizes, the authors can't perform a stratified analysis based on sex to determine whether the diets are having the greatest effects in accordance with LPS-induced inflammation.

      The Reviewer brings up an important point: all studies with endotoxemia in wild-type conventional mice were carried out in 6–8-week-old female BALB/c mice, as mentioned in the Methods section under the “Ethical approval of animal studies” and “endotoxin-induced model of sepsis” sections. This is extremely important to state more clearly in the Results text, because Reviewer 1 is correct: sexual dimorphism and age differences can have very large effects on LPS treatment outcome. This was not stated clearly enough in the Results, and the age, sex, and background of mice have now been explicitly stated in each Results and Figure Legend section for each experiment.

      When comparing SC to the KD, the authors identify large changes in fatty acid distribution circulating in the blood. The majority of the fatty acids were shown to relate to saturated fatty acids (SFA). Although Lauric, Myristic, and Myristovaccenic acid were the most altered after KD, the authors focus their research on the more thoroughly studied palmitic acid (PA).

      We followed up on multiple saturated fatty acids (SFAs; Myristic, Lauric, and Behenic acid) that were identified in the lipidomic data, and found no robust or repeatable phenotypes in vitro using physiologically relevant concentrations. The inability to reproduce some of the findings with these SFAs may be due to the instability of some of these fats in solution, and we plan to troubleshoot these assays in order to understand the complexity of SFA-dependent control of inflammation in macrophages. Please see Fig. R1 in this document for data showing LPS-stimulated BMDMs pre-treated with Myristic (Fig R1 A-C), Lauric (Fig R1 D-F), or Behenic (Fig R1 G-I) fatty acids. The physiological concentrations used in these studies were referenced from Perreault et. al., 2014.

      Figure R1. The effect of Myristic Acid, Lauric Acid, and Behenic Acid on the response to LPS in macrophages. Primary bone marrow-derived macrophages (BMDMs) were isolated from age-matched (6-8 wk) C57BL/6 female and male mice. BMDMs were plated at 1 × 10^6 cells/mL and treated with either ethanol (EtOH; media with 0.05% or 0.35% ethanol to match MA and LA solutions respectively), media (Ctrl), LPS (10 ng/mL) for 24 h, or myristic or lauric acid (MA, LA stock diluted in 0.05%, or 0.35% EtOH; conjugated to 2% BSA) for 24 h, with and without a secondary challenge with LPS (10 ng/mL). After indicated time points, RNA was isolated and expression of (A, B) tnf, (D, E) il-6, and (G, H) il-1β was measured via qRT-PCR. RAW 264.7 macrophages were thawed and cultured for 3-5 days, pelleted and resuspended in DMEM containing 5% FBS and 2% BSA, and treated identically to BMDM treatments, with behenic acid (BA stock diluted in 1.7% EtOH) used as the primary stimulus. (C) tnf, (F) il-6, and (I) il-1β was measured via qRT-PCR. For all plates, all treatments were performed in triplicate. For all panels, a Student's t-test was used for statistical significance. *p < 0.05; **p < 0.01; ***p < 0.001. Error bars show mean ± SD.

      PA was shown to increase inflammatory cytokine gene expression and protein production of TNF, IL-6 and IL-1β in bone marrow derived macrophages (BMDMs). The authors tie these effects to ceramide synthesis through a pharmacological blockade as well as the use of oleic acid, which allegedly sequesters ceramide synthesis. The authors' claim that oleic acid supplementation reverses the inflammatory signaling induced by PA is invalid, as oleic acid was shown to induce a high level of cytokines in their model. When PA was added along with oleic acid, the cytokine levels returned to the levels produced by BMDMs stimulated with PA alone (see Figure 4 panels D- F).

      This was an unfortunate oversight in our revisions of this manuscript, original Figure 5A-C was mislabeled (though colored the correct colors) – OA-12h → LPS-24h should have been switched with PA-12h → LPS-24h. These data were labeled correctly in the source file: Source_data_Fig5 and have since been updated in Figure 5 of the manuscript with correct labels. The corrected graphs have been split up in the resubmission in light of new data collected. Please see Fig 3K-M and Fig 5A-C.

      Finally, the authors test whether injection of PA into mice can recapitulate the systemic inflammatory response seen by WD and KD feeding followed by LPS exposure. They were able to demonstrate that injecting 1 mM of PA, waiting for 12h, and then exposing the mice to LPS for 24h could similarly result in a hyper-inflammatory state resulting in greater mortality. The reviewer is skeptical that 1 mM of PA truly represents post-prandial PA levels as one would expect to see after a single fatty meal, and whether this injection is generally well tolerated by mice. Looking into the paper by Eguchi et al. cited to inform their methods, it's shown that the earlier study continuously infused an emulsified ethyl palmitate solution (which contained 600 mM) at a rate of 0.2 µL/min. As far as I can read Eguchi, they only managed to reach a serum PA concentration of 0.5 mM. This is hardly the same thing as a single i.p. injection of 1 mM PA, which reflects a single bolus injection of double the serum concentration of PA achieved by Eguchi et al.

      The reviewer brings up an important point: Eguchi et al. did use infusions. From their data (Fig 1A), we calculated that after i.v. infusion of 600 mM (total = 267 µL within 14 h; 0.2 µL/min) there was ~420 µM absolute PA within the blood. They were using C57BL/6 mice that weighed 23 g on average. Using these results, we extrapolated that a single 200 µL injection of a 750 mM PA solution in 6–8-week-old female BALB/c mice (~15-18 g) would equate to ~0.5-1 mM of PA within the blood. Considering that obese healthy and unhealthy humans vary widely in total PA concentrations in the blood (0.3-4.1 mM) (1, 2), we moved forward with these calculations. We thank the reviewer for this advice, and we agree that we have not definitively shown that we are increasing systemic levels of PA. Thus, we ran a lipidomic analysis of serum from SC-fed mice treated with Veh or PA for 12 h. We show that a 750 mM i.p. injection of ethyl palmitate enhances free PA levels in the serum to 173-425 μM at 2 h post-injection, which is within the reported range for humans on high-fat diets (0.3-4.1 mM). We have added these new data to Fig. S7A of the main manuscript.

      Importantly, the concentration in PA-treated mice is greater than that in Veh-treated mice. However, we believe the value shown underestimates the maximum serum PA level achieved by i.p. injection, because free PA is known to be packaged into chylomicrons within enterocytes and to travel through the circulation with a half-life of less than an hour (3, 4). Thus, serum concentrations of free PA are only transiently enhanced by i.p. injection, and free PA is quickly taken up by adipose tissue, skeletal muscle, heart, and liver. These complex lipid transport processes make it difficult to determine the maximum concentration of free PA in the serum.

      While not all of the details concerning PA circulation following an i.p. injection are known, we suggest that this method of “force-feeding” is similar to dietary intake in that uptake of PA into the circulation occurs within the peritoneal space before PA travels to the blood via the thoracic duct and right lymphatic duct (5).

      PA is known to induce inflammation in monocytes and macrophages, so the findings certainly make sense in the context of previously published literature. However, the authors have made some poor methodological decisions in their mouse studies, namely haphazardly switching between groups of young and old mice (4-6 weeks, 8-9 weeks, and 14-23 weeks), using different LPS injection protocols (6, 10, and 50 mg/ml of LPS), and including multiple sexes of mice. All of these drastically alter the interpretation of the data and prevent solid conclusions from being drawn.

      We appreciate this review and suggest that:

      1) For the LPS models, mice were all female and age-matched at 6-8 weeks. We are aware of sex differences in the endotoxemia model, which is why we specifically use female mice in our studies (6, 7). This is mentioned twice in the Methods, under the sections “Endotoxin-induced model of sepsis” and “Ethical approval of animal studies”. For clarity, we have added these specifics of our model to all Results and Figure Legend sections.

      2) For germ-free models, it is notoriously difficult to breed C57BL/6 germ-free mice. It was therefore difficult to obtain enough mice of the same sex and age to carry out these experiments; however, because we have previously published with this model using mixed sex and age, we knew that our WD phenotype is robust in these backgrounds (7). Further, we believe that observing our robust phenotype in germ-free mice independent of age or sex provides additional evidence of the strength of this phenotype. It is important to note that we induce endotoxemia in germ-free mice with 50 mg/kg, instead of the 6 mg/kg used in conventional mice, because this is our reported LD50 for mixed-sex germ-free C57BL/6 mice, as we have published previously in detail (7). This difference is due to the presence of the microbiota (8, 9), and germ-free mice also have an immature immune system that correlates with hyporesponsiveness to microbial products (10-12). We agree with the reviewer that the C57BL/6 germ-free mice are significantly older than our conventional 6-8-week-old mice; we therefore confirmed that WD- and KD-fed conventional C57BL/6 female mice aged 20–21 weeks still show enhanced disease severity and mortality in an LPS-induced endotoxemia model compared to mice fed SC (Fig. S1G-H).

      Figure R2. PA treatment enhances survival in both female and male RAG-/- mice. Age-matched (8-9 wk) RAG-/- mice were injected i.v. with ethyl palmitate (PA, 750mM) or vehicle (Veh) solutions 12 h before C. albicans infection. Survival was monitored for 40h post-infection.

      3) In our preliminary results, we stratified survival during C. albicans infection between male and female C57BL/6 mice and found no notable difference in survival at 40 h after i.p. infection with Candida albicans (Fig R2 A-B). However, the CFU data presented in the manuscript reflect kidney burden in female mice, and we do not have fungal burden data for male mice. These are important data that we would like to collect to understand sex differences in PA-dependent enhanced resistance to systemic C. albicans infection. We are currently addressing this question in the lab, as well as elucidating the cell type and mechanism responsible for PA-dependent enhanced fungal resistance.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have used many cleverly chosen mouse models (periodontitis models; various models that lead to an on-switch of genes) and methods (high-quality immunolocalization; single-cell RNA sequencing) in the quest to elucidate a role for telocytes. They show that more telocytes are present around teeth in mice with periodontitis. These cells proliferated and expressed a pattern of genes that allowed macrophages to differentiate in a different direction. In particular, they showed that telocytes in periodontitis express HGF, a molecule that steers macrophage differentiation towards a less inflammatory cell type, paving the way for recovery. As a weakness, one could state that an attempt to extrapolate to human cells is missing.

      In the Discussion, we include a sentence stating that further investigation in human periodontitis is required (see page 20, line 416).

      Reviewer #3 (Public Review):

      Zhao and Sharpe identified telocytes in the periodontium. To address their contribution to periodontal diseases, they conducted scRNA-seq analysis and lineage tracing in mice. They demonstrated that telocytes are activated in periodontitis. The activated telocytes send HGF signals to surrounding macrophages, converting them from an M2 to an M1/M2 hybrid state. The study implies that telocytes and HGF signaling could be targeted for the treatment of periodontitis.

      The significance of the study could be improved by testing whether targeting telocytes or HGF signals ameliorates periodontitis in the mouse model. The current form of the manuscript lacks data demonstrating the actual contribution of telocytes to the homeostasis of the periodontium or the progression of periodontitis.

      Major comments:

      1) I see genetic validation of the role of telocytes or HGF signals as crucial to establishing the significance of this manuscript. I recommend either of two experiments.

      a. Testing the role of HGF signals by deleting the Hgf gene in telocytes. Using Wnt11-Cre; Hgf f/f mice, the authors could address the role of HGF signals in periodontitis. CX3CR1-Cre; cMet f/f mice would delete HGF signaling in monocyte-derived macrophages. This would be another verification, but it is unclear whether the PDL macrophages are derived from the yolk sac or from monocytes.

      b. Measuring the contribution of telocytes to homeostasis or disease progression. The mouse model could be challenging, but the system, if achieved, would be very informative. The authors could first check the expression of telocyte-enriched genes, such as Lgr5 or Foxl1, previously reported in telocytes of other tissues. Delete those genes under the Wnt1-Cre driver and check whether the telocyte lineage is removed. The system would be very useful for next-level study. A DTA model could be an alternative, but Wnt1-Cre is broadly expressed in the neural crest lineage.

      These are good suggestions but unfortunately not feasible, as we do not have all the mouse lines (e.g., Hgf f/f mice). Lgr5 and Foxl1 are used in the intestine but are not suitable for PDL tissue. The CD34;DTA model targets CD34+ cells; however, we encountered challenges associated with induced genetic heterogeneity when using this model, which prevented us from drawing concrete conclusions from these experiments. Lgr5/Foxl1 are either not expressed or overlap with CD34 in the PDL, and therefore do not seem suitable for us to pursue.

      2) This paper points out that the M1/M2 hybrid state of macrophages appears upon periodontitis. The authors could further characterize the hybrid macrophages by the expression of more markers, production of cytokines, and morphology. Please clarify whether this means that some macrophages are in the M1 state and others in the M2 state, or that one macrophage possesses both M1 and M2 phenotypes. Please conduct either FACS or immunofluorescence to demonstrate whether one macrophage expresses both markers. Please also introduce more information about the M1/M2 hybrid state of macrophages based on the existing literature.

      Although our single-cell sequencing data indicate a hybrid state, we were unable to determine by immunolabelling whether one macrophage possesses both M1 and M2 phenotypes.

      3) In the Introduction, the authors list several markers that can be used for telocyte identification, such as CD34+CD31-, CD34+c-Kit+, CD34+Vim+, and CD34+PDGFRα+. Could the authors explain why they chose CD34+CD31- rather than the other markers?

      As shown in the cluster images below, the other markers do not overlap well with CD34+ cells or, in the case of Vim, are expressed more ubiquitously. We generated a new supplementary figure (Supp Fig2) and explained this in the text (page 12, lines 235-238).

      4) In Figure 5g, I do not think the yellow cells show a reduction in the Tivantinib treatment group compared with the control group. Please validate this observation by gene expression analysis, western blot, etc. In addition, please show the level of c-Met+ cells in the Tivantinib treatment and control groups.

      New Supp Fig4 is included to show Met expression in homeostasis and periodontitis.

    1. Author Response

      Reviewer #2 (Public Review):

      Members of the WTF gene family can cause distorted meiosis (deviation from predicted Mendelian segregation) through a "poison-antidote"-like system. The authors find that members of the WTF gene family are present in numerous long-diverged species of fission yeast, that these genes show signatures of ongoing adaptive evolution, and that some of the novel wtf genes discovered here can also distort meiosis. Additionally, the authors show that gene conversion is quite common, and suggest that processes such as gene conversion, expansion, and contraction underlie the long-term maintenance of this system despite potential loss of function through fixation and/or suppression. While interesting, the support for this vague model is unclear, and the novelty of this system compared to other drive systems is not sufficiently justified.

      The presented work is interesting, and I trust the bioinformatic and functional work (although both are a bit beyond my specialties). I am quite concerned, however, with the introduction, discussion, and take-home conclusions, which at times go beyond the data presented.

      Active meiotic driver genes throughout their history in fission yeasts?

      For example, the authors claim that "Our results suggest that the gene family has contained active meiotic driver genes throughout their history in fission yeasts." Evidence for such a claim would be interesting, but very difficult to obtain and not presented in this manuscript. Rather the authors show that wtf genes are present, evolving rapidly, and can distort meiosis in numerous species. What has happened in the intervening 100 million years is unclear, but I would be surprised if it included an unbroken streak of active meiotic drive. It is well known that drivers spread rapidly, and this group's previous modeling of the system showed that a wtf driver would spread rapidly. I also don't know of evidence for a strong enough cost to wtf/homozygotes in this system to sustain long-term balancing selection (which is what is needed for long-term driving). Otherwise, it seems that most of a driver's history would be fixed (or at least locally fixed), and that continuous drive activity is unlikely (unless the authors mean it "could drive").

      We agree that we have not demonstrated perpetual meiotic drive over the last 100 million years. Instead, we argue the family has retained the capacity to drive for that amount of time. We have modified the text to be more precise.

      We also disagree that long-term balancing selection is needed for long-term drive. Our work suggests an alternative option where long-term drive is not tied to a single locus, but is a property shared across the gene family. Active drive likely comes and goes at individual loci. We propose the evolution of wtf drivers is better described as a cycle of novel drivers being born and spreading (perhaps to local fixation), rather than one driver that is maintained at a given locus for a long time.

      The "model"

      The authors present a brief verbal "model" of the rejuvenation of wtf drivers by expansion, contraction, non-allelic gene conversion, etc. While these processes all appear to occur in this system and likely play an important role in its evolution, it is hard to make much of this model. For example, I have trouble understanding the time scale at which these processes operate (e.g., do we expect fixation, which the authors have previously shown to occur quite rapidly at a single locus, to generally occur before an opportunity for one of these processes arises and/or before suppression evolves? My sense is "probably"). If fixation is much more rapid than the other processes, this system seems to fit in well with the other case discussed in the Introduction. Rather, it appears that the true excitement of the system is the fast rate at which wtf drivers emerge (likely facilitated by expansion, contraction, non-allelic gene conversion, etc.) and perhaps their slow breakdown after fixation (unexplained here).

      We have modified our Discussion to better highlight the limitations of our understanding and to distinguish local fixation of a driver from global fixation in the species. We also clarify that mutations can rejuvenate fixed, suppressed, or pseudogenized wtf drivers.

    1. Author Response:

      Reviewer #1:

      It was previously shown that HGF and Met control development of the diaphragm muscle. In particular, the signal induces delamination and migration of muscle progenitor cells that colonize the diaphragm. The present manuscript by Sefton and coworkers confirms and extends these observations using (i) conditional mouse lines in which the HGF gene was targeted by Cre/loxP recombination in the pleuroperitoneal folds (Prx1-cre) and at other sites (PdgfraCreERT2), and (ii) Met inhibitors. Overall, the technical quality of the data on diaphragm muscle development is excellent; the conceptual advance over previous work is not exceptional; the evidence for Met/HGF-dependent development of the phrenic nerve is marginal and needs to be strengthened.

      The data show that fibroblasts provide HGF signals, received by Met in muscle progenitor cells, that are essential for diaphragm development. The PdgfraCreERT2 line was used to demonstrate that HGF produced by fibroblasts, but not by muscle progenitors, is essential for diaphragm development. Moreover, development of the dorsal and ventral regions of the diaphragm muscle requires continuous MET signaling. Thus, HGF is required not only for the delamination of progenitors, but also for the proliferation and survival of those muscle progenitors that reach the anlage of the diaphragm.

      My major concern is the limited data on the HGF-dependent development of the phrenic nerve (defasciculation). While it is well documented that HGF acts as a trophic factor for motor neurons in culture, its role in development of motor neurons was highly debated due to the fact that some changes observed in Met or HGF mutant mice in vivo are also present in other mutants that lack the muscle groups derived from migrating muscle progenitors. Moreover, careful genetic analyses previously demonstrated indirect mechanisms of Met during motor neuron development, i.e. a non-cell-autonomous function of Met during the recruitment of motor neurons to PEA3-positive motor pools (Helmbacher et al., Neuron 2003).

      Sefton et al. provide an analysis of a single time point and one histological picture (3G, magnified in 3H) indicating that defasciculation of the phrenic nerve does not occur correctly in Met+/- animals. This is accompanied by a quantification that barely reaches significance (Fig. 3K). Data shown in Fig. 7 using Met inhibitors show a major change in phrenic nerve branching, which is presumably due to the major change in diaphragm development, as conceded by the authors.

      Despite this weakness on the experimental side, the role of HGF/Met in phrenic nerve development is strongly emphasized in the Abstract, Introduction, and Discussion (e.g., line 414: However, PPF-derived HGF is crucial for the defasciculation and primary branching of the nerve, independent of muscle). The data need to be strengthened in order to conclude that HGF coordinates both diaphragm muscle and phrenic nerve development.

      In response to comments from the reviewers, we have investigated the role of Met in phrenic nerve development more thoroughly and include two new sets of genetic experiments. In our first submission, we found a decreased number of phrenic nerve branches at E11.5 in MetΔ/Δ and MetΔ/+ compared with Met+/+ embryos. In MetΔ/Δ embryos, no muscle is present in the diaphragm. Therefore, the greatly reduced branching in these embryos is likely a secondary effect of the requirement for Met in muscle progenitors for diaphragm muscularization. Of particular interest is the reduced branching in MetΔ/+ embryos. Because the diaphragm is muscularized in these embryos, this suggested that Met may be required intrinsically in the phrenic nerve. One reviewer suggested that the reduced branching in MetΔ/+ embryos could be due to a developmental delay of the whole embryo. However, we found that MetΔ/Δ and MetΔ/+ embryos are not delayed overall relative to Met+/+ embryos (as measured by crown-rump length or limb length; Figure 3—figure supplement 1). To increase the robustness of these data, we also added embryos to the analysis. We then extended our analysis of MetΔ/Δ, MetΔ/+ and Met+/+ embryos to E12.5 (Figure 3—figure supplement 1) to determine whether the branching phenotype persisted; we found that while MetΔ/Δ embryos continue to have very few branches, the number of branches in MetΔ/+ embryos recovers and matches that of Met+/+ embryos.

      To explicitly test whether Met is required within the phrenic nerve, we used Olig2Cre/+ to conditionally delete Met. This line was chosen for its early expression in motor neurons (Zawadzka et al. 2010). We compared Olig2Cre/+; MetΔ/flox embryos with Olig2Cre/+; Metflox/+ embryos. We included Olig2Cre in our controls because the Olig2Cre allele is a knock-in/knock-out and Olig2 has important roles in nerve development. However, deletion of Met did not affect the number of branches at E11.5 (Figure 3—figure supplement 2) or E12.5 (data not shown). These data suggest that Met does not intrinsically regulate phrenic nerve branching, and instead that PPF-derived HGF regulates phrenic nerve branching indirectly via muscle. To test whether HGF is sufficient to promote early stages of nerve branching in the absence of muscle, we turned to Pax3SpD/SpD mutants, in which a point mutation in Pax3 prevents migration of muscle progenitors into the diaphragm (Figure 3—figure supplement 2). In these embryos, the diaphragm is muscleless, but the PPFs still express HGF. In these diaphragms, the number of branches at E11.5 is severely reduced. These data demonstrate that, in the absence of muscle, the presence of HGF in the PPF fibroblasts is not sufficient to support phrenic nerve branching.

      Altogether, our data demonstrate that PPF-derived HGF, via its regulation of muscle, controls the primary branching of the phrenic nerve. The MetΔ/+ data demonstrate that Met controls phrenic nerve branching at E11.5 in a dose-dependent manner, but this effect is lost by E12.5. Although we see no obvious muscle defects in MetΔ/+ diaphragms at later stages, the most parsimonious explanation for the reduced phrenic nerve branching at E11.5 is that there are fewer muscle progenitors at this time point.

      We thank the reviewers for prompting us to look at the role of HGF/Met in the phrenic nerve more closely. Our revised conclusions are presented in the Results and Discussion. We show that PPF-derived HGF is critical for integrating both muscle and phrenic nerve development, but now demonstrate that HGF’s regulation of phrenic nerve branching is via muscle, which is well-known to express multiple trophic factors required by motor neurons.

      Regarding the specific point raised about Met+/- embryos, the images shown in Figure 3G and H are representative whole-mount confocal images of MetΔ/+ phrenic nerves. For each genotype, we immunolabeled and confocal imaged the phrenic nerves, rendered them in three dimensions, and counted the number of branches blinded to genotype. We have also added several embryos to this analysis. In Figure 7, the branching defects resulting from application of BMS777607 are, as expected, similar to the severe branching defects seen in MetΔ/Δ embryos.

      Reviewer #2:

      In this study, Sefton et al. interrogated the source of the HGF in the developing mouse embryo that is required for muscularization and proper innervation of the diaphragm. The authors extended previous results that are over 20 years old by generating cell type-specific mutants of Met and Hgf, and found that inactivation of Hgf in fibroblasts via PDGFRa-CreERT2 results in muscle-less diaphragms. Similarly, Hgf inactivation in fibroblasts via PDGFRa-CreERT2 mostly abrogated limb muscle formation, formally identifying PDGFRa+ mesenchymal cells as the main source of HGF for the generation of muscles in the limb and diaphragm. Likewise, inactivation of Hgf using Prx1-Cre, which targets fibroblasts derived from the pleuroperitoneal folds (PPF), also prevented muscularization of the diaphragm and branching of the phrenic nerve. Interestingly, branching of the phrenic nerve was reduced in heterozygous Met mutants with normal diaphragm musculature, indicating that HGF-MET signaling plays a direct role in phrenic nerve branching and that failure of nerve branching in homozygous Met or Hgf mutants is not solely due to the loss of the musculature. Finally, the authors performed co-cultures of PPFs and myoblasts and found that pharmacological inhibition of MET lowered motility, survival and MyoD expression of myoblasts, leading to the claim that HGF-MET also plays a role in myogenic commitment.

      The study sheds further light on the source of HGF required for muscularization of the diaphragm and is well executed. However, the gain of knowledge is mainly incremental and deeper molecular insights are missing.

      We appreciate this critique and have added data to increase the molecular insight into the role of MET signaling in cell survival in the PPFs (Figure 6—figure supplement 1).

      The most interesting part of the study is the formal demonstration that HGF is not only required for delamination of muscle progenitor cells from the epithelium of somites but also to maintain migration at later stages. Similar conclusions have been made many years ago based on studies in chicken embryos but the current study clearly goes a step further.

      We agree that this is one of the interesting findings in our study.

      The part that claims a role of HGF-MET in myogenic commitment is not that well developed and may need further proof.

      We apologize for the misunderstanding here and have altered the text to indicate that we do not propose a role for Met in myogenic commitment, but rather that Met regulates the number of MyoD+ cells by promoting their survival.

      Reviewer #3:

      In this MS by Sefton et al., the authors investigate the role of HGF/MET pathway, as well as the cellular source of these molecules, during diaphragm development. In particular, the authors address the function of this pathway on muscle progenitors and phrenic nerve. They further provide evidence for the expression of HGF in pleuroperitoneal folds and for its requirement for muscle progenitor recruitment and maintenance during diaphragm muscle formation. This study is interesting and in general the results support the conclusions. The work could be improved by (1) providing appropriate controls for the role of HGF in the connective tissue and (2) linking the muscleless diaphragms and HGF to the hernia phenotype.

      We appreciate this review and have added controls for the role of HGF in the connective tissue. Specifically, PDGFRaCreER/+; HGF-/fl; RosamTmG/+ embryos have fibroblasts present in muscleless regions. We further link muscleless diaphragms and HGF to the hernia phenotype in our Abstract: absence of muscle is necessary, but not sufficient, for herniation.

    1. Author Response

      Reviewer #1 (Public Review):

      1-1. I do have some concerns that the differences in network clustering reported in Fig 6 may be due to noise, and I think the comparisons against the HCP parcellation could be more robust. Specifically, with regard to the network clustering in Fig 6: the authors use a clustering algorithm (which is not explained) to cluster the parcels into different functional networks. They achieve this by estimating the mean time series for each parcel in each individual, which they then correlate between the n regions to generate an nxn connectivity matrix. They then binarise this matrix before averaging across individuals within an age group. It strikes me that binarising before averaging will artificially reduce connections for which only a subset of individuals are set to zero. Therefore, averaging should really occur before binarising. I also think the stability of these clusters should be explored by creating random repeat and generation groups (as done for the original parcels) or simply by bootstrapping the process. I would be interested to see whether, after all this, the observation that the posterior frontoparietal network expands to include the parahippocampal gyrus from 3-6 months and then disappears at 9 months remains.

      We thank the reviewer for this insightful comment on our clustering process. For the step of “binarizing before averaging”, we followed the method proposed by Yeo et al. (1). In this method, each correlation matrix is binarized according to an individual-specific threshold. Specifically, each individual-specific threshold is determined by a percentile, so that only the strongest 10% of connections are kept and set to 1, while all other connections are set to 0. Yeo et al. (1) explained their motivation for doing so as “the binarization of the correlation matrix leads to significantly better clustering results, although the algorithm appears robust to the particular choice of the threshold”. We think a likely reason is that binarizing each individual's connectivity provides a degree of normalization, so that each subject contributes the same number of connections. If averaging occurs before binarizing, different subjects would contribute different amounts of connectivity, which introduces bias. We also tested the stability of ‘binarizing first’ and ‘averaging first’; the result is shown in Fig. R1 below. This figure supports a conclusion similar to that of (1): binarizing before averaging leads to better clustering stability. We added the motivation for binarizing before averaging to the revised manuscript (lines 577–581).

      Fig. R1. The comparison of clustering stability of different methods. The red line refers to the clustering stability when binarizing the correlation matrices first and then averaging the matrices across individuals, while the blue line refers to the clustering stability when averaging the correlation matrices across individuals first and then binarizing the average matrix.
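      The two thresholding orders compared in Fig. R1 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' pipeline: it assumes one n×n parcel-by-parcel correlation matrix per subject and a single subject-specific threshold that keeps the strongest 10% of off-diagonal connections, and the function names are hypothetical.

```python
import numpy as np


def binarize_top_fraction(corr, keep=0.10):
    """Binarize one subject's n x n correlation matrix, keeping the strongest
    `keep` fraction of off-diagonal connections (subject-specific threshold)."""
    corr = (corr + corr.T) / 2.0              # enforce exact symmetry
    iu = np.triu_indices(corr.shape[0], k=1)  # count each connection once
    threshold = np.quantile(corr[iu], 1.0 - keep)
    binary = (corr >= threshold).astype(float)
    np.fill_diagonal(binary, 0.0)             # ignore self-connections
    return binary


def group_binarize_first(corrs, keep=0.10):
    """Binarize each subject, then average: every subject contributes the
    same number of connections to the group matrix."""
    return np.mean([binarize_top_fraction(c, keep) for c in corrs], axis=0)


def group_average_first(corrs, keep=0.10):
    """Average raw correlations across subjects, then binarize once."""
    return binarize_top_fraction(np.mean(corrs, axis=0), keep)


# Usage with synthetic data: 5 subjects, 200 time points, 50 parcels.
rng = np.random.default_rng(0)
series = rng.standard_normal((5, 200, 50))
corrs = np.array([np.corrcoef(s.T) for s in series])
group = group_binarize_first(corrs)  # entries: fraction of subjects keeping each edge
```

      In the binarize-first matrix, each entry is the fraction of subjects for whom that connection survives their own threshold, so no single subject with globally stronger correlations can dominate the group matrix.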

      For the final clustering results, we repeated our clustering 100 times with bootstrapping, and the final label of each parcel was assigned by majority vote. The comparison of these two results is shown in Fig. R2. Overall, we observe good repeatability between the two results. However, some parcels show different patterns between them, especially parcels located around network boundaries or the medial wall. The observation that “the posterior frontoparietal expands to include the parahippocampal gyrus from 3-6 months and then disappears at 9 months” was not reproduced in the bootstrapped results. These results suggest that the clustering method is fairly robust and that the discovered patterns are relatively stable, while the differences between our original and bootstrapped results may be caused by noise or inter-subject variability.

      Fig. R2. Top panel: the network clustering results using all data in the original manuscript. Bottom panel: the network clustering results using majority voting across 100 bootstrap iterations. Black circles and red arrows point to the parahippocampal gyrus, which was included in the posterior frontoparietal network in the original results but is not well reproduced in the bootstrapped results. (M: months)
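      Bootstrapped majority voting of this kind can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: `cluster_fn` is a hypothetical callable that maps a list of subject matrices to one integer network label per parcel. Because cluster labels are arbitrary in each run, every run is first relabeled to best match the first run (a brute-force search over label permutations, practical only for a small number of networks) before votes are counted.

```python
import itertools

import numpy as np


def align_labels(reference, labels, k):
    """Permute the k label values in `labels` to maximize agreement with `reference`."""
    best, best_agree = labels, -1
    for perm in itertools.permutations(range(k)):
        relabeled = np.asarray(perm)[labels]  # old label l becomes perm[l]
        agree = int(np.sum(relabeled == reference))
        if agree > best_agree:
            best, best_agree = relabeled, agree
    return best


def majority_vote(label_runs, k):
    """Return the per-parcel majority label and the fraction of runs agreeing with it."""
    runs = np.asarray(label_runs)
    aligned = np.array([align_labels(runs[0], r, k) for r in runs])
    counts = np.stack([(aligned == c).sum(axis=0) for c in range(k)])  # (k, n_parcels)
    return counts.argmax(axis=0), counts.max(axis=0) / len(runs)


def bootstrap_networks(subject_mats, cluster_fn, k, n_boot=100, seed=0):
    """Resample subjects with replacement, cluster each resample, then vote."""
    rng = np.random.default_rng(seed)
    n = len(subject_mats)
    runs = []
    for _ in range(n_boot):
        idx = rng.choice(n, size=n, replace=True)
        runs.append(cluster_fn([subject_mats[i] for i in idx], k))
    return majority_vote(runs, k)


# Usage: three runs over six parcels whose label names differ between runs.
runs = [[0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2], [2, 2, 1, 1, 0, 1]]
labels, agreement = majority_vote(runs, k=3)
# labels -> [0, 0, 1, 1, 2, 2]; agreement -> [1, 1, 1, 1, 1, 2/3]
```

      The per-parcel agreement fraction is a natural way to flag unstable parcels, such as those near network boundaries or the medial wall.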

      1-2. The comparison against the HCP parcellation is only qualitative. The authors should test whether the agreement is quantitatively better than that of the null clusterings they produce.

      Thank you for this great suggestion! As suggested, we added a quantitative comparison using the Hausdorff distance. As in the comparison of parcel variance and homogeneity, 1,000 null parcellations were created by randomly rotating our parcellation by small angles on the spherical surface. We compared our parcellation and the null parcellations by evaluating their Hausdorff distances to specific areas of the HCP parcellation in spherical space, including Brodmann's area 2, 3b, 4+3a, 44+45, V1, and MT+MST. The results are listed in Figure 4. Our parcellation generally shows significantly lower Hausdorff distances to the HCP parcellation, suggesting that its parcel borders are closer to those of the HCP parcellation than the borders of the null parcellations are.

      However, we noticed that a very small number of null parcellations show smaller Hausdorff distances than our parcellation. A possible reason is that our surface registration to the HCP template was based purely on cortical folding, without functional gradient density maps, which are not available for the HCP template. As a result, high-quality functional alignment between our infant data and the HCP space is not ensured, which inevitably increases the Hausdorff distance between our parcellation and the HCP parcellation.
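      A rotation-based comparison of this kind can be sketched as below. This is an illustration only, not the authors' implementation. It assumes parcel borders are represented as point sets of unit vectors on the registered sphere and uses the symmetric Hausdorff distance with great-circle (angular) distance. Null borders come from random rotations by small angles; the 10-degree limit is an assumed value, not one taken from the response. The empirical p-value counts null rotations that come at least as close to the target border as the observed one. Rotating a border's points is equivalent to rigidly rotating the whole parcellation and then extracting that border.

```python
import numpy as np


def angular_distances(a, b):
    """Great-circle distances (radians) between unit vectors a (m, 3) and b (n, 3)."""
    return np.arccos(np.clip(a @ b.T, -1.0, 1.0))


def spherical_hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets on the unit sphere."""
    d = angular_distances(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def random_small_rotation(rng, max_angle):
    """Rotation about a random axis by an angle drawn uniformly from [-max_angle, max_angle]."""
    axis = rng.standard_normal(3)
    axis /= np.linalg.norm(axis)
    theta = rng.uniform(-max_angle, max_angle)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    # Rodrigues' rotation formula
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def rotation_null_test(border, target, n_null=1000, max_angle=np.deg2rad(10.0), seed=0):
    """Compare the observed border-to-target distance with rotated null borders."""
    rng = np.random.default_rng(seed)
    observed = spherical_hausdorff(border, target)
    null = np.array([
        spherical_hausdorff(border @ random_small_rotation(rng, max_angle).T, target)
        for _ in range(n_null)
    ])
    p_value = (1 + np.sum(null <= observed)) / (1 + n_null)
    return observed, null, p_value


# Usage: a border compared with itself should beat every rotated copy.
rng = np.random.default_rng(1)
pts = rng.standard_normal((200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
observed, null, p = rotation_null_test(pts, pts, n_null=199)
```

      With this definition, the smallest attainable p-value is 1/(n_null + 1), so a handful of null parcellations falling below the observed distance shows up directly as a larger p-value rather than as a qualitative discrepancy.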

      1-3. … not all individuals appear (from Fig 8) to have been scanned exactly at the desired timepoints, so perhaps the authors could comment on why they decided not to apply any kernel weighting or smoothing to their averaging? Pg. 8, 'and parcel numbers show slight changes that follow a multi-peak fluctuation, with inflection ages of 9 and 18 months': please explain. The number of parcels per age group varies with age, with peaks at 9 and 18 months; could this be due to differences in subject numbers, or in the subjects scanned at each time point?

      We agree with the reviewer that subjects were not scanned at exactly the same time points. This was by design in the data acquisition protocol, to seamlessly cover the early postnatal stage and provide a quasi-continuous observation of dynamic early brain development.

      We did not apply kernel-weighted averaging or smoothing when generating the parcellation because we wanted each scan to contribute equally, so that each parcellation map is representative of the whole cohort at the covered age rather than only part of it. In addition, our final ‘age-common parcellation’ should be representative of all subjects from birth to 2 years of age. However, we agree that for a parcellation map designed for a specific age (e.g., 1-year-olds), a kernel-weighted average or a more restricted age range could be more appropriate.

      To test whether the parcel number fluctuates with subject numbers, we added an experiment in which, for each age group, we randomly selected 100 scans (the minimum number of scans in any age group) using bootstrapping, and repeated this process 100 times. The average parcel number for each age is reported in Table R1. We did not observe strong changes in parcel numbers when reducing scan numbers, which suggests that our parcel numbers are not strongly related to subject numbers. However, the parcel number does not increase greatly from 18M to 24M in the bootstrapping results, so we modified the statement about parcel numbers in the manuscript to ‘… all parcel numbers fall between 461 to 493 per hemisphere, where the parcel number attains a maximum at around 9 months and then reduces slightly and remains relatively stable afterward. …’ (lines 121-122).
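      The resampling scheme can be sketched as below. The `statistic` callable is a hypothetical stand-in for the full parcellation pipeline (which would return a parcel number); only the fixed-size resampling with replacement and the averaging over repetitions follow the description above.

```python
import numpy as np

def bootstrap_group_statistic(groups, statistic, n_draw, n_rep=100, seed=0):
    """Average of `statistic` over bootstrap resamples of a fixed size per group.

    groups:    dict mapping an age label to a sequence of scans (any objects)
    statistic: callable taking a list of scans and returning a number
               (hypothetical stand-in for "build a parcellation, count parcels")
    n_draw:    scans drawn per resample, e.g. the smallest group size
    """
    rng = np.random.default_rng(seed)
    out = {}
    for age, scans in groups.items():
        values = []
        for _ in range(n_rep):
            idx = rng.choice(len(scans), size=n_draw, replace=True)
            values.append(statistic([scans[i] for i in idx]))
        out[age] = float(np.mean(values))
    return out
```

      Calling it with `n_draw=min(len(v) for v in groups.values())` equalizes the sample size across age groups, so remaining differences between ages cannot be attributed to unequal scan counts.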

      1-4. I also have some residual concerns over the number of parcels reported, specifically as to whether all of this represents fine-grained functional organisation, or whether some of it represents noise. The number of parcels reported is very high. While Glasser et al 2016 reports 360 as a lower bound, it seems unlikely that the number of parcels estimated by that method would greatly exceed 400. This would align with the previous work of Van Essen et al (which the authors cite as 53) which suggests a high bound of 400 regions. While accepting Eickhoff's argument that a more modular view of parcellation might be appropriate, these are infants with underdeveloped brain function.

      We thank the reviewer for this insightful comment. We agree that some parcels may reflect noise, since noise is introduced at every step, including data acquisition, image processing, surface reconstruction, and registration, and functional MRI is noisier than structural MRI. Although our experiments show that our parcellation is fine-grained and suitable for studying infant brain functional development, this is hard to validate directly and quantitatively because no ground truth is available.

      Nevertheless, we are motivated to create fine-grained parcellations: with the growth of larger, higher-resolution imaging datasets and advanced computational methods, parcellations with finer regions are desirable for downstream analyses, especially given the hierarchical nature of brain organization (2). The main reason our method generates much finer parcellation maps is that both our registration and our parcellation are based on the functional gradient density, a fine-grained feature map derived from fMRI. This yields both better inter-subject alignment of functional boundaries and finer region partitions. This strategy differs from that of Glasser et al. (3), which jointly considers multimodal information to define parcel boundaries; parcels revealed purely by functional MRI might therefore be missed in the HCP parcellation. We hope our parcellation framework can be a useful reference for this line of research. We added this discussion to the revised manuscript (lines 268-271).

      Regarding the parcel number, recent adult fMRI-based parcellations have greatly increased the number of parcels even without surface registration based on fine-grained functional features, e.g., up to 1,000 parcels in Schaefer et al. (4), 518 parcels in Peng et al. (5), and 1,600 parcels in Zhao et al. (6). For infants, we agree that functional connectivity might not be as strong as in adults. However, several studies (7-9) suggest that the basic units of functional organization are likely already present in infant brains and that functional development gradually shapes brain networks. Therefore, functional parcel units in infants could plausibly be on a scale comparable to that of adults. Even so, we agree that more research on larger datasets is needed for better evaluation. We added this discussion to the revised manuscript (lines 275-280).

      1-5. Further comparisons across different subjects based on small parcels increase the chances of downstream analyses incorporating image registration noise, since, as Glasser et al. 2016 noted, there are many examples of topographic variation that diffeomorphic registration cannot match. Therefore, averaging across individuals would likely lose this granularity. I'm not sure how to test this beyond showing that the networks work well for downstream analyses, but I think these issues should be discussed.

      We agree with the reviewer that averaging across individuals inevitably introduces some registration errors into the parcellation, especially in regions with high topographic variation across subjects, which leads to a loss of granularity in these regions. We believe this is an important issue shared by most group-level parcellation methods; the eventual solution might be individualized parcellation, which we plan to pursue in future work. We added this discussion to the revised manuscript (lines 288-292).

      We also agree with the reviewer that downstream analyses are important for evaluating parcellations. We provided a beta version of our parcellation with 602 parcels to our colleagues, who tested it on infant individual recognition across ages using functional connectivity, to explore infant functional connectome fingerprinting (10). We compared parcellations with 602 ROIs (our beta version), 360 ROIs (HCP MMP parcellation (3)), and 68 ROIs (FreeSurfer parcellation (11)). The results (Fig. R3) show that our parcellation, with more parcels, yields better accuracy than the other parcellations. We added a description of this downstream application to the Discussion (lines 284-287).

      Fig. R3. Comparison of different parcellations for infant individual recognition across ages based on functional connectivity (figure source: Hu et al. (10)). 602 ROIs denotes the beta version of our parcellation, 360 ROIs the HCP MMP parcellation (3), and 68 ROIs the FreeSurfer parcellation (11). In this downstream task, a higher number of parcels led to better accuracy.
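      The identification task referred to above is commonly scored by matching each subject's connectome from one session to the most correlated connectome from another session. The sketch below shows that generic scheme; it is an assumption for illustration, and the exact procedure of Hu et al. (10) may differ.

```python
import numpy as np

def upper_triangle(fc):
    """Vectorize the edges above the diagonal of a symmetric FC matrix."""
    i, j = np.triu_indices(fc.shape[0], k=1)
    return fc[i, j]

def identification_accuracy(fc_target, fc_database):
    """Fraction of subjects whose target-session FC is most correlated with their own
    database-session FC.

    Both inputs have shape (subjects, regions, regions) and are ordered so that
    index k refers to the same subject in both sessions.
    """
    a = np.array([upper_triangle(m) for m in fc_target])
    b = np.array([upper_triangle(m) for m in fc_database])
    # z-score each edge vector so that the scaled dot product equals Pearson's r.
    a = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    b = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    r = a @ b.T / a.shape[1]
    return float(np.mean(np.argmax(r, axis=1) == np.arange(len(a))))
```

      The number of regions only enters through the length of the edge vectors, which is why the same scoring can be applied to parcellations with 68, 360, or 602 ROIs.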

      1-6. Finally, I feel the methods lack clarity in some areas and that many key references are missing. In general, I don't think key methods should be described only through references to other papers. And there are many references, particularly to FSL papers, that are missing.

      We thank the reviewer for this great suggestion. We added the relevant references for FLIRT, FSL, MCFLIRT, and TOPUP. For the alignment to the HCP 32k_LR space, we first aligned all subjects to the fsaverage space using Spherical Demons, then used part of the HCP pipeline (12) to map the surfaces from the fsaverage space to the HCP 164k_LR space, and downsampled them to the 32k_LR space. We now cite the HCP pipeline by Glasser et al. (12) for this step and describe the registration process in detail in the revised manuscript (lines 434-440), as follows:

      “… The population-mean surface maps were mapped to the HCP 164k ‘fs_LR’ space using the deformation field that deforms the ‘fsaverage’ space to the ‘fs_LR’ space released by Van Essen et al. (13), which was obtained by landmark-based registration. By concatenating the three deformation fields of steps 1, 3, and 4, we directly warped all cortical surfaces from individual scan spaces to the HCP 164k_LR space and then resampled them to 32k_LR using the HCP pipeline (12), thus establishing vertex-to-vertex correspondences across individuals and ages …”
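      Concatenating deformation fields amounts to composing the spherical warps, so that each individual vertex is carried through every step before it is matched to the target mesh. The sketch below represents each warp as a function on unit vectors. The warps, the `rot_z` example, and the nearest-vertex lookup are all simplifying assumptions; real resampling (e.g., in the HCP pipeline) typically uses barycentric interpolation rather than nearest-vertex matching.

```python
import numpy as np

def compose(*warps):
    """Concatenate spherical warps: compose(f1, f2, f3)(x) == f3(f2(f1(x)))."""
    def composed(points):
        for warp in warps:
            points = warp(points)
            # Re-project onto the unit sphere after each step.
            points = points / np.linalg.norm(points, axis=1, keepdims=True)
        return points
    return composed

def nearest_vertex(points, target_vertices):
    """Index of the closest target-mesh vertex (largest cosine) for each warped point."""
    return np.argmax(points @ target_vertices.T, axis=1)

def rot_z(deg):
    """Hypothetical example warp: rigid rotation about the z axis by `deg` degrees."""
    t = np.deg2rad(deg)
    r = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t), np.cos(t), 0.0],
                  [0.0, 0.0, 1.0]])
    return lambda p: p @ r.T
```

      Composing once and resampling once, rather than resampling after every step, avoids accumulating interpolation error across the three deformations.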

      Reviewer #2 (Public Review):

      2-1. What diminishes enthusiasm is the lack of focus in the results section, the frequent use of jargon, and figures that are often difficult to interpret. If those issues are addressed, the proposed atlas could have a high impact in the field, especially as it is aligned with the template of the Human Connectome Project.

      We thank Reviewer #2 for appreciating our atlas. Following the reviewer's suggestion, we revised the manuscript to reduce jargon and to improve the clarity of the results section, the figures, and the figure captions. We hope these revisions make our work accessible to a broader community. Our revisions are detailed below. Our parcellation maps have been aligned with the HCP and FreeSurfer templates and are available via NITRC at: https://www.nitrc.org/projects/infantsurfatlas/.

      References

      1. B. T. T. Yeo, F. M. Krienen, J. Sepulcre, M. R. Sabuncu, D. Lashkari, M. Hollinshead, J. L. Roffman, J. W. Smoller, L. Zöllei, J. R. Polimeni, The organization of the human cerebral cortex estimated by intrinsic functional connectivity. Journal of Neurophysiology 106, 1125-1165 (2011).

      2. S. B. Eickhoff, R. T. Constable, B. T. Yeo, Topographic organization of the cerebral cortex and brain cartography. NeuroImage 170, 332-347 (2018).

      3. M. F. Glasser, T. S. Coalson, E. C. Robinson, C. D. Hacker, J. Harwell, E. Yacoub, K. Ugurbil, J. Andersson, C. F. Beckmann, M. Jenkinson, S. M. Smith, D. C. Van Essen, A multi-modal parcellation of human cerebral cortex. Nature 536, 171-178 (2016).

      4. A. Schaefer, R. Kong, E. M. Gordon, T. O. Laumann, X.-N. Zuo, A. J. Holmes, S. B. Eickhoff, B. T. T. Yeo, Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cerebral Cortex 28, 3095-3114 (2018).

      5. L. Peng, Z. Luo, L.-L. Zeng, C. Hou, H. Shen, Z. Zhou, D. Hu, Parcellating the human brain using resting-state dynamic functional connectivity. Cerebral Cortex, (2022).

      6. J. Zhao, C. Tang, J. Nie, Functional parcellation of individual cerebral cortex based on functional mri. Neuroinformatics 18, 295-306 (2020).

      7. W. Gao, S. Alcauter, J. K. Smith, J. H. Gilmore, W. Lin, Development of human brain cortical network architecture during infancy. Brain Structure and Function 220, 1173-1186 (2015).

      8. W. Gao, H. Zhu, K. S. Giovanello, J. K. Smith, D. Shen, J. H. Gilmore, W. Lin, Evidence on the emergence of the brain's default network from 2-week-old to 2-year-old healthy pediatric subjects. Proceedings of the National Academy of Sciences 106, 6790-6795 (2009).

      9. K. Keunen, S. J. Counsell, M. J. N. L. Benders, The emergence of functional architecture during early brain development. NeuroImage 160, 2-14 (2017).

      10. D. Hu, F. Wang, H. Zhang, Z. Wu, Z. Zhou, G. Li, L. Wang, W. Lin, G. Li, UNC/UMN Baby Connectome Project Consortium, Existence of Functional Connectome Fingerprint during Infancy and Its Stability over Months. Journal of Neuroscience 42, 377-389 (2022).

      11. R. S. Desikan, F. Ségonne, B. Fischl, B. T. Quinn, B. C. Dickerson, D. Blacker, R. L. Buckner, A. M. Dale, R. P. Maguire, B. T. Hyman, An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31, 968-980 (2006).

      12. M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage 80, 105-124 (2013).

    1. Author Response

      Reviewer #1 (Public Review):

      Point 1) There is ample evidence that cortical activity in the waking brain, even in head-restrained mice, is not uniform but represents a spectrum of states ranging from complete desynchronization to strong synchronization, reminiscent of the up and down states observed during sleep (Luczak et al., 2013; McGinley et al., 2015; Petersen et al., 2003). Moreover, awake synchronization can be local, affecting selective cortical areas but not others (Vyazovskiy et al., 2011). State fluctuations can be estimated using multiple criteria (e.g., pupil diameter). The authors consider reduced glutamatergic drive or long-range inhibition as potential sources of the voltage decrease but do not attempt to address this cortical state continuum, which is also likely to play a role. For example: does the voltage inactivation following ripples reflect a local downstate? The authors could start by detecting peaks and troughs in the voltage signal and investigate how ripple power is modulated around those events.

      Our study is correlational; hence, we cannot speak to any causal role that awake hippocampal ripples may play in the post-ripple hyperpolarization observed in aRSC. It is indeed possible that the post-awake-ripple neocortical hyperpolarization is independent of ripples and reflects other mechanisms to which our experiments may have been blind. One such mechanism is neocortical synchronization in the awake state. As Reviewer 1 pointed out, a proportion of hippocampal ripples may occur before neocortical awake down-states. To test this hypothesis, we triggered the ripple power signal on the troughs (as proxies of awake down-states) and peaks (as proxies of awake up-states) of the voltage signals captured from different neocortical regions, during periods of high ripple activity, when the probability of neocortical synchronization is highest (McGinley et al., 2015; Nitzan et al., 2020). In this analysis (see the figure below), ripple power was, on average, higher before troughs of the aRSC voltage signal than before those of other regions. In contrast, ripple power was not, on average, higher after the peaks of the aRSC voltage signal than after those of other regions. This observation supports the hypothesis that a local awake down-state could occur in aRSC after a portion of hippocampal ripples. However, a recent study whose preprint we cited in our submission (Chambers et al., 2022, 2021) reported that only 1 of 33 aRSC neurons whose membrane potentials were recorded showed up-/down-state transitions (a bimodal membrane potential distribution). Nevertheless, a portion (10 of 30) of the remaining neurons showed an abrupt post-ripple hyperpolarization. In addition, they reported a modest post-ripple modulation of aRSC membrane potential (~20% of the up/down-state transition range). Hence, these results suggest that the post-ripple aRSC hyperpolarization does not necessarily result from down-states in aRSC. A paragraph discussing this point was added to the Discussion (lines 262-279).

      Mean ripple power triggered by troughs and peaks of voltage signal captured from aRSC, V1, and FLS1. Zero time represents the timestamp of neocortical troughs/peaks. The shading represents SEM (n = 6 animals).
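      The trough- and peak-triggered averaging described above can be sketched as follows. This assumes the voltage signal and the ripple power signal have already been brought onto a common time base; the strict local-extremum detector is a simplification (a prominence threshold would usually be added on noisy data), and the function names are illustrative.

```python
import numpy as np

def local_extrema(x, kind="trough"):
    """Indices of strict local minima ('trough') or maxima ('peak') of a 1-D signal."""
    mid = x[1:-1]
    if kind == "trough":
        mask = (mid < x[:-2]) & (mid < x[2:])
    else:
        mask = (mid > x[:-2]) & (mid > x[2:])
    return np.flatnonzero(mask) + 1

def event_triggered_average(signal, events, pre, post):
    """Mean of signal[e - pre : e + post + 1] over events whose window fits in the recording."""
    events = [e for e in events if e - pre >= 0 and e + post < len(signal)]
    windows = np.stack([signal[e - pre:e + post + 1] for e in events])
    return windows.mean(axis=0)
```

      For example, `event_triggered_average(ripple_power, local_extrema(arsc_voltage, "trough"), pre, post)` gives one curve of the figure; here `ripple_power` and `arsc_voltage` are hypothetical array names.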

      Point 2) Ripples are known to be heterogeneous in multiple parameters (e.g., power, duration, isolated events/ ripple bursts, etc.), and this heterogeneity was shown to have functional significance on multiple occasions (e.g. Fernandez-Ruiz et al., 2019 for long-duration ripples; Nitzan et al., 2022 for ripple magnitude; Ramirez-Villegas et al., 2015 for different ripple sharp-wave alignments). It is possible that the small effect size shown here (e.g. 0.3 SD in Fig. 2a) is because ripples with different properties and downstream effects are averaged together? The authors should attempt to investigate whether ripples of different properties differ in their effects on the cortical signals.

      The seemingly small effect size (e.g., 0.3 SD in Fig. 2a) arises because the individual peri-ripple voltage/glutamate traces were z-scored against a peri-non-ripple distribution and then averaged. Alternatively, the peri-ripple traces could have been averaged first, and the averaged trace z-scored against a sampling distribution constructed from the peri-non-ripple distribution, with the sample size equal to the number of ripples detected for a given animal. In that case, the standard deviation of the sampling distribution would have been the divisor in the z-scoring, rather than the standard deviation of the original peri-non-ripple distribution. Because the standard deviation of the sampling distribution is smaller than that of the original distribution by a factor of √(sample size), the resulting z-scores would be larger by the same factor. For instance, if the sample size in Fig. 2A (number of ripples) were 100, the mean z-scored value would be 0.3 × 10 = 3. In any case, it is of interest to investigate the relationship between ripple features and neocortical activity.
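      The √(sample size) relationship between the two normalizations can be checked numerically. The sketch assumes the peri-non-ripple mean `mu` and standard deviation `sigma` are known, and that the ripples are independent, which is what makes sigma/√N the standard deviation of the sampling distribution.

```python
import numpy as np

def zscore_then_average(traces, mu, sigma):
    """Normalization used in the figures: z-score each peri-ripple trace
    (rows = ripples) against the peri-non-ripple distribution, then average."""
    return np.mean((traces - mu) / sigma, axis=0)

def average_then_zscore(traces, mu, sigma):
    """Alternative: average first, then z-score against the sampling distribution
    of the mean, whose standard deviation is sigma / sqrt(N) for N ripples."""
    n = traces.shape[0]
    return (traces.mean(axis=0) - mu) / (sigma / np.sqrt(n))
```

      With 100 ripples the second function returns exactly 10 times the first, matching the 0.3 × 10 = 3 example.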

      To investigate the relationship between the hippocampal ripple power and the peri-ripple neocortical voltage activity, we focused on the agranular retrosplenial cortex (aRSC) as it showed the highest level of modulation around ripples. To get an idea of what features of the aRSC voltage activity might be correlated with the ripple power, the ripples were divided into 8 subgroups using 8-quantiles of their power distribution, and the corresponding aRSC voltage traces were averaged for each subgroup (similar to the work of Nitzan et al. (Nitzan et al., 2022)). The results of this analysis are summarized in the figure below.

      Left: peri-ripple aRSC voltage traces triggered on ripples in the odd-numbered ripple power subgroups for each animal and then averaged across 6 animals. Standard errors of the mean are not shown for simplicity. Right: the same as the left panel but for only the lowest and highest power subgroups. The shading represents the standard error of the mean.
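      The subgrouping into 8-quantiles can be sketched with NumPy as below. The function name and the events × time layout of `traces` are assumptions; the same function applies unchanged to the ripple duration subgroups.

```python
import numpy as np

def quantile_subgroup_means(feature, traces, n_groups=8):
    """Split events into n_groups quantile bins of `feature` (e.g. ripple power)
    and average `traces` (events x time) within each bin.

    Returns an array of shape (n_groups, time); row 0 is the lowest bin.
    """
    edges = np.quantile(feature, np.linspace(0, 1, n_groups + 1))
    # Use only the interior edges so that the maximum value falls into the last bin.
    labels = np.digitize(feature, edges[1:-1], right=True)
    return np.stack([traces[labels == g].mean(axis=0) for g in range(n_groups)])
```

      Binning by quantiles rather than by fixed power thresholds keeps the number of ripples roughly equal across subgroups, so each averaged trace has comparable noise.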

      These results suggested a possible positive correlation between ripple power and the pre- and post-ripple aRSC voltage amplitude. To test this possibility, Pearson's correlation between ripple power and pre-/post-ripple aRSC amplitude was calculated for each animal separately. The ripple power of each detected ripple was defined as the average of the ripple-band-filtered, squared, and smoothed hippocampal LFP trace from -50 ms to +50 ms relative to the timestamp of the ripple's largest trough (ripple center). The pre- and post-ripple aRSC amplitudes of each ripple were calculated as the average of the aRSC voltage trace over the intervals [-200 ms, 0] and [0, 200 ms], respectively. The results are as follows.

      Top: scatter plots of ripple power versus pre-ripple aRSC voltage amplitude for individual animals. The black line in each graph is the linear regression line, and each blue circle represents one ripple. The Pearson's correlation value (ρ) and the p-value of its statistical significance are shown above each graph. Bottom: the same as the top graphs but for post-ripple aRSC amplitude.
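      The per-ripple features defined above can be computed roughly as follows. This sketch assumes the LFP has already been band-pass filtered to the ripple band, uses a moving average for smoothing with a placeholder length of 10 ms, and expects ripple centers expressed as sample indices in each signal's own sampling rate (the LFP and imaging rates differ). It returns Pearson's r only; a p-value would need, for example, `scipy.stats.pearsonr`.

```python
import numpy as np

def ripple_power(ripple_band_lfp, centers, fs, half_win_s=0.05, smooth_s=0.01):
    """Mean of the squared, smoothed ripple-band LFP within +/- half_win_s of each center.

    smooth_s is a placeholder smoothing length, not the value used in the study.
    """
    k = max(1, int(round(smooth_s * fs)))
    envelope = np.convolve(ripple_band_lfp ** 2, np.ones(k) / k, mode="same")
    h = int(round(half_win_s * fs))
    return np.array([envelope[c - h:c + h + 1].mean() for c in centers])

def window_mean(trace, centers, fs, start_s, stop_s):
    """Mean of `trace` over [center + start_s, center + stop_s) for each center."""
    a, b = int(round(start_s * fs)), int(round(stop_s * fs))
    return np.array([trace[c + a:c + b].mean() for c in centers])

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])
```

      For one animal, the pre-ripple correlation would then be `pearson_r(ripple_power(lfp, lfp_centers, fs_lfp), window_mean(v, img_centers, fs_img, -0.2, 0.0))`, where all argument names are hypothetical.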

      According to this analysis, 4 out of 6 animals showed a weak positive correlation (ρ = 0.0806 ± 0.0115; mean ± std), 1 animal showed a negative correlation (ρ = -0.20183), and 1 animal did not show a statistically significant correlation (p-value > 0.05) between ripple power and pre-ripple aRSC voltage amplitude. Moreover, 2 out of 6 animals showed a negative correlation (ρ = -0.1 and -0.14), and 4 animals did not show a statistically significant correlation (p-value > 0.05) between ripple power and post-ripple aRSC voltage amplitude.

      To check that the correlation results were not driven by extreme values of ripple power and aRSC voltage, we repeated the same correlation analysis after removing the ripples associated with the top and bottom 5% of the ripple power and aRSC voltage values. In this analysis, 1 of 6 animals showed a negative correlation (ρ = -0.13), and 5 animals did not show a statistically significant correlation (p-value > 0.05) between ripple power and pre-ripple aRSC voltage amplitude. Moreover, 2 of 6 animals showed a negative correlation (the same animals that showed a negative correlation before removing the extreme values; ρ = -0.12 and -0.14), 1 animal showed a positive correlation (ρ = 0.1), and 3 animals did not show a statistically significant correlation (p-value > 0.05) between ripple power and post-ripple aRSC voltage amplitude.
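      The trimming step can be sketched as below: ripples falling in the lowest or highest 5% of either variable are dropped before the correlation is recomputed. The quantile-based cutoffs and the function name are illustrative assumptions.

```python
import numpy as np

def trimmed_pearson(x, y, tail=0.05):
    """Pearson r after dropping events in the lowest or highest `tail` fraction
    of either variable. Returns (r, number of events kept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_lo, x_hi = np.quantile(x, [tail, 1 - tail])
    y_lo, y_hi = np.quantile(y, [tail, 1 - tail])
    keep = (x >= x_lo) & (x <= x_hi) & (y >= y_lo) & (y <= y_hi)
    return float(np.corrcoef(x[keep], y[keep])[0, 1]), int(keep.sum())
```

      A single extreme ripple can dominate Pearson's r, so comparing the trimmed and untrimmed values is a quick check on whether a reported correlation is driven by a handful of events.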

      Based on these results, we cannot conclude that there is a meaningful correlation between ripple power and the amplitude of aRSC voltage activity before and after ripples. It is worth noting that Nitzan et al. (see Fig. S6 in (Nitzan et al., 2022)) did not report a statistically significant correlation between the ripple power octile number (obtained by discretizing the continuous-valued ripple power into 8 subgroups) and the pre-ripple firing rate of the mouse visual cortex. However, they reported a statistically significant negative correlation (ρ = -0.13) between the ripple power octile number and the post-ripple firing rate of the mouse visual cortex. This negative correlation appears to have been influenced by the disproportionately larger firing rate associated with the first ripple power octile compared with the other octiles. Therefore, repeating their analysis after removing the first octile would probably yield a weak correlation close to 0.

      Next, we investigated the relationship between ripple duration and aRSC voltage activity. To get an idea of what features of the aRSC voltage activity might be correlated with the ripple duration, the ripples were divided into 8 subgroups using 8-quantiles of their duration distribution, and the corresponding aRSC voltage traces were averaged for each subgroup. The results of this analysis are summarized in the figure below.

      Left: peri-ripple aRSC voltage traces triggered on ripples in the odd-numbered ripple duration subgroups for each animal and then averaged across 6 animals. Standard errors of the mean are not shown for simplicity. Right: the same as the left panel but for only the lowest and highest duration subgroups. The shading represents the standard error of the mean.

      These results do not reveal qualitative differences in the pattern of aRSC peri-ripple voltage modulation across ripple duration subgroups. Nevertheless, the same correlation analysis performed for ripple power was also conducted for ripple duration. Only 1 of 6 animals showed a statistically significant correlation (ρ = 0.08) between pre-ripple aRSC voltage amplitude and ripple duration.

      Moreover, only 1 animal out of 6 showed a statistically significant correlation (ρ = -0.08) between post-ripple aRSC voltage amplitude and ripple duration. In conclusion, there does not seem to be a meaningful linear relationship between peri-ripple aRSC voltage amplitude and ripple duration.

      Next, we investigated whether the peri-ripple aRSC voltage modulation differs depending on whether a single or a bundled ripple occurs in the dorsal hippocampus. Bundled ripples were detected following the method described in our previous work (Karimi Abadchi et al., 2020). We found that 9.4 ± 3.5% (mean ± std across 6 animals) of the ripples occurred in bundles. The aRSC voltage trace was then triggered on the centers of single ripples and on the centers of the first/second ripples in bundles, averaged for each animal, and then averaged across 6 animals. The results are shown in the following figure.

      Left: animal-wise average of the mean peri-ripple aRSC voltage trace triggered on the centers of single ripples and on the centers of the first ripple in bundled ripples. Right: same as the left, but triggered on the centers of the second ripple in bundled ripples.

      These results suggest that the amplitude of aRSC voltage activity is larger before bundled than before single ripples, and that aRSC voltage activity is shifted to later times for bundled versus single ripples. The larger pre-ripple depolarization might signal the occurrence of a bundled ripple (similar to the larger pre-bundled- than pre-single-ripple deactivation observed during sleep (Karimi Abadchi et al., 2020)).

      Point 3) The differences between the voltage and glutamate signals are puzzling, especially in light of the fact that in the sleep state they went hand in hand (Karimi Abadchi et al., 2020, Fig. 2). It is also somewhat puzzling that the aRSC is the first area to show voltage inactivation but the last area to display an increase in glutamate signal, despite its anatomical proximity to hippocampal output (two synapses away). The SVD analysis hints that the glutamate signal is potentially multiplexed (although this analysis also requires more attention, see below), but does not provide a physiologically meaningful explanation. The authors speculate that feed-forward inhibition via the gRSC could be involved, but I note that the aRSC is among the two major targets of the gRSC pyramidal cells (the other being homotypical projections) (Van Groen and Wyss, 2003), i.e., glutamatergic signals are also at play. To meaningfully interpret the results in this paper, it would be instrumental to solve this discrepancy, e.g., by adding experiments monitoring the activity of inhibitory cells.

      We were also surprised that glutamate and voltage signals do not go hand in hand in the awake state as they do during sleep, and this was the main reason we performed the SVD analysis, particularly because a portion of aRSC excitatory neurons showed elevated calcium activity despite the reduction of voltage and the delayed elevation of glutamate signals in aRSC at the population level. At the time of the initial submission, a pre-ripple reduction and post-ripple elevation of calcium activity in a portion of three subclasses of superficial aRSC inhibitory neurons had been reported (Chambers et al., 2022, 2021); this was the basis of our speculation about the potential involvement of feed-forward inhibition in the post-ripple voltage reduction. We speculated that this potential feed-forward inhibition could stem from gRSC excitatory neurons, as Reviewer 1 pointed out, or from other neocortical or subcortical regions projecting to aRSC. Feedback inhibition is also possible, in which principal aRSC neurons excited by gRSC (as Reviewer 1 pointed out) or by any other region, including aRSC itself, excite aRSC inhibitory neurons.

      Point 4) I am puzzled by the ensemble-wise correlation analysis of the voltage imaging data: the authors point to a period of enhanced positive correlation between cortex and hippocampus 0-100 ms after the ripple center but here the correlation is across ripple events, not in time. This analysis hints that there is a positive relationship between CA1 MUA (an indicator for ripple power) and the respective cortical voltage (again an incentive to separate ripples by power), i.e. the stronger the ripple the less negative the cortical voltage is, but this conclusion is contradictory to the statements made by the authors about inhibition.

      A closer look at Figure 2B iv reveals that the elevation of the cross-correlation function between peri-ripple aRSC voltage and hippocampal MUA starts with a short delay (~20 ms) and peaks around 75 ms after the ripple centers. That is, the maximum correlation between the two signals occurs at the point (75 ms, 75 ms) on the MUA-time × voltage-time plane, whose origin (0, 0) corresponds to the ripple centers in the hippocampal MUA and the corresponding imaging frame in the voltage signal. Reviewer 1's interpretation would be correct if the maximum correlation occurred at (0, 0) rather than at (75 ms, 75 ms), because the MUA value at the ripple center (t = 0), not at t = 75 ms, is the indicator of ripple power. Figure 2B iii shows that the amplitude of hippocampal MUA is more than 2 dB lower at t = 75 ms than at t = 0, reflecting the fact that ripples are often short-duration events. If, instead, the maximum correlation had occurred at the point (0, 100 ms), where the ripples had maximum power and aRSC voltage was at its trough (Figure 2B iii), it could have been concluded that “the stronger the ripple, the less negative the cortical voltage”.
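      The ensemble-wise correlation function discussed here can be written compactly: entry (i, j) is the correlation, computed across ripples, between MUA at time t_i and cortical activity at time t_j. The sketch assumes both signals are stored as ripples × time arrays; the name `ensemble_correlation` is illustrative.

```python
import numpy as np

def ensemble_correlation(x, y):
    """Correlation across events for every pair of time points.

    x: (events, T1) peri-ripple HPC MUA
    y: (events, T2) peri-ripple cortical signal
    Returns C of shape (T1, T2) with C[i, j] = Pearson r between x[:, i] and y[:, j].
    """
    xz = (x - x.mean(axis=0)) / x.std(axis=0)
    yz = (y - y.mean(axis=0)) / y.std(axis=0)
    return xz.T @ yz / x.shape[0]
```

      Reading C at row t = 0 relates ripple power to cortical activity at each lag, whereas the (75 ms, 75 ms) entry relates MUA and voltage both sampled 75 ms after the ripple center. This is the distinction drawn above.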

      Point 5) Following my previous point, it is difficult to interpret the ensemble-wise correlation analysis in the absence of rigorous significance testing. The increased correlation between the HPC and RSC following ripples is equal in magnitude to the correlation between pre-ripple HPC MUA and post-ripple cortical activity. How should those results be interpreted? The authors could, for example, use cluster-based analysis (Pernet et al., 2015) with temporal shuffling to obtain significant regions in those plots. In addition, the authors should mark the diagonal of those plots, or even better compute the asymmetry in correlation (see Steinmetz et al., 2019 Extended Fig. 8 as an example), to make it easier for the reader to discern lead/lag relationships.

      The purpose of calculating the ensemble-wise correlation coefficient was to provide further information about the relationship between the two random processes peri-ripple HPC MUA and peri-ripple neocortical activity. In general, the correlation between the two random processes cannot be inferred from the temporal relationship between their mean functions. In other words, there are infinitely many options for the shape of the correlation function between two random processes with given mean functions. Moreover, the point was to compare the correlation of peri-ripple neocortical activity and HPC MUA across neocortical regions. The fact that mean peri-ripple activity in, for example, RSC and FLS1 are different does not necessarily mean their correlation functions with peri-ripple HPC MUA are also different.

      As requested, we performed cluster-based significance testing via temporal shuffling for each individual VSFP (n = 6), iGluSnFR Ras (n = 4), and iGluSnFR EMX (n = 4) animal. The following figures summarize the number of animals showing significant regions in their correlation functions between peri-ripple HPC MUA and different neocortical regions. The diagonal of the correlation functions is marked; however, temporal lead/lag should not be inferred from these results, mainly because the temporal resolutions of the two signals (one electrophysiological, one optical) differ.
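      A generic cluster-based permutation test on such a time × time correlation map is sketched below. The cluster-forming threshold, the 4-connectivity, the cluster mass (sum of |r|), and the shuffle are all assumptions. In particular, this shuffle permutes the pairing of ripples between the two signals, which is one common choice; the temporal shuffle used in the analysis may differ.

```python
import numpy as np
from collections import deque

def ensemble_correlation(x, y):
    """C[i, j] = Pearson r across events between x[:, i] and y[:, j]."""
    xz = (x - x.mean(axis=0)) / x.std(axis=0)
    yz = (y - y.mean(axis=0)) / y.std(axis=0)
    return xz.T @ yz / x.shape[0]

def label_clusters(mask):
    """4-connected components of a 2-D boolean mask; label 0 is background."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if (0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                        and mask[ni, nj] and not labels[ni, nj]):
                    labels[ni, nj] = current
                    queue.append((ni, nj))
    return labels, current

def cluster_masses(c, threshold):
    """Label supra-threshold clusters of |c| and return their masses (sum of |r|)."""
    labels, n = label_clusters(np.abs(c) > threshold)
    return labels, np.array([np.abs(c[labels == k]).sum() for k in range(1, n + 1)])

def cluster_test(x, y, threshold=0.3, n_perm=500, alpha=0.05, seed=0):
    """Boolean map of time-time bins that belong to significant clusters."""
    rng = np.random.default_rng(seed)
    labels, masses = cluster_masses(ensemble_correlation(x, y), threshold)
    null_max = np.empty(n_perm)
    for p in range(n_perm):
        shuffled = y[rng.permutation(len(y))]  # break the ripple-by-ripple pairing
        _, m = cluster_masses(ensemble_correlation(x, shuffled), threshold)
        null_max[p] = m.max() if m.size else 0.0
    cutoff = np.quantile(null_max, 1 - alpha)
    return np.isin(labels, 1 + np.flatnonzero(masses > cutoff))
```

      Comparing each observed cluster with the distribution of the largest null cluster controls the family-wise error over the whole map, so no per-bin correction is needed.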

      Point 6) For the single cell 2-photon responses presented in Fig. 3, how should the reader interpret a modulation that is at most 1/20 of a standard deviation? Was there any attempt to test for the significance of modulation (e.g., by comparing to shuffle)? If yes, what is the proportion of non-modulated units? In addition, it is not clear from the averages whether those cells represent bona fide distinct groups or whether, for instance, some cells can be upmodulated by some ripples but downmodulated by others. Again, separation of ripples based on objective criteria would be useful to answer this question.

      As explained in response to Point 2, the seemingly small modulation size (e.g., 0.05 SD in Fig. 3b) arises because the individual peri-ripple calcium traces were z-scored against a peri-non-ripple distribution and then averaged. Alternatively, the peri-ripple traces could have been averaged first, and the averaged trace z-scored against a sampling distribution constructed from the peri-non-ripple distribution, with the sample size equal to the number of ripples detected for a given animal. In that case, the standard deviation of the sampling distribution would have been the divisor in the z-scoring, rather than the standard deviation of the original peri-non-ripple distribution. Because the standard deviation of the sampling distribution is smaller than that of the original distribution by a factor of √(sample size), the resulting z-scores would be larger by the same factor.

      As suggested by the reviewer, and to make our results more comparable with those of electrophysiological studies, we deconvolved the calcium traces and tested the significance of each neuron's modulation by comparing its mean peri-ripple deconvolved trace with a neuron-specific shuffled distribution (see the methods section for details). We found that 8.46 ± 3% (mean ± std across 11 mice) of neurons were significantly modulated over the interval [0, 200 ms], and 81.08 ± 8.91% (mean ± std across 11 mice) of these were up-modulated. If being significantly up- or down-modulated is taken as the criterion for being distinct, these two groups could be considered distinct groups. The following figures show mean peri-ripple calcium and deconvolved traces, averaged across up- or down-modulated neurons for each mouse and then across the 11 mice.
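      The exact shuffling procedure is given in the methods section and is not reproduced here. As an illustration only, a per-neuron shuffle test could look like the sketch below, which assumes that the statistic is the mean deconvolved activity in the [0, 200 ms] post-ripple window and that the null is built by circularly shifting the whole trace. The helper name and defaults are hypothetical.

```python
import numpy as np


def ripple_modulation(trace, onsets, fs, win=(0.0, 0.2), n_shuffle=1000,
                      alpha=0.05, seed=0):
    """Shuffle test for one neuron's peri-ripple modulation (hypothetical helper).

    trace: deconvolved activity sampled at fs Hz; onsets: ripple onset sample indices.
    Returns (observed modulation, two-sided p-value, +1 up / -1 down / 0 not significant).
    """
    rng = np.random.default_rng(seed)
    offsets = np.arange(int(round(win[0] * fs)), int(round(win[1] * fs)))
    idx = np.asarray(onsets)[:, None] + offsets[None, :]
    idx %= len(trace)        # wrap around at the end of the recording (simplification)
    baseline = trace.mean()  # circular shifts leave the mean unchanged

    observed = trace[idx].mean() - baseline
    # Null: circularly shift the whole trace, which keeps its autocorrelation
    # but breaks its alignment to the ripples.
    null = np.array([
        np.roll(trace, rng.integers(1, len(trace)))[idx].mean() - baseline
        for _ in range(n_shuffle)
    ])
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_shuffle)
    direction = int(np.sign(observed)) if p < alpha else 0
    return observed, p, direction
```

Circular shifting is chosen here because it preserves each neuron's own temporal statistics, so the null reflects that neuron's variability rather than a pooled one.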

      Point 7) Fig. 3: The decomposition-based analysis of glutamate imaging using SVD needs to be improved. First, it is not clear how much of the variance is captured by each component, and it seems like no attempt has been made to determine the number of significant components or to use a cross-validated approach. Second, the authors imply that reconstructing the glutamate imaging data using the 2nd-100th components 'matches' the voltage signal but this statement holds true only in the case of the aRSC and not for other regions, without providing an explanation, raising questions as to whether this similarity is genuine or merely incidental.

      The first 100 components explained about 99.9% of the variance in the concatenated stack of peri-ripple neocortical glutamate activity for each animal, which is practically the entire variance in the data. Our goal was not to obtain a low-rank approximation of the data, for which the number of significant components would have had to be determined. Instead, we decomposed the data into the activity along the first principal component, which showed no noticeable topography among neocortical regions, and the activity along the remaining components, which did show a noticeable topography among neocortical regions. The first component explained 83.11 ± 6.75% (mean ± std across 4 iGluSnFR Ras mice) and 83.3 ± 5.07% (mean ± std across 4 iGluSnFR EMX mice) of the variance in the concatenated stack of peri-ripple neocortical glutamate activity.
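      The decomposition described above can be sketched with NumPy's SVD. This sketch assumes the stack is arranged as (time points concatenated across ripples) × (pixels) and that columns are mean-centred so that the singular vectors are principal components. The layout, the centring and the function name are assumptions.

```python
import numpy as np


def split_first_component(X, k=100):
    """Split a (time x pixels) matrix into its 1st-PC part and the part along PCs 2..k.

    Columns are mean-centred so that the singular vectors are principal components.
    """
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)      # fraction of variance per component
    first = np.outer(U[:, 0] * s[0], Vt[0])  # rank-1 reconstruction from PC 1
    rest = (U[:, 1:k] * s[1:k]) @ Vt[1:k]    # reconstruction from PCs 2..k
    return first, rest, mean, var_explained
```

`var_explained[0]` corresponds to the 83% figure quoted above and `var_explained[:100].sum()` to the 99.9% figure. The `rest` term is the "2nd-100th components" reconstruction whose regional topography is discussed in the next paragraph.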

      As we noted in the discussion section of the manuscript, SVD is agnostic about brain mechanisms and only seeks to capture maximum variance. In particular, it is not designed to capture the maximum similarity between glutamate and voltage activity in the brain. Therefore, the only thing we can say with certainty is the following: when the activity along the axis of maximum co-variability (1st principal component) across the neocortical regions’ glutamate activity is removed, only aRSC, and no other region, shows a post-ripple down-modulation, whose timing matches that of the aRSC post-ripple voltage down-modulation. Moreover, the timing of the activity along the 1st principal component better matches that of the calcium activity in the up-modulated portion of aRSC neurons. Even though these results are not guaranteed to be genuine, the similarity between the timing of the SVD output in aRSC glutamatergic activity and that of two independently collected signals in aRSC, i.e. voltage and calcium, could support the idea that peri-ripple aRSC glutamatergic activity is likely a mixture of up- and down-modulated components.

      Point 8) The estimation of deep pyramidal cells' glutamate activity by subtracting the Ras group (Fig. 4d) is not very convincing. First, the efficiency of transgene expression can vary substantially across different mouse lines. Second, it is not clear to what extent the wide field signal reflects deep cells' somatic vs. dendritic activity due to non-linear scattering (Ma et al., 2016), and it is questionable whether a simple linear subtraction is appropriate. The quality of the manuscript would improve substantially if the authors probe this question directly, either by using deep layer specific line/ 2-P imaging of deep cells or employing available public datasets.

      Simulation studies have suggested that the signal captured by wide-field imaging of voltage-sensitive dye can be modeled as a weighted sum of voltage activity across neocortical layers (Chemla and Chavane, 2010; Newton et al., 2021). Hence, modeling the glutamate signal as a weighted sum of glutamate activity across neocortical layers is a reasonable starting point. Future studies will be needed to refine this starting point by imaging glutamate activity in a cohort of mice with iGluSnFR expression restricted to deep-layer neurons. Moreover, Ma et al. (2016) stated that “This means that signal detected at the cortical surface (in the form of a two-dimensional image) represents a superficially weighted sum of signals from shallow and deeper layers of the cortex”.
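      Under the weighted-sum model, the deep-layer estimate reduces to a linear subtraction. The sketch below treats the Ras signal as superficial-layer activity and the EMX signal as a weighted sum of superficial and deep contributions, as the discussion above implies. The scale factor `alpha` and both function names are hypothetical, and the sketch does not address the reviewer's caveats about expression efficiency or non-linear scattering.

```python
import numpy as np


def layer_mixture(superficial, deep, w_superficial, w_deep):
    """Forward model: wide-field signal as a weighted sum of layer activities."""
    return w_superficial * superficial + w_deep * deep


def estimate_deep(emx_signal, ras_signal, alpha):
    """Deep-layer estimate: EMX signal minus a scaled Ras (superficial) signal.

    alpha (hypothetical) is the weight of the superficial contribution in the EMX
    signal; the result equals w_deep * deep only if the linear model holds and the
    Ras signal reflects the same superficial activity as in the EMX animals.
    """
    return emx_signal - alpha * ras_signal
```

The recovered trace is the deep contribution up to its unknown weight `w_deep`, so under this model only its time course, not its absolute amplitude, is interpretable.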

      Reviewer #2 (Public Review):

      Point 1) The authors throughout the manuscript compare the correlation between hippocampal MUA and the imaged cortical ensemble activity (Example: Lines 120-122). There is a potential time lag in signal detection with regard to the two detection methods. While the time lag using electrophysiological recording is at the scale of milliseconds, the glutamate-sensitive imaging might take several 100s of ms to be detected. It is not clear in the manuscript how the authors considered this problem during the analysis.

      The ensemble-wise correlation analysis characterizes the relationship between two random processes, peri-ripple HPC MUA and peri-ripple neocortical activity (please see the response to reviewer 1’s major point 5). Although it is a valid point that the temporal resolution of the two signals is not the same, which could introduce an error in the exact timing of the relationship between the two processes, we did not draw any conclusion based on the exact timing of the elevated correlation between them. Moreover, we smoothed (equivalent to low-pass filtering) and down-sampled the MUA signal (please see the methods section) to bring the temporal scales of the two processes closer to each other. We also want to clarify that the temporal resolution of voltage and glutamate imaging is on the order of tens of ms (Xie et al., 2016).
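      The smoothing and down-sampling step can be sketched as a Gaussian low-pass filter followed by decimation. The sampling rates and kernel width used here (for example, a 1 kHz MUA trace brought to a 100 Hz imaging rate with a 10 ms Gaussian) are hypothetical; the actual values are given in the methods section.

```python
import numpy as np


def smooth_and_downsample(mua, fs_in, fs_out, sigma_s=0.01):
    """Gaussian-smooth a MUA trace (low-pass), then keep every step-th sample."""
    ratio = fs_in / fs_out
    step = int(round(ratio))
    if step < 1 or not np.isclose(ratio, step):
        raise ValueError("fs_in must be an integer multiple of fs_out")
    sigma = sigma_s * fs_in                  # kernel width in samples
    half = int(np.ceil(4 * sigma))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()                   # unit gain, so the mean rate is preserved
    if len(mua) < len(kernel):
        raise ValueError("trace is shorter than the smoothing kernel")
    smoothed = np.convolve(mua, kernel, mode="same")
    return smoothed[::step]
```

Smoothing before decimation matters: taking every step-th sample of the raw trace would alias high-frequency MUA fluctuations into the down-sampled signal.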

      Point 2) In the results section "The peri-ripple glutamatergic activity is layer dependent", are the Ras and EMX expressed in two different experimental animal groups? If yes, and there was a time lag between the two groups, is it valid to estimate the deeper layer activity using a scaled version of the Ras from the EMX signal?

      This comment is addressed in response to reviewer 1’s major point 8.

      Point 3) The authors did not discuss the results adequately in the discussion section. Since there is no behavioral paradigm and no behavioral read-out to induce or correlate it with possible planning and future decision-making process, the significance of the paper will be enhanced by discussing the possible underlying circuitry mechanism that might cause the reported observations. With no planning periods in the task (instead just sitting on a platform), it is actually quite unclear what the purpose of wake ripples should be. For example, the authors discuss the superficial and deep layer responses and their relation to the memory index theory. However, the RSC possesses different groups of excitable neurons in different layers. Specifically, three excitable neurons are found within the different layers of the RSC; the intrinsically bursting neurons (IB), regular spiking (RS), and low-rheobase (LR) neurons. These neurons are distributed heterogeneously within the RSC cortical layer. Although the RS are abundant in the deeper layers of the RSC, they occupy 40% of the total amount of excitable neurons found in layers II/III. On the other hand, the LR is the dominant excitable neuron in the superficial layers. It will add to the significance of the work if the authors discussed the results in the context of the cellular structure of the RSC and how would that impact the observed inhibition in the peri-ripple time window. It would be helpful for the readers and the reviewers to add a schematic diagram to the discussion section.

      The goal of our study was to characterize the patterns of neocortical activity around hippocampal ripples in the awake state, not to shed light on the function (purpose) of awake ripples. However, we speculated about what our results could mean in the discussion section. To address the reviewer’s comment on the differences across RSC layers, the following paragraph was added to the discussion section (lines 342-353).

      “Our results suggest that dendrites of deep pyramidal neurons, arborized in the superficial layers of the neocortex, receive glutamatergic modulation earlier than those of the superficial ones. However, the results do not provide a mechanistic explanation of this phenomenon. The observed layer-dependency of the glutamatergic modulation could partially result from the heterogeneity of excitatory as well as inhibitory neurons across aRSC layers, but how this heterogeneity would lead to such layer-dependency is a question our data do not answer. It could be speculated that differences in the dendritic morphology and firing type of the different types of RSC excitatory neurons (Yousuf et al., 2020), or differences in the connectivity of the different RSC layers with other brain regions, play a role (Sugar et al., 2011; van Groen and Wyss, 1992; Whitesell et al., 2021). This is a complicated problem that could only be resolved by experiments specifically designed to address it.”

      Point 4. A general issue (in addition to the missing behaviour), is the mix of the methods. On one side this makes the article very interesting since it highlights that with different methods you actually observe different things. But on the other side, it makes it very difficult to follow the results. It would be a major improvement of the article if the authors could include (as mentioned above) a schematic of the results and their theory, especially highlighting how the different methods would capture different parts of the mechanism. Finally, the authors should not use calcium signals as a direct measure of neuronal firing. Calcium influx is only seen in bursts of firing, not with individual spikes. It is a plasticity signal and therefore should be treated and discussed as such. Just recently it was shown by Adamantidis lab that the calcium signal changes between wake and sleep and this change does not parallel changes in neuronal firing/spikes.

      We agree with the reviewer that the calcium signal is biased toward bursts of spikes (Huang et al., 2021). To address this concern, the term “spiking activity” was replaced with “calcium activity” throughout the manuscript. Moreover, the calcium signal was deconvolved to obtain a better estimate of the spiking activity (please refer to our response to reviewer 1’s point 6).

      Point 5. In the discussion section, the authors focus their discussion on the connectivity between the CA1 area and the RSC. Although it is an important point, since the authors are examining the peri-ripple cortical dynamics, it is critical to discuss other possible connectivity effects. Furthermore, the hippocampal input preferentially targets the granular RSC, how would that impact the results and the interpretation of the authors? Additionally, a previous study reported the suppression of the thalamic activity during hippocampal ripples (Yang et al., 2019). Importantly, the thalamic inputs to the RSC target the superficial layers. It will add to the value of the paper if the authors expanded the discussion section and elaborated further on the possible interpretation of the results.

      At the time of our initial submission, a pre-ripple reduction and post-ripple elevation of calcium activity had been reported in a portion of three subclasses of superficial aRSC inhibitory neurons (Chambers et al., 2022, 2021), and this was the basis of our speculation on the potential involvement of feed-forward inhibition in the post-ripple voltage reduction. We speculated that this potential feed-forward inhibition could stem from gRSC excitatory neurons or from other neocortical or subcortical regions projecting to aRSC (please see the discussion section). However, a thalamic source is less likely because multiple studies have observed suppression of the majority of thalamic neurons during awake ripples (Logothetis et al., 2012; Nitzan et al., 2022; Yang et al., 2019). Moreover, peri-awake-ripple suppression of thalamic axons projecting to the first layer of aRSC has been reported (Chambers et al., 2022). On the other hand, feedback inhibition could also be involved, in which excitatory aRSC neurons that are excited by gRSC (as reviewer 1 pointed out) or by any other region, including aRSC itself, excite aRSC inhibitory neurons, which in turn inhibit pyramidal cells. To address this comment, the following paragraph was added to the discussion section (lines 323-328).

      “The thalamus is another source of axonal projections to aRSC (Van Groen and Wyss, 1992). However, it is less likely that thalamic projections contribute to the peri-awake-ripple modulation of aRSC activity, because multiple studies have observed suppression of the majority of thalamic neurons during awake ripples (Logothetis et al., 2012; Nitzan et al., 2022; Yang et al., 2019). Moreover, peri-awake-ripple suppression of thalamic axons projecting to the first layer of aRSC has been reported (Chambers et al., 2022).”

    1. Author Response

      Reviewer #3 (Public Review):

      Lillvis et al. present a new method for quick, targeted analysis of neural circuits through a combination of tissue expansion and (lattice) light sheet microscopy. Three-color labeling is available, which allows labeling neurons of a molecularly specific type, presynaptic sites and/or postsynaptic sites.

      Strengths:

      • The experimental technique can provide much higher throughput than EM

      • All source code has been made available

      • Manual correction of automatic segmentations has been implemented, allowing for an efficient semi-automatic workflow

      • Very different kinds of analyses have been demonstrated

      • Inclusion of electrical connections is really exciting, what a great complement to the existing EM volumes!

      Weaknesses:

      • Limitations of the method are not really discussed. While the approach is simpler and cheaper than EM, it's still important to give the readers a clear picture of the use cases where it's not expected to work before they embark on the journey of acquiring tens of terabytes of data. Here are just a few examples of the questions I would have if I wanted to implement the method myself - I am a computational person and can easily imagine my "wet lab" colleagues would have even more to ask about the experimental side:

      Please see our response to the Essential Revisions (for the authors) section above in addition to the responses to each point below.

      • It is not very clear to me if the resolution of the method is sufficient to disentangle individual neurons of the same type. It has been demonstrated for a few examples in the paper, but is it generally the case? Are there examples of brain regions/neuron types where it wouldn't be possible? If another column was added to the table in Figure 1, e.g. "individual neuron connectivity", EM would be "+", LM "-", what would ExLLSM be?

      Individual neuron connectivity is possible using this current version of ExLLSM either by labeling individual neurons genetically or by manually segmenting neurons in sparsely labeled samples. Of course, the exact answer to this question depends on labeling density and sample quality, and we have added a statement to address this.

      Lines 585-591: The difficulty of such manual segmentation can vary substantially depending on labelling density and signal quality. For instance, manually segmenting individual L2 outputs (Fig. 3) took ~10 minutes/neuron, whereas segmenting a pair of SAG neurons from off-target neurons (Fig. 4) took 1-5 hours depending on the sample. Of course, more densely labeled samples will take more time. Finally, while it is possible to segment individual neurons from entangled bundles as shown here and elsewhere (Gao et al., 2019), the expansion factor will need to be increased by an order of magnitude or more, and neuron labels must be continuous, to approach EM levels of reconstruction density.

      • Similarly, the procedures for filling gaps in the signal could result in falsely merged neurons. Does it ever happen in practice?

      Because the gap-filling process is not used until after semi-automatic segmentation, this was not a concern (the gaps were filled on manually inspected neuron masks that should only include signals from the neuron(s) of interest). This would certainly be a concern if we were using this gap-filling step, or the fully automated neuron segmentation approach, to segment individual neurons from samples in which off-target neurons are also labeled, but that was not the case here.

      • How long does semi-manual analysis take in person-hours/days for a new biological question similar in scope to the ones demonstrated in the paper?

      The statement discussed above (lines 585-591) and an additional statement (lines 581-583) aim to address this.

      Lines 580-582: As such, analyzing the DA1-IPN data, for example, required relatively little human time. The semi-automatic neuron segmentation steps required a maximum of one hour per sample and all other steps are automated.

      • How robust are the networks for synaptic "blob" detection? The authors have shown they work for different reporters, when are they expected to break? Would you recommend to retrain for every new dataset? How would you recommend to validate the results if no EM data is available?

      We expect the network for blob detection to be quite robust, as it essentially acts as a detector of high-signal punctate structures rather than classifying a high-level shape or structure. We have modified the text to suggest that the synapse and neuron segmentation models we include be tried first, before retraining them.

      Lines 368-372: Furthermore, the convolutional neural network models for synapse and neuron segmentation are classifiers of high signal punctate and continuous structures, respectively. As such, the models may already work well for segmenting similar structures from other species or microscopes. If not, these models can be retrained with a suitable ground truth data set and the entire computational pipeline can be applied to these new systems.

    1. Author Response

      Reviewer #2 (Public Review):

      Burger et al. present their compilation of 3 well-established cervical cancer natural history micro-simulation models from the US (Harvard), the Netherlands (MISCAN-Cervix) and Australia (Policy1-Cervix) to evaluate what effect Covid, or any systemic problem impeding screening over a period of time, will have on cervical cancer incidence ("symptomatic cervical cancer") in the short and long term. They use the United States for the modeling example and establish that a temporary screening delay has less deleterious effect on cervical cancer incidence and morbidity than being under-screened. Screening test and previous screening frequency also affect the outcomes.

      The authors evaluate a number of factors in their analysis:

      1. Screening test type: HPV (every five years) or cytology (every three years), as per guidelines.

      2. Screening delay such as with Covid: 1, 2, or 5 years from the participant's last screening encounter.

      3. The participant screening frequency: 1, 3, 5 or 10 yearly.

      4. Three birth cohorts: 35yo, 45yo and 55yo in 2020.

      As the Covid pandemic meant a delay of months to a year, a key finding was the projected relative increase in symptomatic cervical cancer cases from a one-year delay, which varied from 38% higher with Policy 1 to 170% higher with Miscan-Cervix. The comparison was for women who had not had cytology screening for 5 years before the delay versus those appropriately screened at 3 years. In the long term, over a lifetime, a one-year (up to 5-year) delay had less effect on developing cervical cancer than screening frequency or test type. This finding is reassuring for the general public. Most importantly, however, the authors showed that not being screened for a long duration (being under-screened) is the most significant factor for developing cervical cancer, especially with a further systemic delay such as Covid. That being under-screened or never screened is the main risk is a clinically well-known fact in the cervical cancer screening community. HPV-based screening was also shown to be more protective against developing cervical cancer than cytology, given its superior sensitivity over a longer duration, which allows HPV testing every 5 years versus cytology every 3 years.

      The strength of the paper is showing the above findings through the multiple permutations of effects in detailed analytic tables for quantitative mathematical modeling experts, and summary figures simple enough for a more general reader to follow. The variation in results among the models was explained well with most of the difference due to the "dwell" time before an HPV infection develops into precancer and cancer, the Miscan model having the shortest dwell time and thus some of the higher relative rate and absolute increases in cancer. The authors emphasized that "heterogeneity" in screening history could be due to socioeconomic factors that aren't directly evaluated in the model, but women with greater socioeconomic barriers, tend to be those that are under-screened and most at risk of developing cervical cancer.

      The results are grouped into short and long term impacts. For the short term impact, the authors concentrated on showing excess cancer in women screening less frequently than guidelines, and used cytology every 3 years as a baseline. So women screening 5 or 10 years before disruption, did worse than q3 yearly guideline compliant women. There was then discussion about guideline compliant HPV screening which is done every 5 years, so only the 10 year group was non compliant. The authors discuss changing to HPV at 30 yo. Without knowing the actual guidelines for screening in the US, this section can be a bit confusing for the reader. It would be very helpful if the authors clearly state that cytology is offered every 3 years to women under 30, starting at 21 yo, and then HPV is offered q 5 yearly from 30yo. Alternatively, q3 yearly cytology can be done throughout a screening lifetime. This background information makes the short term results clearer to understand. The Figure is helpful and clear for interpretation.

      For the long-term impacts, the authors are able to show that a temporary disruption in screening is less deleterious than an overall poor screening history (not following guidelines). They also show that HPV testing from age 30 is better at preventing cervical cancer than 3-yearly cytology, and is less affected by a screening delay. (The figure notation reads "right panel" but likely means "lower panel".)

      Thank you for flagging this typo; we have changed the notation.

      Overall, the authors clearly show the effects of a temporary screening disruption in the context of a woman's overall screening history, frequency and test mode.

      This work is very relevant and timely in the cervical screening field and emphasizes the importance of assuring women are not under-screened, the greatest risk factor for cervical cancer. They give a comprehensive discussion of how their results are relevant for cervical cancer screening today and in the near future.

      Thank you for the nice summary and feedback.

      As alluded to earlier, clarification about the age related switch to HPV testing at 30yo would help the reader better understand the point about the two factors having to be balanced when considering HPV testing. Are the two factors the greater protective test sensitivity vs the benefit of the actual screening moment? This section was slightly confusing.

      In addition to the request for a more complete description of the US guidelines (addressed in Essential Revisions), we have clarified the description of the “2 competing factors” on pages 6-7 of the manuscript.

      “Similarly, we found that the impact of disrupting an HPV-based screening program has different implications than the disruption of a cytology-based program. This can be explained by the fact that HPV screening has a higher sensitivity to detect (pre-invasive) cervical lesions than cytology; therefore, the cancer risk at the time of disruption is lower (as there are fewer undetected lesions), and this may provide a greater buffer to endure temporary disruptions. On the other hand, with the more sensitive HPV test, a disruption takes away a relatively more valuable (i.e., sensitive) screening moment. The balance between these two factors determines whether the excess risk per delay duration is greater or smaller for HPV screening than for cytology screening, which contributes to the within-model differences between cytology-based and HPV-based screening in Table 1. If in a model the first effect (HPV screening contributes to lower risk at the time of disruption) is larger than the second effect (removal of a valuable screening moment), disruption of the HPV program would have a smaller effect than that of the cytology program, which is the case for all screening frequencies in both the Harvard and Policy1-cervix models and for the annual screeners in MISCAN-cervix (Table 1). The MISCAN-Cervix model predicted relatively more excess cancers due to disruptions for women screened with HPV 3-yearly, 5-yearly or 10-yearly, where delaying a more sensitive test (the second factor) seems to outweigh the first (less underlying disease at the time of the disruption). Differences in dwell time for HPV and cervical precancer among the three models predominantly contribute to this balance between the two factors (Appendix), with the MISCAN-Cervix model having the shortest preclinical dwell time from HPV acquisition to cancer development (20).
      In addition to the shorter dwell times, the MISCAN model also assumes that some precancerous lesions are structurally missed over time by cytology-based screening because they are located deeper in the cervical canal. For women with such lesions, missing a cytology screen due to a disruption is less harmful, which increases the relative difference between primary cytology and primary HPV screening in case of a disruption, and increases the effect of women missing a very sensitive screen (second factor).”

      The authors speak to self collection as a potential solution for some underscreened women (people with cervix). It would be important to outline how self sampling is actually done. Some people believe cytology can be done on self sample. Self sampling can also include urine HPV, thus some detail about self sampling in the discussion would be helpful and give another benefit to HPV testing (DNA based for self sampling).

      Although there is research interest in urine-based HPV screening, this is not yet at the point where it has been widely implemented in screening programs; however, we agree some additional information on self-sampling would be helpful for the general reader.

      On page 7 we have added, “Importantly, vaginal HPV-based screening (unlike cytology-based screening) enables self-collection of samples at home, which may provide a tool to reduce screening barriers and facilitate outreach to under-screened people who are also most vulnerable to screening disruptions.”

    1. Author Response

      Reviewer #1 (Public Review):

      Anopheles is an important disease vector and the efforts to characterize the extent of genetic variation in the system are welcome. In this piece, the authors propose a Variational Autoencoders method to assign species boundaries in a large sample of Anopheles mosquitoes using a panel of 62 nuclear amplicons. Overall, the method performs well as it can assign samples to an acceptable granularity. The main advantage of the method is that it takes reduced representation genome sampling which should cut costs in genotyping. The authors do not compare the effectiveness of their amplicon panel with other approaches to do reduced representation sequencing, or the computational method with other previously published methods. Additionally, the manuscript does not clearly state what is the importance of species assignments and the findings/method are -by definition- limited to a single biological system.

      It is important to draw the reviewer’s attention to the fact that this is a two-part approach; the reviewer seems to have overlooked the Nearest Neighbour component of the work. The approach is not solely a VAE: the VAE only comes into play at the species complex level. The higher-level assignments are done using NN approaches.

      The manuscript has three main limitations. First, there is no explicit test of the performance of ANOSPP compared to other methods of low-dimensional sampling. While the authors state that the ANOSPP panel will lead to genotyping for low cost (justifiably so), there is no direct comparison to other low-representation methods (e.g., RAD-Seq, MSG).

      The key advantage of ANOSPP is that it works on the entire Anopheles genus, while the other suggested sequencing methods are more applicable to groups of specimens of the same or closely related species. The purpose of the panel is species identification for the whole genus, so it really is an alternative to the current methods of species identification, which commonly consist of morphological identification of the species complex, followed by complex-specific PCR amplification of a single species-diagnostic locus. The only other species identification method for Anopheles that we are aware of that is not limited to a single species complex is a mass spectrometry approach (Nabet et al. Malar J, 2021); however, that study only investigates three different species and reaches a classification accuracy of at most 67.5%.

      The main advantage of ANOSPP over other reduced representation sequencing methods, like MSG and RAD-Seq, is that it is specifically designed to work for the entire Anopheles genus to support genus-wide species identification. In a genus comprising an estimated 100 million years of divergence, a sequencing approach that relies on restriction enzymes is likely to introduce a lot of variability in which parts of the genome are sequenced for different species. Moreover, both MSG and RAD-Seq typically map the reads to a reference genome; any choice of reference genome will likely introduce considerable bias when dealing with such diverged species. In general, the sequence data generated by those methods require more complicated and labour-intensive processing. Lastly, the costs per sample for library preparation and sequencing are substantially lower with ANOSPP than with MSG and RAD-Seq: library preparation costs <1 USD per sample for ANOSPP versus 5 USD for RAD-Seq (Meek and Larson, Mol Ecol Resour, 2019), and a single sequencing run accommodates 768 samples with ANOSPP, compared with 384 samples for MSG (Andolfatto et al., Genome Res., 2011) and 96 samples for RAD-Seq (Meek and Larson, Mol Ecol Resour, 2019).

      Second, and in a related vein, the authors present NNoVAE as a novel solution to determine species boundaries in Anopheles. Perusing the very references the authors cite, it is clear that VAEs have been used before to delimit species boundaries, which diminishes the novelty of the approach on its own.

      The VAE is only one part of the method presented in this manuscript. We believe that much of the value of NNoVAE lies in its ability to perform assignments across the entire Anopheles genus, which comprises over 100 MY of divergence; the closest analogous approaches would be COI or ITS2 DNA barcoding, neither of which is robust for species complexes. Using NNoVAE, samples are first assigned to their relevant groups, and in many cases to their species, by the Nearest Neighbour method. Only those samples that the Nearest Neighbour method identifies as members of the An. gambiae complex, and that cannot be unambiguously assigned to a single species, are passed to the VAE assignment method.

      Indeed, (Derkarabetian et al, Mol Phylogenet Evol, 2019) use VAEs to delimit species boundaries in an arachnid genus. However, that study works with ultraconserved elements comprising a total of 76 kb of sequence, which is much more data than the approximately 10 kb we obtain for all amplicons combined. Moreover, a crucial difference is that the referenced work uses SNP calls, based on alignment to one of their sequenced samples, as input for the VAE, whereas our VAE takes k-mer based inputs. Avoiding alignment to a single reference is an important consideration when working with a large number of highly diverged species.

      Perhaps more importantly, the manuscript does not present a comparison with other methods of species delimitation (e.g. SPEDEStem, or UML, which is cited in the paper), or even with methods for assessing population differentiation, such as STRUCTURE, ADMIXTURE, or ASTRAL concordance factors (to mention a few among many). Without this comparative framework, it is unclear how this method compares to other tools already available.

      NNoVAE is primarily a method for species assignment rather than species delimitation. SPEDEStem addresses whether different groups of samples are separate species; the groups can be defined by, e.g., described races, described subspecies, different morphotypes or different collection locations. The aim of ANOSPP and NNoVAE is to remove the need for any prior sorting of samples into groups: all that needs to be known is that the sample is an anopheline. This avoids the issues associated with morphological identification and single-marker molecular barcodes. To perform species assignment with SPEDEStem, we would have to run many replicates, each time asking whether a single sample belongs to one of the species represented in our reference database. For example, for the 2218 samples presented in the case studies, we would have to run SPEDEStem more than 130,000 times (2218 samples × 62 species) to check whether each sample belongs to any of the 62 species represented in the reference dataset NNv1.

      However, we agree that it would be good to check that the species-groups in the reference database, NNv1, are indeed supported as separate species. We attempted to run SPEDEStem, but the web server no longer exists, we were not able to install the command-line application (which runs on Python 2), and the example files provided in the tutorial are incomplete. Therefore, we were unable to carry out even this basic comparison.

      UML (unsupervised machine learning) approaches comprise a wide range of methods, including VAEs. We have compared the VAE assignments with assignments based on UMAP; this comparison is discussed below, on page 20 of the manuscript, and in the newly added Supplementary Information section 4.

      As requested by the reviewer, we have compared our assignment approach to ADMIXTURE on the Anopheles gambiae complex training set (see Supplementary Information section 5). Comparing the structure revealed by ADMIXTURE with that revealed by the VAE is a good sanity check. We found that ADMIXTURE does not satisfactorily differentiate between species in the complex that are represented by only a handful of samples, whereas the VAE suffers much less from differences in group sizes in the training set. Moreover, ADMIXTURE is a tool for assessing population differentiation, not for species assignment. There are two options for using it as an assignment method: either infer the allele frequencies of the ancestral populations from the training set and use them to compute maximum likelihood ancestry proportions for the test set, or run ADMIXTURE on the training and test sets combined and use the labels from the training set to label the ancestral populations. A major drawback of the former approach is that it makes it difficult to discover cryptic taxa or outliers in the test set, while the latter approach makes the training set results depend on the test set they are combined with. More importantly, ADMIXTURE performs worse than the VAE on the An. gambiae complex training set alone, identifying only two to three groups among the five diverged species (An. melas, An. merus, An. quadriannulatus, An. bwambae and An. fontenillei). For more information, see page 20 of the manuscript and the newly added Supplementary Information section 5.

      One important use case of our method is to identify interesting samples, e.g. potential hybrids or cryptic taxa, for subsequent whole genome sequencing. After selection and whole genome sequencing of interesting samples detected by ANOSPP+NNoVAE, ADMIXTURE may be useful as one of the tools to investigate such samples.

      A final concern is less methodological and more related to the biology of the system. I am curious about the possibility of ascertainment bias induced by the amplicon panel. In particular, the authors conclusively demonstrate that they can assign species that are already known. Nonetheless, there may be unsampled and/or cryptic species. This latter issue is brought up in passing in the 'Gambiae complex classifier datasets' section, but I think the possibility deserves a formal treatment. This is particularly important because the system shows such high levels of hybridization that the possibility of speciation by admixture cannot be dismissed.

      We appreciate the reviewer’s concern regarding ascertainment bias in the amplicon panel. The targets were selected based on multiple sequence alignments of all Anopheles reference genomes available at the time (Makunin et al. Mol Ecol Resour, 2022). Because these genomes come from species in four different subgenera, they span a considerable amount of evolutionary time within the Anopheles genus. In every species on which we have since tested the panel, at least half of the targets are amplified.

      We share the reviewer’s concern regarding species that are not (yet) represented in the reference database. This is one of the main advantages of the Nearest Neighbour method: it works at three levels of increasing granularity. So for samples that cannot be assigned at species level, we are often able to identify the group of species in the reference database to which they are closest. In particular, the situation of a test sample whose species is not represented in the reference database is mimicked in the drop-out experiment by the species-groups that contain only one sample. On page 16 of the manuscript, we explain how NNoVAE handles such samples, and we show that in the majority of cases NNoVAE assigns the sample to a group of closely related species rather than misclassifying it more specifically as the wrong species.

      In summary, the main limitation of the manuscript is that the authors do not really elaborate on the need for this method. The manuscript does show that the method is feasible but it is not forthcoming on why this is of importance, especially when there is the possibility of generating full genome sequences.

      ANOSPP and NNoVAE are specifically designed for high-throughput, accurate species identification across the entire Anopheles genus. WGS is important for addressing many questions, but it is complete overkill for species identification. ANOSPP costs only a small fraction of whole genome sequencing, which makes it possible to monitor mosquito populations at a much larger scale (e.g., in partnership with our vector biologist collaborators in Africa, we have already generated ANOSPP data for approximately 10,000 mosquitoes and will be running 500,000 over the next few years). Moreover, most analyses using whole genome sequencing require a reference genome of a sufficiently similar species. Although we are in the privileged position of having reference genomes for more than 20 Anopheles species, we have a long way to go before we have hundreds of reference genomes covering the true diversity of the genus.

      NNoVAE can also be used to select interesting samples (e.g. species that have not been through the panel before, divergent populations, potential hybrids), which can be submitted for whole genome sequencing subsequently.

      Since Anopheles is arguably one of the most important insects to characterize genetically, the ANOSPP panel is certainly important, but I am not completely sure the method of species assignment is novel or groundbreaking.

      Reviewer #2 (Public Review):

      The medically important mosquito genus Anopheles contains many species that are difficult or impossible to distinguish morphologically, even for trained entomologists. Building on prior work on amplicon sequencing, Boddé et al. present a novel set of tools for in silico identification of anopheline mosquitoes. Briefly, they decompose haplotypes generated with amplicon sequencing into kmers to facilitate the process of finding similar sequences; then, using the closest sequence or sequences ("nearest neighbors") to a target, they predict taxonomic identity by the frequency of the neighbor sequences in all groups present in a reference database. In the An. gambiae species complex, which is well-known for its historical and ongoing introgression between closely-related species, this approach cannot distinguish species. Therefore, they also apply a deep learning method, variational autoencoders, to predict species identity. The nearest neighbor method achieves high accuracy for species outside the gambiae complex, and the variational autoencoder method achieves high accuracy for species within the complex.
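      To make the nearest-neighbour idea described above concrete, the following is a minimal Python sketch of k-mer based nearest-neighbour assignment for a single amplicon. It is an illustration under stated assumptions, not the NNoVAE implementation: the Jaccard distance between k-mer sets, the value k = 8, the tie handling, and the toy reference haplotypes and labels (`species_A`, `species_B`) are all hypothetical choices. The published method combines information across many amplicons and reports assignments at three levels of granularity, which this sketch does not attempt.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings (k-mers) of a haplotype."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 8) -> float:
    """Jaccard distance between the k-mer sets of two haplotypes (0.0 = identical sets)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:  # both haplotypes are shorter than k
        return 1.0
    return 1.0 - len(ka & kb) / len(union)


def nearest_neighbour_assign(query: str, reference: list, k: int = 8) -> dict:
    """Label a query haplotype by the labels of its nearest reference haplotypes.

    reference: list of (haplotype, label) pairs (a hypothetical toy database).
    Returns {label: fraction of the tied nearest neighbours carrying that label}.
    """
    scored = [(kmer_distance(query, hap, k), label) for hap, label in reference]
    best = min(d for d, _ in scored)
    # Count labels among all reference haplotypes tied at the minimum distance.
    counts = Counter(label for d, label in scored if d == best)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical toy reference database for one amplicon.
reference = [
    ("ACGTACGTAAGGCCTTGA", "species_A"),
    ("ACGTACGTAAGGCCTTGC", "species_A"),
    ("TTGGCCAATTGGCCAATC", "species_B"),
]
print(nearest_neighbour_assign("ACGTACGTAAGGCCTTGA", reference))  # {'species_A': 1.0}
```

Returning the fraction of tied neighbours per label, rather than a single label, mirrors the idea of predicting identity from the frequency of neighbour sequences in each group: a query equidistant from references of two groups yields a split such as `{'x': 0.5, 'y': 0.5}` instead of a forced call.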

      The main strength of this method (along with the associated methods in the paper on which this work builds) is its ability to speed up the identification of anopheline mosquitoes, therefore facilitating larger sample sizes for a wide breadth of questions in vector biology and beyond. This technique has the added advantage over many existing molecular identification protocols of being non-destructive. This high-throughput identification protocol that relies on a relatively straightforward amplicon sequencing procedure may be especially useful for the understudied species outside the well-resourced gambiae complex.

      An additional and intriguing strength of this method is that, when a species label cannot be predicted, some basic taxonomic predictions may still be made in some cases. Indeed, even in the case of known species, the authors find possible cryptic variation within An. hyrcanus and An. nili, demonstrating how useful this new tool can be.

      The main weakness of this method is that, as the authors note, accuracy is dependent on the quality and breadth of the reference database (which in turn relies on the expertise of entomologists). A substantial portion of the current reference database, NNv1, comes from one species complex, An. gambiae. This is reasonable given the complex's medical importance and long history of study; however, for that same reason, robust molecular and computational tools for identifying species in this complex already exist. The deep learning portion of this manuscript is a valuable development that can eventually be applied to other species complexes, but building up a sufficient database of specimens is non-trivial. For that reason, the nearest neighbor method may be the more immediately impactful portion of this paper; however, its usefulness will depend on good sampling and coverage outside the gambiae complex.

      Another potential caveat of this method is its portability. It is not clear from either the manuscript or the code repository how easy it would be for other researchers to use this method, and whether they would need to regenerate the reference database themselves. The authors clearly have expansive and immediate plans for this workflow; however, as many researchers will read this manuscript with an eye towards using these methods themselves, clarifying this point would be valuable.

      This is an important point. Currently the amplicon panel is run only on specialised robots, but we are working to adapt the protocol so that it can be run in any basic molecular lab; we have now clarified this in the conclusion. Furthermore, the reference database never needs to be regenerated: it is fully publicly available and version-controlled at github.com/mariloubodde/NNoVAE. As we obtain ANOSPP data from additional samples, representing new species or new within-species diversity, we will add these to the reference database and release an updated, openly available version.

      The authors present data suggesting that their method is highly accurate in most of the species or groups tested. While the usefulness of this method will depend on the reference database, two points ameliorate this potential concern: it is already accurate on a wide breadth of species, including the understudied ones outside the An. gambiae complex; additionally, even when a specific species identification cannot be made, the specimen may be able to be placed in a higher taxonomic group.

      Overall, these new methods offer an additional avenue for identifying anopheline species; given their high-throughput nature, they will be most useful to researchers doing bulk collections or surveillance, especially where multiple morphologically similar species are common. These methods have the potential to speed up vector surveillance and the generation of many new insights into anopheline biology, genetics, and phylogeny.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The reduction in mtDNA attributable to lobe cannibalism seems too mild. The authors first show that about half of the mitochondria are removed from PGCs between the embryo and L1 stages. At this point, the number of mtDNAs per cell decreases by half compared to the embryonic stage, and based on this result, they propose that this is a bottleneck. Intuitively, a 50% reduction does not seem strong enough to constitute a bottleneck. Perhaps the claim should be toned down here, unless the authors can provide stronger evidence, such as modeling, that a 50% reduction is sufficient to cause a bottleneck. Textual editing would suffice, though, unless they already have evidence for a bottleneck.

      We used the term ‘bottleneck’ to indicate the lowest mtDNA number (prior to mtDNA expansion) that we could detect in the early germ line lineage, which results from a combination of reductive cell divisions leading to the PGC lineage, followed by lobe cannibalism and autophagy for a final two-fold reduction. The ~150-fold total reduction we detect in PGC mtDNA number relative to the estimated number of mtDNAs present in the oocyte (inferred from analyzing whole early embryos) is comparable to the ~100-fold reduction in mtDNAs that occurs between the oocyte and early PGCs in mouse, which has been proposed to be a germline mtDNA genetic bottleneck based on computer simulation studies (Cree et al., 2008, Nat Genet 40: 249-254). In addition, the number of mtDNAs we detect per PGC (~200) at its low point in L1 larvae is comparable to the number of mtDNAs in mouse PGCs purified using two different reporter transgenes (280 mtDNAs per PGC, Wai et al., 2008, Nat Genet 40: 1484-1488; 203 mtDNAs per PGC, Cree et al., 2008, Nat Genet 40: 249-254). However, we agree with Reviewer 1 that our data do not show whether the 150-fold total reduction in mtDNA number we detect in PGCs relative to the oocyte has a functional consequence for segregation of mtDNA mutations in the germ line. To clarify what we have shown and what we propose based on findings and simulations in other systems, we now refer to the number of mtDNAs in L1 PGCs as a ‘low point’ and discuss how this reduction could affect segregation and inheritance of mtDNA mutations (pg. 14, lines 333-339).

      2) Overall, one thing that struck me was that, when the authors assay 'selection' on mtDNA (e.g. the number of mtDNAs, frequency of mutant mtDNA, reduction by the autophagy pathway, reduction by pink1, etc.), the effect seems far too mild. However, in Fig. 1C, D and Fig. 2C, the amount of mitoGFP that goes to the lobe seems to be at least 80-90%. Is this because 'striking' images were selected for presentation? Alternatively, I wonder whether mitochondria with more mtDNA actually survive, and mitochondria with less mtDNA go to the lobe (so that the fraction of mitochondria removed to the lobe is much higher than the fraction of mtDNA removed). If so, is this actually THE selection that happens during the embryo-to-L1 transition? Is there any way to measure the amount of mitochondria and the amount of mtDNA simultaneously?

      Thank you for bringing up this point. Reviewer 1 is correct that 80% of mitochondria are initially in lobes, so the images in Figures 1 and 2 are representative. However, prior to lobe scission, some mitochondria move back into the cell body, such that only 60% are in lobes at the two-fold stage (we have not performed this analysis at later stages, when cannibalism begins between the two- and three-fold stages, because the embryo rotates in the eggshell). This was shown and quantified in our original Fig. S1. To stress this point, the quantification has been moved to Figure 1 (see Figure 1F), while representative images remain in the supplement (Figure 1—figure supplement 1). The movement of mitochondria back into the cell body is an avenue we plan to explore in future studies, although we feel that it is beyond the scope of the current manuscript.

      Although we have not quantified mtDNA distribution, owing to the challenges of imaging late embryos, we have no evidence of a significant asymmetry in mtDNA density (mtDNAs per unit of mitochondrial mass) between lobe and cell body mitochondria; mtDNAs are distributed among mitochondria in both lobes and the cell body (see Figure 1I). Our experiments on uaDf5 mutant mtDNA also show that, although we cannot exclude a small asymmetry in uaDf5 distribution, any such asymmetry is not responsible for selecting against uaDf5 mtDNAs, since uaDf5 mtDNA heteroplasmy in PGCs still decreases between embryo and L1 in nop-1 mutants.

      Reviewer #2 (Public Review):

      Major points:

      1) I wish the authors had provided more direct evidence to support their conclusion that there is no mtDNA replication in embryonic PGCs and that mtDNA only starts to replicate before the first division of PGCs in early L1.

      See essential revisions.

      2) It would also be interesting to show how compromising cannibalism (e.g. using the nop-1 mutant) affects mtDNA replication after the first division of PGCs in L1.

      See essential revisions.

      3) Finally, given that the total mtDNA copy number in later GSCs is similar between worms with and without PGC cannibalism (wt vs nop-1 mutant) (Fig 3), and that cannibalism does not selectively eliminate detrimental mtDNA mutations, I also wonder why PGCs need a bottleneck for the mtDNA population.

      We also do not fully understand why PGC lobe cannibalism is necessary. However, PGCs are born with a relatively high number of mtDNAs, as they arise after fewer invariant cell divisions during embryogenesis (5) than somatic cells (~8 on average); lobe cannibalism could be a way to eliminate this excess and reach ~200 mtDNAs before PGCs differentiate into GSCs in larvae. Our experiments on nop-1 mutants clearly show that this number is important, as it is achieved through an independent mechanism even when lobe cannibalism is blocked. We have dedicated an entire paragraph to discussing these important points (pg. 13-14, lines 326-339).

    1. Author Response

      Reviewer #3 (Public Review):

      Understanding the relevance of skewed X-Chromosome Inactivation (XCI) to disease susceptibility and development in women is an intriguing open question. In this manuscript entitled "Age acquired skewed X Chromosome Inactivation is associated with adverse health outcomes in humans", Roberts et al. attempt to characterize this relationship by assaying skewed XCI in >1,500 females from the TwinsUK population cohort. The authors report an association between skewed XCI and increased cardiovascular risk across the tested population. This association is reinforced by a twin study based on age-matched twin pairs discordant in their degree of XCI skewing. This approach is indeed powerful, as it controls for age in predicting the cardiovascular disease risk score. The authors also found an association between skewed XCI and a haematopoietic bias towards the myeloid lineage. Finally, skewed XCI was shown to be predictive of future cancer incidence. This area of research is timely and of great interest to the community. However, in my opinion, the conclusions of this manuscript are not fully supported by the presented data, and some aspects of the data analysis and results need to be extended.

      We thank the reviewer for their kind comments on the importance of our study. We hope they agree that, by addressing the necessary changes outlined above, particularly in the discussion, the conclusions of the manuscript are now more nuanced and present the work in the broader context of related fields, in particular clonal haematopoiesis. Overall, we hope the reviewer agrees that the conclusions of the manuscript now better reflect the results presented.

    1. Author Response

      Reviewer #1 (Public Review):

      Liu et al. applied an existing method to study CRC subtypes from a network perspective. In the proposed framework, the authors calculated perturbations in the expression-rank differences of predefined network edges in both tumor and normal samples. By clustering the derived perturbation scores of CRC tumors from publicly available gene expression datasets, they reported six subtypes (referred to as GINS 1-6) and then examined the association of each subtype with clinical features, known molecular mechanisms and cell phenotypes. My recommendation is major revision.
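      The edge-perturbation idea summarized above can be sketched as follows. This is one plausible formulation, assumed for illustration: for each network edge (gene i, gene j), the within-sample rank difference of the two genes in a tumor sample is compared with the mean rank difference of the same edge across normal samples. The exact definition used by the authors and the cited work (e.g. absolute values, normalization, tie handling) may differ, and the function names and toy data are hypothetical.

```python
import numpy as np


def within_sample_ranks(expr: np.ndarray) -> np.ndarray:
    """Rank genes within each sample (rows = genes, columns = samples); 1 = lowest expression."""
    # argsort of argsort converts values to 0-based ranks per column; ties are broken by row order.
    order = np.argsort(expr, axis=0, kind="stable")
    return np.argsort(order, axis=0, kind="stable") + 1


def edge_perturbation(tumor: np.ndarray, normal: np.ndarray, edges: list) -> np.ndarray:
    """Perturbation score of each network edge in each tumor sample (hypothetical formulation).

    tumor, normal: expression matrices of shape (n_genes, n_samples) over the same genes.
    edges: (gene_i, gene_j) row-index pairs taken from a reference network.
    Returns an array of shape (n_edges, n_tumor_samples).
    """
    rank_t = within_sample_ranks(tumor)
    rank_n = within_sample_ranks(normal)
    i = np.array([e[0] for e in edges])
    j = np.array([e[1] for e in edges])
    # Rank difference of each edge in each tumor sample.
    delta_tumor = rank_t[i, :] - rank_t[j, :]
    # Baseline: mean rank difference of each edge across the normal samples.
    baseline = (rank_n[i, :] - rank_n[j, :]).mean(axis=1, keepdims=True)
    return delta_tumor - baseline


# Toy example: 3 genes, 1 tumor sample, 2 normal samples.
normal = np.array([[1.0, 1.1], [2.0, 2.2], [3.0, 3.3]])
tumor = np.array([[3.0], [2.0], [1.0]])
scores = edge_perturbation(tumor, normal, [(0, 1), (1, 2)])
# Both edges score 2.0: the gene order is reversed in the tumor relative to normal tissue.
print(scores)
```

A positive score means the rank difference of an edge is larger in the tumor than in normal tissue on average; a matrix of such scores (edges × tumor samples) is the kind of input that could then be passed to consensus clustering.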

      Major concerns:

      (1) While this study takes a network perspective, it is unclear to me whether the new subtypes provide key novel insights into the gene regulatory mechanisms underlying the development of CRC. For example, the "Biological peculiarities of six subtypes" section is descriptive and lacks a clear central message.

      Thanks for your professional suggestions. In this study, we focused on global network perturbations instead of snapshot transcriptional profiles, because snapshot transcriptional profiles largely ignore dynamic changes in gene expression within a biological system, whereas biological networks remain relatively stable irrespective of time and condition. We used the global network perturbation matrix only to perform consensus clustering, not to explore the gene regulatory mechanisms of each subtype. However, subtype-related studies typically also investigate the biological characteristics of each subtype.

      Thus, we then delineated the biological attributes inherent to the GINS subtypes using two different algorithms (SSEA and GSVA). This was done to understand the underlying biological characteristics of these subtypes and define them biologically, similar to previous subtype studies (PMID: 31563503; 31875970; 26457759; 30833271; 30842092; 32164750; 30837276). As you commented, the section is descriptive and lacks a clear central message. Hence, we highlighted potential transitions among GINS2/4/5. In this study, GINS2 showed higher stromal activity and lower immune activity, whereas GINS5 showed the opposite trend, concordant with the tumor invasiveness and prognosis of the two subtypes; GINS4 was characterized by a mixed phenotype with moderate levels of stromal and immune pathway activity. As three subtypes with abundant TME components, GINS2/4/5 may evolve into one another with respect to stromal and immune functions. We therefore extracted genes consistently upregulated or downregulated across these three subtypes using the Mfuzz package, a noise-robust soft clustering method based on fuzzy c-means (Kumar and Futschik, 2007). The Mfuzz analysis revealed 10 gene clusters, of which clusters 3 and 10 displayed stable expression patterns from GINS2 to GINS5 (Figure 5C and Supplementary File 8). As expected, gene cluster 3 was predominantly associated with immune infiltration and activation (Figure 5D), whereas gene cluster 10 was prominently characterized by stromal activation and remodeling (Figure 5E), further supporting our findings. This also indicates that the TME has profound impacts on tumor progression and prognosis, and that GINS2 and GINS5 represent two extremes of TME composition, with diametrically opposite clinical outcomes (marked in red in the “Biological peculiarities of six subtypes” section). Subsequently, we further investigated immune regulation in the GINS subtypes.

      We found that GINS5 was also characterized by higher immune infiltration and stronger immunogenicity based on transcriptome and proteome analysis. For example, GINS5 harbored remarkably higher tumor mutation burden (TMB) and neoantigen load (NAL) (P <0.001, Figure 6C), which may further induce abundant immune infiltration and regulation. GINS5 also had abundant infiltration of Th1 and Th2 cells and M1 macrophages (Mills et al., 2016) (Figure 6-figure supplement 1A-C), which can secrete proinflammatory cytokines and enhance immune activation. Conversely, M2 macrophages, traditionally regarded as promoting tumor growth by suppressing cell-mediated immunity and subsequent cancer cell killing (Mills et al., 2016), were significantly elevated in GINS2 (Figure 6-figure supplement 1D). In line with this, three other classical immunosuppressive cell types, namely fibroblasts, myeloid-derived suppressor cells (MDSC), and Treg cells (Hicks et al., 2022), were also significantly enriched in GINS2.

      Additionally, in the “GINS6 tumors conveyed rich lipid metabolisms” section, we further observed that lipid metabolism was the most significant metabolic process in GINS6. Metabolomics analysis suggested that GINS6 exhibited higher levels of four fatty acids: α-glycerophosphate, adipate, taurocholate, and aconitate. These findings validated that GINS6 was closely associated with metabolic reprogramming and fatty acid accumulation.

      Overall, it is difficult to investigate the underlying biological mechanisms of all subtypes in depth in a single paragraph, so we first used the ‘Hallmark’ gene sets to preliminarily explore the biological characteristics of these subtypes, which gave us inspiration and direction for further exploration; the subsequent analyses in this section refine and deepen these initial observations.

      Thank you for your academic discussion with us.

      (2) To further demonstrate the novelty of the identified subtypes, the authors need to show the additional benefit of GINS1-6 for patient stratification over existing methods, such as integrative clustering based on multiple types of genomic evidence (copy number alterations, gene expression and somatic mutations).

      Thanks for your thoughtful comments. We wanted to clarify this issue from the following three aspects:

      1) First of all, the basis that inspired us is that the global network perturbations have advantages over snapshot transcriptional profiles (main traditional methods in CRC), because snapshot transcriptional profiles largely ignore the dynamic changes of gene expressions in a biological system, and conversely, biological networks remain relatively stable irrespective of time and condition. The gene interactions in a biological network are overall stable in a particular type of normal human tissue but widely perturbed in diseased tissues (PMID: 29040359 and 25165092). These perturbations in gene interactions (edge perturbations) in each sample can be measured by the change in the relative gene expression value. The edge perturbations at an individual level can be used to characterize the perturbation of the biological network for each sample efficiently. Thus, this is the starting point for cancer clustering in this study.

      2) Second, the essence of molecular clustering is to investigate tumor heterogeneity. To detect multiple subtypes (some of which may represent relatively small fractions of the patient population) (PMID: 23584089), clustering methods require moderately large numbers of samples, more than are contained in any one of the individual CRC data sets published to date. With that in mind, we began our analysis by identifying suitable and comparable microarray datasets (n=2167, Supplementary File 15). Our discovery dataset is the largest among current CRC subtype-related studies. For multi-omics clustering, there is currently no large, high-quality multi-omics sequencing cohort; only the TCGA-CRC cohort has eligible multi-omics data, and fewer than 300 of its patients have multi-omics data. Therefore, subtypes that represent relatively small fractions of the patient population could not be detected by multi-omics clustering.

      3) Third, we tested several methods and datasets before settling on the GINS subtypes. Clustering always divides tumors into subgroups, but we expect these subgroups to be reproducible in other cohorts. Thus, we needed to validate the robustness of our subtypes in multiple independent cohorts. Our validation focused on four contexts: (1) data from the same platform (GPL570); (2) data from different platforms and sequencing techniques (microarray or RNA-seq); (3) microdissected or whole tumors; (4) an in-house clinical setting. However, as mentioned above, only TCGA-CRC has eligible multi-omics data, so comparably rigorous validation of multi-omics subtypes cannot be carried out.

      Thank you for your academic discussion with us.

    1. Author Response

      Reviewer #1 (Public Review):

      Huang et al. sought to study the cellular origin of tuft cells and the molecular mechanisms that govern their specification in severe lung injury. First, the authors show ectopic emergence of tuft cells in airways and distal parenchyma following different injuries. The authors also used lineage tracing models and found that p63-expressing cells and, to some extent, Scgb1a1 lineage-labeled cells contribute to tuft cells after injury. Further, the authors modulated multiple pathways and claim that Notch inhibition blocks tuft cell development whereas Wnt inhibition enhances it in basal cell cultures. Finally, the authors used Trpm5 and Pou2f3 knock-out models to claim that tuft cells are indispensable for alveolar regeneration.

      In summary, the findings described in this manuscript are somewhat preliminary. The claim that the cellular origin of tuft cells in influenza infection had not been determined is incorrect. The current pathway-modulation data are preliminary, and genetic modulation is required to support the authors' claims.

      We thank the reviewer for the comments and we have performed extensive experiments to address the reviewer’s comments. In the revised manuscript we provide additional data including genetic modulation findings to support our model.

      Major comments:

      1) The abstract sounds incomplete and does not cover all key aspects of this manuscript. Currently, it is mainly focusing on the cellular origin of Tuft cells and the role of Wnt and notch signaling. However, it completely omits the findings from Trpm5 and Pou2f3 knock-out mice. In fact, the title of the manuscript highlights the indispensable nature of tuft cells in alveolar regeneration.

      We have modified the abstract and title accordingly.

      2) In lines 93-94, the authors state that "It is also unknown what cells generate these tuft cells.....". This statement is incorrect. Rane et al., 2019 used the same p63-creER mouse line and demonstrated that all tuft cells that ectopically emerge following H1N1 infection originate from p63+ lineage labeled basal cells. Therefore, this claim is not new.

      We thank the reviewer for the comment. Although Rane et al. reported that p63-expressing lineage-negative epithelial stem/progenitor cells (LNEPs) could contribute to the ectopic tuft cells after PR8 virus infection, it is still not clear whether the p63+ cells directly give rise to tuft cells or do so through EBCs. Thus, unlike Rane et al. (Rane et al., 2019), who performed tamoxifen (TMX) injection before viral infection, we performed TMX injection after PR8 infection to show that the ectopic tuft cells are derived from EBCs, as shown in revised Figure 2.

      3) Lines 152-153 state that "21.0% +/- 2.0 % tuft cells within EBCs are labeled with tdT when examined at 30 dpi...". It is not clear what the authors meant here ("within EBCs"). Also, the same sentence states that "......suggesting that club cell-derived EBCs generate a portion of tuft cells....". In this experiment, the authors used club cell lineage tracing mouse lines. So, how do the authors know that the club cell lineage-derived tuft cells came through an intermediate EBC population? Current data do not show evidence for this claim. Is it possible that club cells can directly generate tuft cells?

      We apologize for the confusion and have revised the text accordingly. Here, “within EBCs” means within the “pod” areas where p63+ basal cells are ectopically present. The sentence is revised as “21.0% +/- 2.0 % tuft cells that are ectopically present in the parenchyma are labeled by tdT. Notably, these lineage labeled tuft cells were co-localized with EBCs.” We do not know whether the club cell lineage-derived tuft cells transit through intermediate EBCs, which is why we used “suggest”. It is also possible that club cells directly generate tuft cells. To avoid confusion, we have deleted the sentence.

      4) Based on the data from Fig-3A, the authors claim that treatment with C59 significantly enhances tuft cell development in ALI cultures. Porcupine is known to facilitate Wnt secretion. So, which cells are producing Wnt in these cultures? It is important to determine which cells are producing Wnt and which Wnt ligands. Further, based on DBZ treatments, it appears that active Notch signaling is necessary to induce Tuft cell fate in basal cells. Where are Notch ligands expressed in these tissues? Is Notch active only in a small subset of basal cells (and hence generates rare tuft cells)? This is one of the key findings in this manuscript. Therefore, it is important to determine the expression pattern of Wnt and Notch pathway components.

      We thank the reviewer for these interesting questions and agree on the importance of identifying the specific ligands and receptors for the relevant Wnt and Notch signaling during tuft cell derivation. That being said, we think the topic is beyond the scope of this study, which is focused on the role of tuft cells in alveolar regeneration. The point is well taken, and we will investigate the topic in a future study.

      5) How do the authors explain the different phenotypes observed in Trpm5 knockout and Pou2f3 mutants? Is it possible that Trpm5 knockout mice have a subset of tuft cells and that this might have something to do with the phenotypic discrepancy between the two mutant models?

      Again, we thank the reviewer for the interesting question. As noted in the Discussion, Trpm5 is also reported to be expressed in B lymphocytes (Sakaguchi et al., 2020). It is possible that loss of Trpm5 modulates the inflammatory responses following viral infection, which may contribute to improved alveolar regeneration. However, it is also possible that Trpm5-/- mice retain a subset of tuft cells that facilitate lung regeneration, as suggested by the reviewer.

      6) One of the key findings in this manuscript is that Wnt and Notch signaling play a role in Tuft cell specification. All current experiments are based on pharmacological modulation. These need to be substantiated using genetic gain- and loss-of-function models.

      We have performed the genetic studies.

      Reviewer #2 (Public Review):

      In this manuscript, the authors describe the ectopic differentiation of tuft cells that were derived from lineage-tagged p63+ cells post influenza virus infection. These tuft cells do not appear to proliferate or give rise to other lineages. They then claim that Wnt inhibitors increase the number of tuft cells while inhibiting Notch signaling decreases the number of tuft cells within Krt5+ pods after infection in vitro and in vivo. The authors further show that genetic deletion of Trpm5 in p63+ cells post-infection results in an increase in AT2 and AT1 cells in p63 lineage-tagged cells compared to control. Lastly, they demonstrate that depletion of tuft cells caused by genetic deletion of Pou2f3 in p63+ cells has no effect on the expansion or resolution of Krt5+ pods after infection, implying that tuft cells play no functional role in this process.

      Overall, in vivo and in vitro phenotypes of tuft cells and alveolar cells are clear, but the lack of detailed cellular characterization and molecular mechanisms underlying the cellular events limits the value of this study.

      We thank the reviewer for the comments and acknowledging that our findings are clear. In the revised manuscript we provide more detailed characterization and genetic evidence to elucidate the role of tuft cells in lung regeneration.

      1) Origin of tuft cells: Although the authors showed the emergence of ectopic tuft cells derived from labelled p63+ cells after infection, it cannot be ruled out that pre-existing p63+Krt5- intrapulmonary progenitors, as previously reported, can also contribute to tuft cell expansion (Rane et al. 2019; by labelling p63+ cells prior to infection, they showed that the majority of ectopic tuft cells are derived from p63+ cells after viral infection). It would be more informative if the authors show the differentiation of tuft cells derived from p63+Krt5+ cells by tracing Krt5+ cells after infection, which will tell us whether ectopic tuft cells are differentiated from ectopic basal cells within Krt5+ pods induced by virus infection.

      We thank the reviewer for the helpful suggestion. We have performed the experiment accordingly.

      2) Mechanisms of tuft cell differentiation: The authors tried to determine which signaling pathways regulate the differentiation of tuft cells from p63+ cells following infection. Although Wnt/Notch inhibitors affected the number of tuft cells derived from p63+ labelled cells, it remains unclear whether these signals directly modulate differentiation fate. The authors claimed that Wnt inhibition promotes tuft cell differentiation from ectopic basal cells. However, in Fig 3B, Wnt inhibition appears to trigger the expansion of p63+Krt5+ pod cells, resulting in increased tuft cell differentiation rather than directly enhancing tuft cell differentiation. Further, in Fig 3D, Notch inhibition appears to reduce p63+Krt5+ pod cells, resulting in decreased tuft cell differentiation. Importantly, previous studies have reported that Notch signalling is critical for Krt5+ pod expansion following influenza infection (Vaughan et al. 2015; Xi et al. 2017). Notch inhibition reduced Krt5+ pod expansion and induced their differentiation into Sftpc+ AT2 cells. In order to address the direct effect of Wnt/Notch signaling in the differentiation process of tuft cells from EBCs, the authors should provide a more detailed characterization of cellular composition (Krt5+ basal cells, club cells, ciliated cells, AT2 and AT1 cells, etc.) and activity (proliferation) within the pods with/without inhibitors/activators.

      Again we thank the reviewer for the insightful suggestions. We agree that it will be interesting to further address the direct effect of Wnt/Notch signaling in the differentiation process of tuft cells from EBCs. In this revised manuscript we added new findings of EBC differentiation into tuft cells in mice with genetic deletion of Rbpjk.

      3) Impact of Trpm5 deletion in p63+ cells: It is interesting that Trpm5 deletion promotes the expansion of AT2 and AT1 cells derived from labelled p63+ cells following infection. It would be informative to check whether Trpm5 regulates Hif1a and/or Notch activity, which has been reported to induce AT2 differentiation from ectopic basal cells (Xi et al. 2017). Although the authors stated that there was no discernible reduction in the size of Krt5+ pods in mutant mice, it would be interesting to investigate the relationship between AT2/AT1 cell-retaining pods and the severity of injury (e.g., whether large Krt5+ pods retain more or fewer AT2/AT1 cells than small pods). What about other cell types, such as club and goblet cells, in Trpm5 mutant pods? Again, it cannot be ruled out that pre-existing p63+Krt5- intrapulmonary progenitor cells can directly convert into AT2/AT1 cells upon Trpm5 deletion rather than p63+Krt5+ cells induced by infection.

      We thank the reviewer for the comments and suggestions. Our new data using the KRT5-CreER mouse line confirmed that pod cells (Krt5+) do not contribute to AT2/AT1 cells, consistent with previous studies (Kanegai et al., 2016; Vaughan et al., 2015). Our data also show that p63-CreER lineage labeled AT2/AT1 cells are separated from the pod cell area, suggesting that pod cells and these AT2/AT1 cells are generated from different cells of origin. We also checked Notch activity in pod cells in Trpm5-/- mice: some pod cell-derived cells are Hes1 positive, whereas others are Hes1 negative (RLFigure 1). As indicated in the Discussion, we think that these AT2/AT1 cells are possibly derived from pre-existing AT2 cells that transiently express p63 after PR8 infection. It will be interesting to test whether Trpm5 regulates Hif1a in this population (p63+Krt5-), and this is our next plan.

      RLFigure 1. Representative area staining in Trpm5-/- mice at 30 dpi. Area 1: Notch signaling is active (Hes1+, arrows) in pod cells following viral infection. Area 2: pod cells exhibit reduced Notch activity. Note the few Hes1+ cells in pods (arrows). Scale bar: 50 µm.

      4) Ectopic tuft cells in COVID-19 lungs: The previous study by the authors' group revealed the presence of ectopic tuft cells in COVID-19 patient samples (Melms et al. 2021). There appears to be no additional information in this manuscript.

      In Melms et al., Nature, 2021 (Melms et al., 2021), we showed tuft cell expansion in COVID-19 lungs but not the potential origin of tuft cells. In this manuscript we show some cells co-expressing POU2F3 and KRT5, suggesting a pod-to-tuft cell differentiation.

      5) Quantification information and method: Overall, the quantification method should be clarified throughout the manuscript. Further, in the method section, the authors stated that the production of various airway epithelial cell types was counted and quantified on at least 5 "random" fields of view. However, virus infection causes spatially heterogeneous injury, which makes unbiased ("blind") sampling of fields difficult. The authors should address how they dealt with this issue.

      We have clarified the quantification method as suggested. For the in vitro cell culture assays on the signaling pathways, we took pictures from at least five random fields of view for quantification. For lung sections, we tile-scanned the lung sections, including at least three lung lobes, and performed quantification.

      Reviewer #3 (Public Review):

      In this manuscript, Huang et al. study how the lung regenerates after severe injury due to viral infection. They focus on how tuft cells may affect regeneration of the lung by ectopic basal cells and come to the conclusion that they are not required. The manuscript is intriguing but also very puzzling. The authors claim they are specifically targeting ectopic basal progenitor cells and show that they can regenerate the alveolar epithelium in the lung following severe injury. However, it is not clear that the p63-CreERT2 line the authors are using only labels ectopic basal cells. The question is: what is a basal cell? Is an ectopic basal progenitor cell only defined by Trp63 expression?

      The accompanying manuscript by Barr et al. uses a Krt5-CreERT2 line to target ectopic basal cells and using that tool the authors do not see a significant contribution of ectopic basal cells towards alveolar epithelial regeneration. As such, the claim that ectopic basal cell progenitors drive alveolar epithelial regeneration is not well-founded.

      We thank the reviewer for the positive comments and for agreeing that our findings are interesting.

      The title itself is also not very informative and is a bit misleading. That being said I think the manuscript is still very interesting and can likely easily be improved through a better validation of which cells the p63-CreERT2 tool is targeting.

      We have revised the title accordingly and performed extensive experiments to address the reviewer’s concerns.

      I, therefore, suggest the following experiments.

      1) Please analyze which cells p63-CreERT2 labels immediately after PR8 and tamoxifen treatment. Are all the tdTomato labeled cells also Krt5 and p63 positive or are some alveolar epithelial cells or other airway cell types also labeled?

      We thank the reviewer for the question. To answer it, we performed PR8 infection (250 pfu) on three Trp63-CreERT2;R26tdT mice, with TMX treatment at days 5 and 7 post viral infection. We did not perform TMX injection immediately because the mice were sick during the first few days post infection. The lung samples were collected at 14 dpi. We observed that tdT+ cells are present in the airways (rebuttal letter RLFigure 2A, B), and it appears that the lineage labeled cells (tdT+) include club cells (CC10+) that are underlain by tdT+Krt5+ basal cells (RLFigure 2C). We think that these labeled basal cells give rise to club cells. However, we also noticed that rare club cells and ciliated cells (FoxJ1+) are labeled by tdT in areas lacking surrounding tdT+ basal cells (RLFigure 2D). Moreover, a minor population of tdT+ SPC+ cells is present in the terminal airways that were disrupted by viral infection (RLFigure 2E and D). We did not see any pods formed in this experiment, and we did not observe any tdT+ cells in the intact alveoli (uninjured area).

      RLFigure 2. Trp63-CreERT2 lineage labeled cells in the airways but not alveoli when Tamoxifen was induced at days 5 and 7 after PR8 H1N1 viral infection. Trp63-CreERT2;R26-tdT mice were infected with PR8 at 250 pfu, and Tmx was delivered at a dose of 0.25 mg/g bodyweight by oral gavage. Lung samples were collected and analyzed at 14 dpi. Stained antibodies are as indicated. Scale bar: 100 µm.

      2) Please also show if p63-CreERT2 labels any cells in the adult lung parenchyma in the absence of injury after tamoxifen treatment.

      Dr. Wellington Cardoso’s group demonstrated that Trp63-CreERT2 only labels very few cells in the airways but not the lung parenchyma in the absence of injury after tamoxifen treatment (Yang et al., 2018). Dr. Ying Yang has revisited the data and she did not observe any labeling in the lung parenchyma (n = 2).

      3) Please analyze if p63-CreERT2 labels any cells with tdTomato in the absence of injury or after PR8 infection but without tamoxifen treatment.

      We performed the experiment and didn't observe any labeled cells in the lung parenchyma without Tamoxifen treatment (n = 4).

      4) Please analyze when after PR8 infection do the first p63-CreERT2 labeled tdTomato positive alveolar epithelial cells appear.

      We administered tamoxifen at days 5 and 7 after PR8 infection and harvested lung tissues at day 14. As shown in Figure 1, we observed a few tdT+ SPC+ cells in the terminal airways that are disrupted by viral infection. Notably, we did not observe any lineage labeled cells in the intact alveoli (uninjured) in this experiment.

      5) A clonal analysis of p63-CreERT2 labeled cells using a confetti reporter might also help interpret the origin of p63-CreERT2 labeled cells.

      We thank the reviewer for the suggestion. Our new data demonstrate that a rare population of SPC+tdT+ cells is present in the disrupted terminal airways of Trp63-CreERT2;R26tdT mice. Our data in the original manuscript and the new data suggest that the initial SPC+tdT+ cells are rare, because we had to administer multiple doses of Tamoxifen to label them. Given the lower labeling efficiency of the Confetti reporter compared with R26tdT mice, we may not be able to label these SPC+ cells. Moreover, our original manuscript clearly shows individual clones of SPC+tdT+ cells in the regenerated lung, and they do not appear to be composed of multiple clones. Therefore, we think that the use of Confetti mice may not add new information.

      6) Lastly, could the authors compare the single-cell RNAseq transcription profiles of p63-CreERT2 labeled cells immediately after PR8 and tamoxifen treatment and also at 60 dpi? A pseudotime analysis and trajectory inference analysis could help elucidate the identity of p63-CreERT2 labeled cells that are actually not ectopic basal progenitor cells.

      We appreciate the reviewer’s suggestion and agree that single-cell RNA sequencing with pseudotime analysis can provide further information regarding the origin of the lineage labeled alveolar cells of Trp63-CreERT2;R26tdT mice. That said, our new data clearly show that KRT5-CreER lineage labeled cells do not give rise to AT1/2 cells, as previously described (Kanegai et al., 2016; Vaughan et al., 2015), suggesting that the ectopic basal progenitor cells do not generate alveolar cells. By contrast, Trp63-CreERT2 lineage labeled cells do give rise to AECs, suggesting that this p63+ cell population capable of generating AECs is different from the Krt5+ ectopic basal progenitor cells. Our single-cell core has an extremely long waiting list due to the pandemic, and we hope that our new findings are sufficient to address the reviewer’s concern without single-cell analysis.

    1. Author Response

      Reviewer #2 (Public Review):

      Activation of TEAD-dependent transcription by YAP/TAZ has been implicated in the development and progression of a significant number of malignancies. For example, loss-of-function mutations in NF2 or LATS1/2 (known upstream regulators that promote YAP phosphorylation and its retention and degradation in the cytoplasm) promote YAP nuclear entry and association with TEAD to drive oncogenic gene transcription, and they occur in >70% of mesothelioma patients. High levels of nuclear YAP have also been reported for a number of other cancer cell types. As such, the YAP-TEAD complex represents a promising target for drug discovery and therapeutic intervention. Based on the recently reported essential functional role for TEAD palmitoylation at a conserved cysteine site, several groups have successfully targeted this site using both reversible binding non-covalent TEAD inhibitors (i.e., flufenamic acid (FA), MGH-CP1, compound 2 and VT101~107), as well as covalent TEAD inhibitors (i.e., TED-347, DC-TEADin02, and K-975), which have been demonstrated to inhibit YAP-TEAD function and display antitumor activity in cells and in vivo.

      Here, Fan et al. disclose the development of covalent TEAD inhibitors and report on the therapeutic potential of this class of agents in the treatment of TEAD-YAP-driven cancers (e.g., malignant pleural mesothelioma (MPM)). Optimized derivatives of a previously reported flufenamic acid-based acrylamide electrophilic warhead-containing TEAD inhibitor (MYF-01-37, Kurppa et al. 2020 Cancer Cell), which display improved biochemical- and cell-based potency or mouse pharmacokinetic profiles (MYF-03-69 and MYF-03-176), are described and characterized.

      Strengths:

      All of the authors' claims and conclusions are very well supported and justified by the data that is provided. Clear improvements in biochemical- and cell-based potencies have been made within the compound series. Cell-based selective activities in the HIPPO pathway defective versus normal/control cell types are established. Transcriptional effects and the regulation of BMF proapoptotic mRNA levels are characterized. A 1.68 Å X-ray co-crystal structure of MYF-03-69 covalently bound to TEAD1 via Cys359 is provided. In vivo efficacy in a relevant xenograft is demonstrated, using a 30 mg/kg, BID PO dose.

      We thank the reviewer for appreciating and highlighting the strengths of our study.

      Weaknesses:

      Weaknesses: Beyond the impact on BMF gene regulation, new biological insights reported here for this compound series are moderate. Progress and differentiation with respect to activity and/or ADME PK profiles relative to the very closely related and previously described (Kaneda et al. 2020 Am J Cancer Res 10:4399. PMID 33415007) acrylamide-based covalent TEAD inhibitor K-975 (identical 11 nM cell-based potencies when compared head-to-head and identical reported in vivo efficacy doses of 30 mg/kg) are not entirely clear. Demonstration of on-target in vivo activity is lacking (e.g., impact on BMF gene expression at the evaluated exposure levels).

      We thank the reviewer for the question. We have compared the mouse liver microsome stability and hepatocyte stability of K-975 and MYF-03-176 and found that K-975 is metabolically less stable.

      Consistent with this, when NCI-H226 cell-derived xenograft mice were dosed with 30 mg/kg K-975 twice daily, the tumors kept growing and reached more than 1.5-fold their initial volume by day 14. In contrast, at the same dosage, MYF-03-176 produced significant tumor regression. According to the K-975 paper (Kaneda et al. 2020 Am J Cancer Res 10:4399), K-975 did not achieve such efficacy even at 100 or 300 mg/kg twice daily in either the NCI-H226 or the MSTO-211H CDX mouse model.

      To demonstrate on-target in vivo activity, we measured the expression of TEAD downstream genes and BMF in tumor samples after 3-day BID treatment (PD study). We observed reductions in CTGF, CYR61, and ANKRD1 and an increase in BMF, indicating on-target activity in vivo.

    1. Author Response:

      We would like to thank the reviewers for a very thorough and careful analysis of our manuscript. All the comments and suggestions were taken to heart, and we feel that our revised manuscript is vastly improved because the reviewers clearly put in a significant effort to help us interpret and clarify our conclusions. We appreciate that the reviewers took the time to help us convey our results within the context of the field. Understanding and working with BRCA2 variants of uncertain significance (VUS) is challenging and complex, and we strive to report accurate and solid data that will help predict cancer risk and guide targeted therapies for patients.

      We appreciate the comment by reviewer 1 stating: “Identification of truly pathogenic BRCA2 missense mutations is a challenging but very important task for early diagnostics.” Our goal is to define cancer risk to prevent tumor formation and to promote personalized medicine with targeted therapies for homologous recombination deficient tumors. In future studies, we will expand these analyses to potentially pathogenic mutations of BRCA1 and PALB2.

      In Reviewer 1's public review, we noticed a few mistakes:

      1.     In the text it says RAD52, but it should say RAD51. This is the sentence: “Using an impressive array of cellular and biochemical approaches they demonstrated that the first two BRCA2 mutants have a detrimental effect on RAD52-dependent DNA repair, and therefore likely to be pathogenic.”

      2.     In the text it says T1980I instead of T1346I. This is the sentence: “In contrast, T1980I seems to have no effect on DNA repair in various tested assays and is likely to be a passenger mutation.”

      We thank Reviewer 2 for the thoughtful comments and questions, and we agree that this paper is important to the fields of homologous recombination, replication, genome stability maintenance, DNA double-strand break repair, and classification of VUS. We wish that the full-length BRCA2 S1221P mutant could have been purified to provide deeper mechanistic insights into how this mutant affects BRCA2 biochemical functions.

      We appreciate the constructive question raised by Reviewer 3 about the impact of the results, and we acknowledge the immense efforts of different laboratories to classify VUS over the years with different approaches (segregation studies, protein prediction algorithms, viability analysis in ES cells, HDR reporters, in vitro analysis, etc.). However, we see the need for rigorous and thorough in vitro and cell-based analyses to understand fundamental BRCA2 biology and to better classify VUS through a more comprehensive analysis of altered BRCA2 functions.

      To address the comments and questions raised by Reviewer 3, we have expanded the Introduction, Results, and Discussion of the manuscript. In brief, our comprehensive analysis of three independent variants located in the BRC repeats of BRCA2 highlights the importance of using multiple analyses to understand altered BRCA2 functions, because there is no single assay that measures BRCA2 tumor suppression activity. We had two goals: 1) to identify potentially pathogenic variants with important clinical implications for cancer risk in patients and 2) to leverage deleterious variants to uncover the specific functions carried out by individual BRC repeats.

      As an example of our responses to Reviewer 3, we include the last paragraph of our Discussion here:

      “Clinical integration of functional assays into the genetic counseling setting is an important goal but should be met with caution until we fully understand how specific variants impact the tumor suppressor functions of BRCA2. The use of robust and accurate functional assays will be essential to correctly evaluate BRCA2 VUS. Our study demonstrated that novel pathogenic variants exist not only in the DBD domain of BRCA2 but also in the BRC region, leading to defects in RAD51 binding, activity, and subsequent HDR deficiencies. Mechanistic studies leveraging patient variants will continue to reveal the many functions of BRCA2.”

    1. Author Response

      Reviewer #1 (Public Review):

      This paper tackles a very important question in somatosensory biology - the identity of the sodium channel controlling excitability in proprioceptors. While whole rainforests' worth of papers have been published on sodium channels in nociceptors, there has been a significant gap in our understanding of which NaV isoforms are at play in the large fiber proprioceptors and LTMRs. Using pharmacology, gene KO, behavior, and histology, the authors show quite convincingly that NaV1.1 in sensory neurons is essential for normal motor behavior and contributes to proprioceptor excitability. Interestingly, they find NaV1.1 is haploinsufficient. This finding is all the more exciting given the many human NaV1.1 het and homo mutants and points to future possibilities for interrogating the role of this channel in human proprioception and using human tissue (e.g. iPSCs).

      We are delighted that the Reviewer finds that our results address a “very important question in somatosensory biology”.

      Reviewer #2 (Public Review):

      The manuscript by [Espino et al, 2022] characterizes the role of the sodium channel Nav1.1 in DRG sensory neurons, focusing on its role in proprioceptive sensory neurons. Nav1.1 expression has previously been observed in myelinated DRG neurons (including proprioceptive muscle afferents) but its significance for proprioceptive function remains unknown. In a series of molecular and in vitro patch clamp studies (using pharmacological Nav1.1 inhibitors and activators), the authors demonstrate that all proprioceptors express Nav1.1 and that this sodium channel is required for repetitive firing in the majority of proprioceptors. A pan-sensory conditional deletion of Nav1.1 leads to a loss in motor coordination, suggesting that Nav1.1 in sensory neurons is required for normal motor control. While this is a somewhat generic and slightly unsatisfying conclusion, further morphological studies and ex vivo electrophysiological recordings of functionally identified muscle spindle afferents begin to offer a more interesting take on the role of Nav1.1 in proprioceptor function. First, while proprioceptor number and spindle morphology are unchanged, it appears as if the number of synapses between muscle spindle afferents and motor neurons is reduced, perhaps suggesting that a reduction in proprioceptor excitability during development affects the formation of proprioceptive sensory-motor circuits. Second, ex vivo recordings of MS afferents indicate that the loss of Nav1.1 primarily affects the static phase of their response to increases in muscle length, suggesting a role in the regulation of proprioceptor slow adaptation response properties.

      There are two clear strengths of the manuscript. First, mutations in Nav1.1 have been shown to be associated with a number of central brain disorders, including those that lead to motor impairments. The notion that a sensory neuron restricted loss of Nav1.1 similarly leads to motor coordination defects indicates that some phenotypes that previously had been suggested to be due to a central role of Nav1.1 could in fact have a peripheral basis. A second strength is that these studies further our understanding of the molecules that regulate excitability in proprioceptors and offer a foundation for further work to tease apart the molecular underpinnings of the physiological response properties of individual proprioceptor subtypes.

      While the studies generally support the main conclusion that Nav1.1 in mammalian sensory neurons is required for normal motor behaviors, the depth of some of the analyses leaves a bit more to be desired. For example, it seems that a little more could have been done to strengthen the in vitro analyses of Nav1.1 in proprioceptors with additional controls, and by expanding this analysis to genetically identified Nav1.1 mutant (heterozygous or homozygous) proprioceptors. In addition, it feels a bit of a missed opportunity that there is no further exploration of the relationship of Nav1.1 function in the context of specific proprioceptor subtypes (even if only through discussion). Finally, the observation that a loss in Nav1.1 may cause disruptions in sensory-motor connectivity could benefit from additional analyses to support these findings.

      We thank the reviewer for identifying the strengths of our study and pointing out that these new findings will “offer a foundation for further work”. We are eager to continue this line of investigation and are currently developing new approaches in the lab that will allow us to deepen our analyses in the future.

      Reviewer #3 (Public Review):

      The authors characterize the role of voltage-gated sodium channel Nav1.1 expression in proprioceptors in the peripheral nervous system. They use genetically modified mice, pharmacological blockers, and electrophysiological methods to support their claims. Although it has long been known that Nav1.1 is expressed in the peripheral nervous system, Espino et al. here present a thorough characterization of its role in proprioception and show its importance for motor behaviour, proprioceptor function, and synaptic transmission in the spinal cord. Characterizing the sodium channel subtype's function is crucial for our understanding of the function and dysfunction of the nervous system and to potentially develop new therapeutic approaches.

      We thank the Reviewer for their comments on the importance of our work investigating sodium channel function in proprioceptors.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper is a follow-up of the authors' previous paper (2018), in which they carefully described the organisation of the junctions between cells of the adult Drosophila midgut epithelium and their control from the basal side by integrin signalling. Here, the authors used state-of-the-art imaging and genetics to unravel step-by-step the events leading from an initially unpolarised cell to an epithelial cell that integrates into the existing epithelium. Many of the images are accompanied by cartoons, which help the reader to better understand the images and follow the conclusions. It would have been helpful, however, in particular with respect to the mutant phenotypes described later, if they had named each of the steps/stages. In addition, mentioning the timescale would give an idea of the temporal frame in which this process elapses.

      We have used terms such as “unpolarised cells, polarised Actin/Cno” to label different stages in Figure 6, since this sequence of steps is inferred from results obtained from fixed samples with still images. We have illustrated the septate junction mutant phenotype in Figure 8I.

      We have also performed a new experiment to estimate the time taken for an activated EB to form a PAC and to become a mature enterocyte, by overexpressing Sox21a with esg[ts]>GFP to induce enteroblast differentiation. Counting the number of GFP+ve cells without a PAC, with a PAC and with a full apical domain at different time points suggests that activated EBs take about a day to form a PAC and another day to form a fully integrated enterocyte. We have summarised the results in Figure 5-figure supplement 1C.

      We have also included this result in the main text as "To estimate the time taken for enteroblasts to progress to pre-enterocytes with a PAC, and for pre-enterocytes to become enterocytes, we induced enterocyte differentiation by over-expressing UAS-Sox21a under the control of esg[ts]-Gal4 and counted the number of GFP+ve cells without a PAC or apical domain, with a PAC and with a full apical domain at different time points after induction (Chen et al., 2016; Meng and Biteau, 2015; Zhai et al., 2017). 17 hours after shifting the flies to 25ºC to inactivate Gal80ts, almost no GFP+ve cells had progressed to pre-EC with a PAC (0.1%) or EC (1%), and these few cells probably started to differentiate before Sox21a induction. 24 hours later, 10% of the GFP+ve cells had developed into pre-ECs with a PAC and 20% had become ECs (Figure 5-figure supplement 1B-C). After an additional 24 hours, the number of cells with a PAC fell to 1%, whereas 50% were ECs. Assuming that it takes 12-17 hours to induce high levels of Sox21a expression, these results suggest that most activated EBs take about 24 hours to develop into a pre-EC with a PAC and a further 24 hours to differentiate into a mature EC, although some cells differentiate faster. This time frame is in agreement with a previous study using similar approaches to accelerate differentiation (Rojas Villa et al., 2019) and a recent live imaging study tracing the enteroblast to enterocyte transition (Tang et al., 2021). These results also indicate that down-regulation of Sox21a is not essential for enteroblast to pre-enterocyte differentiation, since enteroblasts overexpressing Sox21a still form a PAC (Figure 5-figure supplement 1B)."

      The authors convincingly show that septate junctions are instrumental for proper polarisation and integration of the enteroblast. However, while they nicely showed that Canoe is required neither in the enteroblast nor in the enterocytes for this process, it remains unclear whether septate junction proteins are required in enteroblasts, in enterocytes, or in both, and at which particular step the process fails in the mutant.

      Early stage enteroblasts neither express nor require septate junction proteins, whereas late stage enteroblasts and pre-enterocytes do (Chen et al., 2020; Hung et al., 2020; Izumi et al., 2019; Xu et al., 2019). Since cells mutant for septate junction proteins do not develop into mature enterocytes with an apical domain facing the gut lumen, we cannot answer the reviewer's question of whether septate junction proteins are required in enterocytes.

      As we discussed in the paper, we think that "differentiating enteroblasts only require a basal cue to establish their initial apical-basal polarity, whereas the formation of the pre-assembled apical compartment also requires a junctional cue. The septate junctions are not necessary for apical domain formation per se, however, as mesh mutant enteroblasts form a fully developed apical domain with a brush border inside the cell. This suggests that septate junctions define the site of apical domain formation by delimiting the region where apical membrane proteins are secreted to assemble the brush border, but do not control the process of apical domain formation directly."

      Reviewer #2 (Public Review):

      The authors recently showed that the polarization of the cells of the adult Drosophila midgut does not require any of the canonical epithelial polarity factors, and instead depends on basal cues from adhesion to the ECM, as well as on septate junction proteins (Chen et al, 2018). Here they extend this research to examine in greater detail precisely how midgut epithelial cells integrate into the pre-existing epithelium and become polarized. Surprisingly, they show that enteroblasts form an apical membrane initiation site prior to polarizing. Furthermore, they show that this develops into a pre-apical compartment containing a fully formed brush border. This is a very interesting finding: it explains how enteroblasts can integrate into a pre-existing epithelium without disrupting barrier function. The conclusions of this paper are mostly well supported by data, but some aspects could do with being clarified and extended as outlined below.

      Model presented in Figure 6

      While the separation of membranes indicated in Figure 6 steps 3-5 can be seen in the image shown in Figure 3B, this is one of the only images which supports the idea that there is a separation of membranes between the enteroblast and overlying enterocytes during PAC formation. Is the model in Figure 6 supported by EM data - can you see a region where there is brush border and separation of cells? Supplementing Figure 3 with corresponding EM images would greatly aid the reader in interpreting the data and strengthen the model.

      We think that AJ clearing and membrane separation is a brief process that is quickly followed by the separation of the apical and junctional proteins and apical secretion at the AMIS to form the PAC. We have not captured this stage in our EM images, but we have many other examples that show this step (e.g. Figure 4C and Figure 8F). Another example is shown below.

      A key step in the model is that the clearance of E-Cadherin from the apical membrane leads to a loss of adhesion between the enteroblast and the overlying enterocytes. This would need to be supported by functional data such as overexpression of E-Cad or E-CadDN in enteroblasts or by generating shg mutant clones. If the model is correct, perturbing E-Cad levels in enteroblasts should lead to defects in PAC formation, such as loss of de-adhesion/early de-adhesion/excessive de-adhesion.

      We think it is the local clearance of ECad from the apical membrane, not the downregulation of the total level of ECad, that is important for the local membrane separation and future PAC formation. The experiment of overexpressing ECad or ECad-DN proposed by the reviewer might be crucial to demonstrate the importance of the total amount of ECad, but might not be very helpful in determining the importance of membrane separation in PAC formation. Moreover, AJ formation in the fly midgut epithelium does not depend on ECad, suggesting that ECad and NCad act redundantly, which further complicates this approach (Choi et al., 2011; Liang et al., 2017).

      Role for the septate junction proteins

      Septate junction proteins were previously shown by these authors to be required for enteroblast polarization and integration into the midgut epithelium (Chen et al, 2018). Here they extend this by examining enteroblasts mutant for septate junction proteins, and conclude that septate junction proteins are required for normal PAC formation. However, it is not clear what aspect of the polarization of the enteroblasts is disrupted, because a number of mesh mutant cells (albeit a lower proportion than in wildtype) do form PACs. The main phenotype seems to be that cells fail to polarize (as previously reported) or have internalised PACs. It is hard to know what to conclude from this data about the role of the septate junction components in PAC formation.

      The major phenotype of the septate junction mutants is the loss of polarity, i.e. an inability to form an apical domain and integrate into the epithelial layer, as shown in Figure 8. Neither mesh nor Tsp2a mutants can form a PAC, even though mesh mutant cells have a higher propensity to form an internal PAC-like structure (Figure 8B,C,E,G,H, Figure 8-figure supplement 1L). Thus, we think that septate junctions are required for AMIS and PAC formation. What complicates the interpretation is that some (6-20%) septate junction mutant cells do form an AMIS-like structure (Figure 8D-F, Figure 8-figure supplement 1F&K). The simplest explanation for this result is perdurance of the wild-type proteins after clone induction, with the weaker phenotype of ssk mutants being due to longer perdurance of this protein. However, we cannot rule out the alternative explanation that AMIS and PAC formation is facilitated by the septate junction proteins, but that they can still form very inefficiently in their absence.

      We realise that this section was quite confusing in the original version of the manuscript and have now re-written it to make this interpretation clearer.

      Coracle is used as a readout for the localization of septate junction components, yet the staining for Cora in Figure S3B looks quite different to Mesh in S3D. If Cora is to be used as a readout for the localization of septate junction components, then staining for Cora/Mesh and/or Cora/SSk or Tsp2a should be shown.

      When discussing the requirement for septate junctions for enteroblast integration - Coracle and Mesh are used interchangeably - but as mentioned before, it is not clear if they colocalize, or if their localization is interdependent (as demonstrated for Mesh, Tsp2a and Ssk in Figure 7). What is the phenotype of enteroblasts mutant for cora?

      Following from the previous point - while it is clear that Coracle is apical early during AMIS formation, it is not clear if Mesh, Tsp2a and Ssk also are, yet these are the mutants that are examined for a role in AMIS/PAC formation. It would be good to know whether the loss of cora would lead to defects in AMIS formation.

      The reason we used mainly Coracle as a marker for the septate junctions is that Mesh and Tsp2A localise to the basal labyrinth as well as to the septate junctions which could confuse the reader. We have now added new panels to Figure 3-figure supplement 3E&F showing the colocalization of Cora with Mesh/Tsp2a at the septate junctions and during the crucial stages of PAC formation.

      Additional Results:

      "Coracle is a peripheral septate junction protein whose localisation depends on the structural septate junction components such as Mesh/Ssk/Tsp2a (Chen et al., 2018; Izumi et al., 2016, 2012). Cora antibody staining provides a clearer marker for the septate junctions than Mesh or Tsp2a antibody staining, because the latter also label the basal labyrinth (Figure 3-figure supplement 1E&F). To determine whether Cora is required for PAC formation or epithelial polarity in the adult midgut, we generated a null mutant allele with a premature stop codon in the FERM domain using CRISPR. Cells mutant for this allele, corajc, or a second cora null allele, cora5, can form a PAC, septate junctions and a full apical domain, indicating that Cora is also not required for enteroblast integration or enterocyte polarity (Figure 7F&G, Figure 7-figure supplement 1E-H)."

      Additional Materials and Methods:

      We used the CRISPR/Cas9 method (Bassett and Liu, 2014) to generate null alleles of canoe and coracle. sgRNA was in vitro transcribed from a DNA template created by PCR from two partially complementary primers:

      forward primer:

      For coracle:

      5′-GAAATTAATACGACTCACTATAGAAGCTGGCCATGTACGGCGGTTTTAGAGCTAGAAATAGC-3′;

      The sgRNA was injected into…Act5c-Cas9 embryos to generate coracle null alleles (Port et al., 2014). Putative…coracle mutants in the progeny of the injected embryos were recovered, balanced, and sequenced. …The coraclejc allele contains a 2bp deletion around the CRISPR site, resulting in a frameshift that leads to a stop codon at amino acid 225 in the middle of the FERM domain, which is shared by all isoforms. No Coracle protein was detectable by antibody (DSHB C615.16) staining in either midgut or follicle cell clones. The coraclejc allele was recombined with FRT G13 to make the FRTG13 coraclejc flies.
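      The forward primer above can be read as a set of functional parts. The sketch below assumes a common layout for templates of in vitro transcribed sgRNAs: a short leader, a T7 promoter whose final G is the first transcribed base, a 20-nt target-specific spacer beginning with that G, and a 3′ sequence overlapping the start of the sgRNA scaffold, which is where the second, partially complementary primer anneals. The function name and the two constant sequences are illustrative assumptions, not taken from the authors' protocol.

```python
# Assumed layout: [leader][T7 promoter ...G][20-nt spacer starting at that G][scaffold overlap]
T7_PROMOTER = "TAATACGACTCACTATAG"         # minimal T7 promoter; its final G is transcribed
SCAFFOLD_OVERLAP = "GTTTTAGAGCTAGAAATAGC"  # assumed start of the sgRNA scaffold


def split_sgrna_primer(primer: str) -> dict:
    """Split a forward sgRNA-template primer into its assumed functional parts."""
    seq = primer.upper()
    t7_start = seq.find(T7_PROMOTER)
    if t7_start < 0:
        raise ValueError("T7 promoter not found")
    if not seq.endswith(SCAFFOLD_OVERLAP):
        raise ValueError("primer does not end with the assumed scaffold overlap")
    t7_end = t7_start + len(T7_PROMOTER)
    spacer_start = t7_end - 1  # the spacer begins at the promoter's final G
    spacer_end = len(seq) - len(SCAFFOLD_OVERLAP)
    return {
        "leader": seq[:t7_start],
        "t7_promoter": seq[t7_start:t7_end],
        "spacer": seq[spacer_start:spacer_end],
        "scaffold_overlap": seq[spacer_end:],
    }


# The coracle forward primer from the Methods, written 5' to 3' without the labels.
coracle_fwd = "GAAATTAATACGACTCACTATAGAAGCTGGCCATGTACGGCGGTTTTAGAGCTAGAAATAGC"
parts = split_sgrna_primer(coracle_fwd)
for name, part in parts.items():
    print(f"{name:>16}: {part} ({len(part)} nt)")
```

      Read this way, only the 20-nt spacer is specific to coracle; under the assumed layout, the flanking sequences would be shared by every sgRNA template built with the same design.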

      It is unclear what is happening in Figure 8A,C,E, S7D. Is that a detachment phenotype or an integration phenotype? Are the majority of cells unpolarised due to loss of integrin attachment rather than failure to form an AMIS/PAC?

      Cells mutant for septate junction proteins do not detach from the basement membrane and still localise Talin basally, as illustrated by the new panel we have added (Figure 8-figure supplement 1N), showing Talin localisation in Tsp2a mutant cell.

      However, because the mutant cells cannot integrate and remain stuck beneath the septate junctions between the enterocytes, they sometimes become displaced from a portion of the basement membrane by younger EBs that derive from the same mutant ISC, leading to a pile up of cells in the basal region of the epithelium (e.g. Figure 8A, E and H).

      We have added the following sentences to the Results, explaining these points:

      "Because the mutant cells remain trapped beneath enterocyte-enterocyte septate junctions, they accumulate in the basal region of the epithelium, with new EBs derived from the same mutant ISC forming beneath them and reducing their contact with the basement membrane (Figure 8A)."

      "The majority of cells mutant for septate junction components fail to polarise or form an AMIS, although they form normal lateral and basal domains, as the basal integrin signalling component, Talin, localises normally (Figure 8-figure supplement 1N)."

      It is unclear whether enteroblasts really pass through an 'unpolarized stage'. In Figure 6, when they are described as 'unpolarised', they clearly have distinct basal and AJ domains. In septate junction mutants, when cells are classified as unpolarized, do they still have distinct regions of integrin/E-Cad expression?

      This is a semantic question. We agree that they have distinct lateral and basal domains, but they do not have an apical domain. In this respect, these "unpolarised" cells are similar to a mesenchymal fibroblast migrating on a substrate, which has a distinct basal side contacting the substrate that is different from the non-contacting regions of the cell surface. They also match the description of the migratory, "mesenchymal" enteroblasts (Antonello et al., 2015). To make this clearer, we have added the following notes to the legend for Figure 6: “Unpolarised” in the second panel of this figure indicates that the enteroblast has not formed a distinct apical domain. At this stage, no marker is clearly apically localised. “unpolarised” or “polarised” in the third and fourth panels describe the localisation of marker proteins, such as Actin and Cno."

    1. Author Response

      Reviewer #1 (Public Review):

      Solving the puzzle of this paper was clearly not easy, and the authors used an impressive set of tools and statistical methods to get to the bottom of what they observed in a very creative way. However, the presentation of the manuscript and its relevance could perhaps be improved.

      We are pleased to see that the referee was favorably impressed. We hope that this revision has improved the presentation of the manuscript and has clarified its relevance.

      First, I find the arguments in some parts of the manuscript to be a bit awkwardly formulated. For example, there is much discussion about social evolution and the paradox of why cells invest in rhamnolipid production, but this does not seem to be the topic of the paper, which focuses more on understanding P. aeruginosa's metabolism. Instead, there is very little discussion about the origin of these isolates and to what extent these findings may be relevant for P. aeruginosa's natural environment. I understand that this may be very speculative, but there could at least be more discussion on why glycerol was chosen as a growth medium, and what would happen if a more realistic growth medium were used instead. What environment does this bacterium experience, and might it be surrounded by other species that could reduce oxidative stress?

      We understand that the referee would like a broader analysis of how the growth environment impacts surfactant secretion. We have added an entirely new section titled "Mathematical model predicts impact of carbon sources on surfactant production" at the end of the results section to address this issue (p. 16-19, l. 358-435). In the new section we present new data on how a range of carbon sources, beyond glycerol, impact P. aeruginosa growth and biosurfactant secretion. We then use our model to determine which carbon sources favor secretion, and we identify D-glucose as being better than glycerol. This new experimental and computational work refines our model to explain surfactant secretion more broadly than in glycerol alone. The biosynthesis of this secondary metabolite is favored when the carbon and energy source imposes a low burden on primary metabolism. We also investigated the rhlAB expression dynamics of PA14 in glucose to further support our results.

      The overall message of the paper could be clarified: essentially, cells only produce rhamnolipids when they are not experiencing oxidative stress. I am sure the message is more nuanced, but this is not clear from the current abstract.

      We have changed the abstract to clarify our main point: that cells only produce surfactants when they are not experiencing oxidative stress and they have more carbon source than needed for growth.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The methods section lacks sufficient detail, and arbitrary choices made in the simulation setup may have biased the results. The author's finding that the LR is disordered does not provide obvious mechanistic insights, and the simulations with the bound ligand are too preliminary to make solid conclusions. Although this manuscript is technically strong, the significance of the results is often unclear.

      We did not make "arbitrary choices". The only setup choice we made was guided by heuristics, and its adequacy was amply confirmed by the robustness of the simulated system. We have emphasized this in the revision.

      Reviewer #2 (Public Review):

      Strengths:

      1) The authors have focused on the LR region of TSHR and performed rigorous MD simulations to identify its various conformers, and tried to explain this observation. The authors also showed that the stability of the LR increased in the presence of the ligand, TSH.

      2) The authors have done many simulations of the TMD helix bundle and meticulously tried to quantitate the differences by assessing the changes in helix length, radius and angles.

      Weaknesses:

      1) Although the focus of the paper was the full model of the TSHR, the authors have broken down the whole protein into smaller sequences, performed separate simulations, and discussed the results. The whole picture of the TSHR is not clear. For example, in Figure 5, the various conformations (and secondary structures) of only the LR are shown at different times. For the TMD helix bundle, separate tables have been shown, focusing only on the TMD.

      The whole picture of the TSHR is shown on Figure 1. The reason the TMD is not shown in Figure 5 is because there is little variation in the TMD (as analyzed in detail in Tables 1-3) and not including it allowed us to show more detail in the ectodomain.

      2) The authors have analyzed the cysteines in the LR during the simulation, showing the propensity of various pairs to form disulfide bonds. However, the authors have not further discussed this point. Can this information be used to better guide the modelling process?

      We have added a statement in the Discussion section suggesting that the closeness of these cysteines during the simulation indicates that they indeed should form disulfide bonds. Furthermore, their separation when TSH was introduced indicated the likely role of these disulfide bonds in signal transduction.

      3) Based on the data in this manuscript, the authors claim that the LR domain makes significant contact with the TSH ligand. However, in the crystal structure of the FSH ligand with the ectodomain of its receptor (pdb: 4AY9), the loop corresponding to the LR is missing, suggesting that the interaction between this loop (LR) and the ligand is either very weak or absent.

      While our results on the TSH-TSHR complex are still preliminary, we point out in the revision that (a) the TSH-LR contacts we see involve the parts of the LR that are missing in the FSH-FSHR structure, (b) there is only 39.3% sequence identity between the LR of TSHR and FSHR, and (c) the large fluctuations we see in the LR conformations suggest that it is very unlikely that the contacts seen are artifacts of the initial structure.

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript presents an interesting study on a timely topic (hyperacusis). The study was carried out in awake animals using modern approaches in neurosciences (calcium imaging, optogenetic). The amount of data is impressive, the study is very ambitious, and overall its quality is indisputable. However, I have some general comments and questions on some concepts that are critical for the study, and also on the interpretation of the data, in particular the behavioral data.

      We appreciate Reviewer 2’s overall positive evaluation as well as their more specific critiques, which we address below.

      The first point I want to mention is the concept of 'homeostatic plasticity'. I am not sure we agree on its definition. My understanding of it is that the AVERAGE of central activity will remain constant around a set point value. In case of a reduction of sensory inputs (hearing loss), the neurons' sensitivity will be enhanced in such a way that the averaged activity will be preserved. So, neural hyperactivity after partial or sensory deprivation is not 'maladaptive': it is a collateral effect, 'the price to pay' for maintaining neural activity stable around a given value. In my opinion, this point is crucial. The authors should also mention and cite the model's paper from Schaette et al.

      “Homeostasis” is a term used widely in physiology to describe a negative feedback process in which an internal adjustment compensates for an external perturbation to return a given system (temperature, pH, etc.) to a set point. To the reviewer’s point, homeostatic processes – broadly defined – can work at many different biological scales including perhaps large, distributed systems like the example s/he gave of neurons throughout the central auditory pathway. By contrast, “homeostatic plasticity” is a mechanism studied by dozens of laboratories in hundreds of papers by which neurons (typically studied in cortical neurons) adjust their synaptic and intrinsic excitability to maintain their activity around a set point range. A key feature of homeostatic plasticity is that neurons “sense” deviations from their set point and initiate a compensatory process to offset this deviation. Up to this point, it seems that we are on the same page as the reviewer.

      The first point of possible disagreement lies in the interpretation of how excess neural activity relates to homeostatic plasticity. The reviewer mentioned modeling papers by Schaette and Kempter (2006, 2007, 2012) on the cochlear nucleus, which are also based on homeostatic plasticity and their work is now cited in the revised text (see line 71). The reviewer is correct that there is a difference in how the term is used and interpreted, but the difference is fairly subtle. Their work and our work propose that homeostatic plasticity processes are applied within a single neuron to offset the reduced afferent input that accompanies cochlear damage. As the reviewer recalled, they describe hyperactivity as a consequence of this compensation, as we do as well. The only difference is that they and the reviewer describe hyperactivity as the byproduct of the normal, successful implementation of homeostatic plasticity, which it unequivocally is not because – by definition – homeostatic plasticity is a stabilizing process that maintains activity at a predetermined set point range.

      The second point of disagreement lies in the reviewer's statement that "neural hyperactivity after partial or sensory deprivation is not 'maladaptive': it is a collateral effect, 'the price to pay' for maintaining neural activity stable around a given value." We disagree. Hyperactivity can be both a collateral and maladaptive effect. Hyperactivity and hypersynchrony are understood to be the basis of tinnitus, which is a maladaptive, disordered state. The reviewer's comment implies that there is no alternative for compensating for sensory deprivation but to make cortical neurons hyperactive. We see no reason why this must be so. In fact, stabilization of activity rates after sensory deprivation has been demonstrated in hundreds of studies in the developing visual system. In the adult auditory system, activity in cortical neurons is initially depressed after injury before rebounding to exceed baseline levels (see Resnik Polley 2017 eLife, Asokan 2018 Nat Comm., Resnik Polley 2021 Neuron). It is not obligatory for cortical activity rates to pass through the set point range and continue into hyperactivity, nor is it obligatory for cortical activity rates to remain elevated above baseline many days after the injury. Additional evidence for this point comes from Figures 4, 6, and 8, which show that some cortical neurons actually do homeostatically regulate their activity back to baseline (i.e., show stable gain). This raises the intriguing question of why some neurons recover to their homeostatic activity set point while others do not. Figure 8 provides new insight into this question by showing that their baseline response properties can account for 40% of the variability in gain stabilization after peripheral insult.

      A third point of disagreement relates to the reviewer's statement that "My understanding of it is that the AVERAGE of central activity will remain constant around a set point value. In case of a reduction of sensory inputs (hearing loss), the neurons' sensitivity will be enhanced in such a way that the averaged activity will be preserved". We agree that homeostatic plasticity processes are influenced by activity propagating through distributed neural networks. However, the biological implementation of the process is programmed into individual neurons. The activity set point is neuron-specific, the error signal that encodes a deviation from the set point is neuron-specific, and the transcriptional/translational changes deployed to stabilize the activity rate are neuron-specific. As an analogy, home climate control systems work autonomously for each house, because the sensors (thermostat) and actuators (heating/cooling) are sensitive to fluctuations in that home, not across other houses in the town. The heating and cooling systems for each house in town may be driven by a distributed, common source (e.g., a hot day) but the mechanisms that bring the ambient temperature back to the set point for each house are autonomous and reflect the particular thermostat programming for each house. The widely studied homeostatic plasticity mechanisms mentioned in our manuscript (e.g., excitatory synaptic scaling) are not sensitive to and do not target the averaged neural activity among millions of neurons distributed throughout the sensory neuroaxis.

      As a final point on this statement, there is no demonstration that we are aware of that average central activity remains constant after a reduction of sensory inputs. This would require recording from many neurons across multiple stages of the sensory pathway in a single animal to show that the increased gain at later stages in the system exactly offsets the reduced responsiveness at earlier stages of the system. So, the reviewer’s definition of homeostatic plasticity is based on a general supposition about a distributed process that has never been empirically demonstrated whereas the definition we use is consistent with the mechanisms and terminology used throughout the neuroscience literature (albeit often incorrectly in the hearing loss literature).

      The second point is that a lot is built on the behavioral procedure and d'. I am not convinced that the behavioral procedure (and the d') provides a convincing measurement of loudness (and therefore of loudness hyperacusis). So, in my opinion, the title should be changed and, more importantly, the entire spirit of the paper should be modified.

      The reviewer’s critique as well as comments from other reviewers helped us realize that we had used the terms “hyperacusis” and “loudness” imprecisely. We think that is part of the confusion. What we have studied here is auditory hypersensitivity after sensorineural hearing loss, which may or may not be a model of why persons with hyperacusis can exhibit loudness hypersensitivity.

      Once “hyperacusis” and “loudness” have been stripped away from the behavior, we contend that we have a behavioral assay for auditory hypersensitivity, which is the main point of our study. To be clear, the behavioral readout most commonly employed in the animal literature to model hyperacusis is reaction time, which has a less direct relationship to hypersensitivity than does d’. D-prime is widely used as the sensitivity index in detection behaviors. The main advantage of d’ is that it controls for differences in response bias either between subjects or after noise exposure. We used the d’ metric to show that mice can more reliably detect tone levels near their sensation threshold and can more reliably detect direct stimulation of thalamocortical projection neurons after acoustic trauma. These observations provide the framework for all of the neural measurements that follow.

      On balance, the reviewer was correct that our imprecise use of hyperacusis and loudness was confusing and contradictory. The terms “hyperacusis” and “loudness” now only appear in the manuscript to describe other published findings or to describe what our study does not address. This resulted in several small text changes throughout the manuscript as well as a direct statement about the relationship between our work, loudness, and hyperacusis on Pg. 14, Lns 448-466.

      “While the findings presented here support an association between sensorineural peripheral injury, excess cortical gain, and behavioral hypersensitivity, they should not be interpreted as providing strong evidence for these factors in clinical conditions such as tinnitus or hyperacusis. Our data have nothing to say about tinnitus one way or the other, simply because we never studied a behavior that would indicate phantom sound perception. If anything, one might expect that mice experiencing a chronic phantom sound corresponding in frequency to the region of steeply sloping hearing loss would instead exhibit an increase in false alarms on high-frequency detection blocks after acoustic trauma, but this was not something we observed. Hyperacusis describes a spectrum of aversive auditory qualities including increased perceived loudness of moderate intensity sounds, a decrease in loudness tolerance, discomfort, pain, and even fear of sounds (Pienkowski et al., 2014a). The affective components of hyperacusis are more challenging to index in animals, particularly using head-fixed behaviors, though progress is being made with active avoidance paradigms in freely moving animals (Manohar et al., 2017). Our noise-induced high-frequency sensorineural hearing loss and Go-NoGo operant detection behavior were not designed to model hyperacusis. Hearing loss is not strongly associated with hyperacusis, where many individuals have normal hearing or have a pattern of mild hearing loss that does not correspond to the frequency dependence of their auditory sensitivity (Sheldrake et al., 2015). While the excess central gain and behavioral hypersensitivity we describe here may be related to the sensory component of hyperacusis, this connection is tentative because it was elicited by acoustic trauma and because the detection behavior provides a measure of stimulus salience, but not the perceptual quality of loudness, per se.”

      A lot is derived/interpreted from the results, but I believe there is a lot of over-interpretation. I would suggest the authors be more cautious and moderate in their speculations and conclusions. I would reconfigure the manuscript, and simplify it.

      We believe that the changes mentioned above and in our responses to the specific comments below reduce over-interpretation and simplify the manuscript.

      As an example of a change made to moderate the conclusions from our work, we added the following to Pg. 14, Lns 442-447:

      “Further, while the perceptual salience (Figure 2) and neural decoding of spared, 8kHz tones (Figure 5) were both enhanced after high-frequency sensorineural hearing loss, these measurements were not performed in the same animals (and therefore not at the same time). Definitive proof that increased cortical gain is the neural substrate for auditory hypersensitivity after hearing loss would require concurrent monitoring and manipulations of cortical activity, which would be an important goal for future experiments.”

      Reviewer #3 (Public Review):

      The study uses a mouse model of sensorineural hearing loss after sound overexposure at high frequencies that mimics age-related sensorineural hearing loss in humans. These mice present behavioural hypersensitivity to mid-frequency tone stimuli that can be recreated with optogenetic stimulation of thalamocortical terminals in the auditory cortex. Chronic calcium imaging in pyramidal neurons in layers 2-3 of the auditory cortex shows reorganization of the tonotopic maps and changes in sound intensity coding in line with the loudness hypersensitivity shown behaviourally. After an initial state of diffuse neural hyperactivity and high correlation between cells in the auditory cortex, changes concentrate at the deafferented high-frequency edge by day 3, especially when using mid-frequency tones as sound stimuli. Those neurons can show homeostatic gain control or non-homeostatic excess gain depending on their baseline spontaneous activity, suggesting a specific set of cortical neurons prone to develop hyperactivity following acoustic trauma.

      This study is excellent in its combination of techniques, especially behaviour and chronic calcium imaging. Neural hyperactivity, increased synchrony, and reorganization of the tonotopic maps in the auditory cortex following peripheral insult in the cochlea have been shown in seminal papers by Jos Eggermont and Dexter Irvine, among others, although the intensity level changes are a new addition. More importantly, the authors show data that suggest a close association between loudness hypersensitivity and excess cortical gain after cochlear sensorineural damage, which is the main message of the study.

      The problem is that not all humans with high-frequency sensorineural hearing loss present hyperacusis and/or tinnitus as co-morbidities, in the same manner that not all animal models of sensorineural hearing loss present combined tinnitus and/or hyperacusis. In fact, across different studies on the topic, there is a consensus that about 2/3rds or 70% of animals with hearing loss also develop tinnitus, but not all of them. A similar scenario may happen with hearing loss and hyperacusis. Therefore, we need to ask whether all the animals in this study develop hyperacusis and tinnitus with the hearing loss or not, and if not, what the differences in neural activity are between the cases that presented only hearing loss and the cases that presented hearing loss and hyperacusis and/or tinnitus. It is possible that the proportion of cells showing non-homeostatic excess gain is higher in those cases where tinnitus and hyperacusis were combined with hearing loss.

      We thank the reviewer for her/his careful reading of the original manuscript and many helpful suggestions and critiques that have been addressed in the revision. Both Reviewer 2 and Reviewer 3 understood that we were presenting our high-frequency sensorineural hearing loss manipulation as a way to model the clinical phenomenon of hyperacusis. This was not our intent, and we regret that the wording of the original manuscript communicated this point. In fact, the clinical literature shows that hyperacusis does not have a strong association with hearing loss, and moreover our behavioral and neural outcome measures were not designed to index the core phenotype of hyperacusis (a spectrum of sound-evoked distress, disproportionate scaling of loudness with sound level, and sound-evoked pain). Our study addresses the neural and behavioral signatures of auditory hypersensitivity, which is an “upstream” condition that may (or may not) be related to the presentation of clinical phenomena like hyperacusis and tinnitus.

      The reviewer mentions a litmus test for animal models of tinnitus, in which the utility of an animal model for tinnitus would be evaluated in part based on whether a controlled insult produced a behavioral change suggestive of a chronic phantom percept in only a fraction of animals. That may be so, but our study is clearly not modeling tinnitus and we make no claims to this effect in the original or revised manuscript. The Reviewer then goes on to say that “a similar scenario may happen with hearing loss and hyperacusis”. “May” is the operative word here because the association between sensorineural hearing loss and the clinical presentation of hyperacusis is quite weak overall in human subjects, and no study (that we are aware of) has attempted to document the probabilistic appearance of hyperacusis before and after acoustic trauma. So, we really don’t know whether hyperacusis has a probabilistic appearance like tinnitus or is more deterministic like cochlear threshold shift. But, again, the main point is that our experiments make no direct claim about hyperacusis one way or the other, which we now clarify and discuss throughout the revised text, as detailed below.

      We do contend that our experiments allow us to study auditory hypersensitivity, though again there is no precedent or consensus in the literature for expecting auditory hypersensitivity to present probabilistically or deterministically across mice after a controlled insult. Regardless, we agree with the reviewer that it is a very good idea to provide the individual animal data to the reader. We added new panels to Figure 2C to show that an increase in the 8kHz d’ slope after noise exposure (i.e., a change > 1) was observed in 7/7 mice that underwent acoustic trauma but 1/6 mice in the sham exposure group, suggesting a deterministic, binary behavioral effect found in every mouse with noise-induced high-frequency sensorineural damage. On the other hand, within the acoustic trauma cohort, 3 mice showed marked increases in the d’ growth slope (> 2) while 4 showed more subtle changes, suggesting a more graded or probabilistic effect. By providing the individual animal data as per the Reviewer’s request, the reader can now make a more informed determination about the reliability of auditory hypersensitivity within the acoustic trauma cohort.

      Regarding the relationship between the peripheral/cortical/perceptual auditory hypersensitivity we report here and the clinical conditions of tinnitus and hyperacusis, we revised the text such that the word “hyperacusis” only appears in the context of other publications and have added the following text (Pg. 14, Lns 448-466).

      “While the findings presented here support an association between sensorineural peripheral injury, excess cortical gain, and behavioral hypersensitivity, they should not be interpreted as providing strong evidence for these factors in clinical conditions such as tinnitus or hyperacusis. Our data have nothing to say about tinnitus one way or the other, simply because we never studied a behavior that would indicate phantom sound perception. If anything, one might expect that mice experiencing a chronic phantom sound corresponding in frequency to the region of steeply sloping hearing loss would instead exhibit an increase in false alarms on high-frequency detection blocks after acoustic trauma, but this was not something we observed. Hyperacusis describes a spectrum of aversive auditory qualities including increased perceived loudness of moderate intensity sounds, a decrease in loudness tolerance, discomfort, pain, and even fear of sounds (Pienkowski et al., 2014a). The affective components of hyperacusis are more challenging to index in animals, particularly using head-fixed behaviors, though progress is being made with active avoidance paradigms in freely moving animals (Manohar et al., 2017). Our noise-induced high-frequency sensorineural hearing loss and Go-NoGo operant detection behavior were not designed to model hyperacusis. Hearing loss is not strongly associated with hyperacusis, where many individuals have normal hearing or have a pattern of mild hearing loss that does not correspond to the frequency dependence of their auditory sensitivity (Sheldrake et al., 2015). While the excess central gain and behavioral hypersensitivity we describe here may be related to the sensory component of hyperacusis, this connection is tentative because it was elicited by acoustic trauma and because the detection behavior provides a measure of stimulus salience, but not the perceptual quality of loudness, per se.”

    1. Author Response

      Reviewer #1 (Public Review):

      The article by Solvi and colleagues aims to investigate what type and degree of information (absolute, relative, or a weighted combination of both) is used by bumblebees when retrieving the value of an item. The authors report recent evidence in humans and birds suggesting that they use a combination of absolute memories and memories of subjective ranking, and note an absence of relevant studies in other species, including invertebrates. Thus, the authors conducted four different experiments to study what type of information guides the decisions of bumblebees when facing different qualitative and quantitative comparisons.

      In the first two experiments, the authors reported the use of relative ranking of stimuli instead of a memory of their absolute value. According to the authors, these results are confirmed by experiment three, where bees were presented with two equally-ranked choices which, in fact, were not treated as different by bees. In the last experiment, bumblebees showed a preference for the highest rank item.

      Despite the presentation of well-designed experiments, the conclusion that bumblebees use only memories of ordinal comparisons, thus showing a different strategy from humans and birds, does not seem to be fully supported by the results. The behaviour in the first two experiments, for instance, could be explained by a recency effect, where the higher item of the last comparison is better retrieved (the work of Giurfa on transitive inferences in bees was not mentioned, though it is relevant here). Furthermore, in the last experiment, bumblebees need not have used an ordinal ranking; their choice of the higher-ranking item could be based on its higher absolute quantitative value in terms of sucrose solution.

      We’re sorry for not being clearer in our descriptions in our original submission. In each of the first three experiments, the order of sessions in which the different pairs of sucrose concentrations had been used were counterbalanced. For example, in experiment 1, half of the bees experienced 45 vs 30 first and 30 vs 20 second, and half of the bees experienced 30 vs 20 first and 45 vs 30 second. Further, our GLM results show that the order of training did not affect bees’ preferences. Therefore, a recency effect cannot explain the results of experiments 1 or 2 (or 3). We now highlight this on lines 101 - 105 in the Results and in each of the Experimental descriptions in the Methods, and explain that the GLMs showed no effect of these factors on lines 424 - 426 of Methods.

      With regard to experiment 6 (the last experiment in our original submission), there is no reason bees could not have used ordinal ranking. However, they also could have used absolute memories. We apologise for not making the rationale for and interpretation of experiment 6 clearer. The rationale was to determine how our results spoke to a situation that was more ecologically relevant for a bumblebee. In response to Reviewer #3’s concerns, we have now added new data from an experiment which also helps better explain both our rationale and our interpretation of experiment 6. We discuss this in more detail now in the revised manuscript. In short, bumblebees must use absolute properties, otherwise they would not be able to discriminate or rank any two sequentially visited flowers. However, our results suggest that they only retain (or only utilise memory of) absolute information for a short period of time (a few minutes). Despite this, experiment 6 suggests that in normal foraging situations, bees’ preferences for the highest rewarding flowers will not be affected. This is because, in the wild, bumblebees commonly experience short time intervals (a few minutes) between flowers, which would allow them to compare each flower’s absolute information and encode ranking information. We discuss the new data on lines 204 - 233 and add clarification for experiment 6 on lines 249 - 261.

      The different behaviours and strategies used by bees here could be better explained by differences in the experimental task proposed, rather than supporting a general statement about the evolution of different strategies in comparison to other species.

      We hope that our explanations and clarifications to your above comments and to the other referees’ comments remedy this concern.

      Reviewer #2 (Public Review):

      This manuscript analyzes whether bumblebees choose feeding options based on their absolute or relative remembered subjective value. The experiments relate to previous work done in starlings where comparable questions were raised (1). The design used in the four experiments presented is elegant and provides support for the conclusion that bees guide their choices by the remembered ranking of feeders instead of focusing on their absolute rewards. Bees preferred the options that were ranked higher within each experimental context experienced, irrespective of the absolute reward they provided. As a consequence, they even preferred a sucrose solution of low concentration (15%) to one that was more profitable (30%), simply because the former was experienced together with a poorer alternative (10%) while the latter was experienced together with a more attractive alternative (45%) (Exp. 2). All four experiments provide results that are consistent with the hypothesis that contextual ranking is essential in determining the bees' choices.

      Thank you for the kind and supportive words.

      Three main points require consideration to render this manuscript even more attractive than it already is.

      1) The experiments involved, in all cases, four different colors and different sucrose concentrations (range: 5 - 45 % w/w). An essential requisite of these experiments is that bees should be able to discriminate the options provided, both in terms of color and in terms of reward quality. Asking about ranking or absolute value makes no sense if bees cannot distinguish, say, 15% from 10%, or yellow from orange, and so on. The authors are obviously aware of this point as they mention it explicitly (lines 267-269). Yet, although they mention that they verified this point, the only experimental proof available is provided in Fig. Suppl. 4, where a single comparison (of the many possible) was tested; the discrimination test involved blue and yellow, which were associated in a balanced way with the two highest sucrose concentrations used, 45% and 30%. In terms of color information, this choice involved colors that are easy to distinguish (see their loci in the color hexagon). Yet, what about the other colors? Could they be equally well discriminated? Probably not, because some occupied very close loci in the hexagon. Admittedly, the B vs. C tests involved similar colors (yellow vs. orange) and bees showed significant preferences supporting the presence of color discrimination. Yet, no information is available for yellow and green or the other color combinations assayed. Even more important would be to show that bees rank the different sucrose solutions differently, which is not clear in all cases. Concentrations were chosen following theoretical considerations based on Weber's law (2), but do bees really respond differently to them? Providing an experimental assessment of this question would be important.

      The perceptual distances between colours used in our experiments ranged from 0.141-0.333 hexagon units, which are intermediate to large colour distances that can be trained to high asymptotic levels of discrimination within 50-60 visits [Dyer and Chittka 2004, doi: 10.1007/s00359-003-0475-2]. Our training procedure involved 50 drinking experiences for each colour (100 drinking experiences in total), which is enough for clear discrimination by bees. Further, we adopted a counterbalanced paradigm, and the results of each group are displayed in the presented figures. The individual data points indicate that there was no significant difference between colour combinations, but, more importantly, the GLM statistical analyses show that colour combinations had no effect on bees’ preferences (reported on lines 424 – 426). Finally, we have added data from, and a description of, new experiments using orange and yellow flowers, which show that bumblebees are able to quickly and easily discriminate orange and yellow (the pair with the shortest hexagon distance among the colours used) even when they are not learned side-by-side (new Figure 2). Importantly, the differences within our sucrose concentration pairs were much greater than those bees are capable of discriminating behaviourally (bees can tell differences in sucrose concentration as low as 1.5% [Whitney et al., 2008, doi:10.1007/s00114-008-0393-9]) and neurophysiologically (Miriyala et al., 2018, doi: 10.1016/j.cub.2018.03.070). This is now discussed on lines 348 - 353.

      2) Figures B, D, and F are of fundamental importance to draw conclusions about the strategy used by the bees in the three adjacent experiments. Yet, the kind of representation chosen by the authors does not help to follow their conclusions. Firstly, it is not clear what the data points represent. If, for instance, in Fig. 1B, 40 bees were tested (line 247), how many bees per combination were tested (only one combination is mentioned in line 249)? Moreover, given that bees were tested with B vs. C, and if I guess correctly, there are ca. 10 data points per combination, what do these 10 proportions represent? How were these values computed? I could not find this information in the Methods section. No description of the test methodology is provided for Experiments 1 to 3. Moreover, data points in Figs B, D, F are barely visible and appear clustered around 50% in several cases, thus casting doubt on the reported significance of the comparisons. This needs to be improved by means of visible and clear graphic displays. The same kind of consideration can be applied to Fig. 2B, even if the results are clearer.

      We apologise for the lack of clarity and missing information. We now note in the figure legend that each filled circle represents the proportion of choices for a particular option by an individual bumblebee (10 individuals per group). We have now added more description on the test methodology and analyses for experiments 1-3 on lines 101 – 105, 126-127, 380 - 384. We have also increased the size of the individual data points in each of the figures for better clarity.

      3) A final point relates to results obtained in a different experimental framework, which asks whether animals can rank experienced alternatives in transitive order. A considerable amount of work in experimental psychology has addressed the question of transitive inference in many species (3-12), including bees and wasps (13, 14). In these studies, animals are trained with premise pairs presenting different reinforcement outcomes (e.g. A+ B-/B+ C-/C+ D-/D+ E-) to determine whether they establish relative rankings (A > B > C > D > E) or, on the contrary, use associative learning of absolute reinforcement outcomes (in which case, A > B = C = D > E). To determine the strategy followed by animals, they are tested with a non-overlapping pair never experienced during training (B vs. D). In the first case, animals prefer B to D, while in the second case they choose equally between both options. There might, therefore, be some parallels or contact points between these experiments and the experiments reported in this manuscript. Could the authors discuss these parallels and provide a broader view of absolute vs. relative remembered subjective value?

      Thank you for this suggestion. We believe that tests for transitive inference only resemble our methods superficially. However, we do feel that because of this we should provide a brief explanation as to how and why they are fundamentally different. We have now included a paragraph in the Introduction on lines 73 - 89.

      Reviewer #3 (Public Review):

      The central conclusion of this beautiful experimental study is that bumblebees prefer flowers on the basis of their remembered ranking in their context, but are insensitive to their absolute properties. Thus, let's say that there are 4 flower types, ranked as follows in nectar concentration: A>B>C>D. However, when the bee learns about these flowers, it does so in either of two 'contexts', populated as follows: A & B, or C & D. Thus, the bee experiences that B is the worse option in the context in which it is found, and C is the better one in its own context. If, at a later time, the bee has to make a novel choice, this time between B and C, its memory for ranking leads it to prefer C over B, while its (putative) memory for nectar concentration should favour B over C. The authors find, in a variety of different treatments, evidence for the influence of ranking, but they do not find any evidence for sensitivity to absolute properties (i.e., concentration).

      Thank you for the complimentary sentiments.

      One difficulty that permeates the argument is the ubiquitous difficulty in proving the null hypothesis as true: lack of significant evidence for a putative effect in one or a few experiments, does not mean reliable absence of the effect.

      We appreciate this thought, and we hope that the additional experiments on absolute information usage, together with the original experiments, might collectively form a clearer argument here.

      Another difficulty is that in my view memory for absolute properties was not given a full chance: bees were always trained in situations where both dimensions (concentration and ranking) were present. In such situations, they preferentially used ranking. However, to learn ranking between flower types in sequential encounters, they must remember the absolute properties, so that in each encounter they contrast the present flower with the memory for others. Say the bee encounters a type B flower. How does it store its ranking if it doesn't remember the properties of A at all? To take this objection into account and still maintain the claim, it is necessary to say that it remembers the properties of A when in the A & B context, but it erases it from memory when in the context B & C.

      Neglecting memory for concentration may be an overshadowing effect. Overshadowing is known in learning studies, and it means that, when more than one cue is paired with an outcome, the most salient among them may reduce learning about the predictive power of the others. In this case, bees may remember and use concentration when trained in contexts where there is only one flower type, so that there is no chance of using ranking, and then offered choices between pairs of them. In this case, the bees would not have access to ranking, so that there would be a stronger opportunity for absolute memory to manifest itself.

      Thank you for these suggestions. We apologise for not making our arguments clearer in our first submission. Yes, absolute information must be used at some level, otherwise no ranking, or even discrimination, could take place. Our claim is that while bumblebees must detect and use absolute information to compare flowers, our experiments show they do not retain (or utilise memory of) this information for very long (i.e. for much longer than several minutes). We now make this clearer throughout the manuscript, e.g. on lines 31 - 36 and 204 - 233. We have also added new experiments which show that when bees were trained with each of two flowers alone and then tested together, absolute information can be used if visits to the different flowers are separated by only a few minutes, but not if separated by an hour. This suggests that absolute information is retained and used by bumblebees as short-, but not mid-term memories (Menzel 2001), in order to make comparisons of options and rank them. Please see lines 385 - 401 for a detailed description of these experiments, as well as Figure 2.

      In experiment 4, during training, they could move between two zones representing the 'contexts', each with 2 flower types, and they were then given choices between the 4 types, rather than just binary choices as previously. In this case, the bees did prefer the top-quality flower type (type A), which is consistent with memory for absolute concentration and with ranking, because A offered the highest concentration of the 4-type context. Why this happened is not clear, but it indicates that the context of choice may be crucial. It is known from other studies that the number of options at the time of choice can be very influential. For instance, in one study, it was shown that starlings appeared to be risk prone when offered a binary choice and risk averse when offered a trinary choice, even if the choices were all intermingled in the same sessions. In any case, this experiment raises doubts as to the claimed insensitivity to memory for nectar concentration. Another possibility is that the separation between contexts in this experiment (a partially avoidable wall) was not extreme as in the previous ones, so that the bees could now establish a ranking among the 4 types because they were all encountered intermingled to an extent.

      Again, we apologise for the lack of clarity in our original argument. We have made clear in the manuscript that bumblebees are not insensitive to absolute metrics, and indeed require them to distinguish between sequentially visited flowers. We have also added new data and descriptions of experiments which we believe help set the stage for, and help interpret the results of, experiment 6 (previously experiment 4). The newly added experiments (experiments 4 and 5) show that when bees learn flowers in isolation, and therefore have no ranking information available, they still do not retain or utilise memory of absolute information in a new context, unless the temporal separation between flower experiences is short (a few minutes). The results of experiment 6 (previously experiment 4) essentially show that in a more ecologically realistic scenario (what bees normally experience in the wild), the time between flower visits is short enough that absolute information can be compared and used to rank flowers. We now explain this better on lines 204 – 233 and 249 - 260.

      There is one potential mechanism that may also be discussed. It is known from other species, that state at the time of learning influences subjective value of alternatives. To explain this effect I will exemplify the problem with a non-eusocial consumer. Say that food sources B and C are of equal caloric value. Say, further, that B is encountered when the subject is less food deprived than when it encounters C. Then the hedonic (conditioning) power of B will be lower, because it causes a smaller improvement in fitness (this was Daniel Bernoulli's argument regarding the concept of utility). In animal studies this effect is called State-Dependent Valuation Learning (SDVL). Since in the present experiments the context A & B was richer than the context C & D, the bees would have been in a consequently more favourable state (maybe carrying bigger sugar loads), so that each encounter with B would cause a smaller improvement than each encounter with C. This effect is totally different from remembering the ranking of flower types. The two alternative explanations for preference of C over B (ranking and SDVL) can, fortunately, be confronted because it is possible to change the state of the bees by a common 3rd source that could be used to equate or manipulate the average richness of the contexts.

      Thanks for this suggestion. Although we believe SDVL cannot account for bumblebees’ behaviours as well as ordinal ranking, we agree that it would be valuable to discuss SDVL with reference to some of the literature on the subject. We have now added description of SDVL to Table 1 on lines 156 – 162 and added a paragraph in the Discussion on lines 283 – 290.

      All the reasons mentioned above should make it clear that this reviewer finds the study of very great interest and much merit, but considers that the conclusion for exclusive impact of ranking on preference should be tempered, or at least defended more strongly against these doubts.

      Thanks again for the valuable and positive critique.

    1. Author Response

      Reviewer #3: (Public Review):

      In this ms Li et al. examine the molecular interaction of Rabphilin 3A with the SNARE complex protein SNAP25 and its potential impact in SNARE complex assembly and dense core vesicle fusion.

      Overall, the literature on rabphilin as a major Rab3/Rab27 effector in synaptic function has been quite enigmatic. After its cloning and initial biochemical analysis, relatively little new has been learned about rabphilin, in particular since loss-of-function analyses have shown only mild synaptic phenotypes (Schluter 1999, Deak 2006), arguing against a crucial role for rabphilin in synaptic function.

      While the interaction of rabphilin with SNAP25 via the bottom part of its C2 domain has already been described biochemically and structurally by Deak et al. 2006 and others, the authors make significant efforts to further map the interactions between SNAP25 and rabphilin, and indeed identify additional binding motifs in the first 10 amino acids of SNAP25 that appear critical for the rabphilin interaction.

      Using KD-rescue experiments for SNAP25, TIRF-based imaging of labeled dense core vesicles showed that the N-terminus of SN25 is absolutely essential for vesicle proximity to the membrane and for release. Similar, somewhat weaker phenotypes were observed when binding-deficient rabphilin mutants were overexpressed in PC12 cells coexpressing WT rabphilin. The loss-of-function phenotypes of the SN25 and rabphilin interaction mutants led the authors to claim that rabphilin-SN25 interactions are critical for docking and exocytosis. The role of these interaction sites was subsequently tested in SNARE assembly assays, which largely supported rabphilin accelerating SNARE assembly in an SN25 N-terminus-dependent way.

      Regarding the impact of this work, the transition of synaptic vesicles to form fusion-competent trans-SNARE complexes is central to our understanding of regulated vesicle exocytosis, and the authors put forward an attractive model in which rabphilin aids in catalyzing SNARE complex assembly by controlling the α-helicity of the SNAP25 SNARE motif. This would provide a regulatory mechanism similar to those put forward for the other two SNARE proteins via their interactions with Munc18 and intersection, respectively.

      We thank reviewer #3 for the summary of the paper and for the praise of our work. Our point-by-point replies are as follows:

      While the discovery of the novel interaction site of rabphilin with the N-terminus of SNAP25 is interesting, I have issues with the functional experiments. The key question for the paper is whether it provides convincing data on the functional role of the interactions, given the history of loss-of-function phenotypes for rabphilin. First, the authors use PC12 cells and dense core vesicle docking and fusion assays. Primary neurons, in which rabphilin function has been tested before, have unfortunately not been used, reducing the impact of the docking and fusion phenotype.

      We have discussed these questions as mentioned in our response to Essential Revisions 3 and added this corresponding passage to the Discussion section (pp.18-19, lines 407-427).

      In particular, the loss-of-function phenotype of the N-terminally deleted SNAP25 in docking and fusion (figure 3) is profound, and similar in magnitude to the complete loss of the SNARE protein itself. This is of concern, as it is in stark contrast to mammalian neurons, where the phenotype of SNAP25 loss is very severe while rabphilin loss has almost no effect on secretion. This would argue that the N-terminus of SNAP25 has other critical functions besides interacting with rabphilin. In addition, it could mean that the N-terminal SNAP25 deletion mutant may be made in the cell (as indicated by the western blot) but may not be properly trafficked to the site of release.

      To test whether the N-peptide deletion mutant of SN25 properly targets the plasma membrane, we overexpressed SN25 FL or SN25 (11–206) with a C-terminal EGFP tag in PC12 cells and monitored the localization of SN25 FL-EGFP and SN25 (11–206)-EGFP near the plasma membrane by TIRF microscopy. The average fluorescence intensity of SN25 (11–206)-EGFP was not significantly different from that of SN25 FL-EGFP (see below), suggesting that deletion of the N-peptide does not affect the trafficking of SN25 to the plasma membrane.

      (A) TIRF imaging assay to monitor the localization of SN25-EGFP near the plasma membrane. Overexpression of SN25 FL-EGFP (left) and SN25 (11–206)-EGFP (right) using pEGFP-N3 vector in PC12 cells. Scale bars, 10 μm. (B) Quantification of the average fluorescence intensity of SN25-EGFP near the plasma membrane in (A). Data are presented as mean ± SEM (n ≥ 10 cells in each). Statistical significance and P values were determined by Student’s t-test. ns, not significant.

    1. Author Response

      Reviewer #1 (Public Review):

      This work focuses on the mechanisms that underlie a previous observation by the authors that the type VI secretion system (T6SS) of a Pseudomonas chlororaphis (Pchl) strain can induce sporulation in Bacillus subtilis (Bsub). The authors bioinformatically characterize the T6SS system in Pchl and identify all the core components of the T6SS, as well as 8 putative effectors and their domain structures. They then show that the Pchl T6SS, and in particular its effector Tse1, is necessary to induce sporulation in Bsub. They demonstrate that Tse1 has peptidoglycan hydrolase activity and causes cell wall and cell membrane defects in Bsub. Finally, the authors also study the signaling pathway in Bsub that leads to the induction of sporulation, and their data suggest that cell wall damage may lead to the degradation of the anti-sigma factor RsiW, leading to activation of the extracytoplasmic function (ECF) sigma factor σW that causes increased levels of ppGpp. Sensing of high ppGpp levels by the kinases KinA and KinB may lead to phosphorylation of Spo0F, and induction of the sporulation cascade.

      The findings add to the field's understanding of how competitive bacterial interactions work mechanistically and provide a detailed example of how bacteria may antagonize their neighbors, how this antagonism may be sensed, and the resulting defensive measures initiated.

      While several of the conclusions of this paper are supported by the data, additional controls would bolster some aspects of the data, and some of the final interpretations are not substantiated by the current data.

      • The proposed Bsub signaling pathway, shown in Fig 5A, is intricate and extensive. However, the data supporting it are very sparse:

      a) The authors show no data showing that the proteases PrsW and/or RasP, or the ECF sigma factor σW, are necessary, or that the cleavage of RsiW is needed, for induction of sporulation; this could presumably be tested using mutants of those genes.

      It has been previously demonstrated that the proteases PrsW and/or RasP cleave RsiW under certain conditions such as alkaline shock (Heinrich et al., 2009). First, PrsW cleaves RsiW, and the resulting cleaved RsiW serves as a substrate for RasP. In the previous version of the manuscript, we demonstrated that treatment with Tse1 causes damage to PG and delocalization of RsiW; however, as the reviewer notes, we did not show the participation of either of these proteases in the proposed signaling pathway. We have now generated single mutants in rsiW and prsW and treated them with Tse1. We observed no change in the levels of sporulation compared to untreated strains (Figure 1), a finding consistent with their proposed involvement in the sporulation signaling pathway activated by Tse1. Positive controls, that is, the single mutants grown at 37ºC, were still able to sporulate. These data have been added to Figure 6B in the new version of the manuscript.

      As suggested by other reviewers, we have generated a sister plot of this figure showing the raw CFUs in each case. These data are included in Supplementary file 3. This experiment and the related figure have been incorporated into the new version of the manuscript.

      Figure 1. A) Quantification of the percentage of sporulated Bsub, rsiW and prsW cells after treatment with purified Tse1 showing that rsiW and prsW single mutants are blind to the presence of Tse1. B) Cell density (CFUs/mL) of total (blue bars) and sporulated population (brown bars) of different Bacillus strains (Bsub, ∆rsiW and ∆prsW) untreated and treated with Tse1. Sporulation at 37ºC is shown as positive control in each strain. Statistical significance was assessed via t-tests. *p value < 0.1, **p value < 0.001, ***p value < 0.0001.

      b) Similarly, they don't demonstrate that the levels of ppGpp increase in the cell upon exposure to Pchl.

      We have not been able to measure ppGpp levels. However, given that the levels of different nucleotides are altered in the same proposed sporulation cascade (Kriel et al., 2013, Tojo et al., 2013, López and Kolter, 2010), we have instead analyzed ATP levels using an ATP Determination Kit (Thermo, A22066). ATP levels increased 3-fold in Bsub cells treated with Tse1 compared to untreated control cells. Consistently, no increase in ATP levels was observed in rsiW or prsW mutants treated with Tse1. We have included all the raw luminescence data obtained for each sample and treatment in Figure 6-source data 1. This experiment, the corresponding figure (Figure 6A in the new version of the manuscript) and its description in “Materials and Methods” have been added to the new version of the manuscript.

      c) There is some data showing that kinA and kinB mutants don't induce sporulation (Fig supplement 7A), but that is lacking the 'no attacker' control that would demonstrate an induction.

      We have included the sporulation percentage of the ‘no attacker’ control in the new version of the manuscript. The figure shows that the presence of Pchl strains induces sporulation in all kinase mutants. These new data have been incorporated in Figure 6-figure supplement 1A in the new version of the manuscript.

      d) There are some data showing that RsiW may be cleaved (Fig 5C, D), but these data would benefit from a positive control showing that the lack of YFP foci is seen in a condition where RsiW is known to be cleaved, as well as from a time-course showing that the foci are present prior to the addition of Tse1, and then disappear. As it is shown now, it is possible that the addition of Tse1 just blocks the production of RsiW or its insertion into the membrane (especially given the membrane damage seen). Further, there are no data showing that the disappearance of the YFP foci requires the proteases PrsW and/or RasP; such data would also support the idea that the disappearance is due to cleavage of RsiW.

      Thank you for this useful suggestion. It is important to note that in our transcriptomic analysis we did not see repression of the genes encoding either of the two proteases in cells treated with Tse1. However, we agree that additional experiments would strengthen our findings. We have repeated the whole experiment, including a positive control to demonstrate that YFP foci disappear in a condition in which RsiW is known to be degraded by PrsW and RasP. Bacillus cells were incubated in medium at pH 10, which provokes an alkaline shock that triggers RsiW cleavage (Asai, 2017; Heinrich et al., 2009). As shown in Figure 6D, under this condition we also observed disappearance of YFP foci. We have also provided extra images with quantification of the average signal from YFP foci in Figure 6-figure supplement 2.

      • The entire manuscript suggests that T6SS is solely responsible for the induction of sporulation. While T6SS does appear to play a major part in explaining the sporulation induction seen, in the absence of 'no attacker' controls for Fig. 2A, it is impossible to see this. From the data shown in Fig. 2C, and figure supplement 2A, the 'no attacker' sporulation rate seems to be ~20%, while the rate is ~40% with Pchl strains lacking T6SS, suggesting that an additional factor may be playing a role.

      This must be a misunderstanding of the message of this manuscript. The conceptual foundation of this study was established in our previous manuscript (Molina-Santiago et al., 2019). We demonstrated that B. subtilis sporulated in the presence of P. chlororaphis. Interestingly, the overgrowth of P. chlororaphis over the B. subtilis colony did not eliminate B. subtilis cells, given that most of them had sporulated. The data we obtained strongly suggested that a functional T6SS was involved in the cellular response of Bacillus during close cell-to-cell contact. In this new manuscript, we have explored this idea and found that, indeed, the T6SS of P. chlororaphis mobilizes at least one effector, Tse1, which is able to trigger sporulation in Bacillus. We did not conclude, either previously or in this new study, that the T6SS is the only factor expressed by P. chlororaphis responsible for activating sporulation in Bacillus. We have accordingly rephrased some sentences of the manuscript to clarify the proposed role of the T6SS in B. subtilis sporulation.

      In addition, as mentioned above, we have included data of sporulation percentages in the absence of an attacker to better compare the induction of sporulation observed in the presence of the different Pchl strains and in the presence of Tse1.

      Reviewer #2 (Public Review):

      In a previous study, the authors showed that cell-cell contact with Pseudomonas chlororaphis induces sporulation in Bacillus subtilis. Here, the authors build on this finding and elucidate the mechanism behind this observation. They describe the enzymatic activity of a protein (Tse1) secreted by the type VI secretion system (T6SS) of P. chlororaphis (Pch), which partially degrades the peptidoglycan (PG) of targeted B. subtilis cells and triggers a signal cascade culminating in sporulation.

      Most of the key conclusions of this paper (Tse1 being secreted by the T6SS and inducing sporulation in targeted cells) are well supported by the data. One conclusion (sporulation response being an anti-T6SS "defense" strategy) is not well supported by the data and should be removed or rephrased.

      The authors elucidate the enzymatic activity of Tse1, a T6SS effector protein, in a genus (Pseudomonas) of great interest to microbiologists, and to researchers studying the T6SS specifically. They also carefully dissect the cellular response (signal cascade and sporulation) of an important model organism (B. subtilis; Bsub) specifically to exposure to Tse1. The results describing this cellular response contribute substantially to our understanding of how T6SS effector proteins interact with cells of Gram-positive species.

      My only major concerns regard the interpretation of these results as sporulation being an adaptive and/or specific response to attacks by the T6SS. I outline my reasoning below.

      • Interpretation of sporulation as a "defense" mechanism/strategy against the T6SS. For a phenotype X to be regarded as a "defense against Y" mechanism, it has to be shown that phenotype X (sporulation in response to Tse1) evolved, at least in part, to increase survival in the presence of Y (T6SS attacker). There are no experiments in this study comparing, e.g., a sporulating Bsub with a non-sporulating Bsub that would allow testing whether sporulation increases survival. The experiments carefully describe the cellular response to Tse1, but no inference can be made as to whether this response is adaptive for Bsub, or whether it helps the cells survive T6SS attacks. A more parsimonious explanation would be that Tse1 happens to target the PG and causes envelope stress, triggering sporulation. So, it would be a general stress response that also happens to be triggered by the T6SS. Now, some general (cell envelope) stress responses are known to be very effective at protecting against the T6SS. But in those instances, a beneficial effect for survival in the face of T6SS attacks has been shown in dedicated experiments. Purely observing a response to a T6SS effector, as this study does (very well), is not evidence that the response has evolved for the purpose of surviving T6SS attacks. Tucked away in the supplement (and briefly mentioned in the main text) are data on Bsub and Bacillus cereus, showing that i) cell densities of the sporulating Bsub and a sporulating B. cereus strain are not affected by an active T6SS, and ii) cell densities of an asporogenic B. cereus are slightly reduced by an active T6SS. However, the effect sizes of density reduction by the T6SS in the asporogenic B. cereus are minute (20 × 10^6 vs. ~50 × 10^6). In typical killing assays against, e.g., Gram-negative strains, a typical effect size for T6SS killing would be a reduction of several orders of magnitude in survival of the target strain when exposed to a T6SS attacker. Based on this dataset alone (Figure Suppl. 8), I would say that none of the three Bacillus strains is experiencing any "fitness-relevant" killing by the T6SS, which is in line with the T6SS often being useless against Gram-positives when it comes to killing. Hence, no claims about fitness benefits of sporulation in response to a T6SS attack, or about this being a "defense mechanism/strategy", should be made in the manuscript.

      Thanks for these interesting introductory and specific comments. We agree with the reviewer and have rephrased some sentences of the manuscript. Sporulation is not an adaptive or specific response of Bacillus to the T6SS; indeed, as stated by reviewer 2, sporulation is a general stress response. The way the manuscript was written may, at some points, have given the wrong impression, and we have therefore rephrased some sentences. Nevertheless, we made a mistake when generating Figure supplement 8 (Figure 6-figure supplement 3 in the new version of the manuscript). We have repeated this experiment and generated a new, corrected chart that shows a three-order-of-magnitude reduction in survival of the asporogenic B. cereus strain in competition with the Pchl WT strain compared to Pchl mutant strains. These new findings show that the absence of sporulation ability leads to a severe reduction in survival of the Bacillus cereus DSM 2302 population in competition with Pchl with an active T6SS, compared to its survival in competition with the Pchl hcp mutant. This figure also shows that the Bacillus population decreased in competition with the tse1 mutant, demonstrating that Tse1 is responsible for killing Bacillus. However, there is a statistically significant difference in the survival of Bacillus competing with hcp or tse1 mutants. The increased survival of Bacillus in the interaction with the tse1 strain compared to the Bacillus-hcp competition is suggestive of the ability of this strain to deliver additional T6SS-dependent toxins. This observation is in accordance with the data presented in Fig. 2B, which indicated that the tse1 mutant has an active T6SS able to kill E. coli.

      • Data supporting baseline "no competitor" sporulation rates being no different from those triggered by T6SS mutants are not convincing. For the data shown in Fig. 2A, a key comparison would be to show baseline Bsub sporulation rates in the absence of a competitor. This measurement is shown in Fig supplement 2A, and the value shown there (roughly 22% on average) appears to be much lower than the average T6SS mutant shown in Fig. 2A. The main text states that sporulation rates induced by the different T6SS mutants are "statistically" similar to the no-competitor baseline (L206/207). I am not convinced by this, since i) overall sporulation rates (incl. of WT Pch) appear to have been lower in the experiment shown in supplement 2A, so a direct comparison between the no-competitor baseline and the data shown in Fig. 2A is not possible; and ii) hcp and tse1 mutants were tested in different experiments throughout the study, and sporulation rates appear to consistently hover around 30-40%, which is higher than the roughly 22% for "no competitor" depicted in Supplement Fig2A. I am focussing on this because, for the interpretation of the results and the main narrative of the paper, knowing whether "simply interacting with a T6SS-negative P. chlororaphis" induces some sporulation would make a big difference. One sentence in the discussion adds to my confusion about this: L464/465, "... a strain lacking paar (Δpaar) had an active T6SS that triggered sporulation comparably to Δhcp, ΔtssA, and Δtse1 strains", suggesting that the authors claim that even strains lacking an active T6SS trigger increased sporulation (which I would agree with, based on the data).

      We understand the reviewer's point that a direct comparison between the two figures is not valid because baseline sporulation rates fluctuate between experiments. To solve this issue, we have added the baseline "no competitor" sporulation percentages to the experiments represented in Figure 2B in the new version of the manuscript.

      Regarding the sporulation provoked by a T6SS-negative P. chlororaphis, the reviewer is right. Bacillus sporulation is triggered by many external factors (abiotic and biotic stresses), so the presence of P. chlororaphis in the competition already affects the sporulation percentage of B. subtilis. Accordingly, we have removed the statement that the sporulation rates induced by the different T6SS mutants are "statistically" similar to the no-competitor baseline. However, our previous data (Molina-Santiago, Nat Comm 2019) and current findings convincingly demonstrate the relevance of the T6SS, and specifically the Tse1 toxin, in the induction of sporulation, at least during close cell-to-cell contact.

      • Claim regarding "bacteriolytic activity" when tse1 is heterologously expressed in E. coli. The data supporting this claim (Fig2-supplement 2C) only shows a lower net population growth rate after induction of tse1 (truncated vs. non-truncated) expression. This could be caused by: slower growth (but no death), equal growth (with some death), or a combination of the two. The claim of "bacteriolytic" activity in E. coli is therefore not supported by this dataset.

      We agree with the reviewer and we have decided to remove this figure and the experiment of “bacteriolytic activity” given that it does not contribute conceptually to the message of the manuscript.

      I cannot comment in more detail on the validity of the biochemistry/enzymatic activity assays as these are not my area of expertise.

      Reviewer #3 (Public Review):

      The authors identify tse1, a gene located in the type 6 secretion system (T6SS) locus of the bacterium Pseudomonas chlororaphis, as necessary and sufficient for induction of Bacillus subtilis sporulation. The authors demonstrate that Tse1 is a hydrolase that targets peptidoglycan in the bacterial cell wall, triggering activation of the regulatory sigma factor sigma-w. The sporulation-inducing effects of sigma-w are dependent on the downstream presence of the sensor histidine kinases KinA and KinB. Overall, this is a well-structured paper that uses a combination of methods including bacterial genetics, HPLC, microscopy, and immunohistochemistry to elucidate the mechanism of action of Tse1 against B. subtilis peptidoglycan. There are some concerns regarding a few experimental controls that were not included/discussed and (in a few figures) the visual representation of the data could be improved. The structure of the manuscript and experiments is such that key questions are addressed in a logical flow that demonstrates the mechanisms described by the authors.

      To begin, we have concerns regarding the sporulation assays and their results. The data should be presented as "Percent sporulation" or "Sporulation (%)", not as a "sporulation rate": there is no kinetic element to any of these measurements, so no rate is being measured (be careful of this in the text as well, for instance near line 204). More importantly, there is no data provided to indicate that changes in percent spores are not instead just the death of non-sporulated cells. For example, imagine that within a population of B. subtilis cells, 85% of the cells are vegetative and 15% are spores. If, upon exposure to tse1, a large proportion of the vegetative cells are killed (say, 80% of them), this could lead to an apparent increase in sporulation: from 15% for the untreated population to ~50% of the treated, but the difference would be entirely due to a change in the vegetative population, not due to a change in sporulation. The authors need to clearly describe how they conducted their sporulation assays (currently there is no information about this in the methods) as well as provide the raw data of the counts of vegetative cells for their assays to eliminate this concern.
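      The reviewer's hypothetical can be made concrete with a few lines of Python. This is a minimal sketch of the arithmetic in the example above only (85 vegetative cells, 15 spores, 80% of vegetative cells killed, no new spores formed); the function name is illustrative and the numbers are not data from the study.

```python
def percent_spores(vegetative: float, spores: float) -> float:
    """Spores as a percentage of all viable cells (vegetative + spores)."""
    return 100.0 * spores / (vegetative + spores)


# Reviewer's hypothetical untreated population: 85 vegetative cells, 15 spores.
untreated = percent_spores(85, 15)

# Treatment kills 80% of vegetative cells; the number of spores is unchanged.
treated = percent_spores(85 * (1 - 0.80), 15)

print(f"untreated: {untreated:.1f}%  treated: {treated:.1f}%")
```

      The apparent sporulation rises from 15% to about 47% although not a single new spore has formed, which is why absolute counts of both populations are needed alongside the percentage.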

      Thanks for the suggestion. We have replaced all titles and data presented as “sporulation rate” with “sporulation (%)” or “sporulation percentage”. As also suggested by reviewer 2, we have included the raw CFU counts of the total population and of sporulated cells to show that there is no substantial change in the rate of death. We have also added a section to Materials and Methods specifying how sporulation assays were done. Quote text:

      “Sporulation assays

      Spots of bacteria were resuspended in 1 mL sterile distilled water. Then, serial dilutions were made and cultured in LB solid media for vegetative cells CFU counts. The same serial dilutions were further heated at 80ºC for 10 minutes to kill vegetative cells and immediately cultured again in LB solid media. Plates were grown overnight at 28 ºC and the resulting colonies were counted to calculate the percentage of Bsub sporulation (%). A list of raw CFUs (total and spore population) from all figures with sporulation percentage is shown in Supplementary file 3.”
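      The calculation described in this Methods text can be sketched as follows. The plated volume (0.1 mL), the colony counts and the dilutions below are hypothetical, since the Methods do not state them; colonies surviving heat treatment (80ºC, 10 min) are counted as spores and colonies from unheated dilutions as the total population.

```python
def cfu_per_ml(colonies: int, dilution_exponent: int, plated_volume_ml: float = 0.1) -> float:
    """CFU per mL of the 1 mL resuspension.

    colonies: colonies counted on the plate of the 10^-k dilution (k = dilution_exponent).
    plated_volume_ml: volume spread on the plate; 0.1 mL is an assumed value.
    """
    return colonies * 10 ** dilution_exponent / plated_volume_ml


def sporulation_percent(total_cfu_ml: float, spore_cfu_ml: float) -> float:
    """Heat-resistant CFUs as a percentage of total CFUs."""
    return 100.0 * spore_cfu_ml / total_cfu_ml


# Hypothetical plate counts, not data from the study.
total = cfu_per_ml(colonies=45, dilution_exponent=6)   # unheated dilution series
spores = cfu_per_ml(colonies=90, dilution_exponent=5)  # same dilutions after heating
print(f"total {total:.2e} CFU/mL, spores {spores:.2e} CFU/mL, "
      f"sporulation {sporulation_percent(total, spores):.1f}%")
```

      Keeping `total` and `spores` alongside the percentage, as in Supplementary file 3, is what lets a reader distinguish an increase in spores from a loss of vegetative cells.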

      A related concern is regarding the analysis of the kinases and the effects of their deletions on the impact of Tse1. Previous literature shows that the basal levels of sporulation in a B. subtilis kinA or a kinB mutant are severely defective relative to a wild-type strain; these mutants sporulate poorly on their own. Therefore, the data presented on Lines 394+ and the associated Supplemental Figure regarding the sporulation defects of these two mutants are not compelling for showing that these kinases are required for this effector to act. It is likely that simply missing these kinases would severely impair the ability of these strains to sporulate at all, irrespective of the presence of Tse1, and this confounding factor is not discussed.

      Previous literature shows that mutation of kinases affects sporulation of B. subtilis. The histidine kinases KinA and KinB are the primary kinases that initiate the sporulation cascade by phosphorylating Spo0F. However, as shown in Figure 6-figure supplement 1A, single mutants in these kinases (ΔkinA, ΔkinB) still sporulate, given that the phosphorylation cascade is controlled by numerous intermediaries and other histidine kinases that form a multicomponent phosphorelay (KinA-E). In this context, sporulation of B. subtilis can also be triggered by KinC or KinD in the absence of KinA or KinB, as KinC/KinD can act directly on the master regulator of sporulation Spo0A (Burbulys et al., 1991; Wang et al., 2017).

      In addition, as suggested by reviewer 1, we have added to Figure 6-figure supplement 1A of the new version of the manuscript, the sporulation percentage 'no competitor' control of each kinase mutant and B. subtilis WT. The results show that, as commented by the reviewer and also supported by literature, these mutants sporulate poorly on their own in the absence of an attacker (none). However, as shown in the figure, all kinase mutants increase the sporulation percentage in the presence of a competitor.

      Another concern is regarding the statistical tests used in Figure 2. For the statistical tests in A, B, and D, it should be stated whether a post-test was used to correct for multiple comparisons and, if so, which one. For C, to provide a stronger control comparison, we suggest the inclusion of a mock control in addition to the two conditions already included (i.e., an extraction from an E. coli strain expressing the empty vector).

      We have clarified the statistical tests used in Figure 2. Briefly, we have used one-way ANOVA followed by the Dunnett test in Figure 2A, B and D for the statistical analysis of the sporulation percentage of Bsub in competition with Pchl as control group. In relation to Figure 2C, it is not possible to add a mock control with a strain carrying the empty vector, because this is a suicide plasmid (pDEST17) unable to replicate in E. coli without chromosome integration.

      An additional concern regarding controls is that there is an absence of loading controls for the immunoblot assays. In Figure 5D and all immunoblot assays, there is no mention of a loading control, which is a critical control that should be included.

      In the previous version of the manuscript, we already included a loading control for Figure 5D in Figure supplement 7B, both for cell and for supernatant fractions. In the new version of the manuscript, the loading control of Figure 6E (Figure 5D in the previous version of the manuscript) is shown in Figure 6-figure supplement 2C. We have also included the original unedited gels and blots (Figure 6-figure supplement 2-source data 1 and Figure 6-figure supplement 2-source data 2).

      Some of the visualizations could be improved to help the reader understand and appropriately interpret the data presented. For instance, in Figures 3 and 4 the scale bars are different across each of the Figure's imaging panels. These should be scaled consistently for better comparison. Additionally, the red false colorization makes the printed images difficult to see. Black-and-white would be easier to see and would not subtract from the images.

      The reviewer is right. The scale bars in Figure 3A all equal 2, but they were not drawn at the same length. We have edited the images to have the same magnification for better comparison.

      In relation to Figure 4, we have changed the magnifications and now all the figures have the same scale bars and magnifications. In addition, we have added more images of broader fields in Figure 4-figure supplement 1 which were used to measure the percentage of permeabilized cells and to obtain the fluorescence intensity measures shown in Figure 4.

      An additional weakness of the paper is that the RNA-seq data is not fully investigated, and there is an absence of methods regarding the RNA-seq differential abundance analysis (it is mentioned on L379-380 but no information is provided in the methods). As stated by the authors, 58% of differentially regulated genes belonged to the σW regulon, but the other 42% of genes are not discussed, and will hopefully be a target of future investigations.

      The methods section has been modified for a better explanation of the RNA-seq differential abundance analysis. Quote text: “The raw reads were pre-processed with SeqTrimNext (Falgueras et al., 2010) using the specific NGS technology configuration parameters. This pre-processing removes low-quality, ambiguous and low-complexity stretches, linkers, adapters, vector fragments, and contaminated sequences while keeping the longest informative parts of the reads. SeqTrimNext also discarded sequences below 25 bp. Subsequently, clean reads were aligned and annotated using the Bsub reference genome with Bowtie2 (Langmead and Salzberg, 2012) in BAM files, which were then sorted and indexed using SAMtools v1.484(Li et al., 2009). Uniquely localized reads were used to calculate the read number value for each gene via Sam2counts (https://github.com/vsbuffalo/sam2counts). Differentially expressed genes (DEGs) were analyzed via DEgenes Hunter, which provides a combined p value calculated (based on Fisher’s method) using the nominal p values provided by edgeR (Robinson et al., 2010) and DEseq2. This combined p value was adjusted using the Benjamini-Hochberg (BH) procedure (false discovery rate approach) and used to rank all the obtained DEGs. For each gene, combined p value < 0.05 and log2-fold change > 1 or < −1 were considered as the significance threshold”
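      The combination and adjustment steps in this description can be sketched in plain Python. This is not the DEgenes Hunter code; it only illustrates Fisher's method (under the null hypothesis the statistic −2 Σ ln p over k p values follows a χ² distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom) and the Benjamini-Hochberg step-up adjustment. The gene names and values are hypothetical, p values are assumed to be strictly positive, and applying the 0.05 cut-off to the adjusted value is an assumption, because the quoted text does not specify it.

```python
import math
from typing import Sequence


def fisher_combined(pvalues: Sequence[float]) -> float:
    """Fisher's method. With t = -sum(ln p) (half the chi-square statistic),
    the chi-square(2k) survival function is exp(-t) * sum_{j<k} t^j / j!."""
    k = len(pvalues)
    t = -sum(math.log(p) for p in pvalues)
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(k))


def benjamini_hochberg(pvalues: Sequence[float]) -> list[float]:
    """BH-adjusted p values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene inputs: (edgeR p, DESeq2 p, log2 fold change).
genes = {
    "geneA": (0.001, 0.004, 2.3),
    "geneB": (0.20, 0.35, 1.8),
    "geneC": (0.002, 0.001, 0.4),
}
names = list(genes)
combined = [fisher_combined(genes[g][:2]) for g in names]
adjusted = benjamini_hochberg(combined)
significant = [g for g, q in zip(names, adjusted)
               if q < 0.05 and abs(genes[g][2]) > 1]
print(significant)
```

      Here geneC passes the p value threshold but not the fold-change threshold, so only geneA is retained; both criteria have to hold, as in the quoted Methods.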

      Regarding the RNA-seq analysis, we are aware of the amount of information that can be extracted. Prior to filtering the information shown in the manuscript, we performed bioinformatic analyses to find a connection with the cellular response, that is, the increase in sporulation. Beyond this, we made some observations with no direct connection to sporulation; these would be interesting to pursue in future studies but would detract from the clarity of this story (Figure 23 below). In any case, we include here the whole picture of the transcriptomic changes occurring in Bsub after treatment with Tse1. KEGG pathway analyses of differentially expressed genes showed induction of flagellar assembly, aminobenzoate degradation, and nitrogen and amino acid metabolism. Interestingly, the fatty acid degradation and CAMP resistance pathways were also induced, probably related to changes in the cell wall after the action of the Tse1 toxin. On the other hand, the synthesis and degradation of ketone bodies pathway was mostly repressed.

      Figure 2. KEGG pathway analysis of differentially expressed genes in Bsub after treatment with Tse1.

      Another methodological concern in this paper is the limited details provided for the calculation of the permeabilization rate (Figure 4, L359, L662-664). It is not clear how, or if, cell density was controlled for in these experiments.

      We agree with the reviewer and have explained in more detail how the permeabilization rate was calculated. Quote text: “N=3 for Bsub treated with Tse1 and N=3 for untreated Bsub. N refers to the number of CLSM fields analyzed to calculate the number of permeabilized cells out of the total number of cells in the field.”
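      A minimal sketch of the per-field calculation described in the quote, assuming each CLSM field yields a count of permeabilized cells and a count of total cells, and that the N=3 field percentages are summarized as mean and sample standard deviation (the summary statistic is an assumption; the cell counts below are hypothetical). Because each field is expressed as a fraction of its own total, fields with different cell densities end up on a common scale.

```python
from statistics import mean, stdev

def permeabilization_rates(fields):
    """fields: one (permeabilized_cells, total_cells) pair per CLSM field.
    Returns the percentage of permeabilized cells in each field."""
    return [100.0 * permeabilized / total for permeabilized, total in fields]

# Hypothetical counts for N = 3 fields (not data from the study).
treated_fields = [(30, 120), (45, 150), (20, 100)]
rates = permeabilization_rates(treated_fields)
print(rates)                                            # [25.0, 30.0, 20.0]
print(f"{mean(rates):.1f} +/- {stdev(rates):.1f} %")    # 25.0 +/- 5.0 %
```

      An alternative is to pool all cells across fields before dividing; averaging per-field percentages instead gives each field equal weight regardless of how crowded it is.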

      Finally, one weakness of the paper is the broad conclusions that they draw. The authors claim that the mechanism of sporulation activation is conserved across Bacilli when the authors only test one B. subtilis and one B. cereus strain. They further argue (lines 469+) that Tse1 requires a PAAR repeat for its targeting, but do not provide direct evidence for this possibility.

      We have toned down the final conclusion to specify that the activation of sporulation is a mechanism that can be found in different Bacillus species, such as Bsub and Bcer. Regarding the second point, we have included a further explanation for this argument. Quote text: “As shown in Figure 2B, a paar mutant has an active T6SS able to kill E. coli. However, as shown in Figure 2A, we noticed that a paar mutant (which encodes tse1) is not able to trigger B. subtilis sporulation to the same level as the Pchl WT strain. Given that paar deletion apparently abolishes Tse1 secretion, we suggest that Tse1 is a PAAR-associated effector that requires a PAAR repeat domain protein to be targeted for secretion, thereby increasing Bacillus sporulation during contact with Pseudomonas cells (Cianfanelli et al., 2016; Hachani et al., 2014; Whitney et al., 2014).”

    1. Author Response

      Reviewer #3 (Public Review):

      The work is of general interest to audiences in public policy and public health. The data provide some evidence that mobile health interventions may be affected by the type of mobile phone used but fail to substantiate conclusively the claim that the lack of mobile phone ownership may hinder their rollout. The claims about gender and geographic inequality must be elaborated in more detail; many developing countries are now connecting more users in rural areas through unconventional methods, such as village phones, rather than through individual mobile ownership alone.

      Strengths:

      The main strength of this paper is the use of cross-sectional data from the R7 Afrobarometer survey, a newly available dataset containing comprehensive data from more than 50 African countries. The use of the Bayesian Logistic Regression (BLR) model produced some useful findings.

      Weakness:

      1) The authors have generalized a lot of things in a very simple manner. For example, they have assumed that participants with access to the internet own a smartphone and that those without are basic phone users. It is possible that many smartphone owners do not have internet subscriptions due to the high cost of internet access in African countries.

      We agree with the Reviewer that some smartphone owners may not have access to the internet due to the high cost of internet in African countries. Therefore, to estimate the percentage of SP owners who may not pay to access the internet, we looked at the frequency of access to the internet within this sub-group (Methods: lines 133-138). In the Afrobarometer surveys, participants were asked how often they accessed the internet; they were not asked to specify how they accessed the internet. We analyzed these data, stratified on the basis of the type of mobile phone that we assumed individuals owned (we assumed that an individual owned a smartphone if they reported that their mobile phone could access the internet, and that an individual owned a basic mobile phone if they reported that their mobile phone could not access the internet).

      Notably, we found that only 13% of individuals that we classified as SP owners (and 89% of individuals that we classified as owners of BP) reported that they never accessed the internet. We now include the results of this analysis in our revised manuscript (Results: lines 219-221); they are presented in Figure 1—figure supplement 2.

      Additionally, we now mention that in order to implement mHealth interventions that are based on smartphones, individuals will need to both own a smartphone and have financial means to access the internet.

      2) They have consistently discussed inequalities in gender and in rural-urban geographic regions based on the odds ratios derived from the BLR. A regression decomposition technique could quantify these differences in more detail.

      The purpose of our study was to determine – for 33 African countries – what proportion of people owned mobile phones (basic phones & smartphones) in each country, and if there were inequalities/inequities in the ownership of mobile phones based on: (i) gender, (ii) age, (iii) urban-rural residency, (iv) wealth, and (v) distance to a healthcare facility.

      We found high mobile phone ownership that, as our results show, varies substantially among the 33 countries. Additionally, by conducting a Bayesian Logistic Regression, we found significant inequalities/inequities in all five variables. We have also identified substantial differences in the degree of these inequities across the 33 countries.
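      The odds ratios behind these inequality statements come from a Bayesian Logistic Regression. The authors' exact model (priors, covariates, sampler) is not given here, so the following is only a minimal stand-in: it finds the posterior mode of a logistic regression with independent Gaussian priors by Newton's method and uses a Laplace (Gaussian) approximation for the posterior spread, on simulated data with one hypothetical binary covariate. The function names, the prior scale of 2.5, and the simulated effect size are all assumptions.

```python
import numpy as np

def fit_map_logistic(X, y, prior_sd=2.5, max_iter=100, tol=1e-10):
    """MAP logistic regression under independent N(0, prior_sd^2) priors,
    via Newton's method, plus a Laplace approximation to the posterior covariance."""
    n, k = X.shape
    beta = np.zeros(k)
    prior_prec = 1.0 / prior_sd ** 2

    def neg_hessian(b):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        w = p * (1.0 - p)
        return X.T @ (X * w[:, None]) + prior_prec * np.eye(k), p

    for _ in range(max_iter):
        H, p = neg_hessian(beta)
        grad = X.T @ (y - p) - prior_prec * beta   # gradient of the log posterior
        step = np.linalg.solve(H, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    H, _ = neg_hessian(beta)
    cov = np.linalg.inv(H)   # Gaussian approximation around the mode
    return beta, cov

def odds_ratio_interval(beta, cov, j, z=1.96):
    """Odds ratio exp(beta_j) with an approximate 95% credible interval."""
    sd = np.sqrt(cov[j, j])
    return np.exp(beta[j]), np.exp(beta[j] - z * sd), np.exp(beta[j] + z * sd)

# Simulated (hypothetical) data: one binary covariate, e.g. female = 1.
rng = np.random.default_rng(1)
n = 5000
female = rng.integers(0, 2, size=n)
true_logit = 0.5 - 0.7 * female
owns_phone = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
X = np.column_stack([np.ones(n), female])

beta, cov = fit_map_logistic(X, owns_phone)
or_female, lo, hi = odds_ratio_interval(beta, cov, j=1)
print(f"OR(female) = {or_female:.2f} (95% CrI {lo:.2f}-{hi:.2f})")
```

      An odds ratio below 1 with a credible interval excluding 1 is the kind of result that would be read as an inequality in ownership; a full Bayesian analysis would typically sample the posterior (e.g. with MCMC) rather than rely on the Laplace approximation used here for brevity.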

      We agree with the Reviewer that we have not explained why these inequalities exist, and that we could use a regression decomposition analysis to identify explanatory factors. We note that this is the next stage, and current focus, of our research. This next stage requires constructing new statistical models – and utilizing a different dataset – than the models that we present and the dataset that we utilize in our submitted manuscript. Consequently, conducting a regression decomposition analysis is beyond the scope of the present study: it will be an article in its own right.

      However, in response to this Comment, we have now added a description of potential factors that may explain inequalities in gender and rural-urban geographic regions (Discussion: lines 328-339). These factors have been identified in previous studies.

      3) They failed to explain why many poor people own smartphones. This could be due to the use of village phones (first implemented by Grameen Phone in Bangladesh). This approach has also expanded in African countries, where multiple users communicate through a community phone, connecting more users in rural areas.

      We agree with the Reviewer. We now discuss the utilization of village phones in Africa, as well as other explanatory reasons for why a lot of poor people own smartphones (Discussion: lines 339-354).

      4) Basic phones may also be effective for mobile health interventions through voice-enabled systems and by disseminating important messages to communities (e.g., there is extensive literature on how community-level messages, such as instructions on personal hygiene and mask use, were transmitted through basic phones at the beginning of COVID-19 in developing parts of Asia).

      We agree with the Reviewer that basic mobile phones may also be effective for mHealth interventions through voice-enabled systems and disseminating important messages to communities. We have added a paragraph (Discussion: lines 370-396) to discuss current mHealth interventions that are being utilized in Africa, including both those based on smartphones and those based on basic mobile phones.

      5) Further clarification of why the lack of mobile phone ownership may propagate health inequalities is needed, beyond simple associations. A latent factor may also cause these differences.

      We have added a paragraph (Discussion: lines 356-368) to discuss this topic.

    1. Author Response

      Reviewer 1 (Public Review):

      To me, the strengths of the paper are predominantly in the experimental work, there's a huge amount of data generated through mutagenesis, screening, and DMS. This is likely to constitute a valuable dataset for future work.

      We are grateful to the reviewer for their generous comment.

      Scientifically, I think what is perhaps missing, and I don't want this to be misconstrued as a request for additional work, is a deeper analysis of the structural and dynamic molecular basis for the observations. In some ways, the ML is used to replace this and I think it doesn't do as good a job. It is clear for example that there are common mechanisms underpinning the allostery between these proteins, but they are left hanging to some degree. It should be possible to work out what these are with further biophysical analysis…. Actually testing that hypothesis experimentally/computationally would be nice (rather than relying on inference from ML).

      We agree with the reviewer that this study should motivate a deeper biophysical analysis of molecular mechanisms. However, in our view, the ML portion of our work was not intended as a replacement for mechanistic analysis, nor could it serve as one. We treated ML as a hypothesis-generating tool. We hypothesized that distant homologs are likely to have similar allosteric mechanisms that may not be evident from visual analysis of DMS maps. We used ML to (a) extract underlying similarities between homologs and (b) make cross-predictions across homologs. In fact, the chief conclusion of our work is that while common patterns exist across homologs, the molecular details differ. ML provides tantalizing evidence to this effect. Conclusive evidence will require, as the reviewer rightly suggests, detailed experimental or molecular dynamics characterization. Along this line, we note that we have recently reported our atomistic MD analysis of allostery hotspots in TetR (JACS, 2022, 144, 10870). See ref. 41.

      Changes to manuscript: “Detailed biophysical or molecular dynamics characterization will be required to further validate our conclusions (38).”

      Reviewer 3 (Public Review):

      However - at least in the manuscript's present form - the paper suffers from key conceptual difficulties and a lack of rigor in data analysis that substantially limits one's confidence in the authors' interpretations.

      We hope the responses below address and allay the reviewer’s concerns.

      A key conceptual challenge shaping the interpretation of this work lies in the definition of allostery, and allosteric hotspot. The authors define allosteric mutations as those that abrogate the response of a given aTF to a small molecule effector (inducer). Thus, the results focus on mutations that are "allosterically dead". However, this assay would seem to miss other types of allosteric mutations: for example, mutations that enhance the allosteric response to ligand would not be captured, and neither would mutations that more subtly tune the dynamic range between uninduced ("off") and induced ("on") states (without wholesale breaking the observed allostery). Prior work has even indicated the presence of TetR mutations that reverse the activity of the effector, causing it to act as a co-repressor rather than an inducer (Scholz et al (2004) PMID: 15255892). Because the work focuses only on allosterically dead mutations, it is unclear how the outcome of the experiments would change if a broader (and in our view more complete) definition of allostery were considered.

      We agree with the reviewer that mutations that impact allostery manifest in many different ways. Furthermore, the effect size of these mutations runs the full gamut from subtle changes in dynamic range to drastic reversal of function. To unpack allostery further, the allostery of an aTF can be described not just by the dynamic range but also by the actual basal and induced expression levels of the reporter, the EC50 and the Hill coefficient. Given the systemic nature of allostery, a substantial fraction of aTF mutations may have some subtle impact on one or more of these metrics. To take the reviewer’s argument one step further, one would have to accurately quantify the effect size of every single amino acid mutation on all the above properties to obtain a comprehensive sequence-function landscape of allostery. Needless to say, this is extremely hard! Resolving small effect sizes is very difficult, even at high sequencing depth. To the best of our knowledge, a heroic effort approaching such a comprehensive analysis has been accomplished only once so far (PMID: 3491352).

      Our focus, therefore, was to screen for the strongest phenotypic impact on allostery i.e., loss of function. Mutations leading to loss of function can be relatively easily identified by cell-sorting. Because our goal was to compare hotspots across homologs, we surmised that loss of function mutations, given their strong phenotypic impact, are likely to provide the clearest evidence of whether allosteric hotspots are conserved across remote homologs.

      The reviewer raised the point of activity-reversing mutations. Yes, there are activity-reversing mutations in TetR. However, they represent a very small fraction: in the paper cited by the reviewer, there are 15 activity-reversing mutations among 4000 screened. Furthermore, the paper shows that activity reversal in TetR requires two to four mutations, while our library consists exclusively of single amino acid substitutions. For these reasons, we did not screen for activity-reversing mutations. Nonetheless, we agree with the reviewer that screening for activity-reversing mutations across homologs would be very interesting.

      The separation in fluorescence between the uninduced and induced states (the assay dynamic range, or fold induction) varies substantially amongst the four aTF homologs. Most concerningly, the fluorescence distributions for the uninduced and induced populations of the RolR single mutant library overlap almost completely (Figure 1, supplement 1), making it unclear if the authors can truly detect meaningful variation in regulation for this homolog.

      Yes, the reviewer is correct that the fold induction ratio varies among the four aTF homologs. However, we note that such differences are common among natural aTFs. Depending on the native downstream gene regulated by the aTF, some aTFs show higher ligand-induced activation and others lower. While this is not a hard and fast rule, aTFs that regulate efflux pumps tend to have higher fold induction than those that regulate metabolic enzymes. In summary, the variation in fold induction among the four aTFs is neither a flaw in experimental design nor an indication of experimental inconsistency; it is an inherent property of the protein-DNA interaction strength and the allosteric response of each aTF.

      Among the four aTFs, wildtype RolR has the weakest fold induction (15-fold), which makes sorting the RolR library particularly challenging. To minimize false positives as much as possible, we require that a dead mutant be present in (a) non-fluorescent cells after ligand induction, (b) non-fluorescent cells before ligand induction, and (c) at least two out of the three replicates for both sorts. Additionally, for RolR specifically, we adjusted the non-fluorescent gate to be far more stringent than for the other three aTFs (Fig. 1 – figure supplement 1). Furthermore, we assign allosteric hotspots to residues, not to individual dead mutations. This buffers against false strong signals from stray individual dead mutations. Finally, restricting to the top interquartile range winnows the set to residues showing a strong, consistent dead phenotype. As a result of these built-in “safeguards”, the number of allosteric hotspots in RolR (57) is comparable to that of the other three aTFs (51, 53 and 48). This suggests that we are not overestimating the number of hotspots despite the weaker fold induction of RolR. We show in a new supplementary figure (Figure 1 – figure supplement 4) that changing the read count threshold from 5X to 10X produces nearly identical patterns of mutations, suggesting that our results are also robust to changes in read depth stringency.

      Changes to manuscript: In response to the reviewer's comment, we have added the following sentence.

      “We note that the lower fold induction (dynamic range) of RolR makes it particularly challenging to separate the dead variants from the rest.”

      The methods state that "variants with at least 5 reads in both the presence and absence of ligand in at least two replicates were identified as dead". However, the use of a single threshold (5 reads) to define allosterically dead mutations across all mutations in all four homologs overlooks several important factors:

      Depending on the starting number of reads for a given mutation in the population (which may differ in orders of magnitude), the observation of 5 reads in the gated nonfluorescent region might be highly significant, or not significant at all. Often this is handled by considering a relative enrichment (say in the induced vs uninduced population) rather than a flat threshold across all variants.

      We regret the lack of clarity in our presentation. We wish to better explain the rationale behind our approach. First, we understand the reviewer’s point on considering relative enrichment to define a threshold. This approach works well in DMS experiments involving genetic selections, which is commonly the case, because activity scales well with selection stringency. One can then pick enrichment/depletion relative to the middle of the read count distribution as a measure of gain or loss of function.

      Second, this strategy does not, in practice, work well for cell-sorting screens. While it may be tempting to think of cell sorting as comparably activity-scaled as genetic selections, in reality, the fidelity of fluorescent-activated cell sorters is much lower. Making quantitative claims of activity based on cell sorting enrichment can be risky. It is wiser to treat cell sorting results as yes/no binary i.e., does the mutation disrupt allostery or not. More importantly, the yes/no binary classification suffices for our need to identify if a certain mutation adversely impacts allosteric activity or not.

      Third, the above argument does not imply that all mutations have the same effect size on allostery. They don’t. We capture the effect size on individual residues, not individual mutations, by counting the number of dead mutations at a residue position. This is an important consideration because it safeguards us from minor inconsistencies that inevitably arise from cell sorting.

      Fourth, for a variant to be classified as allosterically dead, it must be present in both the uninduced and induced DNA-bound populations in at least two out of three replicates (four conditions total). This is a stringent criterion for selecting dead variants, and it yields highly consistent regions of importance in the protein even when the read count threshold is varied. To the extent possible, we have minimized the possibility of false positive bleed-through.
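      The classification rule above (at least the threshold number of reads in the sorted non-fluorescent gate both without and with inducer, in at least two of three replicates), followed by counting dead mutations per residue, can be sketched as below. The data layout and function names are hypothetical, and the sketch stops at per-residue counts; it does not implement the weighted score or the interquartile-range cut, whose exact forms are not reproduced here.

```python
from collections import Counter

def is_dead(replicate_counts, min_reads=5, min_replicates=2):
    """replicate_counts: one (uninduced_reads, induced_reads) pair per replicate,
    counted in the sorted non-fluorescent gate. A variant is called dead when it
    reaches min_reads in BOTH sorts in at least min_replicates replicates."""
    passing = sum(1 for uninduced, induced in replicate_counts
                  if uninduced >= min_reads and induced >= min_reads)
    return passing >= min_replicates

def dead_counts_per_residue(variants, **kwargs):
    """variants: dict (position, mutant_aa) -> replicate_counts.
    Returns a Counter of dead mutations per residue position."""
    counts = Counter()
    for (position, _aa), reps in variants.items():
        if is_dead(reps, **kwargs):
            counts[position] += 1
    return counts

# Hypothetical read counts for three variants across three replicates.
variants = {
    (42, "A"): [(6, 7), (5, 9), (0, 12)],       # dead at 5 reads, not at 10
    (42, "P"): [(15, 20), (11, 14), (12, 30)],  # dead at both thresholds
    (57, "G"): [(8, 2), (9, 1), (7, 0)],        # absent from the induced sort
}
print(dead_counts_per_residue(variants))               # Counter({42: 2})
print(dead_counts_per_residue(variants, min_reads=10)) # Counter({42: 1})
```

      Rerunning the same rule at 5 and 10 reads is the kind of sensitivity check the authors describe: individual borderline variants drop out at the stricter threshold, while residues with several strongly dead mutations persist.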

      Finally, two separate normalizations were performed on the total sequence reads so that a common read count threshold could be applied 1) between experimental conditions and replicates and 2) across proteins. First, total sequencing reads were normalized to 200k across all sample conditions (presorted, -inducer, and +inducer) and replicates for each homolog, allowing comparisons within a single protein. Next, reads were normalized again to account for differences in the theoretical size of each protein’s single-mutant library, allowing comparisons across proteins with a common read count cutoff. For example, total sequencing reads of RolR (4,332 possible mutants) were increased by 1.18x relative to MphR (3,667 possible mutants), for a total of 236k reads.
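      The two normalization steps can be checked with a short sketch. It assumes (inferred from the RolR example, not stated outright) that the MphR library serves as the 200k reference and that every other homolog's total is scaled in proportion to its number of possible single mutants; the function names are hypothetical.

```python
def scale_to_depth(counts, target_total=200_000):
    """Step 1: rescale one sample's raw read counts so that they sum to target_total."""
    total = sum(counts.values())
    return {variant: n * target_total / total for variant, n in counts.items()}

def library_scaled_target(n_possible_mutants, reference_mutants=3667, base_total=200_000):
    """Step 2: read total for a homolog, scaled by its single-mutant library size
    relative to a reference library (MphR, 3,667 possible mutants, assumed here)."""
    return base_total * n_possible_mutants / reference_mutants

rolr_total = library_scaled_target(4332)
print(round(rolr_total))   # 236269, i.e. about 236k (4,332 / 3,667 is about 1.18)
```

      Scaling the read total with library size keeps the expected reads per variant roughly constant across homologs, which is what allows a single 5-read (or 10-read) cutoff to mean the same thing for every protein.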

      Changes to manuscript: We have provided substantial additional details in the Fluorescence-activated cell sorting and NGS preparation and analysis sections.

      We also added the following in the main text.

      “In other words, we use cell sorting as a binary classifier i.e., does the mutation disrupt allostery or not. We capture the effect size on individual residues, not individual mutations, by counting the number of dead mutations at a residue position. This is an important consideration because it safeguards us from minor inconsistencies that inevitably arise from cell sorting.”

      Depending on the noise in the data (as captured in the nucleotide-specific q-scores) and the number of nucleotides changed relative to the WT (anywhere between 1-3 for a given amino acid mutation) one might have more or less chance of observing five reads for a given mutation simply due to sequencing noise.

      All the reads considered in our analyses pass the Illumina quality threshold of Q-score ≥ 30, which according to Illumina represents “perfect reads with no errors or ambiguities”. This translates into a 1 in 1000 probability of an incorrect base call, or 99.9% base call accuracy.
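      The 1-in-1000 figure follows from the standard Phred definition, P(error) = 10^(-Q/10). The sketch below applies it; summing per-base error probabilities over a read gives the expected number of miscalled bases, shown here for a hypothetical 150-bp read in which every base is exactly Q30 (read length and uniform quality are assumptions for illustration).

```python
def phred_to_error_prob(q):
    """Phred quality Q -> probability that the base call is wrong: 10^(-Q/10)."""
    return 10 ** (-q / 10)

def expected_errors(qualities):
    """Expected number of miscalled bases in a read: sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in qualities)

print(phred_to_error_prob(30))       # approximately 0.001, i.e. 99.9% accuracy
print(expected_errors([30] * 150))   # approximately 0.15 for the hypothetical read
```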

      We use chip-based oligonucleotides to build our DMS library, which allows us to pre-specify the exact codon that encodes each point mutation. This means the nucleotide-level and protein-level variant counts are the same. The scenario referred to by the reviewer, i.e., “anywhere between 1-3 for a given amino acid mutation”, only applies to codon-randomized or error-prone PCR library generation. We regret that the chip-based library assembly was unclear.

      Depending on the shape and separation of the induced (fluorescent) and uninduced (non-fluorescent) population distributions, one might have more or less chance of observing five reads by chance in the gated non-fluorescent region. The current single threshold does not account for variation in the dynamic range of the assay across homologs.

      We have addressed the concern raised by the reviewer on fluorescent population distributions in answers to questions 10 and 11.

      The reviewer makes an important point about the choice of sequencing threshold. We use the sequencing threshold simply to make a binary choice about whether a certain variant exists in the sorted population or not. We do not use the sequencing reads to scale the activity of the variant. To address the reviewer's comment, we have included a new supplementary figure (Fig 1 – figure supplement 4) in which we compare the data at two threshold levels: 5 and 10 reads. As is evident in the new figure, the fundamental pattern of allosteric hotspots and the overall data interpretation do not change.

      TetR: 5x – 53 hotspots, 10x – 51 hotspots

      TtgR: 5x – 51 hotspots, 10x – 51 hotspots

      MphR: 5x – 48 hotspots, 10x – 48 hotspots

      RolR: 5x – 57 hotspots, 10x – 60 hotspots

      In other words, changing the threshold to be more or less strict may have a modest impact on the overall number of hotspots in the dataset. Still, the regions of functional importance are consistent across different thresholds. We have expanded the discussion in the manuscript to address this point.

      Changes to manuscript: We have now included a new supplementary figure comparing hotspot data at two thresholds: Figure 1 – figure supplement 4.

      We also added the following in the main text.

      “To assess the robustness of our classification of hotspots, we determined the number of hotspots at two different sequencing thresholds, 5x and 10x. At 5x and 10x, the numbers of hotspots are TetR: 53, 51; TtgR: 51, 51; MphR: 48, 48; and RolR: 57, 60, respectively. Changing the threshold has a modest impact on the overall number of hotspots, and the regions of functional importance are consistent at both thresholds.”

      The authors provide a brief written description of the "weighted score" used to define allosteric hotspots (see y-axis for figure 1B), but without an equation, it is not clear what was calculated. Nonetheless, understanding this weighted score seems central to their definition of allosteric hotspots.

      We regret the lack of clarity in our presentation. The weighted score was used to quantify the “deadness” of every residue position in the protein. At each position in the protein, the number of mutations that inhibited activity was summed, and the ‘deadness’ of each mutation was weighted by the number of replicates in which it appeared to inactivate the protein. The weighted score at each residue position is given by

      where, at position x in the protein, D1 is the number of mutations dead in one replicate only, D2 is the number of mutations dead in 2 replicates, D3 is the number of mutations dead in 3 replicates, and Total is the total number of variants present in the data set (based on sequencing data). Any dead mutation seen in only one replicate is discarded and does not contribute to the “deadness” of the residue; only mutations seen in two or three replicates contribute to the score. We have included a new supplementary figure (Fig. 1 – figure supplement 2) that gives the reader a detailed heatmap of all mutations and their impact for each protein.

      Changes to manuscript: The weighted scoring scheme is now described in greater detail under Materials and Methods in the “NGS preparation and analysis” section.

      The authors do not provide some of the standard "controls" often used to assess deep mutational scanning data. For example, one might expect that synonymous mutations are not categorized as allosterically dead using their methods (because they should still respond to ligand) and that most nonsense mutations are also not allosterically dead (because they should no longer repress GFP under either condition). In general, it is not clear how the authors validated the assay/confirmed that it is giving the expected results.

      As we state in response to question 12, we use chip-based oligonucleotides to build our DMS library, which allows us to pre-specify the exact codon that encodes each point mutation. There are no synonymous or nonsense mutations in our DMS library. Each protein mutation is encoded by a single unique codon. The only stop codon is at the 3′ end of the gene.

      The authors performed three replicates of the experiment, but reproducibility across replicates and noise in the assay is not presented/discussed.

      Changes to manuscript: A new supplementary table (Table 1) is now provided with the pairwise correlation coefficients between all replicates for each protein.

      In the analysis of long-range interactions, the authors assert that "hotspot interactions are more likely to be long-range than those of non-hotspots", but this was not accompanied by a statistical test (Figure 2 - figure supplement 1).

      In response to the reviewer's comment, we now include in the main text a paired t-test comparing non-hotspots and hotspots with long-range interactions.

      Changes to manuscript: In all four aTFs, hotspots constituted a higher fraction of LRIs than non-hotspots (Figure 2 – figure supplement 1; P = 0.07).
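      The text reports only the resulting P value, so as an illustration of how such a paired test can be run, the sketch below uses `scipy.stats.ttest_rel` on invented per-aTF fractions. It assumes the pairing is by aTF (four pairs, so 3 degrees of freedom) and that each value is the fraction of residues involved in long-range interactions; all numbers are made up.

```python
import math
from statistics import mean, stdev
from scipy.stats import ttest_rel

# Hypothetical fractions of residues in long-range interactions (LRIs),
# one value per aTF in the order TetR, TtgR, MphR, RolR; made-up numbers.
hotspot_lri = [0.42, 0.38, 0.45, 0.40]
nonhotspot_lri = [0.35, 0.36, 0.37, 0.33]

result = ttest_rel(hotspot_lri, nonhotspot_lri)

# The same t statistic by hand: mean paired difference over its standard error.
diffs = [h - n for h, n in zip(hotspot_lri, nonhotspot_lri)]
t_manual = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

print(f"t = {result.statistic:.3f} (manual {t_manual:.3f}), "
      f"df = {len(diffs) - 1}, p = {result.pvalue:.3f}")
```

      With only four pairs the test has very few degrees of freedom, so the P value is sensitive to the spread of the per-aTF differences; this is worth keeping in mind when interpreting a value such as P = 0.07.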

    1. Author Response

      Reviewer #1 (Public Review):

      The authors trained rats to self-initiate a trial by poking into a nose poke and to make a sequence of 8 licks in the nose poke after a visual cue. Trials were considered valid (called "timely") only if rats waited for more than 2.5 sec after the end of the previous trial. An attempt to initiate a trial (nose poking) before the 2.5 sec criterion was regarded as "premature". The authors recorded from the dorsal striatum while rats performed this task. The authors first show that some neurons exhibited a phasic activation around the time of port entry detected using an infrared detector ("Entry cell"), as well as port exit ("Exit cell"). Some neurons showed activation at both entry and exit ("Entry and Exit cell") or between these two events ("Inside-port cell"). The fractions of neurons that fall into these four categories are roughly the same (Fig. 3C). The main conclusions drawn from this study are that (1) the activity preceding a port entry was positively correlated with the latency to initiate a trial (or "waiting time"; Fig. 4E), which appears to reflect the value of the upcoming reward, and that (2) in adolescent rats, the activity rose more steeply with the latency to trial initiation (Fig. 7J).

      These observations are potentially interesting, in particular, the possible difference between adult and adolescent rats is intriguing. However, this study does not examine whether this brain region actually plays a role in the task. Some of the conclusions appear to be premature.

      1) Previous studies have found correlations between the activity of neurons in the striatum and the latency to trial initiation (e.g. Wang et al., Nat. Neurosci., 2013) or action initiation more generally (e.g. Kunimatsu et al., eLife, 2018). In the former study, the trial initiation was self-generated, similar to the present study, and was modulated by the overall reward value (state value). In the latter study, the latency was instructed by a cue. Furthermore, there are many studies that showed correlations between striatal activity and future rewards (e.g. Samejima et al., Science, 2005; Lau and Glimcher, 2008). Many of these studies varied the value of upcoming reward (e.g. amount or probability). Although some details are different, the basic concepts have been demonstrated in previous studies.

      Although other studies have linked striatal activity to trial/action initiation and reward probability, here the striatal activity preceding the execution of a learned sequence depends on the internal representation of the time waited. Elapsed time is the only cue the animal has regarding the possible outcome until it is too late and the trial has already been initiated. Although a light cue then tells the rat whether the timing was correct, providing an opportunity to stop the behavior, the behavior released during premature trials closely resembles that observed during unrewarded timely trials. This remarkable similarity between premature trials and timely unrewarded trials provided a well-matched comparison for assessing how wait time modulates anticipatory striatal activity. Moreover, we compared striatal activity between adult and adolescent rats and found a steeper wait time-based modulation of striatal activity in adolescent animals, which correlates with more impulsive behavior in these animals.

      2) The authors conclude that "in this task, the firing rate modulation preceding trial initiation discriminates between premature and timely trials and does not predict the speed, regularity, structure, value or vigor of the subsequently released action sequence". This conclusion is based on the observation that premature and timely trials did not differ in terms of kinematic parameters as measured using accelerometer. Although the result supports that the difference in activity between premature and timely cannot be explained by the kinematic variables, it does not exclude the possibility that the activity is modulated by some kinematic variables in a way orthogonal to these trial types.

      Our accelerometer data do not support the idea that differences in movement initiation time or velocity explain the differences in striatal activity between adolescent and adult rats. However, we cannot rule out that kinematic variables not captured by the head accelerometer recordings explain some of the results. This is acknowledged in the Results section (page 8, line 180).

      3) The firing rate plot shown in Figure 4D should be replotted with trials aligned to movement initiation (presumably available from accelerometer or video recordings). Is it possible that the activity rises similarly in both trial types, but is cut off at different latencies from movement initiation depending on when the animal enters the port? In any case, port entry is a somewhat indirect measure of "trial initiation".

      Unfortunately, we did not systematically obtain video recordings of the sessions. We also have accelerometer recordings from only a few of the animals that provided the neuronal data, which prevents us from replotting the data as suggested. Accelerometer recordings are available from two adult and two adolescent rats. At both ages, the latency from movement initiation to port entry does not differ between premature and timely trials. This is now reported on page 8, line 175 for adult rats, and on page 15, line 341 for adolescent rats. These results appear to be at odds with the idea that the decreased neuronal activity in premature trials results from a cut-off of the response.

      4) The differences between adult and adolescent rats are not particularly large, and the data from the adolescent rats show a noisy trace.

      New data from two additional adolescent rats reduced the variability and confirmed the behavioral and physiological differences from adult rats. All panels of Figure 7 now include data from 5 adolescent animals instead of 3. The number of neurons analyzed in the adolescent group increased from 552 to 876. These new data allowed us to perform new statistical comparisons.

      We fitted a logistic function to the cumulative trial initiation timing data (Fig. 7N) and found that the rate of accumulation is higher in adolescent rats. Importantly, this difference appears not only in the part of the curve that corresponds to premature responding but also during timely responding. This indicates that premature responding in adolescent rats is a manifestation of a more general behavioral trait that makes them self-initiate trials faster than adults (Fig. 7N).

      The curves showing the amplitude modulation of anticipatory activity as a function of waiting time were noisy partly because premature trials were relatively few, which required relatively long time bins. With more data available, we have replotted these curves using a smaller bin size for the short waiting times (Fig. 7M). Fitting a logistic function to these data revealed a higher rate of increase of this activity modulation in adolescent rats, paralleling the behavioral data. Moreover, we report a significant correlation between the behavioral and neurophysiological data: a steeper curve of trial initiation times correlates with a steeper wait modulation of anticipatory activity (Fig. 7O). These new findings are reported in the Results section, from page 17, line 405 to page 18, line 417.
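      The response does not state how the logistic curves were fitted. The sketch below shows one way to obtain a growth-rate coefficient that can be compared between age groups. It assumes a two-parameter logistic, p(t) = 1 / (1 + exp(-k(t - t0))), whose upper asymptote is fixed at 1, as for a cumulative proportion of trial initiations. Under that assumption, the growth rate k and the midpoint t0 can be estimated by linear regression of logit(p) on waiting time. This linearization is a swapped-in technique and not necessarily the authors' method. A nonlinear least-squares fit with a free asymptote is the more general alternative. The function name `fit_logistic` is hypothetical.

```python
import math


def fit_logistic(times, proportions):
    """Estimate k and t0 of p(t) = 1 / (1 + exp(-k * (t - t0))).

    Assumes the upper asymptote is 1 (cumulative proportions). Uses ordinary
    least squares on logit(p) = k * t - k * t0; needs at least two distinct
    times with 0 < p < 1.
    """
    xs, ys = [], []
    for t, p in zip(times, proportions):
        if 0 < p < 1:                      # logit is infinite at 0 and 1
            xs.append(t)
            ys.append(math.log(p / (1 - p)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                          # slope of logit(p) = growth rate
    t0 = mean_x - mean_y / k               # from intercept = -k * t0
    return k, t0
```

      In this sketch, a "steeper" curve simply means a larger fitted k. Points at exactly 0 or 1 are skipped because their logit is infinite.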

      Reviewer #2 (Public Review):

      The authors conduct an ambitious set of experiments to study how neural activity in the dorsal striatum relates to how animals can wait to perform an action sequence for reward. There are a lot of interesting studies on striatal encoding of actions/skills, and additionally evidence that striatal activity can help control response timing and time-related response selection. The authors bridge these issues here in an impressive effort. Recordings were made in the dorsal striatum on several tasks, and activity was assessed with respect to action initiation, completion, and outcome processing with respect to whether animals could wait appropriately or could not wait and responded prematurely. Conducting recordings of this sort in this task, particularly in some adolescent animals, is technically advanced. I think there is a very timely and potentially very interesting set of results here. However, I have some concerns that I hope can be addressed:

      It seems like the recordings were made throughout the dorsal striatum (histology map), including some recordings near/in the DLS. Is this accurate? The manuscript is written as though only the DMS was recorded.

      We acknowledge that our recordings are spread across the medial and central regions of the dorsal striatum. We are not sure that there is a consensus on the boundaries between the DMS and DLS, but we believe that none of our recordings are clearly located within the DLS. Following your suggestion, we have modified the text and now refer to the location of our recordings as “dorsal striatum”. Because there is a large body of work on the roles of the DLS and DMS in reward learning, we believe it is still important to refer to this work in the Introduction and to discuss our findings in its context. This is particularly relevant because we find that most task-related activity is concentrated at the beginning and end of the task, as shown in several studies focused on the DLS.

      If I understand correctly, the rats must lick 8 times to get the water. If this is true, one strategy is to just keep licking until the water comes. Therefore, the rats may not have learned an 8-lick action sequence. The authors should clarify this possibility, and if it is, to consider avoiding using phrases like "automatized action sequence" since no real action sequence might have been learned. In short, I am not convinced the animals have learned an action pattern rather than to just keep licking once a waiting period has elapsed.

      We acknowledge that the experiments do not allow us to establish whether the rats know the exact number of licks needed. Once the skill is acquired, licking becomes highly stereotyped, and the rats might instead be learning a time after which continuous licking leads to reward. Nevertheless, three observations support the view that the behavior is automatic until the time of expected reward delivery: the stereotyped performance; the rats' inability to stop the behavior in premature trials, even though the absence of the light cue unequivocally indicates that no reward will be obtained; and the rapid decrease in lick rate after the eighth lick when no reward is obtained. A representative raster plot showing lick sequences during a whole session in a trained adult rat is presented in Fig. 1I, and Figure 7 – supplement 1H shows an example of the licks of an adolescent rat.

      The number of subjects per group is very low. This is fine for analysis of within-animal neural activity. However, comparing the behavior between these groups of animals does not seem appropriate unless the Ns are substantially increased.

      The revised version of the manuscript includes more adolescent rats from which striatal activity and behavior were recorded. This allowed us to perform a more detailed statistical analysis of the correlations between these measures. In addition, we now include new behavioral data from an independent sample of 6 adult and 6 adolescent non-implanted rats, which confirm the results obtained with the implanted animals (presented in Figure 7 – supplement 4).

      I found the manuscript difficult to decipher. There are many groups. If I understand correctly, there are the following:

      -ITI 2.5s experiment

      -ITI 5 s experiment

      -ITI2.5-5s experiment

      -ITI 2.5 s experiment (adolescent)

      -Two accelerometer animals (unclear which experiment)

      -Two animals in ITI 2.5 sec without recordings (unclear how incorporated into analyses)

      Within each group, there are multiple categories of behavioral performance. This produces a large list of variables. In some parts of the results, these groups are separated and compared, but not all groups are compared in those such sections. In other sections the different groups (all or just some?) appear to be combined for analysis, but it is not clearly described. Another consequence of mixing the groups and conditions together in analysis as they do is that some of the statements in the results are very hard to follow (E.g., line 305 "...similar behavior observed in 8-lick prematurely released and timely unrewarded trials...").

      To clarify the experimental groups, we now include a table (Table 1) summarizing which tasks were used and how many animals were trained in each task.

      Generally, it is difficult to understand the results without first understanding the details of the different tasks, the different groups of animals, and the different epochs of comparison for neural analysis. It took me a long time to work through the methods and I am still not sure I completely understand it. On this point, some sentences are very long and should be broken up into smaller, clearer sentences. There are a lot of phrases that only someone familiar with the cited articles might understand what they mean (e.g., even one paragraph starting with line 39 includes all of the following terms: automaticity in behavior; behavioral unit or chunk; reward expectancy; reward prediction errors and trial outcomes; explore-exploit; cost-benefit; speed-accuracy tradeoffs; tolerance to delayed rewards; internal urgency states). It is very hard to follow how each of these processes are to be understood in terms of behavioral measures used to study them and how they do or do not relate to the hypothesis of the present study. The discussion similarly uses a lot of different phrases to discuss the task and neural responses in a way that makes it hard to understand exactly what the author's interpretation of the data are. Is there maybe a 'most likely' interpretation that can be stated for some of the responses?

      Our main aim is to uncover the mechanisms underlying differences in impulsivity between adult and adolescent rats. We hope that this is clearer in this version of the manuscript, now that we have deepened the analysis of the differences between the two groups. We believe that our data do not allow us to determine unequivocally which cognitive process ultimately produces the differences in striatal activity between adult and adolescent rats. Candidate processes include differences in internal urgency states, time perception, or tolerance to delayed rewards, and we have tried to reflect this fairly in the Discussion.

      The data set is extremely rich; there are a lot of data here. As a result, it can be hard to understand how all of the data relate to the main hypothesis of the article. The Results section often reads as a set of exploratory results rather than a series of hypothesis tests.

      We have tried to improve the overall clarity of the text.

      Reviewer #3 (Public Review):

      Cecilia-Martinez et al. implement a task that allows the study of premature versus timely actions in rats. First, they show that rats can learn this task. Next, they record activity in the DMS and show start/stop signals in the recorded cells. They then propose that the activity detected before the release of action sequences discriminates premature from timely initiations, and they show a relationship between the waiting time and the activity of the recorded cells. Furthermore, they show that reward expectancy could be what is encoded in the activity before the animal enters the port. Last, they show that adolescent rats make more premature starts than adult rats. They also document a difference in how DMS activity is modulated by waiting time, although this difference lies above the premature threshold (see comments below).

      Overall, the paper is well presented and describes a well-developed set of experiments. It deserves publication once a few minor comments are addressed.

      1) I understand that rats learn to execute sequences of <8 licks or 8 licks. Although diagrams are presented, the manuscript shows neither examples of individual trials with 8 licks nor distributions of bouts of these licks.

      Rats learn to execute a lick sequence to obtain the reward. The experiments do not allow us to establish if they know what the exact number of licks needed is; when the skill is acquired, licking becomes highly stereotyped and the rats might as well be learning a time after which continuous licking leads to reward. A representative raster plot showing lick sequences in a session in a trained adult rat is presented in Figure 1I and Figure 7 - supplement 1H shows an example of the licks of an adolescent rat.

      2) Relevant to the statement: "in this task, the firing rate modulation preceding trial initiation discriminates between premature and timely trials and does not predict the speed, regularity, structure, value or vigor of the subsequently released action sequence"... It is not clear whether the latency to first lick (plot 2D) and the inter-lick interval (2E) are computed only from the 8-lick sequences. If they are not, it is important to compare only the 8-lick sequences.

      The data are from 8 lick sequences, this is now indicated in the figure legend.

      3) Related to the implications of the previous statement, is there a tendency toward a longer latency to first lick in timely vs premature trials in Figure 2D (timely-trials-Late vs premature-trials-late)? Again, it is important here to compare only the 8-lick sequences.

      Only 8-lick sequences are compared. The two-way ANOVA showed a significant effect of training stage, no significant effect of trial timing (premature versus timely), and a non-significant interaction. The average ± SEM latencies to the first lick of the 8-lick sequence were 0.717 ± 0.063 s for timely trials late and 0.805 ± 0.086 s for premature trials late.

      4) I could not find in the main text whether the individual points in Fig.2 (e.g. 2B-E) are individual animals. Please specify that.

      In these figure panels, each individual point corresponds to the mean of one session. The data come from 5 adult animals (2-5 sessions per animal and timing condition). All figure legends now state whether the data correspond to animals or sessions.

      5) Although the argument presented in Figures 4C and 6C is very elegant, I wonder whether head acceleration might miss differences in the movements of body parts other than the head between the two kinds of trials. If that is the case, please acknowledge it.

      We acknowledge in the main text, results section, page 8, line 180, that the accelerometer does not allow us to determine if the movements of other body parts differ between trial types.

      6) Also in 4C, small separations between timely vs premature signals are seen before 0. Is there a way to know if animals in timely vs premature trials approached the entry port in the same way? This request is pertinent in order to rule out motor contribution to the differences in Figure 4A-B.

      Although it is not possible to completely rule out small movement differences between premature and timely trials, no evident behavioral differences can be detected by trained observers or by analyzing video recordings taken during some sessions. The available accelerometer recordings also suggest that a similar motor pattern is displayed in premature and timely trials (Figure 4C).

      7) When saying "Similar results were obtained in rats trained with a longer waiting interval (Supplementary Figure 5)", it is hard to see the similarity in the premature range. There is a positive relationship in the 2.5 s task, but not in the 5 s task.

      Please note that a positive relationship is observed for the two bins preceding trial initiation, which are centered about 2.75 s and 1 s before port entry. The bin that does not seem to fit is centered 4 s before port entry (1 s after the rat exits the port in the previous trial). Because the waiting time in the 5 s task is longer, behavior becomes less organized during the first seconds after port exit. However, the modulation of activity is still observed in the bins close to port entry.

      8) The data showing that the waiting modulation of reward anticipation grows at a faster rate in adolescent rats is clear, however, it is not clear how it could be related to the data showing that the adolescent rats were more impulsive.

      We acknowledge that the data do not provide a causal link with behavior. After adding two new adolescent rats, we were able to study in more detail the relationship between the waiting modulation of neuronal activity and the accumulation of trial initiations (Figures 7M and 7N, respectively) by fitting logistic functions to the data. The new results are explained on page 17, line 384. The growth rates of the two curves are strikingly parallel, and the curves of adolescent rats are significantly steeper than those of adult rats. Moreover, the coefficients that describe the growth rate of the behavioral and neurophysiological data are significantly correlated (Fig. 7O).

      9) Related to the sentence: "the strength of anticipatory activity increased with the time waited before response release and was higher in the more impulsive adolescent rats"... One might expect to see a difference in the premature time range. However, the differences were observed in the range >2.5 seconds. Please explain how this finding can be reconciled with the fact that the adolescent rats were more impulsive.

      Please note that the more impulsive behavior of adolescent rats (and the faster growth of the wait modulation of anticipatory activity) is observed across waiting times that exceed the 2.5 s criterion wait time. We added a phrase in the Results section (page 18, line 413) and in the Discussion section (page 19, line 443) to emphasize this point. Regarding premature trials, Reviewer #1 raised a related issue (concern 4). The new data from adolescent animals allowed us to use smaller bins and so better resolve what happens at short waiting times. We have also added an inset to Figure 7M that shows these intervals more clearly.

    1. Author Response

      Reviewer #2 (Public Review):

      “The authors wish to relate beat-to-beat coordination of cardiac function (in this case, as measured by left ventricular pressure) to the activity of sympathetic neuron spiking within the stellate ganglion. A strength includes the challenging measurements of multiple stellate neuron activity over long durations in situ in the anesthetized pig.”

      We thank the reviewer for their feedback.

      “A major and overriding weakness is the founding assumption of the analysis that the underlying sympathetic neurons are all cardiac functioning in nature - an assumption that is overwhelmingly unlikely given the evidence in other species including humans that stellate postganglionic neurons are functionally mixed and have functional noncardiac targets. The use of broad and poorly explained/defined terms such as "event entropy" is difficult to follow and find meaning from. The manuscript is filled with difficult-to-follow text like "The neural specificity metric (Sudarshan et al., 2021). Fig. 5", is used to evaluate the degree to which neural activity is biased toward control target states taken here as LVP" and "The neural specificity is reduced from a multivariate signal to a univariate signal by computing the Shannon entropy at each timestamp of the mapped neural specificity metric". The figures are difficult to understand with axes that often bear no units or are quite compressed obscuring the intuitive meaning of the data trends. Fundamentally, cardiac pressure cycles with each heartbeat - roughly once per second - yet fluctuations in the depicted mean spike rate data with changes perhaps ten times in 25 minutes. Such plots are disorienting and difficult to associate with cardiac or neuron "functioning". Only 17 of the 38 references are not self-citations and thus the cited literature represents a narrow view of sympathetic regulation and sympathetic/stellate ganglion knowledge. Much of the foundations are self-professed in earlier publications by the present group and assumed to be accepted.”

      “Fundamentally, cardiac pressure cycles with each heartbeat - roughly once per second - yet fluctuations in the depicted mean spike rate data with changes perhaps ten times in 25 minutes. Such plots are disorienting and difficult to associate with cardiac or neuron "functioning".”

      We would like to clarify this point with the understanding that the reviewer is referring to the time axis in Figure 3C in the manuscript.

      The coactivity matrix constructed in Figure 3C computes the cross-correlation of the sliding mean/std spike activities for different pairs of channels. As the reviewer correctly pointed out, the mean spiking activities across channels do have a weak autocorrelation with the period of the heart rate. This weak correlation with the heart rate period, possibly due to slow firing rates, was seen across all channels in both control and HF animals. However, the cause of the large proportion of channel pairs exhibiting high coactivity is not known. We term this high coactivity "cofluctuation" (shown as red tracings in Fig 3D), and it cannot be directly associated with cardiac functioning.
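      The sketch below illustrates this construction. It smooths each channel's spike rate with a sliding mean, then computes the Pearson correlation between every pair of channels within a moving window. For each window, it reports the fraction of channel pairs whose correlation exceeds a threshold. The response does not specify the window lengths, the use of zero-lag Pearson correlation, the 0.8 threshold, or any function names, so all of these are hypothetical choices for illustration.

```python
import math
from itertools import combinations


def sliding_mean(x, window):
    """Mean of each run of `window` consecutive samples (valid positions only)."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]


def pearson(a, b):
    """Zero-lag Pearson correlation; returns 0.0 if either segment is flat."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cofluctuation_series(spike_rates, smooth_window, corr_window, threshold=0.8):
    """Fraction of channel pairs with correlation above `threshold`, per window.

    spike_rates: one list of binned spike rates per channel (equal lengths).
    """
    smoothed = [sliding_mean(ch, smooth_window) for ch in spike_rates]
    pairs = list(combinations(range(len(smoothed)), 2))
    series = []
    for start in range(len(smoothed[0]) - corr_window + 1):
        seg = [ch[start:start + corr_window] for ch in smoothed]
        high = sum(1 for i, j in pairs if pearson(seg[i], seg[j]) > threshold)
        series.append(high / len(pairs))
    return series
```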

      The cofluctuation was also found to be aperiodic, approximating a lognormal distribution (Fig R1). The HF animals showed heavy tails outside their confidence intervals (Fig R1B). The event rate computed from the cofluctuation time series (shown as blue steps in Fig 3E) for an animal is a measure of spatial coherence among SG neural populations. We developed it as a novel metric for use in future studies.

      Figure R1: Cofluctuation histograms (calculated from the mean or standard deviation of the sliding spike rate, referred to as Cofluctuation_MEAN and Cofluctuation_STD, respectively) and log-normal fits for each animal group. μ_FIT and σ_FIT are the respective mean and standard deviation (STD) of the fitted distribution, used for the 68% confidence interval bounds. A-B: Control animals have narrower bounds and are better fitted by a log-normal distribution. C-D: Heart failure (HF) animals display more heavily skewed distributions that indicate heavy tails.
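      The 68% bounds in Figure R1 can be illustrated with a short sketch. It assumes that μ_FIT and σ_FIT are the mean and standard deviation of the logarithm of the cofluctuation values, that is, the parameters of the underlying normal distribution. Under that assumption, the interval exp(μ_FIT ± σ_FIT) contains about 68% of a log-normal sample. The fraction of observations outside it then gives a rough indication of heavy tails. The function names are hypothetical. Zero values are dropped from the fit because their logarithm is undefined.

```python
import math
import statistics


def lognormal_fit(values):
    """Return (mu, sigma): mean and sample SD of log(values), skipping values <= 0."""
    logs = [math.log(v) for v in values if v > 0]
    return statistics.fmean(logs), statistics.stdev(logs)


def fraction_outside(values, mu, sigma):
    """Bounds exp(mu - sigma), exp(mu + sigma) and the share of values outside them."""
    lower, upper = math.exp(mu - sigma), math.exp(mu + sigma)
    outside = sum(1 for v in values if v < lower or v > upper)
    return lower, upper, outside / len(values)
```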

      “Only 17 of the 38 references are not self-citations and thus the cited literature represents a narrow view of sympathetic regulation and sympathetic/stellate ganglion knowledge. Much of the foundations are self-professed in earlier publications by the present group and assumed to be accepted.”

      We thank the reviewer for pointing this out. We have added four citations to page 15, in the “Conclusion” section of the manuscript. They cover methods from the neuroscience literature such as neural population bias and the linkage of spatiotemporal dynamics to control targets. In addition, carrying out these cardiac nervous system experiments is our group’s specialty. We are not aware of another group that collects multi-electrode array data from the cardiac nervous system and studies the population dynamics of cardiac neurons. Hence, we build on our previous work. The most relevant literature (not necessarily related to the cardiac nervous system) is found in the neuroscience references we cited. These apply neural population recordings to different brain areas, mainly in the neuropsychiatric domain, to understand disease dynamics.

      “For the expert or even the uninformed reader, this report is broadly confused and confusing. The premises (beat to beat or whether LVP conveys cardiac function) are poorly supported. The conclusions are quite vague.”

      Thank you for your feedback. To make the manuscript easier to understand, we moved all mathematical details to the supplementary material and rewrote the abstract and the conclusion from scratch. We also split the methods figures that may have been confusing. We believe that our novel metrics, event rate and entropy, capture non-trivial linkages between heart failure status, cardiac neural activity (spike activity), and peripheral activity (LVP). We have supported these metrics with data from 17 animals, obtained using state-of-the-art surgical techniques and technology, and we report our results with detailed statistical analyses. Our manuscript essentially shows that the event rate and entropy metrics differ significantly between control animals and animals with heart failure. These metrics can be used to design future studies in these animal models and so provide a more quantitative approach to heart disease than binary (yes or no) descriptions.

      “Discussion: The abstract does not convey conclusions from the findings and contains broad statements such as "signatures based on linking neuronal population cofluctuation and examine differences in 'neural specificity' of SG network" that have little substantive value or conclusion for the reader. Fundamentally, what does the title phrase "signatures based on linking neuronal population cofluctuation" mean to the reader? What changed in HF?”

      Thank you for this comment. We completely revised the abstract and conclusion as detailed in our response to Essential Revision #1. Event rate is a metric related to neural activity recordings and entropy is related to the association of neural activity to left ventricular blood pressure. Our findings suggest that both the neural population activity itself (event rate) and its ability to pay attention to cycles of left ventricular pressure (neural specificity) are significantly higher in animals with HF compared to controls.
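      The step that reduces the multivariate neural specificity signal to a univariate entropy signal can be sketched as follows. The sketch assumes that, at each timestamp, neural specificity is a non-negative weight for each of several LVP states, using a hypothetical layout in which rows are timestamps and columns are LVP bins. The weights are normalized into a probability distribution, and its Shannon entropy is computed in bits. Low entropy means that activity is concentrated on a few LVP states (high specificity). The maximum value, log2 of the number of bins, means that activity is spread evenly across states. This is an illustration, not the authors' code.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (bits) of non-negative weights normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    h = 0.0
    for w in weights:
        if w > 0:                       # 0 * log(0) is taken as 0
            p = w / total
            h -= p * math.log2(p)
    return h


def entropy_series(specificity):
    """One entropy value per timestamp; rows are timestamps, columns are LVP bins."""
    return [shannon_entropy(row) for row in specificity]
```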

    1. Author Response

      Reviewer #2 (Public Review):

      McCoy et al. have developed a new urban tree species database from existing city tree inventories. They designed procedures to collect and clean a large amount of data, i.e., more than five million trees from 63 US cities. Using the compiled data, they found that urban trees were significantly clustered by species in 93% of cities. They also showed that climate significantly shaped both nativity and tree diversity. In addition, they identified the homogenization effect of non-native species. Interest in patterns of urban biodiversity and their driving mechanisms has been rising recently. This paper provides an important data source for addressing research questions on this topic, and the findings presented by the authors exemplify its potential.

      Strengths

      Compared to existing urban tree databases, such as the one developed by Ossola et al. (Global Ecology and Biogeography 2020), the new database adds information on spatial location, nativity status, and tree health condition, besides occurrences. The new information expands data usability and saves researchers valuable time. The authors also make their tools available so that others can use them to process their own data sets. Because of the added information, various analyses of the diversity patterns of urban trees and their potential driving mechanisms could be conducted. The authors found that urban trees were nonrandomly clustered by species. This finding corroborates the existing knowledge that a few common species dominate urban trees. Nevertheless, the authors showed that this dominance was also apparent in the spatial dimension. The preliminary finding that the native status of a tree had no apparent impact on tree health is interesting. It could contribute to the debate on native vs. exotic species in urban tree selection, which the authors mention in the paper.

      Thank you for the feedback!

      Weakness

      While the new database and the analysis based on it have strengths, some aspects of the concepts and data analysis need to be clarified and extended.

      We appreciate these helpful comments and have made many changes in response, detailed below.

      First, the authors need to define several critical concepts used in the paper, including city trees, urban forests, biodiversity, and species diversity. The authors used "city trees" and "urban forests" interchangeably throughout the paper. However, a widely accepted definition of the urban forest is: "All woody and associated vegetation in and around dense human settlements." Konijnendijk et al. provide a good discussion of the terminology used in urban forestry (Urban Forestry & Urban Greening, 2006). Similarly, biodiversity is different from species diversity, and effective species number is an indicator of species diversity. Therefore, it is difficult to accept conclusions about biodiversity in urban forests without clear definitions.

      We appreciate these clarifications– we have clarified our terminology throughout and added these important definitions.

      • “...urban forests, which are the woody and associated vegetation in and around dense human settlements (Konijnendijk et al., 2006).”

      • “City tree communities, an essential component of urban forests, provide many services.”

      We replaced the term “biodiversity” throughout the text where really we meant to say “tree species diversity” or just “diversity.”

      Second, the number of records in the tree inventories varied widely (from 214 to 720,140). This variation may reflect actual differences in tree abundance among the studied cities, or it may reflect incomplete inventories. Comparing these inventories without adjusting for unequal sample sizes can introduce biases into the findings. The authors did not explain how they dealt with this issue in the analysis.

      We redid all of our relevant analyses and applied Chao’s rarefaction and extrapolation techniques throughout the manuscript. The (substantial) changes are fully described above in the “Essential Revisions” section. We also copy them here.

      First, we redid all of our diversity calculations applying Chao’s rarefaction and extrapolation techniques through the R package iNext. Therefore, our summary datasheet now has many new columns to include the following values for each city:

      ○ Effective species number:

      ■ Raw effective species number

      ■ Asymptotic estimate of effective species number with confidence interval

      ■ Estimate of effective species number for a given population size (37,000 trees– the median population size rounded to the nearest 1,000) with confidence interval

      ○ Species richness:

      ■ Raw species richness (number of species)

      ■ Asymptotic estimate of number of species with confidence interval

      ■ Estimate of number of species for a given population size (37,000 trees– the median population size rounded to the nearest 1,000) with confidence interval

      ○ The same for the native-only population of trees in each city (i.e., not just the raw effective number of native species but also the iNext estimates and confidence intervals)

      ○ Whether or not each of the values above was calculated using extrapolation or interpolation

      ○ Sample coverage estimates
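      For reference, the raw (observed) values in this list can be computed directly from species counts using Hill numbers. The sketch assumes that "effective species number" refers to the Hill number of order q = 1, the exponential of Shannon entropy. Order q = 0 gives species richness, and q = 2 gives the inverse Simpson index. The sketch does not reproduce iNext's asymptotic or standardized-size estimates. The function name and the small inventory are hypothetical.

```python
import math
from collections import Counter


def hill_number(counts, q):
    """Observed Hill number of order q from per-species abundance counts."""
    n = sum(counts)
    ps = [c / n for c in counts if c > 0]
    if q == 0:
        return float(len(ps))                                  # species richness
    if q == 1:
        return math.exp(-sum(p * math.log(p) for p in ps))     # exp(Shannon entropy)
    return sum(p ** q for p in ps) ** (1 / (1 - q))            # q = 2: inverse Simpson


# Hypothetical inventory: one species name per tree record
trees = ["Acer rubrum"] * 50 + ["Quercus alba"] * 30 + ["Ulmus americana"] * 20
counts = list(Counter(trees).values())
richness = hill_number(counts, 0)        # 3.0
effective = hill_number(counts, 1)
inverse_simpson = hill_number(counts, 2)
```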

      Second, we re-ran our models testing for significant correlations between species diversity in a city and other factors (including climate), where we used the extrapolated / interpolated effective species numbers from iNext. Specifically, we found the best fit model, which included the following predictors: environmental PCA1, environmental PCA1:environmental PCA2, and whether or not a city was designated as a Tree City USA. Then, we ran this model under six sensitivity conditions, varying the independent variable and/or which cities we included based on completeness of their sample. Climate was still a significant correlate of diversity.

      ○ first, with the dependent variable being effective species number as calculated for a standardized population of 37,000 trees ("effective species for a standardized population size");

      ○ second, with the dependent variable being the asymptotic estimate of the city's effective species number as calculated using iNext;

      ○ third, with the raw effective species number as the dependent variable;

      ○ fourth, excluding cities with fewer than 10,000 trees;

      ○ fifth, excluding cities with <50% spatial coverage;

      ○ sixth, excluding cities with <0.995 sample coverage as calculated by iNext.

      ○ For the fourth, fifth, and sixth models, the dependent variable was effective species for a standardized population size of 37,000 trees.
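The six sensitivity conditions amount to three choices of response variable plus three city filters. A hypothetical sketch of how the runs could be set up is below; the `City` fields and the `sensitivity_conditions` helper are illustrative names, not columns of the authors' datasheet, and the model fitting itself is omitted.

```python
from dataclasses import dataclass

@dataclass
class City:
    # Hypothetical per-city summary fields (illustrative names only)
    name: str
    n_trees: int
    spatial_coverage: float   # proportion of the city covered, 0-1
    sample_coverage: float    # sample coverage as estimated by iNext, 0-1
    eff_standardized: float   # effective species at 37,000 trees
    eff_asymptotic: float     # asymptotic estimate
    eff_raw: float            # observed (raw) value

def sensitivity_conditions(cities):
    """Map each sensitivity run to (response field, cities included)."""
    every = list(cities)
    return {
        "1_standardized": ("eff_standardized", every),
        "2_asymptotic": ("eff_asymptotic", every),
        "3_raw": ("eff_raw", every),
        # Runs 4-6 keep the standardized response and drop incomplete samples
        "4_at_least_10000_trees": (
            "eff_standardized", [c for c in every if c.n_trees >= 10_000]),
        "5_at_least_50pct_spatial": (
            "eff_standardized", [c for c in every if c.spatial_coverage >= 0.5]),
        "6_at_least_0995_sample_cov": (
            "eff_standardized", [c for c in every if c.sample_coverage >= 0.995]),
    }
```

Each (response, subset) pair would then be passed to the same best-fitting model; keeping the predictors fixed while varying only the response and the city set is what makes the six runs comparable.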

      Third, we redid our comparisons of tree populations in parks versus those in urban areas. Parks were still more diverse than urban areas.

      ○ Specifically, we used iNext to calculate diversity metrics based on the smaller of the two population sizes (park vs urban) to enable fair comparison for each city.

      ○ We reported comparison results for (i) raw effective species number, (ii) the asymptotic estimate, and (iii) the estimate for a standardized population size.

      ○ In doing so, we excluded Milwaukee from this comparison because it had only 28 trees recorded as being in an urban setting.

      Fourth, we redid our pairwise comparisons of tree community composition between cities to account for different population sizes and sampling efforts. To do so, we randomly subsampled the larger city's population down to the size of the smaller city's population, calculated comparison metrics, and repeated this process 50 times. We report the average comparison metrics.

      Our new Methods text is copied here for your convenience:

      ○ “Throughout our analyses, it was necessary to control for different sample sizes (and different, but unknown, sampling efforts across cities). To do so, we relied on the rarefaction / extrapolation methods developed by Chao and colleagues (Chao et al., 2015, 2014; Chao & Jost, 2012) and implemented through the R software package iNext (Hsieh et al., 2016). In short, these methods use statistical rarefaction and/or extrapolation to generate comparable estimates of diversity across populations with different sampling efforts or population sizes, alongside confidence intervals for these diversity estimates. iNext performs these tasks for Hill numbers of orders q = 0, 1, and 2. We used two techniques in iNext to allow for comparisons across cities (and between parks and urban areas within cities). First, we generated asymptotic diversity estimates for each; second, we generated diversity estimates for a given standardized population size. For our diversity analyses, the standardized population size we used was 37,000 trees (the rounded median of all cities). For analyses of the diversity of native trees, we used a standardized population size of 10,000 trees. For comparisons of the diversity between park and urban areas in a city, we used the smaller of the two population sizes (park or urban). In all cases we also recorded confidence intervals and plotted rarefaction/extrapolation curves.

      ○ To control for variation in how uniformly trees were sampled across a city’s geographic range, we developed a procedure to score each city’s spatial coverage (see section Spatial Structure below).

      ○ We identified the best-fitting model, and then repeated our analysis under six sensitivity conditions to control for differences in population size, sampling effort, spatial coverage, and sample coverage. Our sensitivity analyses were as follows: first, with the dependent variable being effective species number as calculated for a given population of 37,000 trees ("effective species for a standardized population size"); second, with the dependent variable being the asymptotic estimate of the effective species number for that city as calculated using iNext; third, with the raw effective species number as the dependent variable; fourth, excluding cities with fewer than 10,000 trees; fifth, excluding cities with <50% spatial coverage; sixth, excluding cities with <0.995 sample coverage as calculated by iNext. For the fourth, fifth, and sixth models, the dependent variable was effective species for a standardized population size of 37,000 trees.”

      Reviewer #3 (Public Review):

      This paper's strength is in the utility of the assembled datasets and some interesting and creative proof-of-concept analyses. This is an amazing resource for comparative analysis. However, the paper felt a little sparse in the conceptual and methodological underpinnings of the questions asked to demonstrate the utility of the analysis. Specifically, I suggest:

      A) More substance in the introduction (currently only two short paragraphs) and a clear statement of research questions.

      We have added text to frame our goals and hypotheses:

      ○ “In particular, we wanted to know whether local climatic conditions are associated with the species diversity of city tree communities, how species diversity is distributed in space within cities, and whether introduced tree species contribute to biotic homogenization among urban ecosystems.”

      B) Add data on the extent to which each dataset represents a complete sample of each city's trees. I know some are complete inventories, but some consist of 720 trees and cannot be complete samples. A column in the metadata indicating sampling effort, and any bias in where sampling occurred if the dataset is incomplete, is needed for others to use these data appropriately. For example, we know tree cover/diversity increases with wealth (which the authors rightly cite). Suppose that in City X, trees were inventoried in only one wealthy neighborhood. They would not be a representative sample of the city, and dataset users need to be aware of this before drawing incorrect conclusions about City X, where the sample was biased, relative to City Y, where the inventory was complete and included all affluent and poor areas. This is also needed to support the research questions throughout the paper.

      We completely agree, and have made two major changes in response.

      First, we redid all of our diversity analyses after applying Chao’s rarefaction and extrapolation methods to permit comparison between populations of different sizes and sampling efforts. We added new columns to our datasheet with sample coverage estimates, asymptotic estimates of diversity, and diversity estimates for a given population size.

      Second, we also examined spatial coverage in each city because of the valid concern you raised that trees may only be sampled from particular neighborhoods or areas. In short, we divided each city into grid cells, counted trees per grid cell, and calculated metrics of coverage (adjusted number of trees per grid cell, and proportion of grid cells that were empty) and bias (skewness and kurtosis of the number of trees in occupied grid cells). These metrics are presented in Spatial_Coverage_Supplement.zip. As even a glance at the spatial coverage plots shows, some cities are indeed extremely biased. Therefore, we ran a sensitivity analysis in which we excluded cities with <50% spatial coverage.
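The grid-based coverage and bias metrics described above can be sketched as follows. This is a hypothetical illustration, assuming projected x/y coordinates in meters, a square cell size chosen by the user, and the bounding box of the inventory as the city extent; the "adjusted number of trees per grid cell" is not reproduced because its adjustment is not specified here. Skewness and excess kurtosis use population (biased) moments.

```python
from collections import Counter

def grid_coverage(points, cell_size):
    """Return (proportion of empty cells, skewness, excess kurtosis).

    points: list of (x, y) projected coordinates, e.g. in meters.
    cell_size: side length of each square grid cell, in the same units.
    The grid spans the bounding box of the points (an assumption).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    n_cols = int((max(xs) - x0) // cell_size) + 1
    n_rows = int((max(ys) - y0) // cell_size) + 1

    # Count trees per occupied cell
    cells = Counter((int((x - x0) // cell_size), int((y - y0) // cell_size))
                    for x, y in points)
    occupied = list(cells.values())
    prop_empty = 1.0 - len(occupied) / (n_cols * n_rows)

    # Population moments of the number of trees per occupied cell
    k = len(occupied)
    mean = sum(occupied) / k
    m2 = sum((c - mean) ** 2 for c in occupied) / k
    m3 = sum((c - mean) ** 3 for c in occupied) / k
    m4 = sum((c - mean) ** 4 for c in occupied) / k
    if m2 == 0:
        # Every occupied cell holds the same count: shape statistics are
        # undefined, reported as 0 by convention in this sketch
        return prop_empty, 0.0, 0.0
    return prop_empty, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
```

A high proportion of empty cells signals poor coverage, while strong positive skew indicates that a few cells (for example, a single surveyed neighborhood) hold most of the recorded trees.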

      C) The authors chose to use effective species counts as their alpha diversity metric of choice. They explain why: "effective species counts (a measure that allows comparison between cities of different sizes)" (Ln 109). While effective species number is an excellent metric with much better behavior and attributes in linear modeling, I believe it is still strongly dependent on both city area and the number of individual trees sampled and so the above statement and all of the comparisons that flow out of it in the manuscript are currently unsupported. Just as species richness needs to be rarified or extrapolated to be compared at an equivalent # of individuals or area to be accurate so too does EFN (effective species count). Fortunately there is an R package (iNext) based on Chao's method (citation below) that makes it very easy to create effective species accumulation curves for each city by tree individuals sampled.

      a. Chao, Anne, Nicholas J. Gotelli, T. C. Hsieh, Elizabeth L. Sander, K. H. Ma, Robert K. Colwell, and Aaron M. Ellison. 2014. "Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies." Ecological Monographs 84 (1): 45-67. https://doi.org/10.1890/13-0133.1.

      b. The standardization (rarefaction/extrapolation) of EFN or richness by the number of individual trees sampled needs to be made for all analyses that compare diversity metrics across cities or between groups such as urban and park areas (i.e. Fig 2a,b,c; Fig 3b; Fig 5a,b, S1a, S2a, S5, Table S2)

      c. If the authors have an argument for why diversity/area or diversity/sampling effort relationships do not apply for a particular question, then they should make that case instead.

      We very much appreciate this suggestion. Indeed, as described above, we applied Chao’s method to all of our analyses.

      D) The question posed by the beta diversity analysis is fascinating (i.e., is it non-native species that are driving biotic homogenization across cities?). However, while frequency (which I assume is relative abundance, though it may be incidence data; please define) is used to deal with different sample sizes, consider whether it makes sense to include incomplete or very small city datasets in the analysis, even with frequency data. For example, one city only has ~720 trees listed. If this is an incomplete dataset, which seems likely, it will probably be much more differentiated (overlap less) from another city with small numbers simply due to incomplete sampling. Diversity analysis in cities always requires tradeoffs and cannot be identical to methods used in "natural" forested ecosystems, but I encourage the authors to explore this a bit. Perhaps a sensitivity analysis could help, in which incomplete or small samples are dropped or datasets are resampled via random draw to equalize sizes? The latter would handle incomplete samples but would not deal with bias in which neighborhoods were sampled (see point B above).

      Great suggestion. We redid this analysis using a random-draw approach, as you suggested, to equalize sizes. The new analysis found the same results as our old analysis, with slightly different values. The new method is described here:

      ○ “How similar are species compositions across cities? For N = 1953 city-city comparisons of street tree communities, we could calculate weighted measures of similarity because we had frequency data. We calculated similarity scores for the entire tree population, the naturally-occurring trees only, and the introduced trees only. We used chi-square distance metrics on species frequency data, and we controlled for different population sizes (and potentially, sampling efforts) between cities by sub-sampling the larger city 50 times to match the smaller city’s tree population size and calculating average metrics. In this manner we controlled for differences in sample size.”
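The subsampling procedure for each city-city comparison can be sketched as below. The exact form of the chi-square distance is an assumption: this sketch uses one common form on relative frequencies, 0.5 × Σ(p_i - q_i)² / (p_i + q_i), which ranges from 0 (identical profiles) to 1 (no shared species), so a lower distance means higher similarity. The function names, the fixed random seed, and the without-replacement draw are illustrative choices.

```python
import random
from collections import Counter

def chi_square_distance(counts_a, counts_b):
    """0.5 * sum((p - q)^2 / (p + q)) over species, on relative frequencies."""
    na, nb = sum(counts_a.values()), sum(counts_b.values())
    d = 0.0
    for s in set(counts_a) | set(counts_b):
        p = counts_a.get(s, 0) / na
        q = counts_b.get(s, 0) / nb
        if p + q > 0:
            d += (p - q) ** 2 / (p + q)
    return 0.5 * d

def subsampled_distance(trees_a, trees_b, reps=50, seed=0):
    """trees_a, trees_b: lists of species names, one entry per tree.

    The larger city is subsampled (without replacement) to the smaller
    city's size `reps` times, and the mean distance is returned.
    """
    rng = random.Random(seed)
    small, large = sorted([trees_a, trees_b], key=len)
    small_counts = Counter(small)
    total = 0.0
    for _ in range(reps):
        sub = rng.sample(large, len(small))
        total += chi_square_distance(small_counts, Counter(sub))
    return total / reps
```

The same function can be applied to the whole tree population, the naturally occurring trees only, or the introduced trees only, by filtering each city's list of species names before the comparison.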

      E) Additional context/conceptual underpinning the clustering analysis would be great.

      a. The authors state in Lines 390-395: "For city trees, which are often organized along grids or the underlying street layout of a city, this method can more meaningfully cluster trees than merely calculating the meters between trees and identifying nearest neighbors (which may be close as the crow flies but separated from each other by tall buildings)." I very much agree with this sentiment, and it is biologically meaningful for animal and plant dispersal, but as written it is unclear to me how the method described in the text "knows" that a tall building, elevation, or some other feature separates clusters, rather than empty space or a ball field. Please clarify.

      We appreciate these comments, and we have added text and references for the interested reader. Here is the new description in full:

      ○ “We wanted to quantify the degree to which trees were spatially clustered by species within a city (rather than randomly arranged). To do so, we first clustered all trees within each city using hierarchical density based spatial clustering through the hdbscan library in Python (McInnes et al., 2017). HDBSCAN, unlike typical methods such as “k nearest neighbors”, takes into account the underlying spatial structure of the dataset and allows the user to modify parameters in order to find biologically meaningful clusters. For city trees, which are often organized along grids or the underlying street layout of a city, this method can more meaningfully cluster trees than merely calculating the meters between trees and identifying nearest neighbors (which may be close as the crow flies but separated from each other by tall buildings). In particular, using the Manhattan metric rather than Euclidean metrics improves clustering analysis in cities (which tend to be organized along city blocks). For further discussion of why hdbscan is preferable to other clustering metrics, see (Berba, 2020; Leland McInnes et al., 2016; McInnes et al., 2017).”
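To illustrate the metric choice, consider trees on a street grid aligned with the x/y axes. The sketch below (helper names hypothetical, coordinates in meters) compares Euclidean and Manhattan distance. Note that the Manhattan metric does not detect buildings; it encodes the assumption that travel between trees follows grid-aligned streets, so a diagonal neighbor across a block is treated as farther away than a neighbor the same straight-line distance along the same street.

```python
import math

def euclidean(a, b):
    """Straight-line ("as the crow flies") distance."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a, b):
    """Distance along axis-aligned streets (city-block distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Hypothetical tree positions in meters on an axis-aligned grid
tree = (0, 0)
across_block = (80, 60)  # diagonal neighbor on the far side of a block
same_street = (0, 100)   # neighbor further along the same street

print(euclidean(tree, across_block), euclidean(tree, same_street))
print(manhattan(tree, across_block), manhattan(tree, same_street))  # 140 100
```

Under Euclidean distance both neighbors are 100 m away; under Manhattan distance the same-street neighbor is closer. With the hdbscan library, the metric is selected through the `metric` argument of `hdbscan.HDBSCAN`, for example `hdbscan.HDBSCAN(min_cluster_size=10, metric="manhattan")`, where the `min_cluster_size` value is arbitrary.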

      b. Would you ever expect composition to be truly random either in a city or a natural forest given environmental conditions etc.? In some sense, the ones closest to random are the most surprising. Can you dive into one to give an example of what is going on in that city?

      c. It seems like there are two metrics here: the size of the cluster and the observed/expected EFN per cluster. The latter is analyzed in this paper, but is there any important information in the former? It seems like an interesting structural measurement of the city and possibly useful in its own right.

      d. Are there any target levels of randomness? Could the authors suggest how this might be determined moving forward with their datasets to illustrate this for foresters?

      Great points. We have given a lot of thought to your comments; these are large and interesting questions. In the end, we think these questions fall mostly beyond the scope of this study, but we have added a substantial amount of text to address your comments:

      ○ “Clustering by species is not necessarily a negative, nor indeed should we necessarily expect trees to be randomly arranged (see suggestions for further research in “Future Analyses” section below). Here, we take a first step toward making spatial clustering a metric of interest in city tree planning.”

      ○ “Researchers could also use this dataset to perform a more refined analysis of clustering. For example, what is the biological significance of variation in cluster size (as determined by the hdbscan clustering algorithms)? The size and arrangement of the clusters themselves may be useful metrics. How clustered should we expect trees to be in both wild and urban settings? That is, what are our null expectations? Further, researchers could apply network theory to predict how pest species would proliferate through each of these cities (depending on the spatial arrangement of pest-sensitive trees).”

      F) The statement that this dataset enables "the design of rich heterogenous ecosystems built around urban forests" (Ln 72) seems strange. To my mind this tool will enable a more nuanced evaluation of the urban forests that already exist and suggest ways to target future plantings for increased resilience to climate, pest resistance, biodiversity support etc. I don't understand what ecosystem you would build around and not in the urban forest. If this is what is meant please elaborate. For example, do you mean non-tree installations?

      We agree with you and have changed the text as follows:

      ○ “With these tools, we may evaluate existing city tree communities with more nuance and design future plantings to maximize resistance to pests and climate change. We depend on city trees.”

    1. Author Response

      Reviewer #2 (Public Review):

      According to the authors, the goal is to identify a method for studying changes in the hospital presentation and outcomes of new SARS-CoV-2 variants, using publicly available population-level data on variant relative frequency to infer the variants likely responsible for clinical cases. This would help answer questions asked by public health authorities about differences in disease severity, risk factors, and vaccine protection.

      The authors use patient data collected prospectively in 30 countries, comparing the pre-Omicron period (Omicron accounting for less than 10% of SARS-CoV-2 variants) with the Omicron period (Omicron prevalence >90% of circulating variants). The following factors are analyzed and adjusted for: age/gender, symptoms, comorbidities, vaccination, and outcomes during the pre-Omicron and Omicron periods.

      Their model shows that, overall, patients were younger, had fewer symptoms, and had a lower mortality rate in the Omicron period (even if this is not reflected in some country reports). No conclusion can be drawn on vaccination status.

      Major weaknesses and strengths:

      1) The study is presented as a multi-center international study that includes more than 100,000 patients from 30 countries, however, 96.6% of the study patients originated from 2 countries, South Africa (54%) and the United Kingdom (42.6%) (and the relative contribution of South Africa to the study data was hugely different in the 2 study periods, pre-Omicron and Omicron period).

      The huge imbalance in the number of patients recruited per center could introduce many biases into data interpretation. For example, some countries do not report any increase in patients aged less than 12 years in the Omicron period. Country-specific medians suggest that the younger age of patients after Omicron variant emergence in the combined dataset is at least partially explained by an increase in the proportion of data contributed by South Africa, relative to the proportion contributed by other countries. In total, only 11 countries contributed data on more than 100 hospitalized cases.

      The differences in study data contribution between countries, with more than 90% of all records being from the United Kingdom and South Africa, required both an adapted analytical approach, that transparently presented country-level data rather than only aggregated estimates, and careful discussion of our findings. Indeed, we agree with the reviewer that this imbalance in country-level data contribution and the varying contribution of some countries to the two study periods could lead to erroneous inferences if ignored (i.e. if only aggregated results were reported); for this reason, we presented country-specific data in the Results section. In our descriptive analyses, to achieve this goal without jeopardising intelligibility, we present findings for a subset of countries, those with at least 50 observations per study period; note that this criterion was modified based on another comment from this reviewer. This approach also addresses the reviewer’s concern, which we share, that the varying relative contribution of different countries to study periods could lead to spurious aggregated patterns. In fact, we highlight this problem in the following paragraph of the Results section:

      “The median (IQR) ages of patients during the pre-Omicron and Omicron periods were 62 (43 – 76) and 50 (30 – 72) years, respectively; however, country-specific medians suggest that the younger age of patients after Omicron variant emergence in the combined dataset is at least partially explained by an increase in the proportion of data contributed by South Africa, relative to the proportion of data contributed by other countries (Table S6).”

      Recruitment of patients is unclear. We don't really know which patients are selected to be part of the study. The authors mention the use of the ISARIC (International Severe Acute Respiratory and Emerging Infections Consortium) COVID-19 database (l. 173). This would imply that patients with severe respiratory symptomatic COVID-19 are recruited in the study. It could explain why patients recruited from Brazil or the Netherlands have the same proportion of patients presenting with shortness of breath in the pre- and Omicron period.

      Given the time-sensitivity and scale of this work, which involved hundreds of investigators in 30 countries, the approach used for patient recruitment in each institution was defined by local investigators, although the study included only hospitalised patients with SARS-CoV-2 infection. Whilst the sampling strategy was not uniform across sites, one should keep in mind that: (i) recommendations on sampling strategy were shared with local investigators; and (ii) most of the partner institutions involved in this work had previously contributed data to the ISARIC platform and are experienced in patient recruitment and clinical and epidemiological research.

      More generally, recruitment approaches could influence the interpretation of our findings in two ways: by reducing the representativeness of the study population in each country; and by inducing bias that could affect the association of interest (the association between study period and fatality risk). Regarding the former, it is possible that in some countries hospitals contributing to this effort admitted patients with more severe disease compared to the local population of COVID-19 hospitalised patients, the target population. Regarding the second potential problem, bias, hospital-based studies might suffer from collider bias, where both the exposure of interest and the outcome directly influence recruitment (selection) to the study or are associated with selection or recruitment through confounders; this is a well-described problem in hospital-based studies that assess COVID-19 outcomes (see Griffith et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nature Communications 2020. for a discussion on how different COVID-19 clinical factors can induce bias when different sampling frames are used). Note that collider bias is not the only mechanism of selection bias affecting effect measures; as explained by Miguel Hernán (in Invited Commentary: Selection Bias Without Colliders. American Journal of Epidemiology 2017) between-exposure stratum heterogeneity in the association between the outcome and selection could bias the association between the exposure and the outcome (relative to the effect measure in the target population). However, recruitment approaches used by partner institutions are unlikely to have systematically changed during the study period, and we are unaware of evidence suggesting any association that might have existed between recruitment procedure and outcome differed in the two study periods for most, or indeed some, partner institutions.

      We have now modified the Discussion section to highlight this potential weakness of our study:

      “Another weakness of our study is that recruitment procedure was not standardised and was defined locally. Whilst this likely affected the generalisability of our descriptive estimates (fatality risk and frequencies of symptoms and comorbidities) to local populations of hospitalised COVID-19 cases (Lash and Rothman, Selection Bias and Generalizability. in Modern Epidemiology 4th Edition 2021; Rothman et al. Why representativeness should be avoided. International Journal of Epidemiology 2013), it might not have affected the association between study period and fatality risk, at least not beyond the well-described potential for collider bias in hospital-based studies on COVID-19 outcomes (Griffith et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nature Communications 2020).”

      In Nepal, patients were more often recruited from critical care setting (l.572).

      However, the authors mention elsewhere that patients recruited for the study were:

      • Omicron variant infections in hospitalised patients (l.161),

      • Patients with confirmed or suspected COVID-19 (l.183),

      • "some patients were admitted for a medical condition other than covid19 but tested incidentally during hospitalization (l.243)"

      • In some countries, information on whether COVID-19 was the main reason for hospitalization was also collected: 69.0% of patients admitted during the Omicron period were admitted due to COVID-19, and patients for whom this information was available were primarily from South Africa (94.9%) (l.310)

      • For 35.5% of patients admitted to hospital, the date of symptom onset was missing, and these infections were assumed not to be hospital-acquired (l.233)

      • Information on whether COVID-19 was the main reason for hospitalization was collected during the study period and suggests that, for a non-negligible proportion of patients, other clinical conditions might have prompted hospitalization.

      • In their discussion, the authors state that "Finally it is also possible that the question on the primary reason for hospitalization might have been interpreted differently in different countries and even in different hospitals in the same country." In the few clinical studies from the United Kingdom and South Africa, 40% to 70% of admissions were classified as "incidental" COVID-19.

      This comment relates to the previous comment and to the sampling strategy used in the study. Please, see our response to the previous comment.

      Regarding incidental infections, we have now included information on recent studies (Klann et al. Distinguishing Admissions Specifically for COVID-19 From Incidental SARS-CoV-2 Admissions: National Retrospective Electronic Health Record Study. J Med Internet Res; Voor in ’t holt et al. Admissions to a large tertiary care hospital and Omicron BA.1 and BA.2 SARS-CoV-2 polymerase chain reaction positivity: primary, contributing, or incidental COVID-19. International Journal of Infectious Diseases 2022).

      “One possible explanation for this finding would be if incidental SARS-CoV-2 infections, i.e. infections that were not the primary reason for hospitalisation, were more frequent during the Omicron period; the high transmissibility of this variant, and the consequent peaks in numbers of infections, together with its reported association with lower severity, provides support for this hypothesis. However, in the subset of patients with data on the reason for hospitalisation there was no increase in the proportion of admissions thought to be incidental infections and indeed proportions in both study periods were consistent with frequencies of incidental infections in recent studies in the United States (Klann et al. Distinguishing Admissions Specifically for COVID-19 From Incidental SARS-CoV-2 Admissions: National Retrospective Electronic Health Record Study. J Med Internet Res) and the Netherlands (Voor in ’t holt et al. Admissions to a large tertiary care hospital and Omicron BA.1 and BA.2 SARS-CoV-2 polymerase chain reaction positivity: primary, contributing, or incidental COVID-19. International Journal of Infectious Diseases 2022), although in the latter, non-incidental infections included patients for whom COVID-19 was a contributing but not the main cause of hospitalisation.”

      Absence of data standardization.

      There do not seem to be standardized questionnaires across all countries. Some countries do not report on symptoms; others do not report on vaccination status. In total, it seems that fewer than a third of patients have full data (symptoms, comorbidities, vaccination, and outcome), and such patients are reported by few countries.

      South Africa (which represents 54% of patients) did not systematically report on symptoms. Hence the symptom data shown may, by volume, mainly reflect patients from the United Kingdom. In the United Kingdom, the vaccination rate during the Omicron period was 70.3%, compared with 27.9% in South Africa. The authors find that patients with the Omicron variant display fewer symptoms (which confirms previous findings); however, it would have been equally plausible that patients from South Africa, being less often vaccinated, exhibit more symptoms.

      Each analysis is based on a different subset of patients, depending on the data available for the variables involved.

      Data from South Africa used in this analysis are part of the DATCOV national hospital surveillance database. The case report form (CRF) used by the National Institute for Communicable Diseases in South Africa was adapted from the ISARIC CRF; although most sections of that CRF were used for the data collection in the country, information on symptoms was not systematically collected. However, as mentioned above, in our analysis, we also report country-level frequencies of symptoms, rather than only presenting aggregated estimates. We agree with the reviewer that we cannot exclude the possibility that in South Africa a different pattern occurred. Based on this comment, we have now included the following statement in the Discussion section:

      “Finally, missing information on symptoms for patients from South Africa prevented our descriptive analysis of changes in clinical presentation in an African setting.”

      Vaccination data.

      Vaccination data are available for fewer than 50% of the patients, and there is considerable inter-country variation both in vaccination rates, as we know, and in the recruitment of patients for the study.

      As an example, Table 1 shows the vaccination status by country and study period for 24 countries: Brazil has a vaccination rate of 84.6% and India 34.8%, but these figures are based on only 13 and 23 observations, respectively. Fewer than 30 observations are available for 19 countries in the pre-Omicron period and for 15 countries in the Omicron period. No conclusion can be drawn.

      Our study was not designed to assess vaccine effectiveness against the Omicron and non-Omicron variants as controls (e.g. patients hospitalised with respiratory infection caused by pathogens other than SARS-CoV-2) were not recruited. Whilst we descriptively report the frequency of previous vaccination by country and age groups (see Figure S3 in the Supplementary Appendix, with numbers of records in each category presented for transparency), the primary objective in using vaccination data was to control confounding by this factor. The point made by the reviewer, that missing data on vaccination reduced sample size for this comparison, is valid and we have included the following statement in the Discussion section:

      “We also observed that history of COVID-19 vaccination was more frequent during the Omicron period, although for most countries the number of patients with vaccination information was limited, especially after stratification by age. Whilst this pattern would be expected if current vaccines were less effective against the Omicron variant compared to previously circulating variants, as suggested by a recent study in England analysing symptomatic disease, there were changes in vaccination coverage in many settings during the second half of 2021 and early 2022, including in response to the reports of Omicron variant cases. Since non-COVID-19 patients (e.g., patients with respiratory infections caused by other pathogens) were not systematically recruited for this multi-country study, it is not possible to estimate vaccine effectiveness during the two study periods and assess its change.”

      Major findings of the study:

      Major findings of the study match previous individual-based reports: 1) in many settings, patients hospitalized with Omicron less often presented with commonly reported symptoms compared with patients infected with pre-Omicron variants.

      2) In a mixed-effects logistic model of 14-day fatality risk adjusted for sex, age category, and vaccination status, hospitalization during the Omicron period was associated with a lower risk of death. Similar results were obtained when using 28-day fatality risk and when excluding patients who reported being admitted to hospital due to a medical condition other than COVID-19.

      3) History of COVID-19 vaccination was more frequent during the Omicron period, but the authors cannot draw any conclusions about vaccine effectiveness.

      How should these data be interpreted? The impact of new variants on disease severity has been shown to be context-specific, owing to regional differences in previous exposure, vaccination rates, and the prevalence of comorbidities in the population. Because of recruitment bias and small recruitment in some countries, the findings described for several countries do not fit the overall conclusions.

      As mentioned by the authors, the strength of the project lies in having engaged so many countries to work together. This could definitely help in the future to understand the characteristics of new variants shared globally, and to identify the country-specific impact of these variants according to the history of previous variant exposure, vaccine coverage, population morbidity, and access to health care.

      Reviewer #3 (Public Review):

      The authors combine outcomes data from patients hospitalised with COVID-19 across 30 countries to investigate differences in the likelihood of death from the Omicron variant vs pre-Omicron variants. Data are from the ISARIC COVID-19 database; variant status is inferred from country-specific GISAID data. The principal finding is a 36% reduced risk of 14-day death in the Omicron period (OR 0.64 (0.59 - 0.69)) compared with the pre-Omicron period, after multiple adjustment.

      The strengths of this paper are the large N and large number of participating countries from different regions, and also the careful and thorough analytical approaches. The main findings are stress-tested through a range of sensitivity analyses using different variant-dominance thresholds and statistical approaches and found to be robust. The figures are clear, well-chosen and easily interpretable.

      The principal weaknesses, as acknowledged in the discussion, are the imbalance in the data sources (96.6% of the observations came from GBR or SA) and the low fidelity of the vaccination data (vaccination status is limited to a binary 'one or more vaccinations received Y/N' variable). The latter means that conclusions about the innate severity of Omicron vs pre-Omicron variants cannot be drawn.

      Nonetheless the findings represent a useful contribution to the literature on the severity of COVID-19 variants, and the approach establishes a template for rapid international collaboration, using GISAID data to infer variant status, that will be useful for formulating policy in response to new variants in the future.

      The limited data on timing of vaccination and number of previous doses imply that residual confounding could partially explain the observed association; we mention this limitation in the Discussion section. Although our data alone cannot provide sufficient evidence for differences in innate severity between variants, mechanistic studies (see Shuai et al. Attenuated replication and pathogenicity of SARS-CoV-2 B.1.1.529 Omicron. Nature 2022, and Halfmann et al. SARS-CoV-2 Omicron virus causes attenuated disease in mice and hamsters. Nature 2022) suggest the Omicron variant might be less virulent. We modified the following paragraph in the Discussion section:

      “All these factors might have contributed to the observed association, possibly to different degrees in different countries, reason for which this result should not be assumed to necessarily relate to the differences in variant virulence previously suggested by mechanistic studies (Shuai et al. Attenuated replication and pathogenicity of SARS-CoV-2 B.1.1.529 Omicron. Nature 2022; Halfmann et al. SARS-CoV-2 Omicron virus causes attenuated disease in mice and hamsters. Nature 2022).”

  3. Aug 2022
    1. Author Response

      Reviewer #2 (Public Review):

      The time-dependency of the model simulations was not analyzed, and the nature of the observed biphasic time-dependent APAP response remains elusive. It would be interesting to see how the model can explain the time course of the APAP stimulation experiment.

      In its current state, the alternative model can only describe steady-state conditions. We understand that the reviewer is interested in the dynamic behavior of the model. However, our approach provides a proof of principle that the alternative model can phenomenologically explain the changes in YAP localization in response to APAP treatment. Modeling the Hippo pathway in a time-dependent manner in response to APAP treatment is very challenging and would require further investigation and, most notably, further development of the PDE simulation algorithms and the SME software. Such a technical update of the software algorithms is beyond the scope of this manuscript.

      Nevertheless, we decided to share our first, preliminary analyses of the dynamic processes caused by APAP with the reviewer. For this, we simulated the steady-state model in an arbitrary manner, in which APAP increases (early time point) and decreases (late time points) YAP phosphorylation in the nucleus (see Figure below).

      The simulated alternative model shows that increasing YAP phosphorylation by about 50% leads to cytoplasmic localization of YAP (Rebuttal Figure R5A/B). However, this shuttling is not detectable in our protein fractionation and live-cell imaging experiments (see also Rebuttal Figure R7C/D). At late time points, decreasing YAP phosphorylation (by about 60%) led to a clear nuclear enrichment, and dephosphorylation of YAP was observed in our experiments. Thus, our mathematical model describes well the Hippo pathway dynamics observed at later stages after APAP treatment (nuclear enrichment). However, early events cannot be completely explained (the predicted nuclear exclusion of YAP is not detectable).

      We suggest two explanations for this observation. First, other molecular mechanisms (not yet identified and therefore not part of the model topology) oppose the nuclear exclusion of YAP that is expected at early time points. Second, the detection methods used in this study (Western blotting and live-cell imaging) cannot capture small changes and cellular heterogeneity in the chosen experimental setup. We clarify this limitation of our study in the Discussion section of the manuscript (page 12, lines 436-440).

      Time-dependency of YAP (orange) localization based on the simulated APAP treatment. (A): Simulated control (ctrl) and APAP treatment for 2 and 48h. The treatment was simulated by changing the phosphorylation coefficient of YAP in the nucleus. (B): Simulated pYAP/YAP ratio during control and APAP treatment for 2 and 48 hours at the steady state of the model. (C): Simulated NCR of the total YAP during control and APAP treatment for 2 and 48 hours at the steady state.

    1. Author Response

      Reviewer #1 (Public Review):

      This study is a follow-up to the authors' previous work establishing a surprising role for presynaptic adhesion molecules, neurexin (Nrxn) variants containing the SS4+ splice site, in differentially controlling postsynaptic NMDA and AMPA receptors by forming links through a shared system of extracellular cerebellins (Cbln) and postsynaptic GluD1. Here the authors show, at CA1-to-subiculum synapses, that the role of Cbln2 in mediating the effects of Nrxn1-SS4+ and Nrxn3-SS4+ in enhancing NMDARs and suppressing AMPARs, respectively, is redundant with that of Cbln1. Moreover, Cblns do not appear to play a role in synapse formation. Dai and colleagues also extend their previous work by highlighting a common function for the Nrxn-Cbln signaling system across different synapses, albeit with subtle differences, and point to a lack of a role for Nrxn-Cbln signaling in morphological synapse development. Overall the data are solid, while the key findings are mostly incremental. The basis for the selectivity in the observed differential regulation of AMPARs and NMDARs, via the same trans-synaptic link through Cblns at various types of synapses, remains to be clarified. Importantly, the authors draw a definitive conclusion about the lack of a role for Nrxn-Cbln signaling complexes in synapse formation during development. Nevertheless, this is a contentious issue, and as such, the conclusions could be more compellingly supported with further experiments.

      We appreciate the reviewer’s positive assessment of our study.

      Reviewer #2 (Public Review):

      In this manuscript, Dai et al. investigated the role of Nrxn-Cbln complexes in regulating AMPA- and NMDA-receptor function in different brain regions. Using a combination of genetic manipulations with electrophysiological and biochemical assays, the authors showed that, at CA1-subiculum synapses, Cbln2 regulates NMDA and AMPA receptors via Nrxn1SS4+-Cbln2 and Nrxn3SS4+-Cbln2 signaling complexes, respectively. In the prefrontal cortex, only Nrxn1SS4+-Cbln2 signaling-dependent regulation of NMDA receptors occurs, while in the cerebellum only Nrxn3SS4+-Cbln1 signaling-dependent regulation of AMPA receptors occurs. This systematic investigation of the function of different neurexin-cerebellin signaling complexes contributes to our understanding of how different members of the same family, in combination pairs, regulate synaptic transmission with circuit specificity. This work adds to the authors' systematic investigation of the molecular mechanisms regulating synaptogenesis, synaptic transmission, and synaptic plasticity.

      We thank the reviewer for the positive and astute comments.

      Some suggestions for clarifications:

      1) Regarding expression of Cbln1 in the subiculum, in lines 271-273 the authors stated that "in these and earlier experiments we only studied Cbln2, but quantifications show that Cbln1 is also expressed in the subiculum, albeit at lower levels (Figure S3)." However, Figure S3 does not include any quantifications, and the example image does not show visible Cbln1 expression. Thus, this statement is inconsistent with the data presented. Please revise. If the authors would like to keep the statement about quantifications of Cbln1, quantification should be provided for all panels of this figure, so that readers can get an idea of the relative expression levels.

      We agree, and have addressed this issue as described above (introductory point 4).

      2) Does Cbln4, which is also broadly expressed in the brain, play a role in regulating AMPA and NMDA receptors at the synapses investigated? Does Cbln3 contribute to the regulation of synaptic transmission in the cerebellum? Please discuss.

      Cbln4 is not expressed in the subiculum, but is expressed in the PFC. Specifically, Cbln1, Cbln2, and Cbln4 are broadly expressed in brain, whereas Cbln3 is restricted to cerebellar granule cells and requires Cbln1 or Cbln2 for secretion (Bao et al., 2006; Miura et al., 2006). Remarkably, Cbln1, Cbln2, and Cbln4 are not uniformly expressed in all neurons, but synthesized in restricted subsets of neurons (Seigneur and Südhof, 2017). For example, cerebellar granule cells express high levels of Cbln1 but only modest levels of Cbln2, excitatory entorhinal cortex (EC) neurons express predominantly Cbln4, and neurons in the medial habenula (mHb) express Cbln2 or Cbln4 (Seigneur and Südhof, 2017).

      Cbln4 is poorly studied, and Cbln3 has not been functionally studied at all. To the best of our knowledge, there are only four studies on Cbln4 function, three of which are from our lab. The Seigneur and Südhof (2018) paper showed that the deletion of Cbln4 in a large number of brain regions caused no change in excitatory or inhibitory synapse numbers. Subsequently, the Seigneur et al. (2018) paper demonstrated that genetic deletion of Cbln4 in the mHb had no major effect on synapse numbers, but because of the limits of this preparation (synaptic transmission is hard to monitor in the mHb), no detailed synaptic studies were done. The Fossati et al. (2019) paper in Neuron shows that Cbln4 regulates inhibitory synapse numbers in the cortex by binding to GluD1, but this study depended on RNAi, not genetic manipulations. Its results are puzzling because structural biology studies have shown that Cbln4 does not bind to GluD2, which is highly homologous to GluD1 and has the same function as GluD1. Instead of binding to GluDs, Cbln4 binds to another class of receptors, Neogenin-1 and DCC, making the Fossati et al. (2019) paper difficult to interpret. Finally, the Liakath-Ali et al. (2022) paper demonstrated that deletion of Cbln4 in the EC or deletion of Neo1 in the dentate gyrus (DG) blocks long-term potentiation at EC→DG synapses but does not change basal synaptic transmission or synapse numbers, again consistent with the notion that Cbln4 regulates synapse properties similarly to Cbln1 and Cbln2.

      We have now described these studies in the introduction to the paper. Many synaptic proteins are the subject of contentious studies in the literature, and we completely concur that it is essential to discuss the issues evenhandedly and in detail, even if this expands the size of a paper.

      Reviewer #3 (Public Review):

      In this study, Dai and colleagues used genetic models combined with electrophysiological recordings and behavior, as well as immunostaining and immunoblotting, to investigate the role of trans-synaptic complexes involving presynaptic neurexins and cerebellins in shaping the function of central synapses. The study extends previous findings from the same authors, as well as from other groups, showing an important role of these complexes in regulating the function of central synapses. Here, the authors sought to achieve two main objectives: (1) investigating whether their previous findings obtained at mature CA1→subiculum synapses (Aoto et al., 2013; Dai et al., Neuron 2019; Dai et al., Nature 2021) extend to different synapse subtypes in the subiculum and to other central synapses, including cortical and cerebellar synapses, and (2) investigating whether Nrxn-Cbln-GluD trans-synaptic complexes play a role in synapse formation, as previously proposed by other groups.

      Overall, the study provides interesting and solid electrophysiological data showing that different Nrxns and Cblns assemble trans-synaptic complexes that differentially regulate AMPAR- and NMDAR-mediated synaptic transmission across distinct synaptic circuits (most likely through binding to postsynaptic GluD receptors).

      We appreciate the reviewer’s accurate and positive assessment of our study.

      However, the study has several important weaknesses:

      1) The novelty of the findings appears limited. Indeed, previous studies from the same authors with similar experimental paradigms and readouts already demonstrated the role of Nrxn-Cbln-GluD complexes in regulating AMPARs versus NMDARs in mature neurons (Aoto et al., Cell 2013; Dai et al., Neuron 2019; Dai et al., Nature 2021). Moreover, the absence of a role for Cblns and GluD receptors in synapse formation was already suggested in previous studies from the same authors (Seigneur and Südhof, J Neurosci 2018; Seigneur et al., PNAS 2018; Dai et al., Nature 2021).

      Not surprisingly, we disagree with this comment. We do concur that our data are consistent with previous studies, but we believe that this reproducibility is a strength, since so much of the data in the literature is irreproducible.

      We do not agree, however, that our findings lack novelty. The novelty is admittedly limited (after all, we like to be consistent), but our paper is the first to demonstrate that the Nrxn1/Cbln/GluD and Nrxn3/Cbln/GluD complexes are differentially active in different synapses: subiculum synapses have both, mPFC synapses only the former, and cerebellar synapses only the latter. This is a very important innovation that illustrates the power of the Nrxn/Cbln/GluD signaling complex in shaping synapses. In addition, our paper is the first to analyze a possible developmental function of Cbln2 in depth, to analyze its differential role at the two dominant types of pyramidal neurons in the subiculum (regular- and burst-spiking neurons), to analyze conditional deletions of Cbln1 in the adult cerebellum, and to directly measure the effect of Cbln2 deletions in the PFC. Especially in view of the recent Nature paper that concluded that Cbln2 regulates spine numbers in the PFC, these findings are highly relevant.

      2) The authors' conclusion that Nrxn-Cbln-GluD trans-synaptic complexes do not play a role in synapse formation/development is not sufficiently supported by their data, while previous studies suggest the opposite. This conclusion is essentially based on the following two measurements taken as a 'proxy' for synapse density: (1) 'the average vGluT1 intensity calculated from the entire area of subiculum' and (2) the 'synaptic proteins levels' assessed by immunoblotting. Neither of these measurements (both performed only in the subiculum) allows a precise assessment of synapse density on the neurons of interest. The average vGluT1 intensity over large fields of view does not directly reflect synapse density and does not take into account the postsynaptic compartment, while the immunoblotting data only reflect the overall expression of synaptic proteins, without discriminating between intracellular, surface, and synaptic pools, or between cell types. In the subiculum of Cbln1+2 KO mice, the authors performed mEPSC recordings and found an increase in frequency. However, this increase may reflect the unsilencing and/or potentiation of AMPAR-EPSCs above the detection threshold, irrespective of the actual synapse number. Finally, the authors do not discuss the decrease in NMDAR-EPSCs, although it could actually reflect a decrease in synapse number.

      We agree that additional data on synapse numbers are helpful for our paper. We have now performed these studies as described in detail in our response to introductory point 1 above. However, we would also like to refer to the already existing body of evidence on the role of neurexin-based complexes in synapse numbers. We have shown in papers published over the last two decades that deletions of individual neurexins or of multiple neurexins, as well as blocking cerebellin binding to neurexins by ablating splicing site #4 (SS4) in neurexins, have NO effect on synapse numbers. The most important of these papers are:

      1. Missler, M., Zhang, W., Rohlmann, A., Kattenstroth, G., Hammer, R.E., Gottmann, K., and Südhof, T.C. (2003) α-Neurexins Couple Ca2+-Channels to Synaptic Vesicle Exocytosis. Nature 423, 939-948.
      2. Kattenstroth, G., Tantalaki, E., Südhof, T.C., Gottmann, K., and Missler, M. (2004) Postsynaptic N-methyl-D-aspartate receptor function requires α-neurexins. Proc. Natl. Acad. Sci. U.S.A. 101, 2607-2612.
      3. Dudanova, I., Tabuchi, K., Rohlmann, A., Südhof, T.C., and Missler, M. (2007) Deletion of α-Neurexins Does Not Cause a Major Impairment of Axonal Pathfinding or Synapse Formation. J. Comp. Neurol. 502, 261-274.
      4. Etherton, M.R., Blaiss, C., Powell, C.M., and Südhof, T.C. (2009) Mouse neurexin-1α deletion causes correlated electrophysiological and behavioral changes consistent with cognitive impairments. Proc. Natl. Acad. Sci. U.S.A. 106, 17998-18003.
      5. Soler-Llavina, G.J., Fuccillo, M.V., Ko, J., Südhof, T.C., and Malenka, R.C. (2011) The neurexin ligands, neuroligins and LRRTMs, perform convergent and divergent synaptic functions in vivo. Proc. Natl. Acad. Sci. U.S.A. 108, 16502-16509.
      6. Aoto, J., Martinelli, D.C., Malenka, R.C., Tabuchi, K., and Südhof, T.C. (2013) Presynaptic Neurexin-3 Alternative Splicing Trans-Synaptically Controls Postsynaptic AMPA-Receptor Trafficking. Cell 154, 75-88. PMCID: PMC3756801.
      7. Aoto, J., Földy, C., Ilcus, S.M., Tabuchi, K., and Südhof, T.C. (2015) Distinct circuit-dependent functions of presynaptic neurexin-3 at GABAergic and glutamatergic synapses. Nat Neurosci. 18, 997-1007.
      8. Anderson, G.R., Aoto, J., Tabuchi, K., Földy, C., Covy, J., Yee, A.X., Wu, D., Lee, S.-J., Chen, L., Malenka, R.C., Südhof, T.C. (2015) α-Neurexins Control Neural Circuit Dynamics by Regulating Endocannabinoid Signaling at Excitatory Synapses. Cell 162, 593-606. PMCID: PMC4709013
      9. Chen, L.Y., Jiang, M., Zhang, B., Gokce, O., and Südhof, T.C. (2017) Conditional Deletion of All Neurexins Defines Diversity of Essential Synaptic Organizer Functions for Neurexins. Neuron 94, 611-625. PMCID: PMC5501922
      10. Dai, J., Aoto, J., and Südhof, T.C. (2019) Alternative Splicing of Presynaptic Neurexins Differentially Controls Postsynaptic NMDA- and AMPA-Receptor Responses. Neuron 102, 993-1008. PMCID: PMC6554035
      11. Luo, F., Sclip, A., Jiang, M., and Südhof, T.C. (2020) Neurexins Cluster Ca2+ Channels within the Presynaptic Active Zone. EMBO J. 39, e103208. PMCID: PMC7110102
      12. Khalaj, A.J., Sterky, F.H., Sclip, A., Schwenk, J., Brunger, A.T., Fakler, B., and Südhof, T.C. (2020) Deorphanizing FAM19A Proteins as Pan-Neurexin Ligands with an Unusual Biosynthetic Binding Mechanism. J. Cell Biol. 219, e202004164
      13. Luo, F., Sclip, A., and Südhof, T.C. (2021) Universal role of neurexins in regulating presynaptic GABAB-receptors. Nature Comm. 12, 2380. PMCID: PMC8062527
      14. Wang, C.Y., Trotter, J.H., Liakath-Ali, K., Lee, S.J., Liu, X., and Südhof, T.C. (2021) Molecular Self-Avoidance in Synaptic Neurexin Complexes. Science Advances 7, eabk1924. PMCID: PMC8682996
      15. Dai, J., Patzke, C., Liakath-Ali, K., Seigneur, E., and Südhof, T.C. (2021) GluD1, A signal transduction machine disguised as an ionotropic receptor. Nature 595, 261-265. PMCID: PMC8776294

      Individual papers may not convince the reviewer, but we hope that the cumulative evidence is persuasive. We have published less evidence on the lack of a role for cerebellins and GluDs in synapse numbers than for neurexins, but the only in-depth studies of these molecules by others are in the cerebellum. Here, deletions of Cbln1 and GluD2 indeed cause a significant, albeit partial, loss of synapses. However, this loss may not be due to a lack of synapse formation but to an elimination of synapses that have already formed, as demonstrated by many beautiful papers from leading investigators. It is regrettable that reviews and textbooks continue to state as an established fact that cerebellins mediate synapse formation; as far as we can see, there is limited evidence for that conclusion, yet it keeps coming back again and again.

      3) The authors do not provide sufficient data in order to interpret the increase in AMPAR-EPSCs and decrease in NMDAR-EPSCs amplitudes. Are the changes in AMPARs and NMDARs occurring at pre-existing synapses or do they result from alterations in the number of physical synapses and/or active synapses (see point#2)? In particular, the increase in AMPAR/NMDAR ratio accompanied by the increase in mEPSCs frequency might be well explained by the unsilencing of some synapses and/or by the fact that the available pool of AMPARs is distributed over a smaller number of synapses, resulting in higher quantal size. These effects could explain the blockade of LTP, i.e., through an occlusion mechanism.

      We addressed these points in previous studies (Aoto et al., 2013; Dai et al., 2019 and 2021), as discussed and cited in the present paper, and expanded on these points in the present paper.

      In a nutshell, we showed by surface AMPAR staining that presynaptic Nrxn3-SS4+ decreases postsynaptic AMPAR levels, and by direct application of AMPA that it decreases the functional surface levels of AMPARs, whereas presynaptic Nrxn1-SS4+ increases the functional surface levels of NMDARs. We also demonstrated the opposite effects for the GluD1 KO, and furthermore showed by minimal stimulation experiments that the Cbln2 deletion does not alter the number of silent synapses. In the present manuscript, we performed a detailed analysis of the miniature quantal size for AMPAR- and NMDAR-EPSCs.

      Finally, we have demonstrated in a large number of papers, including this one, that genetic manipulations of neurexins, cerebellins, and GluDs do not alter synapse numbers, with a few exceptions in which synapses are secondarily eliminated, as in the cerebellum. Together, these data show that the observed changes are mediated by a regulation of postsynaptic functional AMPARs and NMDARs, not by alterations in synapse numbers or by synapse silencing/unsilencing.

      4) The authors did not demonstrate (or cite relevant studies showing) that the deletion of Cbln1 and/or Cbln2 does not affect the expression of the remaining Cbln isoforms (Cbln2 and/or Cbln4) or of Nrxn1/3 and GluD1/2. This verification is important to rule out any compensatory effect.

      To address this point, we have now measured the mRNA expression levels of Nrxns, Cblns, and GluDs in both the subiculum and the prefrontal cortex of littermate P35-42 Cbln2 WT and KO mice. The results show that the constitutive Cbln2 deletion causes no compensatory expression changes (new Suppl. Fig. S5). Please note that compensatory expression effects are often raised as a possible explanation for genetically induced changes (or the lack thereof), but such effects are virtually never found.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors try to shed light on how plant stem cells located in a ring-like structure (the procambial cells or cambium) can generate two distinct differentiated tissues, one filling the interior of the ring (the xylem) and the other surrounding the ring (the phloem). To achieve this goal, the authors propose models of increasing complexity and compare the model outcomes with experimental data from the Arabidopsis hypocotyl. This work seems to provide, for the first time, a computational framework to model the radial formation of the cambium, xylem, and phloem in the hypocotyl. Some features of the wild type and mutants could be qualitatively recapitulated, such as the radial organization of the xylem, cambium, and phloem in the wild type, and a striking phenotype upon overexpression of a CLE41 transgene.

      We thank the reviewer for appreciating the novelty of this work.

      Although this work is very well written and understandable in the introduction, careful attention to the presented results reveals several aspects that would require further work and investigation, on both the experimental and modelling sides. The authors chose to study models of increasing complexity, reaching a more complete model (Model 3, Figure 5A-D) that, they claim, recapitulates the experimental data and the explored experimental perturbations (Figure 5E-F). This model is substantially more complex than Model 1 and Model 2, and it is difficult to understand all of the authors' claims and the model's radial pattern formation capabilities. Yet a feature that is clear to the eye, both in the pictures and in the movies, is that this model seems prone to an instability of the cambium front progression, disrupting the radial organization of the different tissues (see Figure 5B), which does not seem to happen in the wild type Arabidopsis hypocotyl. This effect is even more extreme in the pxy mutant (Figure 5F) and when the xylem cell wall thickness is varied in the simulations (Figure 6). The authors claim this model is able to recapitulate a basic feature of the pxy mutant, namely that the distal cambium appears in patches. Although these patches appear in the simulations, this effect in the model might be produced by the instability of the cambium front progression itself, which might be fundamentally different from what happens in the experimental data. In the experimental data, the PXYpro:CFP cambium does not seem to show such front instability; rather, it is the xylem that becomes fragmented. To link Model 3 to the pxy mutant, a careful study of the different stages of this phenotype, on both the modelling and experimental sides, would be useful.

      Thanks for this valuable comment and for appreciating our writing style. Front stability was not part of our considerations but certainly adds a very interesting aspect to our study. The reviewer is correct in noticing that the front of the domains observed in planta is very stable, whereas this is not the case in our computational simulations. We believe that instability in the computational models is due to local noise in the cellular pattern, leading to differential diffusion of chemicals with respect to radial position and to a progressive deviation of the domain from a perfect circle. Such a deviation seems to be corrected by an unknown mechanism in planta, but because its nature is unclear, no such corrective mechanism is implemented in our models. To investigate this point and the contribution of front instability to the phenotypes of perturbed lines, we performed a time course analysis of the anatomies of wt, IRX3pro:CLE41, and pxy lines with the help of the PXYpro:CFP/SMXL5pro:YFP markers, now shown in Fig. S1, and compared their dynamics to the respective movies 4A, 5A, and 6A. For pxy mutants, we observed ‘gaps’ in the cambium domain already at early stages of development (Fig. S1I, J). This argues against the idea that the pxy anatomy is caused by increased front instability, and instead points to differential signaling within a circular domain that leads to a breakdown of cambium patterning and cell fate determination. Although a corrective mechanism ensuring front stability in planta is difficult to predict, we believe that our model now allows such ideas to be tested, for example directional movement of chemicals or stabilizing communication between cells within a particular circular domain. This aspect is now addressed in the Discussion.

      The authors use a parameter search strategy based on matching the proportion of cell types in Model 3. I wonder how effective this strategy is in a system where these features evolve over time, especially in Model 3, which seems to show a front instability. Moreover, this strategy does not reveal anything about the robustness of the model in recapitulating the different features of the pattern.

      We thank the reviewer for pointing out these aspects regarding the parameter search. We agree that there are some limitations to estimating dynamic parameters based on the proportion of cell types. As a consequence, we have focused our parameter search on those parameters that directly impact tissue formation: cell division thresholds, cell differentiation thresholds and maximal cell sizes. We have further expanded our parameter search until we obtained five distinct parameter sets that recapitulate central features of cambium activity. This increases the likelihood that the behavior we saw in the subsequent analyses was actually a feature of the system and not a characteristic of that particular parameter set. This strategy did not solve the front instability of model 3, which suggests that there are factors at play ‐ beyond the CLE41‐PXY module and cell wall stability – which are currently beyond the scope of our model.

      In the last model, the authors try to link cell wall thickness with the radiality of the divisions. Although the idea of looking at division trajectories seems interesting, more clarity is needed on how helpful the radiality measure is, and perhaps a better measure is needed. Note that the proliferation trajectory in Figure 6C might have the same number of ramifications as in Figure 6B, so effectively the number of periclinal divisions might be the same in both cases. The authors claim that increased xylem thickness contributes to more radial growth, but this could be related to the cambium front instability, which also seems to be more pronounced for higher xylem thickness.

      We agree with the reviewer that this is a critical point, as a robust measure of the ‘radiality’ of cell lineages is central to assessing the degree of pericliniality of cell divisions in the computational model. After extensively considering different measurement methods, we think that calculating the R² of cell connectors is the most appropriate and quantitative approach in the context of our computational model. This method does not consider the number of ramifications but rather the geometry of ‘cell connectors’, which clearly shows a more ‘radial’ pattern of cell lineages when xylem cells are ‘stiffer’ (Fig. 6D). Ramifications would measure the number of cell divisions, which we did not want to target in this case. We also did not claim that increased xylem thickness leads to more radial growth; in fact, Fig. S4 shows rather the opposite. We expect that the increased front instability seen when ‘xylem stiffness’ is increased would, if anything, decrease the radiality of cell lineages and mask the respective positive effects. The fact that we still see increased ‘radiality’ argues against front instability being causative.

      On the experimental side, the claims about the proximal and distal cambium, together with the cell proliferation data, are not well supported by the data presented in Figures 2, 3A and S1. Moreover, these figures seem to show different behaviors; are these sections from different stages of the hypocotyl? Also, seeing more of the H4 marker in one region of the tissue does not necessarily indicate a higher proliferation rate (cells could simply be more synchronized in S phase in that region of the cambium, and/or the cell cycle could last longer in that part of the tissue). Quantification and proper repeats to support these claims are lacking. A quantitative and more extensive study of the pxy mutant would enable a better comparison with the simulated model. Is there PXYpro:CFP expression between the fragmented xylem?

      We agree with these concerns about the H4 marker used in the initial submission. Because H4 expression is associated not specifically with cell division but with DNA synthesis in general, and thus also with endoreduplication, it does not faithfully report on cell division. In response, we removed the related figures and now reference our previous study characterizing cell division levels in different cambium domains based on cell lineage analyses (Shi et al., 2019). Because this is a far more reliable analysis and convincingly supports our claims, we believe we have thereby addressed this concern. As mentioned above, we also added a more extensive analysis of the pxy mutant (Fig. S1) showing that there is no PXY expression between the fragmented xylem domains.

      This work might help advance the understanding of radial patterning in plants. The introduction and the first models could attract a more general plant audience, but as the models increase in complexity, the narrative and presented results become more relevant to scientists specializing in xylem and phloem formation.

      We thank the reviewer for appreciating the general relevance of our models for a larger audience.

      Reviewer #2 (Public Review):

      The paper uses computer modeling and simulations to show how a radially growing circular plant organ, such as a hypocotyl, can develop and maintain its organization into tissues including, in particular, cambium, xylem and phloem. The results are illustrated with useful movies representing the simulations. The paper is organized as a sequence of models, which has some rationale (it presumably depicts the path of refinements through which the authors arrived at the final model), but the intermediate steps are of limited interest. At the same time, the mathematical details of the models are not presented in full. Fortunately, the models can be downloaded over the Internet, and the supplementary materials include detailed instructions for executing them (using the VirtualLeaf framework). Consequently, the paper and its results can potentially serve as a stepping stone for further model-assisted studies of radial tissue organization and growth.

      Again, we thank the reviewer for appreciating the usefulness of our model and its general implications. In the revised version of the manuscript, we substantially expanded the explanations of the mathematical details in the main text and the supplemental methods. We would still argue that the intermediate steps are of general interest, as they illustrate why certain assumptions that are extensively discussed in the field were rejected, thereby providing important justification for the final model.

    1. Author response

      Reviewer #3 (Public Review):

      Sensory preconditioning (SPC) refers to a conceptually important, higher-order form of Pavlovian conditioning. It involves two training phases and a final test. In the first (preconditioning) training phase, two 'neutral' stimuli are presented together (S1, S2). In the second training phase, one of them is paired with, for example, a punishment (S1+). In the final test, the conditioned response to the other stimulus (S2) is assessed.

      The conclusion that sensory preconditioning does indeed occur requires showing i) that conditioned responding is observed for S2 but not for other, non-preconditioned stimuli (S3); ii) that conditioned responding to S2 depends on the joint presentation of S1 and S2; and iii) that conditioned responding to S2 depends on S1 indeed being paired with punishment. It is a strength of the current paper that these requirements are met, both at the behavioural level and for a plausible stand-in at the physiological level.

      A weakness is that key data belonging together are not shown and analysed together.

      We have rearranged the data.

    1. Author Response

      Reviewer #1 (Public Review):

      Mikelov et al. investigated IgH repertoires of memory B cells, plasmablasts, and plasma cells from peripheral blood collected at three time points over the course of a year. In order to obtain deep and unbiased repertoire sequences, the authors adopted a uniquely developed IgH repertoire profiling technology. Based on the collected peripheral blood data, the authors claim:

      1) A high degree of clonal persistence in individual memory B cell subsets with inter-individual convergence in memory and ASCs.

      2) ASC clonotypes are transient over time and related to memory B cells.

      3) Reactivation of persisting memory B cells with new rounds of affinity maturation during proliferation and differentiation into ASCs.

      4) Both positive and negative selection contribute to persisting and reactivated lineages preserving the functionality and specificity of BCRs.

      The present study provides a useful technical application for the analysis of longitudinal B cell repertoires, and the bioinformatics and statistical data analyses are impressive. Regarding point 1), clonal persistence of memory B cells is already well known. On the other hand, inter-individual convergence between memory B cells and plasma cells may not have been shown previously in healthy individuals, although the biological significance of circulating plasma cells is questionable.

      We thank the reviewer for careful analysis of our manuscript and are grateful for the positive view and all the criticism of our study.

      To the best of our knowledge, the clonal persistence of memory B cells has previously been studied mostly in the context of active immune responses after natural challenge or immunization. Here we used the full set of modern experimental and analytical repertoire sequencing approaches to characterize the connection and dynamics of memory B cells and the two antibody-secreting B cell subpopulations over a long period in healthy donors, i.e., in donors without severe inflammatory disease who had not mounted an intensive response against a natural antigen close to the sampling time points. In other words, we carefully dissected the repertoire of antigen-experienced peripheral blood B cells in the normal state. Thus, we believe that our study brings a number of essentially new details to the overall picture of B cell immunity.

      By assessing intra- and inter-individual repertoire overlaps, we found high reproducibility of B cell memory clones between time points, only slightly lower than the overlap between replicates. About 5% of the largest clonotypes were identical (Fig. 2B, left), whereas the V usage distribution changed more substantially over time (Fig. 2A, left), suggesting a contribution of non-persistent memory IGH clonotypes. Compared with the intra-individual reproducibility, the number of clonotypes shared between unrelated donors was extremely low but still detectable, showing a contribution of convergent clonotypes to the overlap of antigen-experienced B cell repertoires between unrelated donors. Together, our findings show a high level of individuality of the IGH repertoire of antigen-experienced B cells, while common challenges drive some convergence at the level of the most expanded clones, which are extremely stable (persistent) over time. During the transition from naive to antigen-experienced B cells, the germline-encoded sequences of CDR1 and CDR2 have an impact that is similar between individuals with similar genetic and environmental backgrounds. This further supports previously reported findings on the role of germline-encoded parts of IGH in the response against specific antigens (Collins et al. DOI: 10.1016/j.coisb.2020.10.011).
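      To make the overlap of the largest clonotypes concrete, here is a minimal sketch of one possible top-N overlap measure. It assumes each sample is a mapping from a clonotype key (for example, a CDR3 amino-acid sequence) to its count; the function names and toy data are hypothetical, and the exact metric behind Fig. 2B may differ.

```python
from collections import Counter


def top_n(counts, n):
    """Return the keys of the n most abundant clonotypes in a {clonotype: count} mapping."""
    return {clonotype for clonotype, _ in Counter(counts).most_common(n)}


def top_n_overlap(sample_a, sample_b, n):
    """Fraction of sample A's top-n clonotypes that are also among sample B's top-n.

    Hypothetical metric for illustration only; ties in abundance are broken by
    Counter.most_common ordering.
    """
    top_a = top_n(sample_a, n)
    top_b = top_n(sample_b, n)
    if not top_a:
        return 0.0
    return len(top_a & top_b) / len(top_a)


# Toy clonotype counts for two time points (made-up values).
t1 = {"CARDY": 50, "CAKGG": 30, "CTTWS": 20, "CARRF": 5}
t2 = {"CARDY": 40, "CVVNN": 35, "CTTWS": 10, "CAKGG": 2}
print(top_n_overlap(t1, t2, 2))  # one of the two top clonotypes is shared
```

      Comparing replicates with this kind of measure, rather than only time points, gives the baseline against which between-time-point persistence can be judged.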

      Regarding 2), the temporal stability of plasma cell clonotypes has already been demonstrated in the bone marrow with serial biopsies over time (Wu et al. DOI: 10.1038/ncomms13838). The association of clonotypes between memory and plasma cells in the blood of healthy donors might be new; however, again, its biological significance is questionable.

      Long-term stability of plasma cells has previously been shown by a number of studies demonstrating the presence of antigen-specific clones, or even cells, for months and years in human bone marrow and other sites, as well as in mice and primates (Wu et al. DOI: 10.1038/ncomms13838; Landsverk et al. DOI: 10.1084/jem.20161590; Manz et al. DOI: 10.1038/40540; Hammarlund et al. DOI: 10.1038/s41467-017-01901-w; Xu et al. DOI: 10.7554/eLife.59850; Davis et al. DOI: 10.1126/science.aaz8432). We agree that BM samples would add an additional layer to our investigation by describing the interconnection of the B cell memory pool with BM PCs. We also agree that the nature of circulating plasma cells is not fully clear at the moment and that the relation of such cells/clones to BM PCs remains to be detailed. However, we cannot agree with the reviewer’s remark about the low (or absent) biological significance of circulating ASCs. According to the current view, arising from a large number of studies conducted over the past several decades in mice, humans and other organisms, differentiation events in the GC after antigen priming lead to the formation of cells switched to the antibody-secreting program, some of which go on to reach the bone marrow as their site of residence. Bone marrow niches provide the signals required for the further differentiation of newly arrived ASCs into long-lived or short-lived plasma cells and for their survival in the BM. ASCs migrating to the BM can, however, be sampled from the blood during their migration. The presence of an apoptosis-resistant subset of PCs expressing high-affinity Abs in the circulation early after booster immunization in humans has previously been shown (González-García et al. DOI: 10.1182/blood-2007-08-108118). A similar in vitro survival ability of transcriptomically different blood ASC subsets was demonstrated by other authors (Garimalla et al. DOI: 10.1172/jci.insight.126732). A recent study using an artificial system that models the BM niche in vitro showed that peripheral blood ASCs are able to differentiate into LLPCs (Joyner et al. DOI: 10.26508/lsa.202101285). In addition, a number of other studies have shown an increase in plasmablasts and plasma cells in PB during intensive immune responses after primary or secondary immunization/natural challenge (Blink et al. DOI: 10.1084/jem.20042060; Odendahl et al. DOI: 10.1182/blood-2004-07-2507; Lee et al. DOI: 10.4049/jimmunol.1002932) or in active autoimmune conditions (Szabo et al. DOI: 10.1111/cei.12703; Jacobi et al. DOI: 10.1002/art.10949). We therefore considered the ASC subsets in our work as a source of ASCs enriched in recently differentiated antibody producers that differ in expression of CD138, which marks LLPCs among BM plasma cells and seemingly marks ASCs at different stages of differentiation in the circulation. These ASC subsets thus complement antigen-primed peripheral blood B cells, play an important role in ongoing immune responses, and influence the plasma cell population in the BM. The clonal lineage-level connection between persisting memory B cells and the ASC subsets shown in our study, together with findings recently published by Antonio Lanzavecchia’s lab (Phad et al. DOI: 10.1038/s41590-022-01230-1), supports the idea that circulating CD19-/lowCD20-CD27+CD138+/- B cells in PB represent the antibody-producing progeny of reactivated memory.

      Regarding 3) and 4), it is hard to generalize the observations from the presented data because the analysis was based on just four donors with different health conditions, i.e., a combination of healthy and allergic donors. The number of plasmablasts and plasma cells isolated from peripheral blood is extremely low compared to memory B cells, and in fact, the vast majority of ASCs reside in tissues such as lymphoid organs, bone marrow, and mucosal tissues rather than in circulating blood (Mandric et al. DOI: 10.1038/s41467-020-16857-7). Most critically, direct evidence for claims 3) and 4) is missing.

      We fully agree that our study has a number of limitations and have added a more detailed discussion of them to the revised version (lines 582-600). We agree that our cohort is not large; nevertheless, our observations are reproducible among different donors, and the detected differences are statistically significant. To justify generalizing across this cohort of healthy and allergic donors, we added a more detailed analysis as a Supplementary Note, showing that, within our study design, we observe no difference between healthy and allergic donors at either the clonal repertoire level or the clonal lineage level.

      The number of sampled plasmablasts and plasma cells relative to memory B cells in our study reflects the ratio between these subpopulations in the peripheral blood of middle-aged donors and corresponds to previous estimates published by others. Given that, on average, about 15% of the most abundant clonotypes were reproducible between parallel samples (replicates), the sampled numbers of PBL and PL cells allowed relatively high reproducibility of clone sampling at the cellular level. This, together with the diversity estimates, indicates that we sequenced a representative number of ASCs in peripheral blood to characterize their clonal repertoire and their connection with the B cell memory pool. Indeed, the vast majority of plasma cells reside in different tissues, mostly in the bone marrow, but we believe that the ASCs in circulation represent the pool of newly generated ASCs and/or ASCs migrating between sites at different stages of differentiation. However, further studies of the clonal relationship between memory B cells, circulating ASCs, and tissue-resident ASCs are still required to provide a more detailed view of this aspect.

      We agree that we cannot provide much direct evidence for points 3) and 4); however, we found several lines of indirect evidence that are highly consistent with each other and support our claims of memory reactivation and clonal selection:

      1. Biologically, the rapid increase in the frequency of LBmem lineages and its perfect reproducibility between replicates (Supplementary Figure S7E) indicate an increase in the number of sampled cells, i.e., lineage expansion, due either to proliferation after antigen challenge or to migration between tissues of residence in response to other signals. The predominance of the ASC phenotype indicates their involvement in an ongoing immune response.

      2. A large G-MRCA distance in LBmem lineages, together with low inter-lineage genetic divergence, indicates that the observed clonotypes of LBmem lineages diverged recently, originate from a mature clonotype, and represent only a single clade of the full lineage phylogeny.

      3. Most LBmem lineages (47 of 52) include Bmem clonotypes, showing that the LBmem cluster is interconnected with the Bmem subset. For 38 of the 52 LBmem lineages, we detected a Bmem clonotype at a time point prior to lineage expansion.

      4. The significant difference in SHM patterns between HBmem and LBmem lineages reflects differences in the selection forces affecting their evolution. In evolutionary genomics, it is rarely possible to study evolution directly, and changes in genetic sequences are most often the only type of data available. Therefore, we are inclined to trust conclusions drawn with tools designed for this type of problem. While negative selection is expected in the evolution of any protein, positive selection is much harder to detect. The presence of its signatures therefore suggests new rounds of affinity maturation or some mechanism leading to reactivation of the best-fitted members of the lineage.

      In addition to the indirect evidence, we found a clear, direct example of memory reactivation within a clonal lineage (Fig. 4F). We added an alignment of the CDR3 region of this lineage as Supplementary Figure S7 to confirm that both its HBmem-like and LBmem-like parts originate from the same recombination event.

      These findings lead to the conclusion that most of the LBmem lineages in our analysis originated from pre-existing memory. However, we cannot say for certain that in all cases this memory is similar in properties to the persistent memory of the HBmem cluster. The one exemplary clonal lineage shows that at least some LBmem lineages represent reactivation of persistent HBmem lineages. The most recent study in the field, published by Phad et al. (DOI: 10.1038/s41590-022-01230-1), has also demonstrated the clonal relatedness of peripheral blood plasmablasts to persistent memory. It should also be noted that in the present study we focused on the most expanded clones and clonal lineages, while the mechanisms determining the extent of expansion are not well defined, so the behavior of smaller clones may differ. To conclude, we believe that our findings can be generalized, while probably representing only part of the complex picture of B cell memory behavior in the normal state.

      Reviewer #2 (Public Review):

      The findings in this manuscript have been properly hypothesized and adequately demonstrated, and offer some practical guidance. The authors performed a detailed longitudinal analysis of a subset of immune-experienced B cells from donors without severe pathology. They selected a comprehensive analytical framework for BCR clonal lineages from these data and suggested interconnected B-cell subsets at the clone level, donor-independent B-cell memory fusion, and long-term persistent, memory-enriched clonal lineages in peripheral blood. Lastly, their evolutionary analysis of the annotated B-cell clonal lineages suggests that activation of preexisting memory B cells is accompanied by new rounds of affinity maturation.

      We thank the Reviewer for careful analysis and positive view on our study.

    1. Author Response

      Reviewer #1 (Public Review):

      Overall, the science is sound and interesting, and the results are clearly presented. However, the paper falls between describing a novel method and studying biology. As a consequence, it is a bit difficult to grasp the general flow, central story and focus. The study does uncover several interesting phenomena, but none are really studied in much detail, and the novel biological insight is therefore a bit limited and lost in the abundance of observations. Several interesting novel interactions are uncovered, in particular for the SPS sensor and GAPDH paralogs, but these are not followed up on in much detail. The same can be said for the more general observations, e.g., the fact that different types of mutations (missense vs nonsense) in different types of genes (essential vs non-essential, housekeeping vs. stress-regulated...) cause different effects.

      This is not to say that the paper has no merit - far from it even. But, in its current form, it is a bit chaotic. Maybe there is simply too much in the paper? To me, it would already help if the authors would explicitly state that the paper is a "methods" paper that describes a novel technique for studying the effects of mutations on protein abundance, and then goes on to demonstrate the possibilities of the technology by giving a few examples of the phenomena that can be studied. The discussion section ends in this way, but it may be helpful if this was moved to the end of the introduction.

      We modified the manuscript as suggested.

      Reviewer #2 (Public Review):

      Schubert et al. describe a new pooled screening strategy that combines protein abundance measurements of 11 proteins, determined via FACS, with genome-wide introduction of stop codons and missense mutations (achieved via a base editor) in yeast. The method allows the identification of genetic perturbations that affect steady-state protein levels (vs transcript abundance), and thereby the definition of regulators of protein abundance. The authors find that perturbation of essential genes alters protein abundance more often than perturbation of nonessential genes, and that proteins with core cellular functions decrease in abundance in response to genetic perturbations more often than stress proteins. Genes whose knockouts affected the level of several of the 11 proteins were enriched in protein biosynthetic processes, while genes whose knockouts affected specific proteins were enriched for functions in transcriptional regulation. The authors also leverage the dataset to confirm known and identify new regulatory relationships, such as a link between the SPS amino acid sensor and the stress response gene Yhb1 or between Ras/PKA signalling and the GAPDH isoenzymes Tdh1, 2, and 3. In addition, the paper contains a section on benchmarking of the base editor in yeast, where it has not been used before.

      Strengths and weaknesses of the paper

      The authors establish the BE3 base editor as a screening tool in S. cerevisiae and very thoroughly benchmark its functionality for single edits and in different screening formats (fitness and FACS screening). This will be very beneficial for the yeast community.

      The strategy established here allows measuring the effect of genetic perturbations on protein abundances in highly complex libraries. This complements capabilities for measuring effects of genetic perturbations on transcript levels, which is important as for some proteins mRNA and protein levels do not correlate well. The ability to measure proteins directly therefore promises to close an important gap in determining all their regulatory inputs. The strategy is furthermore broadly applicable beyond the current study. All experimental procedures are very well described and plasmids and scripts are openly shared, maximizing utility for the community.

      There is a good balance between global analyses aimed at characterizing properties of the regulatory network and more detailed analyses of interesting new regulatory relationships. Some of the key conclusions are further supported by additional experimental evidence, which includes re-making specific mutations and confirming their effects on protein levels by mass spectrometry.

      The conclusions of the paper are mostly well supported, but I am missing some analyses on reproducibility and potential confounders and some of the data analysis steps should be clarified.

      The paper starts on the premise that measuring protein levels will identify regulators and regulatory principles that would not be found by measuring transcripts, but since the findings are not discussed in light of studies looking at mRNA levels it is unclear how the current study extends knowledge regarding the regulatory inputs of each protein.

      See response to Comment #10.

      Specific comments regarding data analysis, reproducibility, confounders

      1) The authors use the number of unique barcodes per guide RNA rather than barcode counts to determine fold-changes. For reliable fold changes the number of unique barcodes per gRNA should then ideally be in the 100s for each guide, is that the case? It would also be important to show the distribution of the number of barcodes per gRNA and their abundances determined from read counts. I could imagine that if the distribution of barcodes per gRNA or the abundance of these barcodes is highly skewed (particularly if there are many barcodes with only few reads) that could lead to spurious differences in unique barcode number between the high and low fluorescence pool. I imagine some skew is present as is normal in pooled library experiments. The fold-changes in the control pools could show whether spurious differences are a problem, but it is not clear to me if and how these controls are used in the protein screen.

      Because of the large number of screens performed in this study (11 proteins, with 8 replicates for each), we had to trade off sequencing depth and power against cell sorting time and sequencing cost, resulting in lower read and barcode numbers than would ideally be aimed for. As described further in the response to Comment #5, we added a new figure to the manuscript that shows that the correlation of fold-changes between replicates is high (Figure 3–S1A). The second figure below shows that the correlation between the number of unique barcodes and the number of reads per gRNA is highly significant (p < 2.2e-16).
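      To illustrate the unique-barcode approach discussed above, here is a minimal sketch. It assumes reads have already been parsed into (gRNA, plasmid barcode) pairs, that each sorted pool is normalized to its total unique-barcode count, and that a small pseudocount (0.5, an arbitrary choice) avoids division by zero. All names are hypothetical, and this is not the statistical model used in the study.

```python
import math
from collections import defaultdict


def unique_barcodes_per_grna(reads):
    """Count distinct plasmid barcodes per gRNA.

    reads: iterable of (grna_id, barcode) pairs, one per sequencing read.
    Returns {grna_id: number of distinct barcodes observed}.
    """
    seen = defaultdict(set)
    for grna, barcode in reads:
        seen[grna].add(barcode)
    return {grna: len(barcodes) for grna, barcodes in seen.items()}


def log2_fold_changes(high_counts, low_counts, pseudocount=0.5):
    """log2 ratio of normalized unique-barcode counts, high vs low fluorescence pool.

    Each pool is scaled by its total unique-barcode count (hypothetical
    normalization for illustration); gRNAs missing from one pool get count 0
    plus the pseudocount.
    """
    total_high = sum(high_counts.values())
    total_low = sum(low_counts.values())
    lfc = {}
    for grna in set(high_counts) | set(low_counts):
        high = (high_counts.get(grna, 0) + pseudocount) / total_high
        low = (low_counts.get(grna, 0) + pseudocount) / total_low
        lfc[grna] = math.log2(high / low)
    return lfc


# Toy example: the repeated ("g1", "AAA") read counts only once.
high = unique_barcodes_per_grna([("g1", "AAA"), ("g1", "AAA"), ("g1", "CCC"), ("g2", "GGG")])
low = unique_barcodes_per_grna([("g1", "TTT"), ("g2", "ACG"), ("g2", "CAT"), ("g2", "GTA")])
print(log2_fold_changes(high, low, pseudocount=0.0))
```

      Counting distinct barcodes rather than reads damps the influence of a single over-amplified barcode, but, as the reviewer notes, the estimate becomes noisy when a gRNA is represented by only a few barcodes.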

      2) I like the idea of using an additional barcode (plasmid barcode) to distinguish between different cells with the same gRNA - this would directly allow to assess variability and serve as a sort of replicate within replicate. However, this information is not leveraged in the analysis. It would be nice to see an analysis of how well the different plasmid barcodes tagging the same gRNA agree (for fitness and protein abundance), to show how reproducible and reliable the findings are.

      We agree with the reviewer that this would be nice to do in principle, but our sequencing depth for the sorted cell populations was not high enough to compare the same barcode across the low/unsorted/high samples. See also our response to Comment #5 for the replicate analyses.

      3) From Fig 1 and previous research on base editors it is clear that mutation outcomes are often heterogeneous for the same gRNA and comprise a substantial fraction of wild-type alleles, alleles where only part of the Cs in the target window or where Cs outside the target window are edited, and non C-to-T edits. How does this reflect on the variability of phenotypic measurements, given that any barcode represents a genetically heterogeneous population of cells rather than a specific genotype? This would be important information for anyone planning to use the base editor in future.

      We agree with the reviewer that the heterogeneity of editing outcomes is an important point to keep in mind when working with base editors. In genetic screens, like the ones described here, often the individual edit is less important, and the overall effects of the base editor are specific/localized enough to obtain insights into the effects of mutations in the area where the gRNA targets the genome. For example, in our test screens for Canavanine resistance and fitness effects, in which we used gRNAs predicted to introduce stop codons into the CAN1 gene and into essential genes, respectively, we see the expected loss-of-function effect for a majority of the gRNAs (canavanine screen: expected effect for 67% of all gRNAs introducing stop codons into CAN1; fitness screen: expected effect for 59% of all gRNAs introducing stop codons into essential genes) (Figure 2). In the canavanine screen, we also see that gRNAs predicted to introduce missense mutations at highly conserved residues are more likely to lead to a loss-of-function effect than gRNAs predicted to introduce missense mutations at less conserved residues, further highlighting the differentiated results that can be obtained with the base editor despite the heterogeneity in editing outcomes overall. We would certainly advise anyone to confirm by sequencing the base edits in individual mutants whenever a precise mutation is desired, as we did in this study when following up on selected findings with individual mutants.

      4) How common are additional mutations in the genome of these cells and could they confound the measured effects? I can think of several sources of additional mutations, such as off-target editing, edits outside the target window, or when 2 gRNA plasmids are present in the same cell (both target windows obtain edits). Could some of these events explain the discrepancy in phenotype for two gRNAs that should make the same mutation (Fig S4)? Even though BE3 has been described in mammalian cells, an off-target analysis would be desirable as there can be substantial differences in off-target behavior between cell types and organisms.

      Generally, we are not very concerned about random off-target activity of the base editor because we would not expect this to cause a consistent signal that would be picked up in our screen as a significant effect of a particular gRNA. Reproducible off-target editing with a specific gRNA at a site other than the intended target site would be problematic, though. We limited the chance of this happening by not using gRNAs that may target similar sequences to the intended target site in the genome. Specifically, we excluded gRNAs that have more than one target in the genome when the 12 nucleotides in the seed region (directly upstream of the PAM site) are considered (DiCarlo et al., Nucleic Acids Research, 2013).
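      The seed-region filter described above can be sketched as follows. The sketch assumes an uppercase genome string, the NGG PAM of the SpCas9 nickase in BE3, and that a site counts only when the 12-nt seed (the PAM-proximal end of a 20-nt spacer) is immediately followed by NGG on either strand. The function names are hypothetical; the study followed DiCarlo et al. (2013), whose implementation may differ, for example in whether seed matches without a PAM are also counted.

```python
import re

# Complement table for uppercase DNA; N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def seed_site_count(genome, spacer, seed_len=12):
    """Count sites on both strands where the spacer's PAM-proximal seed is followed by NGG."""
    seed = spacer[-seed_len:]
    # Zero-width lookahead so that overlapping matches are all counted.
    pattern = re.compile(f"(?={re.escape(seed)}[ACGT]GG)")
    return sum(
        sum(1 for _ in pattern.finditer(strand))
        for strand in (genome, reverse_complement(genome))
    )


def filter_unique_seed(genome, spacers, seed_len=12):
    """Keep only spacers whose seed + NGG occurs exactly once in the genome."""
    return [s for s in spacers if seed_site_count(genome, s, seed_len) == 1]


# Toy genome with a single target site for the spacer.
seed = "ACGTACGTTTGC"
spacer = "GATTACAG" + seed
genome = "TTTTT" + spacer + "AGG" + "TTTTT"
print(filter_unique_seed(genome, [spacer]))
```

      A gRNA whose seed appears at a second PAM-adjacent site would be dropped, which limits reproducible off-target edits that could otherwise be mistaken for an on-target effect.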

      We do observe some off-target editing right outside the target window, but generally at much lower frequency than the on-target editing in the target window (Figure 1B and Figure 1–S2). Since for most of our analyses we grouped perturbations per gene, such off-target edits should not affect our findings. In addition, we validated key findings with independent experiments. For our study, we used the Base Editor v3 (Komor et al., Nature, 2016); more recently, additional base editors have been developed that show improved accuracy and efficiency, and we would recommend these base editors when starting a new study (see, e.g., Anzalone et al., Nature Biotechnology, 2020).

      We are not concerned about cases in which one cell gets two gRNAs, since the chance that the same two gRNAs end up in one cell repeatedly is low, and such events would therefore not result in a significant signal in our screens.

      We don’t think that off-target mutations can explain the discrepancy between pairs of gRNAs that should introduce the same mutation (Figure 3–S1). The effects of the two gRNAs are actually well correlated, but often one of the two gRNAs doesn’t pass our significance cut-off or simply doesn’t edit efficiently (i.e., most discrepancies arise from false negatives rather than false positives). We may therefore miss the effects of some mutations, but we are unlikely to draw erroneous conclusions from significant signals.

      5) In the protein screen, normalization uses the total unique barcode counts. Does this efficiently correct for differences in sequencing (rather than using total read counts or other methods)? It would be nice to see some replicate plots for the fitness analysis as well as for the protein screen to be able to judge this.

      We made a new figure that shows a replicate comparison for the protein screen (see below; in the manuscript it is Figure 3–S1A) and commented on it in the manuscript. For this analysis, the eight replicates for each protein were split into two groups of four replicates each and analyzed the same way as the eight replicates. The correlation between the two groups of replicates is highly significant (p < 2.2e-16). The second figure shows that the total number of reads and the total number of unique barcodes are well correlated.

      For the fitness screen, we used read counts rather than barcode counts for the analysis since read counts better reflect the dropout of cells due to reduced fitness. The figure below shows a replicate comparison for the fitness screen. For this analysis, the four replicates were split into two groups of two replicates each and analyzed the same way as the four replicates. The correlation between the two groups of replicates is highly significant (p < 2.2e-16).

      6) In the main text the authors mention very high agreement between gRNAs introducing the same mutation but this is only based on 20 or so gRNA pairs; for many more pairs that introduce the same mutation only one reaches significance, and the correlation in their effects is lower (Fig S4). It would be better to reflect this in the text directly rather than exclusively in the supplementary information.

      We clarified this in the manuscript main text: “For 78 of these gRNA pairs, at least one gRNA had a significant effect (FDR < 0.05) on at least one of the eleven proteins; their effects were highly correlated (Pearson’s R² = 0.43, p < 2.2e-16) (Figure 3–S1B). For the 20 gRNA pairs for which both gRNAs had a significant effect, the correlation was even higher (Pearson’s R² = 0.819, p = 8.8e-13) (Figure 3–S1C). These findings show that the significant gRNA effects that we identify have a low false positive rate, but they also suggest that many real gRNA effects are not detected in the screen due to limitations in statistical power.”

      7) When the different gRNAs for a targeted gene are combined, instead of using an averaged measure of their effects the authors use the largest fold-change. This seems not ideal to me as it is sensitive to outliers (experimental error or background mutations present in that strain).

      We agree that the method we used is more sensitive to outliers than averaging per gene. However, many gRNAs have no effect, either because they do not edit efficiently or because the edit has no phenotypic consequence. Averaging across all gRNAs targeting the same gene would therefore be too conservative and would not properly capture the effect of perturbing that gene.
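      The difference between the two aggregation rules is easiest to see on a toy gene with one effective gRNA and three inactive ones. This is a minimal sketch with hypothetical function names and made-up log2 fold changes, not the study's analysis code.

```python
from statistics import mean


def gene_effect_max(log2fcs: list[float]) -> float:
    """Return the gRNA effect with the largest absolute log2 fold change."""
    return max(log2fcs, key=abs)


def gene_effect_mean(log2fcs: list[float]) -> float:
    """Average all gRNA effects for the gene."""
    return mean(log2fcs)


def aggregate(per_guide: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Map gene -> (max-|log2FC| effect, mean effect) for comparison."""
    return {gene: (gene_effect_max(v), gene_effect_mean(v)) for gene, v in per_guide.items()}
```

      For `[1.6, 0.0, 0.1, -0.1]`, the max-|log2FC| rule returns 1.6, whereas the mean is diluted to 0.4 by the three inactive gRNAs. The same rule would, however, also pass a single spurious outlier straight through, which is the trade-off the reviewer raises.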

      8) Phenotyping is performed directly after editing, when the base editor is still present in the cells and could still interact with target sites. I could imagine this could lead to reduced levels of the proteins targeted for mutagenesis as it could act like a CRISPRi transcriptional roadblock. Could this enhance some of the effects or alter them in case of some missense mutations?

      To reduce potential “CRISPRi-like” effects of the base editor on gene expression, we placed the base editor under a galactose-inducible promoter. For both the fitness and protein screens we grew the cultures in media without galactose for another 24 hours (fitness screen) or 8-9 hours (protein screens) before sampling. In the latter case, this recovery time corresponded to more than three cell divisions, after which we assume base editor levels to have strongly decreased, and therefore to no longer interfere with transcription. This is also supported by our ability to detect discordant effects of gRNAs targeting the same gene (e.g., the two mutations leading to loss-of-function and gain-of-function of RAS2), which would otherwise be overshadowed by a CRISPRi effect.
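      The dilution argument can be made concrete with a back-of-the-envelope calculation. This sketch assumes that removing galactose stops new base editor synthesis entirely and that existing protein is halved at each division. An optional first-order degradation term is included, but the text gives no degradation rate, so any value used for it is hypothetical.

```python
import math


def remaining_fraction(divisions: float,
                       degradation_rate_per_h: float = 0.0,
                       hours: float = 0.0) -> float:
    """Fraction of the initial per-cell base editor level left after shut-off.

    Assumes no new synthesis, two-fold dilution per division, and optional
    first-order degradation at `degradation_rate_per_h` over `hours`.
    """
    return 0.5 ** divisions * math.exp(-degradation_rate_per_h * hours)
```

      `remaining_fraction(3)` gives 0.125. After three divisions, at most one eighth of the original per-cell level would therefore remain even without active degradation.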

      9) I feel that the main text does not reflect the actual editing efficiency very well (the main numbers I noticed were 95% C to T conversion and 89% of these occurring in a specific window). More informative for interpreting the results would be to know what fraction of the alleles show an edit (vs wild-type) and how many show the 'complete' edit (as the authors assume 100% of the genotypes generated by a gRNA to be conversion of all Cs to Ts in the target window). It would be important to state in the main text how variable this is for different gRNAs and what the typical purity of editing outcomes is.

      We now show the editing efficiency and purity in a new figure (Figure 1B), and discuss it in the main text as follows: “We found that the target window and mutagenesis pattern are very similar to those described in human cells: 95% of edits are C-to-T transitions, and 89% of these occurred in a five-nucleotide window 13 to 17 base pairs upstream of the PAM sequence (Figure 1A; Figure 1–S2) (Komor et al., 2016). Editing efficiency was variable across the eight gRNAs and ranged from 4% to 64% if considering only cases where all Cs in the window are edited; percentages are higher if incomplete edits are considered, too (Figure 1B).”
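      One way to tell "complete" and "incomplete" edits apart is sketched below. It assumes a 20-nt protospacer read aligned to its reference with the PAM immediately 3' of the protospacer. The editing window is taken as bases 13 to 17 upstream of the PAM, which corresponds to 0-based indices 3 to 7 (protospacer positions 4 to 8). Indels and other substitutions are ignored. The function names are hypothetical.

```python
from collections import Counter


def edit_window(protospacer_len: int = 20, upstream_start: int = 13,
                upstream_end: int = 17) -> range:
    """0-based protospacer indices of the bases 13-17 bp upstream of the PAM."""
    return range(protospacer_len - upstream_end, protospacer_len - upstream_start + 1)


def classify_read(reference: str, read: str) -> str:
    """Classify an aligned read as 'wild-type', 'complete', or 'incomplete'.

    'complete' means every C in the window became T; a window without any
    C is reported as 'wild-type'.
    """
    c_positions = [i for i in edit_window(len(reference)) if reference[i] == "C"]
    converted = [read[i] == "T" for i in c_positions]
    if not any(converted):
        return "wild-type"
    if all(converted):
        return "complete"
    return "incomplete"


def efficiency(reference: str, reads: list[str]) -> tuple[float, float]:
    """Return (fraction of complete edits, fraction with any window edit)."""
    counts = Counter(classify_read(reference, r) for r in reads)
    n = len(reads)
    return counts["complete"] / n, (counts["complete"] + counts["incomplete"]) / n
```

      The first number corresponds to the stricter efficiency (all Cs edited). The second is the more permissive one that also counts partial edits.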

      Comments regarding findings

      10) It would be nice to see a comparison of the results to the effects of ~1500 yeast gene knockouts on cellular transcriptomes (https://doi.org/10.1016/j.cell.2014.02.054). This would show where the current study extends established knowledge regarding the regulatory inputs of each protein and highlight the importance of directly measuring protein levels. This would be particularly interesting for proteins whose abundance cannot be predicted well from mRNA abundance.

      We agree with the reviewer that it would be very interesting to compare the effect of perturbations on mRNA vs protein levels. We have compared our protein-level data to mRNA-level data from Kemmeren and colleagues (Kemmeren et al., Cell 2014), and we find very good agreement between the effects of gene perturbations on mRNA and protein levels when considering only genes with q < 0.05 and Log2FC > 0.5 in both studies (Pearson’s R = 0.79, p < 5.3e-15).

      Gene perturbations with effects detected only on mRNA but not protein levels are enriched in genes with a role in “chromatin organization” (FDR = 0.01; as a background for the analysis, only the 1098 genes covered in both studies were considered). This suggests that perturbations of genes involved in chromatin organization tend to affect mRNA levels but are then buffered and do not lead to altered protein levels. There was no enrichment of functional annotations among gene perturbations with effects on protein levels but not mRNA levels.
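      An enrichment test with a restricted background, here the 1098 genes covered in both studies, is commonly a one-sided hypergeometric test followed by multiple-testing correction. The sketch below shows that standard approach with Benjamini-Hochberg FDR adjustment. It is an assumption that the tool used in the study works this way, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N background genes, K of them
    annotated with the term, n genes in the hit list."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      For example, `hypergeom_sf(12, 1098, 60, 80)` would give the one-sided p-value for 12 annotated genes among 80 hits, with 60 annotated genes in the 1098-gene background. These numbers are illustrative only.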

      We did not include these results in the manuscript because there are some limitations to the conclusions that can be drawn from these comparisons, including that our study has a relatively high number of false negatives, and that the genes perturbed in the Kemmeren et al. study were selected to play a role in gene regulation, meaning that differences in mRNA-vs-protein effects of perturbations are limited to this function, and other gene functions cannot be assessed.

      11) The finding that genes that affect only one or two proteins are enriched for roles in transcriptional regulation could be a consequence of 'only' looking at 10 proteins rather than a globally valid conclusion. Particularly as the 10 proteins were selected for diverse functions that are subject to distinct regulatory cascades. ('only' because I appreciate this was a lot of work.)

      We agree with this, and we think it is clear in the abstract and the main text of the manuscript that here we studied 11 proteins. We made this point also more explicit in the discussion, so that it is clear for readers that the findings are based on the 11 proteins and may not extrapolate to the entire yeast proteome.

      Reviewer #3 (Public Review):

      This manuscript presents two main contributions. First, the authors modified a CRISPR base editing system for use in an important model organism: budding yeast. Second, they demonstrate the utility of this system by using it to conduct an extremely high-throughput study of the effects of mutations on protein abundance. This study confirms known protein regulatory relationships and detects several important new ones. It also reveals trends in the types of mutations that influence protein abundance. Overall, the findings are of high significance, and the method appears to be extremely useful. I found the conclusions to be justified by the data.

      One potential weakness is that some of the methods are not described in the main body of the paper, so the reader has to dive into the methods section to understand particular aspects of the study, for example, how the fitness competition was conducted.

      We expanded the first section for better readability.

      Another potential weakness is the comparison of this study (of protein abundances) to previous studies (of transcript abundances) was a little cursory, and left some open questions. For example, is it remarkable that the mutations affecting protein abundance are predominantly in genes involved in translation rather than transcription, or is this an expected result of a study focusing on protein levels?

      We thank the reviewer for pointing out that this paragraph requires more explanation. We expanded it as follows: “Of these 29 genes, 21 (72%) have roles in protein translation—more specifically, in ribosome biogenesis and tRNA metabolism (FDR < 8.0e-4, Figure 5C). In contrast, perturbations that affect the abundance of only one or two of the eleven proteins mostly occur in genes with roles in transcription (e.g., GO:0006351, FDR < 1.3e-5). Protein biosynthesis entails both transcription and translation, and these results suggest that perturbations of translational machinery alter protein abundance broadly, while perturbations of transcriptional machinery can tune the abundance of individual proteins. Thus, genes with post-transcriptional functions are more likely to appear as hubs in protein regulatory networks, whereas genes with transcriptional functions are likely to show fewer connections.”

      Overall, the strengths of this study far outweigh these weaknesses. This manuscript represents a very large amount of work and demonstrates important new insights into protein regulatory networks.

    1. Author Response

      Reviewer #2 (Public Review):

      In this paper, the authors identify topological metrics in gene-regulatory networks that have the potential to predict the sub-types of phenotypic steady states that the network can lead to. The results hold great value for the field of Theoretical Systems Biology.

      The paper becomes too technical too quickly and assumes a lot of knowledge from the reader. Equations and theoretical concepts are not always well defined. In general, I would recommend connecting the results from the simulations/topology metrics to EMP biology earlier in the paper. Alternatively, rather than investigating 5 networks related to EMP, the generalization of the statements could become stronger if the authors explore the trends of the theoretical analysis in networks modeling other biological processes (such as SCLC).

      One of the main findings of the paper is that the distance between the influence matrix and the matrix of pairwise node correlations across all simulated steady states indicates that mean group strength is a good quantity for identifying teams of nodes in the network. However, it remains unclear how teams are identified from the influence matrix: is it by unsupervised (hierarchical?) clustering? How do the authors identify the number of teams of nodes in randomized networks?

      The authors also explore whether team structure correlates with the stability of relevant biological phenotypes. To characterize stability, they define static network metrics (e.g., frustration and steady-state frequency) and dynamic network metrics (e.g., coherence and higher-order perturbations), and correlate them with the mean group strength in both WT and randomized networks. The results are promising: team structure and mean group strength show interesting correlative trends with both the static and dynamic metrics. However, everything relies on the mean group strength, which, as mentioned before, is not convincingly defined in randomized networks.
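      For readers unfamiliar with the static metric, a commonly used definition of frustration for a Boolean-like steady state is the fraction of edges whose sign disagrees with the states of the nodes they connect. The sketch below assumes node states coded as +1/-1 and signed edges as `(source, target, sign)`. The authors' exact definition may differ, and the function name is hypothetical.

```python
def frustration(edges: list[tuple[int, int, int]], state: list[int]) -> float:
    """Fraction of frustrated edges in a steady state.

    edges: (source, target, sign), sign = +1 (activation) or -1 (inhibition).
    state: node states coded as +1 (high) or -1 (low).
    An edge is frustrated when sign * state[source] * state[target] < 0, i.e.
    an activation links nodes in opposite states, or an inhibition links
    nodes in the same state.
    """
    if not edges:
        raise ValueError("network has no edges")
    frustrated = sum(1 for src, dst, sign in edges if sign * state[src] * state[dst] < 0)
    return frustrated / len(edges)
```

      For a mutually inhibiting toggle switch, the state (+1, -1) has frustration 0, while (+1, +1) has frustration 1.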

      Taken together, the conclusions of this paper would be better supported if a better explanation of team identification in gene-regulatory networks would be provided, and if networks related to other biological processes would be investigated.

      We thank the referee for their encouraging remarks and valuable suggestions for improving the manuscript. We are excited that the referee finds our results promising and of great value to the field of theoretical systems biology. Following the suggestions given here, we have now included further clarification of various aspects, added results for regulatory networks of melanoma and small cell lung cancer (SCLC; Fig 9, S11), and described in detail the algorithm used to identify teams in a given network (Methods).

    1. Author Response

      Reviewer #3 (Public Review):

      The manuscript by Barr et al., investigates the molecular phenotype, regulation by type 2 immunity, and function, of ectopic tuft cells that appear in the lungs of mice recovering from infection by the mouse-adapted PR8 strain of influenza A virus. They use reporter mice and either bulk or single cell RNA sequencing to reveal the molecular heterogeneity among tuft cells present in lungs of mice 43 days after PR8 infection. Lineage tracing using a Krt5-CreER driver line was used to demonstrate the basal cell origin of ectopic tuft cells and mice harboring homozygous null alleles for either Pou2f3, Trpm5, IL4Ra or IL25, were evaluated to determine roles for tuft cells and type 2 immunity in regulation of dysplastic epithelial remodeling. Their data confirm that ectopic tuft cells are derived from dysplastic Krt5-expressing cells that appear following PR8 infection, that pre-existing tuft cells play no role in basal cell dysplasia, and that ectopic tuft cells derived from dysplastic basal cells play no role in lung remodeling. Furthermore, they show that neither type 2 cytokines nor IL25, an upstream regulator of type 2 immune responses, play roles in regulating the pulmonary response to PR8 infection. Finally, they show that tuft cells are also induced in the lungs of bleomycin-injured mice and that the presence of tuft cells in alveolar regions of PR8-infected mice does not influence the inability of dysplastic basal cells to assume alveolar epithelial cell fates. The manuscript is well written and experiments were performed with rigorous experimental design and data of high quality. However, even though findings have potential importance and could be of interest, results seem preliminary and lack a strong rationale.

      Major concerns:

      1) Studies of tuft cells in the gut and their response to type 2 immunity, which were the basis for this line of investigation into ectopic tuft cells in the PR8-infected lung, have shown that tuft cells are part of a feed-forward loop leading to tuft cell expansion and enhanced type 2 immune responses including increased abundance of goblet cells. Since ectopic pulmonary tuft cells are derived from dysplastic basal cells after PR8 infection, rather than the reverse, this is clearly not the case in lungs of PR8 infected mice. Furthermore, since tuft cells are derived from hyperplastic basal cells in lungs of PR8-infected mice, it would seem unlikely that they impact the extent of basal cell hyperplasia.

      Ultimately, the reviewer is correct that the mechanisms at play in the post-flu lung promoting ectopic tuft cell expansion are clearly distinct from those in the small intestine. However, this was not a foregone conclusion, especially given that similar Type 2-dependent mechanisms clearly have a role in brush cell (now also termed tuft cell) expansion in the trachea. Regarding tuft cell influence on basal cell hyperplasia, we originally hypothesized that tuft cells differentiating from the migrating, proliferating basal cells may act in a feed-forward fashion to promote continued proliferation of the basal cells, akin to what happens upon tuft cell activation in the intestine. Nevertheless, the Reviewer is correct that our results show that basal cell hyperplasia is independent of tuft cell differentiation, and we feel this is valuable information for the field.

      2) Tuft cell expansion following parasitic infection of the gut and associated type 2 inflammation, and basal cell differentiation into tuft cells leading to their increased abundance following lung injury, are distinct processes and likely to be regulated through distinct mechanisms. As such, the rationale for investigating the roles of type 2 cytokines in the regulation of tuft cell appearance is rather weak. In the absence of data demonstrating how basal to tuft cell differentiation is regulated, this component of the study seems preliminary.

      Amplification of tuft cells in the small intestine (Gerbe et al., 2016; Howitt et al., 2016; von Moltke et al., 2016) and in the upper airways (Ualiyeva et al., 2021; Bankova et al., 2018) is either totally dependent on or highly influenced by Type 2 cytokines, respectively. Accordingly, it was critical to examine whether a similar mechanism was at play in the lung after influenza injury, i.e., promoting tuft cell amplification downstream of Type 2 cytokines. While our findings demonstrate that post-flu tuft cells arise largely independently of Th2 signals, new findings in other tissues published after submission of the current manuscript do indeed demonstrate Th2/ILC2-independent functions of tuft cells (O’Leary et al., DOI: 10.1126/sciimmunol.abj1080). Our findings support the existence of novel mechanisms regulating tuft cell differentiation, and as the Reviewer suggests, we hope to uncover these mechanisms in future work.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors here follow up on roles for signaling pathways like ERK in epithelial patterning that have been studied in an emerging literature, both broadly in the cell competition field and, more specifically, in mouse intestinal organoids. They employ time-lapse microscopy to study the behavior of human colonic organoids in monolayers as the organoids initially self-organize. They then follow the maintenance of organization into densely clustered nodes that have more cells in the cell cycle, with the remaining, more sparsely populated regions having fewer cycling cells. Nodes also show markers of in vivo colonic stem cells (Lgr5 and myc). They follow propagation of ERK waves using a genetic tool (ERK-KTR) and show that the waves can emerge from single apoptotic cells in between nodes.

      Strengths of the study include novelty of showing self-organization and behavior of human organoids over time, with good resolution, using microscopy, as well as sophisticated analysis techniques to interpret and present cumulative data over many experiments. Additionally, the paper adds important pieces of the puzzle with respect to how cells may compete and respond across an entire monolayer, and the tools and approaches lend themselves to studying many genes and signaling pathways besides simply Wnt vs ERK.

      Weaknesses in the current version of the manuscript:

      1) The manuscript is focused nearly exclusively on ERK and Wnt, rather than on the broader context of how a monolayer responds to apoptosis of single cells. Some of the original work in the field showed that apoptotic cells enact Rho- and MLCK-dependent actomyosin contractility, which was proposed to signal neighboring cells by initially pulling them inwards via the contraction (PMIDs: 9456322, 10459006, 11283606, 21721944). In addition, a more intestine-specific literature has long followed up on the critical role of ROCK and MLCK in maintaining the barrier specifically after intestinal cell apoptosis (15825080, 21237166).

      -- A suggestion would be 1) to cite the relevant literature and 2) to interpret some of the experiments within the cytoskeletal mechanistic context already known. In addition to comments about PMA and ERK activation (see next point), the authors could test whether the ERK waves cause myosin II activation and/or are ROCK/MLCK-dependent. Given ROCK inhibition is frequently used in organoid culture, this would seem an obvious avenue to explore. Does the ERK wave propagate the cytoskeletal changes to close the gap and increase centrifugal motility and/or conversely does the actomyosin tugging of the apoptotic cell trigger ERK activation? (Admittedly, the latter question may be hard to address). In short, there is a lot known about monolayer behavior in terms of dynamic cytoskeletal changes that can be addressed here to integrate with the Wnt/ERK roles.

      We completely agree that contractility and the cytoskeleton play vital roles in this process. We have added a section on this in the discussion and cited the relevant literature you suggested. We have conducted an unbiased screen for Erk wave dynamics and have several novel hits related to the mechanical aspect of this process. We are currently validating these hits and feel it would be too preliminary to include here. We are preparing a separate study that will focus on the role of mechanical signaling during Erk wave propagation.

      2) The authors use only PMA as an ERK activator. PMA is a broadly acting drug, principally known as a protein kinase C (PKC) activator. PKC has many downstream targets, so many pathways besides ERK are stimulated. Indeed, ROCK, Src, and other cytoskeleton-modifying pathways are modulated by PMA in ways that may not correlate with the ERK effects. Additionally, the movies showing the effects of PMA treatment show a striking increase in apoptotic cells throughout the field, which would confound the interpretation of what happens after relatively rare, internodal apoptotic cells die.

      -- A strong suggestion would be to increase the number of routes to ERK activation that the authors use. This could be via receptor tyrosine kinase stimulation (again, like the ROCK inhibitor, EGF is a key organoid medium component). Admittedly, that would not be much more specific than PMA, but since the authors use EGFR inhibition to block ERK, wouldn’t stimulation be an apt converse approach? A genetically encoded, constitutively active KRAS might be introduced. Alternatively, there are pharmacological ways to increase pERK dramatically by inhibiting dual-specificity phosphatases (see, e.g., PMID: 30475204 in a previous eLife paper). At the least, it would seem the authors should not use an approach that increases apoptosis dramatically.

      This is a great suggestion. We have added an additional figure describing a set of experiments that activate Erk through the expression of an oncogenic KRAS allele (G12V) under control of doxycycline. This resulted in increased uncoordinated Erk activity and loss of nodes. Further, we show that the Wnt inhibitor Pyrvinium also increased Erk activity in organoid monolayers and led to node loss. Consequently, we have tested three independent activators of Erk, all of which led to loss of the proliferative/stem cell niche.

      3) The movies clearly show many dividing cells that are between nodes, and they show apoptotic cells within nodes (eg movie 3a towards the end). While it's clear that apoptotic cells in internodal regions can elicit the wave behavior, it would seem that apoptosis does not universally do this, given the counter-examples.

      -- It would help if the authors could speak to this. Namely, in what cases are there no waves after apoptosis and what are the factors that might contribute (nearness to a node? nearness in time and space to another apoptotic cell?). Presumably, the events are relatively stochastic so there would be occasions for non-stereotypical behavior like wave front interference or augmentation in the case of closely located apoptotic cells.

      We agree. As shown in movie 3a, there are occasional cell death events in the proliferative region of organoid monolayers. We observed that these cell death events did induce waves, but they were less frequent than in non-proliferative regions, as quantified in Figure 3H. Cells within the proliferative compartment also have elevated Wnt signaling, as shown by the Top-GFP signal in Figure S6 and LGR5 staining in Figure 2B. The margin of the proliferative compartment is also the region where Erk waves tend to die off. Our hypothesis is that Wnt largely suppresses apoptosis and Erk waves.

      Reviewer #2 (Public Review):

      The work by Pond et al. uses patient-derived organoid monolayers to interrogate MAPK signaling in real time using an ERK reporter. This reporter, developed previously, uses an ERK target domain that responds to phosphorylation by changing its nuclear-cytoplasmic localization, so that ERK activity can be inferred from cytoplasmic localization of the reporter. The premise of the paper is that this reporter can be used in human organoid cultures to understand ERK signaling dynamics. Figures 1 and 2 demonstrate the properties of the monolayer cultures and how stem-like and differentiated domains form within them, validated using RNA FISH for MYC, LGR5, and KRT20. Figure 3 describes how an ERK wave radiates out from an apoptotic cell in the cultures, and how the living cells migrate towards the dying cell, presumably to sustain a barrier. In Figure 4, data are presented showing that PMA-mediated activation of ERK disrupts the patterning of the monolayers, dispersing the nodes of cells associated with stem/proliferative identity. Finally, in Figure 5, the authors show that treating cultures with Wnt3a suppresses ERK activity, while inhibiting ERK may expand WNT/stem cells in the cultures.
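      The reporter readout described above is usually quantified per cell as a cytoplasmic-to-nuclear intensity ratio, where a higher ratio means higher ERK activity. The minimal sketch below assumes that the image is a 2D array of intensities and that nuclear and cytoplasmic masks are already segmented as lists of `(row, col)` pixels. An optional constant background is subtracted. This is a generic KTR analysis with a hypothetical function name, not necessarily the pipeline used by Pond et al.

```python
from statistics import mean

Pixel = tuple[int, int]


def ktr_ratio(image: list[list[float]],
              nuclear_mask: list[Pixel],
              cytoplasmic_mask: list[Pixel],
              background: float = 0.0) -> float:
    """Cytoplasmic/nuclear mean intensity ratio for one cell."""
    nuc = mean(image[r][c] for r, c in nuclear_mask) - background
    cyt = mean(image[r][c] for r, c in cytoplasmic_mask) - background
    if nuc <= 0:
        raise ValueError("nuclear signal must exceed background")
    return cyt / nuc
```

      Tracking this ratio per cell over time, and mapping it back onto cell positions, is what makes radially spreading waves of activity around an apoptotic cell visible.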

      The study is interesting and the model system has a lot of potential.

      However, there are some concerns about the novelty. The reasons for this are:

      1 - the monolayer system has been demonstrated before, very nicely in a 2018 Dev. Cell paper from the Altschuler lab and one of the current manuscript authors.

      2 - ERK-KTR reporters have been used to demonstrate apoptosis induced signaling waves in the epithelium (Gagliardi, 2021, Dev. Cell.)

      3 - ERK activity suppressing stem cell fate has been documented previously (Riemer, 2015; Leach, 2021; Reischmann, 2020; Tong, 2017)

      So while there are exciting aspects of the work, including use of human tissues and live imaging of pathway dynamics, I feel that the novel discoveries using these technologies are somewhat limited.

      Point 1: We agree that the Thorne 2018 paper showed the feasibility of 2D enteroid monolayers using mouse small intestine, yet it was not obvious that this approach would translate to human organoid models. We have demonstrated that this approach can be used for patient-derived organoids from human colon, which contributes greatly to its translational potential. Additionally, a major challenge with organoids is tracking cells in space and time in 3D culture conditions. We have shown that these primary cultures can be combined with lentiviral live kinase reporters and are amenable to long-term culture for the study of single-cell dynamics in heterogeneous organoid cultures without laborious 3D image analysis.

      Points 2 and 3: We agree that KTRs are a well-known and useful tool for studying single-cell kinase dynamics. In mammalian cell lines (Gagliardi 2021) and Drosophila epithelium (Valon 2021), Erk waves driven by apoptosis were reported to prevent apoptosis in nearby cells and to instruct movement that prevents barrier disruption. Here, we showed that Erk waves affect the patterning of the differentiated and stem cell compartments. Our work establishes 1) that Erk waves are found in human colonic epithelium, 2) that they affect the patterning of the differentiated and stem cell compartments, and 3) that Erk wave signaling is a fundamental part of human colonic epithelial homeostasis. The novelty of this report is connecting apoptosis-driven Erk dynamics to the spatial partitioning of cell fates.

    1. Author Response

      Reviewer #1 (Public Review):

      The main result of the paper is a statistical dependence between the evolved size control strategy and the structure of the cell cycle, in that size control that manifests early (later) in the cell cycle tends to give adder- (weakly sizer-) like strategies. Notably, even when the final evolved network shows weak adder or weak sizer-like behaviour, they find strong sizer-like control in the evolutionary transient. Finally, they constrain the evolutionary algorithm to sense cell size only through stochastic fluctuations of protein concentrations and uncover a strategy that exhibits hallmarks of self-organised criticality.

      The questions studied by the authors are both interesting and timely, and their results are intriguing and well documented. On the whole, the conclusions are convincingly argued, and the authors do an excellent job of extracting qualitative features from their evolved networks. However, the manuscript is a little difficult to read, with the figures being crowded and difficult to parse. In addition, while there is a lot of detail in some places (as in the description of one particular feedback control strategy), other results are less fleshed out (such as statistical summaries of the different simulations). The manuscript would benefit from a sharper presentation of the results.

      We have done our best to tighten the writing and better focus on the main results of the paper. We have done this in response to the specific criticisms of the reviewers; however, most of the comments indicated that our manuscript was rather dense, so that important points had been lost. Therefore, in the revision, we have mostly focused on increasing clarity rather than condensing our prose further.

      A particularly interesting question addressed in the paper is why adders are more commonly found when sizers are believed to be better at controlling cell size. Here, the authors' simulations give two answers: first, that sizers tend to appear when cell size control is exerted later in the cycle (as in S. pombe); second, that even when adders eventually evolve, the evolutionary transient passes through a strong sizer strategy. As the adder-vs-sizer question is repeatedly raised, it would strengthen the paper to have a longer and sharper discussion of (a) why early cell size control favours adders, and (b) why sizers appear as transients when fluctuations in cell size are large.

      We now clarify these key points and extend our discussion. The question of why sizers appear as transients when fluctuations in cell size are large is more complex. We see repeatedly that sloppy sizers evolve first. However, these sizers are not necessarily good at producing a low CV. Then, as the system continues to evolve, adders appear that are better at reducing the CV than the noisy sizers. This emphasizes that the reduction in CV has two contributions. The first is the slope of the relationship between the amount of growth during the cell cycle and the cell size at birth. The second is the amount of noise in this process, i.e., how variable the outcome is for two cells born at the same size. The system proceeds from a noisy sizer to a less noisy adder while reducing the CV, as selected for. Thus, we speculate that in the later stages of evolution, when the system has already significantly reduced cell size variability, the ability to perform size control more accurately, with less noise, reduces the selection pressure on the slope, so that adders tend to emerge. To address the comment, we have extended our discussion of why early cell size control favors adders. We have split the penultimate paragraph in the discussion into two parts, where we now write:

      “Our evolution simulations gave insight into factors that bias evolution towards sizer- or adder-type control mechanisms (Fig. 4). First, it is worth noting that our evolution simulations were not deterministic. There was no one-to-one correspondence between a given evolutionary pressure and any one specific cell size control mechanism. Rather, our claims represent an average behavior observed over the course of many simulations. It is also worth noting that size control, as measured by the CV at a particular point in the cell cycle, has contributions both from the slope of the correlation between cell size and the amount of cell growth and from the amount of noise characterizing the differences between cells that are initially the same size (Di Talia et al., 2007). It is therefore possible for a low-noise adder to produce a lower CV than a higher-noise sizer. This is reflected in the evolutionary paths of some of our simulations, which traverse from a noisy sizer to a less noisy adder (Fig. 5). However, we anticipate that even noisy sizers will be better than adders at controlling cell size in response to large deviations away from the steady-state distribution. This is because sizers always return cell size to within the steady-state distribution in a single cell cycle.

      In the selection of a size-controlling G1 network followed by a timer in S/G2/M, we observed a prevalence of adders, consistent with the prevalence of adders reported in the literature. While fewer in number, sizers have also been observed. That the most accurate sizers have been observed in the fission yeast S. pombe (Fantes, 1977; Sveiczer et al., 1996; Wood & Nurse, 2015), and that this organism performs cell size control at G2/M rather than at G1/S, led us to explore the effect of cell cycle structure on the evolution of cell size control. We found that controlling cell size later in the cycle, in S/G2/M, biases evolution away from adders and towards sizers. In retrospect, this result can be rationalized: any size deviations incurred earlier, during the timer period, can be compensated for by the sizer at the end of the cycle. However, when the order is inverted, any size deviations escaping a G1 control mechanism would only be amplified by exponential volume growth during the S/G2/M timer period. A second recent case exhibiting sizer control was found in mouse epidermal stem cells, which exhibit a greatly elongated G1 phase and a relatively short S/G2/M phase (Mesa et al., 2018; Xie & Skotheim, 2020). We found that if we increased the relative duration of G1 in our simulations by shortening the S/G2/M timer, we also saw a bias towards sizer control. In essence, as G1 is extended to a larger and larger fraction of the cell cycle, the control system gradually approaches size control taking place at the end of the cell cycle, i.e., S/G2/M size control. Taken together, these simulations suggest the principle that having size-dependent transitions later in the cell cycle selects for sizers, while having such transitions earlier selects for adders.”
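The slope-versus-noise argument can be illustrated with a deliberately simple linear map rather than with the evolved networks. As a sketch, assume a division volume V_div = a·V_birth + b + ε, where ε is Gaussian noise with standard deviation sigma, followed by perfectly symmetric division, so that a = 0 is a sizer and a = 1 an adder. At steady state the birth volume then has mean b/(2 − a) and variance sigma²/(4 − a²), so its CV depends on both the slope a and the noise sigma. The parameter values below are hypothetical, chosen so that both strategies have a mean birth volume of 1.

```python
import math
import random


def analytic_cv(a: float, b: float, sigma: float) -> float:
    """Steady-state CV of birth volume for V_div = a*V_birth + b + noise,
    followed by exact halving at division (requires |a| < 2)."""
    mean = b / (2.0 - a)
    variance = sigma ** 2 / (4.0 - a ** 2)
    return math.sqrt(variance) / mean


def simulated_cv(a: float, b: float, sigma: float,
                 n_gen: int = 100_000, seed: int = 1) -> float:
    """Follow one lineage for n_gen generations and measure the CV of birth volume."""
    rng = random.Random(seed)
    v_birth = b / (2.0 - a)          # start at the steady-state mean
    total = total_sq = 0.0
    for _ in range(n_gen):
        v_div = a * v_birth + b + rng.gauss(0.0, sigma)
        v_birth = v_div / 2.0        # symmetric division
        total += v_birth
        total_sq += v_birth ** 2
    mean = total / n_gen
    return math.sqrt(total_sq / n_gen - mean ** 2) / mean


# Noisy sizer (slope 0) versus less noisy adder (slope 1), both with mean birth volume 1.
sizer_cv = analytic_cv(a=0.0, b=2.0, sigma=0.20)   # 0.100
adder_cv = analytic_cv(a=1.0, b=1.0, sigma=0.15)   # about 0.087
print(f"sizer CV = {sizer_cv:.3f}, adder CV = {adder_cv:.3f}")
```

In this toy, at equal mean size the adder achieves the lower CV whenever its noise is below √3/2 ≈ 0.87 of the sizer's noise. The sizer's advantage of returning large deviations to the steady-state distribution within one cycle is not visible in the steady-state CV alone.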

      The final part of the paper, which describes a strategy based on sensing size through concentration fluctuations, is very interesting but brief, which is understandable given the quantity of results presented earlier in the paper. Nonetheless, it provides an excellent example of the power of the authors' approach.

      Overall, the results in this paper are a compelling addition to the recent interest in cell size control.’

      We thank the reviewer for their careful reading of our manuscript and their support.

      Reviewer #2 (Public Review):

      The use of evolutionary models to understand the emergence of cell size control is novel and interesting. One strength of the approach is that simulations do not impose any mechanistic model for cell size control, rather the feedback motif for size control emerges from optimisation of chosen fitness functions. This allows the authors to come up with various size control motifs for given evolutionary pressures and model rules. Interestingly, the authors find that there is no one-to-one correspondence between specific size control mechanisms and evolutionary pressures, rather size control mechanisms are dependent on cell cycle structures. The authors also evolve a size control model based on the sensing of protein concentration fluctuations. This model exhibits interesting features such as self-organized criticality and the existence of very large cells that achieve size homeostasis by undergoing rapid cell divisions. The authors' model, however, comes with many arbitrary choices and assumptions that need further justifications and theoretical results should be compared with experimental data to establish the applicability of the model.

      We thank the reviewer for their careful reading of our manuscript and have worked to address its previous shortcomings as described below.

      Major Comments:

      1) Fitness function choices: Two fitness functions are used for the majority of this paper, number of cell divisions and CV_birth. What motivates the choice of these fitness functions and how do they relate to single-cell fitness?

      We added text describing the choice of fitness functions to the Supplement, in subsection S3A (Fitness). Using the number of cell divisions as a fitness makes sense, since the higher the number of divisions in a given window of time, the bigger the population, which corresponds to the classical Darwinian fitness. Adding the CV as an extra fitness specifically pushes the system towards better size control, which is the problem we aim to study, and also helps the optimization process. This is an effective way to include in our simulated evolution the detrimental effects observed when cell size is not well controlled. In the methods section we write:

      “We impose two evolutionary selection pressures in the form of two fitness functions. The first fitness function is simply the number of cell divisions during a long period, which we call NDiv. This is consistent with the classical definition of fitness as optimizing the number of offspring, and is to be maximized by the algorithm. The second fitness function is the coefficient of variation of the volume distribution at birth over those NDiv generations, which we call CVBirth and which is to be minimized by the algorithm. This penalizes broad distributions of volume at birth, which are detrimental to cell size homeostasis, the property we aim to examine here.”
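For readers less familiar with two-objective selection, the sketch below shows one generic way to score candidate networks on the two objectives just described (maximize NDiv, minimize CVBirth) and to keep the non-dominated, i.e. Pareto-optimal, candidates. It is an illustration only: the names Fitness, cv_birth, dominates and pareto_front are hypothetical, and this is not the selection code of the evolutionary algorithm used in the paper.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Fitness:
    n_div: int        # NDiv: divisions during the simulated window (maximize)
    cv_birth: float   # CVBirth: CV of volume at birth over those divisions (minimize)


def cv_birth(birth_volumes: list[float]) -> float:
    """Coefficient of variation (population SD / mean) of birth volumes."""
    return statistics.pstdev(birth_volumes) / statistics.fmean(birth_volumes)


def dominates(p: Fitness, q: Fitness) -> bool:
    """p dominates q if it is no worse on both objectives and strictly better on one."""
    no_worse = p.n_div >= q.n_div and p.cv_birth <= q.cv_birth
    strictly_better = p.n_div > q.n_div or p.cv_birth < q.cv_birth
    return no_worse and strictly_better


def pareto_front(population: list[Fitness]) -> list[Fitness]:
    """Keep every candidate that no other candidate dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]


population = [Fitness(10, 0.20), Fitness(12, 0.30), Fitness(9, 0.25)]
print(pareto_front(population))  # the third candidate is dominated by the first
```

Neither of the first two candidates dominates the other (one divides more, the other has tighter size control), which is why both survive; this trade-off is what a single combined score would hide.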

      Since the selection for tight size distribution is enforced via minimization of CV_birth, the model is unlikely to explain the timer control that is observed in some parts of the cell cycle. The authors discuss how a single fitness function results in all-or-nothing selection in the evolutionary algorithm; however, a third simultaneous fitness function is not considered. Are the results of this paper robust with respect to the addition of other selection pressure (for instance, optimization of growth rate)? This is a crucial question that is not addressed in the text.

      While we could always add more fitness functions, we have to start somewhere. The two fitness functions we use make the most sense for the problem we are interested in, and allow us to obtain clear results from what is already a complex starting point. Adding more than two fitness functions greatly increases the complexity of the problem. In fact, we are not aware of any work in the field of computational evolution using more than two fitness functions. One reason is that simulated evolution under the control of two fitness functions is already not well understood in general (as we discussed previously in Francois & Siggia, Physical Biology 2008; Henry et al., PLoS Comp Bio 2018). We hope our simulations will inspire other work in this direction.

      2) Cell-cycle structure not considered to be changeable in evolution: Based on the presented details of the evolutionary algorithm, the network topology parameters are varied but not the temporal structure of the cell cycle, i.e. timer in G1/S and sizer in S/G2/M or sizer in G1/S and timer in S/G2/M, etc. How do you justify evolution in one part of the cell cycle but not in the other? Do your results hold when the temporal structure is permitted to evolve?

      We are very interested in how the cell cycle structure affects the results. To address this point, we inverted the size dependence of the cell cycle phases as suggested by the reviewer, i.e., we considered a fission yeast-like network with a timer in G1 and a sizer in S/G2/M (see Figs. 4, 5 and S10). The range of possible evolution experiments is almost endless. We therefore restricted our examination to cases inspired by naturally occurring networks in well-studied model organisms such as budding and fission yeast. While it is in principle possible that size control could take place in multiple cell cycle phases, we do not yet know of a naturally occurring example and so chose not to explore this possibility at present. Nevertheless, the reviewer raises a very interesting question as to why evolution selecting for cell size control tends to pick one cell cycle phase or the other, but possibly not both, in a particular organism. We do not know the answer at present and refrain from attempting to address it, since our manuscript is already quite dense. Future work can explore this interesting direction.

      3) Noise sources: The authors consider noise in protein quantity or concentration while neglecting noise in growth rate or division. Can the assumption that growth noise is negligible compared to protein production noise be supported by experimental data? This is a crucial assumption that is not supported by a discussion of physical values or citations. In addition, it is assumed later in the supplement (S132-133) that there is no division noise without presenting justification for why that noise is negligible on the scale of protein production noise.

      As for many other points raised by the referee, there is a necessary balance to achieve between biochemical realism and simplifying assumptions to theoretically study such problems. Of course we fully agree with the reviewer that there are multiple sources of noise in the system. In this study, we chose a hierarchical way of introducing noise in the system, starting with the biggest contributing factor and incrementally adding sources of noise if needed. We chose to first focus on noise in the cell cycle phases themselves whose CV can be as high as 50% (cf Fig. 1 in Di Talia et al 2007 Nature). For this reason, we first introduced noise in the precise timing of the G1/S transition as well as in the timing of the S/G2/M phase duration. Next, we introduced protein production noise because it is larger than the noise associated with cell division and cell growth rate in several cases where it has been measured. For example, the CV of cell growth rate in a diploid budding yeast is ~14% (Di Talia et al 2007 Nature; cf Table S12). The noise in partitioning at cell division is easier to measure in symmetrically dividing cells. For human cells grown in culture, division noise is ~10% (cf Fig. 3G in Zatulovskiy et al 2020 Science). In contrast, noise in protein concentrations is typically higher. This can be seen in the examination of molecular noise across all GFP labeled proteins in budding yeast (Newman et al, Nature 2006, PMID: 16699522). The CV in concentration of regulatory proteins in similarly sized cells is ~20-30% which is larger than noise in division by partitioning or noise in cell growth rate. We therefore next focused our analysis on the effects of protein production noise.

      In revising our manuscript, we now also consider noise in cell growth rate and noise in partitioning of mass at division, as suggested by the reviewer. This results in slightly weaker size control and more noise, in line with our intuition. However, broadly speaking, our results are unchanged (see new supporting Figs. S6-S7). We now describe the logic of our series of simulations of increasing complexity in the methods section, which has two new paragraphs that read as follows: “In this study, we chose a hierarchical way of introducing noise in the system, starting with the biggest contributing factor and incrementally adding additional sources of noise in subsequent analyses. All simulations presented include noise (stochastic control of G1/S transition and timing of S/G2/M, see below) in the cell cycle phases, whose CV has been found to be as high as 50% (Di Talia et al., 2007). Then, we introduced protein production noise via Langevin noise because the CV of regulatory protein concentrations is typically 20-30% (Newman et al., 2006). Importantly, the cell volume also contributes to stochastic effects, which are larger in smaller cells with fewer molecules. Thus, for stochastic simulations, we include a multiplicative 1/√V contribution to the added Gaussian noise term (see more complete description in the Supplement).

      We also checked that our results are largely invariant when adding other sources of noise (see Figs. S5-S7). In these simulations, we also included noise in cell growth rate (CV ~15%; e.g., Di Talia et al., 2007) and in mass partitioning at cytokinesis (CV ~10%; e.g., Zatulovskiy et al., 2020).”
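A minimal sketch of how a 1/√V factor makes concentration fluctuations larger in smaller cells, assuming a single protein with production k per unit volume, degradation rate gamma and dilution rate lam, and simply holding the volume fixed within each run. The parameter values and the exact placement of the noise term are illustrative assumptions; the form actually used is described in the Supplement. Each step is a standard Euler-Maruyama update of a Langevin equation.

```python
import math
import random


def concentration_fluctuations(volume: float, k: float = 1.0, gamma: float = 1.0,
                               lam: float = 0.5, eta: float = 0.1,
                               dt: float = 2e-3, n_steps: int = 100_000,
                               seed: int = 0) -> tuple[float, float]:
    """Euler-Maruyama integration of
        dc = (k - (gamma + lam) * c) dt + (eta / sqrt(volume)) dW
    at fixed volume; returns the time-averaged mean and SD of c."""
    rng = random.Random(seed)
    c = k / (gamma + lam)                 # start at the deterministic steady state
    amplitude = eta / math.sqrt(volume)   # noise amplitude shrinks as 1/sqrt(V)
    total = total_sq = 0.0
    for _ in range(n_steps):
        drift = k - (gamma + lam) * c     # synthesis minus degradation and dilution
        c += drift * dt + amplitude * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        total += c
        total_sq += c * c
    mean = total / n_steps
    return mean, math.sqrt(total_sq / n_steps - mean ** 2)


small_mean, small_sd = concentration_fluctuations(volume=1.0)
large_mean, large_sd = concentration_fluctuations(volume=16.0)
# Both means hover around k / (gamma + lam) = 2/3.
print(small_mean, large_mean, small_sd / large_sd)
```

Because both runs reuse the same random numbers and the equation is linear, the SD ratio between a cell of volume 1 and one of volume 16 comes out at √16 = 4, while the mean concentration is the same in both.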

      4) Types of biochemical interactions considered: It is assumed that inhibitor protein production rate scales with cell volume. Is this assumption supported by data? The assumption is contrary to the production rate of the inhibitor protein Whi5 in budding yeast, which does not scale with cell volume.

      In general, most proteins are at relatively constant concentration as cells grow. This means that their production rate (measured in number of proteins per unit time) has to scale in proportion to cell volume. As noted by the reviewer, Whi5 in budding yeast is an exception to this general rule: its production rate does not scale with cell volume. This is why Whi5 is diluted by growth, leading to a sizer in G1. However, starting the evolution from a diluted inhibitor would make the problem too simple, because the network would begin with a built-in size sensor and would not need to evolve any feedback mechanism. Here, we focus on exploring how cell size control can be achieved by a network with multiple feedbacks rather than by the concentration of a single protein. We have made these points more explicit in the text, which now includes the following sentences in the methods section: “We note that we are not allowing the cell to employ proteins such as Whi5 in budding yeast whose production is independent of cell size, so that their concentration is a direct readout of cell size (Schmoller et al 2015; Swaffer et al 2021). We chose to do this because we want to explore how cell size control can be done by a network with multiple feedbacks rather than just the concentration of a single protein with a special dedicated synthesis mechanism.”
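The contrast between a protein made in proportion to volume and a Whi5-like inhibitor can be seen in a toy calculation. Assume, for illustration only, that the inhibitor amount n is the same in every newborn cell and is not synthesized during G1, and that G1/S occurs when its concentration n/V falls to a threshold c_star (both values hypothetical). The G1/S volume is then n/c_star whatever the birth volume, i.e. a sizer built directly into one protein's concentration. A protein whose synthesis scales with V instead settles at a concentration k/(gamma + lam) that carries no size information.

```python
def diluted_concentration(volume: float, n_inhibitor: float = 1.0) -> float:
    """Whi5-like inhibitor: fixed amount during G1, so concentration falls as 1/V."""
    return n_inhibitor / volume


def scaled_concentration(k: float = 1.0, gamma: float = 1.0, lam: float = 0.5) -> float:
    """Steady-state concentration when synthesis is k*V per unit time:
    independent of V, hence no direct size readout."""
    return k / (gamma + lam)


def g1s_volume(v_birth: float, n_inhibitor: float = 1.0, c_star: float = 0.5) -> float:
    """Volume at G1/S: the cell grows until n/V <= c_star (or is already past it)."""
    return max(v_birth, n_inhibitor / c_star)


for v_birth in (0.8, 1.0, 1.5):
    v_g1s = g1s_volume(v_birth)
    print(f"V_birth={v_birth:.1f}  V_G1/S={v_g1s:.1f}  added in G1={v_g1s - v_birth:.1f}")
```

Every cell reaches G1/S at volume 2.0 in this toy, so the volume added in G1 falls one-for-one with birth volume, which is the sizer signature.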

      5) Comparisons to data: Currently no attempt has been made to compare the model predictions quantitatively with experimental data that are easily available. For instance, how does the CV of cell birth size predicted by the model compare with cell size distribution in budding yeast or in the fission yeast? The same goes for the scaling of added volume with initial cell volume in different phases of the cell cycle. Furthermore, the noise parameters should also be calibrated to reproduce the cell size variability seen in experiments.

      To facilitate the comparison of our evolution simulations with model organisms, we have included Table S1 in the supporting material, where we show published results for budding yeast, fission yeast, mammalian cells grown in culture, and mouse epidermal stem cells growing in the animal. The size distributions and CVs obtained in our simulations are in some cases relatively similar to those observed experimentally, but can also be much lower, reflecting tighter control, when optimized. However, the comparison is not perfectly fair, since the model organisms were grown in laboratory conditions rather than in the natural environments for which they are likely better optimized.

      Reviewer #3 (Public Review):

      In this paper, Proulx-Giraldeau et al. develop evolutionary simulations to study how size control can emerge. In the first part of the paper, the authors initiate cell cycle simulations with a simple network that does not allow cell size sensing and ask what molecular networks can lead to size control after evolution. Results show that a wide range of network types allows size control, some of which are comparable to experimentally identified networks such as the dilution inhibitor model in budding yeast. In the second part of the paper, the authors use their framework to ask how the structure of the cell cycle, including the duration of G1 vs. S/G2/M and the form of size control in each of these phases (i.e. 'sizer' or 'adder'), affects the overall size control. While this is a very important question and the authors bring comprehensive and interesting answers, it is less clear that framing the findings in the context of evolution is meaningful. Indeed, the solutions for how the combination of strength of size control, noise levels, and respective duration of the phases can be found analytically/with simulations that are not 'evolving' the cell cycle structure. Additionally, the finding that a sizer in G1 can lead to an overall adder if it is followed by a timer in S/G2/M is only true if a significant amount of noise is added during the timer phase. At present, this finding is discussed as a result of 'evolution' which is confusing and the dependency of this conclusion on the level of noise during S/G2 does not appear very clearly.

      With more cautiously formulated conclusions and a better discussion of already established theoretical and experimental work, this paper will become more accessible to experimentalists and will be a very valuable contribution to the field of cell size control.

      We thank the reviewer for their careful reading of the manuscript and their thoughtful comments.

      Major suggestions:

      1) Fig 4-5. While the use of the evolution simulation seems interesting to identify which underlying network(s) can result in size control, the use of the same framework to compare the result of sizer+timer vs. timer+sizer is less easy to interpret. Previous analytical/simulation approaches have explored how noise & duration of the timer phase can alter the 'sizer' or 'adder' signature (see doi.org/10.1016/j.celrep.2020.107992, doi.org/10.3389/fcell.2017.00092, for example) and what evolutionary simulations add to this question is unclear.’

      We thank the reviewer for pointing out this highly relevant work, which we now cite where appropriate at various places in the manuscript. We agree that several of our results could have been derived from non-evolutionary analysis as performed in those studies (such as the conclusion that a sizer followed by a timer can yield an adder). However, many of our other results cannot. For example, we are interested in how a network based on constant concentrations of proteins can measure cell size. Our evolution simulations yield highly non-trivial networks, which we then proceed to analyze. We now clarify the distinction between our approach using evolution simulations and the more traditional analytical approach in the Discussion. We added the following text: “We note that these generic results of how sizers and adders can govern cell size homeostasis can be derived from more traditional analytical methods (Barber et al., 2017; Willis et al., 2020). However, our evolution simulations are particularly useful because the molecular networks that evolved give non-trivial insights into how the observed size homeostasis dynamics can be regulated.”

      – What is the authors' interpretation of why the optimization of Pareto vs. number of divisions yield different size control results (Fig. 4A)? Is it possible that these different fitness parameters allow for the evolution of different levels of noise/duration of the timer phase?

      This relates to what we discuss in the section “A two-step evolutionary pathway for cell size control”. We think the effect is intuitive: if there is no selection on CV, there is no reason for the system to evolve good noise control in general. Then, in the absence of secondary effects such as size-dependent growth rates, networks such as the one presented in Fig. 5A are essentially optimal for the number of divisions, and are pure sizers. This is not related to the timer phase as far as we can see. We added a few words at the end of that section to make this more explicit.

      – In the conclusion: 'G1 control is more conducive to the evolution of adders, while G2 control is more conducive to sizers', do the authors really believe that this is an evolutionary acquired trait, or are their observations instead the natural consequence of having a noise-adding phase (timer + multiplicative noise) after a phase with size control?

      We agree with the reviewer that the adder is a consequence of a noise-adding phase following size control, and we do not think it is necessarily an evolutionarily acquired trait. As discussed above, and now in our Discussion, this result could have been found using traditional analytical approaches. That the result is similar in a computational evolution simulation is interesting, because the flexibility of the PhiEvo algorithm might have allowed different phenomenological results to emerge. That they did not further strengthens the intuition built up from the analytical approach.

      – A perfect sizer in G1, followed by a timer (with exponential growth) in S/G2/M would simply give an overall 'noisy sizer' (i.e. the slope of final volume vs. initial volume would still be 0 but with some variability around the slope). Only beyond a certain level of noise added in S/G2/M, would the sizer signature be lost. Would it be possible for the authors to perform simulations with different levels of noise (on the timer in S/G2) to help understand this conclusion better? This conclusion could be one of the most valuable to experimentalists studying different organisms.

      This is an excellent suggestion, and we have performed these evolution experiments examining the effect of modulating the noise in the S/G2/M timer. We consider a timer CV of 0, 5, and 8%, corresponding to no, medium, and high noise, respectively. The average duration of the timer is half the time it takes to double the cell’s volume. Having specified the S/G2/M timer parameters, we then evolved and selected networks as before, and compared ensembles of 60 networks for each noise level. The results are in line with our and the reviewer’s intuition: increasing the noise progressively leads to a loss of the sizer signature and increases the CV of cell size at birth. These results are described in a new paragraph in the Results section ‘Modulating cell cycle structural constraints selects for sizers and adders’, which reads: “We next considered the effect of changing the amount of noise in the timer phase of the cell cycle. To do this, we examined the evolution of networks performing size control in G1 followed by an S/G2/M timer with an increasing amount of noise. Increasing the noise in the timer progressively reduced the amount of size control done by the network (Fig. S5). This is likely because the fixed duration of S/G2/M allows the system to accurately reset protein concentrations for the subsequent cell cycle to promote accurate G1 control (Willis et al., 2020). We also examined the effects of adding noise to the cellular growth rate and to volume partitioning at division and found similar results (Fig. S6-S7).”

      The results are shown in the new supporting Figure S5.
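The reviewer's thought experiment can be reproduced in a few lines. The sketch below is a deliberately idealized toy, not one of our evolved networks: it assumes exponential growth, a G1/S transition when volume crosses a threshold with 5% noise (a hypothetical value), an S/G2/M timer whose mean is half the doubling time with a CV of 0, 5 or 8%, and symmetric division. In this toy the slope of division volume on birth volume stays near zero at every timer noise level while the CV of birth volume grows, i.e. a noisy sizer, as the reviewer anticipated. It contains no protein concentrations, so it cannot capture the concentration reset during S/G2/M invoked above, and presumably for that reason it does not reproduce the loss of sizer signature seen in the evolved networks.

```python
import math
import random


def g1_sizer_then_timer(timer_cv: float, g1_cv: float = 0.05, v_star: float = 1.0,
                        n_cycles: int = 20_000, seed: int = 0):
    """Toy cycle, time in units of the doubling time tau (exponential growth):
    G1/S when V reaches a noisy threshold, then an S/G2/M timer of mean 0.5
    with coefficient of variation timer_cv, then symmetric division.
    Returns (slope of V_div on V_birth, CV of V_birth)."""
    rng = random.Random(seed)
    v_birth = v_star / math.sqrt(2.0)   # roughly the steady-state birth volume
    births, divisions = [], []
    for _ in range(n_cycles):
        threshold = v_star * (1.0 + g1_cv * rng.gauss(0.0, 1.0))
        v_g1s = max(v_birth, threshold)                   # G1 sizer
        t_timer = 0.5 * (1.0 + timer_cv * rng.gauss(0.0, 1.0))
        v_div = v_g1s * 2.0 ** t_timer                    # exponential growth during the timer
        births.append(v_birth)
        divisions.append(v_div)
        v_birth = v_div / 2.0                             # symmetric division
    n = len(births)
    mean_b = sum(births) / n
    mean_d = sum(divisions) / n
    cov = sum((b - mean_b) * (d - mean_d) for b, d in zip(births, divisions)) / n
    var_b = sum((b - mean_b) ** 2 for b in births) / n
    return cov / var_b, math.sqrt(var_b) / mean_b


for cv in (0.0, 0.05, 0.08):   # timer CVs mirroring the 0, 5 and 8% levels above
    slope, cv_birth = g1_sizer_then_timer(cv)
    print(f"timer CV={cv:.2f}  slope={slope:+.3f}  CV_birth={cv_birth:.4f}")
```

The same seed is used for every noise level, so the three runs share their random draws and differ only in how strongly the timer noise is scaled.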

      2) Some aspects of the mathematical formalism were unclear:

      – Working with the hypothesis that growth is exponential and at a constant rate is reasonable. However, the description of the scenario where growth modulation contributes to size homeostasis is incorrect. E.g. the statement 'cells further from the optimum size grow slower' is not accurate. If size control occurs via growth regulation, what is expected is a negative correlation between size and growth rate (big cells grow slow, small cells grow fast).

      To clarify this point, we have modified the sentence to read as: “In the first class, it is crucial that the growth rate per unit mass of a cell depends on cell size so that cells that are significantly larger than the optimum cell size grow slower.”

      – The quantity I is produced with a rate proportional to volume, degraded at a constant rate, diluted by cell growth': why is I diluted? Concentration should be constant if I increases at the same rate as volume. 'the quantity of I does not initially depend in any way on the volume'. Does the quantity of I not increase with volume (since concentration is constant)?

      The equation for the amount of I does not have a dilution term, but the equation for the concentration of I does. This is easy to see if one considers stopping synthesis of I while cell growth continues: if I is stable, its concentration decreases at a rate proportional to the growth rate of the cell, which is the dilution term. In the case of constant synthesis of I, the concentration is indeed constant at equilibrium and reflects a balance between protein synthesis on the one hand and dilution and degradation on the other (e.g., see Eq. S4).
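Written out in generic symbols (N_I the amount of I, [I] = N_I/V its concentration, exponential growth dV/dt = λV, synthesis at rate kV and first-order degradation at rate γ; these labels are ours and may differ from the notation of Eq. S4), the dilution term appears only when the concentration is differentiated:

```latex
\begin{aligned}
\frac{dN_I}{dt} &= kV - \gamma N_I, \\
\frac{d[I]}{dt} &= \frac{1}{V}\frac{dN_I}{dt} - \frac{N_I}{V^{2}}\frac{dV}{dt}
                 = k - \gamma [I] - \lambda [I], \\
[I]^{*} &= \frac{k}{\gamma + \lambda}.
\end{aligned}
```

With synthesis stopped (k = 0) and a stable protein (γ = 0), [I](t) = [I](0)e^{-λt}: the amount is unchanged, but the concentration halves over one doubling time, which is the dilution effect described above.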

      Fig. 3: the rescaling of the variables to tau and Veq was difficult to understand. Fig. 3A: If T_S/G2/M is at ~0.5 of the doubling time tau, how relevant is it to look at the behaviour of T_(Vc) for values of T_(Vc)/tau above 0.5 (and beyond 1)? Fig 3B: for which value of T(Vc) is the prediction made?

      Time is rescaled to the amount of time it takes to double the biomass. Volume was rescaled to the average volume at the G1/S transition for a population of cells at the size distribution's steady state. We realize now that this nomenclature is unclear, and have replaced Veq with <VG1/S>, which we believe is more clear.

      Because of the timer constraint, T_(Vc)/tau has to be at least 0.5, which corresponds to a G1 phase with 0 duration. But, in principle, T_(Vc)/tau could have any value larger than 0.5. The range of T_(Vc)/tau is set by the size control mechanism after we specify the range of Vc that we wish to examine. To clarify this, we now denote what parts of the plot correspond to cells increasing or decreasing in size.
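Under the assumptions of exponential growth with doubling time τ and symmetric division, a short calculation makes the rescaled axis easier to read, writing T_{V_c} for the full cycle duration of a cell born at volume V_c:

```latex
\begin{aligned}
V_{\mathrm{div}} &= V_c\, 2^{T_{V_c}/\tau}, \qquad
\Delta V = V_{\mathrm{div}} - V_c = V_c\left(2^{T_{V_c}/\tau} - 1\right), \\
V_{\mathrm{daughter}} &= \tfrac{1}{2} V_{\mathrm{div}} = V_c\, 2^{T_{V_c}/\tau - 1}.
\end{aligned}
```

Daughters are therefore larger than their mother was at birth when T_{V_c}/τ > 1 and smaller when T_{V_c}/τ < 1, while the lower bound T_{V_c}/τ = 0.5 (a G1 phase of zero duration) corresponds to ΔV = (√2 − 1)V_c ≈ 0.41 V_c.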

      The prediction (solid line) is made over a range slightly wider than the range of cell sizes seen in the steady-state simulation. We think there is confusion about our nomenclature for the single point indicated on each line as ‘Added Veq’. This point represents the average amount of volume added at steady state. To clarify this, we now label it as <∆V>.

      4) Discussion:

      – Including a discussion of previous theoretical work that explored the consequences of varying the relative duration of the timer and sizer phases would be valuable.’

      As discussed above, we have now cited the previous theoretical work in the introduction, results, and discussion. We thank the reviewer for pointing out this omission.

      – A reason commonly evoked to explain why cells might show sizer vs. adder behaviour is the role of the growth mode: S. pombe is a sizer but is thought to grow linearly, E. coli behaves like a sizer when it grows slower than usual (see Walden et al. 2015). It would be helpful to mention this when discussing S. pombe and remind the reader that the findings of this paper are limited to exponential growth mode.

      As suggested, we now clarify that our analysis is restricted to exponential growth and that S. pombe growth has been reported to deviate from exponential.

      – The paper seems to be focusing on the noise of the size control mechanism (i.e. probability of transitioning through G1/S based on levels of I) but does not address the question of other sources of noise (i.e. asymmetry at division). What do the authors think about the role of such sources of noise as selective pressure on size control mechanisms evolution?

      This point was also raised by Reviewer 2. There is a necessary balance to achieve between biochemical realism and simplifying assumptions when theoretically studying such problems. Of course, we fully agree with the reviewer that there are multiple sources of noise in the system. In this study, we chose a hierarchical way of introducing noise in the system, starting with the biggest contributing factor and incrementally adding sources of noise as needed.

      In revising our manuscript, we now also consider noise in cell growth rate and noise in partitioning of mass at division, as suggested by the reviewer. This results in slightly weaker size control and more noise, in line with our intuition. However, broadly speaking, our results are unchanged (see new supporting Figs. S6-S7). We now describe the logic of our series of simulations of increasing complexity in the methods section, which has a new paragraph that reads as follows: “In this study, we chose a hierarchical way of introducing noise in the system, starting with the biggest contributing factor and incrementally adding additional sources of noise in subsequent analyses. All simulations presented include noise (stochastic control of G1/S transition and timing of S/G2/M, see below) in the cell cycle phases, whose CV has been found to be as high as 50% (Di Talia et al., 2007). Then, we introduced protein production noise via Langevin noise because the CV of regulatory protein concentrations is typically 20-30% (Newman et al., 2006). Importantly, the cell volume also contributes to stochastic effects, which are larger in smaller cells with fewer molecules. Thus, for stochastic simulations, we include a multiplicative 1/√V contribution to the added Gaussian noise term (see more complete description in the Supplement).

      We also checked that our results are largely invariant when adding other sources of noise (see Figs. S5-S7). In these simulations, we also included noise in cell growth rate (CV ~15%; e.g., Di Talia et al., 2007) and in mass partitioning at cytokinesis (CV ~10%; e.g., Zatulovskiy et al., 2020).”

    1. Author Response

      Reviewer #1 (Public Review):

      “The synthesis and metabolism of sphingolipid (SL) are involved in a wide range of biological processes. In the present study, the authors investigate the role of SPTLC1, one of the essential subunits of serine palmitoyl transferase complex, in both physiological and pathophysiological angiogenesis, using inducible endothelial-specific SPTLC1 knockout mice. They found SPTLC1 deficiency in ECs inhibited retinal angiogenesis along with reducing several SL metabolites in plasma, red blood cells, and peripheral organs. In addition, the authors found SPTLC1 EC-KO mice are resistant to APAP-induced liver injury. Overall, the in vivo findings in the present study are of potential interest and the authors have given clear evidence that endothelial SPTLC1 is critical to retinal angiogenesis. However, the underlying mechanisms are completely lacking in the present study. Most of the evidence provided is circumstantial, associative, and indirect.”

      We appreciate the positive comments of the reviewer. We have addressed the reviewer’s concern regarding underlying mechanisms as detailed below.

      “To be specific,

      1. The authors found endothelial SPTLC1 is important to both angiogenesis and the plasma lipid profile. However, the authors did not present the data to demonstrate the relationship between them. The in vivo findings about the phenotype and the plasma lipid profile might be true and unrelated. It would be important to know whether supplementing the reduced lipid induced by SPTLC1 KO could rescue the angiogenesis related phenotype in mice, or, whether the alternative way to inhibit the SL synthesis could mimic the phenotype of KO mice.”

      In the manuscript, we discussed whether S1P is involved, since it is one of the most down-regulated SLs in the plasma and a major regulator of angiogenesis. We think it is unlikely that reduced plasma S1P is responsible for the phenotype. First, the retinal angiogenesis defect in Sptlc1 ECKO mice is the opposite of that in S1pr1 ECKO mice, as we have published previously (PMID: 22975328, PMID: 32059774). Moreover, endothelial deletion of sphingosine kinase, the enzyme that produces S1P, does not influence retinal angiogenesis at P6 (Figure 3 Supplement 2 A and B). Loss of the S1P chaperone ApoM (i.e., Apom KO), which reduces plasma S1P by 50%, does not change retinal vascular development (Figure 3 Supplement 2 C and D). Taken together, our results strongly suggest that reduced plasma S1P is not the cause of the vascular defect in Sptlc1 ECKO retinas.

      Based on our results in the manuscript, loss of SPT enzyme activity in endothelial cells reduced SL species in the endothelial cells and the plasma. Our in vitro and VEGF intraocular injection experiments (new data) suggest that the angiogenic defects seen in Sptlc1 ECKO mice are due to cell-intrinsic defects in VEGF signaling and not to changes in plasma SL levels. We have edited the Discussion section to address this issue.

      “2. A major issue is that the present study did not reveal is a real downstream target. It is possible that VEGF signaling might be impaired by SPTLC1 knockout as discussed by the authors. However, the authors did not demonstrate this point with data. Including both in vivo and in vitro data to evaluate the effects of SPTLC1 deficiency on VEGF signaling might further strengthen the hypothesis. Besides, with in vitro experiments, the authors might further find the critical metabolite(s) involved in VEGF signaling and angiogenesis.”

      As discussed above, we agree with the reviewer’s critique and have addressed this essential point with new experiments (both in vitro and in vivo) in Figure 5. Our new data show that the SPT pathway supplies the glycosphingolipid GM1, which is needed for efficient VEGF-induced ERK phosphorylation and tip cell formation.

      Reviewer #2 (Public Review):

      “Andrew Kuo et al. investigated the role of endothelial de novo sphingolipids (SL) synthesis using endothelial cell specific SPTLC1 knockout (ECKO) mice. They showed that these mice exhibited low concentration of various SL species in not only ECs but also RBC, circulation, and other non-EC tissues. They also showed that ECKO mice exhibited impaired angiogenesis in normal and oxygen-induced retinopathy models, consistent with the decrease of endothelial proliferation and tip cell formation. They finally revealed that these mice were resistant to acetaminophen-induced acute liver injury in early phase. The experiments were well-designed, and the results were clear and convincing. The authors concluded that endothelial cells were the major source of SL in circulation and various organs (liver and lung) other than retina (and probably brain). The weakness of the current version of the manuscript is that the authors did not elucidate the mechanisms underlying the observed phenomena.

      1) The authors showed impaired angiogenesis in ECKO mice using neonatal retina model. Based on the fact that this phenotype was similar to that in endothelial VEGFR2 deficient mice, they suggested that VEGF responsiveness is altered in ECKO mice. Although this hypothesis is plausible, the authors would need to prove it by evaluating VEGFR signaling (VEGFR phosphorylation, Akt activation etc.) in ECKO mice.”

      We thank the reviewer for the positive comments. Regarding the weakness identified, we have addressed this point by conducting new in vitro and in vivo experiments (detailed above). The new Figure 5 addresses this issue directly.

      “2) The acetaminophen-induced liver injury was reduced in ECKO mice in early phase. However, it is still unclear whether SL production itself affects liver injury. The authors discussed the possibility that gene deficiency increases unconsumed serine resulting in GSH increase, but it is essentially independent to SL. If possible, it would be good if the authors could investigate the effect of SL administration on the liver injury progression.”

      We appreciate the reviewer’s concern about the liver injury model in Sptlc1 ECKO mice. Our data suggest that SL species supplied by ECs affect the hepatocyte response to stress. Since acetaminophen-induced liver injury is highly dependent on reactive oxygen species, the increased glutathione levels we found in Sptlc1 ECKO mice may be involved in the phenotype. However, we are simply considering these as biochemical markers of liver injury. This has been addressed in the Discussion.

      “3) This paper showed the impaired cell proliferation in Sptlc1 KO EC mice, and discussed it. Authors described that this phenotype was similar to that of Nos3 KO mice, but its inconsistency with Sptlc2 ECKO adult mice was only justified by a word "isoform-selective function". Authors could quantify eNOS expressions in Sptlc1 KO mice, compared results and then discuss this matter. “

      In Figure 1C, we used eNOS as an EC marker to show purity during our EC isolation process. We did not observe a change in eNOS expression in Sptlc1 ECKO mice. We also did not detect elevated phospho-eNOS in Sptlc1 ECKO mice, in contrast to Sptlc2 ECKO adult mice (Figure 1 Supplement 4). Additionally, our work in the retina was performed in postnatal gene-deletion pups from P6-P17, which differs from the published Sptlc2 ECKO study. The difference in gene deletion strategy (early postnatal vs. adult) could result in differences in eNOS expression. We have added discussion of this issue.

    1. Author Response

      Joint Public Review

      1) The structures of the PDZ domains of PSD95 have been determined and they are well-folded and stable. In addition, the PSG module has been shown to adopt a stable structure after expression and purification. The authors should cite papers, their own and those by Zeng et al. (e.g. J. Mol. Bio, 2018), to reassure readers that the protein is not destabilized by the cysteine mutations. The authors need to state how many purifications of the mutants have been done and how many replicates have been made for the FRET measurements. Did the FRET data change over time?

      We appreciate the importance of selecting labeling sites that do not disrupt protein structure and activity. There are two protein constructs in this work: full-length PSD-95 and the PSG truncation of the same protein, both of which have been expressed hundreds of times over more than a decade in our lab. The cysteine mutations used in this work have all been validated as non-disruptive to the protein and the dyes in several ways. 1) We selected labeling sites using the available x-ray and NMR structures to ensure surface-accessible residues within alpha helices or short loops, minimizing tertiary structural disruption; 2) we ensured that the two point mutations do not affect the expression and purification protocols. Misfolding or changes in conformation would be visible in chromatography elution profiles as well as in proteolytic cleavage patterns, which are sensitive to protein folding; 3) in our previous work, we measured both donor anisotropy and acceptor quantum yield for all of the variants in use here but one, which relied on existing sites in a new combination. Dye-protein interactions or changes in dye environment would become apparent as changes in quantum yield and anisotropy. Any problematic labeling sites have been purged from the current work, which uses a small subset of the mutants from our earlier work. The repeatability of the expression and purification of all these constructs has been demonstrated in our published work and is not affected by the specific labeling mutants in use. The stability of these constructs is further supported by numerous NMR and x-ray crystallography studies published on these robustly expressing proteins. To highlight this important issue, we have added discussion of the origin and validation of these mutants in the text on page 4 and in the Methods section. We have also included references to the tables of photophysical measurements for the library of PSD-95 cysteine mutants adapted for this study.

      We did not explicitly track the number of purifications used in this work, which spanned more than five years. We were not aware of any expectation to provide such records but will keep this in mind going forward. The measurements for this paper come from one or, in some cases, two protein expression runs, each of which generates two or more cell pellets. Each pellet yields a single affinity- and ion-exchange-purified sample, which is then aliquoted and frozen, which may produce more than a dozen samples for fluorescent labeling. Individual labeled samples undergo additional rounds of desalting and size exclusion chromatography immediately before measurement to ensure that only full-length protein is used and that there has been no aggregation or degradation. In terms of repeatability, the data shown in this manuscript include repeat measurements of the same constructs using different FRET dye pairs, collected on different instruments at different times, and these still show excellent agreement. All measurements derive from at least one protein expression run and a minimum of two separate labeling reactions and purifications, giving two independent sets of measurements. Some variants exceeded this standard, but this was not tracked during this long study.

      Regarding the agreement of experimental observables across different protein preparations, one of the variants within the existing dataset (P2-S3) was measured on two experimental setups, two years apart, using two different expression runs, each with separate protein purification and labeling reactions. Comparison of these measurements showed a mean FRET efficiency of 0.70 at Clemson and 0.71 at HHU, while the mean DDA lifetimes were 2.29 and 2.4, respectively.

      2) The authors have not explained how the approach taken in this paper compares to their previous simulated annealing approach of mapping PDZ3 using FRET data in McCann et al., 2012. That study resulted in a model in which PDZ3 binds to a completely different interface, which is not mentioned in this manuscript.

      We apologize for this oversight and thank the reviewers for this reminder. The omission resulted from trimming the manuscript for brevity, and we appreciate the opportunity to highlight how much our approach has improved in the intervening time. We have included commentary on our previous modeling in the revised Discussion.

      3) The biochemical disulfide (DS) mapping experiments provide a useful check of predictions of the FRET and DMD conclusions. However, in order to interpret these correctly, the authors need to show data from negative controls testing cysteine pairs that are predicted NOT to interact.

      We agree that negative controls are a critical part of the disulfide mapping experiments and thank the reviewers for this suggestion. As a negative control, we selected a cysteine pair that showed low FRET in our 2012 PNAS paper (Q374C-K591C). This pair was not otherwise included in this work and is not involved in any contact interface identified from simulations or modeling. This cysteine pair showed no evidence of intramolecular disulfide formation. We now provide an additional supplemental figure panel documenting that this negative control sample does not form disulfides.

      4) The SH3-GUK domain of PSD95 can undergo domain swap dimerization and the dimerization is promoted by binding of the synGAP PDZ-ligand to PDZ3. The authors should mention the existence of domain-swap dimerization (citing McGee [2001] and Zeng et al. [2018]) and indicate whether they tested that the FRET-labeled proteins are monodisperse. This is particularly important in light of the high variation in diffusion time for individual variants - 0.91-10.19 ms (see also #10 below). In particular, the P3-G4 FRET variant has a long diffusion time of 10.19; could it be undergoing domain swap dimerization?

      We are very interested in the prospect of domain swapping, as has been suggested previously. However, we have not seen evidence for it at the concentrations used here. As reported in our 2012 PNAS paper, both full-length PSD-95 and the PSG fragment are monodisperse as judged by size exclusion chromatography. This suggests that there are no stably populated oligomeric states under these conditions at 10⁻⁵ M concentrations. The PSG fragment elutes very close to its calculated formula weight. The full-length protein migrates faster than expected from its formula weight, but not by enough to indicate a dimer.

      The DS mapping experiments did reveal some higher molecular weight species. However, these higher order species never accounted for more than 5% of the total input. Thus, any intermolecular interaction is transient and poorly populated under the buffer conditions and concentrations used in these studies. Our size exclusion and disulfide mapping experiments are carried out at protein concentrations that are orders of magnitude higher than those used for single-molecule imaging. Dimerization is therefore unlikely at the single-molecule concentrations used for the present FRET experiments. If dimerization were to occur, we would expect additional static subpopulations to appear in the MFD histograms. If dimerization were significant, we would also expect an additional diffusion term in the fluorescence correlation curves, which was not the case in these experiments.

      5) On page 4, line 5 the authors state: "the number and occupancy of conformational states were set as global fitting parameters". This assumes that the protein is unbiased by the labeling and that the protein behaviour is independent of the purification batch. Have the authors verified this?

      The reviewers are correct in stressing the importance of quality control in the selection of labeling sites and reproducibility in sample preparation. The PSD-95 purification has been carried out hundreds of times in the Bowen lab using different variants. The cysteine mutations used in this work have all been validated as non-disruptive to the protein and the dyes in several ways. 1) We selected labeling sites using the available x-ray and NMR structures to ensure surface-accessible residues within alpha helices or short loops, minimizing tertiary structural disruption; 2) we ensured that the two point mutations do not affect the expression and purification protocols. Misfolding or changes in conformation would be visible in chromatography elution profiles as well as in proteolytic cleavage patterns, which are sensitive to protein folding; 3) in our previous work, we measured both donor anisotropy and acceptor quantum yield for all of the variants in use here but one, which relied on existing sites in a new combination. We have ensured that sites with poor properties are never included in our published work. Indeed, the reproducibility of sample preparation, using chromatography before and after labeling, gives confidence that the attachment of fluorescent dyes does not alter macromolecular properties. For the dyes to change the protein structure, they would have to interact competitively with the protein interfaces or disrupt local structure. Either effect would be expected to change the dye quantum yield or anisotropy, each of which was measured in our previous work. In addition, the multiparameter fluorescence detection includes anisotropy measurements of the current samples. None of these measurements reveal aberrant fluorophore behavior (Supplemental File 3C).

      This alone does not rule out that the dyes affect the conformational ensemble. The cross-method agreement demonstrated in the current work gives additional confidence that our protein handling workflow does not affect the results. First, the agreement between confocal and TIRF measurements of both full-length PSD-95 and its PSG truncation boosts confidence. The labeled samples for each experiment were prepared from the same purified proteins but labeled independently with different dye pairs. The different dyes attached to the samples used for confocal and TIRF did not affect the time-averaged distances between these residue pairs, save for one slight outlier. Additionally, our cross-validation using disulfide mapping, which is entirely label-free, provides further confidence that the interdomain contact interfaces observed with the labeled proteins are preserved when the labels are absent. Finally, independent DMD simulations of label-free PSG were in excellent agreement with the predominant states identified from rigid-body docking based on the experimental FRET distances and the disulfide mapping.

      6) On line 6 the authors state: "Based on fitting statistics, we demonstrate that a two-state model with a small donor-only (or no FRET) population (Supplementary file 1C &D) is sufficient to fit all data.” From the average Χ2 this can be concluded, but for individual datasets sometimes a 1 state model or 3 state model seems more appropriate. The authors should explain why measuring more cys mutants justified using 'one unifying model'? How can the data contain donor-only contributions if pulsed-interleaved excitation (PIE) is used to select only molecules with active donor and acceptor fluorophores?

      We apologize for the lack of clarity as to how we determined that two states are present in the conformational ensemble. The fitting statistics show an improvement in the global fit when the number of states in the model is increased from one to multiple states. The statistics in the former Supplementary File 1C show a significant improvement upon fitting with two states relative to one. Adding a third state marginally improves the fit for 3 variants, while the remaining 9 are unchanged or slightly worse. The former Supplementary File 1D (now 3C) lists the values of each constant obtained by fitting the 2-state model to all datasets simultaneously. It also gives the individual fit statistic of this model for each variant dataset. This table assigns the global population fractions and their associated donor lifetimes but was not used to determine the number of states. The conclusion that there are two states rests solely on the improvement in fitting statistics with two states shown in the former Supplementary File 1C. The statistics do not justify including an additional state. Because this is such a critical point, we have moved the former Supplementary File 1C to the main text as Table 2 and added discussion to the manuscript to highlight how we arrived at a 2-state model.

      The reviewer is correct that a global fit of the dataset could result in suboptimal fits for individual FRET pairs in order to satisfy the global minimum. In this case, most variants were best fit by a two-state model. The reason for using one unifying model is our underlying assumption that the same conformational distribution of PSD-95 is sensed differently by each labeling combination. A primary consequence of this assumption is that all variants share a population distribution. A secondary assumption is that protein handling does not bias this conformational ensemble, which we verify as described above. Each measurement tells part of the same story. We were therefore only interested in models that simultaneously explain all observed FRET data, and we enforced a single global model. A global fit also proved the best way to uniquely assign each distance to its corresponding state. Furthermore, the FRET Network Robustness analysis explicitly examined how much our model depends on any one labeling variant and found no systematic deviations. This analysis revealed an ensemble of structures that satisfy the data without enforcing a global model for all samples simultaneously.

      We also thank the reviewer for correctly observing that we misapplied the term donor-only (DO) in the manuscript. The population we referred to is more appropriately termed a “No-FRET” or “low-FRET” population. The reviewer is correct that active, FRET-labeled molecules were selected using PIE parameters. We have corrected this in the manuscript.

      7) All variants are shown to be dynamic, but they are positioned differently on the dynamic FRET line (Fig. 1D and S3). Does the same kinetic model underly each variant? If the same state occupancies are implied, then why not the same kinetic constants, especially for distances probing the same two domains?

      While the global population fraction is shared between variants, the transition rates for individual variants are not constrained. As such, the variants do not share a single equilibrium rate constant. Although the FRET data are fit with two global states, our DMD simulations showed substantial fuzziness within these global states. Thus, the full kinetic network is more complex than a global 2-state transition. As our screening of DMD snapshots showed, each FRET variant is uniquely sensitive to the underlying conformational transitions. Hence, the system is underdetermined, and we cannot adequately determine forward and backward kinetic rates for each variant individually.

      It is important to recall that the data shown in multiparameter FRET histograms have been binned with millisecond time resolution, which is slower than the local conformational dynamics arising from fuzzy domain rearrangements. The position of the peak depends on the underlying rate constants. Our Photon Distribution Analysis reveals the kinetic processes that dominate the broadening of the FRET efficiency distributions. This analysis also measures the fraction of the effectively “static” population. Fast transitions, which do not significantly contribute to changes in FRET efficiency (or broadening) on the binning timescale, appear as static populations. Thus, the simple PDA model captures the broadening that is also present in MFD histograms, but does not adequately describe dynamics on the fastest timescales.

      8) Could the data also be explained by "fuzziness" within domains, without interdomain dynamics? The authors should discuss this given the possibility of domain swap dimerization of the SH3-GuK domain.

      In this work, we use the term fuzziness to refer to alternate residue interactions and domain orientations within a global contact basin. Using this definition, we do not expect significant structural rearrangements within the PDZ, SH3 or GuK domains. These domains are well folded and have been studied individually and in combination (e.g., SH3-GuK) using x-ray crystallography and NMR, which did not reveal local distortions of the domain fold. This is not to say that there are no conformational dynamics within loop regions or other small-scale subdomain motions. Our rubric for selecting labeling sites is to avoid large loops to minimize local dynamics, because such conformational variability compromises the resolving power of the FRET restraint. Our DMD screening provides details on how each FRET pair senses changes in local and global conformation. Compared with the global changes extracted from the fluorescence lifetime decays, the intradomain dynamics occur rapidly on small length scales and are not expected to affect our global positioning of PDZ3. As discussed above, we do not observe a significant population of dimers or other multimers at the concentrations used for these experiments.

      9) Regarding supplemental File 2: The authors should justify that PDA is an appropriate method to quantify relaxation time of Fluorophores. Dynamics being so fast, how do the authors explain that when binned in 2 ms time bins, discrete subpopulations in the PDA histograms are still clearly observed (e.g. Figure 2B, Fig. 2 supp 3)? Why would the protein move through certain very discrete states and not others? Doesn't this imply that the model is oversimplifying the actual mechanism (even though the Chi^2 is alright)? It is strange that for some mutants (fig 2 supp 3B P1G3) PDA displayed discrete states, while for others (e.g. fig 2 supp 3A P2G6) PDA histograms were smooth, implying it cannot be a low-histogram-count artifact. Or can it?

      We apologize for this confusion, but the photon distribution analysis was not used to “quantify relaxation times” of the fluorophores; those come from fitting the lifetime decays. Rather, PDA was used to estimate the rates of exchange between limiting states (i.e., the inter-fluorophore distances derived from fitting the fluorescence decays). The rates are obtained by fitting time-binned FRET efficiency histograms with a model that accounts for broadening due to exchange between limiting states.

      We agree with the reviewers that the two-state model, which is sufficient to fit the lifetime decays, is too simplified to fully describe the dynamic exchange between limiting states. To address this, we performed the FNR analysis to describe the limiting-state basins within which fast dynamics occur. This extends the model beyond two discrete limiting states. Further, DMD screening shows that different FRET variants report differently on the underlying conformational landscape. Some exhibit a degree of degeneracy, showing similar FRET efficiencies for different conformations. This makes those variants insensitive to specific subsets of possible transitions.

      Using fluctuation correlation analysis to probe FRET-induced changes in intensities, we observed dynamics on the 10⁻⁵ s timescale, which is much too fast to give rise to broadening in the fluorescence observable histograms. However, these dynamic transitions did not correspond to exchange between states with large differences in FRET efficiency. If such fast dynamics involved a large change in FRET, they would produce a narrow distribution about the mean in the MFD histograms. We explain the appearance of distinct peaks for some variants as an increase in the relative contribution of fast dynamics within limiting ensembles compared to the slower processes of exchange between limiting ensembles. This can occur without a relative shift in forward/backward exchange rates and with only a slight shift in the overall relaxation rates on the timescales to which PDA is sensitive (~0.01-1 ms).
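      The averaging argument in the preceding paragraph can be illustrated with a minimal sketch. It assumes a hypothetical two-state exchange process with FRET efficiencies e_a and e_b and exponentially distributed dwell times. Time is measured in units of the bin width, so rates are per bin. Photon shot noise, which PDA explicitly models, is deliberately omitted. The sketch therefore isolates the effect of time-averaging within a bin and is not a PDA implementation.

```python
import random
import statistics

def binned_fret(e_a, e_b, k_ab, k_ba, n_bins, rng):
    """Time-averaged FRET efficiency per bin for a two-state exchange process.

    Time is in units of the bin width, so k_ab (rate of leaving A) and
    k_ba (rate of leaving B) are per bin.
    """
    total = float(n_bins)
    # Start from the equilibrium population of state A.
    state_a = rng.random() < k_ba / (k_ab + k_ba)
    sums = [0.0] * n_bins
    t = 0.0
    while t < total:
        dwell = rng.expovariate(k_ab if state_a else k_ba)
        end = min(t + dwell, total)
        e = e_a if state_a else e_b
        # Spread this dwell segment over every bin it overlaps.
        while t < end:
            i = int(t)                   # index of the current bin
            seg_end = min(end, i + 1.0)  # stop at the bin edge or the dwell end
            sums[i] += e * (seg_end - t)
            t = seg_end
        state_a = not state_a
    return sums  # bin width is 1, so each sum is already a time average

if __name__ == "__main__":
    rng = random.Random(1)
    fast = binned_fret(0.2, 0.8, 50.0, 50.0, 2000, rng)   # many switches per bin
    slow = binned_fret(0.2, 0.8, 0.05, 0.05, 2000, rng)   # ~20 bins per dwell
    print("fast:", statistics.mean(fast), statistics.pstdev(fast))
    print("slow:", statistics.mean(slow), statistics.pstdev(slow))
```

      When exchange is much faster than the bin width (e.g., 50 per bin), the binned efficiencies should cluster tightly around the population-weighted mean, which is 0.5 in this example. When exchange is slow (e.g., 0.05 per bin), they should remain near 0.2 and 0.8, giving a broad, bimodal histogram.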

      10) Regarding supp file 3A and Table S9: The spread on tdiff, (the average diffusion time through the confocal volume) for individual variants is very broad - 0.91-10.19 ms. Considering that the authors use global fits for many different parameters, it's surprising that they didn't use it for this parameter which should unbiasedly be the same for all the protein mutants, at least if all are well-behaved (i.e. non-aggregating). The high variation in tdiff may be a warning that the model is not accurately accounting for all dynamics. For example, might the P3-G4 variant be undergoing domain swap dimerization?

      We thank the reviewers for their observation and apologize for the confusion as to why there are differences in the diffusion time through the confocal volume for the different variants. We expect that there would be three distinct diffusion times because the samples were measured on two experimental setups using different confocal volumes and pinhole sizes. There are also two distinct protein constructs (full-length and PSG), which differ in molecular weight. The longest timescale processes included in the fFCS fits are attributed to long-timescale photophysical effects, such as blinking. As discussed above, we do not expect a significant population of dimers or other multimers at the pM concentrations used for these single molecule experiments.
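      The dependence of the diffusion time on the confocal volume and on molecular size can be made concrete with the standard FCS relation τD = w0²/(4D) for a Gaussian focus of 1/e² radius w0. This is combined with the Stokes-Einstein relation D = kBT/(6πηRh). The sketch below assumes water-like viscosity and room temperature. The focal radii and hydrodynamic radii in the usage lines are hypothetical placeholders, not measured values for our setups or constructs.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def diffusion_coefficient(r_h, temperature=295.0, viscosity=0.95e-3):
    """Stokes-Einstein translational diffusion coefficient in m^2/s.

    r_h: hydrodynamic radius in m; viscosity in Pa*s (water-like default).
    """
    return K_B * temperature / (6.0 * math.pi * viscosity * r_h)

def diffusion_time(w0, r_h, temperature=295.0, viscosity=0.95e-3):
    """Lateral diffusion time tau_D = w0^2 / (4 D) in s for a Gaussian focus
    with 1/e^2 radius w0 in m."""
    d = diffusion_coefficient(r_h, temperature, viscosity)
    return w0 ** 2 / (4.0 * d)

if __name__ == "__main__":
    # Hypothetical focal radii (two setups) and hydrodynamic radii (two constructs).
    for w0 in (0.5e-6, 0.6e-6):
        for r_h in (4e-9, 6e-9):
            print(f"w0 = {w0 * 1e6:.1f} um, Rh = {r_h * 1e9:.0f} nm: "
                  f"tau_D = {diffusion_time(w0, r_h) * 1e3:.2f} ms")
```

      Under these assumptions, τD scales with w0² and linearly with Rh. A change in either the confocal volume or the construct size therefore shifts the fitted diffusion time, while measurements of the same construct on the same setup should share one value.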

      We agree with reviewers that the diffusion time for a given construct on a given instrumental setup should be a constant. In this light, we reanalyzed the filtered fFCS curves with enforced consistency for the diffusion times in measurements involving the same construct measured on the same setup. While this refitting slightly changed the values of fit parameters, none of these differences significantly affected the parameters used for modeling and therefore the conclusions of the paper have not been impacted. We have updated the manuscript to indicate the change in the fit models.

      11) In the results section, the authors state: "Summarizing the dynamics observed for the PDZ3-GuK variants, fFCS depicts three relaxation times." This is an overstatement because the authors imposed these three broad relaxation times. The authors should describe how they made these assignments. Is this common practice? Regarding Supplemental File 2 versus Supplemental File 3A: In principle, the relaxation time implied from fFCS and that from PDA should align. However, the 'Average' of fFCS and the T_R of PDA do not align. Is it possible that the dynamics analysis from PDA should have been constrained in some way by the results from fFCS? It would be useful to add error estimations for PDA here.

      We agree with the reviewer that it is an overstatement to say that the number of relaxation terms arises from the correlation analysis. We have removed this statement and instead focus on the differences in dynamics. The assignment of three relaxation terms was made to probe the extent of dynamics across decades in time as each time regime is typically associated with distinct forms of protein dynamics. We enforced these consistent timescales in order to directly compare amplitudes across different FRET variants. However, we do not enforce any assignment that dynamics arising from a particular type of exchange process occur at the same timescale.

      We also agree that obtaining agreement between PDA and fFCS is desirable. In our experience, such agreement is only obtainable for simple kinetic schemes in which the dynamics probed by fFCS and PDA all occur within the same relative timescales. Here, the contributions to dynamics span several decades in time, including those accessible only through fFCS analysis but too fast to be quantified by PDA. With the methods we employed, we recover only effective relaxation times rather than absolute kinetic rate constants, because the system is underdetermined. Differences for individual variants arise for two reasons. The variants differ in sensitivity to specific transitions (Figure 8-Figure Supplement 1), and fFCS and PDA report differently on the underlying kinetic scheme.

      12) Regarding the DS bond formation data, the authors state, "The α-basin variant showed slightly more DS formation than the beta-basin variant in full-length PSD-95 but the rates of DS formation were similar". It isn't clear what this means physically. It seems to suggest that there is static heterogeneity in the population, i.e. some proteins can and some proteins cannot form DS bonds. The presence of this effect may contradict the assumption that every state at some point interconverts to any other state, which underlies the FRET PDA analysis. The authors should discuss this possible inconsistency.

      We agree with the reviewer that this statement was not clear. It was never our intention that the DS formation kinetics be directly related to FRET data in this way. The goal of the DS mapping experiments was to provide qualitative confirmation that the supertertiary structures suggested by DMD and FRET experiments occur in solution. We meant to focus on the DS formation kinetics, which are an indication of structural proximity. The extent of DS formation is obtained from the fitting as a matter of course. The reactions progress to near completion (Figure 7-Figure Supplement 1). The differences in the extent of disulfide formation, while real, are very small, and we did not intend to highlight them. We have removed any discussion of the extent of DS formation from the manuscript.

      13) In the discussion of the DS experiments, the authors state, "We also observe significant kinetic differences when PSD-95 is truncated in agreement with FRET studies." This sentence is vague. The authors need to state more completely what they mean here. Exactly what is in agreement with the FRET studies?

      We agree with the reviewers that the claim was vague. We intended to communicate that the DS mapping is generally consistent with FRET experiments in that they confirm the proposed limiting conformational states. The formation of disulfides at these points confirms the accessibility and proximity of these sites with respect to one another within the supertertiary structure. Also, both DS mapping and fFCS observed changes when PSD-95 was truncated to the PSG fragment. However, the rates of DS formation are not directly comparable to the rates of conformational dynamics. We have removed this statement from the paper to avoid directly linking these two unrelated kinetic measurements.

      14) The text in the section on "Structural Modeling with Experimental FRET Restraints" is often unclear. The authors appear to have equated States A and B, formerly used only in the seTCSPC analysis, with the alpha and beta basins extracted from the DMD snapshots. The authors should discuss whether there might be other conformations in the DMD results that would be consistent with the FRET-derived distances from seTCSPC. It seems possible that there could be, given that in Fig 6 sup 1, large discrepancies exist between simulated distances and FRET-measured distances for some of the FRET pairs. The authors should discuss explanations for the discrepancies that do not compromise the actual model.

      We apologize for the lack of clarity in our description of structural modeling with FRET restraints. We thank the reviewer for the suggestions as to how we can improve this discussion. In the course of this study, we do reach the conclusion that states A and B, obtained from modeling solely based on FRET data, are equivalent to conformations within the alpha and beta basins from DMD, respectively. Because the representative structures were obtained independently via distinct techniques, we felt that it would be premature to use the same terminology when we are introducing the FRET results.

      We agree that more than a single snapshot from DMD per basin appropriately satisfies the FRET restraints and that no one model satisfies all restraints equally. Our goal with the later FNR analysis, which explicitly incorporates FRET-derived restraints, was to identify ensembles of structural snapshots from DMD that are compatible with experimental data. Instead of finding the single best model for the full set of FRET-derived distances, each snapshot in the ensembles from FNR satisfies all distance thresholds independently. Thus, the ensembles from FNR do refer to both experiment and DMD.

      Further, the vertical lines shown in Figure 8-Figure Supplement 1 represent the distances from the initial global fit of all samples simultaneously. For some variants, this likely includes biases in certain distances due to the enforcement of this global model, which FNR seeks to alleviate. For SH3-GK FRET pairs, these deviations are most likely the result of the restraints placed on the motions of the GK domain in the DMD simulations.

      15) A weakness of the modeling approaches in this manuscript is that they are difficult to validate. Could the authors include a test of the modeling in which they show how small changes of the input FRET data would influence the final FRET-restrained model? Could they quantify their confidence in the final model, given all the limitations of the FRET data?

      We agree with the reviewers as to the importance of validating structural models regardless of the data modality used in their determination. We respectfully disagree that this study is lacking in model validation. In this work, we generated models based on confocal FRET data and validated the FRET models using independent DMD simulations and disulfide mapping. We also employed smTIRF measurements using a different dye pair to independently validate the time-averaged FRET from confocal measurements. While this may fall short of complete validation of the associated dynamic information, we feel that this represents the state of the art in model validation regardless of the experimental approach. While it is difficult to validate novel methods for deriving structural models, we believe that we have done so through cross-validation against other established techniques.

      As suggested, we tested how sensitive the experimental models are to small changes in the input FRET data. To do so, we used the same analysis framework described for the FRET Network Robustness Analysis. Instead of removing datasets as in FNR, in each trial we introduced a random, artificial error into each of the FRET distances and repeated the classification of DMD structures into the two basin ensembles using the altered distances. To check the dependence on the magnitude of the error, the random error for each variant was drawn from within ±5% or within ±15% of the original distance. Each condition was repeated 3 times with different random errors. To compare conditions, we measured the change in the center of mass of the surface distribution formed by the individual PDZ3 centers of mass identified in that screen (Figure 8-Figure supplement 2). Increasing the distance error did not significantly affect the classification of structures into the two ensembles. With increasing error, the variance of the mean ensemble positions over the three repeats increased, accompanied by small shifts in the mean positions. Notably, ±15% is greater than the uncertainties in distances obtained via global fitting of fluorescence decays, suggesting that the intrinsic uncertainty in the FRET-derived distances from a single fit (Supplemental file 3D) does not significantly affect the ensemble assignment or its fuzziness.
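      To make the perturbation test easier to follow, here is a minimal Python sketch of its logic under stated assumptions. It assumes that each FRET distance is scaled by a uniformly distributed relative error within ±`max_rel_err` (the response gives only the bounds, not the distribution). It also substitutes a hypothetical acceptance rule, in which every model distance must lie within a fixed tolerance `tol` of its restraint, for the actual FNR classification, which may differ. The snapshot format, function names and tolerance are illustrative only and are not the authors' code.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical snapshot record: (model distances keyed by FRET pair, PDZ3 center of mass).
Snapshot = Tuple[Dict[str, float], Vec3]


def perturb(restraints: Dict[str, float], max_rel_err: float,
            rng: random.Random) -> Dict[str, float]:
    """Scale each FRET distance by a uniform random factor in [1 - err, 1 + err]."""
    return {pair: d * (1.0 + rng.uniform(-max_rel_err, max_rel_err))
            for pair, d in restraints.items()}


def accepted(dists: Dict[str, float], restraints: Dict[str, float], tol: float) -> bool:
    """Hypothetical rule: every model distance lies within tol of its restraint."""
    return all(abs(dists[pair] - r) <= tol for pair, r in restraints.items())


def ensemble_centroid(snapshots: Sequence[Snapshot], restraints: Dict[str, float],
                      tol: float) -> Optional[Vec3]:
    """Mean PDZ3 center of mass over accepted snapshots (None if none are accepted)."""
    members = [com for dists, com in snapshots if accepted(dists, restraints, tol)]
    if not members:
        return None
    n = len(members)
    return tuple(sum(c[i] for c in members) / n for i in range(3))


def centroid_shifts(snapshots: Sequence[Snapshot], restraints: Dict[str, float],
                    max_rel_err: float, tol: float, n_trials: int = 3,
                    seed: int = 0) -> List[float]:
    """Distance between the unperturbed ensemble centroid and each perturbed trial's centroid."""
    rng = random.Random(seed)
    ref = ensemble_centroid(snapshots, restraints, tol)
    shifts = []
    for _ in range(n_trials):
        trial = ensemble_centroid(snapshots, perturb(restraints, max_rel_err, rng), tol)
        # An empty ensemble in either case has no defined centroid, so report NaN.
        if ref is None or trial is None:
            shifts.append(math.nan)
        else:
            shifts.append(math.dist(ref, trial))
    return shifts
```

      Running `centroid_shifts` once with `max_rel_err=0.05` and once with `max_rel_err=0.15`, three trials each, mirrors the two error conditions. A wider spread of shifts at ±15% would correspond to the increase in variance described in the response.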

    1. Author Response

      We thank the reviewers for their thoughtful and constructive comments which have helped us improve our manuscript. In our revised manuscript, we will respond to three main weaknesses:

      1. We will address the inconsistency in the experimental design across the behavior and the transcription experiments by repeating the behavior with an experimental timeline that more exactly matches that of the animals used in transcriptional studies;

      2. We will further validate and justify our use of TRAP and our focus on the NAc as the sole brain region of investigation;

      3. We will revise the language throughout the manuscript, especially in the discussion, to reduce anthropomorphizing of our results and interpretations. Below we have provided responses to specific concerns articulated by each reviewer.

      Reviewer #1 (Public Review):

      The monogamous vole provides unique opportunities to study the neural basis of pair bonding, and this study exploits that opportunity in a novel way. Focusing on the nucleus accumbens, the authors conduct RNA-Seq to characterize the transcriptome in same-sex and opposite-sex pairs when bonded, when separated for a short time, and when separated for a long time, a point at which prior literature has demonstrated a willingness to form a new bond. They determine that the transcriptome of pair bonding includes a preponderance of glial-associated gene changes and that it degrades with long-term separation. To address the latter point, they then use a neuron-enriching TRAP approach to identify genes that change specifically in neurons.

      The strength of the report is the clever experimental design, the unusual animal model, and the comparisons of same-sex and opposite-sex pairs and long-term and short-term separations.

      The weakness is that the behavioral changes observed are not what was expected based on prior work and are relatively modest, providing a disconnect between the outcome and the more dramatic transcriptional changes. A second weakness is the focus on the nucleus accumbens which is a brain region most closely associated with reward. While pair bonding may be rewarding, that component may be independent of the memory of a partner or the willingness to partner anew. Lastly, there is no clear connection between the identified transcriptome and either the formation or degradation of the pair bond.

      We thank the reviewer for noting the unique strengths of using prairie voles to investigate this specific question and for praising our experimental design, which compares opposite-sex and same-sex paired males at each time point to disentangle the effects of pair bonding from general social affiliation and isolation.

      Reviewers #1 and #3 noted the mismatch between the behavioral and transcriptional responses. Specifically, we found little evidence of bond dissolution following long term separation despite substantial erosion of the pair bond transcriptional signature. They further note that the experimental design employed to assess behavior and transcription differed, which may have contributed to the apparent mismatch. Importantly, our initial behavioral assessment as presented in Figure 1 of the manuscript had two strengths. It measured intra-animal changes in behavior over time and minimized the number of animals required. However, we agree with the reviewers, and we are currently repeating the behavior experiments to match the transcription experiments. Specifically, separated partners will be kept in separate colony rooms to ensure no possible access to partner-associated sensory cues (visual, auditory, olfactory), and we will use separate cohorts of animals for short- and long-term separation. This design avoids partner re-introduction during the short-term partner preference test. The results of this work will be informative regardless of outcome. If we observe a dissolution of pair bond behaviors, it indicates that re-exposure to a partner after a short, 48-hour separation has a powerful effect on bond duration following separation. If we do not observe any change in pair bond behaviors following separation, it would confirm that pair bond behaviors are more resistant to erosion than are transcriptional signatures of pair bonding.

      We have focused on the NAc because it is a critical hub that is engaged upon attachment formation and is implicated in loss processing. Specifically, studies have shown that blockade of neuromodulatory signaling (i.e., oxytocin and dopamine) in this region impairs bond formation and can lead to bond dissolution. Our group and others have demonstrated that plasticity within this region, in patterns of neuronal activity and in synaptic responses to oxytocin, is associated with bond formation and maturation (1, 2). Moreover, the literature on drugs of abuse has demonstrated an important role for the NAc in the encoding of reward associations (3), which ultimately underlies partner preference. Additionally, in human neuroimaging studies, Prolonged Grief Disorder is associated with an enhanced signal in the NAc when individuals view images of the lost loved one, suggesting that normal resolution of grief corresponds with a decrease in NAc activity elicited by reminders of the lost loved one (4). Thus, our focus on this region is well supported. Nonetheless, we recognize that the NAc does not act in a vacuum, and the efferent and afferent connectivity of different NAc cell types is well delineated, paving the way for future work (5, 6).

      Additionally, we agree with the reviewer that pair bonding behavior is multifaceted and composed of several discrete behaviors that are not dissociable in the partner preference test. Partner-associated reward and partner memory may be independently encoded, and disruption of either process would manifest as a decrease or lack of partner preference. In our complete response to reviewers and revision of the manuscript, we will address this point more thoroughly. Finally, we interpret the reviewer’s last comment to be a request for functional manipulations to validate that the predicted transcriptional changes have a behavioral effect. This is beyond the scope of this manuscript but is an active area of future research.

      Reviewer #2 (Public Review):

      The goal of this study is to understand the molecular mechanisms by which pair bonded animals recover following the loss of a partner.

      Strengths of this work include: (1) The organism, a novel model for studying pair bonding and loss; (2) The integrative nature of the study, which integrates behavior, brain gene expression (RNA-Seq) data and vTRAP; (3) The important and understudied question of how pair bonded animals recover from loss; (4) The thorough and careful analysis of highly multidimensional and complex datasets.

      Weaknesses include: (1) the major comparison is between same vs opposite sex housed pairs. This design controls for social effects somewhat, but the two treatment groups differ not just with respect to whether or not they are pair bonded, but also in whether or not they had associated with a male or female. Differences between the treatments could reflect pair bonding, or perhaps something about the sex of the partner. It would be useful to have an additional control group, or data on the behavior of individuals within both types of pairs while they are co-housed. Were transcriptomic effects more detectable in pairs that were more bonded together behaviorally? That would suggest that the gene expression signatures really reflect something about the bond rather than other confounds, for example; (2) The vTRAP method is fancy but what is it really adding? (3) The authors interpret the transcriptomic differences as promoting the ability to form a new bond but there are probably other processes that are contributing to the differences in gene expression. Some of the differentially expressed genes could be involved in promoting a new pair bond, but there could also be a signature of the memory of the identity of the partner, the signature of the bond itself, etc. (4) Some of the interpretations go a little too far, especially in terms of anthropomorphism. The impact of the work includes further development of voles as an important model for studying social behavior and insights into the molecular processes important for recovering from the loss of a partner.

      We thank the reviewer for recognizing the strength of our study organism and experimental techniques, as well as our rigorous analyses, in addressing an important question about adapting to partner loss.

      Regarding the noted weaknesses:

      (1) We chose to compare opposite-sex pair bonds to same-sex affiliative relationships because this is the standard within the field, and we note that Reviewers 1 and 3 found this to be a strength of our study design (7–11). Peer relationships in prairie voles are difficult to distinguish behaviorally from those of opposite-sex pairs (Fig 1) because both same- and opposite-sex paired voles show selective preference for their pairmate and selective aggression towards other voles (7). As such, the critical feature that distinguishes pair bonding is mating, which requires an opposite-sex partner in voles, and our experiments are optimally designed to identify the longitudinal transcriptional changes that result from mating and cohabitating with an opposite-sex partner. To best match our two groups, only animals with a preference score >50% were included in the transcriptional experiment, ensuring that we were comparing animals with an affiliative preference for their partner, whether that individual was of the same or the opposite sex.

      We interpret the reviewer’s comment to mean that they want us to compare opposite-sex-paired animals with and without bonds. This can be achieved in two ways. First, we can compare to a promiscuous species, such as meadow voles, which will mate and cohabitate without forming bonds, but this is confounded by species differences in transcription that may exist independent of bonding. Second, we can compare bonded voles to the small subset that do not form bonds. While intriguing, this is experimentally challenging, as only ~10-20% of males fail to form a bond when paired with a sexually receptive female (in the current study, 16% had a preference < 50% after two weeks of pairing, which is consistent with prior reports (9–11)). Put simply, we would need to pair hundreds of voles to opportunistically collect a sufficient number of non-bonders for transcriptional assessment across our experimental conditions. While we hope to eventually be able to do such an experiment, litter sizes, consideration of animal welfare, and other constraints make this largely untenable at present.

      Data on the behavior of individuals within both types of pairs while they are co-housed are already provided via the results of a partner preference test performed after 2 weeks of co-housing and prior to re-housing or separation (Fig 2B and 3B). We find the reviewer’s suggestion to look for a relationship between the transcriptional signature and pair bond strength interesting, and we undertook a preliminary analysis examining whether animals with different pair bond strengths aggregate in a PCA of gene expression. There was no apparent relationship, although we are performing additional analyses such as exploratory factor analysis. The fact that we have not found a relationship between baseline partner preference and transcription in these initial analyses is perhaps unsurprising. First, bonding may require some threshold change in gene expression, with bond strength reflected in non-genomic information, such as synapse formation or strengthening, or axonal ensheathment. Second, we only performed transcriptional analyses on animals with a baseline partner preference >50%; given the uniformly strong bonds across these animals, we would not necessarily expect to detect such a relationship.

      (2) We feel that the inclusion of TRAP adds substantially to this manuscript and to our understanding of the neuromolecular underpinnings of bonding and loss in the NAc. The value of this experiment is twofold. First, as noted by Reviewer 3, “the TRAP approach in prairie voles is novel and will provide a great resource to the research community.” The prairie vole community has just developed its first transgenic Cre lines, which could be paired with vTRAP to query bond-associated gene expression changes exclusively in Cre-expressing neurons (15). Second, we noticed a puzzle in our tissue-level data. The majority of cells in the NAc are neurons (16, 17), and the vast majority of pair bonding studies of this region have focused on neuronal phenotypes, yet our transcriptional signatures were linked to changes in glial populations. Ultimately, changes in glia are likely to act via their interactions with neurons, and vTRAP enables us to query the neuronal transcriptional changes within our data. Supporting the view that this provides novel insights into our datasets, when we cluster transcripts based on their expression profiles following short- and long-term separation, the tissue-level and neuronally enriched gene sets predict different GO terms. For instance, the GO terms resulting from cluster 2 for neuronal genes (Fig 4) include “response to amphetamine” within the top 10 results, whereas the same cluster of genes from tissue-level sequencing ranks this GO term 34th.

      (3) We agree with the reviewer that adapting to partner loss is a multifaceted process that likely engages numerous biological and emotional systems in voles. The explanation we offer for the transcriptional changes during loss is based on previous work in the field and is one possible interpretation. We will expand on this point during revision of the manuscript.

      (4) We thank the reviewer for encouraging us to be objective with our interpretations. We will address this comment during revision of the manuscript.

      Finally, we thank the reviewer for recognizing the value of our study not only for vole research but also for the bereavement field more broadly.

      Reviewer #3 (Public Review):

      In this manuscript, the authors investigate the behavioral and brain transcriptional alterations associated with short- and long-term partner separation in the monogamous male prairie vole. Male prairie voles continue to show affiliative behavior after short- (2 days) and long-term (4-weeks) partner separation, with similar effects for same and opposite-sex pairs. However, the transcriptional signature in the nucleus accumbens exhibits marked alterations after long-term separation.

      Strengths:

      1) A key strength of this manuscript is its use of the monogamous prairie vole to investigate transcriptional alterations associated with pair bonding and subsequent pair separation. This sort of behavior cannot be investigated in typical rodent model systems (e.g., mice, rats), and the choice of using prairie voles allows for dissection of potential mechanisms of social bonding with relevance to partner loss in humans.

      2) Investigation of behavioral measures and transcriptional alterations at both short- and long-term time points after pairing and separation is a strength of the manuscript. These time points were selected based on previous studies in laboratory and wild prairie voles related to the time it takes to form a pair bond and for the male prairie vole to leave the nest after the loss of the female pair. The datasets generated will be of great use to the scientific community.

      3) The authors investigate the behavior and transcriptional profiles after same-sex as well as opposite-sex pairing. This is considered a thoughtful decision on the authors' part which allows them to tease apart the effects of same vs. opposite sex.

      4) The use of numerous behavioral measures to assess both affiliative and aggressive behaviors is a strength of the approach.

      5) The authors use many biostatistical approaches (e.g., RRHO, WGCNA, Enrichr) to probe the transcriptomics data. These approaches allow the authors to move beyond simply assessing transcriptional profiles separately, but to look for patterns that are similar or different across datasets.

      6) The authors use rigorous statistical methods to assess behavioral measures.

      7) The TRAP approach in prairie voles is novel and will provide a great resource to the research community.

      Weaknesses:

      1) The methods state that prairie voles were treated differently in the behavioral and transcriptomics studies. Specifically, for the separation in the behavioral studies, prairie voles were separated by sight, but not necessarily by the smell from partners (i.e., partners were kept ~1 foot apart). However, prairie voles in the transcriptomics studies were separated by both sight and smell (i.e., partners were sacrificed after separation). Thus, it is possible that the lack of degradation of pair bond-related behavior after long-term separation might be due to these prairie voles being able to smell their partners after separation. This is considered a moderate flaw in the design of the studies which limits the integration of results between behavior and transcriptomics. This might be why the authors do not see a strong behavioral degradation of pair bond-related behavior after long-term separation but do see a strong transcriptional signature.

      2) While RRHO is helpful to assess overall patterns of transcriptional signatures across datasets, its utility for determining the exact transcripts is limited. This is because of how RRHO determines the overlapping transcripts for its Venn diagram feature (by taking the point where the p-value is most significant and taking the list to the outside corner of that quadrant).

      3) TRAP expression was verified in only one animal. Thus, the approach has not been appropriately confirmed.

      We thank the reviewer for their thoughtful comments on the innovative strengths and advantages of our manuscript.

      Regarding the noted weaknesses:

      (1) Please see our response to Reviewer #1, who shares your concerns.

      (2) We agree that RRHO is particularly useful for the assessment of overall patterns. We interpret the Reviewer’s comment to mean that, when extracting the overlapping gene lists from an RRHO quadrant for downstream analyses, we should filter that list for genes whose differential expression passes a nominal p-value cutoff, to reduce the number of biologically insignificant conclusions we draw from the RRHO data. Our initial analyses used just such a threshold-based approach, identifying GO terms via the differentially expressed genes of the combined pair bond (Figure 2) using both p-value and log2Fold cutoffs. This analysis revealed a number of terms associated with glial cell proliferation, differentiation, and function (Fig 2H). Such processes occur over a time frame of days to weeks, with different phases of differentiation characterized by different gene expression profiles. To explore this further, we used the genes in the UU and DD RRHO quadrants without implementing a p-value cutoff to see whether additional genes associated with these GO-identified pathways show subtle but consistent directional changes (Fig 3). Importantly, we only use the overlapping RRHO gene lists to determine how biological processes previously defined via DEG-predicted GO terms change across conditions; we are not using the RRHO gene lists to generate new GO terms. This allowed us to look for patterns within the identified pathways that may give insight into how transcription might be affecting gliogenesis. This analysis was also suggested to us by other experienced users of RRHO plots (see Acknowledgements). There are also several published studies that use RRHO UU and DD quadrant overlap (18–22).
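      To clarify the distinction between the full quadrant list and the p-value-filtered list discussed above, here is a minimal Python sketch of a simplified RRHO-style UU-quadrant extraction. It assumes that genes are ranked by -log10(p) × sign(log2FC). At every pair of thresholds within the up-regulated part of each list, the overlap of the top i and top j genes is scored with a one-sided hypergeometric test, and the most significant pair defines the quadrant gene list (the "outside corner" the reviewer describes). An optional nominal p-value cutoff can then be applied. The input format, the exhaustive step-1 threshold grid, the requirement that p > 0, and the function names are illustrative assumptions; this is not the RRHO2 package or the authors' pipeline.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: gene -> (log2 fold change, nominal p-value), with p > 0.
DE = Dict[str, Tuple[float, float]]


def signed_rank(de: DE) -> List[str]:
    """Order genes from most up-regulated to most down-regulated.

    Score = -log10(p) * sign(log2FC); a fold change of exactly 0 is treated as down.
    """
    def score(gene: str) -> float:
        fc, p = de[gene]
        return -math.log10(p) * (1.0 if fc > 0 else -1.0)

    return sorted(de, key=score, reverse=True)


def hypergeom_sf(k: int, n_total: int, n_a: int, n_b: int) -> float:
    """One-sided P(overlap >= k) for random subsets of sizes n_a and n_b from n_total."""
    upper = min(n_a, n_b)
    hits = sum(math.comb(n_a, x) * math.comb(n_total - n_a, n_b - x)
               for x in range(k, upper + 1))
    return hits / math.comb(n_total, n_b)


def uu_quadrant(de_a: DE, de_b: DE, p_cutoff: Optional[float] = None) -> List[str]:
    """Genes up-regulated in both lists at the most significant pair of rank thresholds."""
    shared = set(de_a) & set(de_b)
    rank_a = [g for g in signed_rank(de_a) if g in shared]
    rank_b = [g for g in signed_rank(de_b) if g in shared]
    n = len(shared)
    # Only search thresholds within the up-regulated part of each ranked list.
    n_up_a = sum(1 for g in rank_a if de_a[g][0] > 0)
    n_up_b = sum(1 for g in rank_b if de_b[g][0] > 0)

    best_p, best_i, best_j = 1.0, 0, 0
    for i in range(1, n_up_a + 1):
        top_a = set(rank_a[:i])
        for j in range(1, n_up_b + 1):
            k = len(top_a & set(rank_b[:j]))
            p = hypergeom_sf(k, n, i, j)
            if p < best_p:  # keep the first minimum encountered
                best_p, best_i, best_j = p, i, j

    genes = set(rank_a[:best_i]) & set(rank_b[:best_j])
    if p_cutoff is not None:
        # Optional post-hoc filter: nominal p below the cutoff in both comparisons.
        genes = {g for g in genes if de_a[g][1] < p_cutoff and de_b[g][1] < p_cutoff}
    return sorted(genes)


if __name__ == "__main__":
    de_a = {"G1": (2.0, 1e-4), "G2": (1.5, 0.02), "G3": (0.5, 0.2),
            "G4": (-1.0, 1e-3), "G5": (-2.0, 1e-4), "G6": (-0.2, 0.5)}
    de_b = {"G1": (1.8, 1e-3), "G2": (2.0, 1e-4), "G3": (-0.3, 0.3),
            "G4": (-1.5, 1e-4), "G5": (-1.0, 1e-3), "G6": (0.4, 0.4)}
    print(uu_quadrant(de_a, de_b))                 # ['G1', 'G2']
    print(uu_quadrant(de_a, de_b, p_cutoff=0.01))  # ['G1']
```

      In this toy example, leaving `p_cutoff` unset keeps the full quadrant list, which corresponds to how the response describes using the UU and DD lists. Setting `p_cutoff` applies the threshold-based filter the reviewer suggests and drops G2, which reaches only p = 0.02 in the first comparison.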

      (3) Most labs rarely confirm the Cre-dependence of vectors in more than one or two animals, as the results, including those shown in Fig S9A, are typically definitive (i.e., no expression in the absence of Cre, expression in the presence of Cre). In addition to the images shown in Fig S9A, we used fluorescence-guided dissection to harvest tissue/mRNA, which served as an additional visual confirmation of RPL10-GFP expression in the animals used to generate Figure 4. Since submission, we have also confirmed that this vector expresses in rats when Cre-recombinase is present. However, prior to resubmission, we will perform additional surgeries to confirm that TRAP is expressed only in the presence of Cre-recombinase.

      References

      1. J. L. Scribner, E. A. Vance, D. S. W. Protter, W. M. Sheeran, E. Saslow, R. T. Cameron, E. M. Klein, J. C. Jimenez, M. A. Kheirbek, Z. R. Donaldson, A neuronal signature for monogamous reunion. Proceedings of the National Academy of Sciences. 117, 11076–11084 (2020).
      2. A. M. Borie, S. Agezo, P. Lunsford, A. J. Boender, J.-D. Guo, H. Zhu, G. J. Berman, L. J. Young, R. C. Liu, Social experience alters oxytocinergic modulation in the nucleus accumbens of female prairie voles. Current Biology. 32, 1026-1037.e4 (2022).
      3. E. S. Calipari, R. C. Bagot, I. Purushothaman, T. J. Davidson, J. T. Yorgason, C. J. Peña, D. M. Walker, S. T. Pirpinias, K. G. Guise, C. Ramakrishnan, K. Deisseroth, E. J. Nestler, In vivo imaging identifies temporal signature of D1 and D2 medium spiny neurons in cocaine reward. Proc. Natl. Acad. Sci. U.S.A. 113, 2726–2731 (2016).
      4. M.-F. O’Connor, D. K. Wellisch, A. L. Stanton, N. I. Eisenberger, M. R. Irwin, M. D. Lieberman, Craving love? Enduring grief activates brain’s reward center. NeuroImage. 42, 969–972 (2008).
      5. T. Hikida, S. Yao, T. Macpherson, A. Fukakusa, M. Morita, H. Kimura, K. Hirai, T. Ando, H. Toyoshiba, A. Sawa, Nucleus accumbens pathways control cell-specific gene expression in the medial prefrontal cortex. Sci Rep. 10, 1838 (2020).
      6. C. Baimel, L. M. McGarry, A. G. Carter, The Projection Targets of Medium Spiny Neurons Govern Cocaine-Evoked Synaptic Plasticity in the Nucleus Accumbens. Cell Reports. 28, 2256-2263.e3 (2019).
      7. N. S. Lee, N. L. Goodwin, K. E. Freitas, A. K. Beery, Affiliation, aggression, and selectivity of peer relationships in meadow and prairie voles. Frontiers in Behavioral Neuroscience. 13 (2019), doi:10.3389/fnbeh.2019.00052.
      8. O. J. Bosch, H. P. Nair, T. H. Ahern, I. D. Neumann, L. J. Young, The CRF System Mediates Increased Passive Stress-Coping Behavior Following the Loss of a Bonded Partner in a Monogamous Rodent. Neuropsychopharmacology. 34, 1406–1415 (2009).
      9. O. J. Bosch, J. Dabrowska, M. E. Modi, Z. V. Johnson, A. C. Keebaugh, C. E. Barrett, T. H. Ahern, J. Guo, V. Grinevich, D. G. Rainnie, I. D. Neumann, L. J. Young, Oxytocin in the nucleus accumbens shell reverses CRFR2-evoked passive stress-coping after partner loss in monogamous male prairie voles. Psychoneuroendocrinology. 64, 66–78 (2016).
      10. A. J. Grippo, B. S. Cushing, C. S. Carter, Depression-like behavior and stressor-induced neuroendocrine activation in female prairie voles exposed to chronic social isolation. Psychosomatic Medicine. 69, 149–157 (2007).
      11. A. J. Grippo, D. Gerena, J. Huang, N. Kumar, M. Shah, R. Ughreja, C. S. Carter, Social isolation induces behavioral and neuroendocrine disturbances relevant to depression in female and male prairie voles. Psychoneuroendocrinology (2007), doi:10.1016/j.psyneuen.2007.07.004.
      12. J. R. Williams, C. S. Carter, T. Insel, Partner Preference Development in Female Prairie Voles Is Facilitated by Mating or the Central Infusion of Oxytocin. Annals of the New York Academy of Sciences. 652, 487–489 (1992).
      13. C. S. Carter, A. C. DeVries, L. L. Getz, Physiological substrates of mammalian monogamy: The prairie vole model. Neuroscience and Biobehavioral Reviews. 19, 303–314 (1995).
      14. L. L. Getz, C. S. Carter, L. Gavish, The mating system of the prairie vole, Microtus ochrogaster: Field and laboratory evidence for pair-bonding. Behavioral Ecology and Sociobiology. 8, 189–194 (1981).
      15. K. Horie, K. Inoue, S. Suzuki, S. Adachi, S. Yada, T. Hirayama, S. Hidema, L. J. Young, K. Nishimori, Oxytocin receptor knockout prairie voles generated by CRISPR/Cas9 editing show reduced preference for social novelty and exaggerated repetitive behaviors. Horm Behav. 111, 60–69 (2019).
      16. K. E. Savell, J. J. Tuscher, M. E. Zipperly, C. G. Duke, R. A. Phillips, A. J. Bauman, S. Thukral, F. A. Sultan, N. A. Goska, L. Ianov, J. J. Day, A dopamine-induced gene expression signature regulates neuronal function and cocaine response. Sci Adv. 6, eaba4221 (2020).
      17. D. Avey, S. Sankararaman, A. K. Y. Yim, R. Barve, J. Milbrandt, R. D. Mitra, Single-Cell RNA-Seq Uncovers a Robust Transcriptional Response to Morphine by Glia. Cell Reports. 24, 3619-3629.e4 (2018).
      18. S. L. Fulton, S. Mitra, A. E. Lepack, J. A. Martin, A. F. Stewart, J. Converse, M. Hochstetler, D. M. Dietz, I. Maze, Histone H3 dopaminylation in ventral tegmental area underlies heroin-induced transcriptional and behavioral plasticity in male rats. Neuropsychopharmacology. 47, 1776 (2022).
      19. S. G. Caradonna, T.-Y. Zhang, N. O’Toole, M.-J. Shen, H. Khalil, N. R. Einhorn, X. Wen, C. Parent, F. S. Lee, H. Akil, M. J. Meaney, B. S. McEwen, J. Marrocco, Genomic modules and intramodular network concordance in susceptible and resilient male mice across models of stress. Neuropsychopharmacology. 47, 987–999 (2022).
      20. J. S. Wang, T. Kamath, C. M. Mazur, F. Mirzamohammadi, D. Rotter, H. Hojo, C. D. Castro, N. Tokavanich, R. Patel, N. Govea, T. Enishi, Y. Wu, J. da Silva Martins, M. Bruce, D. J. Brooks, M. L. Bouxsein, D. Tokarz, C. P. Lin, A. Abdul, E. Z. Macosko, M. Fiscaletti, C. F. Munns, P. Ryder, M. Kost-Alimova, P. Byrne, B. Cimini, M. Fujiwara, H. M. Kronenberg, M. N. Wein, Control of osteocyte dendrite formation by Sp7 and its target gene osteocrin. Nat Commun. 12, 6271 (2021).
      21. D. A. Gallegos, M. Minto, F. Liu, M. F. Hazlett, S. Aryana Yousefzadeh, L. C. Bartelt, A. E. West, Cell-type specific transcriptional adaptations of nucleus accumbens interneurons to amphetamine. Mol Psychiatry, 1–15 (2022).
      22. B. J. Hilton, A. Husch, B. Schaffran, T. Lin, E. R. Burnside, S. Dupraz, M. Schelski, J. Kim, J. A. Müller, S. Schoch, C. Imig, N. Brose, F. Bradke, An active vesicle priming machinery suppresses axon regeneration upon adult CNS injury. Neuron. 110, 51-69.e7 (2022).
    1. Author Response

      Reviewer #1 (Public Review):

      In this paper the authors present variations in carbon oxidation state and hydration state in proteomes available in RefSeq. Then they use this information to predict community level proteomes, and their corresponding carbon oxidation states and hydration states, based on available 16S rRNA gene sequences from selected previously published datasets. When combining this with information about the environmental setting of the individual samples analyzed, the authors are able to demonstrate connections between redox conditions and proteomic carbon oxidation state and hydration state. Furthermore, they explore how individual taxonomic groups at different taxonomic levels contribute to forming these connections.

      A weakness with the study is that the described environmental proteomes are inferred from 16S rRNA gene sequence data and not observed directly. However, there is good reason to believe that the conclusions drawn in the paper are valid.

      The study sheds light on microbial adaptations on the genome level that so far have received relatively little attention. The paper is also interesting from an ecological perspective regarding the general question of how microbial communities are shaped by environmental settings.

      To bring more attention to environmental constraints, a plot (Figure 4E in the published paper) was redrawn to show more clearly that the carbon oxidation state of estimated community proteomes is not only lower in more reducing conditions across a variety of environments but also shows the largest differences for hydrothermal systems and shale-gas wells. This finding is discussed in terms of geological sources of reductants and provides new evidence that the chemical makeup of microbial communities reflects their geological context.

      Reviewer #2 (Public Review):

      This manuscript mainly investigated the carbon oxidation and stoichiometric hydration states of the inferred community proteomes according to 16S rRNA gene compositions from the published datasets and explored their potential associations with environmental parameters such as redox gradients, oxygen concentrations and salinity.

      Predictions of the carbon oxidation and stoichiometric hydration states on the basis of microbial proteomes can provide meaningful information for disentangling microbial responses to environmental changes. However, some genes in microbial genomes are not expressed or translated into proteins. Therefore, such gene redundancy in genomes may bias predictions of the carbon oxidation and stoichiometric hydration states.

      Our study uses available data sources to identify informative differences in the elemental compositions of proteomes predicted from genomes. There are numerous examples in the literature of using protein sequences predicted from genomes to compare amino acid compositions (for example, in eLife: https://doi.org/10.7554/eLife.57347), so it appears acceptable, with some level of uncertainty, to use genomic data to compare the (amino acid or elemental) compositions of predicted proteomes.

      Furthermore, this study compiled many 16S rRNA gene datasets from previous studies. Different primer sets were applied in those studies, and such differences will result in distinct 16S rRNA gene compositions. Accordingly, it is essential to account for the influence of different primer sets on the 16S rRNA gene compositions among samples. Unfortunately, such information is missing from the Methods section.

      Primer sets used in the source studies have been added to Table 1 in the published paper. The Discussion was modified to acknowledge limitations in making comparisons *between* datasets obtained using different primers. However, the main results of this study are based on differences of carbon oxidation state (Zc) *within* individual datasets (for instance, along the vertical redox gradients shown in Figure 3).

      The intra-dataset differences of Zc themselves are compared across datasets in Figure 4E. However, it can be expected that the effects of technical variability – including not only primer pairs but also DNA extraction methods, etc. – would tend to be reduced in these inter-dataset comparisons of intra-dataset differences, in contrast to direct inter-dataset comparisons. The index plot at the center of Figure 2 does make a direct inter-dataset comparison, but the outcome is consistent with trends identified in previous analyses of shotgun metagenomic datasets, 16S primers and other technical differences between studies notwithstanding.
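
      The argument that comparing intra-dataset differences across datasets is less sensitive to technical variability can be made concrete with a toy calculation. The sketch below assumes, for illustration only, that a technical factor such as a primer pair shifts every Zc estimate within a dataset by the same additive amount; all values and dataset names are made up. Real biases need not be additive or uniform across samples, in which case they would be reduced rather than removed.

```python
import math

def within_dataset_differences(zc_by_dataset):
    """Zc(reducing) - Zc(oxidizing) for each dataset."""
    return {name: zc["reducing"] - zc["oxidizing"]
            for name, zc in zc_by_dataset.items()}

# Hypothetical "true" community Zc values at the two ends of a redox gradient.
true_zc = {
    "dataset_A": {"oxidizing": -0.14, "reducing": -0.17},
    "dataset_B": {"oxidizing": -0.15, "reducing": -0.19},
}
# Hypothetical dataset-wide technical offsets (e.g. primer bias), assumed additive.
offset = {"dataset_A": 0.02, "dataset_B": -0.03}
observed_zc = {
    name: {site: value + offset[name] for site, value in zc.items()}
    for name, zc in true_zc.items()
}

true_diff = within_dataset_differences(true_zc)
observed_diff = within_dataset_differences(observed_zc)

# Direct inter-dataset comparison: the offsets contaminate the result.
direct_true = true_zc["dataset_A"]["oxidizing"] - true_zc["dataset_B"]["oxidizing"]
direct_observed = observed_zc["dataset_A"]["oxidizing"] - observed_zc["dataset_B"]["oxidizing"]

for name in true_zc:
    # Within-dataset differences are unaffected by a dataset-wide additive offset.
    print(name, math.isclose(true_diff[name], observed_diff[name], abs_tol=1e-12))
print("direct comparison:", round(direct_true, 3), "vs", round(direct_observed, 3))
```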

      Additionally, the community proteomes in this study were inferred from 16S rRNA genes. The 16S rRNA marker gene cannot reliably predict the corresponding genomes, which may lead to biased proteome predictions. Therefore, 16S rRNA genes should not be used to predict microbial genomes and proteomes.

      Despite the various sources of uncertainty in making estimates of elemental composition of communities from 16S rRNA genes and reference proteomes, comparisons with shotgun metagenomic data support the reliable identification of trends within datasets (Figure 5 in the published paper).
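
      For readers unfamiliar with the metric, the average carbon oxidation state of a chemical formula is commonly computed as Zc = (Z - nH + 3nN + 2nO + 2nS) / nC, where Z is the net charge. The sketch below applies this formula to a toy "community". The abundance-weighted summation of reference proteome compositions is an assumption made for illustration; the published procedure may weight or normalize differently, and the taxa and formulas are placeholders. Because Zc is computed from the summed formula, the community value is a carbon-weighted (not simple) average of the per-taxon values.

```python
from collections import Counter

def carbon_oxidation_state(formula):
    """Average carbon oxidation state: Zc = (Z - nH + 3*nN + 2*nO + 2*nS) / nC.

    `formula` maps element symbols to counts; the optional key "Z" is the net charge.
    """
    n_c = formula.get("C", 0)
    if n_c == 0:
        raise ValueError("formula contains no carbon")
    return (formula.get("Z", 0) - formula.get("H", 0) + 3 * formula.get("N", 0)
            + 2 * formula.get("O", 0) + 2 * formula.get("S", 0)) / n_c

def community_formula(abundances, reference_formulas):
    """Abundance-weighted sum of per-taxon elemental compositions (an assumed weighting)."""
    total = Counter()
    for taxon, weight in abundances.items():
        for element, n in reference_formulas[taxon].items():
            total[element] += weight * n
    return dict(total)

# Sanity check on amino acids: glycine C2H5NO2 has Zc = 1, alanine C3H7NO2 has Zc = 0.
glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
alanine = {"C": 3, "H": 7, "N": 1, "O": 2}

# Hypothetical taxa whose "reference proteomes" are represented by these formulas.
reference = {"taxon_1": glycine, "taxon_2": alanine}
abundances = {"taxon_1": 0.5, "taxon_2": 0.5}

community = community_formula(abundances, reference)
print(carbon_oxidation_state(glycine), carbon_oxidation_state(alanine))
print(carbon_oxidation_state(community))  # carbon-weighted mean of 1 and 0
```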

      It seems that the relationships between carbon oxidation state/stoichiometric hydration state and redox/salinity gradients have been reported in previous studies (e.g., Dick et al. 2019, 2020, 2021). The findings of this study are therefore not new compared with those previously reported.

      The explorations in previous studies of chemical links between communities and environments were based on analysis of shotgun metagenomic data. The ability to reproduce those findings by analyzing 16S rRNA gene sequence data is a new advance in this study.

      Other new results in the published paper are the different magnitudes of Zc differences in various environments (which were not previously documented from shotgun metagenomes; Figure 4E) and the comparison of shotgun metagenome and 16S-based estimates of Zc for the time series of injected fluids in the Marcellus Shale (Figure 5B). The latter results are particularly interesting; the close correspondence for Days 0, 7, and 13 supports the basic reliability of the 16S-based estimates, while the increasing divergence at Days 82 and 328 suggests the onset of some interfering mechanisms (the speculation is made that this could be related to viral lysis and heterotrophic degradation of the released DNA). Also, the published paper presents the first analysis of carbon oxidation state of proteins – from either shotgun metagenome sequences or 16S rRNA-based estimates – for microbial communities in various body sites using data from the Human Microbiome Project (Figure 5D).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by de la Vega and colleagues describes Neuroscout, a powerful and easy-to-use online software platform for analyzing data from naturalistic fMRI studies using forward models of stimulus features. Overall, the paper is interesting, clearly written, and describes a tool that will no doubt be of great use to the neuroimaging community. I have just a few suggestions that, if addressed, I believe would strengthen the paper.

      Major comments

      1) How does Neuroscout handle collinearity among predictors for a given stimulus? Does it check for this and/or throw any warnings? In media stimuli that have been adopted for neuroimaging experiments, low-level audiovisual features are not infrequently correlated with mid-level features such as the presence of faces on screen (see Grall & Finn, 2022 for an example involving the Human Connectome Project video clips). How to disentangle correlated features is a frequent concern among researchers working with naturalistic data.

      We agree with the reviewer that collinearity between predictors is one of the biggest challenges for naturalistic data analysis. However, absent consensus on how to best model these data, we find that it is out of scope of the present report to make strong recommendations. Instead, our goal was to design an agnostic platform that would enable users to thoughtfully design statistical models for their particular goal. Papers such as Grall & Finn (2022) will be critical in advancing the debate on how to best analyze and interpret such data.

      We explicitly address this challenge in a new paragraph in the Discussion under “Challenges and future directions”:

      “A major challenge in the analysis of naturalistic stimuli is the high degree of collinearity between features, as the interpretation of individual features is dependent on co-occurring features. In many cases, controlling for confounding variables is critical for the interpretation of the primary feature— as is evident in our investigation of the relationship between FFA and face perception. However, it can also be argued that in dynamic narrative driven media (i.e. films and movies), the so-called confounds themselves encode information of interest that cannot or should not be cleanly regressed out (Grall & Finn, 2022).[…] Absent a consensus on how to model naturalistic data, we designed Neuroscout to be agnostic to the goals of the user and empower them to construct sensibly designed models through comprehensive model reports. An ongoing goal of the platform—especially as the number of features continues to increase—will be to expand the visualizations and quality control reports to enable users to better understand the predictors and their relationship. For instance, we are developing an interactive visualization of the covariance between all features in Neuroscout that may help users discover relationships between a predictor of interest and potential confounds.” (pg. 11)

      Note that we shortened the second paragraph of the Discussion by two sentences, as it had touched on this subject, which is better addressed separately.

      In addition, we made sure to highlight the covariance structure visualization in the Results section:

      “At this point, users can inspect the model through quality-control reports and interactive visualizations of the design matrix and predictor covariance matrix, iteratively refining models if necessary.” (pg. 3)
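
      A simple way to quantify the collinearity discussed above before fitting a model is to inspect the predictor correlation matrix together with variance inflation factors (VIFs). The sketch below is not part of Neuroscout; it uses simulated, hypothetical predictors in which a face regressor is partly driven by speech, mimicking the confound described for FFA. In practice the columns would be the convolved design-matrix regressors.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (n_samples x n_predictors): 1 / (1 - R^2_j)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on all other predictors plus an intercept.
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

rng = np.random.default_rng(0)
n = 500
speech = rng.normal(size=n)                      # hypothetical speech predictor
faces = 0.8 * speech + 0.6 * rng.normal(size=n)  # hypothetical face predictor, correlated with speech
brightness = rng.normal(size=n)                  # hypothetical independent low-level feature
X = np.column_stack([speech, faces, brightness])

print(np.round(np.corrcoef(X, rowvar=False), 2))
print(np.round(variance_inflation_factors(X), 2))
```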

      2) On a related note, do the authors and/or software have opinions about whether it is more appropriate to run several regressions, each with a single predictor of interest, or to combine all predictors of interest into a single regression? (Or potentially a third, more sophisticated solution involving variance partitioning or another technique to [attempt to] isolate variance attributable to each unique predictor?) Does the answer to this depend on the degree of collinearity among the predictors? Some discussion of this would be helpful, as it is a frequent issue encountered when analyzing naturalistic data.

      This is a very sensitive methodological point, but one for which it is hard to find a univocal answer in the literature. While on the one hand it can be deceptive to model a single feature in isolation (as illustrated by our face perception analyses), more complex models pose different challenges in terms of robust parameter estimation and variance attribution. Resolving these challenges goes beyond the scope of our work, and it is ultimately our goal to provide a flexible tool which will enable these types of investigations, and enable users to take responsibility and provide motivations for methodological choices made using the platform. We touch on Neuroscout’s agnostic philosophy on this issue under “Challenges and future directions” (pg. 11; quoted above).

      However, we also agree that in part the solution to this problem will be methodological. This is particularly true for modeling deep learning based embeddings, which can have hundreds of features in a single model. We are currently working on expanding beyond traditional GLM models in Neuroscout, opening the door to more sophisticated variance partitioning techniques, and more robust parameter estimation in complex models. We highlight current and future efforts to expand Neuroscout’s statistical models in the following paragraph:

      “However, as the number of features continues to grow, a critical future direction for Neuroscout will be to implement statistical models which are optimized to estimate a large number of covarying targets. Of note are regularized encoding models, such as the banded-ridge regression as implemented by the Himalaya package. These models have the additional advantage of implementing feature-space selection and variance partitioning methods, which can deal with the difficult problem of model selection in highly complex feature spaces such as naturalistic stimuli. Such models are particularly useful for modeling high-dimensional embeddings, such as those produced by deep learning models. Many such extractors are already implemented in pliers and we have begun to extract and analyze these data in a prototype workflow that will soon be made widely available.” (pg. 11)
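
      The variance partitioning idea can be illustrated with ordinary least squares on simulated data. The sketch assumes two feature sets (hypothetical "speech" and "faces" regressors) and splits explained variance into unique and shared parts by comparing the R² of joint and single-set models. It deliberately uses in-sample OLS rather than the cross-validated banded-ridge models mentioned above, and it is not the Himalaya or Neuroscout implementation; with real high-dimensional features, out-of-sample R² from regularized fits would be the appropriate input. The shared term can also come out negative, for example when suppression effects are present.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def partition_two_sets(X_a, X_b, y):
    """Split explained variance into unique-to-A, unique-to-B and shared parts."""
    r2_a = r_squared(X_a, y)
    r2_b = r_squared(X_b, y)
    r2_ab = r_squared(np.column_stack([X_a, X_b]), y)
    return {
        "unique_a": r2_ab - r2_b,
        "unique_b": r2_ab - r2_a,
        "shared": r2_a + r2_b - r2_ab,
        "total": r2_ab,
    }

rng = np.random.default_rng(1)
n = 1000
speech = rng.normal(size=(n, 1))
faces = 0.7 * speech + 0.7 * rng.normal(size=(n, 1))
# Hypothetical voxel time course driven by both feature sets plus noise.
y = (1.0 * speech + 0.5 * faces + rng.normal(size=(n, 1))).ravel()

parts = partition_two_sets(faces, speech, y)
print({k: round(v, 3) for k, v in parts.items()})
```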

      3) What the authors refer to as "high-level features" (i.e., visual categories such as buildings, faces, and tools) I would argue are better described as "mid-level features", reserving the term "high-level" for features that are present only in continuous, engaging, narrative or narrative-like stimuli. Examples: emotional tone or valence, suspense, schema for real-world situations, other operationalizations of a narrative arc, etc. After all, as the authors point out, one doesn't need naturalistic paradigms to study brain responses to visual categories or single-word properties. Much of the work that has been done so far with forward models of naturalistic stimuli has been largely confirmatory (e.g., places/scenes still activate PPA even during a rich film as opposed to a serial visual presentation paradigm). This is a good first step, but the promise of naturalistic paradigms is ultimately to go beyond these isolated features toward more holistic models of cognitive and affective processes in context. One challenge is that extracting true high-level features is not easily automated, although the ability to crowdsource human ratings using online data collection has made it feasible to create manual annotations. However, there are still technical challenges associated with collecting continuous-response measurement (CRM) data during a relatively long stimulus from a large number of individuals online. Does Neuroscout have any plans to develop support for collecting CRM data, perhaps through integration with Amazon MTurk and/or Prolific? Just a thought, and I am sure there are a number of features under consideration for future development, but it would be fabulous if users could quickly and easily collect CRM data for high-level features on a stimulus that has been uploaded to Neuroscout (and share these data with other end users).

      The reviewer makes a very good point regarding the fact that many so-called “high-level” features are best called “mid-level”. As such, we have changed our use of “high-level” to “mid-level perceptual features” throughout the manuscript.

      “Currently available features include hundreds of predictors coding for both low-level (e.g., brightness, loudness) and mid-level (e.g., object recognition indicators) properties of audiovisual stimuli…” (pg. 3)

      That said, we do believe that as machine learning (and in particular deep learning) models evolve, it will become more feasible to extract higher-level features automatically. This has already been shown with transformer language models, which are able to extract higher-level semantic information from natural text. To this end, we designed our underlying feature extraction platform, pliers, to be easily extensible, so that the platform can continue to grow as algorithms evolve. We highlight this in the Results section ‘Automated annotation of stimuli’:

      “The set of available predictors can be easily expanded through community-driven implementation of new pliers extractors, as well as public repositories of deep learning models, such as HuggingFace and TensorFlowHub. We expect that as machine learning models continue to evolve, it will be possible to automatically extract higher-level features from naturalistic stimuli.” (pg. 3)

      We also highlight the extensibility of pliers to increasingly powerful deep learning models in the Discussion by revising this sentence:

      “As a result, we have designed Neuroscout and its underlying feature extraction framework pliers to facilitate community-led expansion to novel extractors— made possible by the rapid increase in public repositories of pre-trained deep learning models such as HuggingFace and TensorFlow Hub” (pg. 10)

      As to the point of a potential extension to Neuroscout for easily collecting crowdsourced stimulus annotations, we are in full agreement that this would be very useful. In fact, this feature was part of the original plan for Neuroscout, but fell out of scope as other features took priority. Although we are unsure whether this extension is a short-term priority for the Neuroscout team (as it would likely take substantial effort to develop a general-purpose extension), the ability to submit user-generated features to the Neuroscout API should make it possible to design a modular extension to Neuroscout to collect such features.

      We mention this possibility briefly in the future directions section:

      “Other important expansions include facilitating analysis execution by directly integrating with cloud-based neuroscience analysis platforms (such as Brainlife.io) and facilitating the collection of higher-level stimulus features by integrating with crowdsourcing platforms such as MechanicalTurk or Prolific.” (pg. 11)

      4) Can the authors talk a bit more about the choice to demean and rescale certain predictors, namely the word-level features for speech analysis? This makes sense as a default step, but I wonder if there are situations in which the authors would not recommend normalizing features prior to computing the GLM (e.g., if sign is meaningful, if the distribution of values is highly skewed, if the units reflect absolute real-world measurements, etc.). Does Neuroscout do any normalization automatically under the hood for features computed using the software itself and/or features that have been calculated offline and uploaded by the user?

      In keeping with Neuroscout’s philosophy to be a general purpose platform, we have not performed any standardization of features. Instead, users can choose to modify raw predictor values by applying transformations on a model-by-model basis. Currently available transformations through the web interface include: scale, orthogonalize and threshold. Note that there is a wider range of transformations available in the BIDS Stats Model, but we are hesitant to advertise these yet, as they are more difficult to use.

      We revised our description of transformations in the Results section to clarify that these transformations are model-specific:

      “Raw predictor values can be modified by applying model-specific transformations such as scaling, thresholding, orthogonalization, and hemodynamic convolution.” (pg. 3)
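
      Orthogonalization, one of the transformations listed above, can be sketched as residualizing one predictor against others. This is a generic least-squares version with an intercept, written for illustration; the exact behaviour of the Neuroscout / BIDS Stats Models transformation (for instance, whether an intercept is included) is not specified here and may differ, and the predictors are simulated. A design caveat: after orthogonalization, all variance shared between the predictors is attributed to the predictor that was left untouched (here, speech), so the choice of which predictor to orthogonalize is itself a modeling decision.

```python
import numpy as np

def orthogonalize(target, others):
    """Residualize `target` against `others` (plus an intercept) by least squares."""
    others = np.asarray(others, dtype=float).reshape(len(target), -1)
    A = np.column_stack([np.ones(len(target)), others])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ beta

rng = np.random.default_rng(3)
speech = rng.normal(size=300)
faces = 0.6 * speech + rng.normal(size=300)   # hypothetical correlated predictors

faces_orth = orthogonalize(faces, speech)
print(np.corrcoef(faces, speech)[0, 1])       # substantial correlation
print(np.corrcoef(faces_orth, speech)[0, 1])  # approximately zero
```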

      We also clarify in the Methods section that variables are ingested without any in-place modifications. The only exception is that we down-sample highly dense variables (such as those from auditory files, which can result in thousands of values per second) to save disk space:

      “Feature values are ingested directly with no in place modifications, with the exception of down sampling of temporally dense variables to 3hz to reduce storage on the server.” (pg. 13)
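
      The Methods text does not say how the down-sampling to 3 Hz is performed. One simple possibility, sketched below under that assumption, is to average consecutive samples into non-overlapping bins of 1/3 s; averaging also acts as a crude low-pass filter, which limits aliasing compared with simply keeping every n-th sample. The function name, the example signal, and the requirement of integer sampling rates are assumptions.

```python
import numpy as np

def downsample_block_mean(values, orig_hz, target_hz=3):
    """Average a signal sampled at `orig_hz` into bins of 1/target_hz seconds.

    Integer rates are assumed so that bin assignment uses exact integer
    arithmetic; a trailing partial bin is averaged over the samples it contains.
    """
    if orig_hz < target_hz:
        raise ValueError("orig_hz must be >= target_hz (no upsampling)")
    values = np.asarray(values, dtype=float)
    bin_idx = (np.arange(len(values)) * target_hz) // orig_hz
    sums = np.bincount(bin_idx, weights=values)
    counts = np.bincount(bin_idx)
    onsets = np.arange(len(sums)) / target_hz  # bin start times in seconds
    return onsets, sums / counts

# Hypothetical dense feature: 2 s of a loudness-like signal sampled at 30 Hz.
onsets, downsampled = downsample_block_mean(np.arange(60), orig_hz=30, target_hz=3)
print(onsets)
print(downsampled)
```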

      With respect to the word frequency analysis, the primary reason we scaled variables was to facilitate imputing missing values for words not found in the look-up dictionary. By scaling the variable, we were able to replace missing values with zero, effectively assigning them the average word frequency value. We clarified this strategy in the Methods section:

      “In all analyses, this variable was demeaned and rescaled prior to HRF convolution. For a small percentage of words not found in the dictionary, a value of zero was applied after rescaling, effectively imputing the value as the mean word frequency.” (pg. 17)
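
      The imputation strategy described here can be sketched in a few lines: after demeaning and scaling the observed values, setting a missing entry to zero is the same as assigning it the mean of the observed values. The word-frequency numbers are hypothetical, NaN stands in for words missing from the look-up dictionary, and the use of the population standard deviation (ddof=0) is an assumption. Note that the imputed entries slightly shrink the variance of the final regressor, so its standard deviation ends up a little below 1.

```python
import numpy as np

def scale_and_impute(values):
    """Demean and scale by the SD of the observed values, then set missing values to 0."""
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    mean = values[observed].mean()
    sd = values[observed].std()  # population SD (ddof=0), an assumption
    scaled = (values - mean) / sd
    scaled[~observed] = 0.0      # 0 after scaling == the mean word frequency
    return scaled

# Hypothetical log word frequencies; NaN marks words absent from the dictionary.
freq = np.array([3.2, np.nan, 4.1, 2.7, np.nan, 3.6])
scaled = scale_and_impute(freq)
print(np.round(scaled, 3))
```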

      On a more general note, when interpreting a single variable with a dummy coded contrast (i.e. 1 for the predictor of interest, and 0 for all other variables), it’s not necessary to normalize features prior to modeling, as fMRI t-stat maps are scale-invariant (although the parameter estimates will be affected).
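
      The scale-invariance point can be checked numerically on a toy regression. The sketch below uses ordinary least squares with an intercept on simulated data (not an fMRI GLM with temporal autocorrelation modeling). Rescaling and demeaning the predictor changes its parameter estimate but leaves its t-statistic unchanged, because the design spans the same column space. Scaling by a negative constant would flip the sign of the t-statistic.

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS coefficients and their t-statistics for design X (caller adds the intercept)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
n = 200
x = rng.normal(loc=5.0, scale=2.0, size=n)   # hypothetical unscaled predictor
y = 0.3 * x + rng.normal(size=n)             # hypothetical voxel time course
X_raw = np.column_stack([np.ones(n), x])
X_scaled = np.column_stack([np.ones(n), (x - x.mean()) / x.std()])

beta_raw, t_raw = ols_t_stats(X_raw, y)
beta_scaled, t_scaled = ols_t_stats(X_scaled, y)
print(beta_raw[1], beta_scaled[1])   # estimates differ by the factor x.std()
print(t_raw[1], t_scaled[1])         # t-statistics match
```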

      We added a note with our recommendations in the Neuroscout Documentation: https://neuroscout.github.io/neuroscout-docs//web/builder/transformations.html#scale

      Reviewer #2 (Public Review):

      The authors present a new platform for constructing and sharing fMRI analyses, specifically geared toward analyzing publicly-available naturalistic datasets using automatically-extracted features. Using a web interface, users can design their analysis and produce an executable package, which they can then execute on their local hardware. After execution, the results are automatically uploaded to NeuroVault. The paper also describes several examples of analyses that can be run using this system, showing how some classical feature-sensitive ROIs can be derived from a meta-analysis of naturalistic datasets.

      The Neuroscout system is impressive in a number of ways. It provides easy access to a number of publicly-available datasets (though I would like to see the current set of 13 datasets increase in the future), has a wide variety of machine-learning features precomputed from the video and audio of these stimuli, and builds on top of established software for creating and sandboxing analysis workflows. Performing meta-analyses across multiple datasets is challenging both practically and statistically, but this kind of multi-dataset analysis is easy to specify using Neuroscout. It also allows researchers to easily share a reproducible version of their pipeline simply by pointing to the publicly-available analysis package hosted on Neuroscout. The platform also provides a way for researchers to upload their own custom models/predictors to extend those available by default.

      The case studies described in the paper are also quite interesting, showing that traditional functional ROIs such as PPA and VWFA can be defined without using controlled stimuli. They also show that running a contrast for faces does not produce FFA until speech (and optionally adaptation) is properly controlled for, and that VWFA shows relationships to lexical processing even for speech stimuli.

      I have some questions about the intended workflow for this tool: is Neuroscout meant to be used for analysis development in addition to sharing a final pipeline? The fact that the whole analysis is packaged into a single command is excellent for reproducibility but seems challenging to use when iterating on a project. For example, if we wanted to add another contrast to a model, it appears that this would require cloning the analysis and re-starting the process from scratch.

      An important principle of Neuroscout from the outset of the project was to minimize undocumented researcher degrees of freedom and maximize transparency, in order to reduce the file drawer effect, which can contribute to biased results in the published literature. As such, we require analyses to be registered and locked as the modal usage of our application. In the case of adding a contrast, it is true that this would require a user to clone the analysis. Although all of the information from the previous model would be encoded in the new model, this would require re-estimating the design matrix, which could be time-consuming. However, in our experience, users almost always add new variables to the design matrix when a study is cloned, which would in any case require re-estimating the design matrix for all runs and subjects. We believe this trade-off is worthwhile to ensure maximal reproducibility, but also point out that since Neuroscout’s data is freely available via our API, power users could directly access the data if they need to use it in a less constrained manner.

      We believe that these important distinctions are best addressed in the newly developed Neuroscout documentation which we now reference throughout the text (https://neuroscout.org/docs/web/browse/clone.html).

      I'm also unsure about how versioning of the input datasets and the predictors is planned to be handled by the platform; if datasets have been processed with multiple versions of fmriprep, will all of those options be available to choose from? If the software used to compute features is updated, will there be multiple versions of the features to choose from?

      The reviewer makes an astute observation regarding the versions of input data (predictors and datasets). Currently we have only pre-processed the imaging data once per dataset, so this has not been an issue. However, we agree that in the long run it would be important to give users the ability to choose which pre-processed version of the raw dataset they want to use, as there could be differing but equally valid versions. We have opened an issue in Neuroscout’s repository to track this, and plan to incorporate this ability in a future version (https://github.com/neuroscout/neuroscout/issues/1076).

      With respect to feature versions, every time a feature is re-extracted, a new predictor_id is generated, and the accompanying meta-data such as time of extraction is tracked for that specific version. As such, if a feature is updated and re-extracted, this will not change existing analyses. By default, we have chosen to obscure this from the user to make the user experience simpler. However, there is an open issue to expand the frontend’s ability to explicitly display different versions, and allow users to update older analyses with newer versions of features. Advanced users already have access to this functionality by using the Python API (PyNS) to directly access all features, and create analyses with more precision.
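
      The versioning behaviour described here (every re-extraction receives a new predictor_id, and existing analyses keep pointing at the ids they were built with) can be sketched as a small store of immutable records. All class and method names below are hypothetical and do not reflect Neuroscout's database schema or the PyNS API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count

@dataclass(frozen=True)
class PredictorVersion:
    predictor_id: int
    name: str
    extracted_at: datetime  # extraction metadata tracked per version

@dataclass
class PredictorStore:
    """Hypothetical store in which each (re-)extraction gets a fresh predictor_id."""
    versions: dict = field(default_factory=dict)
    _next_id: count = field(default_factory=lambda: count(1))

    def extract(self, name):
        pid = next(self._next_id)
        self.versions[pid] = PredictorVersion(pid, name, datetime.now(timezone.utc))
        return pid

    def latest(self, name):
        ids = [v.predictor_id for v in self.versions.values() if v.name == name]
        return max(ids)  # ids increase monotonically, so the largest is the newest

@dataclass(frozen=True)
class Analysis:
    predictor_ids: tuple  # fixed when the analysis is created

store = PredictorStore()
original = store.extract("brightness")
analysis = Analysis(predictor_ids=(original,))

updated = store.extract("brightness")   # feature re-extracted, e.g. with newer software
print(analysis.predictor_ids)           # still refers to the original version
upgraded = Analysis(predictor_ids=(store.latest("brightness"),))  # explicit opt-in update
print(upgraded.predictor_ids)
```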

      We have made a note regarding this behavior in the Neuroscout Documentation: https://neuroscout.github.io/neuroscout-docs/web/builder/predictors.html

      I also had some difficulty attempting to test out the platform, so additional user testing may be necessary to ensure that novice users are able to successfully run analyses.

      We thank the reviewer for this bug report, which allowed us to fix a previously unnoticed issue with a subset of Neuroscout datasets. We have been in contact with the reviewer to ensure that this issue was successfully addressed.

    1. Author Response

      Reviewer #1 (Public Review):

      1) While the authors identify the suppressors in known genetic interactors (GIs) of the yeast SEC53, it is worth testing if the compensatory mutations are rewiring the GIs, thereby explaining the lack of comparable compensations observed in reconstituted strains. If altered GIs explain the suppression, then while yeast serves as an excellent tool to perform these assays, the human context of the disease may require a different set of genetic suppressors and, therefore, a different target than the yeast PGM1 ortholog.

      Our data show that pgm1 mutations alone greatly improve growth of sec53-V238M strains. Our data also indicate other pathways of compensation. Whether each of these compensatory mechanisms translate to humans is unknown. However, the observed enrichment of compensatory mutations in genes whose human homologs are associated with Type 1 CDG, suggests that many of these genetic interactions are likely to be conserved.

      Also, do Sec53 and Pgm1 interact directly in yeast, and if so, are these mutations at the interaction interface?

      As we mention above, there is no support for a direct physical interaction between Sec53 and Pgm1.

      2) Based on the data obtained with pACT1- and pSEC53-driven expression of the SEC53 mutant alleles, the pattern of suppressors appears to be different. The authors report that the variants expressed from the strong pACT1 promoter show more suppressors than those driven by the native promoter. Is it a general trend in experimental evolution that slower-growing strains tend to show fewer suppressors? For example, on Page 6, line 154, "compensating for Sec53-F126L dimerization defects are rare or not easily accessible". The statement suggests that the authors did obtain suppressors that compensate for the dimerization defect. At the same time, while rare (also, are the authors suggesting suppression of the dimerization defect, as in better dimerization?), the rate of obtaining suppressors seems to be linked to the severity of the fitness defects of the strains. The lack of suppressors may be a limitation of the evolution experiments. Indeed, later in the manuscript, the authors note that while PGM1 suppressors obtained in V238M can also suppress F126L alleles, the suppression was not as efficient. Could it be that evolution experiments in slower-growing strains predominantly enrich suppressors in other pathways (i.e., not in the CDG orthologs) that restore growth better and outcompete the relatively weaker suppressors in PGM1? In fact, the authors report similar effects on Page 7, lines 204-210. These two paragraphs are contradictory and should be explained further.

      All of our sequencing was performed on strains with sec53 under the control of the pACT1 promoter. While we did not identify unique sec53-F126L suppressors, we cannot exclude that sec53-F126L suppressors exist, so we describe them as “rare or not easily accessible”. While it is possible that the slower growth rate of the sec53-F126L allele could impact the likelihood of observing suppressors, we think it is more likely due to the nature of the variant (dimerization defect versus stability defect) rather than growth rate. In other laboratory evolution experiments the same beneficial mutation typically has a greater effect in slower-growing backgrounds (for example: doi.org/10.1126/science.1250939).

      3) The authors report that loss of function (LOF) of PGM1 compensates for the SEC53 mutations. However, the evolution experiments did not capture any LOFs in PGM1. Fitness comparisons in evolution experiments are different because many genotypes compete in a mix. Therefore, fitness assays in a clonal population may not represent these differences well. To test this argument, the authors could mimic the evolution experiments by mixing two genotypes and measuring competitive fitness, for example by co-culturing a pgm1 suppressor obtained in the evolution experiments with a pgm1Δ strain.

      Though we did not perform a direct head-to-head competition between a pgm1 suppressor and a pgm1Δ strain, our data suggest that the pgm1 deletion would outcompete some of the lower-fitness suppressors. In the Discussion we speculate as to why we do not see deletion mutations: “Given that most of the evolved clones containing pgm1 mutations are more fit than the reconstructed strains, it is possible that other evolved mutations interact epistatically only with non-loss-of-function pgm1 mutations.” Though it is beyond the scope of the present manuscript, it would be possible to rerun the evolution experiment in sec53-V238M strains carrying either a pgm1 missense suppressor or a pgm1Δ. Under the hypothesis of additional interacting loci, strains carrying a pgm1 missense suppressor would be more likely than pgm1Δ strains to acquire additional compensatory mutations.

      Reviewer #3 (Public Review):

      Vignogna et al. used yeast genetics, experimental evolution and biochemistry to tackle human congenital disorders of glycosylation (CDG), a disease mostly caused by mutations in PMM2. They took advantage of the observation that the budding yeast gene SEC53 is almost identical to human PMM2, and used experimental evolution to find interactors of SEC53/PMM2. They found an overrepresentation of mutations in genes corresponding to other human CDG genes, including PGM1. Genetic and biochemical characterizations of the pgm1 mutations were carried out. This work is solid, although the authors did not reveal why reduced pgm1 activity could compensate for defects of a particular mutant allele of sec53.

      Out of curiosity, if the authors were to simply focus on the preexisting mutations, would they have gotten the materials for most of the experiments in this article? In other words, how important is the experimental evolution?

      The evolution experiment was crucial as the specific pgm1 mutations we identified here have not been reported elsewhere, nor have the orthologous mutations been identified in human PGM1.

      A strain table with full genotypes is needed.

      We added a strain genotype table (Supplemental Dataset 2).

    1. Author Response

      Reviewer #2 (Public Review):

      In this MEG work employing two types of bistable perception tests and unique regression analyses, the authors associated different neural frequency bands with different components of visual perception: its content and stability.

      Strengths:

      This study has a nice set of three different experiments to clarify neural differences between content, memory and stability of visual perception.

      The state space analysis also appears to be a powerful way to identify distinct neural signatures for different cognitive components.

      Weaknesses:

      Despite such strengths, this work may have the somewhat critical weakness specified in the recommendations for the authors.

      First, in the analysis to identify content-specific neural frequencies, the authors concluded that the SCP is more relevant to the visual perceptual content than neural activity in the alpha and beta frequency bands. In my view, to claim this, it would be necessary to show statistically significant differences in prediction accuracy between the SCP and the other frequencies. Given the modest prediction accuracy seen in the SCP-based analysis, such statistical support appears essential.

      We have now directly compared decoding accuracy for SCP and alpha/beta oscillations, which showed statistically significant differences in both the ambiguous and unambiguous conditions for both ambiguous images. We have added these results as a supplementary figure (new Figure 2—figure supplement 1).

      Second, the two behavioural metrics in the neural state space analysis (i.e., Switch and Direction) may be too arbitrary. As suggested by the power-law distribution of the percept duration, the neural dynamics during a seemingly stable percept may not be describable by linear functions. Instead, the brain may go back and forth between several neural states even when we think we are experiencing stable visual consciousness. If so, the current definitions of the Switch metric and the Direction index, which seems to be based on the behaviour of the Switch index, may be arbitrary. In other words, I feel the authors should elaborate on the rationale for the definitions of these metrics.

      First, we note it is generally accepted in the field that the distribution of percept durations follows a gamma distribution instead of a power-law distribution (e.g., Sterzer et al., TiCS 2009; Blake & Logothetis Nature Rev. Neurosci 2002; Kleinschmidt et al., 1998; Leopold et al., TiCS 1999), and microswitches have not been reported either using the more classic task as that employed here or the more recently developed ‘no-report’ task of using eye-tracking statistics to deduce perceptual switches without overt report (e.g., Frassle et al., J Neurosci 2014).

      Second, while brain activity may fluctuate during these time periods, it never crosses the threshold of evoking a conscious report, and thus we would expect that such fluctuations, if they do occur, would be of a lower magnitude than those that do produce a conscious report.

      Most importantly, our goal here is to define behavioral metrics in order to identify components of neural dynamics underpinning the relevant aspect of behavior. As such, our definition of the behavioral metric should not be directly informed by observed spontaneous dynamics of brain activity (especially those that may be observed in the data but are of unclear relevance to perceptual switching); otherwise the analysis would be prone to circularity and spurious correlations (i.e., using observed brain dynamics to inform construction of behavioral metrics might pick up aspects of brain dynamics not really relevant to behavior in the analysis results).

      Finally, the timing characteristics of the ‘Switch’ and ‘Direction’ behavioral metrics are not arbitrary; instead they are the simplest behavioral functions that allow a comparison of pre- and post-switching periods (or of when the percepts might be in the ‘stabilizing’ phase vs. the ‘destabilizing’ phase). Nevertheless, the regression analysis can pick up on other temporal patterns of change that differ from our defined behavioral metric. This can be seen for SCP and beta activity projected onto the Direction axis, which reaches its lowest value at the ~20th percentile of the trial (not the 50th percentile assumed by the behavioral metric). To confirm that the analysis is not highly dependent on the precise timing definition of the behavioral metrics, we ran a control analysis in which the switching point was set at the 30th percentile (rather than the 50th percentile as in the original analysis). This control analysis resulted in a similar pattern of neural results (Figure R1).

      Figure R1: Changing the temporal behavior definition (switching point moved from the 50th percentile to the 30th percentile of percept duration) does not significantly alter the neural results. Compare to Figure 4—figure supplement 1, ‘Switch’ and ‘Direction’ columns.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper shows that a principled, interpretable model of auditory stimulus classification can not only capture behavioural data on which the model was trained but also somewhat accurately predict behaviour for manipulated stimuli. This is a real achievement and gives an opportunity to use the model to probe potential underlying mechanisms. There are two main weaknesses. Firstly, the task is very simple: distinguishing between just two classes of stimuli. For example, both the model and the animals may be using shortcuts to solve the task (this is suggested somewhat by Figure 8, which shows that both the guinea pig and the model can handle time-reversed stimuli).

      The task structure is indeed simple. In the context of categorization tasks that are typically used in animal experiments, however, we would argue that we are at the higher end of stimulus complexity. Auditory categories used in most animal experiments typically employ a category boundary along a single stimulus parameter (for example, tone frequency or modulation frequency of AM noise). Only a few recent studies (for example, Yin et al., 2020; Town et al., 2018) have explored animal behavior with “non-compact” stimulus categories. Thus, we consider our task a significant step towards more naturalistic tasks.

      We were also faced with the practical factor of the trainability of guinea pigs (GPs). Prior to this study, guinea pigs had been trained using classical conditioning and aversive reinforcement on detecting tone frequency (e.g., Heffner et al., 1971; Edeline et al., 1993). More recently, competitive training paradigms have been developed for appetitive conditioning, using a single “footstep” sound as a target stimulus and manipulated sounds as non-target stimuli (Ojima and Horikawa, 2016). But as GPs had never been trained on more complex tasks before our study, we started with a conservative one vs. one categorization task. We mention this in the Discussion section of the revised manuscript (page 27, line 665).

      To determine whether these results hold for more complex tasks as well, after receiving the reviews of the original manuscript, we trained two GPs (that were originally trained and tested on the wheeks vs. whines task) further on a wheeks vs. many (whines, purrs, chuts) task. As earlier, we tested these GPs with new exemplars and verified that they generalized. In the figure below, the average performance of the two GPs on the regular (training) stimuli and novel (generalization) stimuli are shown in gray bars, and individual animal performances are shown as colored discs. The GPs achieved high performance for the novel stimuli, demonstrating generalization. We also implemented a 4-way WTA stage for a wheek vs. many model and verified that the model generalized to new stimuli as well.
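      As a rough illustration of what a winner-take-all (WTA) read-out over four categories could look like, the sketch below picks the category with the largest accumulated evidence and, with an assumed lapse probability, substitutes a randomly chosen other category. The default of 0.1 mirrors the 10% error the authors mention setting in their WTA stage, but the function names, the form of the evidence values, and the way the error is injected are hypothetical; the authors' implementation may differ.

```python
import random


def wta_decision(evidence, error_rate=0.1, rng=None):
    """Return the index of the winning category.

    evidence: one (hypothetical) evidence score per category, e.g. summed
    detections of that category's features. With probability error_rate,
    an assumed lapse replaces the winner with another category.
    """
    rng = rng or random.Random()
    # max() returns the first index among tied maxima.
    winner = max(range(len(evidence)), key=lambda i: evidence[i])
    if rng.random() < error_rate:
        others = [i for i in range(len(evidence)) if i != winner]
        return rng.choice(others)
    return winner


def go_response(evidence, target_index=0, error_rate=0.1, rng=None):
    """Go/No-Go read-out: respond 'Go' only if the target category wins."""
    return wta_decision(evidence, error_rate, rng) == target_index
```

      For a wheeks vs. many task, `evidence` would hold four scores (wheeks, whines, purrs, chuts) and `target_index=0` would mark wheeks as the Go category; both choices are illustrative.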

      For frequency-shifted calls, these two GPs performed better for wheeks vs. many compared to the average for wheeks vs. whines shown in the main manuscript. The 4-way WTA model closely tracked GP behavioral trends.

      The psychometric curves for wheeks vs. many categorization in noise (different SNRs) did not differ substantially from the wheeks vs. whines task.

      We focused our one vs. many training on the two conditions that showed the greatest modulation in the one vs. one tasks. However, these preliminary results suggest that the one vs. one results presented in the manuscript are likely to extend to more complex classification tasks as well. We chose not to include these new data in the revised manuscript because we performed these experiments on only 2 animals, which were previously trained on a wheeks vs. whines task. In future studies, we plan to directly train animals on one vs. many tasks.

      Secondly, the predictions of the model do not appear to be quite as strong as the abstract and text suggest.

      We now replace subjective descriptors with actual effect size numbers to avoid overstating results. We also include additional modeling (classification based on the long-term spectrum) and discuss alternative possibilities to provide readers with points of comparison. Thus, readers can form their own opinions of the strengths of the observed effects.

      The model uses "maximally informative features" found by randomly initialising 1500 possible features and selecting the 20 most informative (in an information-theoretic sense). This is a really interesting approach to take compared to directly optimising some function to maximise performance at a task, or training a deep neural network. It is suggestive of a plausible biological approach and may serve to avoid overfitting the data. In a machine learning sense, it may be acting as a sort of regulariser to avoid overfitting and improve generalisation. The 'features' used are basically spectro-temporal patterns that are matched by sliding a cross-correlator over the signal and thresholding, which is straightforward and interpretable.

      This intuition is indeed accurate: the greedy search algorithm (described in the original vision paper by Ullman et al., 2002) sequentially adds to the MIF set the feature that contributes the most additional hits and the fewest additional false alarms relative to the features already in the set. The latter criterion (fewest false alarms) essentially guards against overfitting for hits alone. A second factor is the intermediate size and complexity of MIFs. When MIFs are too large, they overfit the training exemplars, and the model does not generalize well (Liu et al., 2019).
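      The greedy step described above can be sketched as forward selection over precomputed feature detections. This is a simplified stand-in: the original criterion of Ullman et al. is information-theoretic, whereas this sketch scores each candidate by (new hits − new false alarms) of the combined detector, assuming the set "detects" an exemplar if any member fires on it. The input layout and function name are hypothetical.

```python
def greedy_select(detections_target, detections_other, n_select):
    """Greedy forward selection of features (simplified criterion).

    detections_target[f][i]: True if candidate feature f fires on target exemplar i.
    detections_other[f][j]:  True if candidate feature f fires on non-target exemplar j.
    Returns selected feature indices in the order they were added.
    """
    n_feat = len(detections_target)
    covered_t = [False] * len(detections_target[0])  # targets already detected by the set
    covered_o = [False] * len(detections_other[0])   # non-targets already (falsely) detected
    selected = []
    for _ in range(min(n_select, n_feat)):
        best, best_gain = None, None
        for f in range(n_feat):
            if f in selected:
                continue
            # Hits and false alarms this feature adds beyond the current set.
            new_hits = sum(bool(d) and not c for d, c in zip(detections_target[f], covered_t))
            new_fas = sum(bool(d) and not c for d, c in zip(detections_other[f], covered_o))
            gain = new_hits - new_fas
            if best_gain is None or gain > best_gain:
                best, best_gain = f, gain
        selected.append(best)
        covered_t = [c or bool(d) for c, d in zip(covered_t, detections_target[best])]
        covered_o = [c or bool(d) for c, d in zip(covered_o, detections_other[best])]
    return selected
```

      A feature that fires on many non-targets is penalized even if it adds hits, which is the overfitting guard the response describes.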

      It is surprising and impressive that the model is able to classify the manipulated stimuli at all. However, I would slightly take issue with the statement that they match behaviour "to a remarkable degree". R^2 values between model and behaviour are 0.444, 0.674, 0.028, 0.011, 0.723, 0.468. For example, in figure 5 the lower R^2 value comes out because the model is not able to use as short segments as the guinea pigs (which the authors comment on in the results and discussion). In figure 6A (speeding up and slowing down the stimuli), the model does worse than the guinea pigs for faster stimuli and better for slower stimuli, which doesn't qualitatively match (not commented on by the authors). The authors state that the poor match is "likely because of random fluctuations in behavior (e.g. motivation) across conditions that are unrelated to stimulus parameters" but it's not clear why that would be the case for this experiment and not for others, and there is no evidence shown for it.

      Thank you for this feedback. There are two levels at which we addressed these comments in the revised manuscript.

      First, regarding the language – we have now replaced subjective descriptors with the statement that the model captures ~50% of the overall variance in behavioral data. The ~50% number is the average overall R2 between the model and data (0.6 and 0.37 for the chuts vs. purrs and wheeks vs. whine tasks respectively). We leave it to readers to interpret this number.

      Second, our original manuscript lacked clarity on exactly what aspects of the categorization behavior we were attempting to model. As recent studies have suggested, categorization behavior can be decomposed into two steps – the acquisition of the knowledge of auditory categories, and the expression of this knowledge in an operant task (Kuchibhotla et al., 2019; Moore and Kuchibhotla, 2022). Our model solely addresses how knowledge regarding categories is acquired (through the detection of maximally informative features). Other than setting a 10% error in our winner-take-all stage, we did not attempt to systematically model any other cognitive-behavioral effects such as the effect of motivation and arousal. Thus, in the revised manuscript, we have included a paragraph at the top of the Results section that defines our intent more clearly (page 5, line 117). We conclude the initial description of the behavior by stating that these factors are not intended to be captured by the model (page 6, line 171). We also edited a paragraph in the Discussion section for clarity on this point (page 26, line 629).

      In figure 11, the authors compare the results of training their model with all classes, versus training only with the classes used in the task, and show that with the latter performance is worse and matches the experiment less well. This is a very interesting point, but it could just be the case that there is insufficient training data.

      This could indeed be the case, and we acknowledge this as a potential explanation in the revised manuscript (page 22, line 537; page 27, line 653). Our original thinking was that if GPs were also learning discriminative features only using our training exemplars, they would face a similar training data constraint as well. But despite this constraint, the model’s performance is above d’=1 for natural calls – both training and novel calls; it is only the similarity with behavior on the manipulated stimuli that is lower than the one vs. many model. This phenomenon warrants further investigation.

      Reviewer #2 (Public Review):

      Kar et al. aim to further elucidate the main features representing call type categorization in guinea pigs. This paper presents a behavioral paradigm in which 8 guinea pigs (GPs) were trained in a call categorization task between pairs of call types (chuts vs purrs; wheeks vs whines). The GPs successfully learned the task and are able to generalize to new exemplars. GPs were tested across pitch-shifted stimuli and stimuli with various temporal manipulations. Complementing this data is multivariate classifier data from a model trained to perform the same task. The classifier model is trained on auditory nerve outputs (not behavioral data) and reaches an accuracy metric comparable to that of the GPs. The authors argue that the model performance is similar to that of the GPs in the manipulated stimuli, therefore suggesting that the 'mid-level features' that the model uses may be similar to those exploited by the GPs. The behavioral data is impressive: to my knowledge, there is scant previous behavioral data from GPs performing an auditory task beyond audiograms measured using aversive conditioning by Heffner et al. in 1970. [One exception that is notably omitted from the manuscript is Ojima and Horikawa 2016 (Frontiers).] Given the popularity of GPs as a model of auditory neurophysiology, these data open new avenues for investigation. This paper would be useful for neuroscientists using classifier models to simulate behavioral choice data in similar Go/No-Go experiments, especially in guinea pigs. The significance of the findings rests on the similarity (or not) of the model and GP performance as a validation of the 'intermediary features' approach for categorization.
      At the moment the study is underpowered for the statistical analysis the authors attempt to employ, which frequently relies on non-significant p values for its conclusions; using a more sophisticated approach (a mixed effects model utilizing single trial responses) would provide a more rigorous test of the manipulations on behavior and allow a more complete assessment of the authors' conclusions.

      We thank the reviewer for their feedback and the suggestion for a more robust statistical approach. We have now replaced the repeated measures ANOVA based statistics for the behavior and model where more than 2 test conditions were presented (SNR, segment length, tempo shift, and frequency shift) with generalized linear models with a logit link function (logistic activation function). In these models, we predict the trial-by-trial behavioral or model outcome from predictors including stimulus type (Go or Nogo), parameter value (e.g., SNR value), parameter sign (e.g., positive or negative freq. shift), and animal ID as a random effect. To evaluate whether parameter value and sign had a significant contribution to the model, we compare this ‘full’ model against a null model that only has stimulus type as a predictor and animal ID as a random effect. These analyses are described in detail in the Materials and Methods section of the revised manuscript (page 36, line 930).
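      A minimal sketch of the full-versus-null comparison, assuming a plain fixed-effects logistic regression: the animal-ID random effect used in the authors' models is omitted for simplicity, and the design matrices are hypothetical. Each model is fit by Newton-Raphson on the logit-link likelihood, and the two are compared with the likelihood-ratio statistic 2(ℓ_full − ℓ_null), which under the null hypothesis is approximately χ²-distributed with degrees of freedom equal to the number of added predictors.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression (logit link) via Newton-Raphson.

    X: (n_trials, n_predictors) design matrix including an intercept column.
    y: (n_trials,) array of 0/1 outcomes (e.g. Go response on each trial).
    Returns (coefficients, log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)              # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])     # negative Hessian, X' W X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def likelihood_ratio_test(X_null, X_full, y):
    """Return (LR statistic, degrees of freedom) for nested logistic models."""
    _, ll_null = fit_logistic(X_null, y)
    _, ll_full = fit_logistic(X_full, y)
    return 2.0 * (ll_full - ll_null), X_full.shape[1] - X_null.shape[1]
```

      Here a null design would contain an intercept and stimulus type (Go or NoGo), and a full design would add columns such as parameter value and parameter sign; a mixed-model fitting package would be needed to include the animal-ID random effect.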

      These analyses reveal significant effects of segment length changes, and weak effects of tempo changes on behavior (as expected by the reviewer). Both the behavior and model showed similar statistical significance (except tempo shift for wheeks vs. whines) for whether performance was significantly affected by a given parameter.

      The behavioral data presented here are descriptive. The central conceptual conclusions of the manuscript are derived from the comparison between the model and behavioral data. For these comparisons, the p-value of statistical tests is not used. We realized that a description of how we compared model and behavioral data was not clear in the original manuscript. To compare behavioral data with the model, we fit a line to the d’ values obtained from the model plotted against the d’ values obtained from behavior, and computed the R2 value. We used the mean absolute error (MAE) to quantify the absolute deviation between model and behavior d’ values. Thus, high R2 values would signify a close correspondence between the model and behavior regardless of statistical significance of individual data points. We now clarify this on page 12, line 289. We derive R2 values for individual stimulus manipulations, as well as an overall R2 by pooling across all manipulations (presented in Fig. 11). This is now clarified on page 21, line 494.
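      The comparison described above can be sketched as follows, assuming the standard signal-detection definition d′ = z(hit rate) − z(false-alarm rate) and one d′ value per condition for the model and for behavior. The function names are illustrative. The line is fit with model d′ on the y-axis and behavioral d′ on the x-axis; the MAE is computed directly between the paired d′ values, not from the fit residuals.

```python
from statistics import NormalDist, mean


def d_prime(hit_rate, fa_rate):
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def fit_r2_mae(model_d, behav_d):
    """Least-squares line model_d ~ intercept + slope * behav_d.

    Returns (slope, intercept, R^2 of the line fit, mean absolute error
    between paired model and behavioral d' values).
    """
    mx, my = mean(behav_d), mean(model_d)
    sxx = sum((x - mx) ** 2 for x in behav_d)
    sxy = sum((x - mx) * (y - my) for x, y in zip(behav_d, model_d))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(behav_d, model_d))
    ss_tot = sum((y - my) ** 2 for y in model_d)
    r2 = 1.0 - ss_res / ss_tot
    mae = mean(abs(y - x) for x, y in zip(behav_d, model_d))
    return slope, intercept, r2, mae
```

      The two measures are complementary: a model whose d′ values are uniformly offset from behavior can still reach a high R², which is why an absolute measure such as the MAE is reported alongside it.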

      Reviewer #3 (Public Review):

      The authors designed a behavioral experiment based on a Go/No-Go paradigm to train guinea pigs on call categorization. They used two different pairs of call categories: chuts vs. purrs and wheeks vs. whines. During the training of the animals, it turned out that they changed their behavioral strategies. Initially, they do not associate the auditory stimuli with rewards, and hence they overweight the No-Go behavior (low hit and false alarm rates). Subsequently, they learn the association between auditory stimuli and reward, leading to overweighting of the Go behavior (high hit and false alarm rates). Finally, they learn to discriminate between the two call categories and show the corresponding behaviors, i.e. they suppress the Go behavior for No-Go stimuli (improved discrimination performance due to stable hit rates but lower false alarm rates).

      In order to derive a mechanistic explanation of the observed behaviors, the authors implemented a computational feature-based model, with which they mirrored all animal experiments, and subsequently compared the resulting performances.

      Strengths:

      In order to construct their model, the authors identified several different sets of so-called MIFs (most informative features) for each call category, that were best suited to accomplish the categorization task. Overall, model performance was in general agreement with behavioral performance for both the chuts vs. purrs and wheeks vs. whines tasks, in a wide range of different scenarios.

      Different instances of their model, i.e. models using different sets of MIFs, performed equally well. In addition, the authors could show that guinea pigs and models can generalize to categorize new call exemplars very rapidly.

      The authors also tested the categorization performance of guinea pigs and models in a more realistic scenario, i.e. communication in noisy environments. They find that both guinea pigs and the model exhibit similar categorization-in-noise thresholds.

      Additionally, the authors investigated the effect of temporal stretching/compression of calls on categorization performance. Remarkably, this had virtually no negative effect on either models or animals, and both performed equally well, even for time reversal. Finally, the authors tested the effect of pitch change on categorization performance, and found very similar effects in guinea pigs and models: discrimination performance crucially depends on pitch change, i.e. it systematically decreases with the percentage of change.

      Weaknesses:

      While their computational model can explain certain aspects of call categorization after training, it cannot explain the time course of different behavioral strategies shown by the guinea pigs during learning/training.

      Thank you for bringing this up – in hindsight the original manuscript lacked clarity on exactly what aspects of the behavior we were trying to model. As recent studies have suggested, categorization behavior can be decomposed into two steps – the acquisition of the knowledge of auditory categories, and the expression of this knowledge in an operant task (Kuchibhotla et al., 2019; Moore and Kuchibhotla, 2022). Our model solely addresses how knowledge regarding categories is acquired (through the detection of maximally informative features). Other than setting a 10% error in our winner-take-all stage, we did not attempt to systematically model any other cognitive-behavioral effects such as the effect of motivation and arousal, or behavioral strategies. Thus, in the revised manuscript, we have included a paragraph at the top of the Results section that defines our intent more clearly (page 5, line 117). We conclude the initial description of the behavior by stating that these factors are not intended to be captured by the model (page 6, line 171). We also edited a paragraph in the Discussion section for clarity on this point (page 26, line 629).

      Furthermore, the model cannot account for the fact that short-duration segments of calls (50ms) already carry sufficient information for call categorization in the guinea pig experiment. Model performance, however, only plateaued after a 200 ms duration, which might be due to the fact that the MIFs were on average about 110 ms long.

      The segment-length data indeed demonstrate a deviation between the data and the model. As we had acknowledged in the original manuscript, this observation suggests further constraints (perhaps on feature length and/or bandwidth) that need to be imposed on the model to better match GP behavior. We originally did not perform this analysis because we wanted to demonstrate that a model with minimal assumptions and parameter tuning could capture aspects of GP behavior.

      We have now repeated the modeling by constraining the features to a duration of 75 ms (the lowest duration for which GPs show above-threshold performance). We found that the constrained MIF model better matched GP behavior on the segment-length task (R2 of 0.62 and 0.58 for the chuts vs. purrs and wheeks vs. whines tasks; with the model crossing d’=1 for 75 ms segments for most tested cases). The constrained MIF model maintained similarity to behavior for the other manipulations as well, and yielded higher overall R2 values (0.66 for chuts vs. purrs, 0.51 for wheeks vs. whines), thereby explaining an additional 10% of variance in GP behavior.

      In the revised manuscript, we included these results (page 28, line 699), and present results from the new analyses as Figure 11 – Figure Supplement 2.

      In the temporal stretching/compressing experiment, it remains unclear, if the corresponding MIF kernels used by the models were just stretched/compressed in a temporal direction to compensate for the changed auditory input. If so, the modelling results are trivial. Furthermore, in this case, the model provides no mechanistic explanation of the underlying neural processes. Similarly, in the pitch change experiment, if MIF kernels have been stretched/compressed in the pitch direction, the same drawback applies.

      We did not alter the MIFs in any way for the tests – the MIFs were purely derived by training the model on natural calls. In learning to generalize over the variability in natural calls, the model also achieved the ability to generalize over some manipulated stimuli. The fact that the model tracks GP behavior is a key observation supporting our argument that GPs also learn MIF-like features to accomplish call categorization.

      We had mentioned in a few places that the model was only trained on natural calls. To add clarity, we have now included sentences in the time-compression and frequency-shifting results affirming that we did not manipulate the MIFs to match test stimuli. We also include a couple of sentences in the Discussion section’s first paragraph stating the above argument (page 26, line 615).

    1. Author Response

      Reviewer #1 (Public Review):

      The actual description of the methods does not allow the reader to evaluate the precision of two important processing steps. First, rCBF measures are supposed to be restricted to the cortex, but given the pCASL image spatial resolution, partial volume effects with white matter probably exist, especially in younger infants. Furthermore, segmenting tissues on the basis of anatomical images (especially T1-weighted) is complicated in the first postnatal year. As rCBF measurements are very different between grey and white matter, the performed procedure might impact the measures at each age, or even lead to a systematic bias on age-dependent changes. Second, the methodology and accuracy of the brain registration across infants are little detailed whereas it is a challenging aspect given the intense brain growth and folding, the changing contrast in T1w images at these ages, and the importance of this step to perform reliable voxelwise comparison across ages.

      We thank the reviewer for this comment. We have added more descriptions in the methods to address this comment. Briefly, an individual rCBF map was generated in individual space and calibrated by phase contrast MRI to minimize the individual variations of processing parameters such as T1 of arterial blood (Aslan et al., 2010). Cortical segmentation was also conducted in individual space. Then different types of images, including the rCBF map and the gray matter segmentation probability map in individual space, were normalized into the template space. An averaged gray matter probability map was generated after inter-subject normalization. After carefully testing multiple thresholds on the averaged gray matter probability map, we used a 40% probability threshold, which minimized contamination from white matter and CSF while keeping the cortical gray matter mask continuous across the cerebral cortex, to generate the binary gray matter mask shown in the left panel of Figure R1 below. Although, as the reviewer rightly points out, T1-weighted images of younger infants have poor contrast and yield poor cortical segmentation, this was compensated for by using the averaged cortical mask and measuring rCBF in the template space. As demonstrated in the right three panels of Figure R1, the rCBF measure within the cortical mask in the template space is consistent across ages, allowing accurate and reliable voxelwise comparison across ages.
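      The masking step can be sketched with NumPy, assuming each infant's gray matter probability map has already been normalized to the template space and stacked into one array. Whether the 40% cut-off is applied as "≥" or ">" is not stated, so the inclusive comparison below is an assumption, as are the function names.

```python
import numpy as np


def group_gray_matter_mask(prob_maps, threshold=0.4):
    """Binary group mask from template-space gray matter probability maps.

    prob_maps: array of shape (n_subjects, x, y, z) with values in [0, 1].
    Voxels whose across-subject mean probability reaches the threshold are kept.
    """
    mean_prob = np.mean(prob_maps, axis=0)
    return mean_prob >= threshold


def masked_mean_rcbf(rcbf_map, mask):
    """Mean rCBF of one template-space map within the boolean mask."""
    return float(rcbf_map[mask].mean())
```

      Because the same group mask is applied to every infant's template-space rCBF map, e.g. `[masked_mean_rcbf(m, mask) for m in rcbf_maps]`, an individual infant's segmentation quality does not change which voxels are sampled.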

      Figure R1. The gray matter mask and segmented cortical mask overlaid on rCBF maps of three representative infants aged 3, 6, and 20 months in the template space. The gray matter mask in the left panel was created to minimize the contamination of white matter and CSF while keeping the continuity of the cortical gray matter mask across the cerebral cortex. The contour of the gray matter mask is highlighted with a blue line.

      The authors achieved their aim in showing that the rCBF increase differs across brain regions (the DMN showing intense changes compared to the visual and sensorimotor networks). Nevertheless, an analysis of covariance (instead of an ANOVA) including the infants' age as covariate (in addition to the brain region) would have allowed them to evaluate the interaction between age and region (i.e. different slopes of age-related changes across regions) in a more rigorous manner. Regarding the evaluation of the coupling between physiological (rCBF) and functional connectivity measures, the results only partly support the authors' conclusion. Actually, both measures strongly depend on the infants' age, as the authors highlight in the first parts of the study. Thus, considering this common age dependency would be required to show that the physiological and connectivity measurements are specifically related and that there is indeed a coupling.

      We thank the reviewer for this comment. Following the reviewer’s suggestion, we conducted an analysis of covariance (ANCOVA) with age as a covariate and found a significant interaction between region and age (F(6, 322) = 2.45, p < 0.05). This ANCOVA result is consistent with Figure 3c, which shows differential rCBF increase rates across brain regions. The ANCOVA result has been added to the last paragraph of the Results section “Faster rCBF increases in the DMN hub regions during infant brain development”.

      Regarding the evaluation of the coupling between physiological (rCBF) and functional connectivity (FC) measures, Figure 5 and Figure 5–figure supplements 1 and 2 were generated precisely to test whether the FC-rCBF coupling localized in the DMN could be explained by a shared age dependency. Briefly, Figure 5B shows that significant correlations, computed with the method illustrated in Figure 5–figure supplement 1, are clustered only in the DMN regions. Furthermore, we conducted nonparametric permutation tests with 10,000 permutations. These permutation tests are sensitive and effective, and Figure 5c reveals significant coupling only in the DMN regions. If the coupling merely reflected a shared age dependency, Figure 5c would also show significant coupling in the Vis and SM network regions.
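      To make the permutation logic concrete, here is a generic sketch of a two-sided permutation test for one correlation. It is not the authors' exact procedure, which follows the correlation method in Figure 5–figure supplement 1. The function names, the fixed seed, and the add-one correction are our assumptions; only the default of 10,000 permutations comes from the response.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    y is shuffled n_perm times. The p-value is the fraction of shuffles whose |r|
    is at least the observed |r|, with a +1 correction so that p is never exactly 0.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any x-y pairing, keeps the marginal values
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

      A single test like this cannot, on its own, separate a specific coupling from a common age trend. The argument in the response rests on spatial specificity: if age alone drove the correlation, the same test would also reach significance in the Vis and SM regions.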

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, Maxime R. and co-authors set out to investigate the consequences of dystrophin absence/alteration in myoblasts, the effector cells of muscle growth and regeneration, and the early role of these cells in the pathogenesis of the disease. They carried out a transcriptomic analysis comparing transcripts expressed by dystrophic myoblasts isolated from two murine models of DMD (Dmdmdx and Dmdmdx-βgeo) with those from healthy control mice. The expression of a large number of genes, including key regulators of myogenic differentiation (Myod1, Myog, Pax3, etc.), was affected compared with controls in both mouse lines.

      We believe that the novelty and importance of these results lie in demonstrating, for the first time, that the loss of full-length dystrophin expression is both necessary and sufficient to trigger molecular and functional abnormalities in myoblasts. The fundamental point is that, contrary to the prevailing belief, the function of dystrophin may not be limited to providing sarcolemmal stability in myofibers. Rather, there is a disease continuum: DMD defects in satellite cells (Dumont et al., 2015, Ref 45) cause myoblast dysfunctions that diminish muscle regeneration (this work) and also impair myofiber differentiation (Shoji et al., Ref 4), and the resulting fibres are unstable and therefore degenerate. These data can better explain all the symptoms of dystrophic muscle pathology, in which abnormalities in satellite cells, myoblasts and myofibers form a pathological vicious cycle. Moreover, we identify the key trigger behind these abnormalities in dystrophic myoblasts, which is MyoD downregulation. Furthermore, we demonstrate that the additional loss of short dystrophin isoforms, although these are expressed in myoblasts, does not exacerbate the phenotype. This latter finding is very important given the near-complete lack of understanding of the pathology in dystrophin-null patients.

      The authors found similar gene expression modifications in a myoblast cell line previously established from the mdx mouse.

      Analogous alterations found in both primary myoblasts and in the established myoblast cell line demonstrate that this change is cell-autonomous and not evoked by the external factors in the dystrophic niche, e.g. inflammatory mediators. This also shows that the dystrophic phenotype resists the transcriptomic drift as it is maintained through numerous passages. This approach was praised later on in the review.

      To assess the outcomes of the gene ontology analysis, which pointed to alterations in the muscle system and in the regulation of muscle system processes, the authors evaluated the proliferative, chemotactic and differentiation capacities of dystrophic myoblasts. The myoblasts showed increased proliferation, reduced chemotaxis and, quite surprisingly given the transcriptomic data, improved differentiation capacity.

      The key pathways (proliferation, migration and differentiation), which are essential for myoblasts to drive muscle regeneration, were confirmed to be altered in functional analyses, proving that these transcriptomic alterations are functional and biologically relevant. Our data showing accelerated differentiation in mdx myoblasts fully agree with findings by others, both in primary cultures and in isolated myofibers (Yablonka-Reuveni & Anderson, 2005, Ref 22).

      Finally, Maxime R. and co-authors carried out a transcriptomic analysis of myoblasts from human DMD subjects. Even though the profile of altered gene expression was similar and the GO analyses pointed to the same pathway categories, a significant divergence was observed, particularly at the level of individual gene expression.

      Given that myoblasts from individual DMD patients present heterogeneous phenotypes (Choi et al., 2016), such divergence between mouse and human at the level of individual gene expression is to be expected. Nevertheless, these changes converge on the same altered GO categories and pathways. In the revised manuscript we have included an additional genome-scale metabolic analysis of human DMD myoblasts. This revealed significant alterations in specific metabolic pathways. These are consistent with the metabolic alterations found previously in dystrophic muscle and brain, confirming that the dystrophic defects found here in myoblasts are shared with those described before in dystrophic tissues. Moreover, this analysis provides additional evidence that DMD myoblasts are significantly altered compared with healthy cells.

      The authors link the transcriptomic abnormalities and the functional changes in proliferation, chemotaxis and differentiation of dystrophic myoblasts to alterations (probably epigenetic changes) that occur in the satellite cells of dystrophic mice as a consequence of the absence of dystrophin. These modifications in gene expression are thought to be inherited by pathological myoblasts because satellite cell division is no longer asymmetric, as it is in healthy tissue.

      Strengths

      Transcriptomic data from samples of different sources are solid, and rigorous statistical analyses have been carried out.

      Transcriptomic and functional data from primary proliferating myoblasts of the two mouse models and from the myoblast cell line are similar. This is convincing evidence that the transcriptomic alterations observed in primary myoblasts are not caused by exposure to the niche environment of dystrophic muscle but are cell-autonomous.

      The authors adopted a 3D culture for the functional analysis of myoblast differentiation, thereby better mimicking the process occurring in vivo.

      Weaknesses

      The mdx mouse phenotype is mild compared with the severe symptoms and rapid disease progression experienced by most human DMD subjects. Mdx mice are characterized by cycles of degeneration/regeneration that begin around 6 weeks of age and continue for several weeks. The authors should have discussed this point in detail, especially considering that the animals used in this study were 8 weeks old.

      The mdx mouse has a mutation resulting in the loss of full-length dystrophin expression, which reflects the molecular defect affecting the majority of DMD patients. Therefore, mdx is the most commonly used pre-clinical model in DMD studies. The intensity of myonecrosis during this active degeneration and regeneration period (starting at 12 days and not at 6 weeks) is as aggressive as in patients. In fact, it has been suggested that the intensity of myonecrosis seen in mdx mice would be lethal to DMD patients (Duddy et al., 2015). The difference between human and mdx mouse pathology is that, starting at 10 weeks of age, the fibre replacement in mdx leg muscles reduces gradually, due to an unknown mechanism. Therefore, we isolated myoblasts at 8 weeks, when mdx replicates the human pathology. To emphasise the relevance of our findings for the human pathology, we discuss this point in detail in the revised manuscript.

      Furthermore, the transcriptomic analysis of human DMD myoblasts highlighted many differences as well as similarities compared with the mouse samples. Why not focus more on this aspect? According to the authors, dystrophic abnormalities in myoblasts manifest irrespective of differences in genetic backgrounds and across species. The latter is a strong statement that should have been supported at least by functional data on chemotaxis, proliferation and differentiation of human DMD myoblasts.

      By “dystrophic abnormalities in myoblasts manifest irrespective of differences in genetic backgrounds and across species”, we meant that the lack of full-length dystrophin expression results in identical molecular defects in mouse and human primary myoblasts and in the dystrophic cell line, despite the numerous gene expression alterations triggered by long-term culture in the latter. We agree that linking the functional alterations in human dystrophic myoblasts to the transcriptomic alterations we identified is important. Indeed, altered proliferation, migration and differentiation of human DMD myoblasts have been described before (Witkowski and Dubovitz, 1985; Nesmith et al., 2016; Sun et al., 2020). In fact, these previous findings, which were never fully investigated, prompted us to undertake this study. Our data thus provide a molecular underpinning for these abnormalities. In the revised manuscript we have elaborated on the existing functional data supporting alterations in human myoblasts.

      Further functional analyses will be needed to understand their consequences. It would require investigation of numerous parameters, including significant alterations in metabolic pathways, which we identified and described in the revised version of this manuscript. Given the aforementioned individual variability in patients’ population demonstrated by heterogeneous phenotypes in myoblasts, such functional analyses would need to involve a significant number of probands.

      Therefore, a detailed study in a sufficiently large cohort of DMD myoblasts is a logical next step from the identification of specific pathway alterations described here. But it is an extensive new project beyond our immediate capability.

      In the Discussion, the authors suggest two possible mechanisms responsible for alterations in satellite cell behavior that ultimately affect myoblast function: an RNA-mediated pathological process or an alteration in epigenetic regulation. They consider the latter more likely. This is based in particular on transcriptomic data showing the downregulation of important genes involved in histone modifications that are normally linked to transcriptional activation. They also cite literature reporting that HDAC inhibitors upregulate MyoD, a gene that is downregulated in this study. Since the authors postulate that epigenetic dysregulation of Myod1 expression is responsible for the pathological cascade of gene downregulation that ultimately leads to the pathological phenotype, it would have been interesting to evaluate the impact of HDACi on these pathways, or of overexpressing enzymes responsible for H3K4 methylation such as Smid1 (downregulated in this study).

      We have presented several hypotheses regarding the mechanism by which the loss of full-length dystrophin expression could affect myoblasts, including a restricted spatio-temporal requirement for small amounts of full-length dystrophin and an RNA-based mechanism. The notion that epigenetic dysregulation of Myod1 expression causes a pathological cascade of transcriptional downregulation of MyoD-controlled genes was based on our finding that transcripts downregulated in dystrophic myoblasts show an overrepresentation of MyoD binding sites. We discussed this as a likely mechanism, supported by a body of literature on the known alterations of epigenetic regulation in DMD (fifteen papers in total). We also hypothesized that, since treatment of mdx mice with histone deacetylase inhibitors (HDACi) promoted myogenesis (Saccone et al., 2014) and HDACi upregulate Myod1 (Mal et al., 2001), HDACi could increase myogenesis by counteracting the changes we found in dystrophic myoblasts. However, while evaluating the impact of HDACi, or of overexpressing enzymes responsible for H3K4 methylation, would test one of the working hypotheses we put forward in the Discussion, it would in no way alter the key discovery of this study: the loss of full-length dystrophin expression results in major cell-autonomous abnormalities in proliferating myoblasts. Thus, if preferred, this Discussion paragraph could be shortened so as not to distract the reader from the main findings of this manuscript.

      Reviewer #2 (Public Review):

      This study is one of many that explore various abnormalities in the mononuclear myogenic cell compartments in DMD. Although the aim has been extensively investigated in the last several decades, it is still relevant.

      It is correct that abnormalities of proliferation, migration and differentiation in dystrophic myogenic cells have been reported over decades, but these were not followed up and often disregarded. Certainly, their causative link to DMD mutations and their consequences for the pathology were never investigated. Our study is the first to provide the comprehensive molecular underpinning for these abnormalities, demonstrating that the loss of full-length dystrophin expression directly and significantly affects myoblasts.

      The biggest limitation of this study is that it relies on RNAseq analyses of extensively cultured myoblasts. While the computational analyses are thorough, the study lacks any mechanistic explanation for the relevance of the transcriptional differences seen in the DMD myoblasts.

      We are not sure where this opinion originated. In fact, we used freshly isolated primary myoblasts in the RNAseq experiments and then confirmed the key alterations functionally in primary myoblasts freshly isolated from two strains of DMD mice. Furthermore, we performed mechanistic analyses linking the altered processes to functional defects, focusing on proliferation, migration and differentiation, which are processes known to affect the DMD pathology.

      In an approach that the other Reviewer considered one of the strengths of our work, these findings in primary myoblasts were then reproduced in a myoblast cell line, to demonstrate that the alterations observed are not evoked by exposure to the niche environment of dystrophic muscle but are cell-autonomous. Importantly, DMD mutant cells show these alterations despite extensive in vitro culture, demonstrating the expressivity of this mutation. Finally, the alterations were confirmed in human primary myoblasts.

      Cell purity, the myogenic status of the cells, passage number, and the time the cells were kept in culture are not well described. The cell isolation method used in this study allows contamination with non-myogenic cells, which can significantly influence the RNAseq analyses. Immunostaining for myogenic markers, for example MyoD, would indicate the purity of the cell culture. Extensive culturing of primary myoblasts promotes clonal selection and introduces numerous molecular alterations; thus, passage number and culture duration are significant factors. It appears that some assays were conducted with cells at high passage, for example the myogenic differentiation assay, which required one million cells for each pellet. This may explain the low differentiation rate presented in Sup. Fig 2.

      Cell homogeneity across genotypes was fully confirmed by sample-based hierarchical clustering, which clearly segregated the samples into groups corresponding to genotypes. Furthermore, the same alterations were found in the corresponding myoblast cell lines, whose purity and myogenic potential were demonstrated previously (Onopiuk et al., 2015). Therefore, varying contamination with non-myogenic cells could not have significantly influenced these results. Nevertheless, for completeness, in the revised manuscript (Supplementary Figure 8) we describe cell characterisation using MyoD as a marker, showing that the well-established myoblast isolation procedure we used produces pure myoblast cultures.

      As for the differentiation assay, isolated myoblasts were never passaged extensively (one passage only) but sufficient numbers were obtained through the efficient isolation. Moreover, cells from every genotype were maintained and treated identically. Therefore, under these given conditions, any differences were the result of the DMD gene mutation and not culturing.

      It is hard to explain how DMD myoblasts differentiate better than the WT controls if they have a suppressed myogenic program in the proliferation stage. Even at day 0 of differentiation, DMD myoblasts differentiated better according to the RT-qPCR presented in Figure 5c. Additionally, it is unusual that the marker of differentiation Myog and Myh1 reached the peak at day 2 of differentiation for the WT myoblasts.

      In fact, our data fully agree with findings by others that mdx cells display accelerated differentiation both in primary cultures and in isolated myofibers (Yablonka-Reuveni & Anderson, 2005). Our team recently demonstrated that DMD mutations evoke marked transcriptome and miRNome dysregulation early in human muscle cell development (Mournetas et al., 2021). Expression of key coordinators of muscle differentiation was dysregulated in proliferating dystrophic myoblasts, whose differentiation was subsequently found to be altered, in line with the mouse cells studied here. Clearly, further studies of the mechanisms behind this and the numerous other alterations described here are urgently needed, as these may uncover new therapeutic targets.

      As to whether it is unusual for these differentiation markers to peak at that time, we cannot comment, as no reference for this statement was given and expression can vary depending on the experimental conditions used; in our case, the 3D culture could make the difference. Yet, again, cells from every genotype were maintained and treated identically, so any differences reflect the impact of the DMD mutation.

    1. Author Response

      Reviewer #1 (Public Review):

      The current manuscript examined patients with inborn errors of immunity (IEI) using whole exome sequencing (WES) and identified de novo variants (DNVs) associated with the disease. They found 14 genes associated with DNVs, including four novel genes - PSMB10, DDX1, KMT2C, and FBXW11, and conducted a systematic assessment of affected genes.

      Given the level of heterogeneity underlying IEI, the sample size is limited. Although the authors clearly state this, the analysis in the current manuscript adds little value beyond describing genes affected by DNVs. The sample size is too small to perform an exome-wide evaluation (the authors state in the Abstract, line 10, that they performed an "exome-wide evaluation", but there is no statistical evaluation to prioritize affected genes). They could use systems biology approaches, explaining the biological pathways of affected genes or the underlying cell types using immune single-cell datasets. As the authors state that IEI constitutes a large group of heterogeneous disorders, there should be some analysis explaining the functional convergence of affected genes in disease development.

      We believe the term ‘exome-wide evaluation’ might have led to misinterpretation. We used it in the context of reviewing each DNV found in a single patient’s exome outside the diagnostic IEI gene panel (i.e. ‘exome-wide’), instead of reviewing DNVs across all exomes. We have rephrased the sentences containing this term. The main purpose of this manuscript was to identify ‘all’ coding DNVs in each case, and explore whether they include any pathogenic or novel candidate DNVs. Our main purpose was to urge the IEI field to apply trio-based WES more systematically, and share candidate DNVs with the field for further validation.

      As the reviewer points out, our sample size is too limited to perform systems biology approaches for variant prioritization. The signal-to-noise ratio would be very low, because many genes causing inborn errors of immunity remain to be discovered and the studied group of patients with inborn errors of immunity is very heterogeneous. This means that we would not have the power to investigate potential enrichment or burden of DNVs in specific genes, nor the functional convergence of affected genes or pathways in specific phenotypes. In this study, we aimed to show the added value of systematic DNV analysis as a method to identify and prioritize candidate variants in individual cases. Ideally, we would like to address other important research questions using computational/statistical approaches in a larger cohort in the future, as has been done in other rare disease fields. The reviewer's suggestion is helpful: this approach has implicated novel disease-enriched pathways in various neurodevelopmental disorders, for which tens of thousands of trio-based WES analyses have been performed [9, 10].

      For DNV identification, the authors filtered out variants with ExAC and gnomAD AF > 0.1% or GoNL AF > 0.5%. I think this cutoff is too lenient for DNV filtering. For example, a gnomAD AF of 0.1% corresponds to approximately 200 individuals in the population. Given the filtering parameters (<5 variation reads, <20% variant allele frequency, or low-coverage DNVs), the authors did not use specific filtering metrics to identify DNVs, and there might be false-positive variants in the final DNV set. As far as I can tell from the manuscript, they used the GATK pipeline from a previous study (REF 29). The GATK UnifiedGenotyper generates a range of filtering metrics to increase specificity in variant filtering. It is very surprising that the authors seem to use only three parameters (variation reads → FORMAT:AD[1]; variant allele frequency → FORMAT:AB?; and low coverage → FORMAT:DP?, for which the authors did not state the cutoff) to filter de novo variants, which are vulnerable to false-positive variant calls.

      The chosen population database frequency cut-offs align with DNV filtering strategies in the literature. We did not choose a stricter cut-off, to avoid missing true positives: patients with IEI can exhibit late-onset disease and variable penetrance, and can carry postzygotic mutations. At the same time, we limited the chance of false-positive findings. For instance, we reduced local false positives by filtering on allele frequencies in our in-house database and in a Dutch population database. Moreover, automated DNV calling required >2% alternate reads in either parent, and variants were prioritized based on prediction scores and annotated immune function. Additionally, in accordance with this expert reviewer's comment, we have now put a stricter cut-off in place for variation reads (from 5 to 10) to further minimize false-positive findings. Lastly, we visually inspected the final 14 candidate DNVs in IGV and/or Alamut, which supports the validity of the findings. The DNVs reported in our final DNV list (Table 2B) are therefore unlikely to contain false-positive findings.
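      The filtering rules discussed in this exchange can be written as a small predicate. The following is a minimal sketch, not the authors' pipeline. The record type and its field names are hypothetical. The population cut-offs, the revised minimum of 10 variation reads and the 20% variant allele fraction come from the text. The parental rule is our assumed reading of the ">2% alternate reads in either parent" criterion, taken here to mean that such variants are excluded. The low-coverage cut-off is left out because its value is not stated.

```python
from dataclasses import dataclass


@dataclass
class CandidateVariant:
    """Hypothetical per-variant record; field names are illustrative only."""
    gene: str
    exac_af: float                  # allele frequencies as fractions (0.001 == 0.1%)
    gnomad_af: float
    gonl_af: float
    child_alt_reads: int            # reads supporting the variant in the proband
    child_depth: int                # total reads at the site in the proband
    max_parent_alt_fraction: float  # highest alt-read fraction seen in either parent


def passes_dnv_filters(v: CandidateVariant) -> bool:
    """Apply the cut-offs described in the text (parental rule is an assumed reading)."""
    if v.exac_af > 0.001 or v.gnomad_af > 0.001:   # ExAC / gnomAD AF > 0.1%
        return False
    if v.gonl_af > 0.005:                          # GoNL AF > 0.5%
        return False
    if v.child_alt_reads < 10:                     # revised minimum of 10 variation reads
        return False
    if v.child_depth == 0 or v.child_alt_reads / v.child_depth < 0.20:  # VAF < 20%
        return False
    if v.max_parent_alt_fraction > 0.02:           # assumed: >2% parental alt reads excludes
        return False
    return True
```

      Writing each cut-off as an explicit, separate condition makes it easy to see which rule removes a given candidate. It also makes it straightforward to add the GATK quality metrics mentioned by the reviewer as further conditions.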

      Reviewer #2 (Public Review):

      The manuscript by Hebert et al., reports on the utility of TRIO-based whole-exome sequencing (WES) in patients who presented as sporadic cases and are suspected of having inborn errors of immunity (IEIs). The authors developed an in-house pipeline for data analysis and used a set of known algorithms to prioritize the impact of genetic variants located mostly in the coding region of proteins. The data analysis was done in two steps; the first step involved the routine WES diagnostic analysis that led to the identification of pathogenic (P) and likely pathogenic variants (LP) in genes already associated with IEIs. The authors claim that this analysis resulted in a likely molecular diagnosis in 19 (~15%) of patients, while an additional 14% of cases were carriers for VUSs or other risk factors in the disease causal genes. As many of these variants are either inherited from one parent or are present as heterozygous (monoallelic) variants in genes associated with recessive diseases, their clinical significance is unclear.

      In the second step, the authors focused on the identification of de novo variants (DNVs), including SNVs, CNVs, and small indels, since these variants are more likely to be deleterious to protein function. The authors identified 136 non-synonymous DNVs, which were then filtered down to 14 best candidate variants using various in silico tools and database searches. These 14 variants included DNVs in genes previously associated with autoinflammatory diseases, such as CAPS and RELA haploinsufficiency. Three patients were found to carry de novo copy number variants (CNVs) of unknown clinical significance. Finally, several de novo loss-of-function (LoF) variants were identified in genes that are not yet associated with any IEIs but are good functional candidates. Their potential pathogenicity is further supported by the observation that they are found in genes intolerant to loss of function. Functional validation has been performed only for the patient carrying the novel FBXW11 splice variant. The authors state that the maximum solve rate (i.e., probable molecular diagnosis) in this cohort might be as high as 23%, which is comparable to similar reports of patients with IEIs; however, the reported results do not yet support this conclusion.

      The main conclusion of this study is that TRIO-based WES analysis for DNVs could improve the diagnostic rate and can result in the identification of novel disease-causing genes. TRIO-based sequencing is also preferable when analyzing patients from populations underrepresented in gnomAD and ExAC. As the cost of WES has come down, WES has been increasingly used in the clinical diagnosis of many human disorders. Despite the major progress in the development of novel sequencing technologies and new in silico tools, the diagnostic rate is still below 50%. In summary, this study suggests that despite the identification of over 400 genes associated with IEIs, there are many more genes to be identified and that the heritability of these diseases is very complex.

      We thank the reviewer for the elaborate summary of our study and the suggestions that have helped to further improve the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a study that is aimed at understanding the binding mechanism of D-serine to the two different binding lobes of the NMDA receptor. D-serine is a known agonist and binder of the GluN1 ligand-binding domain, but its interaction with the GluN2A is unknown. Using long time-scale conventional molecular dynamics simulations, the researchers observe that D-serine interacts and associates readily with both binding domains, often via protein surface pathways referred to as a guided-diffusion mechanism. As observed previously, free-energy calculations show that D-serine stabilizes the closure of both binding domains. Finally, analysis of the effect of glycans shows that these modifications play a role in further stabilizing the closed state of the ligand-binding domains.

      Amongst this broad and careful analysis, the major finding of this work is that D-serine surprisingly associates with GluN2A, which is known to bind glutamate to enable channel activation. Since binding of D-serine to GluN2A had not been observed previously, the authors propose that D-serine acts as an inhibitor of glutamate at high concentrations. This hypothesis was investigated and supported by electrophysiological experiments, yielding a novel result that offers new interpretations for the field. However, the guided-diffusion mechanism remains hypothetical, and it is unclear whether it is in fact a driving force for, or a requirement of, binding. Specifically, the following questions warrant further investigation:

      1) Specific or non-specific association? It is possible that non-specific association events of ligands to the protein could be an intrinsic artifact of the MD simulations. To investigate this, it would be informative to compare the current results with a negative control simulation where the ligand was replaced with a similar amino acid or molecule that has been verified as a non-binder for NMDAR.

      To address this, we quantified the non-specific association signal by comparing the number of successful binding events to random association (see response to Essential Revisions #4). In theory, any appropriately small amino acid could associate with the conserved arginine of each LBD through its C-terminus (as evidenced by our PMF of glycine bound to GluN2A). However, an amino acid’s ability to remain bound long enough to induce LBD closure is largely dependent on the presence of interactions with the LBD bottom lobe.

      2) Dissociation events? Further clarification is required to understand whether any dissociation events are observed in these simulations to the non-specific sites or the final binding site. If dissociation is not observed, how does this impact the interpretation of the binding mechanisms that characterize only the association events?

      Association and dissociation are both observed and documented in Datasets S2-S4. We added clarification to the text on page 5 about the nature of both processes and how pathways are defined by residues that allow the agonist to enter and leave the binding site. As illustrated in the clustering dendrograms, association (even-numbered events) and dissociation (odd-numbered events) pathways are present in all clusters.

      3) Testing the hypothesis of guided diffusion. It is proposed that guided diffusion drives serine binding to its site. This would imply that the residues on this path are important, and if mutated, would decrease the association rate and the ability to compete with glutamate. Additional electrophysiological experiments or direct binding experiments would be useful in understanding the relevance of guided diffusion in the ligand-binding mechanism of NMDARs.

      To address this point, we performed additional TEVC experiments, generating D-serine dose-response curves for GluN1a Arg694Ala and Arg695Ala and for GluN2A Arg692Ala and Arg695Ala. The curves for both GluN2A mutants support our guided diffusion mechanism, as these mutations lowered the potency of D-serine inhibition. (These mutants likely also alter glutamate binding, but since D-serine and glutamate bind through the same residues, it is not possible to separate out the individual contributions.) The GluN1a mutants did not show altered behavior, supporting the more diffusive nature of D-serine binding to GluN1 compared with GluN2A. These additional findings are included in the main text on page 12 and in Fig. 4D.

      Reviewer #2 (Public Review):

      In this manuscript, Yovanno et al. carried out a comprehensive mechanistic study of D-serine binding to the NMDAR ligand-binding domains (LBDs). The framework of the current investigation builds on this research group's previous studies of the binding of the NMDAR agonists glutamate and glycine. Using an aggregate of 51 microseconds of all-atom MD simulations of spontaneous binding, the authors applied rigorous pathway similarity analysis to cluster the paths through which D-serine enters the LBDs from the bulk solution. The most interesting and unexpected result of this study is the spontaneous binding of D-serine to the GluN2A LBD, previously known as the glutamate binding site.

      By computing the overlap coefficient for all binding pathways, the authors concluded that D-serine binds to the GluN2A LBD through "guided" diffusion, but to GluN1 through random diffusion (the clustered pathways comprise random contacts rather than specific, conserved residue contacts). A "guided" binding pathway further suggests that agonist binding could be sensitive to conformational changes within and around the binding pocket, and vice versa.
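      The following Python sketch gives a concrete illustration of the overlap metric the reviewer describes. It rests on two assumptions. First, it uses the standard Szymkiewicz-Simpson overlap coefficient, |A ∩ B| / min(|A|, |B|). Second, it treats each binding pathway as the set of residues the ligand contacts. The function names and residue labels are hypothetical placeholders, not the authors' code or residues from the study. Under these assumptions, a high mean pairwise overlap within a cluster means its pathways share conserved contacts, as expected for guided diffusion. A low value means the contacts are largely distinct.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson overlap coefficient |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("each pathway must contact at least one residue")
    return len(a & b) / min(len(a), len(b))


def mean_pairwise_overlap(pathways):
    """Average overlap coefficient over all distinct pairs of pathways."""
    pairs = list(combinations(pathways, 2))
    if not pairs:
        raise ValueError("need at least two pathways")
    return sum(overlap_coefficient(a, b) for a, b in pairs) / len(pairs)


# Hypothetical residue-contact sets for three binding pathways in one cluster.
pathways = [{"R1", "R2"}, {"R1", "R2", "R3"}, {"R4"}]
print(mean_pairwise_overlap(pathways))
```

      Because the denominator uses the smaller set, a short pathway fully contained in a longer one scores 1.0. A Jaccard index would instead penalize the difference in length, so the exact definition used in the manuscript is worth checking.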

      To investigate whether D-serine binding events are able to modulate the GluN2A LBD conformation, the authors then computed a series of LBD conformational free energy landscapes (2D-PMF) using 2D-umbrella sampling simulations. The 2D-PMF profiles confirmed that D-serine stabilizes the closed LBD conformation, just like glutamate. Because the D-serine 2D-PMF shows a metastable state that was absent in glutamate 2D-PMF, the authors argue that D-serine may not stabilize the closed conformation to the same extent as glutamate. Likewise, based on the 2D-PMF of GluN1 LBD, the authors suggest that D-serine has a higher potency than glycine, in part due to its ability to more strongly stabilize a closed LBD conformation.

      The simulations above generated the hypothesis that D-serine could function as a competitive antagonist of glutamate at high concentrations. This computationally derived hypothesis is beautifully tested by the authors' dose-response curves and the Schild plot.
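      The textbook Gaddum-Schild relation behind the Schild analysis mentioned here is summarized below. It is the standard general form, not an equation taken from the manuscript. The symbols are:
      - r: the dose ratio, i.e., the agonist EC50 with antagonist present divided by the EC50 without it.
      - [B]: the antagonist concentration (here, D-serine).
      - K_B: the antagonist's equilibrium dissociation constant.

```latex
r = \frac{\mathrm{EC}_{50}'}{\mathrm{EC}_{50}} = 1 + \frac{[B]}{K_B}
\quad\Longrightarrow\quad
\log(r - 1) = \log[B] - \log K_B
```

      For simple competitive antagonism, a plot of log(r − 1) against log[B] is therefore linear with unit slope. Its x-intercept equals log K_B, so pA2 = −log K_B. A slope far from 1 would argue against purely competitive binding.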

      One question that would merit further clarification is whether the binding affinity of D-serine to the two LBDs is stronger or weaker in comparison with glutamate and glycine. The difference in agonist potency could be due to the difference in binding affinity and/or efficacy. Stabilizing the closed LBD conformation may indicate the efficacy of the agonist, but affinity (Kd) will still play a role in the final potency.

      Indeed, as Reviewer 2 pointed out, affinity should play a role, since the D-serine inhibition here is attributed to competitive binding of D-serine against glutamate, as we showed with our Schild plot. The bona fide binding site for D-serine is the GluN1 LBD, where D-serine binds more strongly than glycine (Furukawa/Gouaux 2003). In the GluN1 LBD, D-serine is a full agonist. D-serine binding to the GluN2A LBD (the finding reported here) is substantially weaker (mM) than glutamate binding (~1 µM).

      While a glycosylated GluN1/GluN2A dimer was used for the majority of MD simulations, the authors also checked the "reality" by mapping the pathway residues onto the NMDAR heterotetramer structure. The role of glycans in D-serine binding pathways was further investigated by conducting an additional 30 microseconds of simulations of the non-glycosylated dimer. It was found that glycans introduced small kinetic "traps" that slow down the binding process. Glycans were also found to stabilize LBD closure in the 1D-PMF profiles.

      The detailed mechanistic insight and D-serine's inhibitory effect on NMDARs, unraveled by this study, may play an important role in therapeutic strategies, and thus are likely to have a broad impact in the field.

    1. Author Response

      Reviewer #2 (Public Review):

      Dr Muktupavela et al. present a novel likelihood-based method for inferring the strength of natural selection and basic demographic parameters, such as mobility rates, from time-stamped ancient DNA data in a spatially explicit framework. This is an elegant method that is, in many ways, a natural extension to a spatial setting of previous work in the field, which has focused mainly on inferring natural selection from temporal data. In addition to the simplest scenario of isotropic dispersal, the authors also consider models with different dispersal rates in the longitudinal and latitudinal directions, as well as biased dispersal. Selection strength, dispersal rates and bias are assumed to be constant across space and piecewise constant in time (but it would be very straightforward to relax these assumptions). The bias component of the model is an interesting addition. In principle, it allows one to broadly account for the effect of long-range dispersals on the spatiotemporal pattern of allele frequencies, such as the spread of agriculture across Europe from the Fertile Crescent and Bronze Age migrations from the Asian steppes.

      Although the main idea is clearly communicated, there is room for improvement of the manuscript regarding investigating the properties of the model and presenting the results. Notably, the authors assume that the age of the mutation is known and correctly specified when assessing the performance of the model on simulated data, which may inflate the reported accuracy of the reconstructions. They use estimates from the literature when the method is applied to empirical data. The method requires the age of the allele to be specified, but this could easily have been treated as a free parameter in the framework. I would like to see a discussion of why the method may not be suitable for this, and a more systematic test of the sensitivity of the method to misspecification of the age (which could be very substantial, especially if the population history has been complex). In the cases where the model is run for different allele age estimates in the manuscript, such as for the lactase persistence scenario, the authors should present the (approximate maximum) likelihoods for the different scenarios in the text.

      An explanation as to why we do not infer the age of the allele (see text below) has been added to the main text under section “Parameter search” (lines 531-533). Briefly, we chose to construct our method so that it takes the age of the allele as an input parameter rather than estimating it. This is because multiple combinations of allele age and selection coefficient values give equally likely solutions. This is demonstrated in Appendix A3.

      We also added a description of log-likelihood values when we vary the allele ages under section “Robustness of parameters to the assumed age of the allele” in lines 324-329, the results of which are presented in supplementary Figure 6–Figure Supplement 9 and Figure 8–Figure Supplement 6.

      Briefly, we assessed the likelihood of the best fitted models by varying the ages of the rs4988235(T) and rs1042602(A) alleles. We can see that in the case of rs4988235(T) allele the allele age used in this study (7,441 years) gives the most likely solution among the explored ages. In the case of the rs1042602(A) allele, we found that there are multiple nearly equally likely ages when looking at ages at least as old as 15,000 years.

      A further weakness of the method is that it uses the Fisher information matrix to estimate uncertainty. While this works well if the posterior distribution is narrow, it can severely underestimate the uncertainty if this is not the case, in particular if the distribution is non-Gaussian in the tails. It would be better, but perhaps computationally prohibitively expensive, to report Bayesian posterior distributions for the parameters, as well as Bayes factors that could be used to formally compare the fit of different models to the data.

      We agree with the reviewer that implementing Bayesian parameter fitting would likely provide a more robust understanding of the uncertainty of the estimates as well as an opportunity to formally compare different models using Bayes factors (although at the cost of an increase of computational intensity). Changing the inference engine of our method in this manner (while keeping it computationally feasible) is something we are currently investigating and hope to release as part of a future Bayesian version of our method. In the meantime, we have added a discussion of this caveat in our manuscript (sixth paragraph).
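      The Python sketch below makes the concern about Fisher-information-based uncertainty concrete. It uses a deliberately simple toy model, a binomial likelihood, not the spatial diffusion model of the manuscript. It computes a Wald interval from the observed Fisher information, i.e., the curvature of the negative log-likelihood at the maximum-likelihood estimate, approximated here by a central finite difference. The function names are illustrative only. When the estimate lies near a boundary of the parameter space, the symmetric Gaussian approximation can give an interval that extends into impossible values.

```python
import math


def binomial_nll(p, k, n):
    """Negative log-likelihood of k successes in n trials (constant dropped)."""
    return -(k * math.log(p) + (n - k) * math.log(1.0 - p))


def wald_interval(k, n, z=1.96, h=1e-4):
    """Return (lower, upper, se) for the binomial p from observed information.

    The observed Fisher information is the second derivative of the negative
    log-likelihood at the MLE, estimated with a central finite difference of
    step h (h must be smaller than both p_hat and 1 - p_hat).
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n so the MLE is in the interior")
    p_hat = k / n
    if not h < min(p_hat, 1.0 - p_hat):
        raise ValueError("finite-difference step h is too large for this MLE")
    info = (binomial_nll(p_hat + h, k, n)
            - 2.0 * binomial_nll(p_hat, k, n)
            + binomial_nll(p_hat - h, k, n)) / h ** 2
    se = 1.0 / math.sqrt(info)
    return p_hat - z * se, p_hat + z * se, se


lower, upper, se = wald_interval(1, 20)
print(f"Wald 95% interval: ({lower:.3f}, {upper:.3f}), SE = {se:.4f}")
```

      For k = 1 and n = 20, the lower bound is about −0.046, which is below the minimum possible value of 0 for a probability. A profile-likelihood or posterior interval would stay within [0, 1]. This is the kind of tail behavior that a Gaussian approximation can misrepresent.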

      Finally, although the rationale behind the model is clearly described, the detailed descriptions of the model and the numerical implementation have some shortcomings. First, there are typos in the appendix where the continuous model is derived from a discrete approximation (the right-hand side of Eq. (8) should not contain the term p(x,y,t) for it to be consistent with Eqs. (9) and (10)). Second, any differential equation model is incomplete without specifying the boundary conditions. This is especially important here as the assumption of uniform diffusion and advection on the grid is violated by the constraints imposed by the land mask, where the population is assumed to vanish on water areas (suggesting an absorbing boundary condition). Further down in the methods, details are also missing on how Eq. (10) was solved numerically, merely that it was discretized at a certain resolution.

      Looking more closely at Eq. (8), we do believe that the term p(x,y,t) should be there. Simple algebraic rearrangement of the terms moves it to the left-hand side of Eq. (9).

    1. Author Response

      Evaluation Summary:

      1) The paper is well written, and its style/formatting are optimal. The baseline signature moderately predicted outcome, and the data after one cycle further improved the algorithm, though this decreases its utility as a pure predictive tool.

      We thank the editor and the reviewers for their positive feedback regarding the style and formatting of the manuscript. We concur that longitudinal sampling of blood, before and after one cycle of treatment, renders the predictive signature marginally more laborious to generate. In an ideal setting, we would be able to generate a predictive signature based solely on baseline characteristics; unfortunately, such a test does not yet exist.

      In this study, we propose adding an easily obtainable blood sample after the first cycle of treatment to significantly improve our ability to predict response. Because blood is easy to sample, we believe that blood biopsies will be key as the search for predictive biomarkers expands. Since the inception of our study, numerous impactful studies assessing PBMCs have been published, mainly in the context of response to immune checkpoint blockade 1-6. Given that our risk signature is now validated in an immunotherapy trial (EACH trial NCT03494322), we are even more confident in our approach of longitudinal sampling for developing a predictive model of response to systemic therapy. The trial design of the validation study is now included in the supplementary material (Figure 2A) of the manuscript.

      2) Signatures were not prospectively validated on an independent cohort; the algorithm was developed around a first-line therapy that is no longer considered to be the standard of care for HNSCC; and, while most of the conclusions are supported by the data, some of the caveats (such as the lack of a validation cohort, key in predictive biomarker development), are not addressed.

      Thank you. We will address this comment in two parts: (a) the validation cohort and (b) the status of the EXTREME treatment regimen used in the original cohort. In this revised version, we have validated our risk signature in an independent cohort of patients who received cetuximab and avelumab (anti-PD-L1) in a single-arm, phase 2 clinical trial. Beyond serving purely as a validation cohort, it also demonstrates the applicability of our model in predicting response to immune checkpoint blockade-based therapy, in keeping with contemporary advances in systemic treatment for HNSCC. The risk signature strongly predicted response in the new independent cohort. This gives us more confidence in our model’s ability to predict outcome for systemic therapy regimens beyond cytotoxic chemotherapy and cetuximab. Figure 5B shows the strong correlation between the risk signature and disease outcome in the validation cohort (Kendall rank correlation, τ = 0.725, p = 0.0181).

      Secondly, the EXTREME regimen (platinum/5-FU/cetuximab) remains a first-line standard-of-care treatment in the UK and European countries for HNSCC patients with negative PD-L1 status (CPS < 1), who account for around 15% of all HNSCC patients 7. The US Food and Drug Administration (FDA) approved pembrolizumab in combination with chemotherapy as first-line treatment regardless of PD-L1 expression, and pembrolizumab alone for patients with PD-L1-expressing tumours (CPS ≥1). In contrast, the European Medicines Agency (EMA) approved pembrolizumab with or without chemotherapy only for patients with a CPS ≥1. This has been highlighted in the European Society for Medical Oncology (ESMO) and the UK National Institute for Health and Care Excellence (NICE) guidelines 8 and (https://www.nice.org.uk/guidance/ta661/chapter/1-Recommendations).

      Furthermore, chemotherapy with the EXTREME regimen is the standard of care for patients with contraindications to immune checkpoint inhibitors, such as autoimmune disease 8. It can also be considered as second-line treatment in patients who only received pembrolizumab monotherapy in the first-line setting.

      3) However the overall impact in the field of this work seems limited by a number of factors, including that the authors focused on immune cell subpopulations and exosomes, which narrows the scope (no cytokines or other biomarkers were included).

      Thank you. We selected a finite number of covariates based on three factors: (a) published literature, (b) previous data generated by the group and (c) the applicability of the findings to the clinic. Rather than writing an exploratory article in which we could generate a very large number of covariates using a technique such as RNA sequencing, we opted for a select set of covariates. This hypothesis-driven approach generated a strong signature that is now validated across two trials. The focus on immune populations is driven by our hypothesis that systemic changes in the PBMCs are indicative and reflective of the status of the intra-tumoral immune response.

      In the revised manuscript, we used a custom immune-focused imaging mass cytometry antibody panel to probe tissue sections from 9 patients. We now show that the key populations driving the predictive model in the periphery are reflected at the tumoral level, and that these disparate immune cell subpopulations also interact. See Figure 6, in which we use a machine learning approach to segment cells and assign them to distinct immunological subpopulations. We found that the peripheral monocyte population strongly correlated with a tumoral macrophage population having a similar marker expression pattern. We also found that the peripheral central memory CD8 T cells inversely correlated with tissue-resident memory T cells. The tissue presence of both these cell populations correlated positively with outcome. Most strikingly, these two populations were the most likely to co-localize with each other at the tissue level, at almost double the frequency of the second-highest co-localization.

      Data on the nature of the interplay between peripheral systemic immunity and intra-tumoral immunity are novel and rarely available in the literature outside in vivo animal models. Here we describe these interactions using samples from human patients treated with a clinically relevant therapy.

      Given the limited amount of patient sera collected in the trial, we opted to perform exosome analysis on markers known to impact the response to anti-EGFR/HER3 treatment and immune responses. This was in line with our lab's work using exosome FRET-FLIM as a surrogate for tissue FRET-FLIM. We originally used this approach to discover a potential dimer-dependent mechanism for anti-EGFR treatment resistance in neoadjuvant breast cancer patients 9, and more recently reported on a colorectal patient sample cohort from the COIN study 10. While the exosome EGFR-HER3 heterodimer failed to reach significance in our risk signature, it came close, as depicted in the Kaplan-Meier curve in Figure 3C. We of course acknowledge the potential added benefit of serum cytokine array analysis. While that was not feasible for this study, our group now aims to ensure that extra patient serum samples from ongoing and future trials are bio-banked for such analysis.

      Reviewer 1 (Public Review):

      1) For this study to be significant, one would want to see a marked improvement over current biomarkers, in a robust and generalizable population. Unfortunately, this study falls short in these respects. First, the authors do not adequately discuss the prior literature. Even a fairly crude and old-fashioned blood-based biomarker such as the neutrophil:lymphocyte ratio has quite good predictive and prognostic capability in R/M HNSCC.

      Thank you for your suggestion. We have expanded the discussion to include an overview of current biomarkers. We also compared the predictive power of our risk signature with that of the neutrophil:lymphocyte ratio (NLR), as reported in two published meta-analyses 11,12. We used the median risk score to divide our original patient cohort into high- and low-risk groups. We then calculated the HRs and CIs for the pre-treatment signature alone (HR = 4.1397 [95% CI: 1.975 - 8.676]) and for the combined signature (HR = 2.574 [95% CI: 1.336 - 4.96]). Both were higher than the HRs reported in the published literature, despite using only the median as the cutoff. Mascarella, Mannard et al. published “NLR greater than the cutoff value was associated with poorer OS and DSS (HR 1.69; 95% CI 1.47-1.93; P < .001 and HR 1.88; 95% CI 1.20-2.95”, and Takenaka, Oya et al. published: “The combined hazard ratio for OS in patients with an elevated NLR (range 2.04-5) was 1.78 (confidence interval [CI] 1.53-2.07”. We realize that we are stratifying patients based on PFS and not overall survival, which is an inherent limitation of the study. Nevertheless, we humbly believe that the added predictive value of the signature relative to the existing literature is too large to be disregarded.

      2) It is not clear to me that there is a compelling need to do better -- given that existing predictive biomarkers based on clinical nomograms or NLR are actually used in practice.

      We agree that clinical nomograms (based on clinicopathological factors) have been shown to be predictors of outcomes in HNSCC 13. However, whilst these models have been validated as prognostic biomarkers for overall survival and/or disease-specific survival, they are not currently recommended in the cancer treatment guidelines, nor are they universally used in the clinic. With the further validation performed on a cohort treated with an immune checkpoint inhibitor, our multimodal signature provides new data to help understand the range of treatment responses and predict outcomes. It could be used to guide treatment intensification, continuation and/or early termination in clinical practice, or incorporated into future clinical trials. Moreover, in the resubmission we extend our work beyond predictive biomarker research to a better understanding of the interplay between the peripheral immune response and intra-tumoral immunity. We discuss this in this letter as part of our response to part 3 of the public evaluation summary. Given the recent surge in literature on tumor immunity that has accompanied the increased use of immune checkpoint blockers, we believe our work offers a strong contribution. Few published papers have attempted to link tumor immunity at the systemic level to that at the tumor tissue level.

      3) A large number of patients (31 of 87) were not included due to lack of biomaterials. No analyses have been performed to examine the characteristics of these patients. It is unlikely that the collection of biomaterials has no correlation with disease characteristics, prognostic features, outcomes, or the analytes in this study. This exclusion, akin to unequal censoring in clinical trials, is likely to significantly impact results. The population was enrolled in a phase II trial. Moreover, the sub-population of patients who survive long enough and feel well enough to submit to large-volume blood draws on trial would not necessarily represent the real-world population of R/M HNSCC patients. A broader population is therefore needed to justify conclusions about this assay having robust predictive value.

      We appreciate the reviewer’s concern about potential skew in the data arising from patient selection criteria. The median PFS of the 56-patient cohort used to generate the risk signature was 5.48 months, as shown in supplementary table 1 of the original submission. This is in line with real-world outcomes of the EXTREME regimen (cetuximab with platinum-based therapy) as first-line therapy for recurrent/metastatic squamous cell carcinoma of the head and neck, reported as 5 months by Sano et al. in 2019 14. It is also very similar to the median PFS observed in the DIRECT study 15.

      4) It is unclear why OS as a hard endpoint was not analyzed here. No explanation is provided, other than OS was not available, a statement that is difficult to understand, given that PFS was available, and overall survival is a component of PFS.

      Thank you. We admit that the absence of overall survival is an inherent limitation of the study. In the process of submitting this revision, we once again requested this dataset from the sponsoring pharmaceutical company but were informed that they are unable to provide it. This is because a reorganization of funding priorities within the company precludes them from opening datasets from an already-published clinical trial. We are equally disappointed not to be able to obtain this data. Nevertheless, we firmly believe that the signature's strength is demonstrated by two findings: its ability to predict PFS (the primary endpoint of the trial, untainted by subsequent lines of treatment) and its cross-validation against the contemporary EACH trial.

      There is no validation set for the biomarker. The biomarker was trained and cross-validated using Bayesian techniques to reduce overfitting. This is a valid approach for training and cross-validation, but for the biomarker to be testable and interpretable, it requires assessment in an independent dataset. There is no statistical technique that I am aware of that generates informative biomarkers without an independent validation dataset.

      We completely agree with the reviewer regarding the need for a validation set. Obtaining patient samples from a similar cohort was difficult, but we managed to validate the signature on a set of patients treated with an anti-PD-L1 monoclonal antibody in combination with cetuximab. Furthermore, the validation was performed using a limited number of covariates that were identified in the risk signature by the Bayesian model. These immune populations can be obtained by running a limited set of markers on flow cytometry. We were very happy to see that these limited immune-based covariates strongly correlated with a worse disease response in an independent cohort treated with a different modality. This supports our hypothesis that changes in the immune populations are key to understanding response to systemic therapy. Encouraged by the data from the validation cohort, we extended our analysis to tissue from a total of 9 patients from the test cohort. Using imaging mass cytometry, we were able to identify how immune populations are mirrored at the tumoral level, opening new avenues for research. The data for the validation set are copied into this letter in response to point 2 of the public evaluation summary.

    1. Author Response

      Reviewer #1 (Public Review):

      Tarasov and colleagues provide data that extensively phenotype TGAC8 mice, which exhibit a cAMP-mediated increase in cardiac workload prior to developing heart failure. The authors confirm data from prior studies, showing increased cardiac output mediated by changes in heart rate with similar ejection fraction. 

      The above is slightly incorrect as stated. Our results section stated that HR and EF were increased in TGAC8, but that stroke volume did not differ by genotype. Thus, the 30% increase in cardiac output in TGAC8 was attributable to the increased HR.

      The study is overall well-planned and the amount of data presented by the authors is impressive. The work nicely incorporates animal-level physiology (echocardiography data), tests for known canonical markers of hypertrophy, and then delves into an unbiased analysis of the transcriptome and proteome of LV tissue in bulk. The techniques and analyses in the study are adequately executed and within the realm of expertise of the Lakatta laboratory. This study is a necessary and crucial first step to extensively phenotype this mouse line and generate hypotheses for further work. 

      Reviewer #2 (Public Review): 

      Tarasov et al. present an impressive amount of work in their in-depth assessment of a murine model of chronic stress, a transgenic line with constitutively active AC/cAMP/PKA/Ca2+ signaling. The assessment spans cardiac structure, function, cellular architecture, gene and protein expression, mitochondrial function, energetics and more. The exploration of multiple cellular pathways throughout the manuscript, as summarized in Figure 16, helps characterize this murine model. It also serves as a first step in using this model to understand the effect of chronic stress on the heart. The conclusions of the manuscript are well-supported by the data, and I have the following comments: 

      Strengths: 

      1. The authors present echocardiographic, histologic, electrocardiographic, neurohormonal quantification, protein synthesis/degradation, mitochondrial, gene and protein expression profiling, and metabolism data in their assessment of this model. 

      2. The verification of increased transcripts of AC and PKA activation in this transgenic line provided validation for the model. 

      3. The pathway analyses for both gene and protein expression profiling help support the authors' claim of the importance of differences noted in the various pathways between the transgenic line and controls. 

      4. The investigators posit that there is decreased wall stress and adequate energy production due to a shift in metabolism. 

      As written, this statement does not exactly reflect what we had intended to communicate in the paper. We did not claim to have shown that LV wall stress was reduced in TGAC8. Rather, we reasoned that it must be reduced compared to WT on the basis of Laplace’s Law, because of a substantial reduction of LV cavity volume. We also did not posit that energy production is due to a shift in metabolism. Rather, we posited that adaptations in energy metabolism produced enough energy to meet what appeared to us to be a marked increase in energy demand in TGAC8 vs WT. This apparent increase in demand was based on our observation that the transcriptome and proteome gene ontology (GO) terms that differed in TGAC8 vs WT covered nearly all biological processes and molecular functions within nearly all compartments of the LV myocardium.
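      The Laplace's Law argument can be made explicit with the usual thin-walled-sphere idealization. This geometry is an assumption here: the LV is not a sphere, and the manuscript may use a different geometric model. With cavity pressure P, cavity radius r, and wall thickness h, wall stress σ is:

```latex
\sigma = \frac{P\,r}{2h}, \qquad V = \tfrac{4}{3}\pi r^{3}
\;\Longrightarrow\;
\sigma = \frac{P}{2h}\left(\frac{3V}{4\pi}\right)^{1/3} \propto V^{1/3}
\quad \text{(at fixed } P \text{ and } h\text{)}
```

      As a purely illustrative example (not a value from the study), a 30% reduction in cavity volume lowers r, and hence σ, by a factor of 0.7^{1/3} ≈ 0.89. That is roughly an 11% reduction in wall stress at unchanged pressure and wall thickness. Any increase in wall thickness would lower σ further.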

      These findings would suggest that this model would be suitable for that of an athlete's heart, which is characterized by thickened left ventricular walls without a compromise in function. 

      Although the chronic increase in cardiac output in TGAC8 heart simulates that of an athlete’s heart during exercise, LV cavity volume at rest is larger in the endurance trained heart and this is associated with bradycardia. In these aspects, the TGAC8 heart differs from the endurance trained heart (perhaps because it does not have sufficient rest periods between bouts of exercise, as does the endurance trained heart). In the discussion section of the manuscript, we noted several features that differed between the TGAC8 vs the endurance trained heart. 

      However, the mice do develop heart failure after 1 year without a sense of mechanism despite the wealth of data provided. Are the authors able to comment on what changes described in this study of this transgenic line may be deleterious in the long run? 

      Long-term heart failure was first described in the TGAC8 mouse by Mougenot et al. (ref 10 in our manuscript). They performed numerous biochemical and biophysical measurements in TGAC8 and WT mice and attributed the heart failure to accelerated heart aging. We are in the midst of conducting a longitudinal study of cardiac structure and function in TGAC8 vs WT as these mice age, along with additional non-biased multi-omics analyses. The aim is to identify which of the adaptive pathways activated in the TGAC8 heart at 3 months of age falter with advancing age, and how changes in these pathways relate to the altered cardiac structure and function of the aging TGAC8 heart. Following that, we will focus on each of these pathways employing detailed mechanistic analyses. Our provisional hypothesis is that while AC8 activity will continue to be increased as age advances, its downstream signaling will begin to fail due to age-associated changes in proteostasis and in the expression of proteins, including those involved in energy metabolism.

      Weaknesses: 

      1.  As acknowledged by the investigators, this is a hypothesis-generating rather than hypothesis-testing study. 

      Yes, we used a systems approach at first, in order to “open our eyes” so that we could get an overview of numerous changes that might have occurred in the TGAC8 heart in order to generate hypotheses that could later be tested by others and by us.

      2.  The investigators posit that there is decreased wall stress and adequate energy production due to a shift in metabolism. These findings would suggest that this model would be suitable for that of an athlete's heart, which is characterized by thickened left ventricular walls without a compromise in function. However, the mice do develop heart failure after 1 year without a sense of mechanism despite the wealth of data provided. Are the authors able to comment on what changes described in this study of this transgenic line may be deleterious in the long run? 

      We have addressed these comments above in our response to your comment #4 under strengths.

      3.  Figure 5B is referenced to support the claim regarding beta adrenergic receptor desensitization, but the data show catecholamine levels in tissue. I would have expected receptor expression analysis to suggest up/downregulation of receptors at the membrane to support this claim. 

      Beta adrenergic receptor desensitization can occur, even in the absence of changes in receptor number, through changes in inhibitory molecules acting at the receptor or downstream of it. Here is how we summed this up in our manuscript:  “Numerous molecules that inhibit βAR signaling (e.g. Grk5 by 2.6 fold in RNASEQ and 30% in proteome; Dab2 by 1.14 fold in RNASEQ and 18% in proteome; and β-arrestin by 1.2 fold in RNASEQ and 14% in proteome) were upregulated in the TGAC8 vs WT LV (Table S.3, S.5 and S.9), suggesting that βAR signaling is downregulated in TGAC8 vs WT, and prior studies indicate that βAR stimulation-induced contractile and HR responses are blunted in TGAC8 vs WT.8,11… A blunted response to βAR stimulation in a prior report was linked to a smaller increase in L-type Ca2+ channel current in response to βAR stimulation in the context of increased PDE activity.13, 14 WB analyses showed that PDE3A and PDE4A expression increased by 94% and 36%, respectively, in TGAC8 vs WT, whereas PDE4B and PDE4D did not differ statistically by genotype (Figure 16-supplement 1 A). In addition to mechanisms that limit cAMP signaling, the expression of endogenous PKI-inhibitor protein (PKIA), which limits signaling downstream of PKA, was increased by 93% (p<0.001) in TGAC8 vs WT (Table S.3). Protein phosphatase 1 (PP1) was increased by 50% (Figure 16-supplement 1 A). The Dopamine-DARPP-32 feedback on cAMP signaling pathway was enriched and also activated in TGAC8 vs WT (Figure 15), the LV and plasma levels of dopamine were increased, and DARPP-32 protein was increased in WB by 269% (Figure 16-supplement 1 A).

      Thus, mechanisms that limit signaling downstream of AC-PKA signaling (βAR desensitization, increased PDEs, PKI inhibitor protein, phosphoprotein phosphatases, and increased DARPP-32 (dopamine- and cAMP-regulated phosphoprotein)) are crucial components of the cardio-protection circuit that emerge in response to chronic and marked increases in AC and PKA activities (Figure 4 C, F).” 

      4. Changes in ion channel (e.g. KCNQ1 and KCNJ2) gene and protein expression were described but not validated by assessment of change in function. 

      Reviewer #3 (Public Review): 

      Tarasov et al have undertaken a very extensive series of studies in a transgenic mouse model (cardiomyocyte-specific overexpression of adenylyl cyclase type 8) that apparently resists the chronic stress of excessive cAMP signaling for around a year or so without overt heart failure. Based on the extensive analyses, including RNAseq and proteomic screening, the authors have hunted for potential "adaptive" or "protective" pathways. There is a wealth of information in this study and the experiments appear to have been carefully performed from a technical viewpoint. Many interesting pathways are identified and there is plenty of information where additional experiments could be designed. 

      General comments 

      1. Ultimately, this is a descriptive and hypothesis-generating study rather than providing directly proven mechanistic insights.

      As noted in response to Reviewer #2: “Yes, we used a systems approach at first, in order to “open our eyes” so that we could get an overview of numerous changes that might have occurred in the TGAC8 heart in order to generate hypotheses that could later be tested by others and by us.”

      -Given several prior studies reporting a detrimental effect of chronically increased cAMP signaling, what is it that is different in this model? Is it something specific about AC8? Is it to do with when (in life) the stress commences? 

      We believe it is, at least in part, due to something specific about the effects of the markedly increased activity of AC8 per se, because adenylyl cyclase signaling impacts nearly all aspects of cell biology as currently understood. Thus, due to the marked increase in AC and PKA activation in the TGAC8 heart, the transcriptome and proteome gene ontology (GO) terms that differ in TGAC8 vs. WT covered nearly all biological processes and molecular functions within nearly all compartments of the TGAC8 LV myocardium.

      - Is the information herein relevant to stress adaptation in general or is it just something interesting in this specific mouse model?

      In our opinion, the TGAC8 mouse model is highly relevant to stress adaptation in general, but this broad view has rarely been taken in the literature, because of the (necessarily) reductionist nature of mainstream biomedical research. For example, reports on cardiac-specific overexpression of AC5 and AC6 never provided a broader view of these mice and focused only on a limited number of traits, i.e., arrhythmogenesis, chronic pressure overload, and contraction (Am J Physiol Heart Circ Physiol. 2015 Feb 1;308(3):H240-9; Am J Physiol Heart Circ Physiol. 2010 Sep;299(3):H707-12; Clin Transl Sci. 2008 Dec;1(3):221-7; Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9986-90; Am J Physiol Heart Circ Physiol. 2013 Jul 1;305(1):H1-8). 

      None of the pathways that are apparently activated were directly perturbed so their mechanistic role requires further study.

      We agree and have entitled a section of our discussion “Opportunities for Future Scientific Inquiry Afforded by the Present Results” to address this plainly.

      Specific 

      1. The strain of the mice and their sex needs to be stated as well as the exact age at which the various assays were performed.

      All assays were performed on 3-month-old males. This information was inadvertently not directly stated in the original submission.  

      2. The hearts of the Tg mice have more cardiomyocytes but which are smaller. Since there is no observed increase in proliferation of cardiomyocytes, how (or when) did this increase in cell number occur?   

      An increase in the number of cardiomyocytes may have occurred during the embryonic stage of development (8.5 dpc), when AC8 expression begins. Since submitting our manuscript, we have found that the expression level of human AC8 (the type of AC8 employed in this transgenic model) increases markedly during the embryonic period when compared to endogenous AC8 and remains elevated in both the fetal and perinatal periods. 

      3. While the mice do not show an increased mortality up to 12 months of age, HR/CO/EF are poor indices of contractile function. Data on end-systolic elastance or perhaps echo-based LV strain indices which will be relatively load-independent would be useful.

      Numerous comprehensive hemodynamic measurements have previously been performed on this mouse. For example, Mougenot et al. (Ref 10 in our manuscript), based on invasive hemodynamic analysis, concluded that contractile function in the TGAC8 heart was increased at both 2 and 12 months of age. However, Doppler imaging of the heart in conscious mice unmasked myocardial dysfunction, indicated by a reduction in systolic strain rate in both old TGAC8 and WT littermates. This is why they attributed the heart failure in TGAC8 at 12 months of age to a manifestation of accelerated aging.

      We agree that end-systolic elastance ought to be measured in the TGAC8, and that end-diastolic elastance and effective arterial elastance should also be measured in order to quantify diastolic function and ventricular-arterial coupling in the TGAC8.  
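      For reference, the conventional pressure-volume definitions of these indices are summarized below. Here P_es and V_es are end-systolic pressure and volume, V_0 is the volume-axis intercept of the end-systolic pressure-volume relation (assumed linear, so the single-beat form applies), SV is stroke volume, and the end-diastolic term is the local slope of the end-diastolic pressure-volume relation. These are textbook definitions included as an aid, not measurements from this study.

```latex
E_{es} = \frac{P_{es}}{V_{es} - V_0}, \qquad
E_{ed} = \left.\frac{dP}{dV}\right|_{\text{end-diastole}}, \qquad
E_a = \frac{P_{es}}{SV}, \qquad
\text{ventricular-arterial coupling} = \frac{E_a}{E_{es}}
```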

      4.  Quite a lot of conclusions are made relating to metabolism. However, this is entirely based on gene expression or protein levels. Given the substantial role of allosteric regulation in metabolic control, as well as the interconnectedness of metabolic pathways, ultimately any robust conclusions need to be based on an assessment of activity of key pathways. 

      We concur and have described some of the types of metabolic assessments in the last section of our discussion “Opportunities for Future Scientific Inquiry Afforded by the Present Results”: “… precisely defining shifts in metabolism within the cell types that comprise the TGAC8 LV myocardium via metabolomic analyses, including fluxomics.97 It will be also important that future metabolomics studies elucidate post-translational modifications (e.g. phosphorylation, acetylation, ubiquitination and 14-3-3 binding) of specific metabolic enzymes of the TGAC8 LV, and how these modifications affect their enzymatic activity”.

    1. Author Response

      Reviewer #1 (Public Review):

      In their manuscript, these authors present a novel geostatistical framework for modelling the complex animal-environment-human interaction underlying Leptospira infections in a marginalised urban setting in Salvador, Brazil.

      In their work, the authors combine human infection data with the rattiness framework of Eyre et al. (Journal of the Royal Society Interface, 2020). They define seroconversion as an MAT titer increase from negative to over 1:50, or a four-fold increase in titer for either serovar, between paired samples from cohort subjects. Whereas this is a commonly used measure of infection, the work would benefit from addressing how robust the results are to this definition of seroconversion.

      Thank you for your comment. We have acknowledged this on line 534 in the discussion by adding the following text: “A possible limitation of this study is the titre rise cut-off values used for classifying seroconversion and reinfection in the cohort that determine the sensitivity and specificity of the infection criteria. However, these criteria were used because they are the standard definitions for serological determination of infection that are commonly applied for leptospirosis and a wide range of other infections, and they enable the comparison of results with other previous leptospirosis studies.”
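      The infection criteria discussed above can be written as a small classification rule, which also makes it straightforward to re-run an analysis with alternative cut-offs as a sensitivity check. This is a minimal sketch, not the authors' code: it assumes titres are stored as reciprocal values (0 for negative, 50 for 1:50, and so on), reads "over 1:50" as a titre of at least 1:50, and uses hypothetical names (`PairedTitres`, `is_infection`, `positive_cutoff`, `fold_rise`).

```python
from dataclasses import dataclass


@dataclass
class PairedTitres:
    """Reciprocal MAT titres for one serovar (0 = negative, 50 = 1:50, ...)."""
    baseline: int
    follow_up: int


def is_infection(pairs, positive_cutoff=50, fold_rise=4):
    """Return True if any serovar meets either assumed infection criterion.

    - seroconversion: negative at baseline, >= positive_cutoff at follow-up
    - titre rise: positive at baseline, >= fold_rise x baseline at follow-up
    """
    for titres in pairs.values():
        seroconverted = titres.baseline == 0 and titres.follow_up >= positive_cutoff
        titre_rise = titres.baseline > 0 and titres.follow_up >= fold_rise * titres.baseline
        if seroconverted or titre_rise:
            return True
    return False


# One hypothetical participant tested against two serovars (placeholder labels).
participant = {
    "serovar_A": PairedTitres(baseline=0, follow_up=25),
    "serovar_B": PairedTitres(baseline=100, follow_up=400),
}
print(is_infection(participant))               # True: four-fold rise in serovar_B
print(is_infection(participant, fold_rise=8))  # False under a stricter rule
```

Exposing the cut-offs as parameters is a design choice for sensitivity analysis: the same cohort can be reclassified under stricter or looser definitions and the model refitted.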

      The model framework relies on the concept of 'rattiness' previously defined by Eyre et al. (JRSI, 2020) and assumes conditional independence in its construction (equation (1)). Whereas this is a reasonable assumption, it would be good to discuss situations in which this assumption is questionable and what the implications are for applying the modelling framework to other settings.

      We have added the following text immediately after “is shown schematically in Figure 2” following equation (1) on line 225: “The conditional independence assumption in (1) is reasonable for a vector-borne disease or one that is transmitted indirectly, in which context the observed rat indices are to be considered as noisy indicators of the unobservable spatial variation in the extent to which the environment is contaminated with rat-derived pathogen. It would be more questionable for applications in which the disease of interest is spread by direct transmission from rat to human.”

      The authors provide an extensive model building exercise and investigate, in different ways, whether the model captures the necessary complexity (GAM smoothers - testing linearity, spatial correlation, etc). I believe the work would benefit from (1) a formal diagnostic investigation, if feasible; (2) providing guidelines on how model building should be performed.

      We have added a new Appendix 7 with diagnostic plots of randomized quantile residuals to check the rattiness-infection model fit with the human infection data and included the following text in Section 2.4 of the main text: “A formal diagnostic investigation of randomized quantile residuals is included in Appendix 7. We found no evidence in the diagnostic plots to suggest that there were issues with our modelling approach.”
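      For a binary infection outcome, randomized quantile residuals (Dunn and Smyth, 1996) replace each discrete observation with a uniform draw over the interval of the model CDF that the observation occupies, then map that draw through the standard normal quantile function; under a correctly specified model the residuals are exactly standard normal, which is what diagnostic plots of this kind check. The sketch below assumes fitted infection probabilities are already available (the `fitted_p` values here are simulated placeholders, not study output) and illustrates only the residual construction, not the rattiness-infection model itself.

```python
import random
from statistics import NormalDist, fmean, stdev

_STD_NORMAL = NormalDist()


def randomized_quantile_residual(y, p, rng):
    """Randomized quantile residual for a Bernoulli(p) observation y (0 or 1).

    The model CDF is F(0) = 1 - p and F(1) = 1, so y = 0 maps to a uniform
    draw on (0, 1 - p) and y = 1 to a uniform draw on (1 - p, 1).
    """
    lower, upper = (0.0, 1.0 - p) if y == 0 else (1.0 - p, 1.0)
    u = rng.uniform(lower, upper)
    # Keep u strictly inside (0, 1) so the normal quantile is finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return _STD_NORMAL.inv_cdf(u)


rng = random.Random(42)
# Simulated placeholder data: outcomes are drawn from the same probabilities
# the "model" reports, i.e. the model is correctly specified by construction.
fitted_p = [rng.uniform(0.01, 0.3) for _ in range(20000)]
outcomes = [1 if rng.random() < p else 0 for p in fitted_p]
residuals = [randomized_quantile_residual(y, p, rng) for y, p in zip(outcomes, fitted_p)]
print(round(fmean(residuals), 2), round(stdev(residuals), 2))  # near 0 and 1
```

In practice these residuals would be inspected with a normal QQ plot and plotted against covariates or spatial coordinates; systematic departures indicate misfit.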

      To supplement the R code that is publicly available for repeating all of the steps in this analysis, we have now also included a detailed step-by-step explanation of the model building process in Appendix 8 that outlines the key steps for building the rat and infection components of the model (variable selection and evaluation of residual spatial autocorrelation) and fitting and examining the joint rattiness-infection model. We have added the following text in Section 2.6 of the main text: “We also include a step-by-step explanation of the model building process to guide future users of the rattiness-infection framework in Appendix 8.”

      The authors are to be acknowledged for providing an extensive and thorough discussion of the different aspects of their work. Whereas the discussion is complete, I wonder whether the authors can give a brief example about how this model can be applied in a different setting.

      Thank you. We have added the following text on line 551 in the discussion: “The framework may have important applications beyond the study of zoonotic spillover, with the rattiness component replaced by other exposure measures e.g. mosquito density or ecological indices (such as pollution, where there are multiple, related measures of air or groundwater quality) to model associations with human or animal health outcomes.”

      Reviewer #2 (Public Review):

      Eyre et al. developed and applied a novel geostatistical framework for joint spatial modeling of multiple indices of pathogen (Leptospira) reservoir (rats) abundance and human infection risk. This framework enabled evaluation of infection risk at a fine spatial scale and accounted for uncertainty in the pathogen reservoir abundance estimates. The authors used data collected in two different field projects: (1) a rat ecology study in which three different approaches were used to detect rat presence "rattiness", and (2) a prospective community cohort study in which individuals were sampled during two different time periods to detect recent infections via seroconversion or a four-fold increase in anti-Leptospira antibody MAT titer. Univariable and then multivariable analyses were performed on these data to identify (1) the environmental variables that best predicted "rattiness", and (2) the demographic/social, environmental (household), occupational, and behavioral variables that best predicted human risk of infection. Once identified, the best predictors from (1) and (2) were included in a final, joint model to identify the significant predictors of both 'rattiness' and human infection risk. As a result of this study, the authors were able to detect spatial heterogeneity in leptospiral transmission to humans. They found that infection risk associated with increases in reservoir abundance differed by elevation, and that increases in reservoir abundance at high elevation were associated with a much higher odds ratio for infection than at low elevation. The authors suggest that this has to do with differences in how the infectious leptospires (shed by the rat reservoir) are dispersed in the environment. At high elevations, flooding is less frequent and thus rat shed leptospires are likely to stay where the rat deposited them. 
At lower elevations, by contrast, flooding may play a large role in spreading leptospires more evenly across the landscape, reducing the importance of rat presence at smaller spatial scales. The final best model was then used to generate prediction maps of 'rattiness' as well as human infection risk at all locations within the study area (i.e. including those that lacked rat detection data and human infection data). This work represents an important advance in infection risk modeling, as it explicitly incorporates estimates of reservoir abundance and the uncertainty surrounding these estimates into the infection risk assessment, and allows for modeling of infection risk at fine spatial scales. Findings from this study have important management implications at the authors' study site, as they suggest that interventions directed at high elevations should differ from those designed to address infection risk at lower elevations. There are also broader implications, as this novel approach may be applied to other systems to identify differences in infection risk for other pathogens at a fine spatial scale, predict infection risk more broadly, and facilitate intervention strategies targeted to the specific epidemiological and ecological conditions experienced by a population.

      This was a well-designed study. The field sampling approach was well balanced, well described and appropriate. Broadly, the modeling framework is appropriate for the questions being asked and for the data being used. The variable and model selection approaches were clearly described and appropriate. Evaluation of the more detailed mathematical approach is outside of my area of expertise, so I am unable to comment on the validity of the approach.

      For the most part, the explanatory variables assessed in the different models were well described and justified, however there were some cases for which further explanation would have been helpful. For example, how did the authors determine which occupations to evaluate? Specifically, why traveling salesperson? What is the difference between open sewer within 10 m and unprotected from sewer?

      We have added the following additional text to Section 2.3.2 on line 297 to clarify the definition and reason for inclusion for these variables: “In the household environment domain, two variables were used to capture risk due to sewer flooding close to the household: i) the presence of an open sewer within 10 metres of the household location and ii) a binary `unprotected from open sewer' variable which identified those households within 10 metres of an open sewer that did not have any physical barriers erected to prevent water overflow. Three high-risk occupations were included in the occupational exposures domain as binary variables. Construction workers and refuse collectors have direct contact with potentially contaminated soil, building materials and refuse in areas that provide harbourage and food for rats. Travelling salespeople have regular and high levels of exposure to the environment (particularly during flooding events) as they move from house to house by foot. Two other binary occupational exposure variables were included that measured whether a participant worked in an occupation that involves contact with floodwater or sewer water.”

      I also had some concerns regarding the time-period of the rat ecology study used to determine abundance, potential fluctuations in rat abundance through time, and how this might align with sampling to detect infection in humans. Depending on the time scale of population fluctuation in rats as well as fluctuations in infection prevalence in rats, the abundances calculated from data from the ecology study may not be accurately reflecting true abundance (and therefore shedding and transmission risk) during the time period that a human may have been exposed. However, the authors do a nice job of addressing some of these issues in the discussion. They mention that infection prevalence in rats is consistently around 80% and that there don't appear to be seasonal fluctuations in human exposure risk in the study area.

      Thank you.

      Reviewer #3 (Public Review):

      The goal of the authors was to test how important local rat abundance is as a driver of Leptospira infection in humans.

      The authors approached this using a strong combination of datasets on human infection risk and rat abundance, across a spatial scale that is large enough to allow simultaneous assessment of multiple potentially important drivers of infection risk. This further enables the authors to develop infection prediction maps based on the fitted models.

      This study design is a major advance towards understanding the link between rat abundance and human infection risk.

      Based on the top models tested in the study, the authors conclude that local rat abundance is indeed correlated with infection risk, and that this correlation is strongest at higher elevation.

      This is an impactful finding, but in my opinion it is not yet clear how robust and important this is, because of two reasons:

      (1) The infection risk data: while the actual infection risk data are not shown, the map shown in Figure 5B suggests that there is an infection hotspot that happens to be at high elevation. This raises the question of how strongly this single hotspot is driving the observed correlation between rat abundance and infection risk (which the authors find to be much stronger at high elevation than at lower elevations).

      We have added a new figure (Figure 4) earlier in the article, with the raw infection data overlaid on contour lines for the three elevation levels, to give the reader a better overview of the raw data. (We added this as a separate figure rather than to Figure 6, formerly Figure 5, so that the map is large enough for the points in Figure 4A to be easily visible; please note that it is included as a larger, easier-to-view image in the main eLife template version.) This new Figure 4 shows that, out of a total of 403 participants in the high elevation region, there were 16 infections, of which only 5 (31%) were located in the large hotspot in Valley 3 (valleys are numbered 1 to 3 from west to east, see Figure 1A). In addition to the largest hotspot in the north of Valley 3, there are several other areas in the high elevation region with raised predicted infection risk relative to their surroundings, where there were also rattiness hotspots and infected participants in the raw data: five cases (red and yellow infection risk areas in Figure 5B) on the western side of Valley 2; the two cases on the eastern edge of Valley 2; the two cases on the western edge of Valley 3; and the single case in the southwest of Valley 3. Other variables are also important drivers of infection risk, and at several of these locations the contribution of rattiness increases infection risk significantly relative to the low-risk surrounding area (e.g. to 10% in areas where risk is closer to 1% or 2%) without reaching the more obviously visible high infection risk values closer to 20%. 
We believe that our statistical model provides a better test of whether there is a statistical association between rattiness and infection at high elevations than a visual examination, and that this is supported by the large number of observations in the high elevation area (403) and the distribution of infected and uninfected households, which demonstrates that the observed association is not driven only by the hotspot in Valley 3.

      (2) The statistical models: if I understand correctly, all tested models of infection risk include the variable rat abundance, and while the individual effect estimates for rat abundance are statistically significant (Table 3), the more important question of how the fit of a model without the rat abundance variables compares with those of the other tested models (shown in Supplementary Table S2) has not been addressed.

      These models were considered but were ranked outside the top five and for this reason were not reported in Table S2. We agree that showing the AIC of a model without rattiness in this table demonstrates more clearly the improved fit of the model with rattiness. We have therefore added the highest-ranked model without rattiness (M*) to Table S2, together with a note explaining the reason for its inclusion (“Model M* was ranked outside of the top 5 models but is included here for reference to demonstrate the improvement in model fit when rattiness is included”). The AIC of M* was 532.13, substantially higher than those of the top five models (from M1 = 523.14 to M5 = 525.04), which justifies the inclusion of rattiness in this model and in the joint rattiness-infection framework.
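      The size of this improvement can be made concrete with the AIC values quoted above. The sketch below computes the AIC difference and the conventional relative likelihood exp(-ΔAIC/2) of the model without rattiness compared with the best and fifth-best models; the helper names are illustrative, and interpreting these quantities is left to the usual model-selection conventions.

```python
import math

# AIC values reported in the response above.
aic = {"M1": 523.14, "M5": 525.04, "M*": 532.13}


def delta_aic(worse, better):
    """AIC difference; positive values mean `worse` fits less well."""
    return aic[worse] - aic[better]


def relative_likelihood(worse, better):
    """exp(-delta/2): plausibility of `worse` relative to `better`."""
    return math.exp(-delta_aic(worse, better) / 2)


for ref in ("M1", "M5"):
    print(f"M* vs {ref}: dAIC = {delta_aic('M*', ref):.2f}, "
          f"relative likelihood = {relative_likelihood('M*', ref):.3f}")
```

Against M1 the difference is about 9 AIC units, corresponding to a relative likelihood of roughly 0.01 for the model without rattiness.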

      Regardless of whether rat abundance is an important driver of human infection risk, this study is a major step in our understanding of the role of rats in the spread of leptospirosis, due to its unique combination of datasets and a spatial statistical modeling approach.

      Thank you.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript discusses evolutionary patterns of manipulation of others' allocation of investment in individual reproduction relative to group productivity. Three traits are considered: this investment, manipulation of others' investment, and resistance to this investment. The main result of the manuscript is that the joint evolution of these traits can lead to the maintenance of diversity through, as documented here, cyclic (or noisier) dynamics. Although there are some analytical results, this main conclusion is instead supported by individual-based simulations, which seem correctly performed (but for clonal populations, as emphasized below).

      There could be material for a good paper here but the organization of the manuscript makes it difficult to fully evaluate. The narrative is highly condensed, with the drawbacks that this often entails in terms of accurately conveying the results of a study, as illustrated here by the following issue.

      The population is apparently assumed to be clonal (more than just "haploid"), meaning that there is no recombination between the loci controlling the three traits. In the one case where this assumption is relaxed (quite artificially), the cyclic dynamics disappear (section 4.4 of the appendix). This is crucial information that cannot be appreciated in the main text.

      The paragraph at line 368 offers a simple explanation for the joint dynamics of traits. However, this explanation would hold identically for a sexual population and a clonal population, whereas these two cases seem to have completely different dynamics. Thus, there is something essential to explain these differences, that is missing from the given explanation.

      Yes, our model was asexual with no recombination. To address this comment, we carried out additional simulations in which recombination was allowed (Appendix 1—4.8). We found that recombination does not change our results (predictions), and describe this on lines 469-475. Because we assume additive trait effects and identical dispersal properties for each trait, our haploid asexual model is also equivalent to a diploid sexual model (Taylor 1996; Day & Taylor 1998).

      This is especially important because the finding that the joint evolution of several traits can lead to some form of diversity maintenance is not surprising. As the discussion acknowledges (but the introduction seems to downplay), it is also well understood that manipulation and counter-adaptations to it can occur in many contexts and lead to the maintenance of diversity. For this reason, similar results in the present case are not surprising, and the main outcome of the study should be to provide a deeper understanding of the forces leading to the different outcomes in the current models.

      I do not see clearly what distinguishes "manipulative cheating" from other forms of manipulation that have been previously discussed in the literature (e.g., as cited at line 461). Couldn't this be clarified by some kind of mathematical criterion?

      Thank you for pointing out that the distinction between our model and previous models could be made clearer. We have added more description to explain the conceptual difference on lines 187-193, and a new subsection in the appendix that shows these differences by mathematically examining the fitness formulations in previous models (Appendix 1—1.3).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper addresses an important question: whether the conduction velocity in white matter tracts is related to individual differences in memory performance. The authors use novel MRI techniques to estimate the "g-ratio" in vivo in humans - the ratio of the inner axon relative to the inner axon plus its outer myelin sheath. They find that autobiographical recall is positively related to the g-ratio in a specific white matter tract (the parahippocampal cingulum bundle) in a population of 217 healthy adults. This main finding is extended by showing that better memory is associated with larger inner axon diameters and lower neurite dispersion, which suggests more coherently organised neurites. The authors also argue that their results show that the magnetic resonance (MR) g-ratio can reveal novel insights into individual differences in cognition and how the human brain processes information.
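      For readers less familiar with the measure: the aggregate MR g-ratio is commonly estimated from an MRI-derived myelin volume fraction (MVF) and axon volume fraction (AVF) as g = sqrt(AVF / (AVF + MVF)), a relation that follows from treating the fibres in a voxel as sharing a common g-ratio. The sketch below only evaluates that formula; the input values are illustrative assumptions, and it does not reproduce the authors' specific MVF and AVF estimation pipeline.

```python
import math


def aggregate_g_ratio(avf, mvf):
    """Aggregate g-ratio from axon (AVF) and myelin (MVF) volume fractions.

    g = sqrt(AVF / (AVF + MVF)), where AVF + MVF is the fibre volume fraction.
    """
    if avf <= 0 or mvf < 0:
        raise ValueError("AVF must be positive and MVF non-negative")
    return math.sqrt(avf / (avf + mvf))


# Illustrative values only (not data from the study).
# More myelin for the same axon volume gives a lower g-ratio.
print(round(aggregate_g_ratio(avf=0.5, mvf=0.3), 3))  # 0.791
print(round(aggregate_g_ratio(avf=0.5, mvf=0.2), 3))  # 0.845
```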

      The study is exploratory in nature and the analyses were not pre-registered. The technique has not been used before to associate cognitive performance with MR estimates of conduction velocity in candidate white matter tracts. It is therefore unknown how strong any associations are likely to be and what sort of sample size might be needed to observe them. Nevertheless, if the technique proves to be reliable, then it certainly offers a valuable new tool to understand individual differences in cognitive abilities. However, brain structure to behavior associations are notoriously variable across studies and have been argued to require very large sample sizes to obtain reproducible results.

      We respectfully disagree that the study was exploratory. We had distinct aims and hypotheses from the outset. Our prime interest is in autobiographical memory, the hippocampus and its connectivity. This motivated our focus on three specific white matter tracts. We also planned from the time of study design to examine the MR g-ratio, and even contributed to refining the pre-processing pipeline for this approach, as reported in a previous paper (Clark et al., 2021, Frontiers in Neuroscience). Moreover, in the current manuscript we outlined well thought through possible outcomes and declared specific predictions.

      Regarding pre-registration, due to the scope of this work, the experiment was planned eight years ago, and data collection commenced seven years ago. At that time, formal pre-registration was not common practice. However, it has been a long-standing feature of our Centre that proposed studies and their analysis plans undergo rigorous internal peer review, including presentation to the whole Centre, before data acquisition can commence. The proposal for the research under consideration here was presented on 26th September, 2014.

      As noted in our response to the Editors’ Public Evaluation Summary above, someone has to be the first to report a novel result, and we believe that the depth and transparency of our approach permit confidence in the findings. Not least, to reiterate, we employed the most widely-used and best-validated method of testing autobiographical memory recall that is currently available, Levine’s Autobiographical Interview. Our primary analyses were performed using the behavioural outcome measure from this test, and the results were directly compared with those from a closely-matched control measure to test whether significantly larger effects were observed for our variable of interest. The potential for false positives was further reduced by extracting microstructure data from hypothesised tracts of interest (instead of performing whole-brain voxel-wise analyses), with statistical correction performed on all structure-behaviour analyses. Moreover, we performed partial correlations with age, gender, scanner and number of voxels in a region of interest (ROI) as covariates. Complementary investigations were also conducted using other commonly-reported measures, providing supporting evidence. We report all analyses (and provide all the source data), including those finding no relationships. The consistent finding throughout was an association between autobiographical memory recall ability and the microstructure of the parahippocampal cingulum bundle only. Moreover, thanks to the excellent suggestions of the Reviewers, the revised version reports additional analyses that allow us to further corroborate and interpret our findings.
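      A partial correlation of the kind described above can be computed by regressing both the behavioural score and the microstructure measure on the covariates (plus an intercept) and correlating the residuals. The sketch below is a minimal illustration with NumPy on simulated data; the variable names and covariate coding are hypothetical, and it is not the authors' analysis code.

```python
import numpy as np


def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after removing linear covariate effects.

    covariates: array of shape (n, k), e.g. age, gender, scanner, ROI voxels.
    """
    design = np.column_stack([np.ones(len(x)), covariates])
    # Residualize x and y on the same design matrix via least squares.
    beta_x, *_ = np.linalg.lstsq(design, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(design, y, rcond=None)
    res_x = x - design @ beta_x
    res_y = y - design @ beta_y
    return float(np.corrcoef(res_x, res_y)[0, 1])


rng = np.random.default_rng(0)
n = 217
# Simulated, hypothetical covariate columns.
covs = np.column_stack([
    rng.uniform(20, 41, n),    # age-like variable
    rng.integers(0, 2, n),     # binary gender code
    rng.integers(0, 2, n),     # binary scanner code
    rng.normal(1000, 100, n),  # ROI voxel count
])
shared = rng.normal(size=n)    # signal common to both simulated measures
microstructure = covs @ np.array([0.01, 0.1, 0.2, 0.001]) + shared + rng.normal(size=n)
memory_score = covs @ np.array([0.05, -0.3, 0.1, 0.0]) + shared + rng.normal(size=n)
print(round(partial_corr(microstructure, memory_score, covs), 2))
```

Because both variables are residualized on the same design matrix, any linear contribution of the covariates is removed before the correlation is computed.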

      Our sample of 217 participants allowed for sufficient power to identify medium effect sizes when conducting correlation analyses at alpha levels of 0.01 and when comparing correlations at alpha levels of 0.05 (Cohen, 1992, Psychological Bulletin). While it has recently been suggested that thousands of participants are required in order to investigate brain structure-behaviour associations (Marek et al., 2022, Nature), other, more sophisticated, analyses suggest that samples of ~200 participants can be sufficient, in line with our estimates (Cecchetti and Handjaras, https://psyarxiv.com/c8xwe; DeYoung et al., https://psyarxiv.com/sfnmk). Given that our study was principled, well-controlled, analysed appropriately and produced very specific and consistent findings, we are confident that the findings are robust.
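      The power statement for a single correlation can be checked with the standard Fisher z approximation. This sketch assumes Cohen's conventional medium effect size r = 0.3 and a two-sided test; it covers only the single-correlation case, not the comparison of correlations, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0.

    Uses Fisher's z: atanh(r) * sqrt(n - 3) is the expected value of the
    standardized test statistic under the alternative.
    """
    z = NormalDist()
    shift = math.atanh(abs(r)) * math.sqrt(n - 3)
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(round(correlation_power(0.3, 217, alpha=0.01), 2))
```

For n = 217 and alpha = 0.01 this gives a power of roughly 0.97, while the same formula returns about 0.80 at n = 125.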

      The authors decided to analyse performance on a single task - the Autobiographical Memory Interview - and identified three candidate white matter tracts that connect the hippocampal region with other brain regions. While it is clear why these three tracts were chosen, it is less obvious why the authors chose to investigate associations with the Autobiographical Memory Interview and not other memory tests that were part of the battery of tests administered to the participants. It is reasonable to assume that something as general as the conduction velocity of a white matter tract would have an effect on memory ability across a range of tasks, so to single out one seems an unnecessarily narrow focus.

      Our main interest over many years, and hence the focus of this study, is autobiographical memory recall because it directly relates to how people function in real life. As noted above, autobiographical experiences occur in dynamic, multisensory, multidimensional, non-linear, ever-changing contexts; they involve actively engaging with the environment and other people; they are embodied; they span milliseconds to decades. Many of these features cannot be captured by laboratory-based episodic memory tests. This issue is increasingly being discussed (for example, see recent reviews by Nastase et al., 2020, NeuroImage; Mobbs et al., 2021, Neuron; Miller et al., 2022, Current Biology). It is further laid bare in McDermott et al.’s (2009, Neuropsychologia) meta-analysis of functional MRI studies which showed that laboratory-based and autobiographical memory retrieval tasks differ substantially in terms of their neural substrates. Consequently, we were not surprised to find that when we analysed laboratory-based memory test performance, there were no correlations with the MR g-ratio. Recall of vivid, detailed, multimodal, autobiographical memories may rely on inter-regional connectivity to a greater degree than simpler, more constrained laboratory-based memory tests. Therefore, as well as speaking to conduction velocity, these findings also contribute to wider discussions about real-world compared to laboratory-based memory tests. We thank the Reviewer for making the excellent suggestion to include these additional data, analyses and discussion points.

      The results of the study are interesting and highlight a key role of the parahippocampal cingulum bundle in autobiographical memory recall. The results are corrected for multiple comparisons across the three fiber tracts of interest and the recall of "external details" provides a nice control compared to the "internal details" which are the measure of interest. The main findings are extended to show that it is likely to be an increase in axon diameter and an increase in neurite coherency that characterize those individuals with better autobiographical recall. Despite these positives, it remains unclear whether memory recall, in general, is better in people with higher g-ratios in this tract (as implied in the Abstract), or if this effect is specific to scores on the Autobiographical Memory Interview.

      Our interest is in autobiographical memory, and so we employed the most widely used and best-validated method of testing autobiographical memory recall that is currently available: Levine’s Autobiographical Interview. Not only does this test include a control measure, external details (as noted by the Reviewer), but we had independent raters score the autobiographical memory descriptions, and found that the intra-class correlation coefficients were very high (see Materials and Methods). Despite using this current, gold standard approach, at the request of the Reviewer we have now analysed data from eight additional laboratory-based memory tests. These are standard memory tests that are often used in neuropsychological studies: testing recall, the immediate and delayed recall of the Logical Memory subtest of the Wechsler Memory Scale IV, the immediate and delayed recall of the Rey Auditory Verbal Learning Test, and the delayed recall of the Rey–Osterrieth Complex Figure; testing recognition memory, the Warrington Recognition Memory Tests for Words and Faces; testing semantic memory, the “Dead or Alive Test”. While these tests can assess some aspects of memory recall, they cannot be regarded simply as proxies for autobiographical memory recall, for the reasons we outlined in our response to the previous point. They do not capture key aspects of autobiographical memories. It is therefore all the more interesting that we found no associations between these laboratory-based memory tasks and the MR g-ratio of the parahippocampal cingulum bundle, in contrast to the relationship identified with autobiographical memory recall ability. Recall of vivid, detailed, multimodal, autobiographical memories may rely on inter-regional connectivity to a greater degree than simpler, more constrained laboratory-based memory tests. Therefore, as well as speaking to conduction velocity, these findings also contribute to wider discussions about real-world compared to laboratory-based memory tests. We thank the Reviewer once again for making the excellent suggestion to include these additional data, analyses and discussion points.

      Reviewer #2 (Public Review):

      In this study, Clark and colleagues tackle a very intriguing question: how are differences in autobiographical recall ability reflected in human brain structure and function? To answer this question, they interviewed a large cohort of subjects and proceeded to acquire MRI data, specifically diffusion-weighted imaging and magnetization transfer data, to estimate the g-ratio, a measure of myelination deeply linked to conduction velocity. Looking at three specific white matter pathways of interest, all interconnecting the hippocampus with other brain structures, they studied the relationship between the g-ratio and autobiographical recall ability, together with many more measures from MRI. They found a significant positive association between the g-ratio of the parahippocampal cingulum bundle and the number of internal details from the interviews. These results can provide new potential directions to further study the underlying neural features beyond memory.

      I think that this is a very interesting article. It is well written, the methods are extensively explained, and the appendix provides further details for more expert readers. The authors put an effort into providing a comprehensive context in the introduction and in the discussion, and as a result, the paper seems overall quite suitable for both general and specialist readerships.

      Thank you.

      The main issue I can currently see in the paper is that the relationship between g-ratio and recall ability is then used to infer that better recall ability is associated with higher conduction velocity and larger axons. The authors' line of reasoning is that, given the hypothesized association, the increase in the g-ratio implies increases in myelin and axonal diameter. Although this scenario is indeed possible given the current result, an increased g-ratio need not indicate higher conduction velocity. In fact, the first potential inference would be that, without any information on axon size, the quantity of myelin could be lower and, as a result, conduction velocity would decrease. I understand that the authors expected higher conduction velocity to be associated with better autobiographical memory recall, but it is hard to see any experimental outcome that could have disproved this hypothesis: under the possible scenarios depicted in the introduction, any change in the g-ratio (or even no change at all) could indicate higher conduction velocity. What would then be needed to corroborate one of these scenarios is some independent or complementary measure, which unfortunately is missing.

      The mentioned issue does not mean that the paper loses relevance. I think that it should focus on the very practical result, a change in myelination and microstructure, and discuss what the potential implications are, including the one that currently dominates the discussion section.

      Thank you for these comments and the opportunity to provide further clarification.

      First, we have now provided additional background information regarding the relationship between the MR g-ratio and conduction velocity. We explicitly note that while finding a significant relationship between the MR g-ratio and autobiographical memory recall suggests the existence of an association between autobiographical memory recall and parahippocampal cingulum bundle conduction velocity, it cannot speak to the direction of this association.

      Second, we have further noted that interpretation of the parahippocampal cingulum bundle MR g-ratio in relation to the underlying microstructure requires knowledge, or an assumption, about whether the associated change in conduction velocity is faster or slower. Given that faster conduction velocity is thought to promote better cognition (e.g. Brancucci, 2012; Dicke and Roth, 2016; Miller, 1994; Reed and Jensen, 1992), we interpreted our MR g-ratio findings under the assumption of faster conduction velocity, and now explicitly note in several places in the revised manuscript that this is an assumption.

      Third, we thank the Reviewer for the excellent suggestion that a complementary measure could help to further inform the findings. Consequently, we now also include additional analyses examining the relationship between the extent of myelination and autobiographical memory recall ability. This is possible using the magnetisation transfer saturation maps, which are optimised to assess myelination. Given our assumption of faster conduction velocity when interpreting our positive MR g-ratio correlations, no relationship between parahippocampal cingulum bundle magnetisation transfer saturation and autobiographical memory recall would be expected. On the other hand, if conduction velocity is actually decreasing, then a negative correlation between magnetisation transfer saturation values and autobiographical memory recall ability would be observed. In fact, we found no relationship between parahippocampal cingulum bundle magnetisation transfer saturation and autobiographical memory recall. This suggests that myelin was not associated with autobiographical memory recall ability, supporting our assumption that relationships with the MR g-ratio were indicative of faster, rather than slower, conduction velocity.

      We have now added these new data, analyses and discussion points to the revised manuscript.
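      To make the Reviewer's point concrete, the sketch below compares two hypothetical single-fibre changes that both raise the g-ratio but move conduction velocity in opposite directions. It assumes the classic idealised Rushton-style scaling, v proportional to d·sqrt(−ln g) with d the axon diameter; this model is an illustrative assumption, not the manuscript's method, and the MR g-ratio is an aggregate voxel-level estimate rather than a single-fibre measurement. The function names and the diameters and myelin thicknesses are hypothetical, in arbitrary units.

```python
import math


def g_ratio(axon_diameter, myelin_thickness):
    """g = inner (axon) diameter / outer (axon + myelin) fibre diameter."""
    return axon_diameter / (axon_diameter + 2.0 * myelin_thickness)


def relative_velocity(axon_diameter, g):
    """Idealised relative conduction velocity, v proportional to d * sqrt(-ln g).

    Arbitrary units: only comparisons between scenarios are meaningful.
    """
    if not 0.0 < g < 1.0:
        raise ValueError("g-ratio must lie strictly between 0 and 1")
    return axon_diameter * math.sqrt(-math.log(g))


# Hypothetical single-fibre scenarios: (axon diameter, myelin thickness)
scenarios = {
    "baseline": (1.0, 0.3),
    "thinner myelin": (1.0, 0.2),  # same axon, less myelin -> higher g, slower
    "larger axon": (1.3, 0.3),     # bigger axon, same myelin -> higher g, faster
}

for name, (d, t) in scenarios.items():
    g = g_ratio(d, t)
    print(f"{name:15s} g = {g:.3f}  relative v = {relative_velocity(d, g):.3f}")
```

      Both modified scenarios have a higher g-ratio than baseline, yet under this idealisation one is slower and one is faster. This is why an independent myelin-sensitive measure, such as magnetisation transfer saturation, is needed to choose between them.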

      It would also be helpful to include some paragraphs on both interpretation and methodological issues when it comes to MRI-based microstructural imaging, which at the moment is lacking. This would provide a better picture of the results for a more general readership.

      We agree, and additional consideration of interpretational and methodological limitations has now been included in the manuscript.

      As one of the first works using an MRI-based microstructural measure of myelin, the g-ratio, to study cognition in a large cohort of subjects, this work will, I think, be a needed and significant step towards merging the neuroscience and MRI physics communities. The methodology presented here is robust and could be used in many other applications.

      Thank you.

      Reviewer #3 (Public Review):

      The manuscript adds useful information about how structural properties of the brain are related to individual differences in autobiographical memory. A novel metric is used to assess features of white matter in tracts that are important for information exchange between the hippocampus and other brain regions. In one of these, the parahippocampal bundle, a relationship between the MR g-ratio and autobiographical memory recall is identified. This represents new and interesting information. The authors interpret the results in line with the theory that speed of signal transmission is important for cognitive function.

      Thank you for this positive summary.

    1. Author Response

      Reviewer #1 (Public Review):

      Rasicci et al. have developed a FRET biosensor that is designed to light up when cardiac myosin folds. This structure is extremely important to understand, and its link to the super-relaxed (SRX) state has not been fully shown. Their study provides a comprehensive review of the literature and compelling data that the 15 heptad+leucine zipper+GFP construct does function well and that the DCM mutant E525K has a similar IVM velocity despite a reduced ATPase compared with HMM. They rely on the ionic strength-dependent changes in the rate of MantATP release to argue that the E525K mutation stabilizes the 'interacting heads motif' (IHM) state, which makes logical sense.

      Strengths:

      Well written and comprehensive.

      Utilizes the appropriate fluorescence-based sensor for measuring the folding of the myosin structure. Provides a detailed range of techniques to support the premise of the study.

      Weaknesses:

      The outcomes of this study are over-interpreted to mean that the IHM and SRX are the same. Similar studies, e.g. Anderson 2018 and Chu 2021, support the opposite view that IHM and SRX are not necessarily the same. Anderson (and Rohde 2018) point out that S1 shows some degree of reduced ATPase activity, which clearly cannot be due to folding of the molecule. Also, mavacamten was used in these studies to show that even S1 is inhibited, suggesting that SRX and IHM are not connected. This is not to say that, with enough supporting evidence, these observations cannot be overridden; it is just not clear that there is enough in this study to support this conclusion.

      We have revised our discussion to emphasize that our results support a model in which the SRX state is enhanced by formation of the IHM, but given the S1 and 2HP data the IHM may not be required for populating the SRX biochemical state (see page 8).

      I felt that the authors passed over the recent Chu 2021 paper too quickly, the Thomas group used a FRET sensor as well and provides a direct comparison as a technique, but with opposite conclusions. They also have supporting data in Rohde 2018 that their constructs were less ionic strength sensitive. It would be useful to understand what the authors think about this.

      We have discussed the Rohde and Chu papers in more detail in the discussion (see page 8). In the Rohde paper they used proteolytically prepared HMM and S1. Rohde found 20% SRX at all KCl concentrations in S1, while HMM shifted from 50% to 20% SRX in low and high salt conditions, respectively. Our results are different in terms of the absolute fraction of the SRX state, but the trend is similar in terms of S1 being salt-insensitive and HMM being salt-sensitive. The difference could be due to proteolytic HMM, which is a longer construct, and proteolytic S1, which is prone to internal cleavage that can impact ATPase activity. Another difference could be the mixed isoform of mantATP used in previous studies and the single isoform of mantATP used in our study (see page 5).

      Reviewer #2 (Public Review):

      The paper by Rasicci et al. examines the impact of the DCM mutation E525K in beta-cardiac myosin on its function and regulation by autoinhibition. The role of the auto-inhibited state of beta-cardiac myosin in fine-tuning cardiac contractility is an active and exciting area of current research related to muscle biology and cardiomyopathies. Several studies in the past have linked the destabilization of the autoinhibited, super-relaxed (SRX) state of myosin to the pathogenesis of hypertrophic cardiomyopathy. This timely study provides one of the first few examples where the hypocontractile phenotype of a DCM mutation has been linked to the stabilization of the SRX state.

      One of the strengths here is the utilization of a wide variety of both pre-existing and novel biochemical and biophysical assays for the study. The authors have characterized a new two-headed long-tailed myosin construct containing 15 heptad repeats of the proximal S2 (15HPZ), which they show allows myosin to form the SRX state in vitro using single ATP turnover assays. The authors go on to compare the E525K and WT proteins using the 15HPZ myosin construct in terms of their steady-state actin-activated ATPase activity, in vitro actin-sliding velocity and single ATP turnover measurements. These assays reveal that the predominant effect of this mutation is the stabilization of the SRX state, which is maintained even at 150 mM salt concentration where the WT SRX is largely disrupted. This is an important observation because DCM mutations so far have been believed to only affect the force-generating capacity of myosin.

      One of the biggest strengths of this study is the attempt to develop a FRET-based approach to directly ask if the biochemical SRX state here correlates well with the structural IHM state, which is an important unresolved question in the field. The authors have designed a FRET pair (C-terminal GFP and Cy3ATP bound to the active site) that is sensitive to the relative position of the heads and the tail, allowing them to distinguish between the low-FRET closed IHM conformation and the no-FRET open conformation. Remarkably, the authors show that the salt dependence of the FRET efficiency values closely follows their results from the salt dependence of the percent SRX for both WT and E525K proteins. The authors then attempt to substantiate their FRET results by a direct visual analysis of the conformational states populated by both WT and E525K proteins at low salt using negative staining EM analysis. The authors have optimized conditions to allow the deposition of the IHM state on grids without adding the small molecule mavacamten, which was found to be necessary in an earlier study to visualize the closed state using EM. The authors conclude that the SRX state correlates well with the IHM state and that the E525K mutation indeed stabilizes the folded-back conformation of myosin.

      This study significantly strengthens the previously illustrated correlation between the SRX and IHM states and provides methodological advances (especially visualization of the IHM state by negative EM in the absence of cross-linking agents) that will be very useful to the field going forward. The observation that a DCM mutation can lead to stabilization of the folded back state is a novel insight that should spark interest in the field to test how broadly this applies to other DCM mutations. The conclusions of the paper are mostly supported by the data; however, some clarifications and qualifications are needed.

      Weaknesses:

      The extremely low enzymatic activity of the M2β 15HPZ myosins compared with the WT S1 control (which is a historical control not assayed in parallel with the 15HPZ proteins) raises concerns about the quality of the 15HPZ myosin preparations. The authors attribute the low kcat to the high proportion of SRX population in their ensembles. However, the DRX rates reported for the WT and E525K 15HPZ proteins in the single ATP turnover assay are ~3-4 fold lower than those of their S1 counterparts. These rates reflect basal turnover of ATP in the open state and thus should not be affected by the presence of the S2 tail, which leads to concerns about the 15HPZ protein activity. In addition, the very high percentage of stuck filaments in the in vitro motility assay for the 15HPZ constructs (despite the use of dark actin) suggests that a significant amount of the protein may be enzymatically inactive.

      We thank the reviewer for pointing out the differences in the S1 and HMM DRX rates. We performed additional single turnover measurements with S1, adding two sets of measurements from one additional preparation (N=3), and we demonstrate that there is a significant increase in the DRX rates of WT S1 compared to WT HMM (see pages 4-5, Table 3, Figure 3- figure supplement 3). A faster rate in S1 was also reported in Rohde et al. 2018. Indeed, the DRX rates of E525K S1 are significantly higher than those of WT S1, which we also now report in the results (see page 5, Figure 3 – figure supplement 3). We addressed the concerns about 15HPZ activity by performing NH4+ ATPase assays to demonstrate that the number of active heads was similar in S1 and 15HPZ HMM (see page 4). It is possible that the higher percentage of stuck filaments in the HMM motility is due to myosin heads in the IHM state on the motility surface, which generate a drag force by non-specifically interacting with actin, but further study is necessary to examine this question.

      The authors assert that the E525K mutation represents a new mechanism by which DCM-causing mutations lead to decreased contractility - by stabilizing the sequestered state rather than affecting motor function. However, there is no evaluation of the motor function (actin-activated ATPase activity or in vitro motility) of the E525K S1, which would reveal the effects of the mutation without confounding effects due to the sequestering of heads. Interestingly, in the single ATP turnover assay, the DRX rate of the E525K S1 is >2-fold higher than the WT control, suggesting that the mutation may have effects beyond stabilization of the SRX state. The conclusion that the E525K mutation's effect on myosin function is mediated via stabilization of the SRX state would be strengthened if the effects of the mutation on the motor domain alone were also known.

      We thank the reviewer for this suggestion. We performed actin-activated ATPase assays with WT and E525K S1 and found that E525K increases kcat and lowers KATPase, demonstrating enhanced intrinsic motor activity in the mutant S1 construct (see page 4, Figure 2B). This adds an interesting dimension to the manuscript because we report a mutant that enhances the intrinsic motor activity but stabilizes the SRX/IHM (see Discussion page 10). We did not perform in vitro motility, because this assay depends on the surface attachment strategy, and we would like to compare all constructs with the same attachment strategy using a C-terminal GFP tag (mutant and WT S1 and 15HPZ HMM). Therefore, we are making the S1 construct with a C-terminal GFP tag for this purpose, to be examined in a future study.

      While the authors show strong qualitative correlations between the SRX and IHM states using single ATP turnover, FRET, and EM experiments, attempts to quantitatively compare the fraction of heads in the IHM state using the various experimental approaches are problematic. For example, the R0 value of the FRET pair used here does not allow the distances probed here to be measured precisely, yet the distances are reported and compared to predicted distances. The authors report that the R0 for their FRET pair is 63 Å. Surprisingly, the authors go on to use the steady-state FRET efficiency values to determine the average D-A distance (Fig 5B), which is 100 Å when all heads are in the IHM configuration and becomes larger than that when heads open. An R0 of 63 Å allows a precise distance measurement to be made in the 31.5-94.5 Å range, which corresponds to 0.5-1.5 R0. It is therefore technically incorrect to use the steady-state FRET efficiency values to determine the D-A distance here. Besides, there are several unknown factors here, like the orientation factor (κ2), which further complicate these calculations. Similarly, the quantification of IHM state molecules from the negative stain EM experiments is significantly hampered by the disruptive effect of the grid surface on the structure of the IHM state. The authors find that limiting the contact time with the grid to ~ 5s is necessary to preserve the IHM state.

      Despite that, only ~15% WT molecules were seen in the IHM state at low salt (Fig. 6B). In contrast, ~56% E525K molecules were in the IHM state. Both these proteins have similar SRX proportions (Fig. 3C) and similar FRET efficiency values (Fig. 5A) at this salt concentration. This mismatch highlights the problem arising due to not having a measure of the populations from the FRET data. It is not clear if the hugely different proportions of the IHM state in EM experiments are indicative of the relative stability of this state in the two proteins or a random difference in the electrostatic interactions of WT vs mutant with the grid. These experiments do not provide a correct idea of the %IHM in the two proteins. In the absence of any IHM population measurement, it is important to proceed with caution when quantitatively correlating the SRX and IHM.

      We thank the reviewer for pointing out that measuring precise distances by FRET can be difficult. We agree that the low FRET efficiency makes precise distance determination even more challenging. However, FRET is quite good at measuring a change in distance given a specific donor-acceptor pair. We feel our FRET biosensor clearly demonstrates that FRET efficiency is salt-insensitive in E525K but decreases clearly at higher salt concentrations in WT. In order to compare the trend in the predicted FRET, based on the single turnover measurements, with the actual FRET, we thought it was important to plot the two together on the same graph. We understand that this could have misleadingly suggested that we were reporting actual distances. We have now plotted the FRET efficiency instead of distance as a function of KCl concentration (Figure 5B), to prevent any confusion with reporting distances. In addition, we have emphasized that the data are plotted to allow for a comparison of the trend in the single turnover and FRET data (see pages 6, 10, Figure 5B).

      We agree that it is important to proceed with caution when comparing the EM to the FRET and single turnover data. The EM does not give a quantitative estimate of the fraction of IHM molecules, due to the disruptive effect of the grid surface on protein conformation. However, it does provide direct (though qualitative) evidence that the conformation underlying SRX and enhanced FRET is the IHM, and it is consistent with our interpretation that the E525K mutation enhances FRET and SRX by stabilizing the IHM. To consolidate this result, we have now performed EM experiments with a total of 3 preparations of WT and mutant (see pages 6-7 and Figure 6D). We find that while there is variability from experiment to experiment, likely because the grid surface is slightly different each time the experiment is performed, in all cases there was a ~4-fold higher fraction of folded molecules in the mutant. Since each WT/mutant experimental pair was studied in parallel, using identically prepared grids, the results provide further evidence that the mutant stabilizes the IHM. However, we agree that a quantitative, direct visual correlation of the SRX and IHM is not possible based on the current EM data.
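      The reviewer's R0 argument can be illustrated with the standard Förster relation, E = 1/(1 + (r/R0)^6). The minimal sketch below uses the R0 = 63 Å quoted above (R0 itself depends on the assumed orientation factor κ2) and prints both the efficiency and the magnitude of its slope |dE/dr| at a few distances. With these numbers, E at 100 Å is only about 0.06, and |dE/dr| there is roughly seven-fold smaller than at R0, so a small error in measured efficiency translates into a comparatively large uncertainty in the inferred distance. The function names are hypothetical and the sketch is not the authors' analysis code.

```python
def fret_efficiency(r, r0):
    """Förster transfer efficiency E = 1 / (1 + (r / R0)^6) for one donor-acceptor distance."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e, r0):
    """Invert the Förster relation: r = R0 * (1/E - 1)^(1/6), valid for 0 < E < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


def efficiency_slope(r, r0):
    """|dE/dr| = (6 / r) * x / (1 + x)^2 with x = (r / R0)^6: how strongly E responds to distance."""
    x = (r / r0) ** 6
    return (6.0 / r) * x / (1.0 + x) ** 2


R0 = 63.0  # Angstrom, Förster radius quoted in the review (depends on the assumed kappa^2)

for r in (31.5, 63.0, 94.5, 100.0):
    print(f"r = {r:5.1f} A   E = {fret_efficiency(r, R0):.3f}   "
          f"|dE/dr| = {efficiency_slope(r, R0):.4f} per A")
```

      Sensitivity peaks close to R0 and falls off on either side, which is why relative changes in E (as plotted in the revised Figure 5B) are more robust than absolute distances derived from a low efficiency.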

      Finally, the utility of the methods described in the paper to the field would be greatly enhanced if they were described in more detail. As currently written, it would be difficult for others to replicate these experiments.

      Thank you for the comment. We have made significant changes in the methods to clarify the details of the experiments (see pages 11-14). In addition, we have added details to the results and figure legends.

    1. Author Response

      Reviewer #1 (Public Review):

      “This study investigates the dynamics of brain network connectivity during sustained experimental pain in healthy human participants. To this end, capsaicin was applied to the tongues of two cohorts of participants (discovery cohort, N=48; replication cohort, N=74). This procedure resulted in pain for several minutes. During sustained pain, pain avoidance/intensity ratings and fMRI scans were obtained. The analyses (i) compare the pain state with a resting state, (ii) assess the dynamics of brain networks during sustained pain, and (iii) aim to predict pain based on the dynamics of brain networks. To this end, the analyses focus on community structures of time-evolving networks. The results show that sustained pain is associated with the emergence of a brain network including somatomotor, frontoparietal, basal ganglia and thalamic brain areas. The somatomotor area of the tongue is particularly involved in that network while this area is decoupled from other parts of the somatomotor cortex. Moreover, the network configuration changes over time with the frontoparietal network decoupling from the somatomotor network. Frontoparietal-cerebellar connections were predictive of decreases of pain. Together, the findings provide novel and convincing insights into the dynamics of brain networks during sustained pain.

      Strengths

      • The brain mechanisms of sustained pain are a timely and relevant topic with potential clinical implications.

      • Assessing the dynamics of sustained pain and relating it to the dynamics of brain networks is a timely and promising approach to further the understanding of the brain mechanisms of pain.

      • The study includes discovery and replication cohorts and pursues a cutting-edge analysis strategy.

      • The manuscript is very well-written and the results are visualized in an exemplary manner including a graphical outline and summary of the findings.”

      We thank the reviewer for the thoughtful summarization and evaluation of our study.

      “Weaknesses

      • It remains unclear whether the changes of brain networks over time simply reflect the duration of sustained pain or whether they essentially reflect different levels of pain intensity/avoidance.”

      We appreciate the editor’s and reviewer’s comments on this issue. With the current experimental paradigm, it is difficult to dissociate the pain duration from the level of pain because oral capsaicin commonly induces an initial burst of pain followed by a gradual decrease over time. That is, pain duration is correlated with pain intensity in our task.

      However, when we examined the time course of the ratings for each individual (as shown in Figure S2), time duration explained on average 53.7% of the rating variance, R2 = 0.537 ± 0.315 (mean ± standard deviation). In addition, if we constrain the beta coefficient of time duration to be negative (i.e., ratings should decrease over time), the explained variance decreases to 48.2%, R2 = 0.482 ± 0.457, leaving enough unexplained variance (i.e., greater than 50%) for examining the distinct effects of time duration and ratings on the patterns of functional brain reorganization.

      Indeed, the two main analyses included in the manuscript—consensus community detection and predictive modeling—were designed to examine those two aspects of the task, i.e., time duration and pain avoidance ratings, respectively. First, through the consensus community detection analysis, we examined the community structure that changes over time, i.e., across the early, middle, and late periods (as shown in Figure 3). We then developed predictive models of pain avoidance ratings in the second main analysis (as shown in Figure 5).

      Though it remains a caveat that we cannot fully dissociate the effects of time duration versus pain ratings, the first set of results can be interpreted as mainly reflecting time duration, and the second set as mainly reflecting pain ratings.

      We have now added a description of how predictive modeling helps to isolate the effects of pain ratings, as well as a discussion of the caveat of the current experimental design and relevant future directions.

      Revisions to the main manuscript:

      p. 25: Moreover, developing models to directly predict the pain ratings is helpful to complement the group-level analysis, because the changes in consensus community structure over the early, middle, and late periods only indirectly reflect the different levels of pain.

      p. 27: This study also had some limitations. First, with the current experimental paradigm, it is difficult to dissociate the pain duration from the level of pain because oral capsaicin commonly induces an initial burst of pain followed by a gradual decrease over time. Though we aimed to model the effects of pain duration and pain avoidance ratings with our two primary analyses, i.e., consensus community detection and predictive modeling, we cannot fully dissociate the impact of time duration versus pain ratings.
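      The per-participant variance-explained figures reported in this response can be sketched as an ordinary least-squares regression of ratings on time, with an optional constraint that the slope be non-positive. The response does not describe the fitting procedure, so this is an assumed illustration: for a single predictor with an intercept, the constrained optimum is a slope of zero whenever the unconstrained slope is positive, which gives R2 = 0 for that participant and lowers the group mean. The function name and the two rating series are hypothetical.

```python
from statistics import mean


def r_squared_vs_time(t, y, slope_nonpositive=False):
    """R^2 of the simple linear regression y ~ a + b * t (ordinary least squares).

    If slope_nonpositive is True, b is constrained to b <= 0. With one predictor
    and an intercept, the constrained optimum is b = 0 whenever the OLS slope is
    positive, and the fit then explains none of the variance (R^2 = 0).
    """
    tm, ym = mean(t), mean(y)
    sxx = sum((ti - tm) ** 2 for ti in t)
    sxy = sum((ti - tm) * (yi - ym) for ti, yi in zip(t, y))
    syy = sum((yi - ym) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both time and ratings")
    if slope_nonpositive and sxy / sxx > 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Hypothetical ratings for two participants over four time bins
t = [0, 1, 2, 3]
decreasing = [8, 6, 5, 1]   # pain fades over time
increasing = [2, 3, 3, 5]   # atypical participant whose ratings rise

free = [r_squared_vs_time(t, y) for y in (decreasing, increasing)]
constrained = [r_squared_vs_time(t, y, slope_nonpositive=True) for y in (decreasing, increasing)]
print(mean(free), mean(constrained))
```

      Under this assumption, 1 − R2 is the share of each participant's rating variance that time alone leaves unexplained, which is the quantity the response uses to argue there is room to separate duration from rating effects.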

      “• Although the manuscript is very well-written it might benefit from an even clearer and simpler explanation of what the consensus community structure and the underlying module allegiance measure assess.”

      Thank you for the suggestion. We have now added brief, simple descriptions of the module allegiance and consensus community detection methods.

      Revisions to the main manuscript:

      pp. 8-9: Here, the consensus community refers to the group-level representative structure that summarizes the distinct community partitions of individuals. To determine the consensus community across different individuals and times, we first obtained the module allegiance (Bassett et al., 2011) from the community assignment of each individual. Module allegiance assesses how likely a pair of nodes is to be affiliated with the same community label, and is defined as a matrix T whose element Tij is 1 when nodes i and j are assigned to the same community and 0 when assigned to different communities. This conversion of the categorical community assignments to the continuous module allegiance values allows group-level summarization of different community structures of individuals.

      p. 14: Here, high module allegiance indicates that the voxels of two regions are likely to share the same community affiliation, and low module allegiance indicates the opposite.
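      The module allegiance definition quoted above can be sketched in a few lines. This minimal example assumes that the group-level summary is the element-wise average of the per-partition Tij matrices (a common way to use allegiance, assumed here rather than taken from the manuscript); it does not show the subsequent consensus clustering step. The function name and the toy partitions are hypothetical. Because only co-assignment matters, community labels do not need to be matched across individuals.

```python
from typing import List, Sequence


def allegiance_matrix(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Average module allegiance across partitions.

    Each partition maps node index -> community label. For one partition,
    T[i][j] = 1 if nodes i and j share a label and 0 otherwise; averaging over
    partitions (e.g., individuals or time windows) gives values in [0, 1].
    """
    if not partitions:
        raise ValueError("need at least one partition")
    n = len(partitions[0])
    total = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must cover the same nodes")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    total[i][j] += 1.0
    k = len(partitions)
    return [[value / k for value in row] for row in total]


# Hypothetical community assignments of 4 nodes in 3 individuals;
# label values are arbitrary and need not correspond across individuals.
parts = [[0, 0, 1, 1], [0, 0, 0, 1], [2, 2, 1, 1]]
T = allegiance_matrix(parts)
for row in T:
    print([round(v, 2) for v in row])
```

      Nodes 0 and 1 share a community in every partition (allegiance 1.0), whereas nodes 0 and 2 do so in only one of three (about 0.33), illustrating how categorical assignments become continuous values that can be summarized at the group level.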

      “• The added value of the assessment of the dynamics of brain networks remains unclear. Specifically, it is unclear whether the current analysis of brain networks dynamics allows for a clearer distinction between and prediction of pain and no-pain states than other measures of static or dynamic brain activity or static measures of brain connectivity.”

      The main goal (and thus, the added value) of the current study was to provide a “mechanistic” understanding of the brain processes of sustained pain, rather than “prediction.” Although we included predictive modeling results (Figures 4-6), our focus was on interpreting the models to quantitatively examine functional changes in the brain, not on maximizing prediction performance.

      Indeed, maximizing prediction performance was the main goal of our previous study (Lee et al., 2021), in which we developed a predictive model of sustained pain based on patterns of dynamic functional connectivity. That model showed better prediction performance than the models in the current study, but it was challenging to interpret because of the high dimensionality of the model and its features. In addition, functional connectivity itself provides only limited insight into how functional brain networks are structured and reconfigured over time.

      In this sense, the multi-layer community detection method has several advantages for achieving our goal. First, community detection analysis allows us to summarize complex, high-dimensional whole-brain connectivity patterns into neurobiologically interpretable subsystems. Second, the multi-layer community detection method allows us to study temporal changes in community structure by connecting the same nodes across different time points.
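      To make the second point concrete, the sketch below shows one common way to connect the same nodes across time points: stacking the per-window connectivity matrices into a supra-adjacency matrix in which each node is coupled to its own copy in adjacent windows. This is an illustrative assumption rather than a description of our exact implementation; the function name and the coupling weight omega are hypothetical, and the community detection algorithm that would then be applied to this matrix (e.g., a multilayer modularity optimizer) is not shown.

```python
import numpy as np


def supra_adjacency(layers, omega=1.0):
    """Build a multilayer (supra-)adjacency matrix from time-ordered layers.

    layers: sequence of T symmetric (N x N) connectivity matrices, one per window.
    omega: inter-layer coupling weight (hypothetical default; a free parameter).
    Each node is linked to its own copy in the temporally adjacent layers.
    """
    layers = [np.asarray(A, dtype=float) for A in layers]
    T = len(layers)
    N = layers[0].shape[0]
    S = np.zeros((N * T, N * T))
    for t, A in enumerate(layers):
        # Diagonal blocks: within-window connectivity.
        S[t * N:(t + 1) * N, t * N:(t + 1) * N] = A
    coupling = omega * np.eye(N)
    for t in range(T - 1):
        # Off-diagonal blocks: node i in window t <-> node i in window t + 1.
        S[t * N:(t + 1) * N, (t + 1) * N:(t + 2) * N] = coupling
        S[(t + 1) * N:(t + 2) * N, t * N:(t + 1) * N] = coupling
    return S


# Toy example: three time windows of a two-node network.
layer = np.array([[0.0, 0.8], [0.8, 0.0]])
S = supra_adjacency([layer, layer, layer], omega=0.5)
print(S.shape)  # (6, 6)
```

      Coupling only adjacent windows (rather than all pairs of windows) reflects the temporal ordering of the layers; a larger omega favors community labels that persist across time.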

      We have now added a description of the rationale for choosing multi-layer community detection over conventional functional connectivity methods, and of the added value of our study.

      Revisions to the main manuscript:

      p. 3: In this study, we examined the reconfiguration of whole-brain functional networks underlying the natural fluctuation in sustained pain to provide a mechanistic understanding of the brain responses to sustained pain.

      p. 7: In this study, we used this approach to examine the temporal changes of brain network structures during sustained pain, which cannot be done with conventional functional connectivity-based analyses (Lee et al., 2021).

      p. 27: However, the previous model provides a limited level of mechanistic understanding because of the high dimensionality of the model and its features. In addition, functional connectivity itself provides only limited insight into how functional brain networks are structured and reconfigured over time.

      Reviewer #2 (Public Review):

      “The Authors J-J Lee et al., investigated cortical and subcortical brain networks and their organization in communities over time during evoked tonic pain. The paper is well-written, and the findings are interesting and relevant for the field. Interestingly, other than confirming well known phenomena (e.g., segregation within the primary somatomotor cortex) the Authors identified an emerging "pain supersystem" during the initial increase of pain, in which subcortical and frontoparietal regions, usually more segregated, showed more interactions with the primary somatomotor cortex. Decrease of pain was instead associated to a reconfiguration of the networks that sees subcortical and frontoparietal regions connected with areas of the cerebellum. The main novelty of the proposed analysis, lies in the resulting high performances of the classifier, that shows how this interesting link between frontoparietal network and subcortical regions with the cerebellum, is predictive of pain decrease. In summary, the main strengths of the present manuscript are:

      • Inclusion of subcortical regions: most of the recent papers using the Schaefer parcellation in ~200 brain areas[1], do not consider subcortical areas, ignoring possible relevant responses and behaviors of those regions. Not only the Authors smartly addressed this issue, but most of their results showed how subcortical regions played a key role in the networks reconfiguration over time during evoked sustained pain.

      • Robust classification results: high accuracy obtained on training dataset (internal validation), using a leave-one-out approach, and on the available independent test dataset (external validation) of relatively large sample size (N=74).

      • Clarity in the description of aim and sub-aims and exhaustive presentation of the obtained results helped by appropriate illustrations and figures (I suggest less wording in some of them).

      • Availability of continuous behavioral outcome (track ball).”

      We appreciate the reviewer’s summary and positive evaluations.

      “Even though the results are mostly cohesive with previous literature, some of the results need to be discussed in relationship to recently published papers on the same topic as well as justifying some of the non-standard methodological procedures adding appropriate citations (or more detailed comments). The Authors do not touch upon the concept of temporal summation of pain, historically associated with tonic pain, especially when the study is finalized to better understanding brain mechanisms in chronic pain populations (chronic pain patients often exhibit increased temporal summation of pain[2]). I would suggest starting from the paper recently published by Cheng et al. that also shares most of the methodological pipeline[3] to highlight similarities and novelties and deepen the comparison with the associated literature.”

      We thank the reviewer and editor for the comment on this important topic. Temporal summation of pain refers to a progressively increasing sensation of pain during prolonged noxious stimulation (Price, Hu, Dubner, & Gracely, 1977) and has been suggested as a hallmark of chronic pain disorders, including fibromyalgia (Cheng et al., 2022; Price et al., 2002). In a recent study by Cheng et al. (2022), the authors induced tonic pain using constant high cuff pressure and examined whether participants experienced more pain in the late period than in the early period. In contrast, in our experimental paradigm, the capsaicin liquid delivered into the oral cavity is gradually washed out by saliva, and thus overall pain intensity decreased over time rather than increasing (Figure 1B). Therefore, temporal summation of pain may occur only in a limited period (e.g., the early period of the run), and it is difficult to examine its effect systematically in our study.

      However, it is notable that Cheng et al.’s results overlap with our findings. For example, Cheng et al. reported the intra-network segregation within the somatomotor network and the inter-network integration between the somatomotor and other networks during the temporal summation of pressure pain in patients with fibromyalgia, which were similar to the findings we reported in Figure S9 and Figure 4. Although it is unclear whether these results reflect the temporal summation of pain, these network-level features shared across the two studies are likely to be an essential component of the sustained pain processes in the brain.

      We have now added a comment on the temporal summation of pain to the main manuscript.

      Revisions to the main manuscript (p. 26):

      Interestingly, a recent fMRI study on the temporal summation of pain in fibromyalgia patients reported results similar to ours (Cheng et al., 2022), including the intra-network dissociation within the somatomotor network and the inter-network integration between the somatomotor and other networks during pain. Although we cannot directly examine whether the temporal summation of pain gave rise to these network-level changes due to the limitation of our experimental paradigm, these consistent findings between the two studies may suggest that our findings could be generalized to clinical conditions.

      We thank the reviewer and editor for pointing us to this recent publication. Cheng et al. (2022) had not been published when we wrote the manuscript, and we were surprised to find that it shares many aspects with our study; e.g., both studies used multilayer community detection and reported similar findings, as described above.

      However, there were some differences between the two studies as well.

      First, the focus of our study was on brain dynamics during the natural time-course of sustained pain, from its initiation to remission, in healthy participants, whereas Cheng et al. focused on the temporal summation of pain (TSP) and the enhanced TSP in patients with fibromyalgia. Because of this difference in research focus, our study and Cheng et al. provide many non-overlapping results and insights. For example, our study paid particular attention to the coping mechanisms of the brain (e.g., the network-level changes in the subcortical and frontoparietal network regions) and the brain systems correlated with the natural decrease of pain (e.g., the cerebellum in Figure 5). In contrast, Cheng et al. (2022) identified the brain connectivity and network features important for the increased TSP in fibromyalgia patients.

      Second, our main interest was in identifying and visualizing fine-grained spatiotemporal patterns of functional brain network changes over the period of sustained pain. To utilize fine-grained brain activity information, we conducted our main analyses at voxel-level resolution and in native brain space (e.g., Figures 2-3 and Figures S5, S7, and S8). This fine-grained spatiotemporal mapping allowed us to identify small but important voxel-level dynamics.

      We have now cited Cheng et al. (2022) in multiple places and revised the manuscript accordingly.

      Revisions to the main manuscript (p. 26):

      Interestingly, a recent fMRI study on the temporal summation of pain in fibromyalgia patients reported results similar to ours (Cheng et al., 2022), including the intra-network dissociation within the somatomotor network and the inter-network integration between the somatomotor and other networks during pain. Although we cannot directly examine whether the temporal summation of pain gave rise to these network-level changes due to the limitation of our experimental paradigm, these consistent findings between the two studies may suggest that our findings could be generalized to clinical conditions.

      “Here the main significant weaknesses of the study:

      • The data analysis is entirely conducted on young healthy subjects. This is not a limitation per se, but the conclusion about offering new insights into understanding mechanisms at the basis of chronic pain is too far from the results. Centralization of pain is very different from summation and habituation, especially if all the subjects in the study consistently rated increased and decreased pain in the same way (it never happens in chronic pain patients). A similar pipeline has been actually applied to chronic pain patients (fibromyalgia and chronic back pain)[3,4]. Discussing the results of the present paper in relationship to those, could offer a more robust way to connect the Authors' results to networks behavior in pathological brains.”

      We are grateful for the opportunity to discuss the clinical implication of our study. First of all, we agree with the reviewer and editor that we cannot make a definitive claim about chronic pain with the current study, and thus, we revised the last sentence of the abstract to tone down our claim.

      Revisions to the main manuscript (p. 2, in the abstract):

      This study provides new insights into how multiple brain systems dynamically interact to construct and modulate pain experience, advancing our mechanistic understanding of sustained pain.

      However, as we noted above in E-4, some of our findings were consistent with the findings from a previous clinical study (Cheng et al., 2022), suggesting the potential to generalize our study to clinical pain conditions. In addition, we previously reported that a predictive model of sustained pain derived from healthy participants performed better at predicting the pain severity of chronic pain patients than the model derived directly from chronic pain patients (Lee et al., 2021), highlighting the advantage of the “component process approach.”

      The component process approach aims to develop brain-based biomarkers for basic component processes first, which can then serve as intermediate features for the modeling of multiple clinical conditions (Woo, Chang, Lindquist, & Wager, 2017). This has been one of the core ideas of the Research Domain Criteria (RDoC) (Insel et al., 2010) and the Hierarchical Taxonomy of Psychopathology (HiTOP) (Kotov et al., 2017). If the clinical pain of a patient group is modeled as a whole, it becomes unclear what is being modeled because of the multidimensional and heterogeneous nature of clinical pain (Melzack, 1999) as well as other co-occurring health conditions (e.g., mental health issues, medication use, etc.). The component process approach, in contrast, can specify which components are being modeled and are relatively free from heterogeneity and comorbidity issues by experimentally manipulating the specific component of interest in healthy participants.

      Following the component process approach, the current study was conducted in healthy young adults. We used oral capsaicin to experimentally induce sustained pain, which unfolds over protracted periods of time and has been suggested to reflect some of the essential features of clinical pain (Rainville, Feine, Bushnell, & Duncan, 1992; Stohler & Kowalski, 1999). Therefore, a detailed characterization of the brain processes of sustained pain could serve as an intermediate feature for modeling multiple clinical conditions in future studies.

      We have now added a discussion of the clinical generalizability issue to the Discussion section.

      Revisions to the main manuscript:

      pp. 25-26: An interesting future direction would be to examine whether the current results can be generalized to clinical pain. Experimental tonic pain has been known to share similar characteristics with clinical pain (Rainville et al., 1992; Stohler & Kowalski, 1999). In addition, in a recent study, we showed that an fMRI connectivity-based signature for capsaicin-induced orofacial tonic pain can be generalized to chronic back pain (Lee et al., 2021). Therefore, a detailed characterization of the brain responses to sustained pain has the potential to provide useful information about clinical pain.

      p. 26: Interestingly, a recent fMRI study on the temporal summation of pain in fibromyalgia patients reported results similar to ours (Cheng et al., 2022), including the intra-network dissociation within the somatomotor network and the inter-network integration between the somatomotor and other networks during pain. Although we cannot directly examine whether the temporal summation of pain gave rise to these network-level changes due to the limitation of our experimental paradigm, these consistent findings between the two studies may suggest that our findings could be generalized to clinical conditions.

      “Vice versa, the behavioral measure used to assess evoked pain perception (avoidance ratings), has been developed for chronic pain patients and never validated on healthy controls[5]. It might not be an appropriate measure considering the total absence of pain variability in the reported responses over forty-eight subjects[6,7].”

      We acknowledge that pain avoidance measures are not fully validated in the healthy population. Nevertheless, we used this measure in this study for the following two main reasons that outweigh the limitations.

      First, a pain avoidance rating provides an integrative measure that can reflect the multi-dimensional aspects of sustained pain. One of the essential functions of pain is to promote survival by avoiding harmful situations, and the avoidance motivation induced by pain comprises not only sensory-discriminative but also cognitive components, including learning, valuation, and context (Melzack, 1999). According to the fear-avoidance model (Vlaeyen & Linton, 2012), if pain-induced avoidance motivation is not resolved for a long time and becomes maladaptively associated with innocuous environments, chronic pain is likely to develop, suggesting the importance and clinical relevance of pain avoidance measures. In addition, our experimental design is particularly suitable for the use of avoidance ratings because oral capsaicin stimulation is accompanied by an urge to avoid the painful sensation that, similar to chronic pain, cannot be resolved immediately. Moreover, capsaicin is sometimes experienced as intense but less aversive (or even appetitive), e.g., by people who crave spicy food (Stevenson & Yeomans, 1993). In such cases, avoidance ratings can provide a more reasonable measure of pain than intensity ratings.

      Second, the avoidance measure provides a common scale on which we can compare different types of aversive experiences, allowing us to conduct specificity tests for a predictive model of pain. For example, a recent study successfully compared the brain representations of two types of pain and two types of aversive, but non-painful experiences (e.g., aversive auditory and visual experiences) using the same avoidance measure (Ceko, Kragel, Woo, Lopez-Sola, & Wager, 2022). These comparisons were possible because the avoidance measure provided one common scale for all the aversive experiences regardless of their types of stimuli.

      To provide a better justification for the use of the avoidance measure, we now included the specificity test results of our pain predictive models. More specifically, we tested our module allegiance-based SVM and PCR models of pain on the aversive taste and aversive odor conditions (Figure S13).

      Despite these advantages, the use of avoidance rating without thorough validation is a limitation of the current study, and thus future studies need to examine the psychometric properties of the avoidance rating, e.g., examining the relationship among pain intensity, unpleasantness, and avoidance measures. However, the current study showed that the predictive models derived with pain avoidance rating (Study 1) could be used to predict the pain intensity rating (Study 2). In addition, the overall time-course of pain avoidance ratings in Study 1 was similar to the time-course of pain intensity ratings in Study 2, providing some supporting evidence for the convergent validity of the pain avoidance measure.

      As to the following comment, “It might not be an appropriate measure considering the total absence of pain variability in the reported responses over forty-eight subjects,” several pieces of evidence suggest that the low between-individual variability of ratings is due to the characteristics of our experimental design rather than to our use of the avoidance measure. As we discussed in more detail in our response to E-1, our experimental procedure based on capsaicin liquid commonly induces an initial burst of painful sensation followed by gradual relief in most participants (Figure 1B, left). A similar time-course of ratings was observed in Study 2 (Figure 1B, right), which used the pain “intensity” rating, not the pain avoidance rating. In addition, previous studies with a similar experimental design (i.e., intra-oral capsaicin application) (Berry & Simons, 2020; Lu, Baad-Hansen, List, Zhang, & Svensson, 2013; Ngom, Dubray, Woda, & Dallel, 2001) also showed a similar time-course of pain ratings with low between-individual variability regardless of the rating type (e.g., VAS or irritation intensity), confirming that this observation is not unique to the pain avoidance rating.

      We have now added descriptions of the small between-individual variability of pain ratings and of the use of avoidance ratings.

      Revisions to the main manuscript:

      pp. 5-7: Note that the overall trend of pain ratings over time was similar across participants because of the characteristics of our experimental design, as has also been observed in previous studies that used oral capsaicin (Berry & Simons, 2020; Lu et al., 2013; Ngom et al., 2001). However, individual participants' time-courses of pain ratings were not identical (Figures S2 and S3).

      p. 26: However, there are also differences between the characteristics of capsaicin-induced tonic pain versus clinical pain. For example, clinical pain continuously fluctuates over time in an idiosyncratic pattern (Apkarian, Krauss, Fredrickson, & Szeverenyi, 2001), whereas capsaicin-induced tonic pain showed a similar time-course pattern across the participants—i.e., increasing rapidly and then decreasing gradually (Figure 1B). This typical time-course of pain ratings has been reported in previous studies that used oral capsaicin (Berry & Simons, 2020; Lu et al., 2013; Ngom et al., 2001).

      pp. 26-27: Note that Study 1 used a pain avoidance measure that is not yet fully validated in healthy participants. However, we chose this measure because it can provide integrative information on the multi-dimensional aspects of pain (Melzack, 1999; Waddell, Newton, Henderson, Somerville, & Main, 1993). It also has clinical implications, considering that maladaptive associations of pain avoidance with innocuous environments have been suggested as a putative mechanism of the transition to chronic pain (Vlaeyen & Linton, 2012). Lastly, the avoidance measure can provide a common scale across different modalities of aversive experience, allowing us to compare their distinct brain representations (Ceko et al., 2022) or test the specificity of their predictive models (Lee et al., 2021) (Figure S13). Although the psychometric properties of the pain avoidance measure should be a topic of future investigation, we expect that the pain avoidance measure would have a high level of convergent validity with pain intensity, given the observed similarity between the temporal profiles of pain avoidance (Study 1) and pain intensity (Study 2). The generalizability of our PCR model across Studies 1 and 2 also supports this speculation. However, there may also be situations in which pain avoidance is dissociated from pain intensity. For example, capsaicin can be experienced as intense but less aversive, or even appetitive, in some contexts, such as cravings for spicy food (Stevenson & Yeomans, 1993). In addition, the gradual rise of avoidance ratings during the late period of the control condition in Study 1 would not have been observed had the intensity measure been used. Future studies need to examine the relationship between pain avoidance and other pain assessments, as well as the advantages of using the pain avoidance measure.

      “• The dynamic measure employed by the Authors is better described from the term "windowed functional connectivity". It is often considered a measure of dynamic functional connectivity and it gives information about fluctuations of the connectivity patterns over time. Nevertheless, the entire focus of the paper, including the title, is on dynamic networks, which inaccurately leads one to think of time-varying measures with higher temporal resolution (either updating for every acquired time point, as the Authors did in their previous publication on the same dataset[4], or sliding windows involving weighting or tapering[8,9]). This allows one to follow network reorganization over time without averaging 2-min intervals in which several different brain mechanisms might play an important role[3,10,11]. In summary, the assumption of constant response throughout 2-min periods of tonic pain and the use of Pearson correlations do not mirror the idea of dynamic analysis expressed by the Authors in title and introduction. I would suggest removing "dynamic" from the title, reduce the emphasis on this concept, address possible confounds introduced by the choice of long windows and rephrase the aim of the study in terms of brain network reconfiguration over the main phases of tonic pain experience.”

      We have now removed the word ‘dynamic’ from many places in the manuscript, including the title. In addition, we added a brief discussion of why we chose long, non-overlapping windows for the connectivity calculation.

      Revisions to the main manuscript (p. 8):

      Although the long duration of the time window without overlaps may obscure the fine-grained temporal dynamics in functional connectivity patterns, we chose to use this long time window based on previous literature (Bassett et al., 2011; Robinson, Atlas, & Wager, 2015), which also used long time windows to obtain more reliable estimates of network structures and their transitions.
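      For readers less familiar with this windowing scheme, a minimal sketch of connectivity estimation over long, non-overlapping windows is given below. It assumes a (time points × nodes) array and Pearson correlation within each window; the function name and the window length in samples (e.g., 2 min divided by the repetition time, which is not stated here) are hypothetical placeholders, and preprocessing steps are omitted.

```python
import numpy as np


def windowed_connectivity(ts, window_len):
    """Pearson correlation matrices over non-overlapping time windows.

    ts: array of shape (n_timepoints, n_nodes), e.g., preprocessed BOLD signals.
    window_len: number of samples per window (window duration / TR).
    Samples left over after the last full window are dropped.
    """
    ts = np.asarray(ts, dtype=float)
    n_windows = ts.shape[0] // window_len
    return [
        # Transpose so that rows are nodes, as np.corrcoef expects.
        np.corrcoef(ts[w * window_len:(w + 1) * window_len].T)
        for w in range(n_windows)
    ]


rng = np.random.default_rng(0)
ts = rng.standard_normal((250, 5))  # hypothetical: 250 volumes, 5 nodes
mats = windowed_connectivity(ts, window_len=60)
print(len(mats), mats[0].shape)  # 4 (5, 5)
```

      Longer windows contain more samples per correlation estimate, which is the reliability argument above; the trade-off is that changes faster than one window are averaged out.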

      “• Procedure chosen for evoking sustained pain. To the best of my knowledge, capsaicin sauce on the tongue is not a validated tonic pain procedure. In favor of this argument is the absence of inter-subject variability in the behavioral results showed in the paper, very unusual for response to painful stimulations. The procedure is well described by the Authors, and some precautions like letting the liquid dry before the start of the scan, have helped reducing confounds. Despite this, the measures in figure 1B suggest that the intensity of the painful stimulation is not constant as expected for sustained pain (probably the effect washes out with the saliva). In this case, the first six-minute interval requires particular attention because it encapsulates the real tonic pain phase, and the following ones require more appropriate labels. Ideally the Author should cite previous studies showing that tongue evoked pain elicits a very specific behavioral response (summation, habituation/decrease of pain, absence of pain perception). If those works are missing, this response need to be treated as a finding rather than an obvious point.”

      We have addressed this comment. Moreover, we found previous studies that experimentally induced tonic pain by applying capsaicin to the tongue (Berry & Simons, 2020; Boudreau, Wang, Svensson, Sessle, & Arendt-Nielsen, 2009; Green, 1991; Ngom et al., 2001), suggesting that our experimental procedure is in line with previous literature.

      Reviewer #3 (Public Review):

      “In their manuscript, Lee and colleagues explore the dynamics of the functional community structure of the brain (as measured with fMRI) during sustained experimental pain and provide several potentially highly valuable insights into, and evaluate the predictive capacity of, the underlying dynamic processes. The applied methodology is novel but, at the same time, straightforward and has solid foundations. The findings are very interesting and, potentially, of high scientific impact as they may significantly push the boundaries of our understanding of the dynamic neural processes during sustained pain, with a (somewhat limited) potential for clinical translation.

      However (Major Issue 1), after reading the current manuscript version, not all of my doubts have been dissolved regarding the specificity of the results to pain. Moreover (Major Issue 2), some of the results (specifically, those related to the group level analysis of community differences) do not seem to be underpinned with a proper statistical inference in the current version of the manuscript and, therefore, their presentation and discussion may not be proportional to the degree of evidence. Next to these Major Issues (detailed below), some other, minor clarifications might also be needed before publication. These are detailed below or in the private part of the review ("Recommendations for the authors").

      Despite these issues, this is, in general, a high quality work with a high level of novelty and - after addressing the issues - it has a very high potential for becoming an important contribution (and a very interesting read) to the pain-research community and beyond.”

      We appreciate the reviewer’s thoughtful comments. We have revised the manuscript to address the Reviewer’s major concerns, as described below.

      “Major Issue 1:

      The main issue with the manuscript is that it remains somewhat unclear, how specific the results are to pain.

      Differences between the control resting state and the capsaicin trials might be - at least partially - driven by other factors, like:

      • motion artifacts

      • saliency, attention, anxiety, etc.

      Differences between stages over the time-course might, additionally, be driven by scanner drifts (to which the applied approach might be less sensitive, but the possibility is still there ) or other gradual processes, e.g. shifts in arousal, attention shifts, alertness, etc.

      All the above factors might emerge as confounding bias in both of the predictive models.

      This problem should be thoroughly discussed, and at least the following extra analyses are recommended, in order to attenuate concerns related to the overall specificity and neurobiological validity of the results:

      • reporting of, and testing for motion estimates (mean, max, median framewise displacement or anything similar)

      • examining whether these factors might, at least partially, drive the predictive models.

      • e.g. applying the PCR model on the resting state data and verifying if the predicted time-course is flat (no inverse U-shape, that is characteristic to all capsaicin trials).

      Not using the additional sessions (bitter taste, aversive odor, phasic heat) feels like a missed opportunity, as they could also be very helpful in addressing this issue.”

      We thank the reviewer for this comment on the important issue of the specificity of our results and the potential influence of noise. The effects of head motion and physiological confounds are particularly relevant to pain studies because pain involves substantial physiological changes and often causes head motion. To address the related concerns about specificity, we conducted additional analyses assessing the independence of our predictive models (i.e., the SVM and PCR models) from head movement and physiological variables, and the specificity of our models to pain versus non-painful aversive conditions (i.e., bitter taste and aversive odor) in Study 1.

      First, we examined the overall changes in framewise displacement (FD) (Power, Barnes, Snyder, Schlaggar, & Petersen, 2012), heart rate (HR), and respiratory rate (RR) in the capsaicin condition (Figure S11). In the univariate comparison between the capsaicin and control conditions (Figure S11A), the capsaicin condition, as expected, caused significant changes in head motion and autonomic responses: the mean FD and HR were significantly higher, and the RR tended to be lower, in the capsaicin condition than in the control condition (FD: t(47) = 5.30, P = 2.98 × 10⁻⁶; HR: t(43) = 4.98, P = 1.10 × 10⁻⁵; RR: t(43) = -1.91, P = 0.063, paired t-test). In addition, the increased motion and autonomic responses were more prominent in the early period of pain (Figure S11B). Across 10 time-bins (2 min per bin), FD and HR showed a decreasing trend and RR an increasing trend over time in the capsaicin condition. Comparisons between the early (bins 1-3, 0-6 min) and late (bins 8-10, 14-20 min) periods of the capsaicin condition showed significant differences for FD and HR but not for RR (FD: t(47) = 6.45, P = 8.12 × 10⁻⁸; HR: t(43) = 6.52, P = 6.41 × 10⁻⁸; RR: t(43) = -1.61, P = 0.11, paired t-test). These results suggest that while participants were experiencing capsaicin-induced tonic pain, particularly during the early period, head motion and heart rate increased and breathing tended to slow. Note that we had to exclude 4 participants' data from this analysis due to technical issues with physiological data acquisition.
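      The FD measure referenced above (Power et al., 2012) sums the absolute volume-to-volume changes of the six rigid-body realignment parameters, converting rotations to millimeters of displacement on a sphere of 50 mm radius. A minimal sketch follows; the function name is hypothetical, and the parameter column order (translations first, then rotations in radians) is an assumption that differs between realignment tools.

```python
import numpy as np


def framewise_displacement(motion, radius=50.0):
    """Framewise displacement in the style of Power et al. (2012).

    motion: (n_volumes, 6) realignment parameters, ASSUMED ordered as three
    translations (mm) followed by three rotations (radians); check the order
    your realignment software actually writes.
    radius: sphere radius (mm) used to convert rotations to arc length.
    Returns one FD value per volume, with 0 for the first volume.
    """
    motion = np.asarray(motion, dtype=float)
    delta = np.abs(np.diff(motion, axis=0))  # backward differences between volumes
    delta[:, 3:] *= radius  # radians -> mm of displacement on the sphere surface
    return np.concatenate([[0.0], delta.sum(axis=1)])


motion = [
    [0, 0, 0, 0.00, 0, 0],
    [1, 0, 0, 0.00, 0, 0],  # 1 mm translation
    [1, 0, 0, 0.01, 0, 0],  # 0.01 rad rotation -> 0.5 mm
]
fd = framewise_displacement(motion)
print(fd)  # approximately [0, 1, 0.5]
print(fd.mean())  # a per-run summary such as the mean FD compared across conditions
```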

      Next, we examined whether the changes in head motion and physiological responses influenced the performance of our predictive models (Figure S12). For the SVM model, we regressed the mean FD, HR, and RR (concatenated across conditions and participants, as in SVM training) out of the model predictions obtained with leave-one-subject-out cross-validation (2 conditions × 44 participants = 88 values) and then recalculated the classification accuracy (Figure S12A). The SVM model showed reduced but still significant classification accuracy for the capsaicin versus control conditions in a forced-choice test (n = 44, accuracy = 89%, P = 1.41 × 10⁻⁷, binomial test, two-tailed). We performed the same analysis for the PCR model (10 time-bins × 44 participants = 440 values), which also showed significant prediction performance (n = 44, mean prediction-outcome correlation r = 0.20, P = 0.003, bootstrap test, two-tailed; mean squared error = 0.159 ± 0.022 [mean ± s.e.m.]) (Figure S12B). These results suggest that our SVM and PCR models capture unique variance in tonic pain above and beyond head movement and physiological changes.
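      The confound-control step for the SVM model can be sketched as ordinary least-squares residualization followed by a forced-choice comparison and an exact binomial test. This is a hedged sketch: the intercept term, the row layout (capsaicin rows first, control rows second, same participant order), and the reading of "forced-choice" as "capsaicin response higher than the same participant's control response" are our assumptions for illustration, and all function names are hypothetical.

```python
import numpy as np
from math import comb

def regress_out(y, confounds):
    """Residuals of y after OLS regression on the confound columns plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(confounds, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def forced_choice_accuracy(pain, control):
    """Fraction of participants whose pain response exceeds their control response."""
    return float(np.mean(np.asarray(pain) > np.asarray(control)))

def binom_test_two_tailed(k, n):
    """Exact two-tailed binomial test of k successes in n trials against p = 0.5."""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical layout: rows 0..43 capsaicin, rows 44..87 control;
# confound columns are FD, HR, RR. Synthetic data only.
rng = np.random.default_rng(1)
n_sub = 44
confounds = rng.normal(size=(2 * n_sub, 3))
scores = rng.normal(size=2 * n_sub) + np.r_[np.ones(n_sub), np.zeros(n_sub)]
resid = regress_out(scores, confounds)
k = int(np.sum(resid[:n_sub] > resid[n_sub:]))
print(forced_choice_accuracy(resid[:n_sub], resid[n_sub:]), binom_test_two_tailed(k, n_sub))
```

      As a consistency check, 39 correct choices out of 44 (88.6%, i.e. the reported 89%) gives `binom_test_two_tailed(39, 44)` ≈ 1.41 × 10⁻⁷, the P-value reported above, which is consistent with (though does not prove) an exact test of this form.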

      Lastly, we examined the specificity of our predictive models to pain by testing them on the non-painful but aversive conditions, namely the bitter taste (induced by quinine) and aversive odor (induced by fermented skate) conditions (Figure S13). All model responses were obtained using leave-one-participant-out cross-validation. The SVM model's responses to the bitter taste and aversive odor conditions were higher than those to the control condition but lower than those to the capsaicin condition (Figure S13A). Classification accuracies for capsaicin vs. bitter taste and capsaicin vs. aversive odor were both significant (capsaicin vs. bitter taste: accuracy = 79%, P = 6.17 × 10⁻⁵, binomial test, two-tailed, Figure S13C; capsaicin vs. aversive odor: accuracy = 83%, P = 3.31 × 10⁻⁶, binomial test, two-tailed, Figure S13E), supporting the specificity of our SVM model to pain. Similarly, the PCR model's responses to the bitter taste and aversive odor conditions were lower than those to the capsaicin condition, and their temporal trajectories were less steep and fluctuated less than that of the capsaicin condition (Figure S13B). The time course of the model responses in the control condition was flatter than in all other conditions and did not show the inverted U-shape. Furthermore, the model responses in the bitter taste and aversive odor conditions were not significantly correlated with the actual avoidance ratings (bitter taste: mean prediction-outcome correlation r = 0.05, P = 0.41, bootstrap test, two-tailed, mean squared error = 0.036 ± 0.006 [mean ± s.e.m.], Figure S13D; aversive odor: mean prediction-outcome correlation r = 0.12, P = 0.06, bootstrap test, two-tailed, mean squared error = 0.044 ± 0.004 [mean ± s.e.m.], Figure S13F), suggesting the specificity of the PCR model to pain.
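      The prediction-outcome statistics reported for the PCR model (a within-individual correlation averaged across participants, tested with a two-tailed bootstrap) could be computed along the following lines. This is a hedged sketch: the bootstrap P-value below (resampling participants and counting bootstrap means on the far side of zero) is one common construction that we assume for illustration, not necessarily the exact implementation used; array shapes (participants × 10 time-bins) and function names are hypothetical.

```python
import numpy as np

def within_subject_r(pred, actual):
    """Pearson r between predicted and actual ratings, one value per participant.

    pred, actual: arrays of shape (n_participants, n_time_bins)."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.array([np.corrcoef(p, a)[0, 1] for p, a in zip(pred, actual)])

def bootstrap_mean_p(values, n_boot=10000, seed=0):
    """Two-tailed bootstrap P-value for the mean of `values` differing from 0.

    Participants are resampled with replacement; the P-value is twice the
    smaller fraction of bootstrap means at or beyond zero on either side
    (with a +1 correction so it is never exactly 0)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot = values[idx].mean(axis=1)
    le = (np.sum(boot <= 0) + 1) / (n_boot + 1)
    ge = (np.sum(boot >= 0) + 1) / (n_boot + 1)
    return min(1.0, 2 * min(le, ge))
```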

      Overall, we have provided evidence that our models can predict pain ratings above and beyond the head motion and physiological changes and that the models are more responsive to pain compared to non-painful aversive conditions.

      We have now added descriptions of the specificity tests to the main manuscript and to the Supplementary Information.

      Revisions to the main manuscript (p. 20):

      Specificity of the module allegiance-based predictive models. To examine whether the predictive models were specific to pain and whether their prediction performance was influenced by confounding variables such as head motion and physiological changes, we conducted additional analyses (Figures S11-13). The SVM and PCR models showed significant prediction performance even after controlling for head motion (i.e., framewise displacement) and physiological responses (i.e., heart rate and respiratory rate) (Figures S11 and S12), and they responded less strongly to the non-painful but aversive conditions, including the bitter taste and aversive odor conditions, than to pain (Figure S13), supporting the specificity of our predictive models to pain. For details, please see the Supplementary Results.

      Revisions to the Supplementary Information (pp. 2-4):

      Specificity analysis (Figures S11-13). To examine whether the predictive models (i.e., the SVM and PCR models) were specific to pain and not influenced by confounding noise, we conducted additional specificity analyses assessing the independence of the models from head movement and physiological variables and the specificity of the models to pain versus non-painful aversive conditions (i.e., bitter taste and aversive odor) in Study 1. First, we examined the overall changes in framewise displacement (FD) (Power et al., 2012), heart rate (HR), and respiratory rate (RR) during sustained pain (Figure S11). In the univariate comparison between the capsaicin and control conditions (Figure S11A), the capsaicin condition, as expected, increased head motion and heart rate: the mean FD and HR were significantly higher in the capsaicin condition than in the control condition, whereas the lower RR did not reach significance (FD: t(47) = 5.30, P = 2.98 × 10⁻⁶; HR: t(43) = 4.98, P = 1.10 × 10⁻⁵; RR: t(43) = -1.91, P = 0.063, paired t-test). Regarding the temporal changes in movement and physiological variables (Figure S11B), the increases in motion and autonomic responses were more prominent in the early period of pain. Divided into 10 time-bins (2 min per bin), FD and HR showed a decreasing trend, while RR showed an increasing trend over time in the capsaicin condition. Additional univariate comparisons between the early (bins 1-3, 0-6 min) and late (bins 8-10, 14-20 min) periods of the capsaicin condition showed significant differences for FD and HR but not for RR (FD: t(47) = 6.45, P = 8.12 × 10⁻⁸; HR: t(43) = 6.52, P = 6.41 × 10⁻⁸; RR: t(43) = -1.61, P = 0.11, paired t-test). This suggests that while participants were experiencing tonic pain, particularly in the early period, motion and heart rate increased and breathing tended to slow. Note that we had to exclude 4 participants' data from the physiological analyses due to technical issues with physiological data acquisition.

      Next, we examined whether head movement and physiological responses were the main drivers of our predictive models (Figure S12). From the original signature responses of the SVM model (2 conditions × 44 participants = 88), we regressed out the mean FD, HR, and RR (concatenated across conditions and participants, as in SVM training) and recalculated the classification accuracy (Figure S12A). Even after the signature responses were controlled for movement and physiological variables, the SVM model still showed high classification accuracy for the capsaicin versus control conditions in a forced-choice test (n = 44, accuracy = 89%, P = 1.41 × 10⁻⁷, binomial test, two-tailed). Similarly, from the original signature responses of the PCR model (10 time-bins × 44 participants = 440), we regressed out the 10-binned FD, HR, and RR (concatenated across time-bins and participants, as in PCR training) and calculated the within-individual prediction-outcome correlation (Figure S12B). Again, the PCR model showed significant predictive performance (n = 44, mean prediction-outcome correlation r = 0.20, P = 0.003, bootstrap test, two-tailed; mean squared error = 0.159 ± 0.022 [mean ± s.e.m.]) while controlling for movement and physiological variables. These results suggest that our SVM and PCR models capture unique variance in tonic pain above and beyond head movement and physiological changes.

      Lastly, we examined the specificity of our predictive models to pain by testing them on non-painful but tonic aversive conditions, including bitter taste (induced by quinine) and aversive odor (induced by fermented skate) (Figure S13). All signature responses were obtained using leave-one-participant-out cross-validation. The overall signature responses of the SVM model for the bitter taste and aversive odor conditions were higher than those for the control condition but lower than those for the capsaicin condition (Figure S13A). Classification accuracies for capsaicin vs. bitter taste and capsaicin vs. aversive odor were both significant (capsaicin vs. bitter taste: accuracy = 79%, P = 6.17 × 10⁻⁵, binomial test, two-tailed, Figure S13C; capsaicin vs. aversive odor: accuracy = 83%, P = 3.31 × 10⁻⁶, binomial test, two-tailed, Figure S13E), suggesting the specificity of the SVM model to pain. Similarly, the temporal trajectories of the PCR model's signature responses for the bitter taste and aversive odor conditions did not overlap with that of the capsaicin condition (Figure S13B). Furthermore, the signature responses for the bitter taste and aversive odor conditions showed no significant relationship with the actual avoidance ratings (bitter taste: mean prediction-outcome correlation r = 0.05, P = 0.41, bootstrap test, two-tailed, mean squared error = 0.036 ± 0.006 [mean ± s.e.m.], Figure S13D; aversive odor: mean prediction-outcome correlation r = 0.12, P = 0.06, bootstrap test, two-tailed, mean squared error = 0.044 ± 0.004 [mean ± s.e.m.], Figure S13F), suggesting the specificity of the PCR model to pain. Overall, we have provided evidence that the module allegiance-based models can predict pain ratings above and beyond movement and physiological changes and are more responsive to pain than to non-painful aversive conditions, which suggests the specificity of our results to pain.

      “Major Issue 2:

      Another important issue with the manuscript is the (apparent) lack of statistical inference when analyzing the differences in the group-level consensus community structures (both when comparing capsaicin to control and when analysing changes over the time-course of the capsaicin-challenge).

      Although I agree that the observed changes seem biologically plausible and fit very well to previous results, without proper statistical inference we can't determine, how likely such differences are to emerge just by chance.

      This makes all results on Figs. 2 and 3, and points 1, 4 and 5 in the discussion partially or fully speculative or weakly underpinned, comprising a large proportion of the current version of the manuscript.

      Let me note, that this issue only affects part of the results and the remaining - more solid - results may already provide a substantial scientific contribution (which might already be sufficient to be eligible for publication in eLife, in my opinion).

      Therefore I see two main ways of handling Major Issue 2:

      • enhancing (or clarifying potential misunderstandings regarding) the methodology (see my concrete, and hopefully feasible, suggestions in the "private part" of the review),

      • de-weighting the presentation and the discussion of the related results.

      I believe there are many ways to test the significance of these differences. I highlight two possible, permutation testing-based ideas.

      • Idea 1: permuting the ctr-capsaicin (or early-mid-late) labels, repeating the analysis, constructing the proper null distribution of, e.g., the community size changes, and obtaining the p-values.

      • Idea 2: "trace back" communities to the individual level and do (nonparametric) statistical inference there.”

      We appreciate this important comment. We did not conduct statistical inference when comparing the group-level consensus community affiliations of the different conditions (Figure 2) or different phases (Figure 3) because of the difficulty in matching the community affiliation values of the networks to be compared.

      For example, suppose that 800 of the 1,000 voxels of community #1 and 1,000 of the 4,000 voxels of community #2 in the control condition are jointly assigned to community #3 in the capsaicin condition. To compare community affiliations between the two conditions, we must first match the community label of the capsaicin condition (i.e., #3) to one of the control condition (i.e., #1 or #2), and here a dilemma arises: if we prioritize the proportion of overlapping voxels (80% vs. 25%), the common community should be labeled #1, whereas if we prioritize the number of overlapping voxels (800 vs. 1,000), it should be labeled #2. Although both choices look reasonable, neither is a perfect solution.
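      The dilemma in this example can be made concrete in a few lines. The two criteria and the function name are illustrative only, not part of our analysis pipeline, and the numbers are those of the hypothetical example.

```python
def match_label(overlap, sizes, criterion):
    """Pick the control community that a capsaicin community should inherit.

    overlap[c]: voxels shared between control community c and the capsaicin community
    sizes[c]:   total number of voxels in control community c
    criterion:  "proportion" (overlap / size) or "count" (raw overlap)"""
    if criterion == "proportion":
        score = {c: overlap[c] / sizes[c] for c in overlap}
    elif criterion == "count":
        score = dict(overlap)
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return max(score, key=score.get)

overlap = {1: 800, 2: 1000}  # voxels of control communities #1, #2 that fall in capsaicin #3
sizes = {1: 1000, 2: 4000}
print(match_label(overlap, sizes, "proportion"))  # 1  (80% vs. 25%)
print(match_label(overlap, sizes, "count"))       # 2  (800 vs. 1,000 voxels)
```

      The two equally defensible rules disagree, which is why any label-matching step would inject an arbitrary choice into the statistics.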

      As the example above illustrates, it is impossible to match community affiliations exactly across different networks. Any matching procedure requires an imperfect criterion, and the choice of criterion directly affects the comparison of network structure. This was the main reason that we limited our results in Figures 2-3 to a qualitative description based on visual inspection. Moreover, the group-level consensus community structures in Figures 2-3 are not a simple group statistic like a sample mean; they were obtained through multiple analysis steps, including permutation-based thresholding and unsupervised clustering, which could further complicate the interpretation of statistical tests.

      Alternatively, a slightly different but more rigorous approach to comparing community structures is the Phi-test (Alexander-Bloch et al., 2012; Lerman-Sinkoff & Barch, 2016). Instead of using the community labels directly, this method converts the community label of each voxel into a list of module allegiance values between the seed voxel and all voxels in the brain (i.e., 1 if the seed and target voxels have the same community label and 0 otherwise). This allows quantitative comparison of voxel-level community profiles between conditions without arbitrary matching of community labels. We adopted this Phi-test to examine whether the regional community affiliation pattern differs significantly between (i) the capsaicin vs. control conditions and (ii) the early vs. late periods of pain (Figure S6), which correspond to the main findings of Figures 2 and 3 in our manuscript, respectively.

      More specifically, to compare the group-level consensus community structures between the capsaicin vs. control conditions and between the early vs. late periods, we first obtained a seed-based module allegiance map for each voxel (i.e., using each voxel as a seed). Then, for each voxel, we calculated the correlation coefficient between the module allegiance values of the two conditions. This correlation coefficient serves as an estimate of the voxel-level similarity of the consensus community profile. Because module allegiance is a binary variable, these correlation values are Phi coefficients. A small Phi coefficient means that the spatial pattern of brain regions sharing the given voxel's community affiliation differs between the two conditions. For example, if a voxel belongs to the somatomotor-dominant community during the capsaicin condition and to the default-mode-dominant community during the control condition, the brain regions sharing its community label will be very different, and thus the Phi coefficient will be small. Moreover, the Phi coefficient can be small even if a voxel is assigned the same (matched) community label in both conditions, when the spatial pattern of that community differs between conditions.
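      The seed-based allegiance vectors and Phi coefficients described above can be sketched as follows, assuming each consensus partition is stored as one integer community label per voxel. The function names are hypothetical. The property the sketch illustrates is that Phi compares *which voxels share a community with the seed*, so the arbitrary label identities never need to be matched across conditions.

```python
import numpy as np

def allegiance(labels, seed):
    """Binary module-allegiance vector: 1 where a voxel shares the seed's community."""
    labels = np.asarray(labels)
    return (labels == labels[seed]).astype(float)

def seed_phi(labels_a, labels_b, seed):
    """Phi coefficient (Pearson r of two binary vectors) between the seed's
    allegiance vectors in partitions A and B. Returns nan when either vector
    is constant, e.g. when the seed's community spans every voxel."""
    a = allegiance(labels_a, seed)
    b = allegiance(labels_b, seed)
    if a.std() == 0 or b.std() == 0:
        return float("nan")
    return float(np.corrcoef(a, b)[0, 1])

def phi_map(labels_a, labels_b):
    """Phi coefficient for every voxel used as a seed (O(n_voxels^2) overall)."""
    return np.array([seed_phi(labels_a, labels_b, s) for s in range(len(labels_a))])

# Label identities need not match: partition B is A with every label renamed.
ctrl = [0, 0, 1, 1, 2, 2]
caps = [7, 7, 3, 3, 5, 5]
print(phi_map(ctrl, caps))  # every entry is ~1: identical partitions despite different labels
```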

      To calculate the statistical significance of the Phi coefficient, we conducted permutation tests, in which we randomly shuffled the condition labels in each participant and obtained the group-level consensus community structure for each shuffled condition. Then, we calculated the voxel-level correlations of the module allegiance values between the two shuffled conditions. We repeated this procedure 1,000 times to generate the null distribution of the Phi coefficients, and calculated the proportion of null samples that have a smaller Phi coefficient (i.e., a more dis-similar regional community structure) than the non-shuffled original data.
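      A minimal sketch of the within-participant label shuffling and the one-tailed P-value described above. The consensus-community step (permutation-based thresholding and clustering) is not reproduced here; it is abstracted as a user-supplied `stat_fn` that, in our analysis, would build the two group-level consensus partitions and return the voxelwise Phi coefficients. The toy statistic in the usage example is a hypothetical stand-in chosen only so the sketch runs, and all names are illustrative.

```python
import numpy as np

def within_subject_permutation_p(cond_a, cond_b, stat_fn, n_perm=1000, seed=0):
    """One-tailed permutation P-values per voxel.

    cond_a, cond_b: lists with one data array per participant for each condition.
    stat_fn(group_a, group_b) -> per-voxel statistic; smaller = more dissimilar.
    In each permutation, every participant's two condition labels are swapped
    with probability 0.5 and the statistic is recomputed. The P-value is the
    proportion of null statistics smaller than the observed one."""
    rng = np.random.default_rng(seed)
    observed = np.asarray(stat_fn(cond_a, cond_b), dtype=float)
    n_smaller = np.zeros_like(observed)
    for _ in range(n_perm):
        swap = rng.random(len(cond_a)) < 0.5
        perm_a = [b if s else a for a, b, s in zip(cond_a, cond_b, swap)]
        perm_b = [a if s else b for a, b, s in zip(cond_a, cond_b, swap)]
        null = np.asarray(stat_fn(perm_a, perm_b), dtype=float)
        n_smaller += null < observed
    return n_smaller / n_perm

def toy_stat(group_a, group_b):
    # Hypothetical stand-in for "consensus partitions, then Phi per voxel":
    # negative absolute difference of group means (smaller = more dissimilar).
    return -np.abs(np.mean(group_a, axis=0) - np.mean(group_b, axis=0))

# Synthetic data: voxel 0 differs consistently between conditions, voxel 1 does not.
rng = np.random.default_rng(42)
n_sub = 20
cond_a = [np.array([1.0, rng.normal()]) for _ in range(n_sub)]
cond_b = [np.array([0.0, rng.normal()]) for _ in range(n_sub)]
p = within_subject_permutation_p(cond_a, cond_b, toy_stat, n_perm=1000)
print(p)  # voxel 0 gets P = 0 here: no shuffle can produce a larger difference
```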

      Results showed that there are multiple voxels with statistical significance (permutation tests with 1,000 iterations, one-tailed) in the area where the community affiliations of the two contrasting conditions were different (Figure S6). For example, the frontoparietal and subcortical regions for the capsaicin vs. control (c.f., Figure 2), and the frontoparietal, subcortical, brainstem, and cerebellar regions for the early vs. late period of pain (c.f., Figure 3) contain voxels that survived after thresholding with FDR-corrected q < 0.05, suggesting the robustness of our main results.
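      The response above states only that voxels were thresholded at FDR-corrected q < 0.05. Assuming the widely used Benjamini-Hochberg procedure (an assumption for illustration, not stated above), the thresholding of the voxelwise permutation P-values can be sketched as follows; the function name is hypothetical.

```python
import numpy as np

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg: boolean mask of P-values declared significant at FDR q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # k/m * q for ranks k = 1..m
    passed = p[order] <= thresholds
    mask = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank meeting its threshold
        mask[order[: k + 1]] = True        # reject that P-value and all smaller ones
    return mask

print(fdr_bh([0.01, 0.02, 0.04, 0.5]))  # the first two P-values survive
```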

      In particular, voxels in the somatomotor and insular cortices showed statistical significance in the permutation test, which may reflect large changes, across conditions, in other areas that connect to the somatomotor and insular cortices. Unexpectedly, statistical significance was also observed in the visual cortex. We interpret this as follows: the spatial distribution of the visual network community is highly stable across conditions, so the permutation-based null distribution of Phi coefficients is very narrow, and even a small change in community structure can reach statistical significance.

      We have now added descriptions of the permutation tests.

      Revisions to the main manuscript:

      p. 9: Permutation tests confirmed that the community assignment in the frontoparietal and subcortical regions showed significant changes between the capsaicin versus control conditions (Figure S6A).

      p. 13: Permutation tests further confirmed that the community assignment in the frontoparietal, subcortical, and brainstem regions showed significant changes between the early versus late period of pain (Figure S6B).

      pp. 36-37: Permutation tests for regional differences in community structures. To test the statistical significance of the voxel-level differences in consensus community structures (Figures 2 and 3), we performed the following Phi-test (Alexander-Bloch et al., 2012; Lerman-Sinkoff & Barch, 2016). First, for each voxel, we compared its community label with the community labels of all voxels, generating a list of seed-based module allegiance values that allows quantitative comparison of voxel-level community profiles (e.g., [1, 0, 1, 1, 0, 0, ...], whose elements equal 1 if the seed and target voxels were assigned to the same community and 0 otherwise). Next, we calculated the correlation coefficient between the module allegiance values of the two brain community structures being compared (i.e., capsaicin versus control, and early versus late). This correlation coefficient estimates the regional similarity of community profiles (here, it is a Phi coefficient because module allegiance is a binary variable). To estimate the statistical significance of the Phi coefficient, we performed permutation tests, in which we randomly shuffled the condition labels within each participant and then obtained the group-level consensus community structures from the shuffled data. Then, we calculated the Phi coefficient between the module allegiance values of the two shuffled consensus community structures. We repeated this procedure 1,000 times to generate the null distribution of the Phi coefficient for each voxel. Lastly, we computed the probability of observing a smaller Phi coefficient (i.e., a more dissimilar community profile) than the one from the non-shuffled original data, which corresponds to the P-value of the permutation test. All P-values were one-tailed because the hypothesis of this permutation test is unidirectional.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript by Chen et al., the authors use live-cell single-molecule imaging to dissect the role of DNA binding domains (DBD) and activation domains (AD) in transcription factor mobility in the nucleus. They focus on the family of Hypoxia-Inducible Factor isoforms, which dimerize and bind chromatin to induce a transcriptional response. The main finding is that activation domains can be involved in DNA binding, as indicated by careful observations of the diffusion/reaction kinetics of transcription factors in the nucleus. For example, different bound fractions of HIF-1beta and HIF-2alpha are observed in the presence of different binding partners and chimeras. The paradigm of interchangeable parts of transcription factors has been eroded over the years (the recent work of Naama Barkai comes to mind, cited herein), so the present observations are not unexpected per se. Yet, the measurements are rigorous and well-performed and have the important benefit of being in living cells. Enthusiasm is also dampened by the exclusive use of one technique and one analysis to reach conclusions.

      In the revised manuscript we complement the single molecule imaging experiments with genomic approaches, including Cut&Run and RNA-seq, that largely confirm our main conclusions derived from the SPT results. 

      Reviewer #2 (Public Review):

      The authors raise the very important question how different transcription factors with similar in vitro DNA sequence specificity are able to achieve distinct binding profiles associated with distinct functions. They use hypoxia inducible factors (HIF) as model system and combine live cell single-particle tracking with comprehensive genetic and chemical perturbations to study the mechanisms underlying isoform-specific gene regulation. Their main experimental readout is the distribution of diffusion coefficients of a molecular species, extracted from a population of single-particle trajectories. From this distribution, the authors extract the fractions of immobile and mobile molecules as well as the peak diffusion coefficient of the mobile fraction. They find that in addition to the structured DNA binding domain and the dimerization interface of HIF-1a and HIF-2a, the C-terminus of those factors, which includes intrinsically disordered regions and an activation domain, contributes to modulating the bound fraction of HIF-1b and the HIF-a isoforms. In particular, the C-terminus of HIF-2a mediates a higher bound fraction than the one of HIF-1a. This finding is important as it demonstrates that separating HIF into distinct domains that each have clearly defined functions is an oversimplification. Rather, a more holistic view seems suitable, in which all parts of HIF contribute to nuclear diffusion and binding.

      The conclusions drawn on the bound fractions and the nuclear dynamics of HIF isoforms are mostly backed up by data and proper controls. However, some controls are missing and some aspects of data analysis need to be clarified and extended. Moreover, the authors fail to answer their initial question, as the experimental readout does not contain information on the DNA sequences involved in the binding events.

      Experimental controls:

      For some imaging experiments, the authors use cell lines where endogenous HIF-1b or HIF-2a was fused to a N-terminal HaloTag by CRISPR/Cas editing. These cell lines are comprehensively controlled for proper functionality of the edited transcription factors, including expression levels, cellular localization and DNA binding. However, differential expression compared to unedited levels is not quantified and only Halo-HIF-2a is tested for functional gene transcription.

      To confirm that the tagged proteins still maintain normal function in driving target gene expression, we performed RNA-seq on WT, HaloTag-HIF-2α KIN, and Halo-HIF-1β KIN cells, and show that gene expression in these edited cells does not differ significantly from that in unedited WT cells (Figure 1—figure supplement 3B, C).

      Other experiments include overexpression of exogenously expressed factors. For those, the authors give statements such as "expressed from a relatively strong ... promoter" and "weakly expressed", but do not provide any control of the amount of overexpression. Quantifying the expression levels will be important, as some of the author's experiments demonstrate a strong dependency of results on expression level. 

      We have now included Western Blot results showing L30-driven expression of all HIF variants in comparison with KIN levels (Fig 4—Figure Supplement 1). However, we note that cells stably expressing the HIF variants are polyclonal and Western Blotting is a bulk assay only able to assess the population average. As such, Western blot analysis may not reflect the actual expression level in the individual cells used in the imaging experiments. To properly control HIF expression at the individual cell level, we instead monitored the protein concentration in each cell and only chose to image cells with similar fluorescence level, as measured by localization density (Fig 4—Figure Supplement 1 and see detailed discussion in Appendix 2).

      Moreover, the authors do not provide any control for proper functionality of domain-swap mutants.

      We now include RNA-seq results demonstrating that WT cells over-expressing HIF-α WT and domain-swap variants (Halo-HIF-1α, Halo-HIF-1α/2α, Halo-HIF-2α, Halo-HIF-2α/1α) can activate their specific target genes, confirming that all these variants are transcriptionally active (see Figure 6A, B and Figure 6—figure supplement 2: increased binding of wild-type or domain-swapped HIF at several gene loci or neighboring regions coincides with increased transcription of these genes; and Figure 7: cells expressing HIF variants with the same HIF IDR co-cluster in their mRNA transcription profiles).

      The authors further state that they use a high illumination power of 1100 mW. Such high laser power might be detrimental to cells and the authors should control whether this laser power induces any artifacts.

      We agree that high illumination power (indispensable for achieving a high signal-to-noise ratio and detecting single molecules) may be detrimental to cells in the long run. However, we took only one movie of < 2,000 frames per cell. With a 5-ms frame interval, the total imaging duration per cell was under 10 seconds, and cells are unlikely to respond to any stimulus or damage in such a short time. Moreover, we used stroboscopic rather than continuous illumination, with only 1 ms of laser exposure per 5-ms frame, so the total integrated laser exposure is at most 2 seconds. In addition, all imaging was done with a red laser (633 nm), which has relatively low phototoxicity. Finally, 1100 mW is the output of the laser box; the actual laser power density used for imaging was measured to be approximately 2.3 kW/cm² at 633 nm (Graham et al., 2021). Such an imaging scheme is very unlikely to generate phototoxicity artifacts within the short time window of our measurements. Furthermore, because all conditions were imaged with exactly the same set-up, any artifact would be controlled for across conditions. We treat fast SPT as a terminal, end-point experiment in which each cell is imaged only once and never re-used.
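      The exposure arithmetic above, written out as upper bounds (taking the < 2,000-frame limit per movie and a 1-ms pulse in each 5-ms frame):

```latex
\begin{aligned}
t_{\text{movie}} &< 2000 \times 5\ \text{ms} = 10\ \text{s},\\
t_{\text{laser}} &< 2000 \times 1\ \text{ms} = 2\ \text{s},\\
\text{duty cycle} &= \frac{1\ \text{ms}}{5\ \text{ms}} = 20\%.
\end{aligned}
```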

      Data analysis:

      Distributions of diffusion coefficients greatly vary between individual cells (e.g. Fig. 2A and B, Fig. S3A and C, Fig. S4E). Unfortunately, the authors do not explain whether this variation is a real cell-to-cell variation, or rather reflects variation of their analysis method, potentially due to a low number of single particle tracks per cell. 

      We agree with the reviewer that the cell-to-cell variation we observed could be due to a low number of trajectories collected for each cell. In fact, sampling small numbers of trajectories allows us to identify protein species with unique diffusion coefficients, which might be lost if we just looked at a large population. Also, the fact that the diffusion coefficient distribution varies between cells does not mean that a particular cell only contains the more prevalent species that was detected. Here we are not trying to determine whether proteins in each cell indeed behave differently or whether the observed variation in the diffusion coefficient distribution is simply an effect of the limited trajectories collected in each cell. We instead analyzed data collected from many cells combined to get a better estimation of the population behavior. We have modified our text to make this important point clear to the readers. 

      Moreover, the bound fraction of HIF-1b differs between two independent measurements including three biological replicates each (Fig. 5 C and F). This raises the concern that not enough data enter each biological replicate, or not enough replicates are considered.

      Unfortunately, the number of cells that could be measured in our current setup is limited. It takes approximately 1 hour to collect 20 cells per sample, including staining, washing, looking for cells with desired expression level, and acquiring movies. For experiments with multiple conditions (>12), 20 cells per sample is the upper limit that can fit into a single day. 

      To determine the minimum number of cells and replicates needed, we now include the results of a bootstrapping analysis in Figure 2—figure supplement 3, together with a detailed discussion in Appendix 1. We treated data collected from a total of 243 cells of the same cell line, across more than 11 replicates, as the “population” and performed a bootstrapping analysis to identify the sources of variation. Our results showed that cell-to-cell variation contributes most to the total variation of the data, followed by day-to-day (replicate-to-replicate) variation. However, sampling more than 800 trajectories from more than 60 cells imaged in 3 replicates closely approximates the “population value” (the bound fraction calculated from all 243 cells). Accordingly, in each figure we always used more than 60 cells from 3 replicates to generate the reported parameters. Although this approach still yields somewhat different values from figure to figure, the variation for the same cell line is much smaller than the differences observed between cell lines or conditions.
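      The idea of asking how closely a subsample of cells approximates the population value can be sketched with a simple cell-level bootstrap. This is a simplified, hypothetical illustration rather than the analysis in Appendix 1: it treats trajectories with D ≤ 0.5 μm²/s as bound instead of fitting a model, resamples cells only (not replicates), and uses synthetic per-cell data.

```python
import numpy as np

def bound_fraction(d_values, threshold=0.5):
    """Fraction of trajectories with diffusion coefficient <= threshold (um^2/s)."""
    d = np.asarray(d_values, dtype=float)
    return float(np.mean(d <= threshold))

def bootstrap_spread(cells, n_cells, n_boot=1000, seed=0):
    """Resample n_cells cells with replacement from `cells` (a list of per-cell
    arrays of trajectory diffusion coefficients), pool their trajectories,
    and return the mean and SD of the pooled bound fraction."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        picks = rng.integers(0, len(cells), size=n_cells)
        pooled = np.concatenate([cells[j] for j in picks])
        estimates[i] = bound_fraction(pooled)
    return estimates.mean(), estimates.std()

# Synthetic "population" of 243 cells with cell-to-cell variation (hypothetical values).
rng = np.random.default_rng(1)
cells = []
for _ in range(243):
    f = rng.uniform(0.2, 0.5)        # hypothetical per-cell bound fraction
    n = rng.integers(5, 30)          # hypothetical trajectories per cell
    cells.append(np.where(rng.random(n) < f, 0.05, 3.0))
for n_cells in (10, 30, 60):
    print(n_cells, bootstrap_spread(cells, n_cells))  # SD shrinks as n_cells grows
```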

      The authors compare the bound fractions among various mutants and experimental conditions. However, the peak diffusion is not, or only descriptively, evaluated. Thus, it is not clear whether the main effect of a mutation or chemical treatment is to change the bound fraction, or rather the diffusion coefficient of the mobile fraction. 

      Because there might be multiple mobile populations (the mobile fraction being defined as molecules with a diffusion coefficient > 0.5 μm²/s), the mean diffusion coefficient can change while the mode (peak) diffusion coefficient stays the same, and vice versa. Because of this complexity in the mobile population, we prefer to describe trends qualitatively rather than report exact values. However, as requested, we have added the peak diffusion coefficient to the relevant figures as bar plots. We have also included in Table 1 a summary of the mean and mode diffusion coefficients estimated for moving molecules in all relevant figures for the reader's reference. Note that diffusion coefficients are estimated on a log-scale grid, so resolution decreases as the diffusion coefficient increases (e.g., 2.63 and 2.75 are adjacent grid points, as are 9.55 and 10).
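      The two points above (log-grid resolution and mean vs. mode) can be illustrated with a hypothetical grid. We assume, as the example pairs suggest (10^0.42 ≈ 2.63, 10^0.44 ≈ 2.75, 10^0.98 ≈ 9.55), a grid evenly spaced in log10(D) with a step of 0.02; this spacing and the 0.01-100 μm²/s range are our inferences, not stated above. On such a grid the absolute gap between neighbours grows from about 0.12 μm²/s near 2.7 to about 0.45 μm²/s near 10, and shifting weight within the mobile fraction can move the mean without moving the mode.

```python
import numpy as np

# Hypothetical grid: 201 points evenly spaced in log10(D) from 0.01 to 100 um^2/s
# (step 0.02 in log10 units, inferred from the example pairs; not stated in the source).
grid = 10 ** np.linspace(-2, 2, 201)

def grid_index(d):
    """Index of the grid point nearest to d, measured in log space."""
    return int(np.argmin(np.abs(np.log10(grid) - np.log10(d))))

def mobile_mean_and_mode(weights, threshold=0.5):
    """Weighted mean and mode of D over grid points above the mobile threshold."""
    w = np.asarray(weights, dtype=float)
    mobile = grid > threshold
    wm = w[mobile]
    mean = float(np.sum(grid[mobile] * wm) / np.sum(wm))
    mode = float(grid[mobile][np.argmax(wm)])
    return mean, mode

for lo, hi in [(2.63, 2.75), (9.55, 10.0)]:
    i, j = grid_index(lo), grid_index(hi)
    print(lo, hi, "steps apart:", j - i, "absolute gap:", round(grid[j] - grid[i], 3))
```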

      Conclusions:

      The authors provide data that highlight a potential role of the intrinsically disordered domain of HIF in modulating the bound fraction of these transcription factors. They further claim that the intrinsically disordered domains make a major contribution to this bound fraction. However, the authors do not quantify how this contribution relates to those of the DNA binding domain or the dimerisation interface. Changes in bound fraction estimated from the data in e.g. Fig. 3C, Fig. 4C, Fig. 5C and F rather hint at a dominant effect of dimerisation, followed by DNA binding and a smaller contribution of the intrinsically disordered domain. The authors should quantify the relative changes of the bound fraction for all mutants and experimental conditions, to clarify the importance of the contribution of the intrinsically disordered domain.

      It would be ideal if we could quantify what percentage of the bound fraction is contributed by the dimerization interface, the DBD and the IDR, respectively. However, these domains very likely do not act independently of each other in binding to chromatin fibers, and in practice it is very difficult to dissect and quantify their effects separately. For example, we did try to express HIF-1α and HIF-2α with their IDRs completely deleted; however, because the protein-degradation signals lie within the IDRs, these deletions caused massive stabilization of the proteins, making it impossible to find cells expressing these forms at levels similar to those of their full-length counterparts. As a result, although these IDR-deleted HIF-α proteins show greatly reduced binding, we did not include the results in the paper, because the loss of binding could also be due to the higher overall protein expression, which would produce a large unbound fraction. The DBD mutants each carry only a single mutation, so it is hard to tell whether the remaining binding in Figure 5B is due to residual binding affinity of HIF-α (HIF-α only partially lost its binding affinity) or to binding through its partner HIF-1β (HIF-α completely lost its binding affinity but can still bind through dimerization with HIF-1β). All we can safely conclude from Figure 5B is that the HIF-α DBD is required for optimal binding; we cannot determine exactly how much it contributes. We therefore argue that, given the interdependence of the different protein domains, the reviewer's request is not experimentally feasible.

      The authors state that the intrinsically disordered domains of HIF determine their differential binding specificity to chromatin. However, the experiments provided do not support such a conclusion. In particular, measuring changes in the bound fractions is not sufficient; such a conclusion requires a method that reports the DNA sequences involved in HIF binding, for example chromatin immunoprecipitation.

      As requested, we have included new Cut&Run and RNA-seq results in the revised manuscript showing HIF-α-IDR-specific binding and gene activation.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors report a public browser in which users can easily investigate associations between PGSs for a wide range of traits, and a large set of metabolites measured by the Nightingale platform in UKBB. This browser can potentially be used for identifying novel biomarkers for disease traits or, alternatively, for identifying novel causal pathways for traits of interest.

      Overall, I have no major technical concerns about the study, but I would encourage the authors to revisit whether they can find a more compelling example that better showcases the work that they have done. I understand that this is partly a resource paper, but I think the resource itself can have more impact if the paper provides a clearer use case for how it can drive novel biological insight.

      Many thanks for your comments. We have undertaken a new application of bi-directional Mendelian randomization to demonstrate how users may apply this approach to determine whether associations in our atlas are more likely to reflect causes or consequences of PGS traits/diseases. This example is described on page 9:

      ‘For example, we applied Mendelian randomization (MR) to further evaluate associations highlighted in our atlas with triglyceride-rich very low density lipoprotein (VLDL) particles. Both VLDL particle average diameter and concentration were associated with the PGS for body mass index (BMI) (Beta=0.04, 95% CI=0.033 to 0.046, P<1x10-300 & Beta=0.012, 95% CI=0.006 to 0.019, P=2.7x10-4, respectively) and coronary heart disease (CHD) (Beta=0.026, 95% CI=0.019 to 0.032, P<1x10-300 & Beta=0.035, 95% CI=0.028 to 0.042, P<1x10-300, respectively). Bi-directional MR suggested that the associations with average VLDL particle diameter likely reflect a consequence of BMI and CHD liability, rather than a causal influence of VLDL particle size on these outcomes (Supplementary Table 6). In contrast, MR analyses suggested that the concentration of VLDL particles increases the risk of CHD (Beta=1.28 per 1-SD change in VLDL particle concentration, 95% CI=1.25 to 1.65, P=2.8x10-7), which may explain the association between the CHD PGS and this metabolic trait within our atlas.’

      and discussed on page 21:

      ‘We likewise conducted bi-directional MR to demonstrate that associations between the CHD PGS and VLDL particle size likely reflect an effect of CHD liability on this metabolic trait. In contrast, the association between the CHD PGS and VLDL concentration is likely attributable to the causal influence of this metabolic trait on CHD risk, suggesting that it is the concentration of these triglyceride-rich particles, rather than their size, that matters for the aetiology of CHD. We envisage that findings from our atlas, as well as other ongoing efforts that leverage the large-scale NMR data within UKB, should facilitate further granular insight into lipoprotein lipid biology.’
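      The excerpts above do not specify which MR estimator was used. As a minimal sketch, assuming the common fixed-effect inverse-variance weighted (IVW) estimator applied to summary statistics, bi-directional MR amounts to running the same estimator twice with the roles of exposure and outcome swapped, each time using instruments selected for the exposure in that direction. All effect sizes below are hypothetical toy numbers, not results from the atlas.

```python
import math
from typing import List, Tuple


def ivw_estimate(beta_exposure: List[float],
                 beta_outcome: List[float],
                 se_outcome: List[float]) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Each instrument contributes a Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Forward direction (hypothetical): instruments for the metabolite,
# with their associations looked up in a disease GWAS.
forward = ivw_estimate([0.10, 0.20, 0.15], [0.05, 0.10, 0.075], [0.01, 0.02, 0.015])

# Reverse direction (hypothetical): instruments for disease liability,
# with their associations looked up in a metabolite GWAS.
reverse = ivw_estimate([0.30, 0.20, 0.25], [0.001, -0.002, 0.0005], [0.01, 0.01, 0.01])

print(f"metabolite -> disease: {forward[0]:.3f} (SE {forward[1]:.3f})")
print(f"disease -> metabolite: {reverse[0]:.3f} (SE {reverse[1]:.3f})")
```

In this toy setup, a clear forward estimate with a near-null reverse estimate is the pattern read as "metabolite influences disease"; the opposite pattern would suggest the metabolite is a consequence of disease liability. Real analyses would add pleiotropy-robust sensitivity methods.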

      PGS construction: It's unclear how well the PGS work. Should the reader prefer the stringent or lenient PGS? Perhaps there could be some validation with traits that have decent sample sizes in UKBB. Was there any filtering to remove traits with few GWS hits, low sample sizes, or low SNP heritability as these are unlikely to produce useful PGSs?

      An example of validation was previously included for the chronic kidney disease PGS and its association with circulating creatinine, although it has now been removed in response to your comments below. However, we now provide the weights for all of the PGS included in our web atlas, should users wish to use these scores for prediction (page 7):

      ‘The specific weights for clumped variants used in all PGS can be found at https://tinyurl.com/PGSweights.’

      On page 8 we mention that in this work we have used a more lenient threshold to facilitate analyses in a ‘reverse gear Mendelian randomization’ framework. However, the more stringent threshold remains available for users interested in it as an alternative:

      ‘In this paper, we have discussed findings using PGS that were derived using the more lenient criteria (i.e., P<0.05 & r2<0.1), although all findings based on both thresholds can be found in the web atlas.’

      ‘Specifically, we believe our findings can facilitate a ‘reverse gear Mendelian randomization’ approach to disentangle whether associations likely reflect metabolic traits acting as a cause or consequence of disease risk (Holmes and Davey Smith, 2019) as illustrated using triglyceride-rich very low density lipoprotein (VLDL) particles in the next section.’

      We have not filtered scores on other criteria, such as the number of SNPs, because some scores constructed from only a few SNPs may still be useful to users. For example, our score for ‘Drinks per day’ based on the more stringent threshold (i.e. P<5x10-8) consists of only 6 SNPs. However, one of these is rs1229984, a missense variant in the alcohol dehydrogenase ADH1B gene region that is known to be a strong predictor of alcohol use (e.g. https://pubmed.ncbi.nlm.nih.gov/31745073/).
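      For readers less familiar with PGS construction, the sketch below shows how the two p-value thresholds discussed above change which variants enter a score, and how a score is then computed as a weighted sum of effect-allele dosages. It is a minimal Python illustration under stated assumptions: LD clumping (r2<0.1) is assumed to have been done already, since it needs a reference panel, and the variant IDs, weights and p-values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    rsid: str      # hypothetical identifiers in the example below
    beta: float    # GWAS effect size per copy of the effect allele
    pval: float    # GWAS p-value


def select(variants: List[Variant], p_threshold: float) -> List[Variant]:
    """Keep clumped variants whose GWAS p-value is below the threshold."""
    return [v for v in variants if v.pval < p_threshold]


def polygenic_score(dosages: Dict[str, float], variants: List[Variant]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2).

    Variants missing from `dosages` contribute nothing; this is a
    simplification, and real pipelines may impute them instead.
    """
    return sum(v.beta * dosages.get(v.rsid, 0.0) for v in variants)


clumped = [
    Variant("rsA", 0.20, 1e-10),
    Variant("rsB", -0.10, 0.01),
    Variant("rsC", 0.05, 0.20),
]
person = {"rsA": 2.0, "rsB": 1.0, "rsC": 0.0}

lenient = select(clumped, 0.05)     # rsA and rsB pass
stringent = select(clumped, 5e-8)   # only rsA passes
print(polygenic_score(person, lenient), polygenic_score(person, stringent))
```

The lenient threshold trades some noise for many more contributing variants, whereas the stringent threshold can leave a score resting on a handful of strong loci, as in the ‘Drinks per day’ example.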

      Reviewer #2 (Public Review):

      The authors set out to create an atlas of associations between phenome-wide polygenic scores and circulating lipids, fatty acids, and metabolites. To do so, they utilize GWAS from 129 traits available in the OpenGWAS database to derive polygenic (risk) scores (PGS) along with the recently released NMR metabolomics data containing 249 biomarkers (and ratios) in ~120,000 UK Biobank participants. The authors create a publicly available web portal containing PGS to NMR biomarker associations:

      http://mrcieu.mrsoftware.org/metabolites_PGS_atlas/.

      The strength of this study is in the comprehensive nature of the atlas, containing associations for 129 traits phenome-wide, the large sample size of the UK Biobank NMR data, and the use of PGS for prioritising molecular traits for follow-up experiments, which is an emerging area of interest (International Common Disease Alliance, 2020; Ritchie et al., 2021a). To our knowledge this study is the first to explore this for circulating metabolites.

      In its current form the atlas has several limitations, which should be straightforward to address. Notably, results in the current atlas may be confounded by (1) technical variation in the NMR data (Ritchie et al., 2021b), and (2) major biological determinants of biomarker concentrations, including body mass index, fasting time, and statin usage.

      Firstly, thank you for the suggestion to use your ‘ukbnmr’ R package to remove technical variation from the UK Biobank NMR metabolite data. We applied it to remove outliers and variation in the individual-level data due to (1) the time between sample preparation and sample measurement, (2) the position of samples on shipment plates, and (3) the different equipment (spectrometers) used. This required us to re-run the entire analysis pipeline for this project from scratch on the updated dataset. Results did not change substantially; nonetheless, we have updated the results of all downstream analyses in our online web atlas using the updated dataset provided by ‘ukbnmr’.

      Secondly, the reviewer is correct that biological factors, such as body mass index (BMI) and statin use, are strongly correlated with metabolite levels. However, we cannot adjust for such factors directly in our analyses, because they are potential colliders in the causal relationship between diseases/traits and metabolites. Statin use may be caused both by high genetic liability to coronary artery disease and by abnormal lipoprotein lipid levels. Likewise, obesity (and changes in BMI) may result both from a high genetic predisposition to cardiometabolic disorders and from disrupted metabolism. Adjusting for statin use and BMI would therefore induce collider bias (https://jamanetwork.com/journals/jama/fullarticle/2790247), creating spurious associations between the disease/trait PGS and metabolites.

      To better illustrate this issue, we have added text on page 14 justifying this study design decision, as well as a new figure (Figure 3) demonstrating it clearly to readers. Fasting time, on the other hand, is in our view unlikely to act as a collider, and it was adjusted for as a covariate in all linear regression models in this work, as mentioned on page 25.
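      The collider argument above can be checked with a small simulation. The following Python sketch is not the authors' Figure 3; it assumes a deliberately simple setup in which a PGS and a metabolite are generated independently (true association zero) and a continuous stand-in for statin use or BMI depends on both. Regressing the metabolite on the PGS gives an estimate near zero, whereas adjusting for the collider (here via residualisation, which by the Frisch-Waugh-Lovell theorem gives the same coefficient as a multiple regression) induces a spurious association of about -0.5 under these assumptions.

```python
import random
from typing import List, Tuple


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def slope(x: List[float], y: List[float]) -> float:
    """Ordinary least-squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualise(y: List[float], z: List[float]) -> List[float]:
    """Residuals of y after regressing it on z (with an intercept)."""
    b, mz, my = slope(z, y), mean(z), mean(y)
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]


def simulate(n: int = 20000, seed: int = 1) -> Tuple[float, float]:
    rng = random.Random(seed)
    pgs = [rng.gauss(0, 1) for _ in range(n)]
    metabolite = [rng.gauss(0, 1) for _ in range(n)]  # independent of the PGS
    # Hypothetical collider (e.g. a continuous proxy for statin use):
    # caused by both the PGS and the metabolite.
    collider = [g + m + rng.gauss(0, 1) for g, m in zip(pgs, metabolite)]
    unadjusted = slope(pgs, metabolite)
    adjusted = slope(residualise(pgs, collider), residualise(metabolite, collider))
    return unadjusted, adjusted


unadjusted, adjusted = simulate()
print(f"unadjusted: {unadjusted:.3f}, adjusted for collider: {adjusted:.3f}")
```

With unit variances, conditioning on the sum of two independent causes makes them negatively correlated, which is why the adjusted coefficient moves away from the true value of zero.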

      …Further, association results for two (of the 129) PGSs, systolic blood pressure (SBP) and diastolic blood pressure (DBP), are invalid (vastly inflated) as the GWASs used to construct these PGSs included UK Biobank samples.

      Many thanks for your suggestion. We have now removed the SBP and DBP PGS from our atlas due to overlapping samples in UKB. Furthermore, our colleagues at the University of Bristol have notified us that the Glioma GWAS data obtained from the OpenGWAS platform were uploaded with incorrect effect alleles, so this PGS has also been removed from the atlas. Additionally, we removed the Alzheimer’s disease (without APOE) PGS, because the pleiotropic effect of lipid-associated genes is now systematically examined using PGS constructed with lipid gene loci excluded.

      To demonstrate how one might use these PGS to NMR biomarker associations to prioritise (or deprioritise) findings for follow-up, the authors select a biomarker of interest, glycoprotein acetyls (GlycA), to perform bi-directional Mendelian randomization to orient the direction of causal effects between GlycA and traits of associated PGS. However, the conclusions of this analysis are hampered by the heterogeneous nature of the GlycA biomarker, which captures the levels of five proteins in circulation (Otvos et al., 2015; Ritchie et al., 2019), making it a difficult target to appropriately instrument for Mendelian randomization analysis. This, however, does not detract from the broader point the authors make: that PGS can help prioritize molecular traits for experimental follow-up.

      We have now conducted further sensitivity analyses to evaluate the genetically predicted effects of each of the five proteins in the reference you have provided. This is discussed on page 11:

      ‘We also conducted further sensitivity analyses given that the NMR signal of GlycA is a composite signal contributed by the glycan N-acetylglucosamine residues on five acute-phase proteins, including alpha1-acid glycoprotein, haptoglobin, alpha1-antitrypsin, alpha1-antichymotrypsin, and transferrin (Otvos et al., 2015). Using cis-acting plasma protein (where possible) and expression quantitative trait loci (pQTLs and eQTLs) as instrumental variables for these proteins (Supplementary Table 12) did not provide convincing evidence that they play a role in disease risk for associations between PGS and GlycA (Supplementary Table 13). The only effect estimate robust to multiple testing was found for higher genetically predicted alpha1-antitrypsin levels on gamma glutamyl transferase (GGT) levels (Beta=0.05 SD change in GGT per 1 SD increase in protein levels, 95% CI=0.03 to 0.07, FDR=3.6x10-3), although this was not replicated when using estimates of genetic associations with GGT levels from a larger GWAS conducted in the UK Biobank data (Beta=1.6x10-3, 95% CI=-6.9x10-3 to 0.01, P=0.71). For details of pleiotropy robust analysis and replication results see Supplementary Table 14.’
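      The quoted passage reports FDR-corrected results without naming the procedure. Assuming the widely used Benjamini-Hochberg step-up method (an assumption; the excerpt does not state it), adjusted p-values of this kind can be computed with the minimal Python sketch below. The input p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each p-value is scaled by m/rank and then capped by the adjusted value of the next larger p-value, so an estimate is declared significant at FDR q when its adjusted value falls below q.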

      There are also several important limitations to the study that cannot be addressed, which the authors discuss appropriately in the paper. First, the NMR data do not provide a comprehensive view of the metabolome; they are heavily focused on lipids and fatty acids. Many small metabolites in circulation cannot be measured by NMR spectroscopy, and further insights must wait for data from molecular profiling efforts planned or underway in UK Biobank (e.g. mass spectrometry). Second, the authors restricted analysis to participants of European ancestries. This is a pragmatic analysis choice given (1) the PGSs were derived from GWAS performed in European ancestries, (2) PGS associations are particularly susceptible to confounding from genetic stratification and differences in environment, and (3) the very small sample sizes of other ancestries for which NMR data are currently available in UK Biobank. Finally, although it has a large sample size, UK Biobank is not a random sample of the population: healthy adults are over-represented, meaning PGS to metabolite associations may differ in disease cases or less healthy individuals.

      Overall, this study has strong potential, with limitations that are straightforward to address, and the resulting atlas will provide a useful characterisation of the relationships between NMR biomarkers and polygenic predisposition to various traits and diseases, which can be used by domain experts to prioritise biomarkers or traits for experimental follow-up.

      Reviewer #3 (Public Review):

      Fang et al. created an atlas of associations between the genetic liability to common risk factors or complex disorders and the abundance of small molecules, as well as the characteristics of major apolipoproteins, in blood. The whole study is well executed, and the statistical framework is sound. A clear strength of the study is the large array of common risk factors and diseases analyzed by means of polygenic risk scores (PGS). Further, the development of an open-access platform with an appealing graphical display of study results is another strength of the work. Such a reference catalog can help to identify novel biomarkers for diseases and possible causative mechanisms. The authors further show how such a systematic investigation can also help to distinguish cause from consequence. For example, an inflammatory molecule that is readily measured by the NMR platform and strongly associated with disease in observational studies is likely to be a consequence rather than a cause of common complex diseases.

      However, in its current form, the study suffers from some weaknesses that would need to be addressed to improve the applicability of the 'atlas'. These include distinguishing locus-specific from truly polygenic effects, that is, the extent to which findings for a PGS are driven by single strong genetic variants that have been shown to have a dramatic impact on small-molecule concentrations in blood.

      Thank you for your suggestions to help refine our work. In line with this comment, we have 1) repeated all analyses after applying the ‘ukbnmr’ R package, as recommended by reviewer #2, to remove technical variation and outliers, and 2) conducted sensitivity analyses in which an established list of lipid gene loci was removed from PGS construction. Full results can be interrogated in the web atlas to evaluate whether PGS associations may be driven by locus-specific effects at these regions, which may be particularly informative given the representation of lipoprotein lipid metabolites on the NMR panel. Findings are reported on page 19:

      ‘The polygenic nature of complex traits means that including highly weighted pleiotropic genetic variants in a PGS may introduce bias into genetic associations within our atlas. To provide insight into this issue, we constructed PGS excluding variants within the genomic regions encoding 14 major regulators of NMR lipoprotein lipid signals, which together captured 75% of the gene-metabolite associations in the Finnish Metabolic Syndrome In Men (METSIM) cohort (Gallois et al., 2019). For details of these genes, see Supplementary Table 5.

      For PGS with these lipid loci excluded, anthropometric traits such as waist-to-hip ratio (N=209), waist circumference (N=206) and body mass index (N=205) still showed strong evidence of association with the majority of metabolic measurements on the NMR panel after multiple-testing correction. Elsewhere, however, the Alzheimer’s disease PGS, which was associated with 60 metabolic traits at P<0.05/19 in the initial analysis including these lipid loci (Supplementary Table 17), provided no convincing evidence of association with any of the 249 circulating metabolites at the same multiple-testing threshold after the lipid loci were excluded (Supplementary Table 18). Further inspection suggested that this attenuation was likely due to variants located within the APOE locus, which are recognised to exert their influence on phenotypic traits via horizontally pleiotropic pathways (Ferguson et al., 2020).’

      …Further, it is unclear how much NMR spectroscopy adds over and above established clinical biomarkers, such as LDL-cholesterol or total triglycerides. This is particularly important, since the authors do not adequately distinguish between small molecules, such as amino acids, and characteristics of lipoprotein particles, e.g., the cholesterol content of VLDL, LDL or HDL particles, the latter representing the vast majority of measures provided by the NMR platform. Finally, the study would benefit from more intriguing or novel examples of how such an atlas could help to identify novel biomarkers, potential causal metabolites, or lipoprotein measures other than the long-established markers named in the manuscript, such as creatinine or lipoproteins.

      To address these comments, we have added a new example focusing on the granular measures of VLDL particles provided by the NMR data (on top of the examples listed at the start of the response-to-reviewers document), which, as the reviewer points out, is one of the strengths of the measures generated by this platform over long-established biomarkers (page 21):

      ‘We likewise conducted bi-directional MR to demonstrate that associations between the CHD PGS and VLDL particle size likely reflect an effect of CHD liability on this metabolic trait. In contrast, the association between the CHD PGS and VLDL concentration is likely attributable to the causal influence of this metabolic trait on CHD risk, suggesting that it is the concentration of these triglyceride-rich particles, rather than their size, that matters for the aetiology of CHD. We envisage that findings from our atlas, as well as other ongoing efforts that leverage the large-scale NMR data within UKB, should facilitate further granular insight into lipoprotein lipid biology.’

    1. Author Response

      We appreciate the thoughtful and thorough critique provided by the two reviewers, and generally agree with their assessment. The revised submission will address the issues they raise. In particular, we agree that the framework of the paper should be broadened to include bacteria and the deep literature associated with coincidental selection.

    1. Author Response

      Evaluation Summary:

      The work by Volante et al. studied a new plasmid partition system, in which the authors discovered that four or more contiguous ParS sequence repeats are required to assemble a stable partitioning ParAB complex and to activate the ParA ATPase. The work reveals a new plasmid partitioning mechanism in which the mechanical properties of DNA and its interaction with the partition complex may drive the directional movement of the plasmid.

      Thank you for the kind evaluation. However, we question the description of the pSM19035 partition system studied here as “a new plasmid partition system”. The system itself is quite old. The editor may have meant “new” as a subject of research, but plasmid partition systems involving RHH-ParB proteins have been studied by a number of groups for some time, including the Alonso Lab, which worked on the pSM19035 partition system for a number of years before our current collaboration on this paper. Therefore, we wonder whether the term “new” is the most appropriate.

      Reviewer #1 (Public Review):

      This is a very thorough biochemical study of the ParABS system of pSM19035 by Volante et al. The authors showed convincingly that a specific architecture of the pSM19035 centromere (parS) is required to assemble a stable/functional partition complex. Minimally, four consecutive parS sites are required for the formation of the partition complex and to efficiently activate the ATPase activity of ParA. The work is very interesting, and the discovery will allow the community to compare and contrast it with the more widespread and more investigated canonical chromosomal ParABS system (where ParB is a sliding CTPase protein clamp, and a single parS site is often sufficient to assemble a working partition complex). All the main conclusions in the abstract are justified and supported by biochemical data with appropriate controls. The proposed multistep mechanism of partition complex assembly and disassembly (summarized in Fig 6) is reasonable. Perhaps the only shortcoming of this work is that the team does not yet get to the bottom of why four consecutive parS are needed.

      Thank you for the kind evaluation. The last point is an important one. We would like to continue testing our current model, either to obtain stronger supporting evidence or to come up with a better alternative model.

      Reviewer #2 (Public Review):

      ParBs come in two variations, RHH and HTH. In this study, the authors examine the in vitro behavior of the RHH system, which is less studied. Two activities were carefully monitored; ATPase activation and ParA removal from DNA. The system is quite complex, but the authors have done a good job of examining parameter space. One question concerns the physiological relevance. Can this be assessed by uncoupling ParA/ParB expression (making it inducible with IPTG from the chromosome, for example) and testing plasmids with the various constructs?

      This is an excellent point; we agree that this is a shortcoming of the current study. As described in our response to “Essential Revisions”, we very much wanted to include in this paper an experiment testing in vivo plasmid stability for parSpSM sites of different sizes, and we put significant effort into it. However, we encountered technical issues with the approach we tried and could not obtain conclusive data before we ran out of time. Although our preliminary data appeared consistent with the notion that shorter parS sites are non-functional and full-size parS sites are functional, the experiment had a flaw that we could not immediately rectify to our satisfaction. We therefore decided to postpone this part of the project and plan a broader physiological evaluation of parSpSM sequence arrangements in the near future. In the revision, we mention at the beginning of the Discussion that in vivo functional testing of parSpSM site requirements remains to be done.

      The authors appear to suggest that the requirement for at least 4 ParB binding sites is due to the inability of ParBs of this type to spread, implying that for ParB-HTH, multiple ParBs bound to ParS are required. Has this been tested in this system?

      ParB spreading has been shown to be essential for the HTH-ParB to perform its role in partition function. We clarified the importance of HTH-ParB spreading for partition function on lines 44-45.

      In any event, another major difference between the two systems is that a peptide corresponding to the N-ter of ParB is sufficient to bind DNA indicating this type of ParB does not have to be bound to DNA to stimulate ParA. It would have been useful if the authors had commented on this.

      There seems to be a typo here. “N-ter of ParB is sufficient to bind DNA indicating ……” is incorrect; perhaps this was meant to be “N-ter of ParB is unable to bind DNA, indicating ……”. This is not a qualitative difference between HTH- and RHH-ParBs: the N-terminal ParA-interacting peptides of HTH-ParBs can also activate their cognate ParA ATPase without parS DNA binding, and the parS dependency of ATPase activation for HTH-ParBs appears to be significantly less stringent than for the RHH-ParB we report here. ParBpSM1-27, which cannot bind parSpSM, could stimulate the ParApSM ATPase to at most one-tenth of the level achieved by full-size ParBpSM in the presence of active parSpSM. We clarified this on lines 156-157 and also added discussion of this contrast between HTH- and RHH-ParBs and its possible implications on lines 458-467.

      Reviewer #3 (Public Review):

      Drs. Volante, Alonso, and Mizuuchi presented a milestone experimental finding on how the distinct architecture of the centromere (ParS) on a bacterial plasmid drives the ParABS-mediated genome partition process. Rather than being driven by cytoskeletal filament pushing or pulling, as in its eukaryotic counterpart, genome partition in prokaryotes has been shown to operate as a burnt-bridge Brownian Ratchet, a mechanism first put forward by the Mizuuchi group. To drive directed and persistent movement without linear motor proteins, this Brownian Ratchet requires two factors: 1) enough bonds (tens to hundreds) bridging the PC-bound ParB to the nucleoid-bound ParA to largely quench the diffusive motion of the PC, and 2) the PC-bound ParB 'kicks' off the nucleoid-bound ParA, which can replenish the nucleoid only after a sufficient time delay; this delay rectifies the initial symmetry breaking into directed and persistent movement. Although the time delay in ParA replenishment is established as a common feature across different bacteria, the binding properties of PC-bound ParB vary greatly, which raises the question of how Brownian Ratcheting adapts to different cellular milieus while maintaining functional fidelity.

      The finding in this work presents a new but important twist on the Brownian Ratchet paradigm. The authors showed that in the pSM19035 plasmid partition system, only four contiguous ParB-binding repeats in ParS are required for the ParA-ParB interactions that drive PC partition. In other words, only four chemical bonds are needed for PC partition. Crucially, using state-of-the-art biochemistry and reconstitution experiments, the authors further demonstrated that a distinct orientation (configuration?) of the ParB-binding repeats is required for this fidelity. The authors then elaborated on a possible mechanism by which the smaller number of PC-bound ParB can drive directed and persistent PC movement by interacting with nucleoid-bound ParA. If I understand correctly, in their proposed scheme, because of their specific orientation (configuration?), torsional/distortional stress arises when two of the ParS-bound ParB molecules bind to two nucleoid-bound ParA molecules. Consequently, thermal fluctuations preload the forming bonds, triggering the dissociation of the two ParB molecules from the PC. The remaining PC-bound ParBs may then kick off the ParAs, which have a time delay in DNA rebinding, while ParB molecules replenish ParS to initiate the next round. In this proposal, the key conceptual leap is that not only the substrate but also the cargo remodels to underlie the Brownian Ratcheting.

      We thank the reviewer for the kind evaluation of our work. The proposed model is highly speculative at this point. Although it may appear rather detailed, in order to account for the unexpected findings, we consider it only a working hypothesis to be revised or replaced by a better model in future. We thank the reviewer for many useful suggestions, which we will follow in our revision.

    1. Author Response

      REVIEWER #1 (PUBLIC REVIEW):

      The study by Monterisi et al. reports that loss-of-function mutations in metabolic pathways do not necessarily have a negative impact on cancer growth. The authors suggest that small solutes transferred via gap junction channels, formed between wild-type cells and cells expressing mutants defective in metabolic pathways, rescue the metabolically deficient cancer cells. Through the examination of multiple human cell lines with several advanced methods for determining gap junction coupling, Cx26 was identified as a major connexin mediating gap junction coupling between colorectal cancer (CRC) cells. Mutations in three metabolic genes were investigated, covering major metabolic functions of the cell: pH regulation, glycolysis and mitochondrial function.

      Strengths: The paper tests a new hypothesis that the mutations that inactivate key metabolic pathways do not incur functional deficits in cancer cells expressing the mutants due to their gap junction coupling to wild type cells.

      From microarray data, they identified multiple connexins expressed in various CRC cells. Several advanced analyses were used to assess gap junction coupling in CRC cells, including fluorescence recovery after photobleaching (FRAP). The extent of permeability at steady state was evaluated using CellTracker dyes, and coupling coefficients were determined. They also used flow cytometry to study dye transfer, which provides a quantitative, dynamic means of studying cell coupling. The data showed that knocking down Cx26 greatly reduced diffusive exchange in most of the CRC cells tested.

      The study focused on three metabolic genes: the Na+/H+ exchanger NHE1, a regulator of intracellular pH; the glycolytic gene ALDOA; and the mitochondrial respiration gene NDUFS1. These genes were knocked out in selected CRC cells that highly express them. The co-culture studies were well executed, with fluorescent markers distinguishing WT and knockout cells and well-defined readouts such as intracellular pH, media pH, glucose/lactate levels, mitochondrial O2 consumption and glycolytic acid production.

      The experiments in general were well designed and conducted, and the data supported the conclusions. The paper is also logically written and figures were well presented providing clear graphic illustrations.

      Thank you for recognising the strengths and novelty of our findings.

      Weaknesses: Although the hypothesis is innovative, no clear justification is provided that illustrates the scenario representing the clinical situation. The remaining questions include: What kind of somatic mutations in cancer cells has little impact on their growth and progression?

      We have now added in vivo data (Fig 8) and revised the Introduction and Discussion to address this point. Briefly, the broader clinical relevance of our findings relates to the notion of essential genes and their negative selection. We show that connexin-dependent coupling can rescue a genetic deficiency, provided the mutation-carrying cell can access wild-type neighbours for the missing function. This rescue effect is limited to processes that handle solutes that can pass via connexin channels, i.e. metabolic processes. As such, sporadic loss-of-function mutations in “essential genes” may not produce a functional deficit in human cancers. We demonstrate rescue extensively in vitro, and now in a xenograft model.

      We argue that our work can explain why certain metabolic genes are essential in vitro, but not in vivo. In monolayers of mutated cells, diffusion across gap junctions cannot rescue the mutant phenotype, because there is no wild-type cell available to supply the missing function. In contrast, mutations in vivo will arise sporadically and wild-type cells are typically available to couple onto the mutation-bearing cell, providing it with functional rescue. Thus, only in the former case would the lethality of essential genes emerge.

      Indeed, many notable studies have found genes of various metabolic pathways to be essential for growth in vitro. Such genes would be expected to undergo negative selection in vivo, but this is exceedingly rare according to multiple observations. By demonstrating metabolic rescue in co-cultures (i.e. a setting closer to the tumour) and (now) in xenografts, our work provides an explanation for this apparent paradox. Indeed, cells such as NDUFS1-negative SW1222 grow very, very slowly in culture compared to wild-type cells and require regular media changes to keep pH alkaline. However, coupling onto wild-type cells can rescue knock-out cells in vitro and in vivo. We argue that this finding explains why loss-of-function mutations in NDUFS1 (and similar genes) do not undergo negative selection in human tumours (despite in vitro predictions).

      The three proteins selected for this study were chosen to represent very distinct types of solute-handling processes. We illustrate our point in a (new) summary figure in Fig 8.

      What types of WT cells are involved: other tumour cells or neighbouring normal cells? Does the current experimental design closely recapitulate the in vivo scenario?

      Indeed, we find that stromal fibroblasts may also support cancer cells via gap junctions, as this is essentially the same concept (i.e. coupling onto a cell with wild-type genes). However, we feel that expanding our present submission to fibroblasts would make the volume of data exceedingly large. Also, the methods we use for fibroblasts are different and would require a full manuscript of their own. For example, there is the issue of how to control for the radically different growth rates of fibroblasts and cancer cells. We chose co-cultures of WT and genetically altered CRC cells so that the co-cultures are of the same background, with just one element changing (i.e. the metabolite-handling gene). This makes our data easy to interpret, and thus strengthens our case. Our in vitro experiments were performed on monolayers, where cells can make contacts in 2D. In vivo, these contacts extend in all three dimensions, so connectivity is likely to be even more significant. If anything, monolayers probably underestimate the importance of connectivity, but this preparation is more accessible for studying cell-to-cell communication.

      We recognise the importance of adding in vivo data to firm up our conclusions. To that end, we have analysed xenografts established from co-cultures of wild-type DLD1 and NDUFS1-KO SW1222 cells on one flank of a mouse, and Cx26-KO DLD1 and NDUFS1-KO SW1222 cells on the other flank. This experiment tested whether Cx26-dependent connections between mitochondrially defective NDUFS1-KO SW1222 cells and respiring DLD1 cells (on the left flank only) are able to stimulate growth of the former (GFP-tagged). Indeed, NDUFS1-deficient cells grew faster when rescued by Cx26-expressing DLD1 cells. In contrast, their growth decelerated when DLD1 cells were Cx26-negative. We include these experiments and their controls in Fig 8.

      The readouts for co-culturing for glycolytic ALDOA and NDUFS1 knockout are only cell mass, without determining the more relevant markers, glucose/lactate and mitochondrial O2 consumption and glycolytic acid production.

      Our readouts are two-fold: total biomass and the size of the genetically altered compartment of co-cultures (GFP). We can therefore follow the relative growth of KO cells, which is essential for describing their growth (dis)advantage. We appreciate other markers are informative. Indeed, we characterised KO and WT cells in terms of O2 consumption and acid production in Fig 7. However, it would not be possible to measure glucose consumption selectively in GFP-positive KO cells of a co-culture, as the assays available for this measure ensemble rates for the entire population of cells (e.g. in a single well). Nonetheless, we believe that biomass as a readout is highly relevant to cancer, and we hope the reviewer concurs with us.

      The study needs to include cells without functional gap junctions like the characterized negative control RKO cells.

      This is an excellent suggestion, and we have added data for RKO cells to several figures. As expected, these do not form a syncytium and cannot rescue genetic defects in co-cultured cells. New data are shown in Fig 3G-H, Fig 6-supp2 and Fig 7H, adding to existing RKO controls in Fig 2A/B. Briefly, RKO cells do not exchange CellTracker dyes in monolayers (Fig 3F/G), cannot rescue cells that are ALDOA-deficient (Fig 6-supp2), and cannot rescue NDUFS1-deficient cells (Fig 7H). We also added Cx26-KO DLD1 cells to the CellTracker experiments in Fig 3.

      REVIEWER #2 (PUBLIC REVIEW):

      This paper is a logical extension of the 50 year-old concept of the "bystander effect" in tumours, wherein the effects of anti-tumour chemotherapeutics extend beyond the cells that take them up due to spread through gap junctions to adjacent cells. In this case, however, the authors have creatively realized that the reverse might also occur, and that tumour cells with otherwise fatal mutations in essential metabolic pathways can be rescued by their neighbours through passage of the missing metabolites through gap junctions. This can explain why mutations in other critical pathways, such as protein synthesis and transporters, are selected against in rapidly growing tumours, but others in equally critical pathways of glycolysis, electron transport, etc. are not, despite these genes having been demonstrated to be essential in in vitro KO studies (where all cells in the plate have the critical gene knocked out). A series of elegant experiments are used to test this proposal in several colorectal cancer (CRC) cell lines using three examples - pH regulation (defective Na+/H+ exchanger NHE1), glycolysis (defective Aldolase A (ALDOA)) and oxidative phosphorylation (defective Complex 1 - NDUFS1).

      Thank you for these positive comments. We have added key references to the bystander effect in the Introduction, and explain how our findings build on these milestones.

      The authors first determine the levels of different Cx proteins expressed in each cell line, and determine that for most lines Cx26 and Cx31 are dominant, although some lines have a subset of cells with high Cx43 expression. They then use CellTracker Green to pre-label cells and use FRAP as a means to measure how well the cell population is coupled. This is a useful measurement but is significantly over-interpreted by the authors as a "permeability" in uM/min. This is not really a permeability, which requires knowledge of the concentration gradient of the permeant species, relative cell volumes, etc. Rather, it is a rate of fluorescence recovery that is presumably correlated with, but not quantitatively related to, levels of coupling.

      Thank you for this comment. We would like to explain why we believe our FRAP experiments are able to estimate permeability in units of µm/s. The rate of recovery of a solute in a cell following its “destruction” (here, photobleaching) is given as follows:

      $$\frac{dC_\mathrm{cell}}{dt} = p \cdot P \cdot (C_\mathrm{surround} - C_\mathrm{cell}) \qquad [1]$$

      where the subscripts ‘cell’ and ‘surround’ refer to the bleached cell and its neighbours, P is the permeability of the barrier between these two compartments, and p is the ratio of the surface area of the barrier (i.e. membrane) to the volume of the bleached cell. Within a “bleached” cell, we measure fluorescence.

      Since fluorescence (F) is proportional to concentration (C), we can substitute:

      $$C = \alpha \cdot F$$

      where α is a constant of proportionality. Thus, the rate of recovery (left-hand side of equation [1]) becomes:

      $$\frac{dC_\mathrm{cell}}{dt} = \frac{d(\alpha \cdot F_\mathrm{cell})}{dt} = \alpha \cdot \frac{dF_\mathrm{cell}}{dt} \qquad [2]$$

      and the right-hand side of equation [1] is rewritten as:

      $$p \cdot P \cdot (C_\mathrm{surround} - C_\mathrm{cell}) = p \cdot P \cdot (\alpha \cdot F_\mathrm{surround} - \alpha \cdot F_\mathrm{cell}) = \alpha \cdot p \cdot P \cdot (F_\mathrm{surround} - F_\mathrm{cell}) \qquad [3]$$

      Putting [2] and [3] together, α cancels:

      $$\frac{dF_\mathrm{cell}}{dt} = p \cdot P \cdot (F_\mathrm{surround} - F_\mathrm{cell})$$

      Prior to photobleaching, there are no (net) gradients, so the initial values of $F_\mathrm{cell}$ and $F_\mathrm{surround}$ are equal (call this $F_0$).

      Thus, we can rewrite the equation in terms of normalised fluorescence ($f = F/F_0$):

      $$\frac{df_\mathrm{cell}}{dt} = p \cdot P \cdot (f_\mathrm{surround} - f_\mathrm{cell})$$

      P can therefore be expressed as:

      $$P = \frac{df_\mathrm{cell}/dt}{p \cdot (f_\mathrm{surround} - f_\mathrm{cell})}$$

      Here, $df_\mathrm{cell}/dt$ is measured from the fluorescence recovery time course and $f_\mathrm{surround} - f_\mathrm{cell}$ is measured experimentally (in fact, bleaching in the cell is set to 50%, so this takes the value of 0.5 by default). We can approximate the monolayer as a network of cuboidal cells. The cell’s volume is thus its area times its height, and the cell’s surface at which it contacts its neighbours is the cell’s perimeter times its height. Thus, for the bleached cell,

      $$p = \frac{\text{perimeter} \times \text{height}}{\text{area} \times \text{height}} = \frac{\text{perimeter}}{\text{area}}$$

      The perimeter and area can be measured from the acquired fluorescence images. Thus, we can describe permeability using data obtained from image stacks. We appreciate that this method makes certain geometrical approximations, but we believe these are not unreasonable. We explain the assumptions and calculations in Appendix 1. More information about the method is published by us in https://pubmed.ncbi.nlm.nih.gov/28368405/. Of course, we accept that these calculations are less accurate than, say, electrical conductance measurements, and to that end, we added a note of caution to the main text.
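      The permeability estimate can be sketched in Python as follows. This is a minimal illustration, not the authors’ analysis pipeline: the function names are hypothetical, the bleached cell’s outline is assumed to be available as a polygon in micrometres, and f_surround is assumed to stay constant during recovery (neighbours acting as a reservoir). Under that assumption, integrating df_cell/dt = p·P·(f_surround - f_cell) gives a gap that decays as exp(-p·P·t), so p·P can be read from a log-linear fit rather than a single slope.

```python
import math


def polygon_perimeter_area(vertices):
    """Perimeter and area of a closed polygon (shoelace formula).

    vertices: (x, y) points in micrometres, ordered around the cell outline.
    """
    n = len(vertices)
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0
    return perimeter, abs(twice_area) / 2.0


def frap_permeability(times_s, f_cell, f_surround, vertices):
    """Estimate P (um/s) for a cuboidal cell, assuming constant f_surround.

    ln(f_surround - f_cell) falls linearly with slope -p*P, where
    p = perimeter / area (cell height cancels). Requires f_cell < f_surround.
    """
    perimeter, area = polygon_perimeter_area(vertices)
    p = perimeter / area  # surface-to-volume ratio, 1/um
    logs = [math.log(f_surround - f) for f in f_cell]
    n = len(times_s)
    t_mean = sum(times_s) / n
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_s, logs))
    den = sum((t - t_mean) ** 2 for t in times_s)
    k = -num / den  # least-squares rate constant, 1/s; equals p*P
    return k / p


if __name__ == "__main__":
    # Synthetic check: 10 um square cell bleached to 50%, true P = 0.01 um/s.
    square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    times = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
    f = [1.0 - 0.5 * math.exp(-0.004 * t) for t in times]
    P = frap_permeability(times, f, 1.0, square)
    print(f"{P:.4f} um/s")  # prints 0.0100 um/s
```

      Fitting the whole recovery rather than only its initial slope uses every frame, but it relies on the surround not being depleted; if neighbouring cells dim noticeably during recovery, the single-slope formula with a measured f_surround is the safer choice.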

      This fluorescence recovery is shown to be sensitive to siRNA KD of Cx expression, but strangely its reduction is only correlated with KD of Cx26 in the 5 cell lines examined. KD of Cx43 (in LOVO cells) and Cx31 in all 5 cell lines had no effect or in some cases seemed to increase the rate of recovery (DLD1 and SNU1235). This is a notable finding, yet the authors choose to completely ignore it and continue with Cx26 KDs in studies of specific metabolite transfers. Some discussion should be included as to why KD of these Cxs has no effect or causes an apparent increase in coupling of the cells.

      The effectiveness of GJB2 knockdown in ablating ensemble connectivity most likely reflects that Cx26 is the dominant conductance inherited from the parent epithelium. Other isoforms are expressed, but in most CRC cells these do not produce major coupling, as GJB2 knockdown was sufficient to uncouple many CRC lines. These observations justify our choice of connexin for studying metabolic rescue functionally. These findings are also consistent with the good correlation between ensemble connectivity and GJB2 levels.

      Our data show a trend that GJB3 (Cx31) KD in DLD1 and SNU1235 cells, and GJA1 (Cx43) KD in LOVO cells, produce an increase in coupling. However, when analysed by hierarchical (nested) analysis, these effects are not statistically significant, and for that reason we did not elaborate on these trends in the original submission. The apparent increase in conductivity in cells treated with GJA1 or GJB3 siRNA could reflect a compensatory response to the ablation of a specific message, closer contacts between cells allowing Cx26 to strengthen its connections, or a shift away from heterotypic channels involving Cx26 and Cx31/Cx43, towards homotypic Cx26. We did not see any consistent change in the intimacy of cell-cell contacts. We have now performed western blots for connexins to probe for compensatory changes (see Fig 2-supp1). In comparison to wild-type cells, expression of Cx31 was not changed by GJB2 (Cx26) or GJA1 (Cx43) knockdown in DLD1 cells. GJB2-KO DLD1 cells did not upregulate the other major isoform, Cx43. Also, in DLD1 cells, KD of GJB3 or GJA1 did not substantially change Cx26 levels. Similarly, KD of GJB3 did not affect Cx43 levels. In GJA1-high C10 cells, KD of GJB3 did not alter Cx43 levels, although GJB2 KD produced a small decrease in Cx43. Also in C10 cells, KD of GJB2 and GJA1 did not induce an increase in Cx31 levels.

      We agree that complex interactions between connexin genes are possible, but we feel that a molecular study of Cx gene regulation would fall outside the scope of the present manuscript. Our findings point to a prominent role of Cx26 in metabolic rescue, and to strengthen this point, we show that Cx26-negative cells that express other connexins (e.g. C10 cells or NCIH747 cells) cannot rescue ALDOA-deficient counterparts or NDUFS1-KO SW1222 cells (new data in Fig 6 and 7). We share the Reviewer’s enthusiasm about the interplay between connexins and will endeavour to study this further in the near future.

      Rather than just focus on acute transfer of dye between cells, the authors develop a system using 50/50 mixes of cells labelled with two junctionally permeant dyes and measure the degree of mixing at equilibrium (48 hours). This is presented as a "coupling coefficient", but how it is calculated and its significance are not well described, and it does not correlate with the historical use of this term in the literature. Nonetheless, the studies do seem to demonstrate a good degree of equilibration, although it would have been informative to determine whether the cells that do not exchange dyes express connexins. To document that this equilibration requires gap junctions, the authors employ low-density cultures, which significantly decrease dye exchange. However, in at least one cell line (SW1222) dye exchange is only reduced by <50%, indicating a very high background to this assay. This is not addressed.

      Thank you for these comments. We agree that our description of the method was inadequate, and we have added the necessary information in Appendix 1. We have also added information about actual confluency and restructured the figure. We also added new data for RKO cells and DLD1-Cx26 KO cells, i.e. two negative controls (Fig 3H). We considered the best name for describing the numerical output of the method and concluded that “coupling coefficient” is reasonable (provided we improve our description of it) because it is dimensionless and, like many coefficients, has a finite range (here, zero to one). With further explanation, we hope this terminology is acceptable. The issue with SW1222 cells is that both low- and high-seeding densities produce clusters of cells. Even though overall cell numbers were different in high- and low-seeded cultures, actual connectivity within “islands” of cells remained high, hence their similar coupling coefficients (see Fig 3E). Indeed, this CRC line is unusual in this behaviour, so we only present data from the higher density.

      The most compelling part of the study is the use of reporters to directly demonstrate a role of Cx26 coupling of cells to rescue cells with mutations of the three genes mentioned above when mixed with normal neighbours. This case was most convincing in the cases of ALDOA and NDUFS1, with the data for the pH regulation requiring more explanation for full understanding of the data shown (e.g. Figs 7 G and H).

      Thank you for this comment. Studies of pHi regulation provide a unique opportunity to obtain single-cell resolution (unlike e.g. glycolytic assays). We took advantage of this, and therefore the figure on pHi presents a greater depth of analysis. Nonetheless, we agree the pH data need further explanation. We have expanded the text, and also added a bar plot of data on day 7, which now provides a clearer illustration of the rescue effect. This form of presentation was also adopted for ALDOA and NDUFS1 experiments in the subsequent figures.

      Overall, the study does a credible job of demonstrating that Cx26 coupling of CRC cells serves to rescue cells with mutations in critically necessary metabolic pathways, presumably due to transfer of metabolites from surrounding wt cells. However, some of the results indicate this is not a simple process where all connexins behave similarly, and some effort should be made to investigate if Cx31 and 43, which do not seem to play the same roles in maintaining cell coupling as Cx26, also play any role in such metabolic rescue.

      Thank you for this comment. We have addressed this by selecting three additional cell lines for study: RKO, a cell line with no major Cx expression; C10, a cell line that expresses Cx43 but very low levels of Cx26; and NCIH747, a cell line that expresses Cx31 but low levels of Cx26. These additional experiments cover lines that are GJB2 (Cx26)-low/negative to test whether metabolic rescue is best achieved with Cx26. Our new data show that these cells are unable to rescue metabolic defects (new data provided in Fig 6H/I, Fig 7H, and Fig 6-supp2). These findings strengthen our case for a major role of Cx26, at least in CRC networks. Indeed, recent analyses by Robert Gatenby and colleagues have shown that mutations in GJB2 (Cx26) are exceedingly rare in cancer (a property not shown for other connexin genes). This is interpreted to mean that Cx26 plays a particularly prominent role, ostensibly for metabolic rescue.

      REVIEWER #3 (PUBLIC REVIEW):

      Strengths of the study include that it appears to be a careful and well thought out set of experiments. The analysis and treatment of multiplexed data is also sophisticated. For the most part, the work is clearly and logically described, as well as well illustrated. In general, the authors achieved their experimental goals, and the methods, while not entirely new, do provide new twists and augmentations that should be useful to the field. A general weakness is that this is not entirely a new story. Instead, it is a variant of one of the oldest concepts in the field of gap junction biology, i.e. "metabolic cooperation". The term "metabolic cooperation" (i.e., as mediated by gap junctions) was not mentioned by the authors, but it is a long-established and foundational concept in the field. Indeed, in a classic paper by Gilula and colleagues published in 1972, the experimental approach used was similar to that of the study in hand. These earlier authors showed how transformed cell lines with deficiencies in hypoxanthine metabolism can be "rescued" by "metabolic cooperation" in co-culture with metabolically competent cells via passing a gap junctional permeant molecule. This and other relevant papers were not cited. More importantly, the extant literature places the onus on the authors to explain and convince reviewers why this study is more than an incremental step.

      We apologise for not quoting these important and classical references. We have now added these works to our reference list (quoted in the Introduction). At the time of these seminal discoveries, Loewenstein and colleagues made a case that connexins are absent in cancer, and this belief persisted for many decades. More recently, the role of gap junctions in cancers has garnered attention. With new gene-editing tools (e.g. CRISPR/Cas9), new imaging techniques and improved xenografting, it is now possible to study precisely the impact of gap junctions on cancer metabolism. Moreover, we studied a wide panel of cancer cell lines and identified the prominent role of Cx26. We highlight that our study is the first to offer a mechanistic explanation for the absence of negative selection in cancer, a phenomenon which was not known in the 1970s. To strengthen the novelty of our work, we now add in vivo data to Fig 8 that confirm the in vitro findings.

    1. Author Response

      Reviewer #1 (Public Review):

      1. “The major weakness of the study is that with the interpretation of the results. The changes in tractography, behavior and TBM are what would be expected following lesions of the neostriatum”

      We appreciate this comment and would like to offer clarification. We respectfully disagree that the pattern of results presented in this manuscript is akin to what would be expected following striatal lesions. In NHPs, striatal lesions typically cause more extreme phenotypes than what we observed in our 85Q-treated animals. In macaques, bilateral putamen lesions can result in phenotypes that include seizures, inappetence, hyper-aggression, and other severe features. This strongly impacts clinical scores and can make it unfeasible to care for the animals for multiple years. For these reasons, recent NHP HD lesion models have used only unilateral putamen lesions coupled with bilateral caudate lesions to model HD (as in the recent paper by Lavisse et al, 2019). Of additional relevance is that even the cognitive effects of these striatal lesions are more severe than what we observed in our 85Q-treated animals: for example, Lavisse et al. reported that performance on similar “prefrontal” cognitive tasks was reduced by ~50%, whereas our AAV-HTT model exhibited only ~10% reductions in working memory. This mild, but significant, change in cognitive performance and motor function seen in our 85Q animals is much more akin to that which is observed in the early stages of HD.

      2. “The results have been interpreted as showing a progressive model, although evidence that there is progression is limited”...“begs the question as to whether or not the 85Q-lesioned monkeys would recover to a level similar to the 10Q animals if left for another 12 months”

      At the request of Reviewer 1, we added an additional 30-month timepoint and re-ran all of the analyses to include these new data. All of the behavioral and neuroimaging data were re-analyzed with this final timepoint included (see Lines 125-141, 146-163, 173-194, 228-255, 270-294, 314-345). Additionally, due to the unidirectional nature of our hypothesis and on the advice of our biostatistician, we applied one-tailed tests to the planned comparisons in this revision. To address the Reviewer’s point directly: 85Q-treated animals showed minimal evidence of functional recovery between the 20- and 30-month timepoints on the behavior tasks. In particular, working memory deficits measured with SDR and fine motor skills measured with Lifesaver Retrieval did not improve between 20 and 30 months (Figure 1C and 1F). Additionally, neurological rating scores in group 85Q remained consistently elevated (in the 5-7 range) between the 20- and 30-month timepoints. Taken together, we feel confident that these results do not show evidence of any significant functional recovery out to 2.5 years (30 months). In terms of the longitudinal trajectories of the behavioral measures, we appreciate the Reviewer’s feedback regarding the use of the term ‘progressive’ and have tempered our language appropriately. We removed all instances of the word progressive/progressed except in the context of the motor rating scores, which show a significant Group x Timepoint interaction and demonstrate a clear progression.

      3. “The whole manuscript is written as though this is a genetically-relevant progressive model of HD. But the animals are normal, and so there is no genetic context relevant to HD”

      We thank Reviewer 1 for this comment. We recognize that viral-based animal models of HD, including the model characterized here, are less genetically similar to the human condition than some of the other modeling approaches currently under investigation (e.g. knock-in and gene editing). Limitations of the AAV-based HTT85Q model include: 1) vector packaging restrictions that prohibit expression of full-length HTT, 2) the use of a CAG promoter vs. an endogenous promoter, which leads to overexpression of the transgene, 3) the use of cDNA versus genomic DNA, which excludes introns and therefore lacks the ability to produce alternatively spliced variants (e.g. exon 1), 4) the use of a mixed CAG-CAA repeat, which may preclude the possibility of somatic instability, and 5) expression of HTT that is restricted to specific brain regions and cell types. All of these important limitations have been added to the discussion section in this re-submission (Lines 503-517).

      Despite these limitations, we feel that this AAV2:AAV2.retro-HTT85Q based model has some features that make it genetically relevant to human HD, including: 1) the expression of an N-terminal fragment of human HTT (N171), 2) the N-terminal fragment bears a pathological PolyQ expansion (85Q), 3) the expressed mHTT fragment forms neuronal aggregates that can be detected in the nucleus, 4) mHTT fragments are expressed in many of the same brain regions where aggregates are detected in human HD cases, with both regional and sub-regional specificity (e.g. higher expression in anterior vs posterior cortical regions and expression primarily limited to deep cortical layers V/VI), and 5) expression of mHTT fragments in these regions leads to many of the same pathological and behavioral changes observed in HD patients. Importantly, expression of the N-terminal portion of HTT allows for the evaluation of HTT-lowering therapeutics that target the first three exons (ASOs, miRNAs, zinc finger repressors, CRISPR-based therapies, etc.), which cannot be evaluated in lesion-based models.

      4. “The authors state in the Abstract that the injection resulted in "robust expression of mutant huntingtin in the caudate and putamen". These data are not in the manuscript.”

      Evidence of mHTT expression in the caudate and putamen, as well as several other brain regions, via immunohistochemical and immunofluorescent staining is now included in the manuscript. Please see additions to the methods, results and discussion sections regarding these findings, as well as a new Figure 5, (see Lines 347-376, 756-788). Additionally, further details regarding an associated PET imaging study in this same cohort of animals using a mHTT aggregate-binding radioligand has been added to the discussion, (see Lines 437-443). Please also see response #13 (below).

      5. “The authors chose to use a fragment of the HD gene, with a very long repeat that is seen only in juvenile patients”

      Comments regarding the need to use a fragment of the HTT gene, versus the full-length gene, due to packaging constraints of the viral vector, were added to the discussion in the context of limitations (Lines 503-517), and are also discussed above in response #3. The choice to use a CAG repeat length of 85 (83 pure CAGs followed by a CAA/CAG cassette; see response #17 below for further details) was based on previous studies wherein similar CAG repeat lengths were used to create animal models of HD over the past several decades. Interestingly, while CAG repeat lengths in patients with adult-onset HD typically range from ~40-60, longer repeat lengths (>60) are typically required in animal models of HD to elicit pathological and behavioral manifestations of disease: transgenic, knock-in and viral vector-based rodent models (ranging from 72-150 CAGs), the OVT73 transgenic sheep model (73 CAGs), transgenic and knock-in minipig models (ranging from 85-150 CAGs), and transgenic and viral vector-based macaque models (ranging from 82-103 CAGs). See Ramaswamy et al, 2007 and Howland et al, 2021 for thorough reviews of these models.

      6. “For their cognitive testing, the authors used a task (delayed non-match to sample) that measures object recognition and familiarity. Before surgery, only 11/17 of the animals were successfully trained to complete this task. It is not clear how useful the data are when only 64% of the animals can be included.”

      We appreciate the Reviewer’s concerns and have decided to conservatively remove these data from the revised manuscript.

      7. “It is not clear how this monkey model will be useful for developing either disease biomarkers or therapeutic strategies for HD (as stated in the abstract)”. “The authors state that they hope the model will become a widely used resource. This seems an unlikely scenario, given the limitations of the current study and the challenges associated with using monkeys. They say that a major advantage of their technique is being able to generate large numbers of monkeys. But this is not a relevant argument if the usefulness of the model to investigate HD is not proven.”

      We thank the reviewer for requesting clarification on these important points. We believe that this model will be useful for developing therapeutic strategies because the HTT85Q-treated macaques express mutant HTT, along with HTT aggregates, in several key brain regions that are affected in human HD, along with undergoing regional gray matter atrophy and white matter microstructural alterations that correlate well with behavioral dysfunction. Studies currently under review elsewhere also show reduced dopamine neurotransmission and regional hypometabolism via PET imaging in this model. Together, or individually, these imaging and behavioral changes can serve as outcome measures when screening potential therapies. Possible therapeutic interventions that are amenable to screening in this model are included in the discussion.

      Regarding biomarker development, we have already engaged in PET imaging biomarker development in this model in collaboration with the CHDI foundation and the Molecular Imaging Center at the University of Antwerp, evaluating a candidate radioligand that binds to aggregated mHTT. See #13 below for a more detailed description of this PET study, including recent data showing its ability to bind to aggregated species of mHTT in several brain regions in this same cohort of HTT85Q macaques that correspond to 2B4 and em48 IHC staining (a manuscript describing these results has been prepared for submission and the PDF is included for the reviewers to peruse).

      The authors do envision this AAV-based macaque model becoming a resource for the HD research community. While this model does have certain limitations (now detailed in the Discussion), we respectfully assert that all of the HD animal models, both small and large, each have their own important limitations to consider when deciding on which to use to screen therapeutics. Selecting a specific animal model based on the individual scientific questions being asked will be required, and employing a combination of models may be an even more prudent strategy.

      While NHP research presents unique challenges (cost, housing requirements and recent challenges in availability, among them), we believe that viral vector-based NHP models could be more accessible to the HD research community than some of the other established large animal models, in that they may be readily created at contract research organizations (CROs), in addition to various academic research institutions. There are now many CROs in the US, and elsewhere around the world, that have developed specific expertise in MRI-guided, intracranial delivery of AAVs into the NHP brain (including the caudate and putamen), in the context of assessing therapeutic interventions for a variety of neurological disorders (HD, PD, and MSA, among others). Most of these same CROs also have expertise in NHP imaging (MRI/DTI) and behavioral assessments across multiple domains. It seems feasible that AAV-mediated HD macaques could be produced in sufficient numbers to appropriately power therapeutic studies, using the outcome measures established in the current study.

      Reviewer #2 (Public Review):

      1. “The major weaknesses are the manner in which the data is presented”

      We replotted all of the figures with improved color palettes and larger font sizes to make them easier to read. We also added additional details throughout the results section to aid in clarity and improve readability.

      2. “The authors would benefit from talking more about their model in the introduction and including references to some key points. For example, there has been critical new data in the field showing the importance of poly (CAG) in disease, not necessarily poly(Q), and the community will want to know (and not be required to look up), the nature of the transgene. Is it a pure CAG repeat? A mixed repeat? If it is pure, do they see or could they measure somatic expansion in the various brain regions impacted? How does that data match the phenotypes seen? Since this is a transgene, there is no possibility for the exon1/intron1 splicing variant to appear - how does this impact their interpretation”

      Further details regarding the transgene have been added to the Viral Vector section of the Methods (Lines 531-550). The repeat is not pure: it contains a single CAA interruption. The glutamine-encoding repeat for HTT85Q consisted of 83 pure CAG repeats followed by a single CAA/CAG cassette, while that for HTT10Q consisted of 8 pure CAG repeats followed by a single CAA/CAG cassette. Both constructs contained a proline stretch distal to the glutamine repeat in the following allelic conformation, where QT represents the total glutamine length:

      HTT85Q: QT = 85, (CAG)83(CAACAG)1(CCGCCA)1(CCG)7(CCT)2

      HTT10Q: QT = 10, (CAG)8(CAACAG)1(CCGCCA)1(CCG)7(CCT)2

      We plan to probe for somatic expansion in various brain regions, including the caudate and putamen as well as several distal cortical regions. That analysis is ongoing and outside the scope of the present manuscript; however, it is now mentioned in the discussion section (lines 540-560), together with the possibility of removing or duplicating the CAA/CAG cassette to potentially increase or decrease, respectively, the rate of disease progression, based on the work of Ciosi et al. 2019. Additionally, Reviewer 2 is correct that the lack of intronic sequences in the transgene precludes the formation of splicing variants such as the exon1/intron1 variant, which the work of Bates et al. has shown to be pathogenic. This drawback has been added to the discussion, along with other limitations of this viral vector-based model (Lines 503-517).

      3. “What about RAN translations? Is RAN translation noted at all in this over-expression model? How does that contribute (or not) to the progressive phenotype they see in their NHPs?”

      We are also interested in assessing toxic protein products arising from RAN translation of the expanded repeat sequence in this model. These studies are planned, and their results will be included in a future manuscript describing other ongoing post-mortem evaluations in this model.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports a new genetically encoded neuronal silencer, BoNT-C. They show that it fully blocks neurotransmission in two classes of Drosophila motor neurons (Is and Ib; phasic and tonic, respectively). They also update the postsynaptic GCaMP reporter SynapGCaMP to express GCaMP8f instead of GCaMP6f. They selectively silence Ib or Is neurons to disambiguate the neurotransmission properties of each neuron. Finally, they show that silencing either Ib or Is neurons does not induce heterosynaptic structural or functional plasticity (only neuron ablation triggers plasticity). The data are convincing. The new silencing tool will be widely used.

      We thank this reviewer for his positive assessment of our study and for highlighting the utility of the new silencing tool presented in this study.

      Reviewer #2 (Public Review):

      The conclusions of this paper are properly supported by the provided data.

      Overall this work opens a new window to examine novel aspects of heterosynaptic structural and functional plasticity.

      We also thank this reviewer for his positive assessment of our study and for putting the importance of our findings in context.

      Reviewer #3 (Public Review):

      The strength of the manuscript by Han et al. is the comprehensive characterization of BoNT-C, showing that it truly abolishes all evoked and mini responses without structural alteration of the NMJ. Based on this, the authors then show that ablation of all neurotransmission in either Ib or Is does not cause any compensatory changes (neither functional nor structural) in the 'other' (i.e. looking at Is when silencing Ib or looking at Ib when silencing Is).

      The weakness of the manuscript lies in the modest gain over previous work. Specifically, Aponte-Santiago et al. had already shown that many parameters are not changed (in Ib when Is is perturbed, or in Is when Ib is perturbed), including that 'the Is terminal failed to show functional or structural changes following loss of the coinnervating Ib input' (quote from 2020 paper). Hence, the only major difference is that Han et al. now show that Ib also does not really change when Is is silenced. Aponte-Santiago et al. also clearly showed a ~50% EJP reduction when Is or Ib is perturbed alone, and adding these two equals wild type. The highly emphasized finding of Han et al. that (quote) 'composite values of Is and Ib neurotransmission can be fully recapitulated by isolated physiology from each input' quite obviously follows from the one key finding that one does not affect the other, as mentioned above in the strengths. The wording is a bit odd, but adding Is (with Ib perturbed) and Ib (with Is perturbed) inputs really does not add much beyond either the main finding or the previous work.

      We thank this reviewer for his/her/their assessment of our study and for highlighting the strengths in characterizing the impact of BoNT-C expression at the NMJ. We also understand and appreciate the criticisms raised. It is important to note from the outset that the motivation and central goal of this study was not primarily to mechanistically dissect heterosynaptic plasticity between tonic and phasic motor inputs at the Drosophila NMJ. Rather, it was to develop an approach that would, for the first time, enable accurate isolation of complete neurotransmission from entire MN-Is or MN-Ib NMJs (both miniature and evoked transmission). By the reviewer’s own admission, we were entirely successful at achieving this central goal in our comprehensive characterization of BoNT-C.

      Next, the reviewer raises the valid question of whether this achievement is a significant advance over previous work, and discusses recent experimental findings regarding heterosynaptic plasticity at the fly NMJ. We want to emphasize that a tool capable, for the first time, of accurately discriminating complete transmission from Is vs Ib alone is a major advance, and one we are not sure the reviewer fully appreciates. As summarized in Fig. 1, no previous attempt has succeeded in accurately isolating synaptic transmission at Is vs Ib synapses. In particular, no previous approach could isolate miniature activity from Is vs Ib, and, as we show in our manuscript, miniature events differ substantially between the two inputs. Thus, without isolating miniature transmission, one cannot know baseline synaptic function at Is vs Ib, nor whether heterosynaptic functional plasticity has been induced. Further, we detail major confounds with some of the approaches used in the prior studies the reviewer alludes to, including selective optogenetic stimulation.

      Finally, the reviewer discusses at length recent findings regarding heterosynaptic plasticity and questions whether the new insights revealed by BoNT-C provide a sufficient advance. In particular, the reviewer refers to previous work published in 2020 and 2021, which reported important initial insights into Is vs Ib structure and transmission after differential manipulations to either input. The reviewer appears to believe that these studies settled that no heterosynaptic functional plasticity is induced.

      However, a critical point that the reviewer appears not to appreciate is that, while the two previous studies on heterosynaptic plasticity at the Drosophila NMJ were able to assess structural plasticity (Aponte-Santiago et al., 2020; Wang et al., 2021), no accurate or quantitative conclusions can be drawn from them about heterosynaptic functional plasticity. This is because baseline synaptic function at Is vs Ib (miniature frequency, miniature amplitude and evoked transmission) was not known, so those studies could not accurately determine whether synaptic function changed after their manipulations. Further complicating the interpretation of the previous studies is that at the muscle 1 NMJ (2020 study), as at the muscle 4 NMJ (2021 study), ~30% of NMJs are not innervated by an Is input in wild-type larvae. This major confound makes it difficult to know whether or how adaptive plasticity is induced in wild-type NMJs with or without Is innervation (since, interestingly, evoked transmission does not appear to change in wild-type m1 or m5 NMJs with or without an Is input), and hence to determine whether any heterosynaptic plasticity is induced. Indeed, at earlier stages of our own work we also struggled to determine accurately whether synaptic function changes relative to baseline, even though the muscle 6/7 NMJ used in our study does not suffer from the variable Is innervation observed at muscle 4 (Wang et al., 2021) and muscle 1 (Aponte-Santiago et al., 2020).

      Respectfully, we contend that the only way one can accurately and quantitatively determine baseline synaptic transmission (miniature amplitude, frequency, evoked, quantal content), and whether any changes are observed following manipulations to Is or Ib, is to fully and accurately recapitulate wild type (blended Is+Ib) neurotransmission from isolated Is vs Ib transmission. This is why we believe the data shown in Fig. 7 (and also Fig. S7 in the revised manuscript) is so important. It is true that numerous previous studies established relative and qualitative differences between Is vs Ib (miniature events are larger at Is relative to Ib, Is drives larger depolarization in response to single synaptic stimulation over Ib, etc). However, in no case did previous studies accurately assess baseline Is vs Ib synaptic function from entire inputs, and therefore could not conclude with certainty whether heterosynaptic functional plasticity was induced.

      On a different but somewhat similar topic, UAS-BoNT-C is not a new tool. I am a bit put off by the wording ' We have developed a botulinum neurotoxin, BoNT-C...'. More on this and the way the previous BoNT-C paper (Backhaus et al., 2016) is cited in the detail comments below in the recommendations for the authors.

      We understand these points raised by the reviewer. Our BoNT-C transgenic line is indeed a new tool, the only one in which synaptic transmission has ever been electrophysiologically characterized and shown to completely silence synaptic transmission in Drosophila. That being said, in retrospect, we can appreciate that the term “developed” might imply a level of innovation that reasonable people can disagree about. We have therefore elected to change the apparently offensive wording to “We have employed a botulinum neurotoxin, BoNT-C…” in the abstract of the revised manuscript.

      Additionally, the manuscript does not really dive into an analysis of phasic versus tonic functions (that's just a correlation with the Is and Ib dominant modes of function).

      We absolutely agree that selective silencing by BoNT-C now enables a rigorous study of tonic vs phasic neurotransmission at MN-Is vs MN-Ib NMJs, and that we have not focused on this interesting question in the current manuscript. We have adopted the field’s convention of classifying the MN-Is and MN-Ib subtypes as “phasic” and “tonic” based on their apparent firing modes, but, like previous studies, we have not analyzed these functional distinctions in depth. Although the focus of the current manuscript was to establish the properties of BoNT-C and highlight its utility as a tool for the field, we are now preparing an entirely new manuscript focused on precisely this question about the differences between tonic and phasic synaptic physiology. This eight-figure manuscript, entitled “Electrophysiological properties and nanoscale distinctions that define tonic vs phasic glutamatergic synapses”, addresses the central question raised by the reviewer: how and why synaptic transmission differs between tonic and phasic inputs. While this question is outside the scope of the current manuscript, we will submit the new manuscript, which builds on experimental insights enabled by the selective BoNT-C silencing established here, within the next few months.

      Finally, since the authors show that loss of Is or Ib function does not cause any change in the other, we are left to wonder what actually DOES cause heterosynaptic plasticity. TNT or rpr DO cause some heterosynaptic plasticity and they also DO cause some structural changes - but whether the structural changes themselves are important here remains unclear. Substantial progress would have been to take the starting point that BoNT-C does not cause heterosynaptic plasticity, and then identify the signal that does (is it morphology? or signaling between Is and Ib? Or with the muscle?).

      We certainly agree with the reviewer that understanding how heterosynaptic plasticity is induced is an important question worthy of additional investigation. As stated above, the focus of our current study was to establish a tool, BoNT-C, that will now enable a variety of important future studies, both in understanding how and why synaptic strength differs between tonic and phasic synapses and in understanding how heterosynaptic plasticity signaling occurs at the NMJ. Establishing that BoNT-C cleanly silences transmission without inducing structural or functional plasticity required substantial time and experimental effort (Figures 1-7 and several supplemental figures). Respectfully, we believe it is unreasonable to expect all of these data to serve merely as a “starting point” for probing heterosynaptic plasticity in more detail, all within a single paper.

      It appears this reviewer is particularly interested in heterosynaptic plasticity, which we agree is a fascinating topic. First, we should clarify that in our experiments, TNT expression does NOT induce any heterosynaptic structural or functional plasticity (see Figure 6 and Table S2), at least at the m6/7, m12/13 and m4 NMJs we studied. Rather, TNT expression alters synaptic structure in the neuron in which it is expressed (“intrinsic structural plasticity”, Fig. 6) but does not induce any changes in the convergent input. Hence, the only evidence for actual heterosynaptic plasticity is the rather minor adaptation in synaptic structure and function observed following ablation of Is motor inputs (Fig. 6 and 8).

      In addition to the important insights revealed by BoNT-C in accurately distinguishing tonic vs phasic transmission outlined above, it appears that the reviewer does not fully appreciate the mechanistic constraints that the new BoNT-C tool reveals about heterosynaptic signaling. We would therefore like to highlight the key insights our study has revealed specifically about heterosynaptic plasticity. First, we show that at the muscle 6/7 NMJ, loss of MN-Ib completely eliminates Is innervation; this was not the finding reported in the 2020 study (Ib ablation was not reported in the 2021 study). Rather, Aponte-Santiago et al. 2020 reported that elimination of Ib did not trigger compensatory changes in active zone or bouton numbers of the Is input, nor were compensatory increases in the Is EPSP reported. This difference may be due to the confound of variable Is innervation at the muscle 1 and muscle 4 NMJs used in the previous studies. Second, the extent to which miniature transmission changes after manipulating activity at Is vs Ib could not be accurately assessed in previous studies, because spontaneous activity persists following TNT expression, as does innervation following rpr.hid expression. Third, and perhaps most important, ours is the only study able to demonstrate that no heterosynaptic functional plasticity is induced when one input remains physically present but has its neurotransmitter release functionally silenced, at NMJs consistently innervated by both Is and Ib inputs.

      It is clear to us now that we did not sufficiently emphasize the advances our study has revealed about the baseline and heterosynaptic relationships between Is and Ib. We have added details throughout the revised manuscript to highlight these insights so that readers can better appreciate the importance of this study.

      Overall, while an initial reading of the manuscript sounded rather exciting, a deeper analysis of the work in context of the literature of the last few years diminishes my enthusiasm for the novelty and progress provided.

      We have responded to the major criticisms raised by this reviewer above and hope that he/she/they can more fully appreciate the importance of the new tool we developed, the impact it will have on the field in opening new studies on tonic vs. phasic transmission, and establishing the rules of heterosynaptic plasticity between convergent tonic and phasic inputs on common targets.

    1. Author Response

      Reviewer #1 (Public Review):

      It should also be noted that their immunohistochemical studies of human fetal tissue for TBX5 and PTK7 are not convincing. There appears to be widespread staining of multiple cell types, suggesting either very broad expression of both genes or poor specificity of the primary antibodies.

      We appreciate the reviewer’s comment that the immunohistochemistry staining does not provide definitive evidence for the functional importance of TBX5 and PTK7 in PUV, however these images do confirm that the proteins are ‘in the right place at the right time’ during normal human urinary tract development. We have updated the discussion on page 19, line 441-445 to emphasise this. To further support a putative role for these proteins in urinary tract development we have added additional images from a second human embryo at the same gestation which confirms these distinct patterns of staining (Figure 8 – figure supplement 1 on page 14, lines 313-317). Even if these proteins can also be detected in other tissues or cell types, this does not detract from this idea, as in other locations the proteins may have redundant or different roles. 

      PUVs have not been described as a clinical manifestation of disease associated with mutations of either gene in humans.

      The reviewer is correct that rare variants affecting TBX5 and PTK7 have not previously been associated with PUV. They have however been associated with other developmental anomalies (as stated in the discussion on page 18, line 408-411 and page 19, line 434-437) confirming a clear role in embryonic development for both these genes.

      The fact that rare variant association testing did not identify an increased burden of rare, likely deleterious variants in these two genes (albeit with limited power in this cohort) suggests that PUV is not driven by ultra-rare, highly penetrant alleles in these genes. However, the identification of common and low-frequency variants by GWAS suggests a complex mode of inheritance for PUV, likely in combination with maternal/in utero factors. As with other complex traits, these signals provide potential insights into the underlying biology of this disease, as opposed to the diagnostic implications of conventional monogenic gene discovery in purely Mendelian conditions. A paragraph on the Mendelian/complex trait implications of the study’s findings has been incorporated into the discussion (page 21-22, line 594-502).

      Discuss how variants in either gene or in the patterns of structural variants that they found associated with PUV intersect with sex to result in this exclusively male condition.

      The fact that PUV is a uniquely male disease most likely results from sex differences in urethral and bladder development, including the difference in urethral length between males and females. Sex hormones may also result in tissue-specific differences in gene expression (Ober, Loisel, and Gilad 2008). We have added a paragraph to the discussion to clarify this (page 20, line 454-463) and, as suggested, clarified the results of the chromosome X and sex-specific analyses (page 7, lines 149-155; see also Reviewer 2, point 5 below).

      Reviewer #2 (Public Review):

      Major:

      1. The replication study is problematic given that different genotyping methods are used for cases (targeted KASP) versus controls (WGS). This may introduce differential bias. Moreover, the ancestry of the control cohort (UK-based) does not seem to be well matched to the cases (predominantly German and Polish), and the lack of genome-wide data for the cases precludes proper adjustment for population stratification. The case-control design is also imbalanced in the replication study. The authors should reconsider their replication strategy to include a more balanced cohort with ancestry-matched controls and uniform genotyping. As an alternative, genome-wide genotyping of the replication case cohort would significantly enhance the study and should be considered.

      Many thanks to the reviewer for their valuable comments regarding the replication study case-control cohort. Although different genotyping technologies were used to compare allele counts at the lead variants in the replication study (KASP genotyping for cases vs WGS for controls), both techniques exhibit > 99.5% accuracy and were subjected to variant-level quality control. Only individuals with reliably called genotypes were included in the replication analysis. This has been clarified in the methods section (page 30, line 693).

      We were able to obtain genome-wide genotyping data for 204 of the 395 European cases in the replication cohort. While (despite sustained effort on our part) we were unable to analyze this data jointly with the control cohort in the 100KGP due to enforced limitations on data sharing, we were able to demonstrate similar ancestry of the replication study cases and controls:  we performed PCA on a set of ~80,000 overlapping autosomal, high-quality, LD-pruned variants with MAF > 10% and projected the cases and controls separately onto (the same) data from the 1000 Genomes Project (Phase 3) labelled by ‘population’ (Figure 5). This clearly demonstrates that both cohorts have homogeneous European ancestry, as stated now in the results (page 8, lines 166-168).

      We note with thanks the reviewer’s comments regarding the case-control imbalance in the replication study, which can sometimes inflate the type I error rate. To address this, the case:control ratio was reduced from 1:27 to 1:10.5 by including only the 4,151 male controls from the cancer cohort of the 100KGP. The results remained significant for both lead variants and have been updated in the manuscript (page 8, line 162-176; Table 2).

      When the number of controls was reduced to 500 males (a case:control ratio of 1:1.3), rs10774740 (TBX5 locus) remained significant (P = 9.9×10⁻³; OR 0.77; 95% CI 0.63-0.94), demonstrating that case-control imbalance was not driving the observed signal. rs144171242 (PTK7 locus), however, did not reach significance owing to insufficient power (P = 0.06; OR 2.24; 95% CI 0.93-5.36). For a rare variant such as rs144171242 (MAF ~1%), a replication study with 500 controls is only powered to detect associations with large effect sizes (OR > 3.5). A case:control ratio of ~1:10 is therefore optimal for maximizing power to detect association while minimizing unnecessary noise from excess controls. This has been added to the results section of the manuscript (page 8-9, lines 178-184).
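
      To illustrate why a ratio of roughly 1:10 captures most of the available power, the sketch below uses the common effective-sample-size approximation for unbalanced case-control designs, N_eff = 4 / (1/N_cases + 1/N_controls). This is an illustrative calculation under that approximation, not the power analysis performed in the study, and the function name is hypothetical.

```python
def effective_sample_size(n_cases: float, n_controls: float) -> float:
    """Approximate effective sample size of an unbalanced case-control design.

    N_eff = 4 / (1/n_cases + 1/n_controls): equals the total sample size for a
    balanced design and approaches 4 * n_cases as controls grow without bound.
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


if __name__ == "__main__":
    n_cases = 395                 # European replication cases cited above
    ceiling = 4.0 * n_cases       # limit with unlimited controls
    for n_controls in (500, 4151):
        n_eff = effective_sample_size(n_cases, n_controls)
        print(f"{n_controls} controls: N_eff ~ {n_eff:.0f} "
              f"({n_eff / ceiling:.0%} of the {ceiling:.0f} ceiling)")
```

      Under this approximation, with 395 cases, 500 controls give N_eff of about 883 (roughly 56% of the 1,580 ceiling), whereas 4,151 controls give about 1,443 (roughly 91%), so adding controls beyond ~1:10 yields diminishing returns.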

      2. I am reassured that the TBX5 signal remains genome-wide significant in European-only analysis. However, the signal at PTK7 appears much less robust - it has borderline statistical significance (especially given that the authors test for all rare and common variants across the genome) and is represented by a single variant with a relatively rare risk allele that is differentially distributed by ancestry. Therefore, I would like to see more information for this specific signal:

      Information on the depth of coverage and the quality of the top variant

      This has been incorporated into the manuscript for both lead variants (Page 7, lines 142-145). For rs144171242 at the PTK7 locus, the meanDP was 29.34 and the meanGQ was 75.59.

      Information if the top PTK7 variant remain genome-wide significant after application of genomic control. Of note, the calculation of genomic inflation is dependent on sample size - lambda of 1.05 may represent an underestimate given low power of the cohort, and this point deserves at least a comment. Some methods correcting lambda for sample size have been proposed, and the authors should consider applying these methods.

      We appreciate the reviewer’s comments that the value of lambda may be affected by sample size and have added a comment to this in the manuscript (Page 7, line 136-137). Despite extensive searching, we were unable to find a recent published example of how to correct lambda for sample size and would be grateful if the reviewer could suggest a reference for this.

      To answer the reviewer’s specific question, applying genomic control to the lead variant at PTK7 gives P = 4.37×10⁻⁸, which remains below the conventional genome-wide significance threshold (5×10⁻⁸). However, while the genomic inflation factor provides a reasonable indication of possible confounding by population structure, there are recognized limitations to applying it as a correction factor: it assumes that all variants are equally confounded, i.e., the same correction is applied irrespective of differences in population allele frequency, which can be insufficient for some variants and cause a loss of power for others. Furthermore, in addition to sample size, lambda can vary with heritability and disease prevalence (Yang et al. 2011), so using it for correction can be too conservative and reduce power to detect significant associations. In this manuscript we therefore chose a mixed-model approach (as part of SAIGE, detailed in the methods on page 28, lines 647-648), which has largely superseded older methods such as genomic control, to robustly correct for both population structure and cryptic relatedness and minimize false-positive associations (Shin and Lee 2015).

      This locus requires more robust replication as discussed above. If more robust replication study is not possible, additional functional studies could provide more evidence in support of this locus.

      Please refer to point 1 regarding the revised and more robust evidence of replication. 

      3. There is no validation of sensitivity and specificity of SV detection by variant size or type (e.g. inversions, deletions, duplications). Also, since burden differences are not replicated independently, the authors should stress the exploratory nature of these analyses.

      We appreciate the reviewer’s comment that there is no independent validation of SV detection (e.g., by microarray or long-read sequencing) and this was reported as a limitation of our study in the discussion (page 22-23, line 520-524). However, one of the main strengths of this study is the use of clinical-grade WGS data where all samples have been sequenced on the same platform and undergone variant calling using the same bioinformatics pipeline. This essentially eliminates confounding due to differences in data generation and processing and the sensitivity and specificity of SV detection will therefore be the same for both cases and controls.

      We agree with the reviewer that the SV analyses have not yet been replicated independently and, as they suggest, have stressed the exploratory nature of the findings in the discussion (page 21, line 491-493).

      In the discussion (especially second paragraph, but also throughout), the authors overemphasize multi-ancestry nature of their study. The reality is that the included non-Europeans are very small in numbers (18 SAS cases, 11 AFR cases, and 14 admixed cases). I would suggest for the authors to specifically state these case counts and make it clear that expanded efforts to recruit non-Europeans are still needed given these very low numbers.

      We appreciate the reviewer’s comment about the overemphasis on the multi-ancestry nature of the study and the small absolute numbers of non-European individuals included. However, as a proportion of the cohort, a third of the cases are non-European: 14% are of South Asian ancestry, 8% are of African ancestry and 11% are admixed. This is a greater proportion of individuals of non-white European ancestry than in the UK as a whole (DOI: 10.5257/census/aggregate-2001-2), where the discovery cohort was based. Our study therefore mitigates at least some of the Euro-centric bias present in the existing genetic and genomic literature, at least as far as the UK population is concerned. Clearly, global studies fairly representing all populations would be needed to address this issue fully. The case counts were reported in Table 1, but we now also refer to the low absolute numbers and include the reviewer’s suggestion about expanding efforts to recruit non-European populations in the main text (page 22, line 518-520). We have also edited paragraph two of the discussion in response to the reviewer’s comments (page 17, line 387-398).

      Supplemental figure 2 -provide case-control counts in each ancestral group (Y axis).

      These have been added to the figure legend of Figure 6 – supplemental figure 4 (previously Figure 5 - supplemental figure 2).

      Supplemental figure 3 is misleading since allelic frequencies in the cases are pooled and are not available individually for all depicted populations.

      Figure 5 - supplemental figure 3 has been removed and replaced by Figure 6 – supplemental figure 3 to show only the individual case, control and gnomAD AF by ancestry for AFR, SAS and EUR population groups instead of using the pooled allele frequencies.

      5. I did not see details of chr. X analysis. This is important given that the case group involves only Males and control group involves both Males and Females. Also, please explain how sex was used as a fixed effect (as stated in the methods) given that the case cohort is 100% male.

      We thank the reviewer for their insightful comments. Sex was used as a covariate (fixed effect) to control for the anatomical differences in urethral development (and in utero hormonal changes) between the sexes in the control cohort (clarified in the methods, page 28, lines 651-653). Given that the PheWAS findings (page 13, line 292-297) reveal an association between the lead variant near TBX5 and female genital prolapse and urinary incontinence, women may manifest other lower urinary tract phenotypes even though they do not develop PUV (owing to differences in urethral development). In theory, removing female individuals from the control cohort should therefore strengthen the association, as the signal would no longer be diluted by ‘affected’ women (i.e., those with potentially unknown lower urinary tract phenotypes). We tested this by performing a male-only GWAS and found that the strength of association at both lead variants increased. These results have been added to the manuscript (page 7, line 149-155).

      The results of the chromosome X rare variant analysis are shown on the Manhattan plot (Figure 9), with no significant genes identified. We have added chromosome X to the mixed-ancestry and European GWAS as suggested (with no significant results) and the Manhattan and Q-Q plots have been updated in Figure 2 and Figure 6. The number of analyzed variants in each analysis has also been updated accordingly.

    1. Author Response

      Reviewer #2 (Public Review):

      Feeding behaviour in C. elegans has been extensively studied over decades. Several methods of measuring feeding exist, but none can directly measure both pumping and locomotion behaviour in freely-moving worm populations. The authors have developed a new imaging-based method for automated detection of pharyngeal pumping events in freely moving C. elegans populations, and can thus simultaneously measure pumping and locomotion behaviour in tens of worms, at a single-worm, single-pump resolution that is not possible with previous methods. This user-friendly method can be applied to several research directions, such as large-scale foraging, behavioural coordination, and high-throughput screening.

      The authors designed their new method to be broadly applicable and user-friendly, for easy adaptation in other research labs. However, adding direct evidence to show that "the method is relatively insensitive to the optical instrument used" will better support this claim of wider application.

      We appreciate the reviewer’s suggestion to show evidence that our method also works on data acquired with different microscopes. We now present data obtained on a second epifluorescence microscope, which were downscaled and analyzed in Fig. 1H-J.

      The authors carefully benchmarked their new method against expert annotations and existing results from previous methods, to both validate their method and reveal additional advantages. They also assessed potential pitfalls of the method, such as by examining the effect of fluorescence imaging on the behavioural outcome, albeit only at the timescale of minutes. The effect of longer-term fluorescence imaging should be further explored, which is relevant for the large-scale foraging experiments that the authors discussed. It could be helpful to determine the maximum total exposure for the method to still be valid, both in terms of pump detection (which could be sensitive to photobleaching) and behavioural modulation (which could be sensitive to higher phototoxicity).

      We thank the reviewer for this comment. In response to this comment and related comments by the other reviewers, we have provided bleaching curves and evidence of long-term imaging to show the potential of the method for longer-scale assays. We found that with our illumination intensity (see Methods), bleaching was significant on a time scale of ~1 h. We then added triggered illumination and could extend the recording time to ~5 h (Methods). Additionally, we performed a supplementary viability control for worms exposed to continuous (not triggered) light for 5 h. We did not observe any apparent phototoxic effect.

      Overall, the manuscript is well-written and the results are clearly presented both in terms of statistics and interpretation. Methodological details are well-documented and openly accessible.

      We thank the reviewer for their positive view of our work and their appreciation for our efforts to document both data and software.

      Reviewer #3 (Public Review):

      In this manuscript, the authors present a method for simultaneous assessment of pharyngeal pumping (feeding) and locomotion in many C. elegans at once. In this technique, imaging of the fluorescently labeled pharynx provides a measure of velocity and pumping rate, through analysis of the spatial variations in fluorescence.

      The technique is clearly described, well-validated, and yields some novel results. It has the advantage that it can be performed using microscopes found in many C. elegans laboratories.

      We appreciate that the reviewer recognizes the wide applicability of the method across many C. elegans laboratories.

      Some limitations of the method include its reliance on fluorescence imaging, which is a hindrance to genetic analysis, its computational intensiveness, and phototoxic effects of fluorescence excitation that are not fully explored in the manuscript.

      The authors show the utility of their method by assessing pharyngeal pumping and motor behavior (1) during development, (2) in the presence or absence of food, and (3) in the presence of two mutations affecting feeding.

      Although I understand these are proof-of-principle demonstrations, I still came away feeling underwhelmed by these examples. I did not see any results here that could not have been obtained fairly easily with conventional techniques.

      We appreciate the constructive criticism of the reviewer and highlight in the revised version the fact that, using conventional techniques, such studies would require tens of hours of experiment time. We would like to emphasize the comparisons in Table 1, where we show other methods and their current limitations. Obtaining a dataset such as in Figure 3, which comprises a total of 34 worm-hours of pumping observation from unrestrained animals, is to our knowledge currently impractical with competing methods. We would also like to remind the reviewer that, using our method, we were able to reveal bimodal distributions within a population, as illustrated, for instance, in Fig. 3F, 4B, and 4F. These observations are not possible when single-worm resolution is not accessible or when large statistics are not feasible, as is the case with previous methods.

      Given these limitations, I feel the method's eventual impact in the field will be relatively small.

      In this study, we present a method that allows behavioral studies on worm populations at high throughput and reduced cost. Such a technique opens the door to many laboratories that cannot do EPG recordings or microfluidics due to the technical difficulties, or that want to study animals in their normal plate context. We would also like to emphasize that there are already more than 1500 strains containing a myo-2 promoter transgene available at the CGC, which would be amenable to our imaging approach. These transgenic strains span broad classes of interest, such as thermotolerance, ER stress resistance, aging and neural-circuit-specific genes.

      Pharyngeal pumping has also been used as a read-out for pharmacological screens; for example, bacteria pre-loaded with pharmacological agents are tested for their effect on pharyngeal pumping rate. Pharaglow offers a high-throughput and sensitive method to measure the pumping rate. This will benefit researchers who use C. elegans pumping for pharmacological screens, and pave the way for those who plan to use it but are hindered by existing techniques.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors evaluate the involvement of the hippocampus in a fast-paced time-to-contact estimation task. They find that the hippocampus is sensitive to feedback received about accuracy on each trial and has activity that tracks behavioral improvement from trial to trial. Its activity is also related to a tendency for time estimation behavior to regress to the mean. This is a novel paradigm to explore hippocampal activity and the results are thus novel and important, but the framing as well as discussion about the meaning of the findings obscures the details of the results or stretches beyond them in many places, as detailed below.

      We thank the reviewer for their constructive feedback and were happy to read that s/he considered our approach and results as novel and important. The comments led us to conduct new fMRI analyses, to clarify various unclear phrasings regarding our methods, and to carefully assess our framing of the interpretation and scope of our results. Please find our responses to the individual points below.

      1) Some of the results appear in the posterior hippocampus and others in the anterior hippocampus. The authors do not motivate predictions for anterior vs. posterior hippocampus, and they do not discuss differences found between these areas in the Discussion. The hippocampus is treated as a unitary structure carrying out learning and updating in this task, but the distinct areas involved motivate a more nuanced picture that acknowledges that the same populations of cells may not be carrying out the various discussed functions.

      We thank the reviewer for pointing this out. We split the hippocampus into anterior and posterior sections because prior work suggested a different whole-brain connectivity and function of the two. This was mentioned in the methods section (page 15) in the initial submission but unfortunately not in the main text. Moreover, when discussing the results, we did indeed refer mostly to the hippocampus as a unitary structure for simplicity and readability, and because statements about subcomponents are true for the whole. However, we agree with the reviewer that the differences between anterior and posterior sections are very interesting, and that describing these effects in more detail might help to guide future work more precisely.

      In response to the reviewer's comment, we therefore clarified at various locations throughout the manuscript whether the respective results were observed in the posterior or anterior section of the hippocampus, and we extended our discussion to reflect the idea that different functions may be carried out by distinct populations of hippocampal cells. In addition, we also now motivate the split into the different sections better in the main text. We made the following changes.

      Page 3: “Second, we demonstrate that anterior hippocampal fMRI activity and functional connectivity track the behavioral feedback participants received in each trial, revealing a link between hippocampal processing and timing-task performance.”

      Page 3: “Fourth, we show that these updating signals in the posterior hippocampus were independent of the specific interval that was tested and activity in the anterior hippocampus reflected the magnitude of the behavioral regression effect in each trial.”

      Page 5: “We performed both whole-brain voxel-wise analyses as well as regions-of-interest (ROI) analysis for anterior and posterior hippocampus separately, for which prior work suggested functional differences with respect to their contributions to memory-guided behavior (Poppenk et al., 2013, Strange et al. 2014).”

      Page 9: “Because anterior and posterior sections of the hippocampus differ in whole-brain connectivity as well as in their contributions to memory-guided behavior (Strange et al. 2014), we analyzed the two sections separately.”

      Page 9: “We found that anterior hippocampal activity as well as functional connectivity reflected the feedback participants received during this task, and its activity followed the performance improvements in a temporal-context-dependent manner. Its activity reflected trial-wise behavioral biases towards the mean of the sampled intervals, and activity in the posterior hippocampus signaled sensorimotor updating independent of the specific intervals tested.”

      Page 10: “Intriguingly, the mechanisms at play may build on similar temporal coding principles as those discussed for motor timing (Yin & Troger, 2011; Eichenbaum, 2014; Howard, 2017; Palombo & Verfaellie, 2017; Nobre & van Ede, 2018; Paton & Buonomano, 2018; Bellmund et al., 2020, 2021; Shikano et al., 2021; Shimbo et al., 2021), with differential contributions of the anterior and posterior hippocampus. Note that our observation of distinct activity modulations in the anterior and posterior hippocampus suggests that the functions and coding principles discussed here may be mediated by at least partially distinct populations of hippocampal cells.”

      Page 11: Interestingly, we observed that functional connectivity of the anterior hippocampus scaled negatively (Fig. 2C) with feedback valence [...]

      2) Hippocampal activity is stronger for smaller errors, which makes the interpretation more complex than the authors acknowledge. If the hippocampus is updating sensorimotor representations, why would its activity be lower when more updating is needed?

      Indeed, we found that absolute (univariate) activity of the hippocampus scaled with feedback valence, the inverse of error (Fig. 2A). We see multiple possibilities for why this might be the case, and we discussed some of them in a dedicated discussion section (“The role of feedback in timed motor actions”). For example, prior work showed that hippocampal activity also reflects behavioral feedback in other tasks, which has been linked to learning (e.g. Schönberg et al., 2007; Cohen & Ranganath, 2007; Shohamy & Wagner, 2008; Foerde & Shohamy, 2011; Wimmer et al., 2012). In our understanding, sensorimotor updating is a form of ‘learning’ in an immediate and behaviorally adaptive manner, and we therefore consider our results consistent with this earlier work. We agree with the reviewer that in principle activity should be stronger if there was stronger sensorimotor updating, but we note that this intuition rests on an assumption about the relationship between hippocampal neural activity and the BOLD signal, which is not entirely clear. For example, prior work revealed spatially informative negative BOLD responses in the hippocampus as a function of visual stimulation (e.g. Szinte & Knapen 2020), and the effects of inhibitory activity, a leading motif in the hippocampal circuitry, on fMRI data are not fully understood. This raises the possibility that the feedback modulation we observed might also involve negative BOLD responses, which would then translate into the observed decrease of the hippocampal fMRI signal with increasing error, even if the magnitude of the underlying updating mechanism was positively correlated with error. This complicates the interpretation of the direction of the effect, which is why we chose to avoid drawing strong conclusions about it in our manuscript. Instead, we tried to discuss our results in a way that was agnostic to the direction of the feedback modulation.

      Importantly, hippocampal connectivity with other regions did scale positively with error (Fig. 2B), which we again discussed in the dedicated discussion section.

      In response to the reviewer’s comment, we revisited this section of our manuscript and felt the latter result deserved a better discussion. We therefore took this opportunity to extend our discussion of the connectivity results (including their relationship to the univariate-activity results as well as the direction of these effects), while still avoiding strong conclusions about directionality. The following changes were made to the manuscript.

      Page 11: Interestingly, we observed that functional connectivity of the anterior hippocampus scaled negatively (Fig. 2C) with feedback valence, unlike its absolute activity, which scaled positively with feedback valence (Fig. 2A,B), suggesting that the two measures may be sensitive to related but distinct processes.

      Page 11: Such network-wide receptive-field re-scaling likely builds on a re-weighting of functional connections between neurons and regions, which may explain why anterior hippocampal connectivity correlated negatively with feedback valence in our data. Larger errors may have led to stronger re-scaling, which may be grounded in a corresponding change in functional connectivity.

      3) Some tests were one-tailed without justification, which reduces confidence in the robustness of the results.

      We thank the reviewer for pointing us to the fact that our choice of statistical tests was not always clear in the manuscript. In the analysis the reviewer is referring to, we predicted that stronger sensorimotor updating should lead to stronger activity as well as larger behavioral improvements across the respective trials. This is because a stronger update should translate to a more accurate “internal model” of the task and therefore to a better performance. We tested this one-sided hypothesis using the appropriate test statistic (contrasting trials in which behavioral performance did improve versus trials in which it did not improve), but we did not motivate our reasoning well enough in the manuscript. The revised manuscript therefore includes the two new statements shown below to motivate our choice of test statistic more clearly.

      Page 7: [...] we contrasted trials in which participants had improved versus the ones in which they had not improved or got worse (see methods for details). Because stronger sensorimotor updating should lead to larger performance improvements, we predicted stronger activity for improvements vs. no improvements in these tests (one-tailed hypothesis).

      Page 18: These two regressors reflect the tests for target-TTC-independent and target-TTC-specific updating, respectively. Because we predicted stronger activity for improvements vs. no improvements in behavioral performance, we performed one-tailed statistical tests here, consistent with the direction of this hypothesis. Improvement in performance was defined as receiving feedback of higher valence than in the corresponding previous trial.
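      The improvement definition in the methods passage quoted above can be made concrete with a short sketch. It rests on assumptions that the response does not state: feedback is coded as an ordinal level (1 = low, 2 = medium, 3 = high accuracy), the "corresponding previous trial" is taken to be the immediately preceding trial for the target-TTC-independent test and the most recent trial with the same target TTC for the target-TTC-specific test, and `label_improvements` is a hypothetical helper, not the authors' analysis code.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# True = improved, False = not improved (equal or worse), None = no comparison trial
Label = Optional[bool]


def label_improvements(
    feedback: Sequence[int], target_ttc: Sequence[Hashable]
) -> Tuple[List[Label], List[Label]]:
    """Label each trial as improved or not, relative to two reference trials.

    Hypothetical coding: feedback is 1 (low), 2 (medium) or 3 (high accuracy).
    """
    independent: List[Label] = []
    specific: List[Label] = []
    last_by_ttc: Dict[Hashable, int] = {}
    for i, (fb, ttc) in enumerate(zip(feedback, target_ttc)):
        # Target-TTC-independent: compare with the immediately preceding trial.
        independent.append(None if i == 0 else fb > feedback[i - 1])
        # Target-TTC-specific: compare with the last trial of the same target TTC.
        prev = last_by_ttc.get(ttc)
        specific.append(None if prev is None else fb > prev)
        last_by_ttc[ttc] = fb
    return independent, specific


# Toy sequence with two hypothetical target TTCs (in seconds).
independent, specific = label_improvements([1, 2, 2, 3, 1], [0.6, 1.2, 0.6, 1.2, 0.6])
print(independent)  # [None, True, False, True, False]
print(specific)     # [None, None, True, True, False]
```

      Under this coding, a trial with the same feedback level as its reference trial counts as "not improved", which matches the contrast of improved trials against trials that did not improve or got worse.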

      4) The introduction motivates the novelty of this study based on the idea that the hippocampus has traditionally been thought to be involved in memory at the scale of days and weeks. However, as is partially acknowledged later in the Discussion, there is an enormous literature on hippocampal involvement in memory at a much shorter timescale (on the order of seconds). The novelty of this study is not in the timescale as much as in the sensorimotor nature of the task.

      We thank the reviewer for this helpful suggestion. We agree that a key part of the novelty of this study is the use of a task that is typically used to study sensorimotor integration and timing rather than hippocampal processing, along with the new insights this task enabled about the role of the hippocampus in sensorimotor updating. As mentioned in the discussion, we also agree with the reviewer that there is prior literature linking hippocampal activity to mnemonic processing on short time scales. We therefore rephrased the corresponding section in the introduction to put more weight on the sensorimotor nature of our task instead of the time scales.

      Note that the new statement still includes the time scale of the effects, but that it is no longer at the center of the argument. We chose to keep it because we do think that the majority of studies on hippocampal-dependent memory functions focus on longer time scales than our study does, and we expect that many readers will be surprised by the immediacy with which hippocampal activity relates to ongoing behavioral performance (on ultrashort time scales).

      We changed the introduction to the following.

      Page 2: Here, we approach this question with a new perspective by converging two parallel lines of research centered on sensorimotor timing and hippocampal-dependent cognitive mapping. Specifically, we test how the human hippocampus, an area often implicated in episodic-memory formation (Schiller et al., 2015; Eichenbaum, 2017), may support the flexible updating of sensorimotor representations in real time and in concert with other regions. Importantly, the hippocampus is not traditionally thought to support sensorimotor functions, and its contributions to memory formation are typically discussed for longer time scales (hours, days, weeks). Here, however, we characterize in detail the relationship between hippocampal activity and real-time behavioral performance in a fast-paced timing task, which is traditionally believed to be hippocampal-independent. We propose that the capacity of the hippocampus to encode statistical regularities of our environment (Doeller et al. 2005, Shapiro et al. 2017, Behrens et al., 2018; Momennejad, 2020; Whittington et al., 2020) situates it at the core of a brain-wide network balancing specificity vs. regularization in real time as the relevant behavior is performed.

      5) The authors used three different regressors for the three feedback levels, as opposed to a parametric regressor indexing the level of feedback. The predictions are parametric, so a parametric regressor would be a better match, and would allow for the use of all the medium-accuracy data.

      The reviewer raises a good point that overlaps with question 3 by reviewer 2. In the current analysis, we model the three feedback levels with three independent regressors (high, medium, low accuracy). We then contrast high vs. low accuracy feedback, obtaining the results shown in Fig. 2AB. The beta estimates obtained for medium-accuracy feedback are ignored in this contrast. Following the reviewer’s feedback, we therefore re-ran the model, this time modeling all three feedback levels in one parametric regressor. All other regressors in the model stayed the same. Instead of contrasting high vs. low accuracy feedback, we then performed voxel-wise t-tests on the beta estimates obtained for the parametric feedback regressor.

      The results we observed were highly consistent across the two analyses, and all conclusions presented in the initial manuscript remain unchanged. While the exact t-scores differ slightly, we replicated the effects for all clusters on the voxel-wise map (on whole-brain FWE-corrected levels) as well as for the regions-of-interest analysis for anterior and posterior hippocampus. These results are presented in a new Supplementary Figure 3C.

      Note that the new Supplementary Figure 3B shows another related new analysis we conducted in response to question 4 of reviewer 2. Here, we re-ran the initial analysis with three feedback regressors, but without modeling the inter-trial interval (ITI) and the inter-session interval (ISI, i.e. the breaks participants took) to avoid model over-specification. Again, we replicated the results for all clusters and the ROI analysis, showing that the initial results we presented are robust.

      The following additions were made to the manuscript.

      Page 5: Note that these results were robust even when fewer nuisance regressors were included to control for model over-specification (Fig. S3B; two-tailed one-sample t tests: anterior HPC, t(33) = -3.65, p = 8.9 × 10⁻⁴, pfwe = 0.002, d=-0.63, CI: [-1.01, -0.26]; posterior HPC, t(33) = -1.43, p = 0.161, pfwe = 0.322, d=-0.25, CI: [-0.59, 0.10]), and when all three feedback levels were modeled with one parametric regressor (Fig. S3C; two-tailed one-sample t tests: anterior HPC, t(33) = -3.59, p = 0.002, pfwe = 0.005, d=-0.56, CI: [-0.93, -0.20]; posterior HPC, t(33) = -0.99, p = 0.329, pfwe = 0.659, d=-0.17, CI: [-0.51, 0.17]). Further, there was no systematic relationship between subsequent trials on a behavioral level [...]

      Page 17: Moreover, instead of modeling the three feedback levels with three independent regressors, we repeated the analysis modeling the three feedback levels as one parametric regressor with three levels. All other regressors remained unchanged, and the model included the regressors for ITIs and ISIs. We then conducted t-tests implemented in SPM12 using the beta estimates obtained for the parametric feedback regressor (Fig. 2C). Compared to the initial analyses presented above, this has the advantage that medium-accuracy feedback trials are considered for the statistics as well.
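      The difference between the two modeling choices in point 5 can be illustrated by fitting both designs to the same trial-wise toy data. This is a deliberately simplified sketch, not the SPM12 model used in the study: it ignores HRF convolution and all nuisance regressors, assumes feedback levels coded 1, 2, 3, uses hypothetical noise-free data, and `fit_feedback_models` is a hypothetical helper. The parametric regressor is mean-centred here, which is a common convention. When the response is linear in the level code, the high vs. low contrast equals twice the parametric slope, one way to see why the two analyses can agree.

```python
import numpy as np


def fit_feedback_models(levels, y):
    """Fit two simplified trial-wise GLMs to the same data.

    levels: feedback level per trial (1 = low, 2 = medium, 3 = high).
    y: response per trial (e.g. a hypothetical trial-wise activity estimate).
    Returns (high_minus_low_contrast, parametric_slope).
    """
    levels = np.asarray(levels)
    y = np.asarray(y, dtype=float)

    # Model A: one indicator regressor per feedback level. No separate intercept,
    # because the three indicator columns already sum to a constant column.
    X_cat = np.column_stack([(levels == k).astype(float) for k in (1, 2, 3)])
    beta_cat, *_ = np.linalg.lstsq(X_cat, y, rcond=None)
    contrast = beta_cat[2] - beta_cat[0]  # high vs. low; the medium beta is unused

    # Model B: constant term plus one mean-centred parametric regressor,
    # so a single slope is estimated across all three feedback levels.
    modulator = levels - levels.mean()
    X_par = np.column_stack([np.ones_like(modulator, dtype=float), modulator])
    beta_par, *_ = np.linalg.lstsq(X_par, y, rcond=None)
    slope = beta_par[1]
    return contrast, slope


# Hypothetical, unbalanced trial counts: 10 low, 15 medium, 8 high.
levels = np.repeat([1, 2, 3], [10, 15, 8])
y = 2.0 + 0.5 * levels  # noise-free toy data, linear in the level code
contrast, slope = fit_feedback_models(levels, y)
print(round(float(contrast), 6), round(float(slope), 6))  # 1.0 0.5
```

      With real, noisy data the two estimates would only agree approximately, and a non-linear pattern across levels (for example, medium feedback not lying between low and high) would be captured by the categorical model but not by a single linear slope.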

      6) The authors claim that the results support the idea that the hippocampus is finding an "optimal trade-off between specificity and regularization". This seems overly speculative given the results presented.

      We understand the reviewer's skepticism about this statement and agree that the manuscript does not show that the hippocampus is finding the trade-off between specificity and regularization. However, this is also not exactly what the manuscript claims. Instead, it suggests that the hippocampus “may contribute” to solving this trade-off (page 3) as part of a “brain-wide network” (pages 2, 3, 9, 12). We also state that “Our [...] results suggest that this trade-off [...] is governed by many regions, updating different types of task information in parallel” (Page 11). To us, these phrasings are not equivalent, because we do not think that the role of the hippocampus in sensorimotor updating (or in any process, really) can be understood independently from the rest of the brain. We do, however, think that our results are in line with the idea that the hippocampus contributes to solving this trade-off, and that this is exciting and surprising given the sensorimotor nature of our task, the ultrashort time scale of the underlying process, and the relationship to behavioral performance. We tried to express that some of the points discussed remain speculative, but it seems that we were not always successful in doing so in the initial submission. We apologize for the misunderstanding, have adapted the corresponding statements in the manuscript, and now express even more carefully that these ideas are speculative.

      The following changes were made to the introduction and discussion.

      Page 2: Here, we approach this question with a new perspective by converging two parallel lines of research centered on sensorimotor timing and hippocampal-dependent cognitive mapping. Specifically, we test how the human hippocampus, an area often implicated in episodic-memory formation (Schiller et al., 2015; Eichenbaum, 2017), may support the flexible updating of sensorimotor representations in real time and in concert with other regions.

      Page 12: Because hippocampal activity (Julian & Doeller, 2020) and the regression effect (Jazayeri & Shadlen, 2010) were previously linked to the encoding of (temporal) context, we reasoned that hippocampal activity should also be related to the regression effect directly. This may explain why hippocampal activity reflected the magnitude of the regression effect as well as behavioral improvements independently from TTC, and why it reflected feedback, which informed the updating of the internal prior.

      Page 12: This is in line with our behavioral results, showing that TTC-task performance became more optimal in the face of both of these two objectives. Over time, behavioral responses clustered more closely between the diagonal and the average line in the behavioral response profile (Fig. 1B, S1G), and the TTC error decreased over time. While different participants approached these optimal performance levels from different directions, either starting with good performance or strong regularization, the group approached overall optimal performance levels over the course of the experiment.

      Page 13: This is in line with the notion that the hippocampus [...] supports finding an optimal trade-off between specificity and regularization along with other regions. [...] Our results show that the hippocampus supports rapid and feedback-dependent updating of sensorimotor representations, suggesting that it is a central component of a brain-wide network balancing task specificity vs. regularization for flexible behavior in humans.

      Note that in response to comment 1 by reviewer 2, the revised manuscript now reports the results of additional behavioral analyses that support the notion that participants find an optimal trade-off between specificity and regularization over time (independent of whether the hippocampus was involved or not).

      7) The authors find that hippocampal activity is related to behavioral improvement from the prior trial. This seems to be a simple learning effect (participants can learn plenty about this task from a prior trial that does not have the exact same timing as the current trial) but is interpreted as sensitivity to temporal context. The temporal context framing seems too far removed from the analyses performed.

      We agree with the reviewer that our observation that hippocampal activity reflects TTC-independent behavioral improvements across trials could have multiple explanations. Critically, i) one of them is that the hippocampus encodes temporal context, ii) it is only one of multiple observations that we build our interpretation on, and iii) our interpretation builds on multiple earlier reports.

      Interval estimates regress toward the mean of the sampled intervals, an effect that is often referred to as the “regression effect”. This effect, which we observed in our data too (Fig. 1B), has been proposed to reflect the encoding of temporal context (e.g. Jazayeri & Shadlen 2010). Moreover, there is a large body of literature on how the hippocampus may support the encoding of spatial and temporal context (e.g. see Bellmund, Polti & Doeller 2020 for review).

      Because both hippocampal activity and the regression effect were linked to the encoding of (temporal) context, we reasoned that hippocampal activity should also be related to the regression effect directly. If so, one would expect that hippocampal activity should reflect behavioral improvements independently from TTC, it should reflect the magnitude of the regression effect, and it should generally reflect feedback, because it is the feedback that informs the updating of the internal prior.

      All three observations may indeed have independent explanations, but they are all also in line with the idea that the hippocampus encodes temporal context and that this explains the relationship between hippocampal activity and the regression effect. In our opinion, it therefore offers a parsimonious and reasonable explanation, even though it necessarily remains an interpretation. Of course, we want to be clear about what our results are and what our interpretations are.

      In response to the reviewer’s comment, we therefore toned down two of the statements that mention temporal context in the manuscript, and we removed an overly speculative statement from the result section. In addition, the discussion now describes more clearly how our results are in line with this interpretation.

      Abstract: This is in line with the idea that the hippocampus supports the rapid encoding of temporal context even on short time scales in a behavior-dependent manner.

      Page 13: This is in line with the notion that the hippocampus encodes temporal context in a behavior-dependent manner, and that it supports finding an optimal trade-off between specificity and regularization along with other regions.

      Page 12: Because hippocampal activity (Julian & Doeller, 2020) and the regression effect (Jazayeri & Shadlen, 2010) were previously linked to the encoding of (temporal) context, we reasoned that hippocampal activity should also be related to the regression effect directly. This may explain why hippocampal activity reflected the magnitude of the regression effect as well as behavioral improvements independently from TTC, and why it reflected feedback, which informed the updating of the internal prior.

      The following statement was removed, overlapping with comment 2 by Reviewer 3:

      Instead, these results are consistent with the notion that hippocampal activity signals the updating of task-relevant sensorimotor representations in real-time.

      8) I am not sure the term "extraction of statistical regularities" is appropriate. The term is typically used for more complex forms of statistical relationships.

      We agree with the reviewer that this expression may be interpreted differently by different readers and are grateful to be pointed to this fact. We therefore removed it and instead added the following (hopefully less ambiguous) statement to the manuscript.

      Page 9: This study investigated how the human brain flexibly updates sensorimotor representations in a feedback-dependent manner in the service of timing behavior.

      Reviewer #2 (Public Review):

      The authors conducted a study involving functional magnetic resonance imaging and a time-to-contact estimation paradigm to investigate the contribution of the human hippocampus (HPC) to sensorimotor timing, with a particular focus on the involvement of this structure in specific vs. generalized learning. Suggestive of the former, it was found that HPC activity reflected time interval-specific improvements in performance while in support of the latter, HPC activity was also found to signal improvements in performance, which were not specific to the individual time intervals tested. Based on these findings, the authors suggest that the human HPC plays a key role in the statistical learning of temporal information as required in sensorimotor behaviour.

      By considering two established functions of the HPC (i.e., temporal memory and generalization) in the context of a domain that is not typically associated with this structure (i.e., sensorimotor timing), this study is potentially important, offering novel insight into the involvement of the HPC in everyday behaviour. There is much to like about this submission: the manuscript is clearly written and well-crafted, the paradigm and analyses are well thought out and creative, the methodology is generally sound, and the reported findings push us to consider HPC function from a fresh perspective. A relative weakness of the paper is that it is not entirely clear to what extent the data, at least as currently reported, reflects the involvement of the HPC in specific and generalized learning. Since the authors' conclusions centre around this observation, clarifying this issue is, in my opinion, of primary importance.

      We thank the reviewer for these positive and extremely helpful comments, which we address in detail below. In response to these comments, the revised manuscript clarifies why the observed performance improvements are not at odds with the idea that an optimal trade-off between specificity and regularization is found, and how the time course of learning relates to those reported in previous literature. In addition, we conducted two new fMRI analyses, which show that our conclusions remain unchanged when feedback is modeled with one parametric regressor and when the number of nuisance regressors is reduced to control for over-parameterization of the model. Please find our responses underneath each individual point below.

      1) Throughout the manuscript, the authors discuss the trade-off between specific and generalized learning, and point towards Figure S1D as evidence for this (i.e., participants with higher TTC accuracy exhibited a weaker regression effect). What appears to be slightly at odds with this, however, is the observation that the deviation from true TTC decreased with time (Fig S1F) as the regression line slope approached 0.5 (Fig S1E) - one would have perhaps expected the opposite i.e., for deviation from true TTC to increase as generalization increases. To gain further insight into this, it would be helpful to see the deviation from true TTC plotted for each of the four TTC intervals separately and as a signed percentage of the target TTC interval (i.e., (+) or (-) deviation) rather than the absolute value.

      We thank the reviewer for raising this important question and for the opportunity to elaborate on the relationship between the TTC error and the magnitude of the regression effect in behavior. Indeed, we see that the regression slopes approach 0.5 and that the TTC error decreases over the course of the experiment. We do not think that these two observations are at odds with each other for the following reasons:

      First, the reviewer is correct that the deviation from the true TTC should increase as “generalization increases”, but that is not what we found. It was not the magnitude of regularization per se that increased over time; rather, overall task performance became more optimal in the face of both objectives: specificity and generalization. This optimum lies at a regression-line slope of 0.5. Generalization (or regularization, as we refer to it in the present manuscript) therefore did not increase per se at the group level.

      Second, the regression slopes approached 0.5 on the group-level, but the individual participants approached this level from different directions: Some of them started with a slope value close to 1 (high accuracy), whereas others started with a slope value close to 0 (near full regression to the mean). Irrespective of which slope value they started with, over time, they got closer to 0.5 (Rebuttal Figure 1A). This can also be seen in the fact that the group-level standard deviation in regression slopes becomes smaller over the course of the experiment (Rebuttal Figure 1B, SFig 1G). It is therefore not generally the case that the regression effect becomes stronger over time, but that it becomes more optimal for longer-term behavioral performance, which is then also reflected in an overall decrease in TTC error. Please see our response to the reviewer’s second comment for more discussion on this.
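
      To make the slope measure concrete, the sketch below computes the ordinary least-squares slope of estimated on true TTC for one participant, where 1 corresponds to veridical responses (the diagonal in Fig. 1B) and 0 to full regression to the mean, together with the between-participant standard deviation of slopes used to assess convergence. It is a minimal illustration in plain Python, not the analysis code used in the study; the function names and the target TTC values are hypothetical.

```python
from statistics import mean, stdev


def regression_slope(true_ttc, estimated_ttc):
    """OLS slope of estimated TTC regressed on true TTC for one participant.

    1.0 = veridical responses (the diagonal); 0.0 = every response equals
    the grand mean (full regression to the mean).
    """
    mx, my = mean(true_ttc), mean(estimated_ttc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_ttc, estimated_ttc))
    sxx = sum((x - mx) ** 2 for x in true_ttc)
    return sxy / sxx


def slope_dispersion(slopes):
    """Between-participant SD of slopes within one task segment (needs >= 2 values)."""
    return stdev(slopes)


# Hypothetical target TTCs in seconds, for illustration only.
targets = [0.6, 0.8, 1.0, 1.2]
grand_mean = mean(targets)

veridical = list(targets)                                  # slope 1
full_regression = [grand_mean] * len(targets)              # slope 0
halfway = [0.5 * t + 0.5 * grand_mean for t in targets]    # slope 0.5

for label, est in [("veridical", veridical),
                   ("full regression", full_regression),
                   ("halfway", halfway)]:
    print(label, round(regression_slope(targets, est), 3))
```

      A participant whose responses lie halfway between the diagonal and the grand mean yields a slope of 0.5, the value the group converged towards; a shrinking `slope_dispersion` across segments corresponds to participants' slopes becoming more similar.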

      Third, the development of task performance is a function of two behavioral factors: a) the accuracy and b) the precision of TTC estimation. Accuracy describes how similar the participant’s TTC estimates were to the true TTC, whereas precision describes how similar the participant’s TTC estimates were to each other (across trials). Our results reflect the fact that participants became, on average, both more accurate and more precise over time. To demonstrate this point visually, we have now plotted precision and accuracy for the 8 task segments below (Rebuttal Figure 1C, SFig 1H), showing that both measures increased as time progressed and more trials were performed. This was the case for all target durations.

      In response to the reviewer’s comment, we clarified in the main text that these findings are not at odds with each other. Furthermore, we made clear that regularization per se did not increase over time at the group level, and we added supporting figures to the supplementary material to make this point. Note that, in our view, these new analyses and changes address the overall question raised by the reviewer more directly than the suggested figure, which is why we prioritized them in the manuscript.

      However, we greatly appreciated the suggestion and added the corresponding figure for the sake of completeness.

      The following additions were made.

      Page 5: In support of this, participants' regression slopes converged over time towards the optimal value of 0.5, i.e. the slope value between veridical performance and the grand mean (Fig. S1F; linear mixed-effects model with task segment as a predictor and participants as the error term, F(1) = 8.172, p = 0.005, ε² = 0.08, CI: [0.01, 0.18]), and participants' slope values became more similar (Fig. S1G; linear regression with task segment as predictor, F(1) = 6.283, p = 0.046, ε² = 0.43, CI: [0, 1]). Consequently, this also led to an improvement in task performance over time at the group level (i.e. task accuracy and precision increased (Fig. S1I), and the relationship between accuracy and precision became stronger (Fig. S1H); linear mixed-effects model results for accuracy: F(1) = 15.127, p = 1.3 × 10⁻⁴, ε² = 0.06, CI: [0.02, 0.11]; precision: F(1) = 20.189, p = 6.1 × 10⁻⁵, ε² = 0.32, CI: [0.13, 1]; accuracy-precision relationship: F(1) = 8.288, p = 0.036, ε² = 0.56, CI: [0, 1]; see methods for model details).

      Page 12: This suggests that different regions encode distinct task regularities in parallel to form optimal sensorimotor representations that balance specificity and regularization. This is in line with our behavioral results, showing that TTC-task performance became more optimal in the face of both objectives. Over time, behavioral responses clustered more closely between the diagonal and the average line in the behavioral response profile (Fig. 1B, S1G), and the TTC error decreased. While different participants approached these optimal performance levels from different directions, either starting with good performance or with strong regularization, the group approached overall optimal performance levels over the course of the experiment.

      Page 15: We also corroborated this effect by measuring the dispersion of slope values between participants across task segments, using a linear regression model with task segment as a predictor and the standard deviation of slope values across participants as the dependent variable (Fig. S1G). As measures of behavioral performance, we computed two variables for each target-TTC level: sensorimotor timing accuracy, defined as the absolute difference between estimated and true TTC, and sensorimotor timing precision, defined as the coefficient of variation (standard deviation of estimated TTCs divided by the average estimated TTC). To study the interaction between these two variables for each target TTC over time, we first normalized accuracy by the average estimated TTC to make both variables comparable. We then used a linear mixed-effects model with precision as the dependent variable, task segment and normalized accuracy as predictors, and target TTC as the error term. In addition, we tested whether accuracy and precision increased over the course of the experiment using separate linear mixed-effects models with task segment as predictor and participants as the error term.
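
      A minimal sketch of the three per-level measures defined above, assuming that accuracy is the mean of the per-trial absolute errors within one target-TTC level and that precision uses the sample standard deviation (both are assumptions; the function names and example values are hypothetical). As defined, both measures are error-like, so smaller values correspond to better performance.

```python
from statistics import mean, stdev


def timing_accuracy(estimated, true_ttc):
    """Mean absolute difference between estimated and true TTC.

    Assumption: per-trial absolute errors are averaged within one target-TTC level.
    """
    return mean([abs(e - true_ttc) for e in estimated])


def timing_precision(estimated):
    """Coefficient of variation: sample SD of the estimates divided by their mean."""
    return stdev(estimated) / mean(estimated)


def normalized_accuracy(estimated, true_ttc):
    """Accuracy divided by the average estimated TTC, making it unitless like the CV."""
    return timing_accuracy(estimated, true_ttc) / mean(estimated)


# Hypothetical estimates (s) for one target TTC of 1.0 s in one task segment.
estimates = [1.0, 1.2, 0.8]
print(timing_accuracy(estimates, 1.0),
      timing_precision(estimates),
      normalized_accuracy(estimates, 1.0))
```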

      2) Generalization relies on prior experience and can be relatively slow to develop, as is the case with statistical learning. In Jazayeri and Shadlen (2010), for instance, learning a prior distribution of 11 time intervals demarcated by two briefly flashed cues (compared to 4 intervals associated with 24 possible movement trajectories in the current study) required ~500 trials. I therefore find it somewhat surprising that the regression line slope was already relatively close to 0.5 in the very first segment of the task. To what extent did the participants have exposure to the task and the target intervals prior to entering the scanner?

      We thank the reviewer for raising the important question about the time course of learning in our task and how our results relate to prior work on this issue. Addressing the specific question first, participants practiced the task for 2-3 minutes prior to scanning. During the practice, they were not specifically instructed to perform the task as well as they could or to encode the intervals, but rather to familiarize themselves with the general experimental setup and to ask any questions outside the MRI machine. While they might indeed have started encoding the prior distribution of intervals during the practice, we have no way of knowing this, and we expect the contribution of this practice to the time course of learning during scanning to be negligible (for the reasons outlined above).

      However, in addition to the specific question the reviewer asked, we feel that the comment raises two more general points: 1) How long does it take to learn the prior distribution of a set of intervals as a function of the number of intervals tested? 2) Why are the learning slopes we report already quite shallow at the beginning of the scan?

      Regarding (1), we are not aware of published reports that answer this question directly, and we expect that the answer will depend on the task used. Regarding the comparison to Jazayeri & Shadlen (2010), we believe the learning time course is difficult to compare between our study and theirs. As the reviewer mentioned, our study featured only 4 intervals compared to 11 in their work, based on which we would expect much faster learning in our task than in theirs. We did indeed sample 24 movement directions, but these were irrelevant for learning the interval distribution. Moreover, unlike Jazayeri & Shadlen (2010), our task featured moving stimuli, which may have provided additional sensory, motor and proprioceptive information that participants in the prior study could not rely on.

      Regarding (2), and overlapping with the reviewer’s previous comment, the average learning slope in our study is indeed close to 0.5 already in the first task segment, but we would like to highlight that this is a group-level measure. At the beginning of the experiment, the learning slopes of some subjects were closer to 1 (i.e. the diagonal in Fig 1B), and those of others were closer to 0 (i.e. the mean). The median slope was close to 0.65. Importantly, the slopes of most participants still approached 0.5 over the course of the experiment, as did the group-level slope the reviewer is referring to. This also means that participants’ slopes became more similar over the course of the experiment as they approached 0.5, which we think reflects the optimal trade-off between regressing towards the mean and regressing towards the diagonal (in the data shown in Fig. 1B). This convergence onto the optimal trade-off value can be seen in many measures, including the mean slope (Rebuttal Figure 1A, SFig 1F), the standard deviation of slopes (Rebuttal Figure 1B, SFig 1G) and the precision vs. accuracy trade-off (Rebuttal Figure 1C, SFig 1H). We therefore think that our results are well in line with prior literature, even though a direct comparison remains difficult due to differences in the task.

      In response to the reviewer’s comment, and related to their first comment, we made the following addition to the discussion section.

      Page 12: This suggests that different regions encode distinct task regularities in parallel to form optimal sensorimotor representations that balance specificity and regularization. This is well in line with our behavioral results, showing that TTC-task performance became more optimal in the face of both objectives. Over time, behavioral responses clustered more closely between the diagonal and the average line in the behavioral response profile (Fig. 1B, S1G), and the TTC error decreased. While different participants approached these optimal performance levels from different directions, either starting with good performance or with strong regularization, the group approached overall optimal performance levels over the course of the experiment.

      3) I am curious to know whether differences between high-accuracy and medium-accuracy feedback as well as between medium-accuracy and low-accuracy feedback predicted hippocampal activity in the first GLM analysis (middle page 5). Currently, the authors only present the findings for the contrast between high-accuracy and low-accuracy feedback. Examining all feedback levels may provide additional insight into the nature of hippocampal involvement and is perhaps more consistent with the subsequent GLM analysis (bottom page 6) in which, according to my understanding, all improvements across subsequent trials were considered (i.e., from low-accuracy to medium-accuracy; medium-accuracy to high-accuracy; as well as low-accuracy to high-accuracy).

      We thank the reviewer for this thoughtful question, which relates to question 5 by Reviewer 1. The reviewer is correct that the contrast shown in Fig 2 does not consider the medium-accuracy feedback level, and that the model itself is slightly different from the one used in the subsequent analysis presented in Fig. 3. To address this comment together with the related one by Reviewer 1, we repeated the full analysis while modeling the three feedback levels with one parametric regressor, which includes the medium-accuracy feedback trials and is consistent with the analysis shown in Fig. 3. The results of this new analysis are presented in the new Supplementary Fig. 3C.

      In short, the model included one parametric regressor with three levels reflecting the three types of feedback, and all nuisance regressors remained unchanged. Instead of contrasting high vs. low accuracy feedback, we then performed voxel-wise t-tests on the beta estimates obtained for the parametric feedback regressor. We found that our results presented initially were very robust: Both the observed clusters in the voxel-wise analysis (on whole-brain FWE-corrected levels) as well as the ROI results replicated across the two analyses, and our conclusions therefore remain unchanged.
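
      For readers less familiar with the group-level step, the sketch below shows a one-sample t-test of per-participant beta estimates against zero together with Cohen's d, in plain Python on hypothetical values. With d defined as the mean divided by the sample standard deviation, d = t/√n, which is consistent with, for example, t(33) = -3.65 and d = -0.63 for 34 participants. This is an illustration rather than the SPM12 implementation; p-values would additionally require the t distribution (for example via `scipy.stats.ttest_1samp`).

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu=0.0):
    """One-sample t statistic, degrees of freedom and Cohen's d against mu."""
    n = len(values)
    m = mean(values)
    s = stdev(values)                   # sample SD (n - 1 denominator)
    t = (m - mu) / (s / math.sqrt(n))
    d = (m - mu) / s                    # hence d = t / sqrt(n)
    return t, n - 1, d


# Hypothetical per-participant ROI beta estimates for the feedback regressor.
betas = [-0.2, -0.1, 0.0, -0.3]
t, df, d = one_sample_t(betas)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```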

      We made multiple textual additions to the manuscript to include this new analysis, and we present its results, including a direct comparison to our initial results, in the new Supplementary Fig. 3. The following textual additions were made.

      Page 5: Note that these results were robust even when fewer nuisance regressors were included to control for model over-specification (Fig. S3B; two-tailed one-sample t tests: anterior HPC, t(33) = -3.65, p = 8.9 × 10⁻⁴, pfwe = 0.002, d = -0.63, CI: [-1.01, -0.26]; posterior HPC, t(33) = -1.43, p = 0.161, pfwe = 0.322, d = -0.25, CI: [-0.59, 0.10]), and when all three feedback levels were modeled with one parametric regressor (Fig. S3C; two-tailed one-sample t tests: anterior HPC, t(33) = -3.59, p = 0.002, pfwe = 0.005, d = -0.56, CI: [-0.93, -0.20]; posterior HPC, t(33) = -0.99, p = 0.329, pfwe = 0.659, d = -0.17, CI: [-0.51, 0.17]). Further, there was no systematic relationship between subsequent trials on a behavioral level [...]

      Page 17: Moreover, instead of modeling the three feedback levels with three independent regressors, we repeated the analysis modeling them as one parametric regressor with three levels. All other regressors remained unchanged, and the model included the regressors for ITIs and ISIs. We then conducted t-tests implemented in SPM12 using the beta estimates obtained for the parametric feedback regressor (Fig. S2C). Compared to the initial analyses presented above, this has the advantage that medium-accuracy feedback trials are also considered in the statistics.
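
      As a rough illustration of a single parametric feedback regressor, the sketch below mean-centres per-trial feedback levels (as SPM does by default for parametric modulators) and places them at trial onsets on the scan grid, before any HRF convolution. The numeric coding of the three levels (1 = low, 2 = medium, 3 = high accuracy), the stick representation at TR resolution and the function names are assumptions for illustration only.

```python
from statistics import mean


def parametric_modulator(feedback_levels):
    """Mean-centre per-trial feedback levels (hypothetical coding 1=low, 2=medium, 3=high)."""
    m = mean(feedback_levels)
    return [level - m for level in feedback_levels]


def modulated_stick_regressor(onsets_s, values, n_scans, tr_s):
    """Place each trial's modulator value at the scan containing its onset.

    The result would still need to be convolved with an HRF before entering the GLM.
    """
    reg = [0.0] * n_scans
    for onset, value in zip(onsets_s, values):
        idx = int(onset // tr_s)
        if 0 <= idx < n_scans:
            reg[idx] += value
    return reg


levels = [1, 2, 3, 2]                   # one feedback level per trial
pm = parametric_modulator(levels)
print(modulated_stick_regressor([0.0, 4.0, 8.0, 12.0], pm, n_scans=8, tr_s=2.0))
```

      Because medium-accuracy trials enter with their own (centred) value rather than being left out of a pairwise contrast, they contribute to the estimated beta, which is the advantage described above.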

      4) The authors modeled the inter-trial intervals and periods of rest in their univariate GLMs. This approach of modelling all 'down time' can lead to model over-specification and inaccurate parameter estimation (e.g. Pernet, 2014). A comment on this approach, as well as consideration of not modelling the inter-trial intervals, would be useful.

      This is an important issue that we did not address in our initial manuscript. We are aware and agree with the reviewer’s general concern about model over-specification, which can be a big problem in regression as it leads to biased estimates. We did examine whether our model was overspecified before running it, but we did not report a formal test of it in the manuscript. We are grateful to be given the opportunity to do so now.

      In response to the reviewer’s comment, we repeated the full analysis shown in Fig. 2 while excluding the nuisance regressors for inter-trial intervals (ITI) and breaks (or inter-session intervals, ISI). All other regressors and analysis steps stayed unchanged relative to the analysis reported in Fig. 2. The new results are presented in a new Supplementary Figure 3B.

      As in our previous analysis, we again see that the results we initially presented were extremely robust, both at whole-brain FWE-corrected levels and at the ROI level. Our conclusions therefore remain unchanged, and the results we initially presented are not affected by potential model over-specification. In addition to the new Supplementary Figure 3B, we made multiple textual changes to the manuscript to describe this new analysis and its implications. Note that we used the same nuisance regressors in all other GLM analyses, meaning that it is also very unlikely that model over-specification affects any of the other results presented. We thank the reviewer for suggesting this analysis, and we feel that including it in the manuscript has further strengthened the points we initially made.

      The following additions were made to the manuscript.

      Page 16: The GLM included three boxcar regressors modeling the feedback levels, one for ITIs, one for button presses and one for periods of rest (inter-session interval, ISI) [...]

      Page 16: ITIs and ISIs were modeled to reduce task-unrelated noise, but to ensure that this did not lead to over-specification of the above-described GLM, we repeated the full analysis without modeling the two. All other regressors including the main feedback regressors of interest remained unchanged, and we repeated both the voxel-wise and ROI-wise statistical tests as described above (Fig. S2B).
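
      A complementary diagnostic for over-specification is to check how strongly the convolved nuisance regressors correlate with the regressors of interest. The sketch below builds two boxcar regressors, convolves them with a double-gamma HRF that approximates SPM's canonical HRF (gamma shapes 6 and 16, undershoot ratio 1/6, sampled at TR resolution), and reports their correlation and the two-regressor variance inflation factor 1/(1 - r²). The TR, onsets and durations are hypothetical, and this is not the check used in the study, which refitted the model without the ITI and ISI regressors.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density at time t (seconds)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def canonical_hrf(tr_s, length_s=32.0):
    """Double-gamma HRF sampled every TR and normalised to unit sum (approximation)."""
    n = int(length_s / tr_s) + 1
    h = [gamma_pdf(i * tr_s, 6) - gamma_pdf(i * tr_s, 16) / 6 for i in range(n)]
    total = sum(h)
    return [v / total for v in h]


def boxcar(onsets_s, durations_s, n_scans, tr_s):
    """1.0 in every scan overlapping an event, 0.0 elsewhere."""
    reg = [0.0] * n_scans
    for onset, dur in zip(onsets_s, durations_s):
        start = int(onset // tr_s)
        stop = int(math.ceil((onset + dur) / tr_s))
        for i in range(max(start, 0), min(stop, n_scans)):
            reg[i] = 1.0
    return reg


def convolve(signal, kernel):
    """Causal convolution truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j >= len(out):
                break
            out[i + j] += s * k
    return out


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical timing: a 1-s feedback event every 10 s, each followed by a 3-s ITI.
tr, n_scans = 2.0, 100
fb_onsets = [10.0 * k for k in range(19)]
iti_onsets = [o + 1.0 for o in fb_onsets]
hrf = canonical_hrf(tr)
fb = convolve(boxcar(fb_onsets, [1.0] * len(fb_onsets), n_scans, tr), hrf)
iti = convolve(boxcar(iti_onsets, [3.0] * len(iti_onsets), n_scans, tr), hrf)

r = pearson_r(fb, iti)
vif = 1.0 / (1.0 - r ** 2)  # variance inflation factor for a two-regressor model
print(f"r = {r:.2f}, VIF = {vif:.1f}")
```

      With strictly alternating, unjittered events like these, the two regressors are highly correlated by construction, which is the kind of situation in which modeling the ITIs separately is most likely to inflate parameter variance.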

      Page 17: Note that these results were robust even when fewer nuisance regressors were included to control for model over-specification (Fig. S3B; two-tailed one-sample t tests: anterior HPC, t(33) = -3.65, p = 8.9 × 10⁻⁴, pfwe = 0.002, d = -0.63, CI: [-1.01, -0.26]; posterior HPC, t(33) = -1.43, p = 0.161, pfwe = 0.322, d = -0.25, CI: [-0.59, 0.10]), and when all three feedback levels were modeled with one parametric regressor (Fig. S3C; two-tailed one-sample t tests: anterior HPC, t(33) = -3.59, p = 0.002, pfwe = 0.005, d = -0.56, CI: [-0.93, -0.20]; posterior HPC, t(33) = -0.99, p = 0.329, pfwe = 0.659, d = -0.17, CI: [-0.51, 0.17]). Further, there was no systematic relationship between subsequent trials on a behavioral level [...]

      Reviewer #3 (Public Review):

      This paper reports the results of an interesting fMRI study examining the neural correlates of time estimation with an elegant design and a sensorimotor timing task. Results show that hippocampal activity and connectivity are modulated by performance on the task as well as the valence of the feedback provided. This study addresses a very important question in the field which relates to the function of the hippocampus in sensorimotor timing. However, a lack of clarity in the description of the MRI results (and associated methods) currently prevents the evaluation of the results and the interpretations made by the authors. Specifically, the model testing for timing-specific/timing-independent effects is questionable and needs to be clarified. In the current form, several conclusions appear to not be fully supported by the data.

      We thank the reviewer for pointing us to many methodological points that needed clarification. We apologize for the confusion about our methods, which we clarify in the revised manuscript. Please find our responses to the individual points below.

      Major points

      Some methodological points lack clarity which makes it difficult to evaluate the results and the interpretation of the data.

      We really appreciate the many constructive comments below. We feel that clarifying these points improved our manuscript immensely.

      1) It is unclear how the 3 levels of accuracy and feedback (high, medium, and low performance) were computed. Please provide the performance range used for this classification. Was this adjusted to the participants' performance?

      The formula that describes how the response window was computed for the different speed levels was reported in the methods section of the original manuscript on page 13. It reads as follows:

      “The following formula was used to scale the response window width: d ± ((k ∗ d)/2) where d is the target TTC and k is a constant proportional to 0.3 and 0.6 for high and medium accuracy, respectively.“
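
      Read literally, the formula gives a window of total width k·d centred on the target TTC d. The sketch below assumes k = 0.3 and k = 0.6 exactly (the text says only that k is proportional to these values), treats window bounds as inclusive, and assigns low-accuracy feedback to responses outside the medium window; the function names and the example target of 2 s are hypothetical.

```python
def response_window(d, k):
    """Lower and upper bound of the response window d ± (k * d) / 2."""
    half_width = (k * d) / 2
    return d - half_width, d + half_width


def feedback_level(response_ttc, d, k_high=0.3, k_medium=0.6):
    """Classify a response as 'high', 'medium' or 'low' accuracy (inclusive bounds assumed)."""
    lo, hi = response_window(d, k_high)
    if lo <= response_ttc <= hi:
        return "high"
    lo, hi = response_window(d, k_medium)
    if lo <= response_ttc <= hi:
        return "medium"
    return "low"


# Hypothetical target TTC of 2.0 s.
print(response_window(2.0, 0.3), response_window(2.0, 0.6))
print(feedback_level(2.2, 2.0), feedback_level(2.5, 2.0), feedback_level(3.0, 2.0))
```

      Because the half-width is proportional to d, longer target TTCs get proportionally wider windows, which is what keeps feedback calibrated across durations.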

      In response to the reviewer’s comment, we now additionally report the exact ranges of the different response windows in a new Supplementary Table 1 and refer to it in the Methods section as follows.

      Page 10: To calibrate performance feedback across different TTC durations, the precise response window widths of each feedback level scaled with the speed of the fixation target (Table S1).

      2) The description of the MRI results lacks detail. It is not always clear in the results section which models were used and whether parametric modulators were included in the model. This makes the results section difficult to follow. For example,

      a) Figure 2: According to the description in the text, it appears that panels A and B report the results of a model with 3 regressors, ie one for each accuracy/feedback level (high, medium, low) without parametric modulators included. However, the figure legend for panel B mentions a parametric modulator suggesting that feedback was modelled for each trial as a parametric modulator. The distinction between these 2 models must be clarified in the result section.

      We thank the reviewer very much for spotting this discrepancy. Indeed, Figure 2 shows the results obtained for a GLM in which we modeled the three feedback levels with separate regressors, not with one parametric regressor; the parametric regressor was used for Figure 3. We apologize for the confusion and corrected the description in the figure caption, which now reads as follows. The descriptions in the main text and the methods remain unchanged.

      Caption Fig. 2: We plot the beta estimates obtained for the contrast between high vs. low feedback.

      Moreover, note that in response to comment 5 by reviewer 1 and comment 3 by reviewer 2, the revised manuscript now additionally reports the results obtained for the parametric regressor in the new Supplementary Figure 3C. All conclusions remain unchanged.

      Additionally, it is unclear how Figure 2A supports the following statement: "Moreover, the voxel-wise analysis revealed similar feedback-related activity in the thalamus and the striatum (Fig. 2A), and in the hippocampus when the feedback of the current trial was modeled (Fig. S3)." This is confusing as Figure 2A reports an opposite pattern of results between the striatum/thalamus and the hippocampus. It appears that the statement highlighted above is supported by results from a model including current trial feedback as a parametric modulator (reported in Figure S3).

      We agree with the reviewer that our result description was confusing and changed it. It now reads as follows.

      Page 5: Moreover, the voxel-wise analysis revealed feedback-related activity also in the thalamus and the striatum (Fig. 2A) [...]

      Also, note that it is unclear from Figure 2A what is the direction of the contrast highlighting the hippocampal cluster (high vs. low according to the text but the figure shows negative values in the hippocampus and positive values in the thalamus). These discrepancies need to be addressed and the models used to support the statements made in the results sections need to be explicitly described.

      The description of the contrast is correct. Negative values indicate smaller errors and therefore better feedback, which is mentioned in the caption of Fig. 2 as follows:

      “Negative values indicate that smaller errors, and higher-accuracy feedback, led to stronger activity.”

      Note that the timing error determined the feedback, and that we predicted stronger updating, and therefore stronger activity, for larger errors (similar to a prediction error). We found the opposite. We mention the reasoning behind this analysis at various locations in the manuscript, e.g. when talking about the connectivity analysis:

      “We reasoned that larger timing errors and therefore low-accuracy feedback would result in stronger updating compared to smaller timing errors and high-accuracy feedback”

      In response to the reviewer’s remark, we clarified this further by adding the following statement to the results section.

      Page 5: “Using a mass-univariate general linear model (GLM), we modeled the three feedback levels with one regressor each plus additional nuisance regressors (see methods for details). The three feedback levels (high, medium and low accuracy) corresponded to small, medium and large timing errors, respectively. We then contrasted the beta weights estimated for high-accuracy vs. low-accuracy feedback and examined the effects on group-level averaged across runs.”

      b) Connectivity analyses: It is also unclear here which model was used in the PPI analyses presented in Figure 2. As it appears that the seed region was extracted from a high vs. low contrast (without modulators), the PPI should be built using the same model. I assume this was the case, as the authors mention "These co-fluctuations were stronger when participants performed poorly in the previous trial and therefore when they received low-accuracy feedback.", if this refers to a low vs. high contrast. Please clarify.

      Yes, the PPI model was built using the same model. We clarified this in the methods section by adding the following statement to the PPI description.

      Page 17: “The PPI model was built using the same model that revealed the main effects used to define the HPC sphere.”

      The reviewer is also correct that the contrast shows the difference between low vs. high-accuracy feedback. We clarified this in the main text as well as in the caption of Fig. 2.

      Caption Fig 2: [...] We plot results of a psychophysiological interaction (PPI) analysis conducted using the hippocampal peak effects in (A) as a seed for low vs. high-accuracy feedback. [...]

      Page 17: The estimated beta weight corresponding to the interaction term was then tested against zero on the group-level using a t-test implemented in SPM12 (Fig. 2C). The contrast reflects the difference between low vs. high-accuracy feedback. This revealed brain areas whose activity was co-varying with the hippocampus seed ROI as a function of past-trial performance (n-1).
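
      In its simplest form, the PPI interaction term is the product of the mean-centred seed time course and the psychological variable. The sketch below forms that product at the BOLD level, coding scans with low-accuracy feedback as +1 and scans with high-accuracy feedback as -1 (a hypothetical coding that mirrors the low vs. high contrast). SPM12's PPI routine instead first deconvolves the seed signal to an estimated neural time course and re-convolves the product with the HRF, so this is a simplified illustration rather than the SPM12 procedure.

```python
from statistics import mean


def ppi_regressors(seed_ts, psych):
    """Return (physiological, psychological, interaction) regressors.

    seed_ts: seed time course, one value per scan.
    psych:   +1 for low-accuracy feedback scans, -1 for high-accuracy, 0 otherwise.
    """
    if len(seed_ts) != len(psych):
        raise ValueError("seed_ts and psych must have one value per scan")
    m = mean(seed_ts)
    phys = [x - m for x in seed_ts]                     # mean-centred seed signal
    interaction = [p * s for p, s in zip(phys, psych)]  # BOLD-level PPI term
    return phys, list(psych), interaction


# Hypothetical 4-scan example.
phys, psych, inter = ppi_regressors([1.0, 2.0, 3.0, 2.0], [1, -1, -1, 1])
print(inter)
```

      All three regressors would enter the same first-level GLM, and it is the beta for the interaction term that is carried to the group-level test against zero.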

      c) It is unclear why the model testing TTC-specific / TTC-independent effects (results presented in Figure 3) used 2 parametric modulators (as opposed to building two separate models with a different modulator each). I wonder how the authors dealt with the orthogonalization between parametric modulators with such a model. In SPM, the orthogonalization of parametric modulators is based on the order of the modulators in the design matrix. In this case, parametric modulator #2 would be orthogonalized to the preceding modulator so that a contrast focusing on the parametric modulator #2 would highlight any modulation that is above and beyond that explained by modulator #1. In this case, modulation of brain activity that is TTC-specific would have to be above and beyond a modulation that is TTC-independent to be highlighted. I am unsure that this is what the authors wanted to test here (or whether this is how the MRI design was built). Importantly, this might bias the interpretation of their results as, by design, it is less likely to observe TTC-specific modulations in the hippocampus as there is significant TTC-independent modulation. In other words, switching the order of the modulators in the model (or building two separate models) might yield different results. This is an important point to address as this might challenge the TTC-specific/TTC-independent results described in the manuscript.

      We thank the reviewer for raising this important issue. When running the respective analysis, we made sure that the regressors were not collinear and we therefore did not expect substantial overlap in shared variance between them. However, we agree with the reviewer that orthogonalizing one regressor with respect to the other could still affect the results. To make sure that our expectations were indeed met, we therefore repeated the main analysis twice: 1) switching the order of the modulators and 2) turning orthogonalization off (which is possible in SPM12 unlike in previous versions). In all cases, our key results and conclusions remained unchanged, including the central results of the hippocampus analyses.
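
      The concern about serial orthogonalization can be illustrated with a single Gram-Schmidt step: after mean-centring, the second modulator is replaced by its residual after regressing it on the first. The plain-Python sketch below (function names hypothetical; in SPM12 the procedure is applied serially across all of a condition's columns, including the unmodulated regressor) shows that this changes the second modulator when the two are correlated but leaves it untouched when they are uncorrelated, which is why low collinearity limits the influence of modulator order.

```python
from statistics import mean


def centre(x):
    m = mean(x)
    return [v - m for v in x]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def orthogonalize(pm2, pm1):
    """Residual of mean-centred pm2 after projecting out mean-centred pm1."""
    a, b = centre(pm1), centre(pm2)
    beta = dot(b, a) / dot(a, a)
    return [bi - beta * ai for ai, bi in zip(a, b)]


pm1 = [1, 2, 3, 4]
print(orthogonalize([2, 1, 4, 3], pm1))    # correlated with pm1: values change
print(orthogonalize([1, -1, -1, 1], pm1))  # uncorrelated with pm1: unchanged
```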

      Anterior (ant.) / Posterior (post.) Hippocampus ROI analysis with A) the original order of modulators, B) the order of the modulators switched and C) orthogonalization of modulators turned off. In A to C, orange corresponds to the TTC-independent condition and light blue to the TTC-specific condition. Asterisks (*) mark p < 0.05 at Bonferroni-corrected levels, obtained using a group-level one-tailed one-sample t-test against zero; A) pfwe = 0.017, B) pfwe = 0.039, C) pfwe = 0.039.

      Because orthogonalization did not affect the conclusions, the revised manuscript reports the analysis with orthogonalization turned off. These new figures are extremely similar to the original ones, as can be seen in the exemplary figure below, which shows our key results at a liberal threshold for transparency. In addition, we now state in the methods section that orthogonalization was turned off, as follows.

      Page 18: These two regressors reflect the tests for target-TTC-independent and target-TTC-specific updating, respectively, and they were not orthogonalized to each other.

      Comparison of old & new results: also see Fig. 3 and Fig. S5 in manuscript

      d) It is also unclear how behavioral improvement was coded/classified: "we contrasted trials in which participants had improved versus the ones in which they had not improved or got worse". It appears that improvement was computed from the change in feedback valence (between high, medium and low). Why was performance not used instead? It would provide a finer-grained modulation.

      We thank the reviewer for the opportunity to clarify this important point. First, we chose to model feedback because it is the feedback that determines whether participants update their “internal model” or not. Without feedback, they would not know how well they performed, and we would not expect to find activity related to sensorimotor updating. Second, behavioral performance and received feedback are tightly correlated, because the former determines the latter. We therefore do not expect to see major differences in results obtained between the two. Third, we did in fact model both feedback and performance in two independent GLMs, even though the way the results were reported in the initial submission made it difficult to compare the two.

      Figure 4 shows the results obtained when modeling behavioral performance in the current trial as an F-contrast, and Supplementary Fig. 4 shows the results obtained when modeling the feedback received in the current trial as a t-contrast. Although the voxel-wise t-maps and F-maps are already quite similar, we now additionally report the t-contrast for the behavioral-performance GLM in a new Supplementary Figure 4C. The t-maps obtained in these two analyses are extremely similar, confirming that the direction of the effects and their interpretation do not depend on whether feedback or performance is modeled.

      The revised manuscript refers to the new Supplementary Figure 4C as follows.

      Page 17: In two independent GLMs, we analyzed the time courses of all voxels in the brain as a function of behavioral performance (i.e. TTC error) in each trial, and as a function of feedback received at the end of each trial. The models included one mean-centered parametric regressor per run, modeling either the TTC error or the three feedback levels in each trial, respectively. Note that the feedback itself was a function of TTC error in each trial [...] We estimated weights for all regressors and conducted a t-test against zero using SPM12 for our feedback and performance regressors of interest on the group level (Fig. S4A). [...]
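      A mean-centered parametric regressor of the kind described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the SPM12 code: the function name, onsets, TTC-error values and repetition time are hypothetical, each onset is assigned to a single scan (SPM works at a finer microtime resolution), and convolution with the hemodynamic response function is omitted. Centering is done within the run, so the regressor carries only the trial-to-trial variation of the modulator.

```python
import numpy as np

def parametric_regressor(onsets_s, values, n_scans, tr_s):
    """Stick function (one sample per scan) holding the mean-centered modulator at each onset.

    Hypothetical helper: onsets are in seconds from run start, and each onset is
    assigned to the scan it falls in.
    """
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()  # mean-centering within the run
    regressor = np.zeros(n_scans)
    scan_idx = (np.asarray(onsets_s) // tr_s).astype(int)
    np.add.at(regressor, scan_idx, centered)  # accumulates if two onsets share a scan
    return regressor

# One hypothetical run: four trials, TTC error in seconds, TR of 2 s.
reg = parametric_regressor([0.0, 10.0, 20.0, 30.0], [0.2, 0.1, 0.4, 0.3], n_scans=20, tr_s=2.0)
print(np.round(reg[[0, 5, 10, 15]], 2))
```

Feedback levels coded numerically (for example 1, 2, 3, a hypothetical coding) could be passed as `values` in the same way, with one call per run.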

      Page 17: In addition to the voxel-wise whole-brain analyses described above, we conducted independent ROI analyses for the anterior and posterior sections of the hippocampus (Fig. S2A). Here, we tested the beta estimates obtained in our first-level analysis for the feedback and performance regressors of interest (Fig. S4B; two-tailed one-sample t tests: anterior HPC, t(33) = -5.92, p = 1.2 × 10⁻⁶, pfwe = 2.4 × 10⁻⁶, d = -1.02, CI: [-1.45, -0.6]; posterior HPC, t(33) = -4.07, p = 2.7 × 10⁻⁴, pfwe = 5.4 × 10⁻⁴, d = -0.7, CI: [-1.09, -0.32]). See section "Regions of interest definition and analysis" for more details.
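      The effect sizes and corrected p-values in the ROI analysis are consistent with two standard relations, which the sketch below checks: Cohen's d for a one-sample t-test is d = t/√n, and Bonferroni correction over the two hippocampal ROIs multiplies each p-value by 2. That these are the formulas behind the reported values is an inference from the numbers (t(33) implies n = 34), not something stated in the text; the confidence intervals are not reproduced here.

```python
import math

def cohens_d_one_sample(t_stat, n):
    """Cohen's d for a one-sample t-test, recovered from t and sample size: d = t / sqrt(n)."""
    return t_stat / math.sqrt(n)

def bonferroni(p, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)

n_participants = 34  # df = 33 in t(33), so n = 34
for roi, t_stat, p in [("anterior HPC", -5.92, 1.2e-6), ("posterior HPC", -4.07, 2.7e-4)]:
    d = cohens_d_one_sample(t_stat, n_participants)
    p_fwe = bonferroni(p, n_tests=2)  # two ROIs: anterior and posterior hippocampus
    print(f"{roi}: d = {d:.2f}, p_fwe = {p_fwe:.1e}")
```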

      If the feedback valence was used to classify trials as improved or not, how was this modelled? With one regressor for improvement and one for no improvement, as opposed to a parametric modulator for performance improvement?

      We apologize for the lack of clarity regarding our regressor design. In response to this comment, we adapted the corresponding paragraph in the methods to express more clearly that improvement trials and no-improvement trials were modeled with two separate parametric regressors - in line with the reviewer’s understanding. The new paragraph reads as follows.

      Page 18: One regressor modeled the main effect of the trial and two parametric regressors modeled the following contrasts: Parametric regressor 1: trials in which behavioral performance improved vs. parametric regressor 2: trials in which behavioral performance did not improve or got worse relative to the previous trial.
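      The split into improvement and no-improvement trials can be illustrated with a short sketch. The rule used here is an assumption: a trial counts as improved if its absolute TTC error is strictly smaller than on the preceding trial, and as not improved otherwise; the text does not specify whether the comparison was made on TTC error or on feedback level. The first trial of a run has no predecessor and is left unlabeled. Function names and data are hypothetical.

```python
def split_by_improvement(abs_errors):
    """Label each trial relative to the previous one (hypothetical rule).

    True  -> absolute TTC error strictly decreased (improved)
    False -> error stayed the same or increased (no improvement or worse)
    None  -> first trial, no predecessor to compare with
    """
    labels = [None]
    for prev, curr in zip(abs_errors, abs_errors[1:]):
        labels.append(bool(curr < prev))
    return labels

def indicator_regressors(labels):
    """Two 0/1 trial-wise vectors, one per trial type; unlabeled trials are 0 in both."""
    improved = [int(lab is not None and lab) for lab in labels]
    not_improved = [int(lab is not None and not lab) for lab in labels]
    return improved, not_improved

errors = [0.30, 0.20, 0.25, 0.25, 0.10]  # hypothetical |TTC error| per trial, in seconds
labels = split_by_improvement(errors)
improved, not_improved = indicator_regressors(labels)
print(labels)  # [None, True, False, False, True]
```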

      Last, it is also unclear how the ITI was modelled as a regressor. Did the authors mean a parametric modulator here? Some clarification on the events modelled would also be helpful. What was the onset of a trial in the MRI design? The start of the trial? The end? The onset of the prediction time?

      The inter-trial intervals (ITIs) were modeled as a boxcar regressor convolved with the hemodynamic response function. They correspond to the time between the feedback-phase offset and the subsequent trial onset. Moreover, the start of the trial was the moment when the visual-tracking target started moving after the ITI, whereas the trial end was the offset of the feedback phase (i.e. the moment at which the feedback disappeared from the screen). The onset of the “prediction time” was the moment at which the visual-tracking target stopped moving, prompting participants to estimate the time-to-contact. We now explain this more clearly in the methods, as shown below.

      Page 16: The GLM included three boxcar regressors modeling the feedback levels, one for ITIs, one for button presses and one for periods of rest (inter-session interval, ISI), all of which were convolved with the canonical hemodynamic response function of SPM12. The start of the trial (i.e. the time when the visual-tracking target started moving) was used as the trial onset for modeling. The trial end was the offset of the feedback phase (i.e. the moment at which the feedback disappeared from the screen). The ITI was the time between the offset of the feedback phase and the subsequent trial onset.
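      The event definitions above can be turned into regressors as in the following sketch. It is illustrative and not the SPM12 implementation: event times are hypothetical, the time course is sampled every 0.1 s, and the hemodynamic response is approximated by a double-gamma function resembling SPM's canonical HRF rather than by calling `spm_hrf`. Trials run from the target's motion onset to the feedback offset, and each ITI runs from a feedback offset to the next trial onset.

```python
import math
import numpy as np

DT = 0.1  # sampling step of the model time course, in seconds (assumed)

def double_gamma_hrf(dt, length_s=32.0):
    """Double-gamma HRF approximating SPM's canonical shape.

    Difference of two unit-scale gamma densities (shape 6 -> peak near 5 s,
    shape 16 -> undershoot near 15 s), undershoot weighted by 1/6, normalized to sum 1.
    """
    t = np.arange(0.0, length_s, dt)
    peak = t**5 * np.exp(-t) / math.gamma(6)
    undershoot = t**15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()

def boxcar(onsets_s, durations_s, n_samples, dt):
    """1.0 while an event is on, 0.0 otherwise, sampled every dt seconds."""
    x = np.zeros(n_samples)
    for onset, duration in zip(onsets_s, durations_s):
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        x[start:stop] = 1.0
    return x

def convolve_hrf(x, dt):
    """Convolve with the HRF and truncate to the original length (causal filter)."""
    return np.convolve(x, double_gamma_hrf(dt))[: len(x)]

# Hypothetical event times for one short run (seconds from run start).
trial_onsets = np.array([0.0, 20.0, 40.0])       # target starts moving
feedback_offsets = np.array([14.0, 35.0, 53.0])  # feedback disappears = trial end

# ITI: from the feedback offset of one trial to the onset of the next trial.
iti_onsets = feedback_offsets[:-1]
iti_durations = trial_onsets[1:] - feedback_offsets[:-1]

n_samples = int(round(60.0 / DT))
trial_box = boxcar(trial_onsets, feedback_offsets - trial_onsets, n_samples, DT)
iti_box = boxcar(iti_onsets, iti_durations, n_samples, DT)
iti_reg = convolve_hrf(iti_box, DT)

print(iti_durations)  # [6. 5.]
```

The last trial has no subsequent onset, so it contributes no ITI here; how the gap before the end of a run was handled is not specified in the text.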

      On a related note, in response to question 4 by reviewer 2, we now repeated one of the main analyses (Fig. 2) without modeling the ITI (as well as the Inter-session interval, ISI). We found that our key results and conclusions are independent of whether or not these time points were modeled. These new results are presented in the new Supplementary Figure 3B.

      Page 16: ITIs and ISIs were modeled to reduce task-unrelated noise, but to ensure that this did not lead to over-specification of the above-described GLM, we repeated the full analysis without modeling the two. [...]

      1. Perhaps as a result of a lack of clarity in the result section and the MRI methods, it appears that some conclusions presented in the result section are not supported by the data. E.g. "Instead, these results are consistent with the notion that hippocampal activity signals the updating of task-relevant sensorimotor representations in real-time." The data show that hippocampal activity is higher during and after an accurate trial. This pattern of results could be attributed to various processes such as e.g. reward or learning etc. I would recommend not providing such interpretations in the result section and addressing these points in the discussion.

      Similarly, statements like "These results suggest that the hippocampus updates information that is independent of the target TTC" go beyond the data. The data show that higher hippocampal activity is linked to greater improvement across trials, independent of the timing of the trial. The point about updating is rather speculative and should be presented in the discussion instead of the results section.

      The reviewer is referring to two statements in the results section that reflect our interpretation rather than a description of the results. In response to the reviewer’s comment, we therefore removed the following statement from the results.

      Instead, these results are consistent with the notion that hippocampal activity signals the updating of task-relevant sensorimotor representations in real-time.

      In addition, we replaced the remaining statement with the following. We feel this new statement makes clear why we conducted the described analysis, without offering an interpretation of the results presented before.

      Page 8: We reasoned that updating TTC-independent information may support generalization performance by means of regularizing the encoded intervals based on the temporal context in which they were encoded.

    1. Author Response

      Reviewer #1 (Public Review):

      7) Can the primary cells in Figure 2E and AML#1 and AML#2 be studied for mTORC1 activity by Western, as in 2D?

      For reasons that we do not understand, we have been unable to effectively culture primary FLT3-ITD AMLs, despite being able to culture most other AMLs for weeks. This issue has prevented us from being able to perform biochemical analyses of FLT3-ITD AMLs in response to FLT3 inhibition.

      8) Additional genetic information should be provided if possible for the primary AML cells - what other mutations in addition to FLT3 were present? Were there any mTOR pathway alterations?

      We now report the additional mutation in the AML#1 sample (an NPM1 mutation) in the Methods section "Therapeutic modeling in mice" and in the legends of Figures 2E and 3D. There were no evident alterations in the mTOR pathway (beyond the FLT3-ITD mutation).

    1. Author Response:

      We thank the reviewers for their thoughtful critiques and helpful suggestions for how to improve our manuscript. Described below is our response clarifying a number of issues raised by the reviewers.

      We agree with the reviewer that we cannot definitively conclude that the first-division chromosome segregation defects and the later CI-induced defects at the mid-blastula transition result from distinct mechanisms. In fact, we raise this possibility in the discussion. However, our finding that the CI phenotype induces temporally and developmentally deferred chromosome segregation defects in the late blastoderm divisions (in addition to the well-studied first-division defect) alters the established view of the CI phenotype and must be taken into account when considering mechanisms of CI. Our current view is that the distinct early and late defects could be caused by either 1) a common mechanism (possibly a chromosome mark or defect inherited through the early blastoderm divisions that causes segregation defects in the late blastoderm divisions) or 2) distinct early and late mechanisms that do not strictly “depend” upon one another. We have clarified this point in the revised manuscript.

      We disagree with the reviewer that this result is to be expected given previous studies. In D. simulans, a small percentage of embryos derived from the CI cross hatch. These embryos are thought to have bypassed the first-division defect. It is not obvious why embryos that “escape” early CI-induced defects and subsequently hatch must have late defects. Previous studies interpreted embryos that exhibit late division errors as those that have lost their entire paternal complement of chromosomes as a result of strong CI-induced defects during the first mitotic division and that develop as maternal haploids. These studies, including those of transgene-induced CI, have focused primarily on embryos that have undergone first-division defects. To the best of our knowledge, no group has thoroughly examined embryos that progress normally through the pre-cortical cycles 2-9, as we do in this manuscript. Thus, it was entirely unexpected that these embryos would exhibit mitotic defects during the late blastoderm divisions and the MBT. We discuss how this finding requires modifying current models of the mechanisms of CI.

      Regarding the comment that “the primary claim of the paper that later-stage embryos die for different reasons than early-stage embryos,” we make no such claim. In fact, we provide evidence that the failure to hatch (late embryonic lethality) is, at least in part, due to haploid development, a direct result of the first-division CI defect. The focus of our study is on those CI-derived embryos that progress normally, maintain the normal complement of chromosomes through the first division, and exhibit chromosome segregation errors during the late blastoderm divisions. We do not know the fate of these embryos, and previous studies have demonstrated that embryos suffering extensive late blastoderm segregation errors are able to hatch (Sullivan, 1990, Development 110:311-323). We have clarified these points in the discussion.

      While we agree that transgenic tools have proven invaluable in the study of CI, they are not appropriate for these studies. The purpose of our study was to undertake an unbiased re-examination of the CI phenotype. Of necessity, the transgenic studies rely on exogenous host promoters rather than the natural endogenous Wolbachia/prophage promoters. Thus, while informative, the transgenic alleles are unlikely to capture all of the complexities and nuances of the CI phenotype. In addition, the transgenic studies of which we are aware have only interrogated a single pair of CI-inducing genes, whereas the Wolbachia genome contains both the Cid and Cin CI-associated gene pairs and possibly other yet-to-be-identified CI/Rescue genes.

      Our unbiased re-examination of the CI phenotype induced by W. riverside in D. simulans identified a previously unsuspected temporally and developmentally distinct set of CI-induced defects that occur during and after the mid-blastula transition. This finding must be taken into account when considering the mechanisms that cause CI. In our revisions, we clarify the above points and qualify our statements to appropriately interpret our results in context of the nuances and uncertainties of CI and early Drosophila embryogenesis.

    1. Author Response:

      Reviewer #3 (Public Review):

      The authors revealed a novel role for the DLL-4-Notch1-NICD signaling axis in platelet activation, aggregation, and thrombus formation. They first confirmed the expression of Notch1 and DLL-4 in human platelets and demonstrated that both Notch1 and DLL-4 were upregulated in response to thrombin stimulation. Further, they confirmed that exposure of human platelets to DLL-4 leads to γ-secretase-mediated release of NICD (a calpain substrate). Stimulating platelets with DLL-4 alone triggered platelet activation, measured by integrin αIIbβ3 activation, P-selectin translocation, granule release, enhanced platelet-neutrophil and platelet-monocyte interactions, intracellular calcium mobilization, PEV release, phosphorylation of cytosolic proteins, and PI3K and PKC activation. In addition, Susheel N. Chaurasia et al. showed that when platelets were stimulated with DLL-4 and low-dose thrombin, Notch1 signaling can operate in a juxtacrine manner to potentiate low-dose thrombin-mediated platelet activation. When the DLL-4-Notch1-NICD signaling axis was blocked by γ-secretase inhibitors, platelet responses to stimulation were attenuated, and arterial thrombosis in mice was impaired.

      This study by Susheel N. Chaurasia et al. was carefully designed and used multiple approaches to test their hypothesis. Their research raised the potential of targeting the DLL-4-Notch1-NICD signaling axis for anti-platelet and anti-thrombotic therapies. Interestingly, compared to thrombin, a potent physiological platelet agonist, the signaling cascade triggered by DLL-4 was relatively weak. Given that the upregulation of DLL-4 and Notch1 happened in response to thrombin stimulation, how much DLL-4-mediated signaling could contribute to in vivo platelet activation in the presence of thrombin is questionable. This could potentially limit the application of targeting Notch1 as an anti-thrombotic therapy. Further, the authors showed that Notch1 signaling could operate in a juxtacrine manner to potentiate low-dose thrombin-mediated platelet activation, which means that DLL-4-mediated platelet signaling can act as an accelerator of early-stage hemostasis. Again, inhibition of Notch1 may slow down the hemostasis process. However, given that other platelet agonists (ADP, collagen, ...) are present simultaneously, blocking Notch1 signaling may not have a strong anti-platelet effect.

      We concur with the reviewer that further study is needed to delineate the extent of the contribution of DLL-4 signaling in thrombin-activated platelets. However, it is now amply clear that Notch signaling plays a central role in the development of the thrombin-activated phenotype of platelets. Further, DLL-4-Notch1 interaction on the surfaces of adjacent platelets within the thrombus reinforces platelet-platelet aggregate formation. This is further reflected in the significant inhibition of thrombus formation in vivo in the presence of DAPT in a mouse model of intravital thrombosis. Given the substantial redundancy in platelet stimulation by different physiological agonists (ADP, collagen, thrombin etc.), none of the present-day drugs can fully inhibit platelets, owing to parallel signaling pathways. Thus, the discovery of Notch signaling and its seminal role in platelet activation could explain the redundancy associated with anti-platelet drugs, and Notch inhibition could complement existing anti-platelet regimens in achieving effective and complete platelet inhibition.