1. Jun 2021
2. www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

In this work, Panigrahi et al. develop a powerful deep-learning-based cell segmentation platform (MiSiC) capable of accurately segmenting bacterial cells densely packed within both homogeneous and heterogeneous cell populations. Notably, MiSiC can be easily implemented by a researcher without the need for high computational power. The authors first demonstrate MiSiC's ability to accurately segment cells with a variety of shapes, including rods, crescents and long filaments. They then demonstrate that MiSiC is able to segment and classify dividing and non-dividing Myxococcus cells present in a heterogeneous population of E. coli and Myxococcus. Lastly, the authors outline a training workflow with which MiSiC can be trained to identify two different cell types present in a mixed population, using Myxococcus and E. coli as examples.

While we believe that MiSiC is a very powerful and exciting tool that will have a large impact on the bacterial cell biology community, we feel that explanations of how to use the algorithm should be given more emphasis. To help other scientists use MiSiC to its fullest potential, the range of applications should be clarified. Furthermore, any inherent biases in MiSiC should be discussed so that users can avoid them.

We thank the reviewer for the positive feedback and for comments that will help disseminate MiSiC to the broad bacterial cell biology community, as intended. As described above, we have largely addressed this comment by writing a comprehensive handbook. As detailed below, we now also provide precise measurements of MiSiC segmentation accuracy compared to ground truth for the various imaging modalities and bacterial species.

Major Concerns:

1) It is unclear to us how a MiSiC user should choose/tune the value for the noise variance parameter. What exactly should be considered when choosing the noise variance parameter? Some possibilities include input image size, cell size (in pixels), cell density, and variance in cell size. Is there a recommended range for the parameter? These questions along with our second minor correction can be addressed with a paragraph in the Discussion section.

Setting the noise parameters is now detailed in the handbook (section 1.d), which provides a set of rules of thumb and recommendations. In addition, a paragraph explaining the importance of noise addition for images with sparse bacterial cell density has been added to the Results section.

“Associated Figure S1. Background noise can lead to spurious cell detection by MiSiC. Shape index (SI) images retain the shape/curvature information of the intensities in a raw image through the eigenvalues of the Hessian of the image and an arctan function, creating smooth areas corresponding to cell bodies and propagating noisy regions where there is no shape information. Thus, MiSiC segments the cells by discriminating between “smooth” and “rough” regions. In effect, when adjusting the size parameter, scaling smooths out the image noise, leading to background regions that have a smoother SI than in the raw image. Some of these areas could be falsely detected as bacterial cells. This effect is shown here: when an image with uniform and random intensity values is segmented with MiSiC with increasing smoothing (here using a Gaussian blur filter), spurious cell detection becomes apparent. In addition, since the SI keeps the shape information and not the intensity values, background objects of relatively low contrast (i.e., dead cells or debris) may be detected as cells. All these artifacts can be mitigated by adding synthetic noise to the scaled images.”
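As a hedged illustration of the shape index described in this legend, the sketch below computes it for a single pixel from the entries of the local Hessian. It uses the standard Koenderink and van Doorn form of the index; the function names and the sign convention (+1 for a bright dome, -1 for a dark cup) are assumptions made for illustration, and MiSiC's own preprocessing may differ in detail.

```python
import math
from typing import Tuple


def hessian_eigenvalues(hxx: float, hxy: float, hyy: float) -> Tuple[float, float]:
    """Eigenvalues (largest first) of the symmetric Hessian [[hxx, hxy], [hxy, hyy]]."""
    mean = 0.5 * (hxx + hyy)
    half_gap = math.hypot(0.5 * (hxx - hyy), hxy)
    return mean + half_gap, mean - half_gap


def shape_index(hxx: float, hxy: float, hyy: float) -> float:
    """Shape index in [-1, 1] from the local second derivatives of the intensity."""
    k1, k2 = hessian_eigenvalues(hxx, hxy, hyy)  # k1 >= k2
    # Equivalent to (2/pi) * arctan((k2 + k1) / (k2 - k1)); atan2 is used so that
    # k1 == k2 (a perfectly round dome or cup) does not divide by zero.
    return (2.0 / math.pi) * math.atan2(-(k1 + k2), k1 - k2)


# Illustrative local shapes (hypothetical Hessian values):
dome = shape_index(-1.0, 0.0, -1.0)   # bright round blob
ridge = shape_index(-1.0, 0.0, 0.0)   # bright elongated structure, e.g. a rod
saddle = shape_index(1.0, 0.0, -1.0)
cup = shape_index(1.0, 0.0, 1.0)      # dark round pit
```

Because the index depends only on the ratio of the two curvatures, it discards absolute intensity, which is consistent with the legend's remark that low-contrast background objects can still look cell-like.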

2) Could the authors expand on using algorithms like watershed, conditional random fields, or snake segmentation to segment bacteria when there is not enough edge information to properly separate them? How accurate are these methods at segmenting the cells? Should other MiSiC parameters be tuned to increase the accuracy when implementing these methods?

We thank the reviewer for raising this point, as it is important to make clear that downstream post-processing algorithms can certainly improve the accuracy of MiSiC masks. To show this specifically, we further processed MiSiC masks of filamentous Bacillus subtilis cells with the watershed algorithm to resolve division septa. This example is now provided as Figure S3. Importantly, no particular MiSiC adjustment needs to be made prior to running these processing steps, which can be done directly in ImageJ or its bacterial cell analysis plug-in, MicrobeJ. It is worth noting that the post-processing strategy may depend on the scientific question under consideration. In the handbook, we also give an example of post-processing methods that may be used.

“Associated Figure S3. Refining cell separations with watershed. Watershed methods may be used to obtain a more accurate segmentation of septate filaments such as Bacillus subtilis. In this example applying this method to the MiSiC mask effectively resolves cell boundaries that are not captured in the prediction but are visible by eye (arrows).”

3) Can MiSiC's ability to accurately segment phase and brightfield images be quantitatively compared against each other and against fluorescent images for overall accuracy? A figure similar to Fig. 2C, with the three image modalities instead of species, would nicely complement Fig. 2A. If the segmentation accuracy varies significantly between image modalities, a researcher might want to consider the segmentation accuracy when planning their experiments. If the accuracy does not vary significantly, that would be equally useful to know.

This is a very important issue that was also raised by reviewer 3 and which we decided to address in full. For each imaging modality and distinct species, we measured the Jaccard index as a function of the threshold set for the Intersection over Union (IoU). The resulting curves are now provided in two separate figures, Figures 2 and 3, and a supplemental Figure S2; they provide a robust measure of the segmentation for each modality and tested species.

“Figure 2. MiSiC predictions under various imaging modalities. a) MiSiC masks and corresponding annotated masks of fluorescence, phase contrast and bright field images of a dense E. coli microcolony. b) Jaccard index as a function of IoU threshold for each modality determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained Jaccard score curves are the average of analyses conducted over three biological replicates and n=763, 811, 799 total cells for Fluorescence, Phase Contrast and Bright Field, respectively (bands are the maximum range, the solid line is the median). The fluorescence images were pre-processed using a Gaussian of Laplacian filter to improve MiSiC prediction (see methods).”

“Associated Figure S2. MiSiC predictions under various imaging modalities. a) MiSiC masks and corresponding annotated masks of fluorescence, phase contrast and bright field images of a dense M. xanthus microcolony. b) Jaccard index as a function of IoU threshold for each modality determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained curves are the average of analyses conducted over three biological replicates and n=193, 206, 211 total cells for Fluorescence, Phase Contrast and Bright Field, respectively (bands are the maximum range, the solid line is the median). The fluorescence images were pre-processed using a Gaussian of Laplacian filter to improve MiSiC prediction (see methods). c) A human observer is slightly less performant than MiSiC. The same ground truth as used in Figure 2 (dashed lines) was compared to an independent observer’s annotation (solid lines) and Jaccard score curves were constructed as shown in Figure 2. BF: Bright Field, PC: Phase Contrast, Fluo: Fluorescence.”

“Figure 3. MiSiC predictions in various bacterial species and shapes. a) MiSiC masks and corresponding annotated masks of phase contrast images of Pseudomonas aeruginosa (rod shape), Caulobacter crescentus (crescent shape) and Bacillus subtilis (filamentous shape). b) Jaccard index as a function of IoU threshold for each species determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained Jaccard score curves are the average of analyses conducted over three biological replicates and n=1149, 101, 216 total cells for P. aeruginosa, B. subtilis and C. crescentus, respectively (bands are the maximum range, solid line the median). Note that the B. subtilis filaments are well predicted but edge information is missing for optimal detection of the cell separations.”
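To make the metric in these legends concrete, here is a minimal sketch of a Jaccard index computed as a function of the IoU threshold. The Methods are not reproduced here, so the matching rule (greedy one-to-one matching, best IoU first) and all names are assumptions; cells are represented as sets of (row, column) pixels purely for illustration.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def rect(r0: int, c0: int, r1: int, c1: int) -> Set[Pixel]:
    """Toy 'cell': all pixels of a rectangle, end-exclusive."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jaccard_at_threshold(truth: List[Set[Pixel]], pred: List[Set[Pixel]],
                         threshold: float) -> float:
    """TP / (TP + FP + FN); a prediction is a true positive if it is matched
    one-to-one to a ground-truth cell with IoU >= threshold."""
    pairs = sorted(((iou(t, p), i, j)
                    for i, t in enumerate(truth)
                    for j, p in enumerate(pred)), reverse=True)
    used_truth, used_pred = set(), set()
    tp = 0
    for score, i, j in pairs:
        if score < threshold or score == 0.0:
            break  # remaining pairs overlap too little to count
        if i in used_truth or j in used_pred:
            continue
        used_truth.add(i)
        used_pred.add(j)
        tp += 1
    fp = len(pred) - tp    # predicted cells without a match
    fn = len(truth) - tp   # ground-truth cells that were missed
    total = tp + fp + fn
    return tp / total if total else 1.0


truth = [rect(0, 0, 4, 10), rect(6, 0, 10, 10)]
pred = [rect(0, 0, 4, 10), rect(6, 0, 10, 5), rect(20, 20, 22, 22)]
curve = [(t, jaccard_at_threshold(truth, pred, t)) for t in (0.5, 0.75, 0.9)]
```

Sweeping the threshold from lenient to strict produces a curve of the kind plotted in panel b: a segmentation whose outlines closely follow the ground truth keeps a high Jaccard index even at strict thresholds.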

4) The ability of MiSiC to segment dense clusters of cells is an exciting advancement for cell segmentation algorithms. However, is there a minimum cell density required for robust segmentation with MiSiC? The algorithm should be applied to a set of sparsely populated images in a supplemental figure. Is the algorithm less accurate for sparse images (perhaps reflected by an increase in false-positive cell identifications)? Any possible biases related to cell density should be noted.

In fact, MiSiC performs well with both densely and sparsely populated images. With sparsely populated images, however, non-cell objects can occasionally appear in the MiSiC mask. As mentioned above, adding noise can help remove these objects. This issue is now fully explained in supplemental Figure S1. Of note, any non-cell objects that remain after noise addition can be eliminated using additional general morphometric filters or specific models fitting bacterial cells, for example those included in MicrobeJ and Oufti. These points are now clarified in the text.

“Associated Figure S1. Background noise can lead to spurious cell detection by MiSiC. Shape index (SI) images retain the shape/curvature information of the intensities in a raw image through the eigenvalues of the Hessian of the image and an arctan function, creating smooth areas corresponding to cell bodies and propagating noisy regions where there is no shape information. Thus, MiSiC segments the cells by discriminating between “smooth” and “rough” regions. In effect, when adjusting the size parameter, scaling smooths out the image noise, leading to background regions that have a smoother SI than in the raw image. Some of these areas could be falsely detected as bacterial cells. This effect is shown here: when an image with uniform and random intensity values is segmented with MiSiC with increasing smoothing (here using a Gaussian blur filter), spurious cell detection becomes apparent. In addition, since the SI keeps the shape information and not the intensity values, background objects of relatively low contrast (i.e., dead cells or debris) may be detected as cells. All these artifacts can be mitigated by adding synthetic noise to the scaled images.”
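The mitigation named in the last sentence of this legend can be sketched as follows. This is only an illustration under stated assumptions: the noise is taken to be zero-mean Gaussian, intensities are assumed to be normalized to [0, 1], and the function name and `variance` parameter are hypothetical rather than taken from the MiSiC package or its handbook.

```python
import random
from typing import List

Image = List[List[float]]


def add_gaussian_noise(image: Image, variance: float, seed: int = 0) -> Image:
    """Return a copy of an image scaled to [0, 1] with zero-mean Gaussian
    noise of the given variance added, clipped back to [0, 1]."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, value + rng.gauss(0.0, sigma))) for value in row]
            for row in image]


# A featureless background becomes "rough" again once noise is added,
# which is what keeps it from being mistaken for a smooth cell body.
flat_background = [[0.5] * 64 for _ in range(64)]
noisy_background = add_gaussian_noise(flat_background, variance=1e-4)
```

In practice the variance would be tuned per data set; the value above is arbitrary and chosen only to keep clipping negligible.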

and:

“Along similar lines, non-cell objects can appear in the MiSiC masks. While some can be removed by the introduction of noise, an easy way to remove them is to apply a post-processing filter, for example using morphometric parameters to remove objects that are not bacteria. This can easily be done using Fiji, MicrobeJ or Oufti.”
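The kind of morphometric post-processing filter mentioned here can be illustrated with a minimal sketch that keeps only objects whose area falls within a plausible range. It does not reproduce how Fiji, MicrobeJ or Oufti implement their filters; the function names, the use of 4-connectivity and the area bounds are all assumptions chosen for illustration.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = List[List[int]]


def connected_components(mask: Mask) -> List[Set[Pixel]]:
    """Group foreground pixels (non-zero) into 4-connected objects."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen: Set[Pixel] = set()
    objects: List[Set[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            queue = deque([(r, c)])
            seen.add((r, c))
            pixels: Set[Pixel] = set()
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                pixels.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects


def filter_by_area(mask: Mask, min_area: int, max_area: int) -> Mask:
    """Keep only objects whose pixel count lies in [min_area, max_area]."""
    kept = [[0] * len(row) for row in mask]
    for pixels in connected_components(mask):
        if min_area <= len(pixels) <= max_area:
            for y, x in pixels:
                kept[y][x] = 1
    return kept


# Toy mask: one rod-like object (12 pixels) and one single-pixel speck.
mask = [[0] * 12 for _ in range(8)]
for row in (2, 3):
    for col in range(3, 9):
        mask[row][col] = 1
mask[6][10] = 1
cleaned = filter_by_area(mask, min_area=5, max_area=50)
```

Area is the simplest such parameter; width, length or aspect ratio could be filtered the same way once each object's pixels are known.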

5) It is exciting to see the ability of MiSiC to segment single cells of M. xanthus and E. coli species in densely packed colonies (Fig. 4b). Although three morphological parameters after segmentation were compared with ground truth, the comparison was conducted at the ensemble level (Fig. 4c). Could the authors use the Mx-GFP and Ec-mCherry fluorescence as a ground truth at the single cell level to verify the results of segmentation? For example, for any Ec cells identified by MiSiC in Fig. 4b, provide an index of whether its fluorescence is red or green. This single-cell level comparison is most important for the community.

We have now performed this comparison and determined Jaccard indices for E. coli and Myxococcus detection using the individual fluorescence images as a reference (Figure 5b). Since we were only able to make this comparison in relatively small fields, we also kept the comparison of expected morphometric parameters in large images. Taken together, these data now demonstrate that the semantic classification, as performed, separates Myxococcus cells from E. coli cells well (see more details in our response to reviewer 3).

Reviewer #2 (Public Review):

Panigrahi and co-authors introduce a program that can segment a variety of images of rod-shaped bacteria (with somewhat different sizes and imaging modalities) without fine-tuning. Such a program will have a large impact on any project requiring segmentation of a large number of rod-shaped cells, including the large images demonstrated in this manuscript. To my knowledge, training a U-Net to classify an image from the image's shape index maps (SIM) is a new scheme, and the authors show that it performs fairly well despite a small training set including synthetic data that, based on Figure 1, does not closely resemble experimental data other than in shape. The authors discuss extending the method to objects with other shapes and provide an example of labelling two different species - these extensions are particularly promising.

The authors show that their network can reproduce results of manual segmentation with bright field, phase and fluorescence input. Performance on fluorescence data in Fig. 1, where intensities vary so much, is particularly good and shows the benefits of the SIM transformation. Automated mapping of FtsZ shows that this method can be immediately useful, though the authors note this required post-processing to remove objects with abnormal shapes. The application in mixed samples in Fig. 4 shows good performance. However, no Python workflow or application is provided to reproduce it or train a network to classify mixtures in different experiments.

We thank the reviewer for the positive comment. As discussed in our answer to reviewer 1, the classification presented in Figure 4 (now Figure 5) is meant to provide an example of how MiSiC can be further used to train networks to classify species in interspecies communities by generating two datasets, one per species of interest, to further train a U-Net. Here, the secondary U-Net was developed to specifically discriminate Myxococcus from E. coli, which is a very specialized application. Hence it was not included in the MiSiC package. Nevertheless the code is accessible at https://github.com/pswapnesh/MyxoColi (which is mentioned in the Methods).

Performance was compared between SuperSegger with default parameters and MiSiC with tuned parameters for a single data set. Perhaps other SuperSegger parameters would perform better with the addition of noise, and it's unclear that adding Gaussian noise to a phase contrast image is the best way to benchmark performance. An interesting comparison would be between MiSiC and other methods applying neural networks to unprocessed data such as DeepCell and DeLTA, with identical training/test sets and an attempt to optimize free parameters.

In fact, we believe that it does make sense to test how MiSiC performs in the presence of noise and to show that it is robust, making it suitable for use on complex multi-tile images. For this analysis we kept the comparison with SuperSegger, which provides a reference because it is done on a data set optimized for SuperSegger segmentation. Importantly, we keep the parameters constant throughout the analysis because it would not be feasible to tweak parameters tile by tile in a multi-tile image. This analysis shows that MiSiC is better suited to this application.

INSTALLATION: I installed both the command line and GUI versions of MiSiC on a Windows PC in a conda environment following provided instructions. Installation was straightforward for both. MiSiCgui gave one error and required reinstallation of NumPy as described on GitHub. Both give an error regarding AVX2 instructions. MiSiCgui gives a runtime error and does not close properly. These are all fairly small issues. Performance on a stack of images was sufficiently fast for many applications and could be sped up with a GPU implementation.

We have updated the pip install script available on GitHub for MiSiCgui, which remedies some of these issues: the NumPy error no longer occurs, the program closes properly, and the only remaining messages are warnings about future deprecations in the napari packages. We have tested it on Windows 10, Ubuntu Linux 18 and macOS Catalina. For the moment, installation on macOS Big Sur seems impossible, possibly because of the Python 3.7 requirement. We will work on this problem in the near future. We have removed the command line interface, as we are developing a future version that will provide MiSiC more simply as a napari or Fiji/ImageJ plugin.

TESTING: I tested the programs using brightfield data focused at a different plane than data presumably used to train the MiSiC network, so cells are dark on a light background and I used the phase option which inverts the image. With default settings and a reasonable cell width parameter (10 pixels for E. coli cells with 100-nm pixel width; no added noise since this image requires no rescaling) MiSiCgui returned an 8-bit mask that can be thresholded to give segmentation acceptable for some applications. There are some straight-line artifacts that presumably arise from image tiling, and the quality of segmentation is lower than I can achieve with methods tuned to or trained on my data. Tweaking magnification and added noise settings improved the results slightly. The MiSiC command line program output an unusable image with many small, non-cell objects. Looking briefly at the code, it appears that preprocessing differs and it uses a fixed threshold.

We thank the reviewer for testing the programs. Tiling-related artifacts may now be avoided by excluding a few pixels at the border in the new version of the MiSiC code. This is now implemented in the MiSiC.segment function as segment(im, invert=False, exclude=16). Without seeing the reviewer's data, it is difficult for us to see how the segmentation (which is said to be acceptable) could be further improved. The command line program has now been removed in favor of continued development of the graphical interface.
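The general idea of excluding border pixels when tiling can be sketched along one image axis as below. This is not the MiSiC implementation: the helper name `tile_spans`, the tile size and the interpretation of `exclude` as a margin trimmed from each interior tile edge are assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int, int, int]  # (read_start, read_stop, keep_start, keep_stop)


def tile_spans(length: int, tile: int, exclude: int) -> List[Span]:
    """Split one image axis into overlapping tiles.

    Each tile is read with `exclude` extra pixels on either side (where the
    image allows), but only its central part is kept, so predictions made
    near a tile border are discarded. The kept parts cover the axis exactly once.
    """
    if exclude < 0 or tile <= 2 * exclude:
        raise ValueError("tile must be larger than twice the excluded border")
    step = tile - 2 * exclude  # width of the part kept from each tile
    spans: List[Span] = []
    keep_start = 0
    while keep_start < length:
        keep_stop = min(keep_start + step, length)
        read_start = max(keep_start - exclude, 0)
        read_stop = min(keep_stop + exclude, length)
        spans.append((read_start, read_stop, keep_start, keep_stop))
        keep_start = keep_stop
    return spans


# Hypothetical numbers: a 100-pixel axis, 48-pixel tiles, 16-pixel margin.
spans = tile_spans(length=100, tile=48, exclude=16)
```

Applying the same spans along rows and columns gives a 2D tiling in which every output pixel comes from the interior of some tile, except at the true image border, where no extra context exists.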

Reviewer #3 (Public Review):

The authors aimed to develop a 2D image analysis workflow that performs bacterial cell segmentation in densely crowded colonies, for brightfield, fluorescence, and phase contrast images. The resulting workflow achieves this aim and is termed "MiSiC" by the authors.

I think this tool achieves high-quality single-cell segmentations in dense bacterial colonies for rod-shaped bacteria, based on inspection of the examples that are shown. However, without a quantification of the segmentation accuracy (e.g. Jaccard coefficient vs. intersection over union, false positive detection, false negative detection, etc), it is difficult to pass a final judgement on the quality of the segmentation that is achieved by MiSiC.

We thank the reviewer for this comment. To address it, we divided the previous Figure 2 into two figures (and associated supplemental figures) that separately show how MiSiC performs (i) in segmenting two very distinct bacterial species, E. coli and Myxococcus, under various imaging modalities, and (ii) in segmenting other bacterial species: rods (P. aeruginosa), filaments (B. subtilis) and crescent shapes (C. crescentus). The results now clearly show both the strengths and limitations of the system.

A particular strength of the MiSiC workflow arises from the image preprocessing into the "Shape Index Map" images (before the neural network analysis). These shape index maps are similar for images that are obtained by phase contrast, brightfield, and fluorescence microscopy. Therefore, the neural network trained with shape index maps can apparently be used to analyze images acquired with at least the above three imaging modalities. It would be important for the authors to unambiguously state whether really only a single network is used for all three types of image input, and whether MiSiC would perform better if three separate networks would be trained.

A single network is used, which takes shape index maps rather than the original images as input. As mentioned by the reviewer, this is a major strength of the workflow because it permits segmentation independent of the imaging modality, which we now measure for each modality.

As the reviewer hints, three separate networks, one per modality (PC, fluorescence and BF), could also be trained, allowing direct end-to-end segmentation of raw images. In theory, this could improve the segmentation (although the benefit might be negligible given the current segmentation quality).

3. www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

The study by Diebold et al. describes a fast and scalable method for linking bacterial plasmids to the organisms that harbor them. The authors then go on to apply this technique to track horizontal gene transfer in a complex bacterial population originating from clinical samples. There is no doubt that the development of such methodologies for better tracking plasmidic resistance genes and following horizontal gene transfer events is very important. The authors do a good job of optimizing their method to be a one-step process that has high sensitivity and relatively low error, while it can also be scaled, automated and used with multiplex primers. Subsequently, they apply this method to two clinical patient samples for which metagenomic data are available. In this case, they correctly identify expected relationships between beta-lactamase genes and specific bacterial taxa (in particular K. pneumoniae), but also find that the same beta-lactamase genes are associated with organisms of the microbiome. With the exception of providing evidence that the association of particular genes with multiple organisms is not due to physical association of the bacteria in question, this is an interesting study putting forward a much-needed technique for the study of antibiotic resistance and of other relationships in complex bacterial mixtures.

We are very thankful for the positive review and the reviewer’s suggestion that we distinguish between gene transfer and physical association. We provide a detailed response to this in major point #1 of the review summary, but to summarize, we performed an OIL-PCR experiment to confirm that the results are indeed due to physical association of the bacteria and updated our manuscript accordingly.

Reviewer #2 (Public Review):

Diebold et al. developed a simplified and improved version of the epicPCR method applied to environmental samples. The results section describes well how they carried out their development and supports the claim that the method is easy to use. They clearly demonstrate that their method could be used to screen for associations of specific genes with taxonomic markers in environmental microbial populations. They then apply their method to human gut samples from hospitalized patients and demonstrate its utility for characterizing the hosts of different targeted genes (notably AMR and plasmid-related genes). However, most of their results are based on previous studies of the same samples. Therefore, it is difficult to know how their method can be used on new samples. Do they need to redo a classical metagenomic analysis in order to obtain data on new samples? What kind of metagenomic analysis is mandatory before performing their method? What is the depth of the metagenomic analysis? Those are important questions, as it will clearly be more expensive to perform the whole metagenomic analysis.

Thank you for pointing out the need to explain possible screening methods for OIL-PCR on unsequenced samples. We chose to use sequenced stool samples for testing the method in order to provide parallel validation of our results; however, we agree that metagenomic sequencing is not a practical or cost-effective way to select samples for OIL-PCR. qPCR is a more practical method to pre-screen samples for target genes before performing OIL, but we failed to include this important point in our discussion.

Since drafting and submitting the manuscript, we have demonstrated that the three primers designed for OIL (forward, fusion, and nested primers) can easily be converted into probe-based qPCR assays by designing a fluorescent probe with the nested primer sequence. We have updated the discussion to convey this important feature of OIL-PCR.

The conclusion of the paper is well supported by data, but the overall approach for new samples is never discussed. Moreover, the title appears somewhat misleading, as their method does not clearly identify plasmids but rather links some targeted genes to taxonomic markers.

Reviewer #3 (Public Review):

This manuscript is composed of two parts. The first part describes the development of an emulsion-based PCR fusion method, called OIL-PCR, for matching two specific gene sequences from the same cell. In this report these are beta-lactamase genes and the V4 section of rRNA, allowing the matching of this horizontally transferred gene with its donor sequence. The second part is a demonstration project that features the use of OIL-PCR to monitor horizontal transfer of beta-lactam genes between gut bacteria from the metagenomes of two neutropenic patients. OIL-PCR was set to multiplexed class A beta-lactam genes. This is a descriptive study that largely recapitulates a previously published work on these samples showing that the relatively unstudied Romboutsia commensal genus is a carrier of these plasmid-borne genes in patient metagenomes.

Overall, this is a well-written manuscript. Data were comprehensively analyzed with appropriate controls. The figures are excellent.

OIL-PCR is derived from other fusion PCR methods, especially epicPCR. There are some nice technical improvements described here, e.g., efficient lysis within emulsion droplets using Ready-Lyse lysozyme. This is an incremental technical advance for a fairly niche application (where you have known target genes and are concerned about potential culture bias), but it may be useful in particular for understanding HGT in microbiomes. There are some problems with the method that are brought to the foreground by the authors rather than quietly dropped, which is commendable.

Thank you for acknowledging our effort to be up front about the strengths and weaknesses of OIL-PCR. We hope that this information will help inform other researchers in applying this method.

One problem appears to be that the necessary dilution for single-cell PCR reduces the taxonomic diversity of the metagenome. The only way around this, in order to sample efficiently, appears to be to perform multiple independent sequencing experiments and pool the results. Another feature of the system is that the accuracy falls slightly as the proportion of the target sequence in the community increases, for reasons that are not discussed. However, this effect is not great (97% accuracy at 10% proportion) and in most applications the target cells will be a much lower proportion of the community.

The results of the demonstration study on metagenomes from neutropenic patients are clearly described and provide a nicely worked example of combining this directed method with metagenome sequencing. The significance is limited, but the study gives some descriptive hints about the mechanism of HGT between Romboutsia and Klebsiella.

Other points:

Unfortunately, there was no comparative test where the same samples were run against "competing" technologies (e.g., sequencing of cultured beta-lactam resistant strains, epicPCR, Hi-C or single-cell) to directly compare the strengths (and weaknesses) of OIL-PCR.

Thank you for this fair criticism that we did not compare OIL-PCR to other available methods. We address comparing OIL-PCR to Hi-C in our response to major point #4 (above). With regard to epicPCR, we did consider comparing OIL-PCR to epicPCR, but decided against it for two main reasons: 1) acquiring all the reagents necessary to perform epicPCR was cost-prohibitive (over $1,000 for the one demonstration experiment), and 2) a large motivation for the development of OIL-PCR is the difficulty of performing epicPCR. Although we believe that both epicPCR and OIL-PCR are robust methods, OIL-PCR is a shorter protocol that does not rely on hazardous, costly and difficult-to-obtain reagents. We were concerned that an inexperienced attempt by us to perform epicPCR would likely have yielded poor results and would not provide a fair comparison. Overall, we feel that the validation experiments we perform with OIL-PCR are enough to highlight both the strengths and weaknesses of the method.

As protocol development is central to this manuscript, and one of the main advantages claimed for OIL-PCR is ease of use, the supplement should contain a detailed protocol for a control sample, with a list of the equipment and reagents needed and the results that should be obtained. This could easily be adapted from the methods section, which is highly detailed. What is the estimated cost per sample of this procedure, and how does it compare roughly with other methods (epicPCR and culture-based)?

Thank you for the suggestion that we provide a detailed protocol. We hope that the inclusion of this step-by-step protocol will enable more labs to adopt the method. The cost of OIL is approximately $15 per replicate. The cost is largely driven by the large amount of Phusion polymerase needed, which is the same as in epicPCR. Culturing may be less expensive, depending on the cost of reagents needed for media, antibiotics, etc., but we do not feel the two are comparable. For example, even though we show that Romboutsia did not acquire resistance genes in this case, even if it had, culturing would not have captured it because of the difficult and specific culturing conditions required for growing most Romboutsia strains.

Lines 197-198: is a reference to the Kent et al. study needed here? What is the reason that the Hi-C results from that manuscript are not compared to the results of the OIL-PCR experiments?

Thank you for this suggestion. The congruence of our results highlights the strengths of both approaches. As we discuss in detail for major point 4 (above), the Hi-C and OIL-PCR results both correctly identify Klebsiella as a carrier of the plasmid with CTX-M and TEM. We have now added this to the manuscript.

4. www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

The manuscript by Chakraborty focuses on methods to direct dsDNA to specific cell types within an intact multicellular organism, with the ultimate goal of targeting DNA-based nanodevices, often as biosensors within endosomes and lysosomes. Taking advantage of the endogenous SID-2 dsRNA receptor expressed in C. elegans intestinal cells, the authors show that dsDNA conjugated to dsRNA can be taken into the intestinal endosomal system via feeding and apical endocytosis, while dsDNA alone is not an efficient endocytic cargo from the gut lumen. Since most cells do not express a dsRNA receptor, the authors sought to develop a more generalizable approach. Via phage display screening they identified a novel camelid antibody 9E that recognizes a short specific DNA sequence that can be included at the 3' end of synthesized dsDNAs. The authors then showed that this antibody can direct binding, and in some cases endocytosis, of such DNAs when 9E was expressed as a fusion with transmembrane protein SNB-1. This approach was successful in targeting microinjected dsDNA pan-neuronally when expressed via the snb-1 promoter, and to specific neuronal subsets when expressed via other promoters. Endocytosed dsDNA appeared in puncta moving in neuronal processes, suggesting entry into endosomes. Plasma membrane targeting appeared feasible using 9E fusion to ODR-2.

The major strength of the paper is in the identification and testing of the 9E camelid antibody as part of a generalizable dsDNA targeting system. This aspect of the paper will likely be of wide interest and potentially high impact, since it could be applied in any intact animal system subject to transgene expression. A weakness of the paper is the choice of "nanodevice". It was not clear what utility was present in the DNAs used, such as D38, that made them "devices", aside from their fluorescent tag that allowed tracking their localization.

We used a DNA nanodevice, denoted pHlava-9E, that uses pHrodo as a pH-sensitive dye. pHlava-9E is designed to provide a digital output of compartmentalization, i.e., its pH profile is such that even if it is internalized into a mildly acidic vesicle, the pH readout is as high as one would observe with a lysosome. This gives an unambiguous readout that distinguishes surface-immobilized probe from endocytosed probe.

Another potential weakness is that the delivered DNA is limited to the cell surface or the lumen of endomembrane compartments without access to the cytoplasm or nucleus. In general the data appeared to be of high quality and well controlled, supporting the authors' conclusions.

We completely agree that we cannot target DNA nanodevices to sub-cellular locations such as the cytoplasm or the nucleus with this strategy. However, we see this not as a “weakness” but as a limitation of the current capabilities of DNA nanotechnology. It must be mentioned that although fluorescent proteins were first described in 1962, it was 30 years before others targeted them to the endoplasmic reticulum (1992) or the nucleus (1993) (Brini et al., 1993; Kendall et al., 1992). Probe technologies undergo stage-wise improvements and expansions. We have therefore added a short passage to the conclusions section outlining the future challenges in sub-cellular targeting of DNA nanodevices.

Reviewer #2 (Public Review):

The authors demonstrate the tissue-specific and cell-specific targeting of double-stranded DNA (dsDNA) using C. elegans as a model host animal. The authors focused on two distinct tissues and delivery routes: feeding dsDNA to target a class of organelles within intestinal cells, and injecting dsDNA to target presynaptic endocytic structures in neurons. To achieve efficient intestinal targeting, the authors leveraged dsRNA uptake via endogenous intestinal SID-2 receptors by fusing dsRNA to a fluorophore-labeled dsDNA probe. In contrast, neuronal endosome/synaptic vesicle (SV) targeting was achieved by designing a nanobody that specifically binds a short dsDNA motif fused to the fluorophore-labeled dsDNA probe. Combining dsDNA probe injection with nanobody neuronal expression (fused to a neuronal vSNARE to achieve synaptic targeting), the authors demonstrated that the injected dsDNA could be taken up by a variety of distinct neuronal subtypes.

Strengths:

While nanodevices built on dsDNA platforms have been shown to be taken up by scavenger receptors in C. elegans (including previous work from several of these authors), this strategy will not work in many tissue types lacking these receptors. The authors successfully circumvented this limitation using distinct strategies for two cell types in the worm, thereby providing a more general approach for future efforts. The approaches are creative, and the nanobody development in particular allows for endocytic delivery in any cell type. The authors exploited quantitative imaging approaches to examine the subcellular targeting of dsDNA probes in living animals and manipulated endogenous receptors to demonstrate the mechanism of dsRNA-based dsDNA uptake in intestinal cells.

Weaknesses:

To validate successful delivery of a functional nanodevice, one would ideally demonstrate the function of a particular nanodevice in at least one of the examples provided in this work. The authors have successfully used a variety of custom-designed dsDNA probes in living worms in numerous past studies, so this would not be a technical hurdle. In the current study, the reader has no means of assessing whether the dsDNA is intact and functional within its intracellular compartment.

We now demonstrate the use of a functional nanodevice to detect pH profiles of a given microenvironment. This functional nanodevice contains two fluorescent reporter dyes, each attached to one of the strands of a DNA duplex. In order to obtain pH readouts, the device integrity is essential for ratiometric sensing.

Coelomocytes are cells known for their scavenging and degradative lysosomal machinery. Previous studies of the stability of variously structured DNA nanodevices in coelomocytes have shown that DNA devices based on 38 bp DNA duplexes have a half-life of >8 hours in actively scavenging cells such as coelomocytes (Chakraborty et al., 2017; Surana et al., 2013). Given that our sensing in the gut as well as in the neuron is performed <1 hour post feeding or injection, pHlava-9E is >97% intact.

Another minor weakness is the lack of a quantitative assessment of colocalization in intestinal cells or neurons in an otherwise nicely quantitative study. Since characterization of the targeting described here is an essential part of evaluating the method, a stronger demonstration of colocalization would significantly buttress the authors' claims.

We have now quantified colocalization in each cellular system. Please see Figure R1 below (Figure 1 Supplementary figure 1 and Figure 4 Supplementary figure 2 of the revised manuscript).

Figure R1: a) Pearson’s correlation coefficient (PCC) calculated for the colocalization between R50D38 (red) and lysosomal markers LMP-1 or GLO-1 (green) in the indicated transgenic worms. b) & d) Representative images of nanodevice nD647 uptake (red) in transgenics expressing both prab-3::gfp::rab-3 (green) and psnb-1:snb-1::9E. c) & e) Normalized line intensity profiles across the indicated lines in b and d. f) Percentage colocalization of nD647 (red) with RAB3:GFP (green). Error bar represents the standard deviation between two data sets.
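As a rough illustration of the colocalization metric reported in panel a, Pearson's correlation coefficient can be computed over paired pixel intensities from the two channels. The sketch below is not the authors' analysis pipeline: the function name `pearson_cc` and the toy intensity values are assumptions made purely for illustration, and real images would first be background-corrected and restricted to a region of interest.

```python
import math

def pearson_cc(red, green):
    """Pearson's correlation coefficient between two equally sized
    sequences of pixel intensities (one value per pixel, per channel)."""
    if len(red) != len(green) or len(red) < 2:
        raise ValueError("channels must have the same length (>= 2 pixels)")
    n = len(red)
    mean_r = sum(red) / n
    mean_g = sum(green) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((r - mean_r) * (g - mean_g) for r, g in zip(red, green))
    var_r = sum((r - mean_r) ** 2 for r in red)
    var_g = sum((g - mean_g) ** 2 for g in green)
    if var_r == 0 or var_g == 0:
        raise ValueError("PCC is undefined for a constant channel")
    return cov / math.sqrt(var_r * var_g)

# Hypothetical flattened pixel intensities for the two channels.
red_channel = [10, 50, 200, 180, 20, 15]
green_channel = [12, 45, 190, 170, 30, 10]
pcc = pearson_cc(red_channel, green_channel)
```

A PCC near 1 indicates that the two signals rise and fall together across pixels, a value near 0 indicates no linear relationship, and negative values indicate mutual exclusion.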

While somewhat incomplete, this study represents a step forward in the development of a general targeting approach amenable to nanodevice delivery in animal models.


5. May 2021
6. www.biorxiv.org
1. Reviewer #3 (Public Review):

This study investigates the temporal orientation abilities of subjects with cerebellar degeneration (CD) and control subjects during an orientation discrimination task with visual stimuli that had a contrast near threshold. Participants were asked to report their discrimination decision only after a random delay following target offset, which decreases the motor preparation component of the task in the interval-based condition. CD subjects showed similar visual discrimination performance to controls when cued by a rhythmic set of stimuli but showed no benefit when the target interval was presented aperiodically. The authors interpret these findings as evidence supporting the notion that the cerebellum plays a role in interval-based attentional orienting to proactively modulate perception. This is an elegantly simple experiment providing a novel observation in the field.

2. Reviewer #2 (Public Review):

The article by Breska and Ivry provides a nice, timely, and relevant continuation of their previous recent work on the role of the cerebellum in interval-based (but not rhythm-based) anticipation in time. While in their related prior work (in particular their recent articles in PNAS and Science Advances) the authors used simple reaction time tasks that made it difficult to attribute the observed effects to visual vs. motor anticipatory mechanisms, in the current work they used a perceptual discrimination task with a delayed response to focus on potential contributions of the cerebellum to temporal anticipation specifically for perceptual sensitivity (where the role of the cerebellum is less obvious, given it has traditionally been implicated more in motor control than in perception). They do so by comparing individuals with cerebellar degeneration to controls, and finding a selective impairment of the individuals with cerebellar degeneration to use interval-based temporal predictions to facilitate visual discrimination, while rhythm-based performance benefits are spared (providing a neat comparison and control).

I have no major comments to detail. The short report is well written, complements related work by the authors nicely, and makes an important and novel contribution to the literature on temporal anticipation (while also having relevant implications more generally for views on the role of the cerebellum in cognition).

3. Reviewer #1 (Public Review):

Breska and Ivry tested the role of the cerebellum in temporal expectation, specifically in how temporal expectation affects perception. The question is interesting, as the neural mechanisms mediating the substantial effects of temporal expectation on perception are not well understood. The authors found that in a perceptual discrimination task, individuals with cerebellar degeneration (CD) showed reduced effects of temporal expectation on discriminability with interval timing cues, but intact effects with rhythmic cues. This shows that the role of the cerebellum in temporal expectation (which had been previously demonstrated by the authors) is not merely one of motor preparation. Rather, the cerebellum appears to play a causal role in bringing about the perceptual consequences of temporal expectation for predictable intervals. It also reveals differences between interval timing and rhythmic manipulations in terms of the mechanisms by which they affect perception.

This is a straightforward study with a clean experimental approach and clear presentation of the data. However, I felt the manuscript would benefit from a more thorough analysis of the dataset, especially given the rarity of individuals with CD.

4. Evaluation Summary:

This study provides evidence that individuals with cerebellar degeneration show reduced effects of temporal expectation on perceptual discriminability with interval timing cues, but intact effects with rhythmic cues. The authors compare individuals with cerebellar degeneration to controls, and find a selective impairment of the individuals with cerebellar degeneration to use interval-based temporal predictions to facilitate visual discrimination, whereas rhythm-based performance benefits are spared. This study is of interest to psychologists and neuroscientists investigating prediction, perception, attention, and motor control, as it demonstrates a key role for the cerebellum in mediating the effects of interval-based temporal expectation on perception.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)


7. www.biorxiv.org
1. Author Response:

Evaluation Summary:

This paper will be of interest to biologists who study mechanisms of cell-to-cell variability in gene expression and those who wish to have a tool to alter variability in mammalian cells. Key regulators of gene expression variability in mammalian cells are identified and noise modulation in a synthetic system is shown. The data quality is high. A model for the origin of the observed noise is proposed, but will require some additional experimental evidence.

We thank the reviewers for their thorough reviews, insightful critiques, and very constructive suggestions for our manuscript. They genuinely help us improve our work and manuscript. We have performed all the additional experiments suggested. We believe that our new results and revised manuscript answer the questions raised by the reviewers and editors.

Reviewer #1 (Public Review):

The manuscript aims to identify origins of stochasticity ('noise') in mammalian gene expression focused on the case when a single transcription factor controls the expression of a target gene. It also aims to devise strategies to control mean and variance of gene expression independently.

The experimental approach uses a light-induced transcriptional activator in two stimulation modes, namely amplitude modulation (AM: time-constant light input) and pulse width modulation (PWM: periodic light inputs in the form of a pulse train). Perturbation experiments target histone-modifying enzymes to influence epigenetic states, with corresponding measurements of single-cell epigenetic states and mRNA dynamics to dissect mechanisms of noise control. Beyond this synthetic setting, the study is complemented by endogenous gene expression noise in human and mouse cells under the same perturbations.

Major strengths of the study are:

• The experimental demonstration that, and under which conditions PWM can reduce gene expression noise in mammalian cells; the corresponding data sets could be very valuable for further quantitative analysis.
• Providing strong evidence via perturbation studies that the extent of gene expression noise is linked to chromatin-modifying activities, specifically opposing HDAC4/5 histone deacetylase activities and CBP/p300 histone acetyltransferase activities.
• Proposing a positive-feedback model established by these two opposing activities that is consistent with the reported data from perturbation experiments and on chromatin accessibility / modification states.
• Providing evidence that also in the natural (human and mouse cell) setting, the regulators HDAC4/5 and CBP/p300 contribute to the control of gene expression noise.

We thank the reviewer for the careful analysis of our manuscript.

Major weaknesses are:

• Limited conceptual novelty because noise-reducing effects of PWM have been demonstrated and analyzed previously in synthetic systems in bacteria (with an engineered positive feedback loop; https://www.nature.com/articles/s41467-017-01498-0) and in yeast (with an engineered single transcription factor as in the present study: https://www.nature.com/articles/s41467-018-05882-2#Sec25).

We appreciate that the reviewer pointed out two studies in E. coli and yeast that used similar PWM. We believe that their concepts are different. The concept of “stabilized unstable steady states” was specifically developed for controlling chaos in physics by Ott, Grebogi, and Yorke (OGY theory, https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.64.1196 ). Their motivation was feedback control of chaos with small perturbations to the system. Non-feedback control with small periodic perturbations has also been shown to control chaos by stabilizing an unstable steady state. The E. coli work on stabilizing an unstable steady state could be considered an extension of these concepts to complex biological systems. In addition, the location of the unstable steady state in a bistable system would decrease with increasing light intensity, as shown by the black dashed line in Figure 2E, which is inconsistent with our result that the mean mRuby is monotonically correlated with the mean light intensity (Figure 1C).

It is correct that Benzinger and Khammash hypothesized in their yeast paper that a cooperative TF-gene expression curve is sufficient to generate a bimodal distribution from a highly variable TF distribution (shown in Figure 1G). However, this is not the case in our study. In our experiment, GAVPO and mRuby expression do not exhibit clear cooperativity. In addition, the authors didn’t show bimodality unless a non-isogenic cell population was used (Fig. 3h in Benzinger and Khammash’s paper).

• Insufficient evidence for the postulated bistability caused by positive feedback on chromatin states in the mammalian system analyzed, which has implications for the mechanistic explanations provided (e.g., if PWM allows rapid cell switching between 'high' and 'low' states as postulated).

We agree with the reviewer that the current technology limits the possibility of obtaining more direct evidence of bistability in chromatin states. Our scATAC-seq data show that chromatin openness oscillates between the light “on” and “off” phases with reduced heterogeneity compared to the dark control. Our bulk data suggest that H3K27ac has larger differences between the “high” and “low” states. A better measurement would be single-cell ChIP-seq for H3K27ac. However, the current single-cell ChIP-seq technologies provide coverage too low (~1% of scATAC-seq reads) to support measurements at specific loci (https://www.nature.com/articles/s41592-021-01060-3, https://www.nature.com/articles/s41587-021-00869-9 ).

• Limited theoretical support for the proposed (not directly observable) mechanisms that uses a mathematical model illustrating the potential consistency, but the model is not directly linked to the experimental data and hence of limited use for their interpretation.

Our ODE model wasn’t built to fit the experimental data. We used it to generate hypotheses about perturbations of HDAC4/5 and CBP/p300. We validated experimentally the model prediction that inhibiting p300 reduces heterogeneity.

We have built a stochastic model containing all the processes in our ODE model and considering nine independent promoters, written the code for a stochastic simulation algorithm similar to that of the yeast paper, and performed optimization. However, we don’t have enough CPU time to fit the model to the experimental data and find the “global minimum” using the parallel tempering Monte-Carlo method (https://pubmed.ncbi.nlm.nih.gov/19810318/).

Overall, the authors achieved their aim of elucidating mechanisms for noise control in mammalian gene expression by identifying specific, opposing regulators of chromatin states, with clear support in the synthetic setting, and evidence in endogenous expression control. Conceptual advances regarding strategies for the external control of gene expression noise appear limited because of prior work, which includes more in-depth theoretical analysis in simpler (bacterial, yeast) systems.

Hence, the likely impact of the work will be primarily on the more detailed (in terms of histone regulators, etc.) study of noise control in mammalian cells, while the data sets presented in the study could prove valuable for follow-up quantitative (model-based) analyses because they are unique in combining different readouts such as single-cell protein and mRNA abundances as well as histone and chromatin states.

We appreciate that the reviewer finds that this manuscript supports molecular mechanisms of noise control in mammalian gene expression, in both synthetic and endogenous gene regulation.

Reviewer #2 (Public Review):

The manuscript describes a tool to independently tune mean protein expression levels and noise. Light induces dimerization and subsequent activation of transcriptional activator GAVPO. By introducing 5xUAS (a target sequence for dimerized GAVPO) upstream of an mRuby reporter gene, the effect of light can be measured on mRuby mean and noise.

By pulsing light at different periods (from 100-400 minutes), the authors reduce the mRuby noise for intermediate average light intensities. Notably, the pulses are all applied at an absolute light intensity of 100 uW/cm2, with the average light intensity being modulated through the light-off time-periods. Therefore, as all periods tend towards 100 uW/cm2 average light intensity, the PWM duty cycle becomes more similar to the 100 uW/cm2 AM case.

Strengths:

The proposed method is an elegant way to independently tune protein mean and noise. This would have a broad application in the field and is much needed to be able to study the consequence of protein expression noise, independently of mean. In addition, the authors use multiple powerful single-cell techniques to try and determine the mechanism underpinning the light-induced noise modulation.

During constant exposure to light, increased light intensity increases the mean expression of mRuby, while decreasing the noise. This high noise is mostly due to observed bimodality in mRuby expression. Through ODEs and by using small molecule inhibitors, the authors show that this bimodality is caused by some cells being stably off, while other cells enter an on state. In this on state a positive feedback can occur where initial binding of dimerized GAVPO induces histone acetylation and chromatin accessibility, and thus stimulates further GAVPO binding. Bistability induced by constant light exposure is disrupted using small molecule inhibitors of CBP/p300 HAT activity, indicating that histone regulation is a cause for this observed bistability. The stable on state is demonstrated to be more active and accessible through ChIP-seq and ATAC-seq respectively.

We appreciate that the reviewer recognizes that our method of independently tuning protein mean and noise has broad application and is much needed, as well as our integration of multiple single-cell analyses to determine the noise control mechanism. We believe that this method will prove especially useful in cell fate control studies, in vitro with stem cell differentiation or in vivo with embryo development.

Weakness:

The single-cell ATAC-seq data indicate that pulsing light induces switching from an accessible (light on) to inaccessible (light off) chromatin state. The authors argue that the switching back into a chromatin inaccessible state prevents the positive feedback to occur and thus reduces noise. However, there are weaknesses in the description of the mechanism by which the pulses modulate (i.e., reduce) noise. Overall, since these sections in the manuscript are not easy to understand, it is difficult to parse what mechanism the authors attributed to the observed noise reduction and to assess if the data supports the conclusions.

We apologize for the lack of clarity in this aspect. We have extensively rewritten the descriptions in the related sections. The PWM light intensity alternates between 100 uW/cm2 and dark, which correspond to the high and low monostable states, respectively. We therefore need to show whether the time spent in each state is sufficient. The scATAC-seq data indicate that one 150-minute pulse of 100 uW/cm2 light is sufficient to elevate chromatin accessibility while reducing cell-to-cell variation, two features of the high monostable state. The 450-minute dark period reduces chromatin accessibility: without sufficient activated GAVPO, the cells fall back to the low monostable state. H3K27ac has a larger dynamic range between the low and high states (Figure 3J), but single-cell ChIP-seq methods don’t provide sufficient coverage to assess H3K27ac heterogeneity at the 5xUAS-mRuby loci. Nevertheless, indirect evidence from perturbation of p300 activation or GAVPO-p300 interactions supports this picture.
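For orientation, the duty cycle and time-averaged intensity implied by these numbers can be worked out directly. The sketch below assumes that the 150-minute pulse and the 450-minute dark period together form one PWM cycle, which the response does not state explicitly; the function name `pwm_summary` is hypothetical.

```python
def pwm_summary(on_min, off_min, peak_uW_cm2):
    """Return (period in minutes, duty cycle, time-averaged intensity)
    for a square-wave light input that is `peak_uW_cm2` during the pulse
    and dark otherwise."""
    period = on_min + off_min
    duty = on_min / period
    mean_intensity = duty * peak_uW_cm2
    return period, duty, mean_intensity

# Assumed cycle: 150 min at 100 uW/cm2 followed by 450 min of darkness.
period, duty, mean_intensity = pwm_summary(150, 450, 100.0)
# period = 600 min, duty = 0.25, mean_intensity = 25.0 uW/cm2
```

Under this assumption the cells receive the same average dose as a constant 25 uW/cm2 input, but only ever experience the two extremes (dark or 100 uW/cm2), which is the distinction between the PWM and AM modes discussed above.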

The data from the single-mRNA live-cell imaging experiments are somewhat ambiguous and do not necessarily support some of the arguments. The conclusion that transcription, nuclear export, and mRNA degradation flatten the pulsatile chromatin caused by the PWM is not clear from the data. Especially, since most cells do not show any pulsatile behavior both in the single-cell ATAC-seq and the live-cell imaging data.

We improved the presentation of the data. With the data presented on a logarithmic scale, it is visible that most cells exhibit pulsatile behavior (new Figure 5C). This can be further visualized by averaging over subpopulations of cells. As shown in Figure 5G of the revised manuscript, approximately 57% of cells show oscillations. The mean mRNA shows a damped periodic oscillation. The statement that nuclear export and mRNA degradation flatten the pulsatile chromatin caused by the PWM was postulated based on rate constants in the literature and has been removed from the revised manuscript. The half-life of mRuby is about 24 hours, substantially longer than the period of the PWM. We have added an analysis of single-cell mRuby dynamics with 400 min PWM, which doesn’t exhibit periodic oscillations (Figure 5-figure supplement 2).
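The argument from half-life can be made concrete with a short calculation. The sketch below assumes simple first-order (exponential) loss of mRuby and asks how much protein would remain if production stopped completely for one full 400-minute PWM period, which is a deliberately pessimistic bound; the function name `fraction_remaining` is hypothetical and the calculation is not taken from the manuscript.

```python
def fraction_remaining(elapsed_min, half_life_min):
    """Fraction of a protein pool left after `elapsed_min` minutes,
    assuming first-order decay with the given half-life."""
    return 0.5 ** (elapsed_min / half_life_min)

# mRuby half-life of about 24 hours, PWM period of 400 minutes.
remaining = fraction_remaining(400, 24 * 60)
# remaining is roughly 0.82, i.e. less than a 20% drop per period
```

Because even a full period without production removes less than a fifth of the protein pool under this assumption, the protein level integrates over many cycles, which is consistent with the absence of periodic oscillations in the single-cell mRuby traces.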

Reviewer #3 (Public Review):

The authors use a synthetic light-controlled transcription factor (GAVPO) to test a model of bistable gene expression that is hypothesized to originate from positive feedback via local histone modifications by trans-activator recruitment of CBP/p300 to facilitate open chromatin, which facilitates GAVPO binding, etc… Their proposed model for the origin of bistability is important because it should apply to any trans-activator that recruits CBP/p300 to modify chromatin and activate gene expression. The authors show that periodic modulation of light reduces the bimodal distribution at intermediate light-intensity levels to a unimodal distribution. This is an elegant demonstration of how GAVPO and different temporal patterns of light can reduce cell-to-cell variability in gene expression, if needed.

Strengths:

The authors generate an impressive amount of single-cell data of gene expression and chromatin state (flow cytometry, single-cell sequencing, live-cell MS2-tagging) at different intensity levels. The periodic modulation of GAVPO activity by light is a practical demonstration of how to sculpt the gene expression output in useful ways. This may be a very useful tool for future biologists.

We thank the reviewer for the positive comments on the mammalian noise control mechanism we discovered and its broad implications.

Weakness:

The proposed model for bistability is not convincingly tested or supported by the existing data. Each reporter should exhibit a bistable response because the positive feedback is localized to the promoter via cis-effects on gene expression by local chromatin state/GAVPO binding. The authors show a bimodal distribution of gene expression in a population of cells, which is consistent with a bistable response in a single reporter gene. However, their strain has 9 independent reporters integrated into the genome. Thus, I would expect to see up to 10 peaks, not 2 peaks. Moreover, the mathematical model used to validate their observations does not model the total expression from 9 independent promoters, which is a critical omission given the cis-nature of the positive feedback loop. The fact that these 9 promoters generate 2 peaks at intermediate light intensity suggests that the GAVPO bistability likely originates from a trans-effect, i.e., either all 9 promoters are OFF or all 9 promoters are ON, not a cis-effect.

We appreciate the reviewer’s insight. We agree that theoretically there could be up to 10 peaks. The separation between two adjacent “high” peaks is about 2-fold. The lowest CV measured experimentally for a high mRuby peak is about 0.47 (cells under maximum light with LMK-235 and A485, Figure 3B). This variation could overshadow the 2-fold differences in mean mRuby and prevent the recognition of multiple “high” peaks. On the other hand, the difference between the low state and any of the high states is large enough for them to be recognized as separate peaks. We simulated the case in which each of the 9 sites chooses the “low” or “high” state stochastically (Figure 3-figure supplement 2). The 9 potential high peaks are convoluted into a broader peak, similar to experimental observations.
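The resolvability argument can be checked with a back-of-the-envelope calculation. The sketch below is not the simulation in Figure 3-figure supplement 2; it assumes that each peak is lognormal with the same CV and that the two peaks being compared have equal weight. On a log scale such a pair is a mixture of two Gaussians with equal standard deviation, which is bimodal only when the means are more than two standard deviations apart. The function names and the 100-fold example are hypothetical.

```python
import math

def log_sigma_from_cv(cv):
    """Standard deviation of log-expression for a lognormal
    distribution with the given coefficient of variation."""
    return math.sqrt(math.log(1.0 + cv ** 2))

def peaks_resolvable(fold_separation, cv):
    """True if two equal-weight lognormal peaks with the same CV, whose
    means differ by `fold_separation`, give a bimodal distribution of
    log-expression (means more than 2 sigma apart on the log scale)."""
    return math.log(fold_separation) > 2.0 * log_sigma_from_cv(cv)

sigma = log_sigma_from_cv(0.47)          # about 0.447 in natural-log units
adjacent = peaks_resolvable(2.0, 0.47)   # 2-fold apart: not resolvable
distant = peaks_resolvable(100.0, 0.47)  # hypothetical 100-fold gap: resolvable
```

With a CV of 0.47, the 2-fold gap (about 0.69 in natural-log units) falls short of the roughly 0.89 needed for two distinct modes, and neighbouring peaks for higher copy numbers are closer still, whereas a much larger gap such as the one between the low state and the high states remains visible.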

We agree that our model is very simple and doesn’t model the total expression from independent promoters. We have built a stochastic model containing all the processes in our ODE model and considering nine independent promoters. Unfortunately, fitting it to the experimental data using the parallel tempering Monte-Carlo method takes too much computing time.

We performed additional experiments in which we mutated the p65AD of GAVPO to specifically reduce its interaction with CBP/p300. The disappearance of the bimodal distribution validates that the direct interaction between UAS-bound GAVPO and CBP/p300 causes the bistability, not a trans-effect through intermediates. We also measured single-cell mRuby dynamics and selected cells with nearly identical GAVPO levels (Figure 2H). The mRuby-high cells rose earlier and stayed in the high state (red lines in Figure 2G), whereas the mRuby-low cells remained low (blue lines in Figure 2G). A few cells seem to make transitions between the two states. These data are consistent with a bistability model with small rates of stochastic transition between the states. Prior exposure to 100 uW/cm2 light also tilted the distribution toward the “high” state, validating the hysteresis property of the bistability (Figure 2I-J).

2. Reviewer #1 (Public Review):

The manuscript aims to identify origins of stochasticity ('noise') in mammalian gene expression focused on the case when a single transcription factor controls the expression of a target gene. It also aims to devise strategies to control mean and variance of gene expression independently.

The experimental approach uses a light-induced transcriptional activator in two stimulation modes, namely amplitude modulation (AM: time-constant light input) and pulse width modulation (PWM: periodic light inputs in the form of a pulse train). Perturbation experiments target histone-modifying enzymes to influence epigenetic states, with corresponding measurements of single-cell epigenetic states and mRNA dynamics to dissect mechanisms of noise control. Beyond this synthetic setting, the study is complemented by endogenous gene expression noise in human and mouse cells under the same perturbations.

Major strengths of the study are:

• The experimental demonstration that, and under which conditions PWM can reduce gene expression noise in mammalian cells; the corresponding data sets could be very valuable for further quantitative analysis.
• Providing strong evidence via perturbation studies that the extent of gene expression noise is linked to chromatin-modifying activities, specifically opposing HDAC4/5 histone deacetylase activities and CBP/p300 histone acetyltransferase activities.
• Proposing a positive-feedback model established by these two opposing activities that is consistent with the reported data from perturbation experiments and on chromatin accessibility / modification states.
• Providing evidence that also in the natural (human and mouse cell) setting, the regulators HDAC4/5 and CBP/p300 contribute to the control of gene expression noise.

Major weaknesses are:

• Limited conceptual novelty because noise-reducing effects of PWM have been demonstrated and analyzed previously in synthetic systems in bacteria (with an engineered positive feedback loop; https://www.nature.com/articles/s41467-017-01498-0) and in yeast (with an engineered single transcription factor as in the present study: https://www.nature.com/articles/s41467-018-05882-2#Sec25).
• Insufficient evidence for the postulated bistability caused by positive feedback on chromatin states in the mammalian system analyzed, which has implications for the mechanistic explanations provided (e.g., if PWM allows rapid cell switching between 'high' and 'low' states as postulated).
• Limited theoretical support for the proposed (not directly observable) mechanisms that uses a mathematical model illustrating the potential consistency, but the model is not directly linked to the experimental data and hence of limited use for their interpretation.

Overall, the authors achieved their aim of elucidating mechanisms for noise control in mammalian gene expression by identifying specific, opposing regulators of chromatin states, with clear support in the synthetic setting, and evidence in endogenous expression control. Conceptual advances regarding strategies for the external control of gene expression noise appear limited because of prior work, which includes more in-depth theoretical analysis in simpler (bacterial, yeast) systems.

Hence, the likely impact of the work will be primarily on the more detailed (in terms of histone regulators, etc.) study of noise control in mammalian cells, while the data sets presented in the study could prove valuable for follow-up quantitative (model-based) analyses because they are unique in combining different readouts such as single-cell protein and mRNA abundances as well as histone and chromatin states.

3. Reviewer #3 (Public Review):

The authors use a synthetic light-controlled transcription factor (GAVPO) to test a model of bistable gene expression that is hypothesized to originate from positive feedback via local histone modifications by trans-activator recruitment of CBP/p300 to facilitate open chromatin, which facilitates GAVPO binding, etc... Their proposed model for the origin of bistability is important because it should apply to any trans-activator that recruits CBP/p300 to modify chromatin and activate gene expression. The authors show that periodic modulation of light reduces the bimodal distribution at intermediate light-intensity levels to a unimodal distribution. This is an elegant demonstration of how GAVPO and different temporal patterns of light can reduce cell-to-cell variability in gene expression, if needed.

Strengths:

The authors generate an impressive amount of single-cell data of gene expression and chromatin state (flow cytometry, single-cell sequencing, live-cell MS2-tagging) at different intensity levels. The periodic modulation of GAVPO activity by light is a practical demonstration of how to sculpt the gene expression output in useful ways. This may be a very useful tool for future biologists.

Weakness:

The proposed model for bistability is not convincingly tested or supported by the existing data. Each reporter should exhibit a bistable response because the positive feedback is localized to the promoter via cis-effects on gene expression by local chromatin state/GAVPO binding. The authors show a bimodal distribution of gene expression in a population of cells, which is consistent with a bistable response in a single reporter gene. However, their strain has 9 independent reporters integrated into the genome. Thus, I would expect to see up to 10 peaks, not 2 peaks. Moreover, the mathematical model used to validate their observations does not model the total expression from 9 independent promoters, which is a critical omission given the cis-nature of the positive feedback loop. The fact that these 9 promoters generate 2 peaks at intermediate light intensity suggests that the GAVPO bistability likely originates from a trans-effect, i.e., either all 9 promoters are OFF or all 9 promoters are ON, not a cis-effect.
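The reviewer's counting argument can be illustrated with a small sketch. It assumes, purely for illustration, that each of the 9 reporters switches ON independently with the same probability `p_on` (a hypothetical parameter, not a value from the manuscript), and contrasts this with an all-or-none trans scenario. Whether the 10 expression levels would appear as resolved peaks in real data depends on measurement noise, which is not modeled here.

```python
from math import comb


def independent_cis_distribution(n_reporters: int, p_on: float) -> list[float]:
    """P(k reporters ON), k = 0..n, if each copy switches independently (cis model)."""
    return [
        comb(n_reporters, k) * p_on**k * (1 - p_on) ** (n_reporters - k)
        for k in range(n_reporters + 1)
    ]


def all_or_none_distribution(n_reporters: int, p_on: float) -> list[float]:
    """P(k reporters ON) if a trans effect switches every copy together."""
    dist = [0.0] * (n_reporters + 1)
    dist[0] = 1 - p_on          # all copies OFF
    dist[n_reporters] = p_on    # all copies ON
    return dist


# Hypothetical intermediate light intensity at which half of the switches are ON.
cis = independent_cis_distribution(9, 0.5)
trans = all_or_none_distribution(9, 0.5)

# Count distinct expression levels (number of ON copies) with non-zero probability.
n_levels_cis = sum(1 for p in cis if p > 0)
n_levels_trans = sum(1 for p in trans if p > 0)
print(n_levels_cis, n_levels_trans)
```

Under these assumptions the independent cis model populates every level from 0 to 9 ON copies, whereas the all-or-none scenario populates only two, which matches the two observed peaks.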

4. Reviewer #2 (Public Review):

The manuscript describes a tool to independently tune mean protein expression levels and noise. Light induces dimerization and subsequent activation of the transcriptional activator GAVPO. By introducing 5xUAS (a target sequence for dimerized GAVPO) upstream of an mRuby reporter gene, the effect of light on mRuby mean and noise can be measured.

By pulsing light at different periods (from 100 to 400 minutes), the authors reduce the mRuby noise for intermediate average light intensities. Notably, the pulses are all applied at an absolute light intensity of 100 uW/cm2, with the average light intensity being modulated through the light-off time periods. Therefore, as the average light intensity approaches 100 uW/cm2 at every period, the PWM conditions become more similar to the 100 uW/cm2 AM case.
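The relationship between pulse period, light-off time, and average intensity described above can be sketched as follows. This is a minimal illustration that assumes rectangular pulses at the stated 100 uW/cm2 amplitude; the function name and the example values are hypothetical and not taken from the manuscript.

```python
PEAK_INTENSITY = 100.0  # uW/cm2, pulse amplitude stated in the review


def pwm_schedule(mean_intensity: float, period_min: float,
                 peak: float = PEAK_INTENSITY) -> tuple[float, float, float]:
    """Return (duty cycle, light-on minutes, light-off minutes) for one period.

    For rectangular pulses the time-averaged intensity equals
    peak * duty cycle, so the duty cycle is mean_intensity / peak.
    """
    if not 0 <= mean_intensity <= peak:
        raise ValueError("mean intensity must lie between 0 and the peak")
    duty = mean_intensity / peak
    on_min = duty * period_min
    return duty, on_min, period_min - on_min


# Hypothetical example: a 25 uW/cm2 average delivered with a 400-minute period.
print(pwm_schedule(25.0, 400.0))
# As the average approaches the peak, the off time vanishes and the
# pulsed condition becomes constant illumination.
print(pwm_schedule(100.0, 100.0))
```

The second call shows why all periods converge on the constant-light case as the average intensity approaches the pulse amplitude.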

Strengths:

The proposed method is an elegant way to independently tune protein mean and noise. This would have a broad application in the field and is much needed to be able to study the consequence of protein expression noise, independently of mean. In addition, the authors use multiple powerful single-cell techniques to try and determine the mechanism underpinning the light-induced noise modulation.

During constant exposure to light, increased light intensity increases the mean expression of mRuby, while decreasing the noise. This high noise is mostly due to observed bimodality in mRuby expression. Through ODEs and by using small molecule inhibitors, the authors show that this bimodality is caused by some cells being stably off, while other cells enter an on state. In this on state a positive feedback can occur where initial binding of dimerized GAVPO induces histone acetylation and chromatin accessibility, and thus stimulates further GAVPO binding. Bistability induced by constant light exposure is disrupted using small molecule inhibitors of CBP/p300 HAT activity, indicating that histone regulation is a cause for this observed bistability. The stable on state is demonstrated to be more active and accessible through ChIP-seq and ATAC-seq respectively.

Weakness:

The single-cell ATAC-seq data indicate that pulsing light induces switching from an accessible (light on) to inaccessible (light off) chromatin state. The authors argue that switching back into a chromatin-inaccessible state prevents the positive feedback from occurring and thus reduces noise. However, there are weaknesses in the description of the mechanism by which the pulses modulate (i.e., reduce) noise. Overall, since these sections in the manuscript are not easy to understand, it is difficult to parse which mechanism the authors propose for the observed noise reduction and to assess whether the data support the conclusions.

The data from the single-mRNA live-cell imaging experiments are somewhat ambiguous and do not necessarily support some of the arguments. The conclusion that transcription, nuclear export, and mRNA degradation flatten the pulsatile chromatin changes caused by the PWM is not clear from the data, especially since most cells do not show any pulsatile behavior in either the single-cell ATAC-seq or the live-cell imaging data.

5. Evaluation Summary:

This paper will be of interest to biologists who study mechanisms of cell-to-cell variability in gene expression and those who wish to have a tool to alter variability in mammalian cells. Key regulators of gene expression variability in mammalian cells are identified and noise modulation in a synthetic system is shown. The data quality is high. A model for the origin of the observed noise is proposed, but will require some additional experimental evidence.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

8. www.biorxiv.org www.biorxiv.org
1. Reviewer #3 (Public Review):

The manuscript titled "The Shu complex prevents mutagenesis and cytotoxicity of single-strand specific alkylation lesions" investigates the biological function of the Shu complex in S. cerevisiae. The Shu complex, containing a DNA binding module comprised of the Csm2-Psy3 heterodimer, is conserved from budding yeast to man, and contributes to the defense against DNA damage caused by DNA alkylation. DNA alkylation occurs due to spontaneous reactions with metabolites and can be greatly increased by exogenous exposure to DNA alkylating agents. How the Shu complex acts to detect and direct repair of alkylation damage is therefore an important question. It has been well established that loss of the Shu complex sensitizes cells to alkylation damage, but the mechanism by which this complex locates sites of DNA damage and directs repair is not fully understood. This paper measures the methylation-induced mutation spectrum and uses genetic interactions to argue that the Shu complex may be involved in detecting and directing error-free repair of 3-methyl cytosine. This is a plausible hypothesis based on the body of previous work; however, the evidence that Csm2-Psy3 directly detects 3-methyl cytosine sites is indirect. It would be highly significant if this complex recognizes many different structures, but future structural information is needed to understand how this could be possible.

The strengths of the paper are in the use of whole genome sequencing to map mutation type and location in different genetic backgrounds and in the systematic testing for genetic interactions between csm2 and other DNA repair factors. It appears that the mutation spectra are very similar in the presence and absence of csm2, which suggests a broad role of the Shu complex in the cellular response to MMS.

The impact of the work is that it could help to explain the cellular program for protection against DNA alkylating agents in budding yeast, which has been a very valuable model eukaryotic organism, and could raise new questions about how DNA alkylation repair pathways function in humans, which differ from yeast in important features such as the presence of a direct repair pathway performed by ALKBH2 and ALKBH3.

2. Reviewer #2 (Public Review):

The manuscript entitled "The Shu Complex Prevents Mutagenesis and Cytotoxicity of Single-Strand Specific Alkylation Lesions" by Bonilla and colleagues reports that the yeast Shu complex promotes repair of 3meC in single-stranded DNA during S phase. Specifically, the authors show that mutations and cell lethality induced by MMS in csm2∆ cells are suppressed by overexpression of the human ALKBH2. Further, the authors find that the Csm2-Psy3 module of the Shu complex has increased affinity for 3meC-containing DNA relative to unmodified DNA. The authors propose a model, where the Shu complex binds to 3meC-containing DNA to facilitate HR-dependent post-replicative gap-filling.

3. Reviewer #1 (Public Review):

This study shows that the Shu complex is critical for 3meC damage tolerance in yeast, supporting the existence of a new pathway for the removal of an important DNA lesion that seems essential in yeast but likely contributes in other organisms. At the same time, it helps clarify the distinctive role of homologous recombination in double-strand break repair and post-replicative repair.

4. Evaluation Summary:

This paper is of potential interest to an audience of DNA repair and cancer biologists because it seeks to refine the mechanism by which cells respond to DNA damage. By combining a number of genetic experiments based on cell survival of different mutant combinations with mutation analysis, the authors obtain results that support the view that Shu is critical for 3meC damage tolerance in yeast. Notably, expression of human ALKBH2, which is responsible for the repair of 3meC, rescues the MMS-sensitivity of Shu mutants but not that of homologous recombination mutants. The study supports the existence of a new pathway for the removal of an important DNA lesion that seems essential in yeast, but likely contributes in other organisms, and helps clarify the distinctive role of homologous recombination in DSB repair and post-replicative repair. A few additional experiments are suggested to strengthen the mechanistic conclusions and better support the central model.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

9. www.biorxiv.org www.biorxiv.org
1. Reviewer #3 (Public Review):

The authors tackle an interesting question - whether the dentate gyrus is a locus of pathology in Scn1a+/- mice - and uncover a strong phenotype: the granule cells of the dentate gyrus are over-activated and the EC to dentate pathway is prone to seizure genesis. In the discussion, they suggest that their results support the idea that the DG may be a common locus to several different types of epilepsy... an attractive hypothesis! There are several strengths of the paper. The team has done a nice job of presenting 'ground-truth' data showing that their measurements of dF/F across a large population of granule cells correlate with action potentials in these cells. As the authors point out, this is especially important when working in disease models in which the dF/F-action potential relationship may be altered. Throughout, the authors were also careful to consider the limitations of their various techniques and analyzed the data in several ways to account for possible artifacts (e.g. ensuring that differences in activation are not arising because of slicing, and considering kindling in the later in vivo seizure threshold experiments). The experiments were well designed and appropriately interpreted.

One of the most intriguing results of the work is that PV interneurons in the DG of Scn1a+/- mice show only very minor impairments in young adult animals (they show more spike accommodation than in control animals). Rather, it seems that the GCs receive enhanced excitation from the entorhinal cortex. The authors perform a set of pharmacological experiments to prove that PV interneurons (and more generally inhibition) do not account for the difference in granule cell activation - however, here it would be useful to see the data summarized more consistently. It is difficult to interpret the pharmacological results (both of which are presented as changes in dF/F0) with respect to the initial findings of the manuscript (presented as estimated activation across the entire population). A beautiful aspect of this work is that it goes from cells to circuits to intact brain (in vivo). The authors nicely show that the heightened excitation from the EC to the DG is sufficient to drive seizures in the Scn1a+/- mice, and finally that, since PVs are intact, they can be harnessed to balance out the overactivation of GCs via optogenetic stimulation of PVs.

2. Reviewer #2 (Public Review):

Mattis et al. have used a hemizygous mutant of the gene Scn1a to study changes underlying the severe epilepsy disorder Dravet syndrome. They describe a change in activation of the dentate gyrus in this mouse model, due to altered excitatory synaptic input. They show that this occurs in the age range after normalization of early inhibitory interneuron dysfunction. This provides an interesting potential mechanism by which neural circuit function is altered even after deficits in inhibition are seemingly corrected. They also report that stimulation of inputs to the dentate gyrus increases seizure susceptibility when body temperature is elevated. Overall, these findings indicate a new form of circuit dysfunction that may underlie the etiology of this severe genetic epilepsy disorder.

These findings are not fully complete, and the manuscript suffers from some flaws in experimental design.

The most pressing issue is the lack of a counter-balanced design in experiments testing the ictogenicity of DG stimulation. The authors attempt to justify this by stating "there is a theoretical concern that seizure threshold on Day 2 (the second consecutive day of stimulation) could be lowered by a seizure 24 hours prior (a "kindling"-like phenomenon)". In the very next sentence, they cite a study in which this phenomenon has been shown (thus the concern is not theoretical). That said, this is not a semantic argument, but a flaw in experimental design. On day 1, the authors perform experiment A. On day 2, they perform experiment A+B. In an attempt to show that performing experiment A on day 1 does not by itself lead to changes in experiment A+B, they use a separate cohort and show that experiment A does not lead to changes in a repetition of experiment A. Unfortunately, this is not an adequate control. Experiment A+B involves a different set of stimuli, to which the response could very well be altered by the day 1 experiment, but this change would not be revealed with the described experimental design. Determining whether the effect shown in experiment A+B is genuine requires a more rigorous, counter-balanced experimental design in which one group undergoes experiment A followed by experiment A+B, and a second group undergoes experiment A+B followed by experiment A.
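The counter-balanced design requested here can be sketched as a simple random assignment. This is only an illustration: the animal identifiers, the seed, and the function name are hypothetical, and real studies would typically also stratify by litter, sex, or age.

```python
import random


def counterbalance(animal_ids, seed=0):
    """Randomly split animals into two groups that receive the two
    experiments in opposite order (day 1, day 2)."""
    rng = random.Random(seed)      # fixed seed so the assignment is reproducible
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {}
    for animal in shuffled[:half]:
        orders[animal] = ("A", "A+B")    # group 1: A on day 1, A+B on day 2
    for animal in shuffled[half:]:
        orders[animal] = ("A+B", "A")    # group 2: reversed order
    return orders


# Hypothetical cohort of eight animals.
assignment = counterbalance([f"mouse_{i}" for i in range(1, 9)])
for animal, order in sorted(assignment.items()):
    print(animal, order)
```

With both orders represented, any carry-over effect of the day 1 experiment can be separated from the effect of the added manipulation B.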

The second major issue is a lack of wild type control groups for several experiments. The experiments presented in Figures 4, 6C and F, and 7 all lack the necessary wild type control measures. Wild type controls were done for Figure 6E, but the data are not presented in the figure.

Some of the cell physiology experiments presented were not optimally designed to provide a relevant mechanistic follow-up to the major findings. For the first major finding of the paper, Figure 2 shows clear and interesting changes in DG activation in the mouse model, and Figure 5 reveals changes to synaptic excitation and inhibition in these neurons. Figure 3 and 4 present data showing changes to PV-interneuron intrinsic properties that only reveal themselves under very intense stimulation. While these findings are interesting and worthy of follow-up, the changes aren't relevant to the synaptic stimulation used in Figure 2.

Finally, Figure 2 has missing data points, seemingly due to cropping of panels. Data visualization is problematic for this vital figure. The fit lines for individual experiments overwhelm the color-filled variance of the mean. Thus, the data in this figure are very difficult to read and interpret. The figure would benefit from including all the individual data points and summary data, but removing the individual fits or putting them into a supplement.

3. Reviewer #1 (Public Review):

Dravet syndrome is a developmental and epileptic encephalopathy resulting from mutations in a sodium channel subunit that is widely thought to cause disease by affecting synaptic inhibition. Here the authors use a well-established mouse model to show that circuit dysfunction results from excess synaptic excitation in the dentate gyrus, potentially providing new insight into the pathological mechanisms underlying seizure activity.

Strengths of the study include the sophisticated approach of 2P Ca2+ imaging of population activity and whole-cell recording in slices that provide well-supported evidence that circuit dysfunction is independent of GABAergic inhibition. Weaknesses include some oversimplification of the results in the data interpretation such that not all the claims are fully supported and lack of in-depth analysis of the circuit dysfunction with a clear presentation of its developmental time course.

4. Evaluation Summary:

Dravet syndrome, a severe seizure disorder resulting from a sodium channel mutation, is widely thought to result from impaired synaptic inhibition. Here the authors present multi-level evidence that excess synaptic excitation in the dentate gyrus is a locus of pathology. These results provide new insight into pathological mechanisms in Dravet syndrome that will be of interest to a broad range of neuroscientists studying epilepsy, as well as the role of the hippocampus and synaptic alterations in neurological disease.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

10. www.biorxiv.org www.biorxiv.org
1. Reviewer #2 (Public Review):

In the manuscript entitled "The Crystal Structure of Bromide-bound GtACR1 Reveals a Pre-activated State in the Transmembrane Anion Tunnel", Li et al. analyzed the effect of bromide binding to GtACR1 by X-ray crystallography and electrophysiology. The authors propose that a bromide ion is bound to the intracellular pocket in the dark, inactivated state and induces a structural transition from an inactivated to a pre-activated state.

I agree that some of the amino acid residues in the current crystal structure change their conformations compared to the previous one reported in 2019 (Li et al., 2019), and it is very impressive that the authors determined the structure using a state-of-the-art crystallography technique, IMISX. However, unfortunately, most of the conclusions and claims described in the manuscript are not well supported by the authors' data.

1) The most serious problem is that the evidence of bromide binding is too weak. The authors showed the composite omit map in Supplementary Figure 1A, but they should present an anomalous difference Fourier map to validate the bromide binding. The authors also claim that they replaced the bromide ion with water, ran the PHENIX refinement, and observed a strong positive electron density at the bromide position in the Fo-Fc difference map (Supplementary Figure 1B). However, when I did the same thing using the provided coordinates and map (I really appreciate the honesty and transparency of the authors), I could not reproduce their result; a weak positive electron density is observed between the bromide position and Pro58 in chain A, and there is no positive peak at the position in chain B (Fo-Fc, contoured at 3σ). I would like to know the occupancy and B-factor of the water molecule they show in Supplementary Figure 1B.

In addition to the insufficient evidence, the current models of bromide ions have significant steric clashes. The PDB validation report shows that the top 5 serious steric clashes observed in the coordinates are the contacts between the bromide ions and surrounding residues (PDB validation report, Page 10). I analyzed them and found that the distances between the bromide ion and the CG and CD atoms of Pro58 in chain A are only 2.43 Å and 2.36 Å, respectively. The authors claim that such a close proline-halide interaction has also been observed in the structure of the chloride-pump rhodopsin ClR, but in that structure (PDB ID: 5G28), the distances between the chloride ion and the CD and CG atoms of Pro45 are much larger (3.43 and 3.91 Å, respectively) and there is no steric clash. Moreover, the authors claim that Pro58 changes its conformation upon bromide binding, but it is very possible that the PHENIX program just displaces Pro58 to alleviate the steric clash between the proline and the bromide ion, so the authors should carefully check this possibility.

Overall, the authors should analyze the density again, provide more solid evidence for the bromide binding such as anomalous difference Fourier map, and if they could, they should correct the current significant steric clashes in their models.

2) To analyze the functional importance of putative bromide binding, the authors prepared W246E and W250E mutants and analyzed their electrophysiological properties. Because tryptophan and glutamate are so different in terms of volume and charge, they should analyze other mutants as well. The authors claim that bromide is stabilized by a hydrogen bond interaction formed by the indole NH group of W246, so they should at least test the W246F mutant.

3) The authors claim that the bromide binding in the intracellular pocket induces the conformational change of R94, but the causal relationship is doubtful. As mentioned in the manuscript, R94 forms a salt-bridge with D234 in chain A. However, the arginine has a completely different conformation and does not have any interaction with D234 in chain B. If the bromide binds in both chain A and chain B and induces the conformational change of R94, why does only R94 in chain A interact with D234? The authors changed the pH in the crystallization condition compared to their 2019 study (Li et al., 2019), so the pH may affect the protonation state of D223 and/or other titratable residues and induce the conformational change of R94. The authors should provide more solid evidence for the causal relationship between the bromide binding and the conformational change of R94.

4) The authors assume that the conformational change of R94 creates a functional anion binding site with the Schiff base in GtACR1, but it is too speculative. If the anomalous difference Fourier map does not support the idea, they should delete it.

2. Reviewer #1 (Public Review):

The dark structure of GtACR1 was published almost simultaneously, at the end of 2018 and the beginning of 2019, by the Deisseroth and Spudich groups, respectively. Neither group managed to solve a structure with an ion bound, and there is very limited information on the open conformation of the channel. Both groups identified a central constriction site as being central for the gating mechanism, but the Spudich group proposes two additional constrictions (C1 and C3). In this work, Li et al. are able to solve the structure of GtACR1 with a bromide bound near C3, which clearly represents a significant step towards understanding the mechanism of light-gated anion channels. The structure reveals that Br binds to the intracellular constriction site (C3), resulting in a small opening of C3. The data support the notion that the partial electropositivity of Pro58 together with two tryptophans plays a critical role in anion interaction at C3, which was also confirmed by mutagenesis studies. In addition, there was a noteworthy conformational change in the bromide-bound protein in the extracellular constriction (C1): a 180 degree flip of Arg 94, resulting in a salt bridge to Asp 234 and a slight opening of the C1 constriction.

While the data and conclusions are sound, the lack of discussion of their data in the context of the work of others is a bit surprising.

3. Evaluation Summary:

This manuscript reports a significant contribution towards an improved mechanistic understanding of light gated anion channels. The studies, which use the recently established method of in meso in situ serial data collection (IMISX), provide a basis for optimizing the anion channelrhodopsin GtACR1 from the alga Guillardia theta as a neuron-inhibiting optogenetics tool. The work will be of interest to anyone using optogenetics for functional studies. The reviewers had a few comments regarding technical aspects of the work.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

4. Author Response:

Reviewer #1 (Public Review):

The dark structure of GtACR1 was published almost simultaneously, at the end of 2018 and the beginning of 2019, by the Deisseroth and Spudich groups, respectively. Neither group managed to solve a structure with an ion bound, and there is very limited information on the open conformation of the channel. Both groups identified a central constriction site as being central for the gating mechanism, but the Spudich group proposes two additional constrictions (C1 and C3). In this work, Li et al. are able to solve the structure of GtACR1 with a bromide bound near C3, which clearly represents a significant step towards understanding the mechanism of light-gated anion channels. The structure reveals that Br binds to the intracellular constriction site (C3), resulting in a small opening of C3. The data support the notion that the partial electropositivity of Pro58 together with two tryptophans plays a critical role in anion interaction at C3, which was also confirmed by mutagenesis studies. In addition, there was a noteworthy conformational change in the bromide-bound protein in the extracellular constriction (C1): a 180 degree flip of Arg 94, resulting in a salt bridge to Asp 234 and a slight opening of the C1 constriction.

While the data and conclusions are sound, the lack of discussion of their data in the context of the work of others is a bit surprising.

We thank the reviewer for the thorough reading of our submission and the constructive criticism, which helped us to improve the quality of our manuscript. As requested, we added the following paragraph at the end of the Results section (lines 219-233):

“Studies in 3 different laboratories have concluded that Asp234 is neutral in the dark state from measurements of the D234N mutant of GtACR1 by UV-vis absorption spectroscopy (Kim et al., 2018; Sineshchekov et al., 2016), Resonance Raman spectroscopy (Yi et al., 2016), and FTIR (Kim et al., 2018). Both studies of independently determined crystal structures of GtACR1 attribute the major component of its neutralization to hydrogen-bonding to Tyr207 and Tyr72 (Kim et al., 2018, Li et al., 2019), leaving open partial electronegativity of Asp234 participating in hydrogen-bonding to the protonated Schiff base (PSB). The Asp234 residue is expected to be functionally important given its proximity to the PSB and its nearly universal conservation in microbial rhodopsins. Kim et al (Kim et al., 2018) conducted an extensive analysis of Asp234 and report that the D234N mutation nearly abolished photocurrents. Reduced photocurrents to 20% of wild-type from the D234N mutation were also observed by Sineshchekov et al. (Sineshchekov et al., 2015). Differences in extent of photocurrent reduction are likely attributable to different assay conditions used in these studies. The electrostatic interaction of Arg94 with Asp234 in the pre-activated state may be correlated with the change in the electron conjugation of the retinylidene polyene chain in the dark that we observed by FTIR.”

Reviewer #2 (Public Review):

In the manuscript entitled "The Crystal Structure of Bromide-bound GtACR1 Reveals a Pre-activated State in the Transmembrane Anion Tunnel", Li et al. analyzed the effect of bromide binding to GtACR1 by X-ray crystallography and electrophysiology. The authors propose that a bromide ion is bound to the intracellular pocket in the dark, inactivated state and induces a structural transition from an inactivated to a pre-activated state.

I agree that some of the amino acid residues in the current crystal structure change their conformations compared to the previous one reported in 2019 (Li et al., 2019), and it is very impressive that the authors determined the structure using a state-of-the-art crystallography technique, IMISX. However, unfortunately, most of the conclusions and claims described in the manuscript are not well supported by the authors' data.

1) The most serious problem is that the evidence of bromide binding is too weak. The authors showed the composite omit map in Supplementary Figure 1A, but they should present an anomalous difference Fourier map to validate the bromide binding. The authors also claim that they replaced the bromide ion with water, ran the PHENIX refinement, and observed a strong positive electron density at the bromide position in the Fo-Fc difference map (Supplementary Figure 1B). However, when I did the same thing using the provided coordinates and map (I really appreciate the honesty and transparency of the authors), I could not reproduce their result; a weak positive electron density is observed between the bromide position and Pro58 in chain A, and there is no positive peak at the position in chain B (Fo-Fc, contoured at 3σ). I would like to know the occupancy and B-factor of the water molecule they show in Supplementary Figure 1B.

We appreciate the reviewer’s effort in analyzing our structure. As described in the Discussion section (lines 238-248), the identification of bromide is supported by multiple lines of evidence: (1) the composite omit map indicates the presence of bromide at the cytoplasmic port (Suppl. Fig. 1A-1B); (2) we exclude the possibility of a water at the bromide position, as demonstrated in the Fo-Fc difference map (Suppl. Fig. 1C-1D); (3) the bromide binding site exhibits a chemical conformation similar to that seen in chloride-binding structures (Auffinger et al., 2004); (4) functional analyses of W250F and W246F are consistent with the H-bond interaction in the bromide binding site (Fig. 2B); (5) specific interaction of GtACR1 with bromide in the dark state was further demonstrated by FTIR analysis (Fig. 3). Differences in major bands that reflect the ethylenic (C=C) stretch mode of the retinylidene chromophore show a large bromide-induced alteration in the electron conjugation of the retinylidene polyene chain in the dark, confirming that bromide causes a significant structural change. In sum, these data confirm the bromide binding conformation in the structure.

We agree with the reviewer that the signal of bromide in chain A is stronger than in chain B. We now address the difference throughout the main text and Suppl. Fig. 1. The datasets were collected at 0.91882 Å wavelength, but we did not detect any strong bromide signals in the anomalous difference Fourier map. This may be due to preferential orientation of the thin-plate GtACR1 crystals in the IMISX plate. The weak Br signals may also be attributed to the weak bromide binding conformation, its partial occupancy, and poor intrinsic order. It is not unusual for anomalous signals to be influenced by the location of the scatterer. For example, in our previous structural determination of YfkE (Wu, PNAS 2013), seleno-methionine was used to label 12 native Met residues. However, we could identify only 10 Se positions; the other 2 Se were undetectable in the anomalous difference map, despite data collection at the Se absorption peak wavelength. Therefore, the lack of strong anomalous signals does not exclude the presence of bromide in the structure.

Regarding the reviewer’s question, the occupancy of the water is 1 and its B-factor is 71.

In addition to the insufficient evidence, the current models of bromide ions have significant steric clashes. The PDB validation report shows that the top 5 serious steric clashes observed in the coordinates are the contacts between the bromide ions and surrounding residues (PDB validation report, Page 10). I analyzed them and found that the distances between the bromide ion and the CG and CD atoms of Pro58 in chain A are only 2.43 Å and 2.36 Å, respectively. The authors claim that such a close proline-halide interaction has also been observed in the structure of the chloride-pump rhodopsin ClR, but in that structure (PDB ID: 5G28), the distances between the chloride ion and the CD and CG atoms of Pro45 are much larger (3.43 and 3.91 Å, respectively) and there is no steric clash. Moreover, the authors claim that Pro58 changes its conformation upon bromide binding, but it is very possible that the PHENIX program just displaces Pro58 to alleviate the steric clash between the proline and the bromide ion, so the authors should carefully check this possibility.

Overall, the authors should analyze the density again, provide more solid evidence for the bromide binding such as anomalous difference Fourier map, and if they could, they should correct the current significant steric clashes in their models.

We thank the reviewer for pointing out the steric clashes. We have corrected them in the revised structure as demonstrated in the latest validation report. As described in the Results section (lines 107-109), the distances between the bromide ion and the CG and CD atoms of Pro58 in chain A are now 3.6 Å and 3.1 Å, respectively (see the updated structure pdb), and the distances between the bromide ion and the CG and CD atoms of Pro58 in chain B are 4.0 Å and 3.2 Å, respectively, similar to the distances between the chloride ion and the CD and CG atoms of Pro45 in ClR (3.43 and 3.91 Å, respectively). These modifications do not alter the structure beyond the local binding site of the bromide, and do not change our conclusions.

We do not agree that the bromide-induced conformational changes are due to the refinement program. To further confirm the Pro58 position, we have performed a refinement using PHENIX with Pro58 and adjacent residues removed. The resulting electron density map shows positive electron density at the Pro58 position, confirming the conformational changes induced by bromide binding.

2) To analyze the functional importance of putative bromide binding, the authors prepared W246E and W250E mutants and analyzed their electrophysiological properties. Because tryptophan and glutamate are so different in terms of volume and charge, they should analyze other mutants as well. The authors claim that bromide is stabilized by a hydrogen bond interaction formed by the indole NH group of W246, so they should at least test the W246F mutant.

We thank the reviewer for this important suggestion, which helps confirm the bromide binding conformation. The glutamate substitutions were chosen to assess the specific anion selectivity and conductivity of GtACR1 due to the negative charge of its side chain. We now include the data of W246F and W250F in Fig 2B. W250F shows reduction of the current amplitude by 50%, whereas W246F behaves like WT. These results are consistent with the structural observations in which W250, but not W246, stabilizes bromide via H-bond interaction. These results are provided in the Results section (lines 136-142) and in the revised Fig. 2B.

3) The authors claim that the bromide binding in the intracellular pocket induces the conformational change of R94, but the causal relationship is doubtful. As mentioned in the manuscript, R94 forms a salt-bridge with D234 in chain A. However, the arginine has a completely different conformation and does not have any interaction with D234 in chain B. If the bromide binds in both chain A and chain B and induces the conformational change of R94, why does only R94 in chain A interact with D234? The authors changed the pH in the crystallization condition compared to their 2019 study (Li et al., 2019), so the pH may affect the protonation state of D223 and/or other titratable residues and induce the conformational change of R94. The authors should provide more solid evidence for the causal relationship between the bromide binding and the conformational change of R94.

We did not change the pH in the crystallization condition compared to our previous crystallization of GtACR1. Both structures were obtained at pH 5.5 as noted in the manuscript. In our structure, the only bromide binding site was identified near C3 and no bromide was found at C1. We address this result in Discussion (lines 276-286) as follows:

“The conformational change of Arg94 near C1 is not likely to be directly induced allosterically by bromide binding at distant C3 since it is only observed in chain A, not in chain B. Instead, this conformational change may reflect the intrinsic flexibility property of Arg94 in the tunnel in the bromide-bound state. Although both Arg94 of GtACR1 (in chain A) and Arg95 of ClR adopt a similar conformation (Fig. 4B), these two counterpart residues appear to be stabilized by distinct H-bond networks. In GtACR1, inward Arg94 only forms a salt-bridge with Asp234 and an H-bond with a water molecule (Suppl. Fig. 2A). However, in the ClR structure, in addition to the salt bridge, R95 is further stabilized by three polar residues, Asn92, Gln224, and Thr228, via two water molecules from the extracellular side of the protein (Suppl. Fig. 2B). The absence of these polar residues and waters in the vicinity may liberalize Arg94 and facilitate its flip-flopping in the tunnel of GtACR1.”

4) The authors assume that the conformational change of R94 creates a functional anion binding site with the Schiff base in GtACR1, but it is too speculative. If the anomalous difference Fourier map does not support the idea, they should delete it.

Our hypothesis (not an assumption) is based on the following facts: (1) both rhodopsin proteins GtACR1 and ClR transport the same halide substrates; (2) chain A of GtACR1 adopts a nearly identical chemical conformation to that in the chloride-binding site (site 1) of ClR, in which the counterpart residue R95 forms a chloride binding site with the Schiff base (Fig. 4B); and (3) Arg94 is important to anion conductivity of GtACR1 (Li et al. eLife 2019). It is reasonable to hypothesize that Arg94 forms a putative anion binding site with the Schiff base in GtACR1. To make this hypothesis clear, we listed these facts in the text and rephrased our hypothesis as follows (lines 217-219): “Based on the similar chemical conformations (Fig. 4B), it is possible that Arg94 rotates its side chain to form an anion binding site with the Schiff base in GtACR1.”

11. www.biorxiv.org www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

The study by Hendley et al takes advantage of duct-specific DBA-lectin expression to purify pancreatic ductal populations that were then subjected to scRNA-seq analysis. The ability to enrich for this relatively low-abundance pancreatic cell population resulted in a more robust dataset than had been generated previously from whole-pancreas analyses. The manuscript catalogs several gene clusters that delineate heterogeneous subpopulations within three pancreatic ductal populations in mice: mouse pancreatic ductal cells, pancreatobiliary cells, and intrapancreatic bile duct cells. The additional comparison of the resulting datasets with published embryonic and adult datasets is a strength of the study; it allows the authors to subclassify the different ductal cell populations and facilitates the identification of potentially novel subpopulations. Pseudotime analysis also identified gene programs that led the authors to speculate on the existence of an EMT axis in pancreatic ducts. Overall, the data analysis is strong, but the authors tend to draw conclusions that are not fully supported by the presented data.

The second half of this study focuses on three candidate proteins that were identified in the transcriptome analysis: Anxa3, SPP1 and Geminin. CRISPR-Cas9 was used to delete each gene in an immortalized human duct cell line (HPDE). Deletion of each gene resulted in increased proliferation; SPP1 mutant cells also displayed abnormal morphology. Additional functional studies of the cell lines or in mouse models suggested a role for SPP1 in maintaining the ductal phenotype and for Geminin in protecting ductal cells from DNA damage. Although the provided phenotypic analysis suggests important functional roles for these proteins, follow-up studies will be required to fully understand the role of these genes in homeostatic or cancer conditions.

Strengths:

1) Enrichment of pancreatic ductal populations enhanced the robustness of the scRNA-Seq dataset

2) Quality of the sequencing data and extensive computational analysis is extremely good and more comprehensive than previously published datasets

3) Comparative analysis with existing mouse and human data sets

4) Use of human ductal cell lines and mouse models to begin to explore the function of candidate ductal genes.

Weaknesses:

1) There are many suppositions based on gene expression changes that are somewhat overstated.

2) The conclusion that there is an EMT axis in pancreatic ducts is not fully supported by the gene expression and immunofluorescence data

3) A good rationale for choosing Anxa3, SPP1 and Geminin for additional functional analysis is not provided. In addition, it isn't clear why Anxa3 function isn't pursued further.

4) Although extensive models (transplanted cells for SPP1 and mouse conditional KOs for Geminin) were generated, the functional analysis for each gene is preliminary; additional longer term studies will be necessary to fully understand the role of these proteins in pancreatic duct development and cancer.

We would like to thank the Reviewer for their fair and thoughtful review of our manuscript. We agree with the comments and have addressed them as described in detail below. In particular, we have focused on streamlining the presentation and description of our bioinformatic analysis, providing additional rationale for using the particular genes we focused on in the follow-up analyses, and including additional data to support the EMT axis.

Reviewer #2 (Public Review):

In this study the authors address the heterogeneity of the mouse ductal cell at the single cell level and conduct functional studies for selected marker genes. They isolated duct cells using the DBA lectin as a molecular surface marker. This is a noteworthy approach as it does not rely on the specificity and expression levels of reporter lines. Isolated cells contained a majority of non-duct cells that were identified by their transcriptomic profile and excluded from further analysis. The transcriptomic profiles of bona fide duct cells were then subjected to standard analyses for differentially expressed genes, activated pathways and lineage relationships. Of particular interest is the comparison of these data with human data from a recently published study that used a different sorting strategy for duct cells. As more studies at the single cell level are conducted, these types of comparisons need to become part of them in order to derive commonalities and identify deficits due to methodological or technological limitations. The study was by necessity descriptive up to this point, and the authors addressed this with functional studies on SPP1 and GMNN, which suggested that SPP1 is necessary for the maintenance of the ductal differentiated phenotype whereas GMNN protects cells against DNA damage during increased proliferation triggered by chronic pancreatitis.

It is an interesting study, but there are caveats, particularly concerning the functional studies. The functional analysis of SPP1 needs to be strengthened and some findings from the analysis of GMNN clarified. There is also an overreliance on the outcome of pathway analyses and upstream regulators, which are often treated as actual findings rather than possibilities to be explored in this or future studies. The single cell RNA-Seq analysis would benefit from reducing speculation and restricting descriptions to the essential features of each cluster. Main figures for this analysis could also be simplified along the same lines.

We thank the reviewer for appreciating our study as “interesting” and for considering our investigations as a “noteworthy approach”. We are glad that the reviewer acknowledges our efforts in delivering a manuscript with the necessary descriptive bioinformatics analysis followed up with functional studies for select subpopulation markers. At the same time, we took the constructive criticism seriously and added new data to further substantiate our claims.

Reviewer #3 (Public Review):

In this study, the authors present a high-resolution single-cell transcriptomic atlas of the pancreatic ductal tree. Using a DBA+ lectin sorting strategy, murine pancreatic duct, intrapancreatic bile duct, and pancreatobiliary cells were isolated and subjected to scRNA-seq. Computational analysis of the datasets unveiled important heterogeneity within the pancreatic ductal tree and identified unique cellular states. Furthermore, the authors compared these clusters to previously reported mouse and human pancreatic duct populations and focused on the functional properties of selected duct genes, including Spp1, Anxa3 and Geminin. Overall, the results presented here suggest distinct functional roles for subpopulations of duct cells in the maintenance of duct cell identity and an implication in chronic pancreatic inflammation. Finally, such detailed analysis of the pancreatic duct tree is also relevant in the context of cancer biology and might help elucidate the transition from pancreatitis to pancreatic cancer and/or differing predisposition to cancer.

The study is very well done, with careful controls and well-designed experiments.

We thank the reviewer for appreciating our study as “very well done” as well as envisaging the potential relevance of our findings to cancer biology.

12. www.biorxiv.org www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

We thank Reviewer #1 for their valuable comments. We agree with the Reviewer that our current results are not sufficient to confirm the therapeutic effects. The statement related to therapy has been removed.

The study by Song and colleagues explores the role of circRNAs in fibrosis of the endometrium. Endometrial cells from patients with and without fibrosis were subjected to expression profiling analysis, and circPTPN12 and miR-21-5p levels differed strongly in fibrotic endometrium, with circPTPN12 acting as an inhibitory factor for miR-21-5p. Through the use of various molecular approaches, the authors further show that miR-21-5p inhibition results in upregulation of ΔNp63α, a transcription factor that induces EMT. The role of circPTPN12 was also confirmed in vivo using a mouse model of mechanically induced endometrial fibrosis. The authors concluded that targeting the circPTPN12/miR-21-5p/∆Np63α pathway may be a therapeutic strategy for endometrial fibrosis.

The authors clearly and convincingly show the involvement of circPTPN12/miR-21-5p/∆Np63α in EMT and its potential involvement in endometrial fibrosis. Whether or not this can be a therapeutic target is too preliminary to say at this point. First, the in vivo experiments confirm the link between circPTPN12/miR-21-5p/∆Np63α at the RNA level only (p63), and it would be more convincing to see protein data as well.

We did try to detect ΔNp63α protein in mice by immunochemistry and immunofluorescence, using three antibodies (CST, cat# 67825 and 39692; Abcam, ab124762). Unfortunately, we did not obtain positive results. However, ΔNp63α mRNA was significantly changed.

The involvement of p63 in the process remains a little elusive in this paper.

We have reported that ΔNp63α is ectopically expressed in endometrial epithelial cells in IUA patients (Cao et al., 2018), and showed that ΔNp63α promotes the expression of SNAI1 via the DUSP4/GSK3B pathway and induces EECs-EMT and fibrosis (Zhao et al., 2020). We have added this description of ΔNp63α to the discussion section (2nd paragraph).

In addition, if the authors believe this pathway can be a real future target to treat endometrial fibrosis, they could better contextualise such a statement, specifically describe what kinds of therapeutic intervention they think of, like regression or prevention of fibrosis. These should be tested in vitro and in vivo.

Our results showed that replenishing miR-21-5p can reverse EMT and remit endometrial fibrosis in vivo and in vitro. However, therapeutic use of miR-21-5p in the clinic requires more research in other animal models such as rats, pigs, and non-human primates. Thus, we removed the therapeutic statements (page 1, lines 1-2; page 2, lines 37-40; page 4, lines 74-76; page 13, line 273).

More evidence of the involvement of circPTPN12/miR-21-5p/∆Np63α and the correlation between the three players using clinical material is also necessary.

The involvement of ∆Np63α in endometrial fibrosis was demonstrated in our published paper, and those results are cited in this paper (Zhao et al., 2020). The correlation between circPTPN12 and miR-21-5p in clinical material is shown in Figure 2J. In vivo and ex vivo experiments confirmed that overexpression of circPTPN12 downregulates miR-21-5p and upregulates ∆Np63α (Figure 3H/Figure 4J/Figure 5B/Figure 5E). In addition, ex vivo experiments suggested that the decrease of ∆Np63α is secondary to the increase of miR-21-5p (Figure 4C-E).

13. www.biorxiv.org www.biorxiv.org
1. Joint Public Review:

This behavioral study aims to provide an account of the spontaneous behavior of mice as they learn to explore a novel maze in search of a water reward. The authors analyze the trajectories of mice as they adapt to the labyrinth, with particular focus on decisions taken at nodes and T junctions. They describe extremely rapid route learning to home and discontinuous exploratory learning or 'light bulb' moments, as evidenced by instantaneous improvements in navigation performance. The authors capture most of the variance in their overall data with predictive Markov models that could account for many of the subsequent actions of the mouse as it moves from one node to the next. The study should be important to anyone who spends their time thinking about decision-making in mice. It highlights the importance of considering ethologically relevant tasks for understanding decision making in rodent species.
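
As a rough illustration of the kind of Markov model mentioned above, the sketch below predicts the next action at a node from a fixed number of preceding actions. It is a minimal sketch, not the authors' model: the action labels (`L`, `R`, `B`), the function names `fit_markov` and `predict_next`, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_markov(actions, order=1):
    """Count which action follows each history of length `order`."""
    counts = defaultdict(Counter)
    for i in range(order, len(actions)):
        history = tuple(actions[i - order:i])
        counts[history][actions[i]] += 1
    return counts


def predict_next(counts, history):
    """Return the most frequent next action for `history`, or None if unseen."""
    followers = counts.get(tuple(history))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical action sequence (made-up data): L = left, R = right, B = back.
toy_actions = list("LRLRLRBLRLRLR")
model = fit_markov(toy_actions, order=1)
```

Calling `predict_next(model, ["L"])` looks up the most common follower of a left turn in the toy sequence. Raising `order` conditions on a longer history, at the cost of more histories that were never observed.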

In this submission, the authors introduce a new experimental paradigm for the study of decision making in naturalistic contexts, presenting an opportunity to observe these dynamics away from the standard two-alternative-forced-choice paradigm. The application of modern tracking and posture analysis to maze exploration by rodents generates rich and interesting data, and allows the authors to do their experiments with many animals, and with nearly no human interference or specific instructions. The design of the maze is clever, using an underlying tree-like structure (with the tree folded so it precisely and fully occupies a rectangular area), and relatively deep (6 branching points from main trunk to a leaf node). Mice explore this voluntarily, and water-restricted mice learn to find water rewards at a leaf of the maze. The authors thus study truly voluntary and highly interesting complex behavior, and in a high-throughput way. By studying the dynamics of a mouse in a maze, the authors perform a careful set of analyses, describing discontinuous learning dynamics and the effects of history on decision-making. These results should be of interest to a wide group of behavioral neuroscientists who are attempting to understand the neural basis of how animals make decisions in complicated natural environments.

The data set released with this submission will be of broad use to the community, and we would not be surprised to see dozens of papers using it moving forward.

2. Evaluation Summary:

This study lays the groundwork for a new level of precision in understanding mouse navigation behaviour by studying complex decisions that approximate those made in the wild, but can nevertheless be analysed with mathematically precise tools. Several exciting observations are made about navigation strategy. The manuscript will therefore be of broad interest across behavioural neuroscience. However, in its current form, some questions remain about some of the major claims.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

14. www.biorxiv.org www.biorxiv.org
1. Reviewer #3 (Public Review):

The authors have re-sequenced 310 quinoa accessions and carried out field phenotyping of the same set of accessions for two years in order to characterize genetic diversity and analyze the genetic basis of agronomically important traits.

The main strength of the manuscript is that the authors have carefully characterized more than 300 quinoa accessions, achieving a sufficiently large population size for GWAS analysis with good statistical power. It is especially promising that the phenotypes all show high heritability. This indicates that the field phenotyping was of high quality and provides a good starting point for discovering relevant marker-trait associations. In addition, the authors provide convincing evidence for distinct population characteristics of highland and lowland quinoa, adding additional information compared to previous work (Maughan, 2012).

The weak points are related to the genotype data and the conclusions drawn based on the GWAS analysis.

1) An important issue is related to the relatively low depth of coverage (4-10x) that was used for re-sequencing. Across the accessions, there is a pronounced negative correlation between the mean sequencing depth and the heterozygosity level, indicating that heterozygotes are overcalled in individuals with low coverage. This also results in heterozygosity levels that are generally higher than expected for what is assumed to be mainly homozygous inbred lines.

2) Another potential issue concerns SNPs called in repetitive regions. Among the significant GWAS SNPs identified, a very large proportion appears to be found in intergenic regions. While this does not rule out that some of them are genuinely important associations, it does suggest a potentially high level of noise in the GWAS results. In addition to the filtering already imposed, which includes a filter for mapping quality, the SNPs called in intergenic regions with unusually high coverage could be more closely examined to determine the extent of the issue. Masking repetitive genomic regions using RepeatMasker or similar programs could be useful.

3) When the authors discuss their GWAS results, they frequently focus on cherry-picked candidate genes, although, in several cases, the top SNPs in the region in question are not found within these candidates. A broader focus on all genes within the LD blocks, while still mentioning the candidate genes, would be more informative.

4) The manuscript includes statements that a particular genotype "results in" some phenotypic outcome, although no causal relationship has been demonstrated. In general, there is a tendency to draw overly strong conclusions from the GWAS results.

5) As this is primarily a resource paper, the authors should make the complete genotype and phenotype data as well as the layout of the field trials available. It would not be possible to reproduce the GWAS analysis based on the data included with the current version. They should also clarify how the quinoa accessions described will be made accessible to the community and provide all scripts used for data analysis through GitHub or a similar repository.
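
The check behind point 1 above, whether low-coverage accessions show inflated heterozygosity, amounts to correlating per-accession mean depth with heterozygosity. The sketch below shows one way to do that under stated assumptions: `pearson_r` is a hand-written helper, and the depth and heterozygosity values are made up for illustration and are not data from the manuscript.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only (one entry per accession).
mean_depth = [4.0, 5.0, 6.0, 8.0, 10.0]        # mean sequencing depth (x)
het_rate = [0.050, 0.042, 0.035, 0.024, 0.015]  # fraction of heterozygous calls
r = pearson_r(mean_depth, het_rate)
```

A strongly negative `r` on real data would be consistent with heterozygotes being overcalled at low coverage, although it would not by itself establish the cause.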

2. Reviewer #2 (Public Review):

A key genomic study on an emerging, nutritious, alternative grain crop.

Deep genomic data on hundreds of land races/accessions.

Population structure analysis could be enhanced.

Agronomic growth and yield traits are correlated and environmentally sensitive.

Genomic dissection via GWAS to multigenic loci with candidate genes add genomic prediction and selection.

Inference on domestication.

3. Reviewer #1 (Public Review):

The paper details a whole genome re-sequencing of 310 accessions of quinoa. This provides a good glimpse of diversity in this orphan crop, plus the GWAS studies are able to help provide the foundations for identifying key genes in quinoa variation. This will certainly advance our knowledge of this increasingly important orphan crop.

1) One issue that permeates the entire paper is that the analysis is fairly basic and the authors do not make full use of the data. The analysis of population diversity is restricted to PCA, ADMIXTURE and phylogenetic analysis. It would probably broaden the impact of the paper if they did a deeper analysis of quinoa diversity, for example looking at demographic history or at selection in highland vs. lowland quinoa.

2) There is a focus on the rapid LD decay, which the authors attribute to the short breeding history and low selection. It seems like a stretch to draw this conclusion based solely on LD decay. As they point out, many other factors could account for this, and the authors should provide other lines of evidence to support this conclusion.

3) The GWAS analysis is good and does provide a good foundation for quinoa genetics. The authors discuss possible candidate genes in these GWAS regions. For the thousand seed weight, the relatively small span of the GWAS peaks allows for localization of just a few genes in the GWAS region (CqPP2C5 and the CqRING). The GWAS region associated with flowering time is larger (1 Mb with 605 genes), but the authors focus on the GLX2-1 gene. This is again a stretch, as the large region precludes narrowing the candidate list unless there is a compelling mutation (for example, a deletion or insertion in a major flowering time gene).

4. Evaluation Summary:

This is a comprehensive study of genomic and phenotypic diversity in the orphan crop quinoa. Based on whole genome resequencing of 310 accessions and field phenotyping of the same set of accessions for two years, the study identified the genetic basis of agronomically important traits. Based on this promising work, there will likely be scope for quick improvement of this orphan crop through breeding.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

15. www.biorxiv.org www.biorxiv.org
1. Reviewer #3 (Public Review):

Cole and co-authors report the development of a novel immunofluorescence technique, where targets of interest are analysed over iterative cycles of staining, imaging and elution (stripping). This method allows for the multiplexed analysis of protein targets, well beyond the usual constraints of such techniques (limited by availability of filters and non-overlapping wavelengths of fluorophores). The authors also present several applications of the technique, highlighting how the ability to record additional parameters (such as cell morphology) can be an advantage over more high-throughput methods such as spatially resolved transcriptomics.

The technique has been carefully tested. Staining for the same markers after several rounds of stripping/reprobing shows high concordance, indicating that the iterative treatment and staining of the same tissue section is not altering the detection of protein markers.

The authors tested staining with a total of 18 antibodies, and suggest that this number can be increased arbitrarily, as the number of iterations is not limited. Further, they suggest that this technique can be applied to virtually any tissue. It is quite possible that this technique can be readily applied to any other tissue, as the only constraint seems to be the robustness of antibodies. The authors may include the suggestion that previous success of immunofluorescence on a particular tissue type could be a good indication for the success of the iterative staining.

The proposed 4i method is quite interesting, has great potential and is likely to be of very wide interest.

2. Reviewer #2 (Public Review):

Methods to characterize cell types in intact tissue using large scale analysis of molecular expression profiles are now readily available, with the best example being in situ RNA sequencing (spatial transcriptomics). However, these methods depend on separate immunohistochemical investigations to define the precise cellular and subcellular distribution of the protein products. Cole et al use iterative indirect immunofluorescence imaging (4i, Gut et al Science 2018) to compare the immunoreactivity of an impressive 18 different molecules within the same brain sections containing the dentate gyrus from young and old mice. First, they demonstrate that the method can be applied not only to adult mouse brain tissue, but also to human embryonic stem cell derived organoids and mouse embryonic tissue, which is an advance on the original report (Gut et al 2018). This demonstration is particularly important as it shows the potential for applying 4i to different biological disciplines. The rest of the manuscript focuses on the mouse dentate gyrus (DG) at 2, 6 and 12 months of age in order to map the complex changes and associations in the tissue across age. Various combinations of the 18 molecules are used to define different cell types, and it is incredibly informative to be able to view so many molecules in exactly the same area; this will advance the field. This is the greatest strength of the manuscript. They find that neurogenic, radial glia-like stem cells (R cells) and proliferating cells are reduced in aged animals, as are immature (DCX+) cells, but claim that fluorescence intensity increases for the remaining R cells in 12 month old mice. They report that the density of vasculature also decreased with age, as did the associated pericytes, but astrocytes associated with the blood vessels increased.
The last part of the manuscript defines 'microniches' (random or targeted regions of interest within the DG) and attempts to show how cell types, especially Nestin+ R cells, change in their associations with vasculature within these sub-regions at 2, 6 and 12 months of age. It is a commendable approach and the authors use a variety of statistical tests to compare the different cell types. However, there are several parts of the methods, along with insufficient details of the results that prevent full interpretation of the data, meaning that it is difficult to determine whether all conclusions are supported.

1) There are many factors that can affect the measurements of immunoreactive structures (Fritschy, Eur J Neurosci, 2008 vol 28, p. 2365-70). The main limitation is not providing sufficient detail for the immunolabelling design and imaging parameters but providing some unclear details for the imaging analysis (below).

a. In terms of immunohistochemistry, with the impressive number of tested antibodies, there is potential for variation due to antibody penetration, unreported combinations of secondary antibodies, tissue quality (variations in fixation), etc. It is difficult to have confidence in the conclusions based on a total of 3 mice per age group with a single 40 µm section per mouse. Ideally, to increase confidence given variability between individual sections, measurements should be taken from at least 3 sections per mouse and averaged, before averaging across the age group.

b. Assuming there were 3 primary antibodies with 3 secondary antibodies per cycle before elution, were the combinations used consistent for all brain sections and mice? Was the testing and elution order the same (i.e. systematic)? There is a risk of cross-excitation and misinterpretation of true immunoreactivity if spectrally close fluorophores for the secondary antibodies were selected for primary antibodies that recognize spatially overlapping structures. Can the authors show the cycle number and fluorophore for the examples in figures 1 and 2 to determine which markers were imaged together in the same cycle? This would give confidence in the methods for colocalisation and cell type descriptions. For example, can cross-excitation be ruled out for some of the signals in the images used in Fig 2 (duplicated in Fig 4), such as intensely immunopositive Laminin-B1 cells in the MT3 and Sox2 channels (2A) and the Ki67, SOX2 and phospho-histone 3 channels (2C)?

c. For image acquisition, details are required on the resolution (numerical aperture of the lenses) in order to interpret colocalisation measurements in the later figures. Which beamsplitters/filters were used, and was the same laser power used for the same markers over different specimens (important for interpreting figure 4 data)?

d. For the analysis of ROIs (figures 3-6), were the 20x or 40x images used?

e. Details of the antibody specificity controls should be provided.

2) Numerous markers have been used to define different cells, but the proportions are not reported. For example, R cells are defined differently in figures 3 and 4. How many types of R cells (based on combinations of markers) were observed? High resolution examples of each defined cell type (neuronal and glial) would assist the reader in the confidence of the measurements (ideally as single channels side by side, with arrows indicating areas of detectable immunoreactivity that the authors would use to define each cell).

3) The authors use HOPX and GFAP immunoreactivity and a lack of detectable S100beta immunoreactivity to distinguish R cells from triple immunopositive mature astrocytes. In Figure 3, the images are too low power to be able to confirm this. This part would benefit from some single cell examples showing the separate channels.

a. Furthermore, the results (paragraph 2, page 7) report changes in cell number, but rather density is reported. Please either state the numbers or refer to density.

b. Related to Fig 3, there are no details of the number of R cells counted in supplementary table 1. How were the density measurements obtained? How thick were the image stacks and how many R cells per section? Similarly, as stated in methods, for glial cells, 100 cells were randomly counted in each section (presumably the same count for each age), so how was it reported that specifically the numbers of astrocytes were reduced and no significant differences in other glial cell types? (bottom of p.7)

4) An increase in fluorescence intensity for HOPX and MT3 (also marks R cells) was observed with age (Fig 4), with methods stating that the 5 ROIs used to calculate the background intensity were measured at each [optical?] slice for where the cells were measured, to account for unequal antibody penetrance. Several clarifications are required in order to interpret these results: For the example HOPX images in Fig 4A, for the 2 month old mouse, the background is low, whereas for 12 months, the background is far higher, meaning different background ROI values. Can this difference be explained by differences in laser power, contrast adjustments, optical slice thickness, or whether these are maximum intensity projections of different z thickness? These values must be reported, and for each image presented in the manuscript, details must be included as to what type of image (z-projection or single optical slice, z thickness). Was the optical section(s) of the 12 month mouse imaged closer to the surface of the section for this example in Fig 4A? Were cells sampled at all depths of the imaged volume? Did the antibody show better penetration in the 12 month old mice than the 2 month old mice? How many optical slices would a cell soma cover? In these cases, how was the fluorescence intensity measured? If a soma covered several optical slices, which one was selected for the ROI measurement?

5) The described methods for studying cellular interactions are not clear, making it difficult to interpret the associations between vasculature, cell types, and age. How was colocalisation defined, and at what resolution? For example, it is expected that GFAP would be associated with but not directly colocalized with collagen IV (Fig 5). In these cases, the manuscript would benefit from high resolution examples of this colocalization/interaction. How many ROIs were taken, how exactly were the ROIs for cell types associated with collagen IV selected, was this in 2D or 3D?

6) The methods for random microniches are difficult to follow, as are the methods for investigating the associations of other markers to radial processes of R cells. Please provide a definition of a 'spot'. Again, details of the micron per pixel resolution and optical slice thickness would help in the interpretation of results. Additionally, if possible, illustrated examples of the full procedure for niche mapping should be provided in order to follow how the measurements were collected.
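The background correction queried in point 4 above can be illustrated with a minimal sketch. It assumes that the correction is a simple subtraction of the mean of the background ROIs, measured on the same optical slice as the cell; the manuscript may instead use a ratio or another normalisation. The function name and the pixel values are hypothetical and serve only to show why the slice, the ROI placement and the acquisition settings must be reported.

```python
from statistics import mean


def background_corrected_intensity(cell_roi_pixels, background_rois):
    """Mean cell-ROI intensity minus the mean of the background ROI means.

    Assumption: all ROIs come from the same optical slice, so that uneven
    antibody penetration affects signal and background alike.
    """
    if not background_rois:
        raise ValueError("need at least one background ROI")
    # Average each background ROI first, then average across ROIs.
    background = mean(mean(roi) for roi in background_rois)
    return mean(cell_roi_pixels) - background


# Hypothetical pixel values: one soma ROI and five background ROIs.
cell = [120, 130, 125, 135]
backgrounds = [[10, 12], [11, 9], [10, 10], [12, 8], [9, 11]]
corrected = background_corrected_intensity(cell, backgrounds)
```

If a soma spans several optical slices, the value returned depends on which slice supplies `cell` and `backgrounds`, which is why the slice selection rule matters for interpreting Fig 4.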

3. Reviewer #1 (Public Review):

Overall the analysis is conducted well and is convincing. The characterisation of neural stem cells using 7 markers as well as their morphology and position, is particularly thorough.

My main criticism is that the study purports to address the effect of aging but the ages analysed only range from 2 months to 12 months. As 12-month-old mice are still middle aged, it is difficult to conclude anything about the process of ageing, which is usually studied in much older mice (18-24 months). Indeed, some of the changes that the authors associate with an "ageing phenotype" appear in microniches already in 2-month-old mice and are predominant at 6 months. This suggests that the authors are documenting the transition from an immature/juvenile state, which is predominant in 2-month-old mice, to a mature/adult state, which already appears at 2 months but becomes predominant at 6 and 12 months. Importantly, this adult state, including the reduced number of neural stem cells, might not be dysfunctional but, on the contrary, may perform its role of producing small numbers of new neurons as required during adult neurogenesis very well.

Another, lesser concern is that, based on antibody staining performed in tissues from 2-month and 12-month-old mice, conclusions are made on the different expression levels of HOPX, MT3 and LaminB1 analysed at different ages. This assumes that the efficiency of antibody staining is the same in different samples analysed in parallel but this is not shown.

4. Evaluation Summary:

The objective of this study is to develop a novel immunofluorescence technique allowing for the multiplexed analysis of protein targets. This 4i method is an important technical advance that will be of great interest for the scientific community.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

#### URL

16. www.biorxiv.org www.biorxiv.org
1. Joint Public Review:

Church et al. carry out a mechanistic study focused on regulation of PKA activity at a specific multiprotein complex nucleated by the scaffolding protein AKAP79. The manuscript presents a rigorous biochemical approach combined with computational modeling to address fundamental issues related to PKA signaling. This is a very important but complex system and the authors have nicely addressed it using in vitro approaches. The in vitro data provide evidence that suggests that the phosphatase calcineurin (CaN), by dephosphorylating the PKA regulatory subunit type II (RII), promotes rapid re-association of the PKA catalytic subunit (C) to RII, leading to PKA inactivation. The model proposed is that this modality of PKA inactivation takes place selectively at the multiprotein complex organized by AKAP79, where CaN, PKA and PKA phosphorylation targets are co-localized: the proximity of CaN to RII at the AKAP79 complex would enhance the efficiency of RII dephosphorylation by one order of magnitude, allowing fast re-association of C and RII subunits. This would reduce the proportion of free C subunits and therefore the level of local PKA substrate phosphorylation. Using the purified FRET reporter AKAR4 as a readout of PKA activity, they further confirm that the level of phosphorylation of this PKA target at a given cAMP concentration depends on the ability of CaN to interact with AKAP79. Based on these findings the authors conclude that CaN anchored to AKAP79 dephosphorylates AKAP79-anchored RII, leading to fast recapturing of C and inhibition of PKA catalytic activity. They then create a kinetic model for this process where cAMP and calcium are working in opposing ways. Notably, the authors also provide an estimate for the concentration of RII subunits in the hippocampal CA1 neuropil layer and find that this falls within the range at which CaN efficiently dephosphorylated RII in vitro.

In the context of compartmentalized cAMP/PKA signaling, this mechanism would provide yet another regulatory feature to ensure specific control of target phosphorylation at individual subcellular locations. For example, in dendritic spines PKA regulates long-term depression (LTD) of CA3-CA1 hippocampal synapses via phosphorylation of AMPA-type glutamate receptors, which is facilitated by simultaneous interaction of receptor and kinase with AKAP79. In this context, at a given cAMP concentration, CaN-dependent inhibition of PKA activity would selectively attenuate AMPA phosphorylation and LTD, while PKA may still be able to phosphorylate targets at other sites.

The paper presents very clear biochemical data but can be further strengthened by some additional attention to the following:

While the in vitro data convincingly demonstrate the requirement for CaN to be anchored to AKAP79 for efficient dephosphorylation of RII and confirm that phosphorylation of RII at S98 results in more active PKA, the requirement for RII to be anchored to AKAP79 for this regulation is not investigated, leaving open the possibility that the more efficient dephosphorylation of RII in vitro may be due to increased catalytic activity of CaN when the phosphatase is associated with AKAP79c97.

The authors show convincingly that the pRII subunits are better substrates when the AKAP scaffold is present. However, they need to address the relevance of having the enzyme (CN) and the substrate (pRIIb holoenzyme) scaffolded to the same complex so that diffusion is no longer a rate-limiting factor in the catalytic event. Are MM kinetics relevant for this process? This is a single molecule event that does not necessarily require that the product be released. Instead the product is returned to the active site of the cleft of the C-subunit in the holoenzyme:CN complex where in the cell it is rapidly re-phosphorylated. Also the authors could show what happens when you have a 1:1 concentration of CN and pRIIb. Following this single transfer event does not require dissociation of the holoenzyme and is likely to be more physiologically relevant.

Do the authors know if calcium vs. Mg influences this process? Calcium stabilizes the product whereas Mg stabilizes the substrate in the case of the kinase. If calcium levels are high following release of the phosphate, would this tend to keep the phosphorylated holoenzyme in a more inhibited state until calcium went down and cAMP went up?

This process will take place at membranes which may play a significant role in determining whether the A-subunit is released into solution or not.

Another important question to consider is whether it is even necessary to dissociate the holoenzyme complex at all. Is it sufficient, for example, to simply unleash the linker region of the RII subunit and thereby open up the active site cleft of the C-subunit? Since the tail of the channel is also tethered nearby, it is perfectly reasonable to catalyze this event without dissociating the complex especially given earlier data by Wang, et al showing that the holoenzyme is very stable even when the key arginines in the inhibitor site are mutated. The same motif has access either to the active site of the C-subunit or to the active site of calcineurin in a cAMP/Ca++ dependent cycle. This leaves the phosphorylated tail of the channel free to be dephosphorylated by other phosphatases that are also tethered to AKAP79 and leaves CN committed to recycling of the RII holoenzyme. In principle this does not require dissociation of the RII holoenzyme if CN is tethered nearby. This is a very fundamental question.

One point that is not addressed in the study and is important for the interpretation of the results is whether interaction of CaN with AKAP79c97 increases CaN activity per se, such that the more effective dephosphorylation of RII is not due to the physical proximity of CaN to RII on the AKAP but to a more active CaN. This could be addressed by testing the dephosphorylation rate of a phospho-substrate other than 32P-RII, in the absence and in the presence of AKAP79c97 or by repeating the experiments shown in Fig 1 in the presence of the AKAP79c97 variant where the PKA (391-400) anchoring site has been removed.

AKAR4 is a reversible reporter of PKA activity, so it is surprising that the authors find that its phosphorylation is not affected by CaN. One possibility is that AKAR4 is not a good substrate for CaN. However, multiple studies have shown that AKAR4 can effectively be dephosphorylated. The ability of CaN to dephosphorylate AKAR4 should be investigated further to demonstrate more robustly that, in the in vitro experimental conditions used, the observed reduced phosphorylation of AKAR4 is due to less active PKA rather than more active CaN. This could be done, for example, by repeating the experiments summarized in Figure 3-figure supplement 1C & D using a different phosphatase, to ascertain that the experimental conditions allow for detection of AKAR4 dephosphorylation.

One limitation of the in vitro work is that only AKAR4 is used to measure the level of PKA dependent phosphorylation. AKAR4 is not a natural substrate for either PKA or CaN and the accessibility of the phosphorylation site to these enzymes may be different than for physiological targets. In addition, AKAR4 is not anchored to AKAP79 and may not be the ideal reporter to investigate the effects of CaN-dependent regulation of PKA targets associated to AKAP79.

Stoichiometry of free RII subunits. The authors have shown convincingly that the RII subunits in particular are present in excess of the C-subunits, and this has led to some new concepts for PKA signaling. There are two questions that need to be addressed here; addressing them in the Discussion may be adequate, but they do need to be addressed. First is whether there are separate pools of free RII subunits and holoenzymes within single cells. This is essential for the model of PKA signaling taking place in the presence of a 10-fold excess of free RII subunits. Are the dissociated R-subunits in the same subcellular location? Second is whether the free RII subunits are bound to cAMP. The cAMP-free subunits are noticeably less stable and degraded more rapidly than the holoenzymes, so are these free R-subunits bound to cAMP? If not, are they bound to something else that keeps them stable? RII subunits do not form membrane-less puncta as was recently reported in Cell by Zhang, but is there some other mechanism that allows for the sequestration of large amounts of free RII subunits?

Do you need to saturate all four sites to have an active C-subunit that can phosphorylate the tail of a channel? This relates to the question above. Perhaps this would not be measured by the AKAR4 reporter, but could it be sensed if AKAR4 were fused to the tail of AKAP79 so that it would be tethered close by, similar to the tail of the channel?

Stoichiometry of two calcineurins vs. one RII holoenzyme or one? The authors need to address this stoichiometry question more rigorously. It is quite fundamental for their assays. Does the computational model provide any ability to ascertain stoichiometry of the productive complex?

While it is true that neither S/A nor S/E will be substrates for CN, they will in fact have a different effect on the RII holoenzyme. Ser/Ala and Ser/Glu mutants are, in principle, quite different in terms of their accessibility to the active site of the C-subunit vs. the active site of CN. The Ser/Ala mutant, for example, should be locked into the active site of the C-subunit, and this would be presumably strengthened by ATP since this is a pseudosubstrate. Does the affinity for C-subunit change in an ATP-dependent manner? The Ser/Ala mutant should be a good inhibitor that cannot be regulated by phosphorylation. It could be activated by high concentrations of cAMP but not by the cAMP signaling that is being described here. The Ser/Glu mutation would favor docking into the active site of CN but would be trapped in this state as it also could not be dephosphorylated. Is this consistent with the models proposed by the authors?

The in vivo work to assess the physiological relevance of this proposed new modality of PKA regulation is very preliminary. By overexpressing S97A and S97E mutants of RII in hippocampal neurons the authors confirm that modulation of PKA sensitivity to cAMP via RII phosphorylation affects spine density. However, no experimental data directly assess the role of CaN-dependent dephosphorylation of RII at the AKAP79 complex and there is no evidence that this mechanism regulates AMPA phosphorylation or phosphorylation of other physiologically relevant targets. Thus, the caveats that are associated with the system, and in particular the physiological relevance of the analyses, need to be addressed. Conclusions based on the preliminary 'in cell' data on physiological relevance should be appropriately tempered.

2. Evaluation Summary:

This manuscript will be of interest to neuroscientists as well as a broad audience of cell biologists, as it provides new insight into the myriad of cellular functions regulated by the well-studied cAMP-dependent protein kinase, PKA. Rigorous biochemical data supports a model for PKA inactivation wherein dephosphorylation of the PKA regulatory subunit within a multiprotein complex leads to rapid capture of the PKA catalytic subunit limiting signaling duration. Overall, the biochemical data and modeling support the conclusions although a few details can be addressed further and the in vivo data remains preliminary. The work nevertheless presents exciting findings that provide a tantalizing mechanism to selectively modulate PKA activity at precise subcellular locations.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

#### URL

17. www.biorxiv.org www.biorxiv.org
1. Reviewer #3 (Public Review):

The manuscript by Xiang and Bartel explores the molecular coupling of poly(A) tail length and translational efficiency (TE) in frog oocytes and various mammalian cell lines. From their experiments they draw several broad conclusions. First, limiting amounts of PABPC in frog oocytes are the basis for coupling between poly(A) tail length and TE. Second, in mammalian somatic cell lines PABPC contributes little to TE, with TUT4- and TUT7-mediated uridylation promoting degradation of transcripts with short poly(A) tails. Overall, the experimental design is excellent. The conclusions drawn from the frog oocytes are strongly supported by the data provided whereas the cell line studies are more open to interpretation due to the drastic consequences of PABPC depletion.

2. Reviewer #2 (Public Review):

Poly(A) tails are generally thought to stabilize mRNAs and promote translation. However, the mechanisms of this process have been difficult to experimentally assess due to the essential nature of poly(A) binding proteins, homeostatic mechanisms in gene expression, and the pleiotropic effects of altering the transcription, translation or mRNA decay machinery. The length of poly(A) tails is directly proportional to translational efficiency in early development - the longer the tail, the more efficiently the mRNA is translated - possibly through a closed loop model. However, experiments in other cells, as well as in vitro reconstitution and imaging of single mRNAs in cells, do not support either coupling of poly(A) tail length and TE, or the closed loop model. Thus, it appears that there is a switch from embryonic to post-embryonic regulation of TE. The mechanistic basis for this switch was unclear.

Here, Xiang and Bartel use reporter assays and transcriptome-wide sequencing technologies, alongside other complementary experiments, to determine the specific circumstances that permit coupling of poly(A) tail length and translational efficiency. The authors are able to synthesize many observations - both from their own lab and from others - to come up with a unified hypothesis. Many of the individual findings have been previously reported or hypothesized but no other work has brought all of these together in one study.

Overall, the data strongly support the conclusions. Importantly, several different cell types and systems are used. In addition, a number of different methods support the work - including reporter assays, global analyses, experiments in extracts, oocytes and cell lines, etc.

A description of events that lead to the switch from embryonic to post-embryonic regulation is still lacking. However, the insight provided here is substantial. It will have influence on many areas of study of gene expression - for example, it helps to explain discrepancies in miRNA function.

3. Reviewer #1 (Public Review):

This is an excellent manuscript in which Bartel and colleagues use an abundance of approaches to provide compelling evidence relevant to the coupling between poly(A)-tail length and translational efficiency. Without reiterating the results, the data are convincing and the paper is clearly written. Any concerns are too trivial to articulate.

4. Evaluation Summary:

This manuscript addresses a long-standing question, namely how does the poly(A) tail influence translational efficiency? It will therefore be of broad interest to readers from many areas of molecular biology including those interested in translation, mRNA stability, development and gene expression in general. The authors convincingly set out three criteria that must be met for coupling of poly(A) tail length with translation.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

#### URL

18. www.medrxiv.org www.medrxiv.org
1. Author Response:

Reviewer #1 (Public Review):

Strengths:

1) The model structure is appropriate for the scientific question.

2) The paper addresses a critical feature of SARS-CoV-2 epidemiology which is its much higher prevalence in Hispanic or Latino and Black populations. In this sense, the paper has the potential to serve as a tool to enhance social justice.

3) Generally speaking, the analysis supports the conclusions.

Other considerations:

1) The clean distinction between susceptibility and exposure models described in the paper is conceptually useful but is unlikely to capture reality. Rather, susceptibility to infection is likely to vary more by age whereas exposure is more likely to vary by ethnic group / race. While age cohorts are not explicitly distinguished in the model, the authors would do well to at least vary susceptibility across ethnic groups according to different age cohort structure within these groups. This would allow a more precise estimate of the true effect of variability in exposures. Alternatively, this could be mentioned as a limitation of the current model.

We agree that this would be an important extension for future work and have indicated this in the Discussion, along with the types of data necessary to fit such models:

“Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation \cite{Mulberry2021-tc}. These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates.”

2) I appreciated that the authors maintained an agnostic stance on the actual value of HIT (across the population & within ethnic groups) based on the results of their model. If there was available data, then it might be possible to arrive at a slightly more precise estimate by fitting the model to serial incidence data (particularly sorted by ethnic group) over time in NYC & Long Island. First, this would give some sense of R_effective. Second, if successive waves were modeled, then the shift in relative incidence & CI among these groups that is predicted in Figure 3 & Sup fig 8 may be observed in the actual data (this fits anecdotally with what I have seen in several states). Third, it may (or may not) be possible to estimate values of critical model parameters such as epsilon. It would be helpful to mention this as possible future work with the model.

Caveats about the impossibility of truly measuring HIT would still apply (due to new variants, shifting use & effectiveness of NPIs, etc.). However, as is, the estimates of possible values for HIT are so wide as to make the underlying data used to train the model almost irrelevant. This makes the potential to leverage the model for policy decisions more limited.

We have highlighted this important limitation in the Discussion:

“Finally, we have estimated model parameters using a single cross-sectional serosurvey. To improve estimates and the ability to distinguish between model structures, future studies should use longitudinal serosurveys or case data stratified by race and ethnicity and corrected for underreporting; the challenge will be ensuring that such data are systematically collected and made publicly available, which has been a persistent barrier to research efforts \cite{Krieger2020-ss}. Addressing these data barriers will also be key for translating these and similar models into actionable policy proposals on vaccine distribution and non-pharmaceutical interventions.”

3) I think the range of R0 in the figures should be extended to go as low as 1. Much of the pandemic in the US has been defined by local Re that varies between 0.8 & 1.2 (likely based on shifts in the degree of social distancing). I therefore think lower HIT thresholds should be considered and it would be nice to know how the extent of assortative mixing affects estimates at these lower R_e values.

We agree this would be of interest and have extended the range of R0 values. Figure 1 has been updated accordingly (see below); we also updated the text with new findings: “After fitting the models across a range of $\epsilon$ values, we observed that as $\epsilon$ increases, HITs and epidemic final sizes shifted higher back towards the homogeneous case (Figure \ref{fig:model2}, Figure 1-figure supplement 4); this effect was less pronounced for $R_0$ values close to 1.”

Figure 1: Incorporating assortativity in variable exposure models results in increased HITs across a range of $R_0$ values. Variable exposure models were fitted to NYC and Long Island serosurvey data.

4) line 274: I feel like this point needs to be considered in much more detail, either with a thoughtful discussion or even with some simple additions to the model. How should these results make policy makers consider race and ethnicity when thinking about the key issues in the field right now such as vaccine allocation, masking, and new variants? I think to achieve the maximal impact, the authors should be very specific about how model results could impact policy making, and how we might lower the tragic discrepancies associated with COVID. If the model / data is insufficient for this purpose at this stage, then what type of data could be gathered that would allow more precise and targeted policy interventions?

We have conducted additional analyses exploring the important suggestion by the reviewers that social distancing could affect these conclusions. The text and figures have been updated accordingly:

“Finally, we assessed how robust these findings were to the impact of social distancing and other non-pharmaceutical interventions (NPIs). We modeled these mitigation measures by scaling the transmission rate by a factor $\alpha$ beginning when 5\% cumulative incidence in the population was reached. Setting the duration of distancing to be 50 days and allowing $\alpha$ to be either 0.3 or 0.6 (i.e. a 70\% or 40\% reduction in transmission rates, respectively), we assessed how the $R_0$ versus HIT and final epidemic size relationships changed. We found that the $R_0$ versus HIT relationship was similar to in the unmitigated epidemic (Figure 1-figure supplement 5). In contrast, final epidemic sizes depended on the intensity of mitigation measures, though qualitative trends across models (e.g. increased assortativity leads to greater final sizes) remained true (Figure 1-figure supplement 6). To explore this further, we systematically varied $\alpha$ and the duration of NPIs while holding $R_0$ constant at 3. We found again that the HIT was consistent, whereas final epidemic sizes were substantially affected by the choice of mitigation parameters (Figure 1-figure supplement 7); the distribution of cumulative incidence at the point of HIT was also comparable with and without mitigation measures (Figure 2-figure supplement 8). The most stringent NPI intensities did not necessarily lead to the smallest epidemic final sizes, an idea which has been explored in studies analyzing optimal control measures \cite{Neuwirth2020-nb,Handel2007-ee}. Longitudinal changes in incidence rate ratios also were affected by NPIs, but qualitative trends in the ordering of racial and ethnic groups over time remained consistent (Figure 3-figure supplement 3).”

Figure 1-figure supplement 6: Final epidemic sizes versus $R_0$ in variable exposure models with mitigation measures for $\alpha = 0.3$ (top) and $\alpha = 0.6$ (bottom). NPIs were initiated when cumulative incidence reached 5\% in all models and continued for 50 days. Models were fitted to NYC and Long Island serosurvey data.

Figure 1-figure supplement 7: Sensitivity analysis on the impact of intensity and duration of NPIs on final epidemic sizes. HIT values for the same mitigation parameters were 46.4 $\pm$ 0.5\% (range). The smallest final size, corresponding to $\alpha = 0.6$ and duration = 100, was 51\%. Census-informed assortativity models were fit to Long Island seroprevalence data. NPIs were initiated when cumulative incidence reached 5\% in all models.

See points 1 and 2 above for examples of additional data required.
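The mitigation scheme described in the response to point 4 can be sketched in a few lines. This is only an illustration under strong simplifying assumptions: it uses a homogeneous single-group SIR model rather than the authors' structured models, a forward Euler integrator, and a hypothetical recovery rate `gamma = 0.2` per day. The trigger (5% cumulative incidence), the scaling factor $\alpha$ and the NPI duration follow the quoted text; the function and parameter names are invented for this sketch.

```python
def simulate_sir(r0, alpha=1.0, npi_days=0.0, trigger=0.05,
                 gamma=0.2, dt=0.01, i0=1e-4, t_max=2000.0):
    """Homogeneous SIR model (forward Euler) with one temporary NPI.

    Transmission is multiplied by `alpha` for `npi_days` days, starting
    when cumulative incidence first reaches `trigger`.
    Returns the final epidemic size (fraction ever infected).
    """
    beta = r0 * gamma              # transmission rate implied by R0
    s, i = 1.0 - i0, i0            # susceptible and infectious fractions
    t, npi_start = 0.0, None
    while t < t_max and i > 1e-9:
        cumulative = 1.0 - s       # everyone who has left the S class
        if npi_start is None and cumulative >= trigger:
            npi_start = t
        active = npi_start is not None and t < npi_start + npi_days
        scale = alpha if active else 1.0
        new_infections = scale * beta * s * i * dt
        s -= new_infections
        i += new_infections - gamma * i * dt
        t += dt
    return 1.0 - s


r0 = 3.0
hit_homogeneous = 1.0 - 1.0 / r0   # classical threshold, unaffected by alpha
unmitigated = simulate_sir(r0)
mitigated = simulate_sir(r0, alpha=0.3, npi_days=50.0)
```

In this toy setting the threshold `hit_homogeneous` depends only on $R_0$, while the final size depends on $\alpha$ and the NPI duration because a temporary intervention changes how far the epidemic overshoots the threshold. This mirrors, but does not reproduce, the qualitative statement in the quoted text.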

Minor issues:

-This is subjective but I found the words "active" and "high activity" to describe increases in contacts per day to be confusing. I would just say more contacts per day. It might help to change "contacts" to "exposure contacts" to emphasize that not all contacts are high risk.

To clarify this, we have replaced instances of “activity level” (and similar) with “total contact rate”, indicating the total number of contacts per unit time per individual; e.g. “The estimated total contact rate ratios indicate higher contacts for minority groups such as Hispanics or Latinos and non-Hispanic Black people, which is in line with studies using cell phone mobility data \cite{Chang2020-in}; however, the magnitudes of the ratios are substantially higher than we expected given the findings from those studies.”

We have also clarified our definition of contacts: “We define contacts to be interactions between individuals that allow for transmission of SARS-CoV-2 with some non-zero probability.”

-The abstract has too much jargon for a generalist journal. I would avoid words like "proportionate mixing" & "assortative" which are very unique to modeling of infectious diseases unless they are first defined in very basic language.

We have revised the abstract to convey these same concepts in a more accessible manner: “A simple model where interactions occur proportionally to contact rates reduced the HIT, but more realistic models of preferential mixing within groups increased the threshold toward the value observed in homogeneous populations.”

-I would cite some of the STD models which have used similar matrices to capture assortative mixing.

We have added a reference in the assortative mixing section to a review of heterogeneous STD models: “Finally, under the \textit{assortative mixing} assumption, we extended this model by partitioning a fraction $\epsilon$ of contacts to be exclusively within-group and distributed the rest of the contacts according to proportionate mixing (with $\delta_{i,j}$ being an indicator variable that is 1 when $i=j$ and 0 otherwise) \cite{Hethcote1996-bf}:”
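The mixing assumption quoted above can be sketched in code. This is a minimal illustration of the textbook preferred-mixing form (a fraction $\epsilon$ of contacts reserved within-group, the remainder distributed in proportion to each group's share of total contacts), not necessarily the exact equation in the manuscript. The function name, argument names, and example numbers are hypothetical.

```python
def assortative_mixing_matrix(contact_rates, pop_sizes, epsilon):
    """Return M where M[i][j] is the fraction of group i's contacts made with group j.

    Assumed form (textbook preferred mixing, not taken from the manuscript):
        M[i][j] = epsilon * delta_ij + (1 - epsilon) * a_j * N_j / sum_k(a_k * N_k)
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    total_contacts = sum(a * n for a, n in zip(contact_rates, pop_sizes))
    # Proportionate mixing: contacts are shared out by each group's total contact volume.
    proportionate = [a * n / total_contacts for a, n in zip(contact_rates, pop_sizes)]
    k = len(contact_rates)
    return [
        [epsilon * (1.0 if i == j else 0.0) + (1.0 - epsilon) * proportionate[j]
         for j in range(k)]
        for i in range(k)
    ]


# Hypothetical two-group example: group 0 has twice the per-capita contact rate.
matrix = assortative_mixing_matrix([2.0, 1.0], [100, 300], epsilon=0.5)
```

Setting `epsilon=0` recovers pure proportionate mixing, and `epsilon=1` gives fully within-group contacts. Each row sums to 1 for any value in between.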

-Lines 164-5: very good point but I would add that members of ethnic / racial groups are more likely to be essential workers and also to live in multigenerational houses

We have added these helpful examples into the text: “Variable susceptibility to infection across racial and ethnic groups has been less well characterized, and observed disparities in infection rates can already be largely explained by differences in mobility and exposure \cite{Chang2020-in,Zelner2020-mb,Kissler2020-nh}, likely attributable to social factors such as structural racism that have put racial and ethnic minorities in disadvantaged positions (e.g., employment as frontline workers and residence in overcrowded, multigenerational homes) \cite{Henry_Akintobi2020-ld,Thakur2020-tw,Tai2020-ok,Khazanchi2020-xu}.”

-Line 193: "Higher than expected" -> expected by who?

We have clarified this phrase: “The estimated total contact rate ratios indicate higher exposure contacts for minority groups such as Hispanics or Latinos and non-Hispanic Black people, which is in line with studies using cell phone mobility data \cite{Chang2020-in}; however, the magnitudes of the ratios are substantially higher than we expected given the findings from those studies.”

-A limitation that needs further mention is the fact that race & ethnic group, while important, could be subclassified into strata that inform risk even more (such as SES, job type, etc.)

We agree and have added this to the Discussion: “Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation \cite{Mulberry2021-tc}. These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates.”

Reviewer #2 (Public Review):

Overall I think this is a solid and interesting piece that is an important contribution to the literature on COVID-19 disparities, even if it does have some limitations. To this point, most models of SARS-CoV-2 have not included the impact of residential and occupational segregation on differential group-specific covid outcomes. So, the authors are to be commended on their rigorous and useful contribution on this valuable topic. I have a few specific questions and concerns, outlined below:

We thank the reviewer for the supportive comments.

1) Does the reliance on serosurvey data collected in public places imply a potential issue with left-censoring, i.e. by not capturing individuals who had died? Can the authors address how survival bias might impact their results? I imagine this could bring the seroprevalence among older people down in a way that could bias their transmission rate estimates.

We have included this important point in the limitations section on potential serosurvey biases: “First, biases in the serosurvey sampling process can substantially affect downstream results; any conclusions drawn depend heavily on the degree to which serosurvey design and post-survey adjustments yield representative samples \cite{Clapham2020-rt}. For instance, because the serosurvey we relied on primarily sampled people at grocery stores, there is both survival bias (cumulative incidence estimates do not account for people who have died) and ascertainment bias (undersampling of at-risk populations that are more likely to self-isolate, such as the elderly) \cite{Rosenberg2020-qw,Accorsi2021-hx}. These biases could affect model estimates if, for instance, the capacity to self-isolate varies by race or ethnicity -- as suggested by associations of neighborhood-level mobility versus demographics \cite{Kishore2020-sy,Kissler2020-nh} -- leading to an overestimate of cumulative incidence and contact rates in whites.”

2) It might be helpful to think in terms of disparities in HITs as well as disparities in contact rates, since the HIT of whites is necessarily dependent on that of Blacks. I'm not really disagreeing with the thrust of what their analysis suggests or even the factual interpretation of it. But I do think it is important to phrase some of the conclusions of the model in ways that are more directly relevant to health equity, i.e. how much infection/vaccination coverage does each group need for members of that group to benefit from indirect protection?

We agree with this important point and indeed this was the goal, in part, of the analyses in Figure 2. We have added additional text to the Discussion highlighting this: “Projecting the epidemic forward indicated that the overall HIT was reached after cumulative incidence had increased disproportionately in minority groups, highlighting the fundamentally inequitable outcome of achieving herd immunity through infection. All of these factors underscore the fact that incorporating heterogeneity in models in a mechanism-free manner can conceal the disparities that underlie changes in epidemic final sizes and HITs. In particular, overall lower HIT and final sizes occur because certain groups suffer not only more infection than average, but more infection than under a homogeneous mixing model; incorporating heterogeneity lowers the HIT but increases it for the highest-risk groups (Figure \ref{fig:hitcomp}).”
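For reference, the "homogeneous mixing" benchmark that the response compares against is the classic threshold $1 - 1/R_0$. The sketch below computes only that benchmark. It does not reproduce the group-structured HITs discussed in the response, and the function name is hypothetical.

```python
def homogeneous_hit(r0):
    """Herd immunity threshold under homogeneous mixing: 1 - 1/R0.

    Returns 0.0 when R0 <= 1, since the epidemic cannot grow in that case.
    """
    if r0 <= 1.0:
        return 0.0
    return 1.0 - 1.0 / r0


# Benchmark values across a range of R0, including the low end near 1.
benchmarks = {r0: homogeneous_hit(r0) for r0 in (1.0, 1.2, 2.0, 3.0)}
```

Any heterogeneous model can then be read against this curve: a lower overall HIT at the same $R_0$ reflects the concentration of infection in higher-contact groups rather than a uniformly lower burden.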

For vaccination, see our response to Reviewer #1 point 4.

3) The authors rely on a modified interaction index parameterized directly from their data. It would be helpful if they could explain why they did not rely on any sources of mobility data. Are these just not broken down along the type of race/ethnicity categories that would be necessary to complete this analysis? Integrating some sort of external information on mobility would definitely strengthen the analysis.

This is a great suggestion, but this type of data has generally not been available due to privacy concerns from disaggregating mobility data by race and ethnicity (Kishore et al., 2020). Instead, we modeled NPIs as mentioned in Reviewer #1 point 4, with the caveat that reduction in mobility was assumed to be identical across groups. We added this into the text explicitly as a limitation: “Third, we have assumed the impact of non-pharmaceutical interventions such as stay-at-home policies, closures, and the like to equally affect racial and ethnic groups. Empirical evidence suggests that during periods of lockdown, certain neighborhoods that are disproportionately wealthy and white tend to show greater declines in mobility than others \cite{Kishore2020-sy,Kissler2020-nh}. These simplifying assumptions were made to aid in illustrating the key findings of this model, but for more detailed predictive models, the extent to which activity level differences change could be evaluated using longitudinal contact survey data \cite{Feehan2020-ta}, since granular mobility data are typically not stratified by race and ethnicity due to privacy concerns \cite{Kishore2020-mg}.”

Reviewer #3 (Public Review):

Ma et al investigate the effect of racial and ethnic differences in SARS-CoV-2 infection risk on the herd immunity threshold of each group. Using New York City and Long Island as model settings, they construct a race/ethnicity-structured SEIR model. Differential risk between racial and ethnic groups was parameterized by fitting each model to local seroprevalence data stratified demographically. The authors find that when herd immunity is reached, cumulative incidence varies by more than two fold between ethnic groups, at approximately 75% of Hispanics or Latinos and only 30% of non-Hispanic Whites.
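To make the model structure concrete, a group-structured SEIR update can be sketched as below. This is an illustrative forward-Euler step under assumed parameter names (`beta`, `sigma`, `gamma`, `contact_rates`, `mixing`), not the authors' implementation or fitting procedure. The force of infection on each group is taken to scale with that group's contact rate and with the prevalence in the groups it mixes with.

```python
def seir_step(state, contact_rates, mixing, beta, sigma, gamma, dt):
    """Advance a group-structured SEIR model by one forward-Euler step.

    state: list of (S, E, I, R) tuples, one per group (hypothetical layout).
    mixing[i][j]: fraction of group i's contacts made with group j.
    """
    sizes = [sum(group) for group in state]
    prevalence = [group[2] / n for group, n in zip(state, sizes)]
    new_state = []
    for i, (s, e, inf, r) in enumerate(state):
        # Force of infection on group i: its contact rate times the
        # mixing-weighted prevalence among the groups it contacts.
        force = beta * contact_rates[i] * sum(
            mixing[i][j] * prevalence[j] for j in range(len(state))
        )
        new_exposed = force * s * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * inf * dt
        new_state.append((
            s - new_exposed,
            e + new_exposed - new_infectious,
            inf + new_infectious - new_recovered,
            r + new_recovered,
        ))
    return new_state


# Hypothetical two-group example with infection seeded only in group 0.
start = [(990.0, 0.0, 10.0, 0.0), (500.0, 0.0, 0.0, 0.0)]
after = seir_step(start, [2.0, 1.0], [[0.7, 0.3], [0.2, 0.8]],
                  beta=0.3, sigma=0.2, gamma=0.1, dt=0.1)
```

Because each flow leaves one compartment and enters the next, every group's total population is conserved at each step. A production model would typically use an ODE solver rather than a fixed Euler step.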

This result was robust to changing assumptions about the source of racial and ethnic disparities. The authors considered differences in disease susceptibility, exposure levels, as well as a census-driven model of assortative mixing. These results show the fundamentally inequitable outcome of achieving herd immunity in an unmitigated epidemic.

The authors have only considered an unmitigated epidemic, without any social distancing, quarantine, masking, or vaccination. If herd immunity is achieved via one of these methods, particularly vaccination, the disparities may be mitigated somewhat but still exist. This will be an important question for epidemiologists and public health officials to consider throughout the vaccine rollout.

We thank the reviewer for the detailed and helpful summary and suggestions.

2. Reviewer #3 (Public Review):

Ma et al investigate the effect of racial and ethnic differences in SARS-CoV-2 infection risk on the herd immunity threshold of each group. Using New York City and Long Island as model settings, they construct a race/ethnicity-structured SEIR model. Differential risk between racial and ethnic groups was parameterized by fitting each model to local seroprevalence data stratified demographically. The authors find that when herd immunity is reached, cumulative incidence varies by more than two fold between ethnic groups, at approximately 75% of Hispanics or Latinos and only 30% of non-Hispanic Whites.

This result was robust to changing assumptions about the source of racial and ethnic disparities. The authors considered differences in disease susceptibility, exposure levels, as well as a census-driven model of assortative mixing. These results show the fundamentally inequitable outcome of achieving herd immunity in an unmitigated epidemic.

The authors have only considered an unmitigated epidemic, without any social distancing, quarantine, masking, or vaccination. If herd immunity is achieved via one of these methods, particularly vaccination, the disparities may be mitigated somewhat but still exist. This will be an important question for epidemiologists and public health officials to consider throughout the vaccine rollout.

3. Reviewer #2 (Public Review):

Overall I think this is a solid and interesting piece that is an important contribution to the literature on COVID-19 disparities, even if it does have some limitations. To this point, most models of SARS-CoV-2 have not included the impact of residential and occupational segregation on differential group-specific covid outcomes. So, the authors are to be commended on their rigorous and useful contribution on this valuable topic. I have a few specific questions and concerns, outlined below:

1) Does the reliance on serosurvey data collected in public places imply a potential issue with left-censoring, i.e. by not capturing individuals who had died? Can the authors address how survival bias might impact their results? I imagine this could bring the seroprevalence among older people down in a way that could bias their transmission rate estimates.

2) It might be helpful to think in terms of disparities in HITs as well as disparities in contact rates, since the HIT of whites is necessarily dependent on that of Blacks. I'm not really disagreeing with the thrust of what their analysis suggests or even the factual interpretation of it. But I do think it is important to phrase some of the conclusions of the model in ways that are more directly relevant to health equity, i.e. how much infection/vaccination coverage does each group need for members of that group to benefit from indirect protection?

3) The authors rely on a modified interaction index parameterized directly from their data. It would be helpful if they could explain why they did not rely on any sources of mobility data. Are these just not broken down along the type of race/ethnicity categories that would be necessary to complete this analysis? Integrating some sort of external information on mobility would definitely strengthen the analysis.

4. Reviewer #1 (Public Review):

Strengths:

1) The model structure is appropriate for the scientific question.

2) The paper addresses a critical feature of SARS-CoV-2 epidemiology which is its much higher prevalence in Hispanic or Latino and Black populations. In this sense, the paper has the potential to serve as a tool to enhance social justice.

3) Generally speaking, the analysis supports the conclusions.

Other considerations:

1) The clean distinction between susceptibility and exposure models described in the paper is conceptually useful but is unlikely to capture reality. Rather, susceptibility to infection is likely to vary more by age whereas exposure is more likely to vary by ethnic group / race. While age cohorts are not explicitly distinguished in the model, the authors would do well to at least vary susceptibility across ethnic groups according to different age cohort structure within these groups. This would allow a more precise estimate of the true effect of variability in exposures. Alternatively, this could be mentioned as a limitation of the current model.

2) I appreciated that the authors maintained an agnostic stance on the actual value of HIT (across the population & within ethnic groups) based on the results of their model. If there was available data, then it might be possible to arrive at a slightly more precise estimate by fitting the model to serial incidence data (particularly sorted by ethnic group) over time in NYC & Long Island. First, this would give some sense of R_effective. Second, if successive waves were modeled, then the shift in relative incidence & CI among these groups that is predicted in Figure 3 & Sup fig 8 may be observed in the actual data (this fits anecdotally with what I have seen in several states). Third, it may (or may not) be possible to estimate values of critical model parameters such as epsilon. It would be helpful to mention this as possible future work with the model.

Caveats about the impossibility of truly measuring HIT would still apply (due to new variants, shifting use & effectiveness of NPIs, etc.). However, as is, the estimates of possible values for HIT are so wide as to make the underlying data used to train the model almost irrelevant. This makes the potential to leverage the model for policy decisions more limited.

3) I think the range of R0 in the figures should be extended to go as low as 1. Much of the pandemic in the US has been defined by local Re that varies between 0.8 & 1.2 (likely based on shifts in the degree of social distancing). I therefore think lower HIT thresholds should be considered and it would be nice to know how the extent of assortative mixing affects estimates at these lower R_e values.

4) line 274: I feel like this point needs to be considered in much more detail, either with a thoughtful discussion or even with some simple additions to the model. How should these results make policy makers consider race and ethnicity when thinking about the key issues in the field right now such as vaccine allocation, masking, and new variants? I think to achieve the maximal impact, the authors should be very specific about how model results could impact policy making, and how we might lower the tragic discrepancies associated with COVID. If the model / data is insufficient for this purpose at this stage, then what type of data could be gathered that would allow more precise and targeted policy interventions?

Minor issues:

-This is subjective but I found the words "active" and "high activity" to describe increases in contacts per day to be confusing. I would just say more contacts per day. It might help to change "contacts" to "exposure contacts" to emphasize that not all contacts are high risk.

-The abstract has too much jargon for a generalist journal. I would avoid words like "proportionate mixing" & "assortative" which are very unique to modeling of infectious diseases unless they are first defined in very basic language.

-I would cite some of the STD models which have used similar matrices to capture assortative mixing.

-Lines 164-5: very good point but I would add that members of ethnic / racial groups are more likely to be essential workers and also to live in multigenerational houses

-Line 193: "Higher than expected" -> expected by who?

-A limitation that needs further mention is the fact that race & ethnic group, while important, could be subclassified into strata that inform risk even more (such as SES, job type, etc.)

5. Evaluation Summary:

This excellent paper by Ma and colleagues assesses the role of assortative mixing in regards to racial and ethnic disparities to estimate herd immunity thresholds (HIT) for SARS-CoV-2. The paper is conceptual in nature and builds on similar models which have been particularly useful to understand the dynamics of sexually transmitted diseases. The model is explained well and the paper is clearly written. The conclusions are justified by the analysis. One limitation is that the model is trained against a single cross-sectional seroprevalence estimate (one in NYC & one in Long Island) which allows for multiple models (ranging from homogeneous mixing to proportionate mixing) to recapitulate the data and in turn does not allow general estimates of HIT for these regions. It is also unclear if a more realistic epidemic simulation that included repeated waves of infection &/or vaccine roll out would change the conclusions regarding HIT according to race and ethnicity.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

19. www.biorxiv.org www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

The gist of this work is that the simple concept of a solubility product determines a threshold for phase separation, thereby enabling buffering even in systems where phase separation is driven by heterotypic interactions. The solubility product or SP is determined by the number of complementary interaction sites and the coordination number i.e., the number of bonds one can make per site.

The work appears to be motivated by two questions: Are concentrations buffered in systems where heterotypic interactions drive phase separation thereby negating the presence of a rigorously definable saturation concentration? This question was motivated by work from Klosin et al., showing how phase separation can enable buffering of noise in transcription. They relied on the concept of a saturation concentration. In a paper that followed a few months after, Riback et al., showed that the concept of a saturation concentration ceases to exist, as defined for systems where phase separation is driven purely by homotypic interactions. This was taken to imply that the formation of multicomponent condensates via a blend of homotypic and heterotypic interactions causes a loss of buffering capacity afforded by phase separation. The second question motivating the current work is the apparent absence of a theoretical framework for "varying threshold concentrations" in systems governed by heterotypic interactions.

Using two flavors of simulations, the authors propose that the SP sets an upper limit on the convolution of concentrations that determine phase separation. They show this via simulations where they follow the formation of clusters formed by linear multivalent macromolecules and monitor the emergence of a bimodal distribution of clusters. In 1:1 mixtures of multivalent macromolecules they find that SP sets a threshold beyond which a bimodal distribution of clusters emerges. The authors further find that SP sets an upper limit even in systems that deviate from the 1:1 stoichiometry.

The authors proceed to show that the SP is influenced by the valence of multivalent macromolecules. They also demonstrate that short rigid linkers can cause an arrest of phase separation through a so-called "dimer trap" reminiscent of the "magic number" postulate put forth by Wingreen and colleagues.

Is the work significant, novel, and timely? Effectively the authors propose that the driving forces for phase separation can be distilled down to the concept of a solubility product. Given prior knowledge of the valence, coordination number, and affinities can one predict concentration thresholds for phase separation? The authors suggest that this can be gleaned from either network based simulations, which are very inexpensive, or through more elaborate simulations. They further propose that it is the solubility product that sets the threshold.

It is worth noting that the authors are quantifying what is known in the physical literature as a percolation threshold. The seminal work of Flory and Stockmayer dating back to the 1940s showed how one can calculate a percolation threshold by taking in prior knowledge of valence, coordination numbers, and affinities whilst ignoring cooperativity. These ideas have been refined and advanced in several theoretical contributions by various labs. While none of the papers in the physical literature use the concept of a solubility product, they rely on the concept of a percolation threshold because the transition to large, system-spanning clusters is a continuous one and it is debatable if this is a bona fide phase transition. Rather it is a topological transition.
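The Flory-Stockmayer estimate mentioned by the reviewer can be written down compactly. The sketch below uses the standard mean-field results, which ignore cooperativity and intramolecular bonds. The function names are hypothetical, and the code is not part of either the review or the manuscript.

```python
def gel_point_homotypic(valence):
    """Critical bound fraction p_c = 1 / (f - 1) for identical f-valent units."""
    if valence < 2:
        raise ValueError("valence must be at least 2 to form a network")
    return 1.0 / (valence - 1)


def gel_point_heterotypic(valence_a, valence_b):
    """Critical product p_A * p_B = 1 / ((f_A - 1) * (f_B - 1)).

    Applies when A sites bind only B sites; p_A and p_B are the fractions
    of A and B sites that are bound at the percolation threshold.
    """
    if valence_a < 2 or valence_b < 2:
        raise ValueError("both valences must be at least 2 to form a network")
    return 1.0 / ((valence_a - 1) * (valence_b - 1))


# Tetravalent A with tetravalent B: the critical product of bound fractions is 1/9,
# so at matched site stoichiometry (p_A == p_B) about one third of sites are bound.
critical_product = gel_point_heterotypic(4, 4)
```

Higher valence lowers the critical bound fraction, which agrees with the qualitative trend reported for the simulations: higher valency moves the threshold to lower concentrations.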

Yes, we agree that the novelty and importance of our work rests in the application of the simple and accessible concept of solubility product, which has not been previously considered in relation to LLPS. The relationship of our analysis to the physics underlying phase diagrams is discussed in a new paragraph within the Discussion.

As for novelty, unfortunately the authors disregard prior work that showed how linker length impacts local vs. global cooperativity in phase transitions that combine phase separation and percolation. Ref. 23 is the work in question and it is mentioned in passing, even though the contributions here are entirely a redux.

We have eliminated the results on how molecular structural features control LLPS to fully focus our paper on the Ksp concept, as suggested by the Editor. However, in our original manuscript, we described results not just related to linker length, but also steric effects.

The concept of a solubility product, introduced here to model / understand phase behavior of multivalent macromolecules, is an interesting and potentially appealing simple description. It might make the understanding of phase transitions more accessible, but it has problems: (a) it does not define phase separation; rather it defines percolation transitions; (b) without prior knowledge of the relevant quantities, the solubility product cannot be readily inferred, even from simulations, although one can scan parameter space to arrive at predictions regarding the apparent valence and coordination numbers. (c) the solubility product does not tell us much about properties of condensates, interfaces, or the driving forces for phase transitions that are influenced by the collective effects of interaction domains / motifs and spacers.

Recent papers have drawn attention to the potential importance of buffering as a biological function of biomolecular condensation, and also the failure of buffering in heterotypic LLPS. We felt that the Ksp would help “rescue” the idea of buffering, as Reviewer 1 has so aptly put it below. We have refocused the paper to emphasize this. Of course, we describe this for a series of ideal systems with known valency and affinities. However, theoretical systems are always “ideal” and the deviations from ideality are what make experiments so vital. We have added a paragraph in the Discussion that relates our work to the physics of phase transitions, providing 2 citations, [13, 21], to support taking the percolation threshold as a proxy for the phase boundary. We also point out at the end of the Discussion, how the Ksp concept might be validated experimentally and might be useful in categorizing the effective valency of molecules comprising a cellular condensate.

Finally, as for the absence of a theoretical explanation for the apparent loss of buffering in systems with heterotypic interactions, the authors would do well to see the work of Choi et al., published in PLoS Comput. Biol. in 2019. Figure 12 in that work clearly establishes that the concentrations of A and B species in the coexisting dilute phase are set by the slopes of tie lines - the lines of constant chemical potential. These slopes are set by the relative strengths of homotypic vs. heterotypic interactions, and to zeroth order, that is the physical explanation.

We apologize for missing this very relevant work and have now cited it several times in the paper. However, as Reviewer 1 states, Figure 12 treats the potential competition between homotypic and heterotypic interactions within a system. We did not address this in our paper. Rather, for our purposes, homotypic interactions are a special case that still fits within the solubility product framework. We do now address the relationship of tie-lines in phase diagrams to the Ksp in the Discussion paragraph mentioned above.

Reviewer #2 (Public Review):

This paper asks whether systems composed of more than one component (heterotypic) that undergo liquid-liquid phase separation will follow the same rules as ionic solutions. The question is motivated by (i) the behavior of homotypic solutions, where after phase separation, monomer concentrations remain fixed despite addition of new components, which is not true for heterotypic systems and (ii) the known behavior of multivalent ionic salts. This idea has not previously been tested. They show quite clearly through simulations that the solubility product, Ksp, can be used as a quantitative metric to delineate phase transition behavior in heterotypic systems. This is a valuable contribution to the understanding of phase separation in these systems, and could be impactful in analyzing experimental observables, at least in vitro, to determine the valency of interacting systems. It provides a relatively straightforward conceptual basis for observed partitioning of components into dilute and dense phases. The result seems robust and likely to be reproducible experimentally and through alternative simulation studies, particularly given its established history in quantifying the related phenomena in ionic salts.

A weakness is the rather qualitative comparison to experiment, which is justified by the authors based on the unknown valency of the experimental system. There is also no quantitative comparison between simulation types (spatial vs non-spatial). However, the simulations do seem sufficiently detailed to test and validate the Ksp concept.

Strengths:

• The paper is very focused, and uses multiple simulation 'experiments' to test the role of the Ksp in delineating the phase transition, showing good agreement for multiple systems, with both matched and distinct stoichiometries between the components. They see typical behavior at the phase transition point, where they observe the largest variability or fluctuations in the formation of the dense phase. Thus the results strongly support the conclusion that the Ksp delineates phase transitions in these 2-3 component systems.

• A comparison is made to a recent experimental result with three components, showing qualitative agreement with an observed lack of buffering, which was unexpected at the time due to the behavior observed for homotypic systems. Here this result is now rationalized via the Ksp, which does plateau despite the monomer concentrations changing.

• Spatial simulations probe the role of structure and flexibility in impacting phase separation, finding general agreement with previously published experimental and modeling work. These observations about flexibility and matched valency are also relatively intuitive.

Weaknesses

• There is no quantitative comparison between the two simulation approaches (spatial and non-spatial), which should be straightforward. Using the same composition and KD in both types of simulations and directly comparing outcomes would help explain when and why the spatial simulations differ from the non-spatial ones; see the subsequent comments below:

• A related methodological point: On Line 97 it states that NFSim does not allow intramolecular bonds to form, but this is not true. On one hand, they can be written out explicitly. E.g. A(a1!1, a2).B(b1!1, b2)->A(a1!1, a2!2).B(b1!1, b2!2), would form a second bond between an AB complex that already had one bond. While quite tedious, these could be enumerated, allowing for the zippering effect they see spatially, although the rates would not be bimolecular. This would still leave out intra-complex bonds between proteins without a direct link. However, based on the NFsim website, by default it does in fact allow these types of intra-complex bonds to be formed (http://michaelsneddon.net/nfsim/pages/support/support.html) see "Reactant Connectivity Enforcement". So it is not clear to me which option was used in this paper. According to what is written in the methods, no intra-complex bonds are formed, but this is not the default in NFsim and is indeed allowable.

The reviewer misinterpreted this admittedly unclear statement: “The binding rules only allow inter-molecular binding; internal bond formation within the molecular clusters is not permitted, as NFsim cannot account for proximity of binding sites within clusters.” We did not intend this to imply that NFsim does not support intramolecular binding; rather we meant that our choice was to only allow intermolecular bond formation. We made this choice because, being non-spatial, NFsim cannot account for spatial proximity or steric effects. We have clarified this in the revised manuscript as follows: “We chose binding rules to only allow inter-molecular binding; we felt this was appropriate because NFsim cannot account for spatial proximity of binding sites or steric crowding within clusters.”

• The spatial simulations do not show the bimodal distribution under the fixed concentrations (Fig S9). This is a significant difference from the non-spatial result. They attribute this to a 'dimer trap', but given they see the dense phase in the clamped monomer simulations, this cannot be the only explanation. What about kinetic effects, due to the differences in initial concentrations of monomers in the two simulation approaches? The rate constants are not listed anywhere. They only seem to see large clusters at fixed concentrations for the mismatched sizes (Fig S12B), where the Ksp behavior does not hold. Can they increase monomer flexibility more and start to see bimodal at fixed concentration, or change the rates and see a bimodal distribution?

In general, there is a limited ability of a small number of molecules in the FTC simulations to form a clear bimodal distribution, whether spatial or non-spatial. This is directly demonstrated in Figure 1C, where the non-spatial simulations become increasingly bimodal as the number of molecules increases, keeping concentration constant. Because of the greater computational cost of SpringSaLaD calculations, we kept the FTC simulations in Figure 7 to 200 molecules. However, the histograms that are averaged over 50 runs obscure the clear separation that is apparent when examining molecule size distribution in individual trajectories for the FTC case. We now include these in the supporting figures as Figure 1-figure supplement 3 (NFsim) and Figure 7-figure supplement 2 (SpringSaLaD). Above Ksp, we see a consistent group of small oligomers (which is reinforced in the averaged histograms) and individual large clusters (which are smeared out in the average histograms). As Reviewer 2 noted, we were also able to convincingly demonstrate bimodality at and above Ksp with the CMC simulations, which are allowed to continue until they stochastically nucleate large clusters and take off.

All the FTC simulations are run to steady state, so only the Kds should matter, not the rate constants, which were actually available in the input files in the Git repository; we have now included the SpringSaLaD rate constants in the manuscript as well.

• Related: I am surprised that the sterically hindered monomers would not form large clusters at fixed concentration, as it looks like it is impossible for them to 'zipper' up their binding sites and become trapped in dimers. Is the distribution at fixed concentrations bimodal? The data is not shown.

We have removed the additional spatial simulation Results for structures other than the one in Figure 7 as requested by the Editor. We hope to thoroughly explore the molecular-structural determinants of Ksp and LLPS in a subsequent paper.

Reviewer #3 (Public Review):

In this work, Chattaraj and colleagues utilize simulation models to study collective behaviors of molecules with multiple binding sites (multivalency). When the concentrations are low, the molecules do not bind to each other frequently, and they are called free. On the other hand, if the concentrations increase, they start to bind and eventually form a wide network of molecules connected by molecular binding. This transition can be considered as a model for liquid-liquid phase separation. Their major claim is that the solubility product, a simple product of the concentrations of the free molecules, can be used as a proxy to the phase separation threshold (known as the saturation concentration). They observed in various simulation conditions that as the total concentration of molecules increases, the solubility product first increases but eventually converges to a certain value, and the value is consistent over different simulation conditions. The value is the upper limit of the solubility product, after which the molecules start to form a molecular network.
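As a minimal sketch of the quantity described above, the solubility product can be computed as the product of the free (unbound) concentrations of the two components. The function name, the exponent parameters, and the numbers below are illustrative assumptions, not values or code from the manuscript; the default exponents of 1 correspond to the "simple product" wording in the review.

```python
def solubility_product(free_a, free_b, exp_a=1, exp_b=1):
    """Product of free-monomer concentrations (hypothetical helper).

    exp_a and exp_b are assumed stoichiometric exponents; the default of 1
    gives the simple product [A]_free * [B]_free.
    """
    if free_a < 0 or free_b < 0:
        raise ValueError("concentrations must be non-negative")
    return (free_a ** exp_a) * (free_b ** exp_b)


# Made-up free concentrations (arbitrary units) at increasing total
# concentration. They are chosen only to illustrate a plateau: the individual
# free concentrations keep changing while their product stays constant.
free_pairs = [(10.0, 10.0), (20.0, 20.0), (24.0, 25.0), (25.0, 24.0)]
products = [solubility_product(a, b) for a, b in free_pairs]
print(products)
```

In this toy series the last two entries have different free concentrations of A and B but the same product, which is the sense in which the product, rather than either concentration alone, is claimed to saturate.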

After establishing the model, they tested systems with different valences. Higher valency leads to reduction of the threshold (and phase separation occurs at lower concentrations). The theory was also valid for systems with non-equal valences (e.g. pentavalent A + trivalent B). They applied their models to a three-component system, and found that the results qualitatively explain the published experimental patterns. Lastly, using off-lattice coarse-grained simulations, they show that the linker flexibility and the spacing of binding sites are important determinants of the threshold, which confirms the findings from other computational and experimental works.

The authors successfully defend their claim by using different types of simulations, and their methods to crosscheck the physical validity of their models may be useful for other simulation works. For example, the authors checked if increasing the number of molecules and reducing the system size give the same results for equal concentrations. Also, they employed two different methods (so-called FTC and CMC in the manuscript) to determine the threshold concentrations. However, the conclusions are not easily transferable to real biopolymer systems, since it is hard to determine the valences (and binding affinities) of biopolymers such as intrinsically disordered proteins.

Our work was motivated by recent work highlighting the importance of buffering as a biological function of biomolecular condensation, but also the failure of buffering in heterotypic LLPS. We realized that Ksp offers a more general framework than buffering that encompasses complex multicomponent (heterotypic) systems. But our original manuscript was not sufficiently focused on this primary motivation and has been revised accordingly. Of course, we used simulations on ideal systems to establish this idea. We suggest at the end of the discussion that the Ksp concept may potentially be used to derive effective parameters for experimental systems.

2. Reviewer #3 (Public Review):

In this work, Chattaraj and colleagues utilize simulation models to study collective behaviors of molecules with multiple binding sites (multivalency). When the concentrations are low, the molecules do not bind to each other frequently, and they are called free. On the other hand, if the concentrations increase, they start to bind and eventually form a wide network of molecules connected by molecular binding. This transition can be considered as a model for liquid-liquid phase separation. Their major claim is that the solubility product, a simple product of the concentrations of the free molecules, can be used as a proxy to the phase separation threshold (known as the saturation concentration). They observed in various simulation conditions that as the total concentration of molecules increases, the solubility product first increases but eventually converges to a certain value, and the value is consistent over different simulation conditions. The value is the upper limit of the solubility product, after which the molecules start to form a molecular network.

After establishing the model, they tested systems with different valences. Higher valency leads to reduction of the threshold (and phase separation occurs at lower concentrations). The theory was also valid for systems with non-equal valences (e.g. pentavalent A + trivalent B). They applied their models to a three-component system, and found that the results qualitatively explain the published experimental patterns. Lastly, using off-lattice coarse-grained simulations, they show that the linker flexibility and the spacing of binding sites are important determinants of the threshold, which confirms the findings from other computational and experimental works.

The authors successfully defend their claim by using different types of simulations, and their methods to crosscheck the physical validity of their models may be useful for other simulation works. For example, the authors checked if increasing the number of molecules and reducing the system size give the same results for equal concentrations. Also, they employed two different methods (so-called FTC and CMC in the manuscript) to determine the threshold concentrations. However, the conclusions are not easily transferable to real biopolymer systems, since it is hard to determine the valences (and binding affinities) of biopolymers such as intrinsically disordered proteins.

3. Reviewer #2 (Public Review):

This paper asks whether systems composed of more than one component (heterotypic) that undergo liquid-liquid phase separation will follow the same rules as ionic solutions. The question is motivated by (i) the behavior of homotypic solutions, where after phase separation, monomer concentrations remain fixed despite addition of new components, which is not true for heterotypic systems and (ii) the known behavior of multivalent ionic salts. This idea has not previously been tested. They show quite clearly through simulations that the solubility product, Ksp, can be used as a quantitative metric to delineate phase transition behavior in heterotypic systems. This is a valuable contribution to the understanding of phase separation in these systems, and could be impactful in analyzing experimental observables, at least in vitro, to determine the valency of interacting systems. It provides a relatively straightforward conceptual basis for observed partitioning of components into dilute and dense phases. The result seems robust and likely to be reproducible experimentally and through alternative simulation studies, particularly given its established history in quantifying the related phenomena in ionic salts.

A weakness is the rather qualitative comparison to experiment, which is justified by the authors based on the unknown valency of the experimental system. There is also no quantitative comparison between simulation types (spatial vs non-spatial). However, the simulations do seem sufficiently detailed to test and validate the Ksp concept.

Strengths:

• The paper is very focused, and uses multiple simulation 'experiments' to test the role of the Ksp in delineating the phase transition, showing good agreement for multiple systems, with both matched and distinct stoichiometries between the components. They see typical behavior at the phase transition point, where they observe the largest variability or fluctuations in the formation of the dense phase. Thus the results strongly support the conclusion that the Ksp delineates phase transitions in these 2-3 component systems.

• A comparison is made to a recent experimental result with three components, showing qualitative agreement with an observed lack of buffering, which was unexpected at the time due to the behavior observed for homotypic systems. Here this result is now rationalized via the Ksp, which does plateau despite the monomer concentrations changing.

• Spatial simulations probe the role of structure and flexibility in impacting phase separation, finding general agreement with previously published experimental and modeling work. These observations about flexibility and matched valency are also relatively intuitive.

Weaknesses

• There is no quantitative comparison between the two simulation approaches (spatial and non-spatial), which should be straightforward. By using the same composition and KD in both types of simulations and directly comparing outcomes, it would help explain when and why the spatial simulations differ from the non-spatial ones; see subsequent comments below:

• A related methodological point: On Line 97 it states that NFSim does not allow intramolecular bonds to form, but this is not true. On one hand, they can be written out explicitly. E.g. A(a1!1, a2).B(b1!1, b2)->A(a1!1, a2!2).B(b1!1, b2!2), would form a second bond between an AB complex that already had one bond. While quite tedious, these could be enumerated, allowing for the zippering effect they see spatially, although the rates would not be bimolecular. This would still leave out intra-complex bonds between proteins without a direct link. However, based on the NFsim website, by default it does in fact allow these types of intra-complex bonds to be formed (http://michaelsneddon.net/nfsim/pages/support/support.html) see "Reactant Connectivity Enforcement". So it is not clear to me which option was used in this paper. According to what is written in the methods, no intra-complex bonds are formed, but this is not the default in NFsim and is indeed allowable.

• The spatial simulations do not show the bimodal distribution under the fixed concentrations (Fig S9). This is a significant difference from the non-spatial result. They attribute this to a 'dimer trap', but given they see the dense phase in the clamped monomer simulations, this cannot be the only explanation. What about kinetic effects, due to the differences in initial concentrations of monomers in the two simulation approaches? The rate constants are not listed anywhere. They only seem to see large clusters at fixed concentrations for the mismatched sizes (Fig S12B), where the Ksp behavior does not hold. Can they increase monomer flexibility more and start to see bimodal at fixed concentration, or change the rates and see a bimodal distribution?

• Related: I am surprised that the sterically hindered monomers would not form large clusters at fixed concentration, as it looks like it is impossible for them to 'zipper' up their binding sites and become trapped in dimers. Is the distribution at fixed concentrations bimodal? The data is not shown.

4. Reviewer #1 (Public Review):

The gist of this work is that the simple concept of a solubility product determines a threshold for phase separation, thereby enabling buffering even in systems where phase separation is driven by heterotypic interactions. The solubility product or SP is determined by the number of complementary interaction sites and the coordination number i.e., the number of bonds one can make per site.

The work appears to be motivated by two questions: Are concentrations buffered in systems where heterotypic interactions drive phase separation thereby negating the presence of a rigorously definable saturation concentration? This question was motivated by work from Klosin et al., showing how phase separation can enable buffering of noise in transcription. They relied on the concept of a saturation concentration. In a paper that followed a few months after, Riback et al., showed that the concept of a saturation concentration ceases to exist, as defined for systems where phase separation is driven purely by homotypic interactions. This was taken to imply that the formation of multicomponent condensates via a blend of homotypic and heterotypic interactions causes a loss of buffering capacity afforded by phase separation. The second question motivating the current work is the apparent absence of a theoretical framework for "varying threshold concentrations" in systems governed by heterotypic interactions.

Using two flavors of simulations, the authors propose that the SP sets an upper limit on the convolution of concentrations that determine phase separation. They show this via simulations where they follow the formation of clusters formed by linear multivalent macromolecules and monitor the emergence of a bimodal distribution of clusters. In 1:1 mixtures of multivalent macromolecules they find that SP sets a threshold beyond which a bimodal distribution of clusters emerges. The authors further find that SP sets an upper limit even in systems that deviate from the 1:1 stoichiometry.

The authors proceed to show that the SP is influenced by the valence of multivalent macromolecules. They also demonstrate that short rigid linkers can cause an arrest of phase separation through a so-called "dimer trap" reminiscent of the "magic number" postulate put forth by Wingreen and colleagues.

Is the work significant, novel, and timely? Effectively the authors propose that the driving forces for phase separation can be distilled down to the concept of a solubility product. Given prior knowledge of the valence, coordination number, and affinities can one predict concentration thresholds for phase separation? The authors suggest that this can be gleaned from either network based simulations, which are very inexpensive, or through more elaborate simulations. They further propose that it is the solubility product that sets the threshold.

It is worth noting that the authors are quantifying what is known in the physical literature as a percolation threshold. The seminal work of Flory and Stockmayer dating back to the 1940s showed how one can calculate a percolation threshold by taking in prior knowledge of valence, coordination numbers, and affinities whilst ignoring cooperativity. These ideas have been refined and advanced in several theoretical contributions by various labs. While none of the papers in the physical literature use the concept of a solubility product, they rely on the concept of a percolation threshold because the transition to large, system-spanning clusters is a continuous one and it is debatable if this is a bona fide phase transition. Rather it is a topological transition.

As for novelty, unfortunately the authors disregard prior work that showed how linker length impacts local vs. global cooperativity in phase transitions that combine phase separation and percolation. Ref. 23 is the work in question and it is mentioned in passing, even though the contributions here are entirely a redux.

The concept of a solubility product, introduced here to model / understand phase behavior of multivalent macromolecules, is an interesting and potentially appealing simple description. It might make the understanding of phase transitions more accessible, but it has problems: (a) it does not define phase separation; rather it defines percolation transitions; (b) without prior knowledge of the relevant quantities, the solubility product cannot be readily inferred, even from simulations, although one can scan parameter space to arrive at predictions regarding the apparent valence and coordination numbers. (c) the solubility product does not tell us much about properties of condensates, interfaces, or the driving forces for phase transitions that are influenced by the collective effects of interaction domains / motifs and spacers.

Finally, as for the absence of a theoretical explanation for the apparent loss of buffering in systems with heterotypic interactions, the authors would do well to see the work of Choi et al., published in PLoS Comput. Biol. in 2019. Figure 12 in that work clearly establishes that the concentrations of A and B species in the coexisting dilute phase are set by the slopes of tie lines - the lines of constant chemical potential. These slopes are set by the relative strengths of homotypic vs. heterotypic interactions, and to zeroth order, that is the physical explanation.

Overall, the two interesting observations are that the percolation threshold can be cast as a solubility product and that this product sets an upper limit on joint concentration thresholds for phase separation, even in systems with heterotypic interactions, thereby rescuing the concept of buffering.

5. Evaluation Summary:

Recent experiments have raised questions regarding concentration buffering provided by the formation of multicomponent biomolecular condensates via phase separation driven by heterotypic interactions. In this work, Chattaraj et al. demonstrate that the concept of a solubility product, used to describe the solubility limits of ionic solutions, sets an upper limit on concentration thresholds, even in systems where the driving forces for phase separation are primarily heterotypic in nature. Their work suggests that the concept of a solubility product rescues the concept of buffering via phase separation.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

#### URL

20. www.biorxiv.org www.biorxiv.org
1. Author Response:

Reviewer #2 (Public Review):

The manuscript by Li et al describes the development of styrylpyridines as cell permeant fluorescent sensors of SARM1 activity. This work is significant because SARM1 activity is increased during neuron damage and SARM1 knockout mice are protected from neuronal degeneration caused by a variety of physical and chemical insults. Thus, SARM1 is a key player in neuronal degeneration and a novel therapeutic target. SARM1 is an NAD+ hydrolase that cleaves NAD+ to form nicotinamide and ADP ribose (and to a small extent cyclic ADP ribose) via a reactive oxocarbenium intermediate. Notably, this intermediate can either react with water (hydrolysis), the adenosine ring (cyclization to cADPR), or with a pyridine containing molecule in a 'base-exchange reaction'. The styrylpyridines described by Li et al exploit this base-exchange reaction; the styrylpyridines react with the intermediate to form a fluorescent product. Notably, the best probe (PC6) can be used to monitor SARM1 activity in vitro and in cells. Upon validating the utility of PC6, the authors use this compound to perform a high throughput screen of the Approved Drug Library (L1000) from TargetMol and identify nisoldipine as a hit. Further studies revealed that a minor metabolite, dehydronitrosonisoldipine (dHNN), is the true inhibitor, acting with single digit micromolar potency. The authors provide structural and proteomic data suggesting that dHNN inhibits SARM1 activity via the covalent modification of C311 which stabilizes the enzyme in the autoinhibited state.

We thank Reviewer #2 for the positive comments and suggestions!

Key strengths of the manuscript include the probe design and the authors' demonstration that the probes can be used to monitor SARM1 activity in vitro in an HTS format and in cells. The identification of C311 as a potential reactive cysteine that could be targeted for drug development is an important and significant insight.

Key weaknesses include the fact that dHNN is a highly reactive molecule and the authors note that it modifies multiple sites on the protein (they mentioned 8 but MS2 spectra for only 5 are provided). As such, the compound appears to be a non-specific alkylator that will have limited utility as a SARM1 inhibitor. Additionally, no information is provided on the proteome-wide selectivity of the compound.

Although dHNN may react with cysteines in general, our results indicate that it specifically targets Cys311. Quantification of cysteine-containing peptides of other proteins showed no dHNN modification. We therefore conclude that dHNN shows significant specificity for Cys311 of SARM1. Some other SH-reactive agents we tested showed little inhibition of SARM1. The evidence for Cys311 being dominant includes quantification of the intensity of the modified peptides, normalized to that of the corresponding total peptides (with or without modification), which shows that the modification is mainly on Cys311 (Figure 5—figure supplement 1). The dominant role of Cys311 is also confirmed by our mutagenesis and structural studies. Our results strongly suggest that C311 is a druggable site for designing allosteric inhibitors against SARM1 activation.

dHNN is effective in inhibiting SARM1 activation and AxD in the low micromolar range, making it a useful inhibitor. Considering that the neuroprotective effect of NSDP, an approved drug, may well be due to dHNN, labeling it as an inhibitor of SARM1 serves to draw more attention to it.

We have revised the Discussion accordingly.

An additional key weakness is the lack of any mechanistic insights into how the adducts are generated. Moreover, it is not clear how the proposed sulphonamide and thiohydroxylamine adducts are formed.

From the images presented, it is unclear whether there is sufficient 'density' in the cryoEM maps to accurately predict the sites of modification.

Please refer to Fig. 5F, which shows a close-up view of dHNN in the ARM domain. dHNN (purple) is linked to residue C311 and forms hydrophobic interactions with the surrounding residues E264, L268, R307, F308, and A315. The extra electron density near residue C311 fits the shape of dHNN and is shown as grey mesh.

Finally, the authors do not show whether the conversion of PC6 to PAD6 is stable or if PAD6 can also be hydrolyzed to form ADPR.

PAD6 is stable and cannot be hydrolyzed by activated SARM1, as shown in the following figure. The reactions contained 10 μM PAD6, 100 μM NMN, and either 2.65 μg/mL SARM1 or a blank as a control. The PAD6 fluorescence was monitored for one hour and did not change in either group.

2. Reviewer #2 (Public Review):

The manuscript by Li et al describes the development of styrylpyridines as cell permeant fluorescent sensors of SARM1 activity. This work is significant because SARM1 activity is increased during neuron damage and SARM1 knockout mice are protected from neuronal degeneration caused by a variety of physical and chemical insults. Thus, SARM1 is a key player in neuronal degeneration and a novel therapeutic target. SARM1 is an NAD+ hydrolase that cleaves NAD+ to form nicotinamide and ADP ribose (and to a small extent cyclic ADP ribose) via a reactive oxocarbenium intermediate. Notably, this intermediate can either react with water (hydrolysis), the adenosine ring (cyclization to cADPR), or with a pyridine containing molecule in a 'base-exchange reaction'. The styrylpyridines described by Li et al exploit this base-exchange reaction; the styrylpyridines react with the intermediate to form a fluorescent product. Notably, the best probe (PC6) can be used to monitor SARM1 activity in vitro and in cells. Upon validating the utility of PC6, the authors use this compound to perform a high throughput screen of the Approved Drug Library (L1000) from TargetMol and identify nisoldipine as a hit. Further studies revealed that a minor metabolite, dehydronitrosonisoldipine (dHNN), is the true inhibitor, acting with single digit micromolar potency. The authors provide structural and proteomic data suggesting that dHNN inhibits SARM1 activity via the covalent modification of C311 which stabilizes the enzyme in the autoinhibited state.

Key strengths of the manuscript include the probe design and the authors' demonstration that the probes can be used to monitor SARM1 activity in vitro in an HTS format and in cells. The identification of C311 as a potential reactive cysteine that could be targeted for drug development is an important and significant insight.

Key weaknesses include the fact that dHNN is a highly reactive molecule and the authors note that it modifies multiple sites on the protein (they mentioned 8 but MS2 spectra for only 5 are provided). As such, the compound appears to be a non-specific alkylator that will have limited utility as a SARM1 inhibitor. Additionally, no information is provided on the proteome-wide selectivity of the compound. An additional key weakness is the lack of any mechanistic insights into how the adducts are generated. Moreover, it is not clear how the proposed sulphonamide and thiohydroxylamine adducts are formed. From the images presented, it is unclear whether there is sufficient 'density' in the cryoEM maps to accurately predict the sites of modification. Finally, the authors do not show whether the conversion of PC6 to PAD6 is stable or if PAD6 can also be hydrolyzed to form ADPR.

3. Reviewer #1 (Public Review):

The authors aimed to develop cell-permeable small molecule probes that can monitor the activity of SARM1, an enzyme that hydrolyzes NAD+ and is thought to be important for axon degeneration. They successfully achieved this goal using the base exchange activity of SARM1 to make a donor-π-acceptor type of fluorophore. The best probe described in the manuscript is PC6. A number of experiments were carried out to rigorously test that the probe works as expected. PC6 has a number of nice features. It is cell permeable, gives a much stronger signal than any other known SARM1 probe, is specific for SARM1 and does not detect the activity of CD38 (another enzyme that has similar activity), and allows detection of endogenous SARM1 activation in neurons.

Using the PC6 probe, the authors were able to monitor SARM1 activity in neurons treated with vincristine and demonstrated that SARM1 activation precedes axon degeneration and is important but not sufficient for axon degeneration. Most importantly, using this probe to monitor SARM1 activity, they screened a library of about 2000 drug molecules and discovered that a hypertension drug, nisoldipine, could inhibit SARM1. Surprisingly, further studies showed that a derivative of nisoldipine, dehydronitrosonisoldipine (dHNN, present in the nisoldipine compound used), is actually the inhibitor of SARM1. They then carried out nice mechanistic studies (including mass spectrometry and cryo-EM structures) showing that dHNN inhibits SARM1 by covalently modifying the Cys311 residue in the ARM domain. The dHNN binding site is similar to the previously established NAD+ inhibitory site.

Overall, the probe is novel with many useful features, the study is rigorous and rather complete, and the conclusion is well supported. I believe the study will be important for the field and will be well received by the field.

The only minor thing is that the writing can be further improved, especially in the introduction section.

4. Evaluation Summary:

SARM1, an enzyme that can convert NAD+ to ADP-ribose or cyclic ADP-ribose, is implicated in axon degeneration. This manuscript describes the development of small molecule probes that can detect the activity of SARM1 in live cells. In the course of the work, a small molecule derived from a hypertension drug was discovered as an effective SARM1 inhibitor. Although the activity probes are novel, the mechanism of SARM1 inactivation by dHNN has not been established. The probe and the inhibitor described in the manuscript could lead to future therapeutic development targeting SARM1 to treat axon degeneration.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

#### URL

21. www.biorxiv.org www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

The work by Wang et al. examined how task-irrelevant, high-order rhythmic context could rescue the attentional blink effect via reorganizing items into different temporal chunks, as well as the neural correlates. In a series of behavioral experiments with several controls, they demonstrated that the detection performance of T2 was higher when occurring in different chunks from T1, compared to when T1 and T2 were in the same chunk. In EEG recordings, they further revealed that the chunk-related entrainment was significantly correlated with the behavioral effect, and the alpha-band power for T2 and its coupling to the low-frequency oscillation were also related to behavioral effect. They propose that the rhythmic context implements a second-order temporal structure to the first-order regularities posited in dynamic attention theory.

Overall, I find the results interesting and convincing, particularly the behavioral part. The manuscript is clearly written and the methods are sound. My major concerns are about the neural part, i.e., whether the work provides new scientific insights to our understanding of dynamic attention and its neural underpinnings.

1) A general concern is whether the observed behavioral related neural index, e.g., alpha-band power, cross-frequency coupling, could be simply explained in terms of ERP response for T2. For example, when the ERP response for T2 is larger for between-chunk condition compared to within-chunk condition, the alpha-power for T2 would be also larger for between-chunk condition. Likewise, this might also explain the cross-frequency coupling results. The authors should do more control analyses to address the possibility, e.g., plotting the ERP response for the two conditions and regressing them out from the oscillatory index.

Many thanks for the comment. In short, the enhancement in alpha power and cross-frequency coupling results in the between-cycle condition compared with those in the within-cycle condition cannot be accounted for by the ERP responses for T2.

In general, the rhythmic stimulation in the AB paradigm prevents EEG signals from returning to the baseline. Therefore, we cannot observe typical ERP components purely related to individual items, except for the P1 and N1 components related to the stream onset, which reveal no difference between the two conditions and are trailed by steady-state responses (SSRs) resonating at the stimulus rate (Fig. R1).

Fig. R1. ERPs aligned to stream onset. EEG signals were filtered between 1–30 Hz, baseline-corrected (-200 to 0 ms before stream onset) and averaged across the electrodes in left parieto-occipital area where 10-Hz alpha power showed attentional modulation effect.

To further inspect the potential differences in the target-related ERP signals between the within- and between-cycle conditions, we plotted the target-aligned waveforms for these experimental conditions. As shown in Fig. R2, a drop of ERP amplitude occurred for both conditions around T2 onset, and the difference between these two conditions was not significant (paired t-test estimated on mean amplitude every 20 ms from 0 to 700 ms relative to T1 onset, p > .05, FDR-corrected).

Fig. R2. ERPs aligned to T1 onset. EEG signals were filtered between 1–30 Hz, and baseline-corrected using signals -100 to 0 ms before T1 onset. The two dashed lines indicate the onset of T1 and T2, respectively.
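The window-wise comparison described for Fig. R2 (one paired test per 20-ms window from 0 to 700 ms, followed by FDR correction) can be sketched as follows. This is an illustration under stated assumptions: the response does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure is assumed here, and the helper name `benjamini_hochberg` is hypothetical. The per-window p-values would come from a paired t-test on the two conditions, which is not reproduced in this sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumed FDR procedure; the source only states "FDR-corrected".
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Consecutive 20-ms windows covering 0 to 700 ms relative to T1 onset.
windows = [(start, start + 20) for start in range(0, 700, 20)]
print(len(windows))  # one paired test per window

# Toy p-values (not from the study) to show the adjustment.
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With 35 windows, an uncorrected threshold of .05 would be expected to produce spurious hits by chance, which is why the correction matters for the "not significant" conclusion.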

Since there is a trend of enhanced ERP response for the between-cycle relative to the within-cycle condition during the period of 0 to 100 ms after T2 onset (paired t-test on mean amplitude, p = .065, uncorrected), we then directly examined whether such post-T2 responses contribute to the behavioral attentional modulation effect and behavior-related neural indices. Crucially, we did not find any significant correlation of such T2-related ERP enhancement with the behavioral modulation index (BMI), or with the reported effects of alpha power and cross-frequency coupling (PAC). Furthermore, after controlling for the T2-related ERP responses, there still remains a significant correlation between the delta-alpha PAC and the BMI (partial r = .596, p = .019), which is not surprising given that the PAC is calculated based on an 800-ms time window covering more pre-T2 than post-T2 periods (see the response to point #4 for details) rather than around the T2 onset. Taken together, these results clearly suggest that the T2-related ERP responses cannot explain the attentional modulation effect and the observed behavior-related neural indices.
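The control analysis above (correlating PAC with BMI after controlling for the T2-related ERP response) corresponds to a first-order partial correlation. The sketch below uses the standard formula for partialling out a single covariate; the function names and toy numbers are assumptions for illustration and are not the authors' code or data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy per-subject values (hypothetical): x plays the role of PAC, y of BMI,
# and z of the T2-related ERP difference.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
z = [1.0, -1.0, -1.0, 1.0]
r_partial = partial_correlation(x, y, z)
```

In this toy case the covariate is uncorrelated with both variables, so the partial correlation equals the plain correlation; a covariate that did drive both would shrink it toward zero.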

2) The alpha-band increase for T2 is indeed contradictory to the well known inhibitory function of alpha-band in attention. How could a target that is better discriminated elicit stronger inhibitory response? Related to the above point, the observed enhancement in alpha-band power and its coupling to low-frequency oscillation might derive from an enhanced ERP response for T2 target.

Many thanks for the comment. We have briefly discussed this point in the revised manuscript (page 18, line 477).

A widely accepted function of alpha activity in attention is that alpha oscillations suppress irrelevant visual information during spatial selection (Kelly et al., 2006; Thut et al., 2006; Worden et al., 2000). However, the picture becomes controversial when rhythmic sensory stimulation is delivered in the alpha band, as in the current study, where both the visual stream and the contextual auditory rhythm were presented at 10 Hz. In such a case, alpha-band neural responses at the stimulation frequency can be interpreted as either passively evoked steady-state responses (SSR) or actively synchronized intrinsic brain rhythms. From the former perspective (i.e., the SSR view), an increase in the amplitude or power at the stimulus frequency may indicate an enhanced attentional allocation to the stimulus stream that may result in better target detection (Janson et al., 2014; Keil et al., 2006; Müller & Hübner, 2002). Conversely, the latter view of the inhibitory function of intrinsic alpha oscillations would produce the opposite prediction. In a previous AB study, Janson and colleagues (2014) investigated this issue by separating the stimulus-evoked activity at 12 Hz (using the same power analysis method as ours) from the endogenous alpha oscillations ranging from 10.35 to 11.25 Hz (as indexed by individual alpha frequency, IAF). Interestingly, they found a dissociation between these two alpha-band neural responses, showing that the RSVP frequency power was higher in non-AB trials (T2 detected) than in AB trials (T2 undetected) while the IAF power exhibited the opposite pattern. According to these findings, the currently observed increase in alpha power for the between-cycle condition may reflect more of the stimulus-driven processes related to attentional enhancement. However, we do not rule out a contribution of intrinsic alpha oscillations in our study, as the current design is not sufficient to distinguish between these two processes.
We also have to admit that “alpha power” may not be the most precise term to describe our stimulus-related findings. Thus, we have specified it as “neural responses to first-order rhythms at 10 Hz” and “10-Hz alpha power” in the revised manuscript (see page 12 in the Results section and page 18 in the Discussion section).

As for the contribution of T2-related ERP response to the observed effect of 10 Hz power and cross-frequency coupling, please refer to our response to point #1.

References:

Janson, J., De Vos, M., Thorne, J. D., & Kranczioch, C. (2014). Endogenous and Rapid Serial Visual Presentation-induced Alpha Band Oscillations in the Attentional Blink. Journal of Cognitive Neuroscience, 26(7), 1454–1468. https://doi.org/10.1162/jocn_a_00551

Keil, A., Ihssen, N., & Heim, S. (2006). Early cortical facilitation for emotionally arousing targets during the attentional blink. BMC Biology, 4(1), 23. https://doi.org/10.1186/1741-7007-4-23

Kelly, S. P., Lalor, E. C., Reilly, R. B., & Foxe, J. J. (2006). Increases in Alpha Oscillatory Power Reflect an Active Retinotopic Mechanism for Distracter Suppression During Sustained Visuospatial Attention. Journal of Neurophysiology, 95(6), 3844–3851. https://doi.org/10.1152/jn.01234.2005

Müller, M. M., & Hübner, R. (2002). Can the Spotlight of Attention Be Shaped Like a Doughnut? Evidence From Steady-State Visual Evoked Potentials. Psychological Science, 13(2), 119–124. https://doi.org/10.1111/1467-9280.00422

Thut, G., Nietzel, A., Brandt, S., & Pascual-Leone, A. (2006). Alpha-band electroencephalographic activity over occipital cortex indexes visuospatial attention bias and predicts visual target detection. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 26(37), 9494–9502. https://doi.org/10.1523/JNEUROSCI.0875-06.2006

Worden, M. S., Foxe, J. J., Wang, N., & Simpson, G. V. (2000). Anticipatory Biasing of Visuospatial Attention Indexed by Retinotopically Specific α-Band Electroencephalography Increases over Occipital Cortex. Journal of Neuroscience, 20(6), RC63–RC63. https://doi.org/10.1523/JNEUROSCI.20-06-j0002.2000

3) To support that it is the context-induced entrainment that leads to the modulation in AB effect, the authors could examine pre-T2 response, e.g., alpha-power, and cross-frequency coupling, as well as its relationship to behavioral performance. I think the pre-stimulus response might be more convincing to support the authors' claim.

Many thanks for the insightful suggestion. We have conducted additional analyses.

Following this suggestion, we have examined the 10-Hz alpha power within the time window of -100 to 0 ms relative to T2 onset and found stronger activity for the between-cycle condition than for the within-cycle condition. This pre-T2 response is similar to the post-T2 response except that it is more restricted to the left parieto-occipital cluster (CP3, CP5, P3, P5, PO3, PO5, POZ, O1, OZ; t(15) = 2.774, p = .007), which partially overlaps with the cluster that exhibits a delta-alpha coupling effect significantly correlated with the BMI. We have incorporated these findings into the main text (page 12, line 315) and Fig. 5A of the revised manuscript.

As for the coupling results reported in our manuscript, the coupling index (PAC) was calculated based on the activity during the second and third cycles (i.e., 400 to 1200 ms from stream onset) of the contextual rhythm, most of which covers the pre-T2 period as T2 always appeared in the third cycle for both conditions. Together, these results on pre-T2 10-Hz alpha power and cross-frequency coupling, as well as its relationship to behavioral performance, jointly suggest that the observed modulation effect is caused by the context-induced entrainment rather than being a by-product of post-T2 processing.

4) About the entrainment to rhythmic context and its relation to behavioral modulation index. Previous studies (e.g., Ding et al) have demonstrated the hierarchical temporal structure in speech signals, e.g., emergence of word-level entrainment introduced by language experience. Therefore, it is well expected that imposing a second-order structure on a visual stream would elicit the corresponding steady-state response. I understand that the new part and main focus here are the AB effects. The authors should add more texts explaining how their findings contribute new understandings to the neural mechanism for the intriguing phenomena.

Many thanks for the suggestion. We have provided more discussion in the revised manuscript (page 17, line 447).

In brief, our study demonstrates how cortical tracking of feature-based hierarchical structure reframes the deployment of attentional resources over visual streams. This effect, distinct from the hierarchical entrainment to speech signals (Ding et al., 2016; Gross et al., 2013), does not rely on previously acquired knowledge about the structured information and can be established automatically even when the higher-order structure comes from a task-irrelevant and cross-modal contextual rhythm. In addition, our finding sheds fresh light on the adaptive value of the structure-based entrainment effect by expanding its role from the perception of rhythmic information (e.g., speech) to the deployment of temporal attention. To our knowledge, few studies have tackled this issue in visual or speech processing.

References:

Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164. https://doi.org/10.1038/nn.4186

Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., & Garrod, S. (2013). Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain. PLoS Biol, 11(12). https://doi.org/10.1371/journal.pbio.1001752

Reviewer #2 (Public Review):

In cognitive neuroscience, a large number of studies proposed that neural entrainment, i.e., synchronization of neural activity and low-frequency external rhythms, is a key mechanism for temporal attention. In psychology and especially in vision, attentional blink is the most established paradigm to study temporal attention. Nevertheless, as far as I know, few studies try to link neural entrainment in the cognitive neuroscience literature with attentional blink in the psychology literature. The current study, however, bridges this gap.

The study provides new evidence for the dynamic attending theory using the attentional blink paradigm. Furthermore, it is shown that neural entrainment to the sensory rhythm, measured by EEG, is related to the attentional blink effect. The authors also show that event/chunk boundaries are not enough to modulate the attentional blink effect, and suggest that strict rhythmicity is required to modulate attention in time.

In general, I enjoyed reading the manuscript and only have a few relatively minor concerns.

1) First, each epoch is from -600 ms before the stimulus onset to 1600 ms after the stimulus onset. Therefore, the epoch is 2200 ms in duration. However, zero-padding is needed to make the epoch duration 2000 ms (for 0.5-Hz resolution). This is confusing. Furthermore, for a more conservative analysis, I recommend also analyzing the response between 400 ms and 1600 ms, to avoid the onset response, and showing the results in a supplementary figure. The short duration reduces the frequency resolution but still allows seeing a 2.5-Hz response.

Thanks for the comments. Each epoch was indeed segmented from -600 to 1600 ms relative to the stimulus onset, but in the spectrum analysis, we only used EEG signals from stream onset (i.e., time point 0) to 1600 ms (see the Materials and Methods section) to investigate the oscillatory characteristics of the neural responses purely elicited by rhythmic stimuli. The 1.6-s signals were zero-padded into a 2-s duration to achieve a frequency resolution of 0.5 Hz.
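The zero-padding step described above can be sketched as follows. The sampling rate is not given in this response, so `fs = 500` is an assumption, and the synthetic signal is purely illustrative. The sketch shows that padding a 1.6-s segment to 2 s places FFT bins every 0.5 Hz, so that 2.5 Hz and 10 Hz fall exactly on a bin.

```python
import numpy as np

def padded_power_spectrum(segment, fs, padded_duration):
    """Demean a segment, zero-pad it to `padded_duration` seconds, return (freqs, power)."""
    segment = np.asarray(segment, dtype=float)
    segment = segment - segment.mean()        # remove the mean over the stream
    n_fft = int(round(padded_duration * fs))  # number of samples after zero-padding
    spectrum = np.fft.rfft(segment, n=n_fft)  # rfft zero-pads to n_fft samples
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, np.abs(spectrum) ** 2

fs = 500                                      # assumed sampling rate (Hz)
t = np.arange(int(round(1.6 * fs))) / fs      # 0 to 1600 ms from stream onset
# Synthetic example: a 2.5-Hz and a 10-Hz component (not real EEG).
segment = np.sin(2 * np.pi * 2.5 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)

freqs, power = padded_power_spectrum(segment, fs, padded_duration=2.0)
bin_spacing = freqs[1] - freqs[0]             # spacing between frequency bins
```

Note that zero-padding only sets the bin spacing of the spectrum; the ability to separate nearby frequencies is still limited by the 1.6 s of actual data.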

According to the reviewer’s suggestion, we analyzed the EEG signals from 400 ms to 1600 ms relative to stream onset to avoid potential influence of the onset response, and showed the results in Figure 4. Basically, we can still observe spectral peaks at the stimulus frequencies of 2.5, 5 (the harmonic of 2.5 Hz), and 10 Hz for both power and ITPC spectrum. However, the peak magnitudes were much weaker than those of 1.6-s signals especially for 2.5 Hz, and the 2.5-Hz power did not survive the multiple comparisons correction across frequencies (FDR threshold of p < .05), which might be due to the relatively low signal-to-noise ratio for the analysis based on the 1.2-s epochs (only three cycles to estimate the activity at 2.5 Hz). Importantly, we did identify a significant cluster for 2.5 Hz ITPC in the left parieto-occipital region showing a positive correlation with the individuals’ BMI (Fig. R3; CP5, TP7, P5, P7, PO5, PO7, O1; r = .538, p = .016), which is consistent with the findings based on the longer epochs.

Fig. R3. Neural entrainment to contextual rhythms during the period of 400–1600 ms from stream onset. (A) The spectrum for inter-trial phase coherence (ITPC) of EEG signals from 400 to 1600 ms after the stimulus onset. Shaded areas indicate standard errors of the mean. (B) The 2.5-Hz ITPC was significantly correlated with the behavioral modulation index (BMI) in a parieto-occipital cluster, as indicated by orange stars in the scalp topographic map.

Second, "The preprocessed EEG signals were first corrected by subtracting the average activity of the entire stream for each epoch, and then averaged across trials for each condition, each participant, and each electrode." I have several concerns about this procedure.

(A) What is the entire stream? It's the average over time?

Yes. For the power spectrum analysis, EEG signals were first demeaned by subtracting the average signal of the entire stream over time, from onset to offset (i.e., from 0 to 1600 ms), before further analysis. We performed this procedure following previous studies on the entrainment to visual rhythms (Spaak et al., 2014). We have clarified this point in the “Power analysis” part of the Materials and Methods section (page 25, line 677).

References:

Spaak, E., Lange, F. P. de, & Jensen, O. (2014). Local Entrainment of Alpha Oscillations by Visual Stimuli Causes Cyclic Modulation of Perception. The Journal of Neuroscience, 34(10), 3536–3544. https://doi.org/10.1523/JNEUROSCI.4385-13.2014

(B) I suggest to do the Fourier transform first and average the spectrum over participants and electrodes. Averaging the EEG waveforms require the assumption that all electrodes/participants have the same response phase, which is not necessarily true.

Thanks for the suggestion. In an AB paradigm, the evoked neural responses are sufficiently time-locked to the periodic stimulation, so it is reasonable to quantify power estimate with spectral decomposition performed on trial-averaged EEG signals (i.e., evoked power). Moreover, our results of inter-trial phase coherence (ITPC), which estimated the phase-locking value across trials based on single-trial decomposed phase values, also provided supporting evidence that the EEG waveforms were temporally locked across trials to the 2.5-Hz temporal structure in the context session.

Nevertheless, we also took the reviewer’s suggestion seriously and analyzed the power spectrum on the average of single-trial spectral transforms, i.e., the induced power, which puts emphasis on the intrinsic non-phase-locked activities. In line with the results of evoked power and ITPC, the induced power spectrum in the context session also peaked at 2.5 Hz and was significantly stronger than that in the baseline session at 2.5 Hz (t(15) = 4.186, p < .001, FDR-corrected with a p value threshold < .001). Importantly, Pearson correlation analysis also revealed a positive cluster in the left parieto-occipital region, indicating that the induced power at 2.5 Hz was also strongly related to the attentional modulation effect (P7, PO7, PO5, PO3; r = .606, p = .006). We have added these additional findings to the revised manuscript (page 11, line 288; see also Figure 4—figure supplement 1).
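The three measures contrasted in this exchange can be sketched as below: power of the trial-averaged signal (evoked power), the average of single-trial power spectra (the quantity this response refers to as induced power), and ITPC from single-trial phases. This is a simplified illustration and not the authors' pipeline; the sampling rate, trial count, and simulated data are assumptions.

```python
import numpy as np

def evoked_power(trials, n_fft):
    """Power spectrum of the trial-averaged signal (keeps phase-locked activity)."""
    return np.abs(np.fft.rfft(trials.mean(axis=0), n=n_fft)) ** 2

def single_trial_power(trials, n_fft):
    """Average of single-trial power spectra (insensitive to phase across trials)."""
    return (np.abs(np.fft.rfft(trials, n=n_fft, axis=-1)) ** 2).mean(axis=0)

def itpc(trials, n_fft):
    """Inter-trial phase coherence: length of the mean unit phase vector per frequency."""
    phase = np.angle(np.fft.rfft(trials, n=n_fft, axis=-1))
    return np.abs(np.exp(1j * phase).mean(axis=0))

# Simulated trials (illustration only): a phase-locked 2.5-Hz component plus a
# 10-Hz component whose phase is random on every trial, plus noise.
rng = np.random.default_rng(1)
fs, n_samples, n_trials, n_fft = 500, 800, 40, 1000   # assumed values
t = np.arange(n_samples) / fs
locked = np.sin(2 * np.pi * 2.5 * t)
random_phase = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
jittered = np.sin(2 * np.pi * 10 * t + random_phase)
trials = locked + jittered + 0.5 * rng.standard_normal((n_trials, n_samples))

freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
i_slow = int(np.argmin(np.abs(freqs - 2.5)))
i_fast = int(np.argmin(np.abs(freqs - 10.0)))
coherence = itpc(trials, n_fft)
ev, st = evoked_power(trials, n_fft), single_trial_power(trials, n_fft)
```

In this simulation the phase-locked 2.5-Hz component survives trial averaging and yields high ITPC, whereas the random-phase 10-Hz component largely cancels in the evoked spectrum but remains in the single-trial average.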

2) The sequences are short, only containing 16 items and 4 cycles. Furthermore, the targets are presented in the 2nd or 3rd cycle. I suspect that a stronger effect may be observed if the sequences are longer, since attention may not entrain well to the external stimulus until a few cycles have passed. In the first trial of the experiment, the participant may not have a chance to realize that the task-irrelevant auditory/visual stimulus has a cyclic nature, and it is not likely that their attention will entrain to such cycles. As the experiment proceeds, they learn that the stimulus is cyclic and may allocate their attention rhythmically. Therefore, I feel that the participants do not just rely on the rhythmic information within a trial but also rely on the stimulus history. Please discuss why short sequences are used and whether it is possible to see buildup of the effect over trials or over cycles within a trial.

Thanks for the comments. Typically, to induce a classic pattern of AB effect, the RSVP stream should contain 3–7 distractors before the first target (T1), a varying number of distractors (0–7) between the two targets, and at least 2 items after the second target (T2). In our study, we created the RSVP streams following these rules, which allowed us to observe the typical AB effect, with T2 performance deteriorating at Lag 2 relative to that at Lag 8. Nevertheless, we agree with the reviewer that longer streams would be better for building up the attentional entrainment effect, as we did observe that the attentional modulation effect ramped up as the stream proceeded over cycles, consistent with the reviewer’s speculation. In Experiments 1a (using auditory context) and 2a (using color-defined visual context), we adopted two sets of target positions: an early one where T2 appeared at the 6th or 8th position (in the 2nd cycle) of the visual stream, and a late one where T2 appeared at the 10th or 12th position (in the 3rd cycle) of the visual stream. In the manuscript, we reported T2 performance with all the target positions combined, as no significant interaction was found between the target positions and the experimental conditions (ps > .1). However, additional analysis demonstrated a trend toward an increase of the attentional modulation effect over cycles, from the early to the late positions. As shown in Fig. R4, the modulation effect grew stronger and reached significance for the late positions (for Experiment 1a, t(15) = 2.83, p = .013, Cohen’s d = 0.707; for Experiment 2a, t(15) = 3.656, p = .002, Cohen’s d = 0.914) but showed a weaker trend for the early positions (for Experiment 1a, t(15) = 1.049, p = .311, Cohen’s d = 0.262; for Experiment 2a, t(15) = .606, p = .553, Cohen’s d = 0.152).

Fig. R4. Attentional modulation effect built up over cycles in Experiments 1a & 2a. Error bars represent 1 SEM; * p < .05, ** p < .01.

However, we did not observe an obvious buildup effect across trials in our study. The modulation effect of contextual rhythms seems to emerge quickly: it is already evident in the first quarter of trials in Experiment 1a (t(15) = 2.703, p = .016, Cohen’s d = 0.676) and in the second quarter of trials in Experiment 2a (t(15) = 2.478, p = .026, Cohen’s d = 0.620).

3) The term "cycle" is used without definition in Results. Please define and mention that it's an abstract term and does not require the stimulus to have "cycles".

Thanks for the suggestion. By definition, the term “cycle” refers to “an interval of time during which a sequence of a recurring succession of events or phenomena is completed” or “a course or series of events or operations that recur regularly and usually lead back to the starting point” (Merriam-Webster dictionary). In the current study, we kept to the recurrent and regular nature of “cycle” in general, while defining the specific meaning of “cycle” by the feature-based periodic changes of the contextual stimuli in each experiment (page 5, line 101; also refer to Procedures in the Materials and Methods section for details). For example, in Experiment 1a, the background tone sequence changed its pitch value from high to low or vice versa isochronously at a rate of 2.5 Hz, thus forming a rhythmic context with structure-based cycles of 400 ms. Note that we did not use the more general term “chunk”, because arbitrary chunks without the regularity of cycles are insufficient to trigger the attentional modulation effect in the current study. Indeed, the effect was eliminated when we replaced the rhythmic cycles with irregular chunks (Experiments 1d & 1e).

4) Entrainment of attention is not necessarily related to neural entrainment to sensory stimulus, and there is considerable debate about whether neural entrainment to sensory stimulus should be called entrainment. Too much emphasis on terminology is of course counterproductive but a short discussion on these issues is probably necessary.

Thanks for the comments. Entrainment is commonly defined as the alignment of intrinsic neuronal activity to the temporal structure of external rhythmic inputs (Lakatos et al., 2019; Obleser & Kayser, 2019). Here, we are interested in the functional roles of cortical entrainment to the higher-order temporal structure imposed on first-order sensory stimulation, and used the term entrainment to describe the phase-locked neural responses to such hierarchical structure, following the literature on auditory and visual perception (Brookshire et al., 2017; Doelling & Poeppel, 2015). In our study, the consistent results of power and ITPC have provided strong evidence that neural entrainment at the structure level (2.5 Hz) is significantly correlated with the observed attentional modulation effect. However, this does not mean that the entrainment of attention is necessarily associated with neural entrainment to sensory stimulus in a broader context, as attention may also be guided by predictions based on non-isochronous temporal regularity without requiring stimulus-based oscillatory entrainment (Breska & Deouell, 2017; Morillon et al., 2016).

On the other hand, there has been a debate about whether the neural alignment to rhythmic stimulation reflects active entrainment of endogenous oscillatory processes (i.e., induced activity) or a series of passively evoked steady-state responses (Keitel et al., 2019; Notbohm et al., 2016; Zoefel et al., 2018). The latter process is also referred to as “entrainment in a broad sense” by Obleser & Kayser (2019). Given that a presented rhythm always evokes event-related potentials, a better question might be whether the observed alignment reflects the entrainment of endogenous oscillations in addition to evoked steady-state responses. Here we attempted to tackle this issue by measuring the induced power, which emphasizes the intrinsic non-phase-locked activity, in addition to the phase-locked evoked power. Specifically, we quantified these two kinds of activities with the average of single-trial EEG power spectra and the power spectra of trial-averaged EEG signals, respectively, according to Keitel et al. (2019). In addition to the observation of evoked responses to the contextual structure, we also demonstrated an attention-related neural tracking of the higher-order temporal structure based on the induced power at 2.5 Hz (see Figure 4—figure supplement 1), suggesting that the observed attentional modulation effect is at least partially derived from the entrainment of intrinsic oscillatory brain activity. We have briefly discussed this point in the revised manuscript (page 17, line 460).

References:

Breska, A., & Deouell, L. Y. (2017). Neural mechanisms of rhythm-based temporal prediction: Delta phase-locking reflects temporal predictability but not rhythmic entrainment. PLOS Biology, 15(2), e2001665. https://doi.org/10.1371/journal.pbio.2001665

Brookshire, G., Lu, J., Nusbaum, H. C., Goldin-Meadow, S., & Casasanto, D. (2017). Visual cortex entrains to sign language. Proceedings of the National Academy of Sciences, 114(24), 6352–6357. https://doi.org/10.1073/pnas.1620350114

Doelling, K. B., & Poeppel, D. (2015). Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences, 112(45), E6233–E6242. https://doi.org/10.1073/pnas.1508431112

Henry, M. J., Herrmann, B., & Obleser, J. (2014). Entrained neural oscillations in multiple frequency bands comodulate behavior. Proceedings of the National Academy of Sciences, 111(41), 14935–14940. https://doi.org/10.1073/pnas.1408741111

Keitel, C., Keitel, A., Benwell, C. S. Y., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-Driven Brain Rhythms within the Alpha Band: The Attentional-Modulation Conundrum. The Journal of Neuroscience, 39(16), 3119–3129. https://doi.org/10.1523/JNEUROSCI.1633-18.2019

Lakatos, P., Gross, J., & Thut, G. (2019). A New Unifying Account of the Roles of Neuronal Entrainment. Current Biology, 29(18), R890–R905. https://doi.org/10.1016/j.cub.2019.07.075

Morillon, B., Schroeder, C. E., Wyart, V., & Arnal, L. H. (2016). Temporal Prediction in lieu of Periodic Stimulation. Journal of Neuroscience, 36(8), 2342–2347. https://doi.org/10.1523/JNEUROSCI.0836-15.2016

Notbohm, A., Kurths, J., & Herrmann, C. S. (2016). Modification of Brain Oscillations via Rhythmic Light Stimulation Provides Evidence for Entrainment but Not for Superposition of Event-Related Responses. Frontiers in Human Neuroscience, 10. https://doi.org/10.3389/fnhum.2016.00010

Obleser, J., & Kayser, C. (2019). Neural Entrainment and Attentional Selection in the Listening Brain. Trends in Cognitive Sciences, 23(11), 913–926. https://doi.org/10.1016/j.tics.2019.08.004

Zoefel, B., ten Oever, S., & Sack, A. T. (2018). The Involvement of Endogenous Neural Oscillations in the Processing of Rhythmic Input: More Than a Regular Repetition of Evoked Neural Responses. Frontiers in Neuroscience, 12. https://doi.org/10.3389/fnins.2018.00095

Reviewer #3 (Public Review):

The current experiment tests whether the attentional blink is affected by higher-order regularity based on rhythmic organization of contextual features (pitch, color, or motion). The results show that this is indeed the case: the AB effect is smaller when two targets appeared in two adjacent cycles (between-cycle condition) than within the same cycle defined by the background sounds. Experiment 2 shows that this also holds for temporal regularities in the visual domain and Experiment 3 for motion. Additional EEG analysis indicated that the findings obtained can be explained by cortical entrainment to the higher-order contextual structure. Critically feature-based structure of contextual rhythms at 2.5 Hz was correlated with the strength of the attentional modulation effect.

This is an intriguing and exciting finding. It is a clever and innovative approach to reduce the attention blink by presenting a rhythmic higher-order regularity. It is convincing that this pulling out of the AB is driven by cortical entrainment. Overall, the paper is clear, well written and provides adequate control conditions. There is a lot to like about this paper. Yet, there are particular concerns that need to be addressed. Below I outline these concerns:

1) The most pressing concern is the behavioral data. We have to ensure that we are dealing here with an attentional blink. The way the data is presented is not the typical way this is done. Typically in AB designs one sees the T2 performance when T1 is ignored relative to when T1 has to be detected. This data is not provided. I am not sure whether this data is collected but if so the reader should see this.

Many thanks for the suggestion. We thank the reviewer for his/her thoughtful comments. To demonstrate the AB effect, we did include two T2 lag conditions in our study (Experiments 1a, 1b, 2a, and 2b): a short-SOA condition where T2 was located at the second lag of T1 (i.e., SOA = 200 ms), and a long-SOA condition where T2 appeared at the 8th lag of T1 (i.e., SOA = 800 ms). In a typical AB effect, T2 performance at short lags is remarkably impaired compared with that at long lags. In our study, we consistently replicated this effect across the experiments, as reported in the Results section of Experiment 1 (page 5, line 106). Overall, the T2 detection accuracy conditioned on correct T1 response was significantly impaired in the short-SOA condition relative to that in the long-SOA condition (mean accuracy > 0.9 for all experiments), during both the context session and the baseline session. More crucially, when looking into the magnitude of the AB effect as measured by (ACClong-SOA - ACCshort-SOA)/ACClong-SOA, we still obtained a significant attentional modulation effect (for Experiment 1a, t(15) = -2.729, p = .016, Cohen’s d = 0.682; for Experiment 2a, t(15) = -4.143, p < .001, Cohen’s d = 1.036) similar to that reflected by the short-SOA condition alone, further confirming that cortical entrainment effectively influences the AB effect.
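The AB magnitude index and the paired statistics quoted above can be sketched as follows. The reported effect sizes are numerically consistent with d = |t|/√n for n = 16 participants (for example, 2.729/4 ≈ 0.682), although the response does not state which formula was used, so that relation is an inference. The function names and the example accuracies are hypothetical.

```python
import numpy as np

def ab_magnitude(acc_long, acc_short):
    """Normalized AB magnitude: (ACC_long - ACC_short) / ACC_long."""
    acc_long = np.asarray(acc_long, dtype=float)
    acc_short = np.asarray(acc_short, dtype=float)
    return (acc_long - acc_short) / acc_long

def paired_t_and_d(x, y):
    """Paired t statistic and Cohen's d computed on the difference scores."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = diff.mean() / diff.std(ddof=1)   # mean difference / SD of differences
    t_stat = d * np.sqrt(diff.size)      # t = d * sqrt(n) for a paired design
    return t_stat, d

def cohens_d_from_t(t_stat, n):
    """Recover |d| from a paired t statistic and the number of participants."""
    return abs(t_stat) / np.sqrt(n)

# Hypothetical per-participant AB magnitudes in two conditions (not study data).
within = ab_magnitude([0.95, 0.92, 0.97, 0.90], [0.60, 0.55, 0.70, 0.50])
between = ab_magnitude([0.94, 0.93, 0.96, 0.91], [0.70, 0.68, 0.75, 0.66])
t_stat, d = paired_t_and_d(between, within)
```

A negative t in this comparison means a smaller AB magnitude in the between-cycle condition, which matches the sign of the statistics reported in the response.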

Although we included both the long- and short-SOA conditions in the current study, we focused on T2 performance in the short-SOA condition rather than along the whole AB curve for the following reasons. Firstly, for the long-SOA conditions, the T2 performance is at ceiling level, making it an inappropriate baseline to probe the attentional modulation effect. We focused on Lag 2 because previous research has identified a robust AB effect around the second lag (Raymond et al., 1992), which provides a reasonable and sensitive baseline to probe the potential modulation effect of the contextual auditory and visual rhythms. Note that instead of using multiple lags, we varied the length of the rhythmic cycles (i.e., a cycle of 300 ms, 400 ms, and 500 ms corresponding to a rhythm frequency of 3.3 Hz, 2.5 Hz, and 2 Hz, respectively, all within the delta band), and showed that the attentional modulation effect could be generalized to these different delta-band rhythmic contexts, regardless of the absolute positions of the targets within the rhythmic cycles.

As to the T1 performance, the overall accuracy was very high, ranging from 0.907 to 0.972, in all of our experiments. The corresponding results have been added to the Results section of the revised manuscript (page 5, line 103). Notably, we did not find T1-T2 trade-offs in most of our experiments, except in Experiment 2a where T1 performance showed a moderate decrease in the between-cycle condition relative to that in the within-cycle condition (mean ± SE: 0.888 ± 0.026 vs. 0.933 ± 0.016, respectively; t(15) = -2.217, p = .043). However, by examining the relationship between the modulation effects (i.e., the difference between the two experimental conditions) on T1 and T2, we did not find any significant correlation (p = .403), suggesting that the better performance for T2 was not simply due to the worse performance in detecting T1.

Finally, previous studies have shown that ignoring T1 would lead to ceiling-level T2 performance (Raymond et al., 1992). Therefore, we did not include such manipulation in the current study, as in that case, it would be almost impossible for us to detect any contextual modulation effect.

References:

Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18(3), 849–860. https://doi.org/10.1037/0096-1523.18.3.849

2) Also, there is only one lag tested. To ensure that we are dealing here with a true AB, I would like to see that more than one lag is tested. In the ideal situation a full AB curve should be presented that includes several lags. This should be done for at least one of the experiments. It would be informative as we can see how cortical entrainment affects the whole AB curve.

Many thanks for the suggestion. Please refer to our response to the point #1 for “Reviewer #3 (Public Review)”. In short, we did include two T2 lag conditions in our study (Experiments 1a, 1b, 2a and 2b), and the results replicated the typical AB effect. We have clarified this point in the revised manuscript (page 5, line 106).

3) Also, there is no data regarding T1 performance. It is important to show that the better performance for T2 is not due to worse performance in detecting T1. So also please provide this data.

Many thanks for the suggestion. Please refer to our response to point #1 for “Reviewer #3 (Public Review)”. We have reported the T1 performance in the revised manuscript (page 5, line 103), and the results did not show obvious T1-T2 trade-offs.

4) The authors identify the oscillatory characteristics of EEG signals in response to stimulus rhythms by examining the FFT spectral peaks after subtracting the mean power of the two nearest neighboring frequencies from the power at the stimulus frequency. I am not familiar with this procedure and would like to see some justification for using this technique.

According to previous studies (Nozaradan et al., 2011; Lenc et al., 2018), subtracting the average amplitude of neighboring frequency bins can remove unrelated background noise, such as muscle activity or eye movements. If there were no EEG oscillatory responses characteristic of the stimulus rhythms, the amplitude at a given frequency bin should be similar to the average of its neighbors, and thus no significant peaks could be observed in the subtracted spectrum.
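The neighbor-subtraction step can be sketched as below: a minimal version that subtracts the mean of the nearest bin on each side from every bin. The function name and the `n_neighbors` parameter are hypothetical, and the default of one bin per side follows the reviewer's description ("two nearest neighboring frequencies") rather than a confirmed implementation detail.

```python
import numpy as np

def subtract_neighbors(spectrum, n_neighbors=1):
    """Subtract from each bin the mean of the n nearest bins on each side.

    Edge bins without a full set of neighbors are returned as NaN.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    corrected = np.full(spectrum.shape, np.nan)
    for i in range(n_neighbors, spectrum.size - n_neighbors):
        left = spectrum[i - n_neighbors:i]
        right = spectrum[i + 1:i + 1 + n_neighbors]
        corrected[i] = spectrum[i] - np.concatenate([left, right]).mean()
    return corrected

# Toy spectrum: a flat noise floor of 1 with a peak of 5 at index 3.
toy = np.array([1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0])
corrected = subtract_neighbors(toy)
```

On a flat noise floor the corrected value is zero, so only bins that stand out from their local surroundings keep a positive value. Bins adjacent to a peak become negative, which is one reason the measure is usually read only at the stimulus frequencies.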

References:

Lenc, T., Keller, P. E., Varlet, M., & Nozaradan, S. (2018). Neural tracking of the musical beat is enhanced by low-frequency sounds. Proceedings of the National Academy of Sciences, 115(32), 8221–8226. https://doi.org/10.1073/pnas.1801421115

Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the Neuronal Entrainment to Beat and Meter. The Journal of Neuroscience, 31(28), 10234–10240. https://doi.org/10.1523/JNEUROSCI.0411-11.2011

2. Reviewer #3 (Public Review):

The current experiment tests whether the attentional blink is affected by higher-order regularity based on rhythmic organization of contextual features (pitch, color, or motion). The results show that this is indeed the case: the AB effect is smaller when two targets appear in two adjacent cycles (between-cycle condition) than within the same cycle defined by the background sounds. Experiment 2 shows that this also holds for temporal regularities in the visual domain and Experiment 3 for motion. Additional EEG analysis indicated that the findings obtained can be explained by cortical entrainment to the higher-order contextual structure. Critically, the feature-based structure of contextual rhythms at 2.5 Hz was correlated with the strength of the attentional modulation effect.

This is an intriguing and exciting finding. It is a clever and innovative approach to reduce the attention blink by presenting a rhythmic higher-order regularity. It is convincing that this pulling out of the AB is driven by cortical entrainment. Overall, the paper is clear, well written and provides adequate control conditions. There is a lot to like about this paper. Yet, there are particular concerns that need to be addressed. Below I outline these concerns:

1) The most pressing concern is the behavioral data. We have to ensure that we are dealing here with an attentional blink. The way the data is presented is not the typical way this is done. Typically in AB designs one sees the T2 performance when T1 is ignored relative to when T1 has to be detected. This data is not provided. I am not sure whether this data is collected but if so the reader should see this.

2) Also, there is only one lag tested. To ensure that we are dealing here with a true AB I would like to see that more than one lag is tested. In the ideal situation a full AB curve should be presented that includes several lags. This should be done for at least one of the experiments. It would be informative as we can see how cortical entrainment affects the whole AB curve.

3) Also, there is no data regarding T1 performance. It is important to show that the better performance for T2 is not due to worse performance in detecting T1. So also please provide this data.

4) The authors identify the oscillatory characteristics of EEG signals in response to stimulus rhythms by examining the FFT spectral peaks after subtracting the mean power of the two nearest neighboring frequencies from the power at the stimulus frequency. I am not familiar with this procedure and would like to see some justification for using this technique.

3. Reviewer #2 (Public Review):

In cognitive neuroscience, a large number of studies proposed that neural entrainment, i.e., synchronization of neural activity and low-frequency external rhythms, is a key mechanism for temporal attention. In psychology and especially in vision, attentional blink is the most established paradigm to study temporal attention. Nevertheless, as far as I know, few studies try to link neural entrainment in the cognitive neuroscience literature with attentional blink in the psychology literature. The current study, however, bridges this gap.

The study provides new evidence for the dynamic attending theory using the attentional blink paradigm. Furthermore, it is shown that neural entrainment to the sensory rhythm, measured by EEG, is related to the attentional blink effect. The authors also show that event/chunk boundaries are not enough to modulate the attentional blink effect, and suggest that strict rhythmicity is required to modulate attention in time.

In general, I enjoyed reading the manuscript and only have a few relatively minor concerns.

First, each epoch runs from 600 ms before the stimulus onset to 1600 ms after the stimulus onset. Therefore, the epoch is 2200 ms in duration. However, zero-padding is said to be needed to make the epoch duration 2000 ms (for 0.5-Hz resolution). This is confusing. Furthermore, for a more conservative analysis, I recommend also analyzing the response between 400 ms and 1600 ms, to avoid the onset response, and showing the results in a supplementary figure. The short duration reduces the frequency resolution but still allows seeing a 2.5-Hz response.
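The arithmetic behind this point can be checked with a short sketch. It assumes only the standard relation that an unpadded epoch of duration T gives a DFT bin spacing of 1/T; the helper names are made up for the example.

```python
def frequency_resolution_hz(duration_ms):
    """DFT bin spacing (Hz) for an epoch of the given duration, without padding."""
    return 1000.0 / duration_ms

def falls_on_bin(target_hz, duration_ms, tol=1e-9):
    """True if target_hz is an exact multiple of the bin spacing."""
    n_bins = target_hz * duration_ms / 1000.0
    return abs(n_bins - round(n_bins)) < tol

# Epoch durations mentioned in the review, in ms.
full_epoch = 2200      # -600 ms to 1600 ms
stated_epoch = 2000    # duration implied by a 0.5-Hz resolution
short_epoch = 1200     # 400 ms to 1600 ms, the suggested conservative window

resolutions = {d: frequency_resolution_hz(d) for d in (full_epoch, stated_epoch, short_epoch)}
on_bin = {d: falls_on_bin(2.5, d) for d in (full_epoch, stated_epoch, short_epoch)}
```

Under these assumptions, 2.5 Hz is an exact bin for the 2000 ms and 1200 ms windows (bins 5 and 3) but falls between bins for 2200 ms, which is consistent with the reviewer's remark that the shorter window still resolves a 2.5-Hz response.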

Second, "The preprocessed EEG signals were first corrected by subtracting the average activity of the entire stream for each epoch, and then averaged across trials for each condition, each participant, and each electrode." I have several concerns about this procedure.

(A) What is the entire stream? It's the average over time?

(B) I suggest doing the Fourier transform first and averaging the spectrum over participants and electrodes. Averaging the EEG waveforms requires the assumption that all electrodes/participants have the same response phase, which is not necessarily true.
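A toy example makes the phase concern concrete. It is a sketch with two hypothetical channels carrying the same 2.5 Hz response in opposite phase; the sampling rate and duration are arbitrary choices for illustration.

```python
import numpy as np

fs, duration, f0 = 250.0, 2.0, 2.5
t = np.arange(int(fs * duration)) / fs

# Two hypothetical channels: identical amplitude, phases 0 and pi.
phases = np.array([0.0, np.pi])
channels = np.sin(2 * np.pi * f0 * t[None, :] + phases[:, None])

def amplitude_spectrum(x):
    """Single-sided amplitude spectrum along the last axis."""
    return np.abs(np.fft.rfft(x, axis=-1)) / x.shape[-1]

k = int(round(f0 * duration))  # index of the 2.5 Hz bin

# Average the waveforms first, then transform: the opposite phases cancel.
amp_of_mean = amplitude_spectrum(channels.mean(axis=0))[k]

# Transform each channel first, then average the amplitudes: the response survives.
mean_of_amp = amplitude_spectrum(channels).mean(axis=0)[k]
```

Here `amp_of_mean` is essentially zero while `mean_of_amp` is 0.5, so the order of averaging and transforming matters whenever response phase differs across electrodes or participants.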

2) The sequences are short, only containing 16 items and 4 cycles. Furthermore, the targets are presented in the 2nd or 3rd cycle. I suspect that a stronger effect may be observed if the sequences are longer, since attention may not entrain well to the external stimulus until a few cycles have passed. In the first trial of the experiment, the participant may not have a chance to realize that the task-irrelevant auditory/visual stimulus has a cyclic nature and it is not likely that their attention will entrain to such cycles. As the experiment proceeds, they learn that the stimulus is cyclic and may allocate their attention rhythmically. Therefore, I feel that the participants do not just rely on the rhythmic information within a trial but also rely on the stimulus history. Please discuss why short sequences are used and whether it is possible to see buildup of the effect over trials or over cycles within a trial.

3) The term "cycle" is used without definition in Results. Please define and mention that it's an abstract term and does not require the stimulus to have "cycles".

4) Entrainment of attention is not necessarily related to neural entrainment to sensory stimulus, and there is considerable debate about whether neural entrainment to sensory stimulus should be called entrainment. Too much emphasis on terminology is of course counterproductive but a short discussion on these issues is probably necessary.

4. Reviewer #1 (Public Review):

The work by Wang et al. examined how task-irrelevant, high-order rhythmic context could rescue the attentional blink effect via reorganizing items into different temporal chunks, as well as the neural correlates. In a series of behavioral experiments with several controls, they demonstrated that the detection performance of T2 was higher when occurring in different chunks from T1, compared to when T1 and T2 were in the same chunk. In EEG recordings, they further revealed that the chunk-related entrainment was significantly correlated with the behavioral effect, and the alpha-band power for T2 and its coupling to the low-frequency oscillation were also related to behavioral effect. They propose that the rhythmic context implements a second-order temporal structure to the first-order regularities posited in dynamic attention theory.

Overall, I find the results interesting and convincing, particularly the behavioral part. The manuscript is clearly written and the methods are sound. My major concerns are about the neural part, i.e., whether the work provides new scientific insights to our understanding of dynamic attention and its neural underpinnings.

1) A general concern is whether the observed behavioral related neural index, e.g., alpha-band power, cross-frequency coupling, could be simply explained in terms of ERP response for T2. For example, when the ERP response for T2 is larger for between-chunk condition compared to within-chunk condition, the alpha-power for T2 would be also larger for between-chunk condition. Likewise, this might also explain the cross-frequency coupling results. The authors should do more control analyses to address the possibility, e.g., plotting the ERP response for the two conditions and regressing them out from the oscillatory index.

2) The alpha-band increase for T2 is indeed contradictory to the well known inhibitory function of alpha-band in attention. How could a target that is better discriminated elicit stronger inhibitory response? Related to the above point, the observed enhancement in alpha-band power and its coupling to low-frequency oscillation might derive from an enhanced ERP response for T2 target.

3) To support that it is the context-induced entrainment that leads to the modulation in AB effect, the authors could examine pre-T2 response, e.g., alpha-power, and cross-frequency coupling, as well as its relationship to behavioral performance. I think the pre-stimulus response might be more convincing to support the authors' claim.

4) About the entrainment to rhythmic context and its relation to the behavioral modulation index. Previous studies (e.g., Ding et al) have demonstrated the hierarchical temporal structure in speech signals, e.g., emergence of word-level entrainment introduced by language experience. Therefore, it is well expected that imposing a second-order structure on a visual stream would elicit the corresponding steady-state response. I understand that the new part and main focus here are the AB effects. The authors should add more text explaining how their findings contribute new understanding of the neural mechanism for this intriguing phenomenon.

5. Evaluation Summary:

This study by Wang et al. used a series of carefully designed behavioral experiments to convincingly demonstrate that the attentional blink (AB) could be modulated by higher-order rhythmic regularity. EEG results further support the link between the elicited neural entrainment and the AB modulation effect. They propose that the rhythmic context implements a second-order temporal structure to the first-order regularities posited in dynamic attention theory.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)


22. Apr 2021
23. www.biorxiv.org www.biorxiv.org
1. Reviewer #4 (Public Review):

In this paper, the author uses an impressive comparative dataset of 172 species to investigate the relationship between intraspecific genetic diversity and census (actual) population size. They find that even when they use phylogenetic comparative methods, the relationship between neutral diversity and population size is much weaker than predicted by theory and that selection on linked sites is unlikely to explain this difference. The paper convincingly demonstrates that the paradox of variation first pointed out by Lewontin in the 70s remains paradoxical.

This paper is exceptionally strong in multiple ways. First, it is statistically rigorous; this is particularly impressive given that the paper uses methods and data from multiple fields (genomics, macroecology, conservation biology, macroevolution). This is the most robust estimate of the relationship between diversity and population size that has been published to date. Second, it is conceptually rigorous: the paper clearly lays out the various hypotheses that have been put forth over the years for this pattern as well as the logic behind these. The author has done a great job at synthesizing some complex debates and different types of data that are potentially relevant to resolving it. Third, it is exceptionally well-written. I sincerely enjoyed reading it. Overall, I think this is a major contribution to this field and though the paper does not resolve the challenge laid down by Lewontin, I think these analyses (and curated data/computational scripts) will inspire other researchers to dig into this question.

I do however, have some suggestions as to how this paper could be strengthened.

First, in phylogenetic comparative methods (PCMs) there has been a persistent confusion as to what phylogenetic signal is relevant -- when applying a phylogenetic generalized linear model with a phylogenetically structured residual structure (which the author does here), one is estimating the phylogenetic structure in the errors and not the traits themselves. The comparative analyses are well done and properly interpreted, but at some points in the text, particularly when addressing Lynch's conjecture that PCMs are irrelevant for coalescent times and in the comments/analysis on the appropriateness of Brownian motion as a model of evolution, there is some conceptual slippage, and I suggest that the author take a close look and make sure their language is consistent. Strictly speaking the PGLM approach doesn't assume that the underlying traits are purely BM -- only that the phylogenetic component of the error model is Brownian. As such, running the node-height test on both the predictors and the response variable separately -- while interesting and informative about the phylogenetic patterns in the data (including the shift points you have observed) -- isn't really a test of the assumptions of the phylogenetic regression model. It is at least theoretically plausible (if not biologically) that both Y and X have phylogenetic structure but that the estimated lambda = 0 (if for instance, Y and X were perfectly correlated because changes in Y were only the result of changes in X). To be clear, I am fine with the PGLM analysis you've done and with the node-height test; I just don't think that the latter justifies the former.

One note about the ancestral character reconstruction: I think it is a fine visualization and realize you didn't put too much emphasis on it, but strictly speaking the ASRs were done under a constant process model and therefore they wouldn't provide evidence for a (probably very real) shift between phyla. I think it was a good idea to run the analyses on the clade-specific trees (particularly given how deep and uncertain the branches dividing the phyla are) but I just don't think you could have gotten there from the ASR.

I am not convinced that the IUCN RedList analysis helps that much here and in my view, you might consider dropping this from the main text. This is for two reasons: 1) species may be of conservation concern both because they have low abundance in general and/or that their abundance is known to have experienced a recent decline -- distinguishing these two scenarios is impossible to do with the data at hand; and 2) there is of course a huge taxonomic bias in which species are considered; I don't think we can infer anything ecologically relevant from whether a species is listed on the RedList or not (as you suggest regarding the lynx, wolverine, and Massasauga rattlesnake) except that people care about it.

This is not really a weakness but I find it notable that recombination map length is correlated with body size. I realize this is old news but I was left really curious as to a) why such a relationship exists; and b) whether the mechanism that generates this might help explain some of the patterns you've observed. I would be keen to read a bit more discussion on this point.

2. Reviewer #3 (Public Review):

This study is quite directly a follow-up study of the recent work of Corbett-Detig et al (2015) and the commentary by Coop (2016) which aimed to understand the relation between population size and diversity, and the degree to which the shape of the relation could be explained by the action of linked selection. The analysis here scales up the sample size for a large-scale focus on comparative analyses of animals, and introduces the application of phylogenetic correction to control for relatedness.

As the most comprehensive analysis of its type to date, and with the addition of phylogenetic correction, this work's strength primarily lies in confirming the conclusions laid out in the commentary by Coop, notably that linked selection is unable to fully explain the narrowness of the diversity across species with orders of magnitude variation in population sizes. Through an explicit model-fitting of the effects of linked selection, the main conclusions are essentially that Lewontin's Paradox remains unexplained. The Introduction and discussion provide a very nice accounting of the range of possible explanations. I also appreciated the connection of the population size inferences to IUCN status.

I wasn't so convinced that the assessment of phylogenetic inertia (Lambda>0) really provides a way to assess Lynch's argument that coalescent times are too short to have a phylogenetic effect. For reasons outlined by the author in the discussion, it could well be that any phylogenetic inertia signal is due to inertia of life history traits correlated with effective population size rather than with diversity itself. The discussion raises this important point, but I think leaves us with the difficulty of really assessing how important that phylogenetic correction really is: if diversity has no direct phylogenetic non-independence, I am a bit unsure how much we have learned through this analysis alone (i.e. what is lambda telling us), without an explicit assessment of how often divergence times may actually truly be on the same order as coalescent times.

That said, I think it's a very open question whether diversity actually has phylogenetic independence because of short split times relative to effective population sizes. The author mentions the possible effect of large Ne on causing this to be violated; but I also wondered whether many of the small Nc species are still retaining a fair bit of ancestral polymorphism, further homogenizing diversity levels.

Overall a number of possible explanations (such as the effect of variable selected site densities, and variable recombination) were raised, and rather quickly rejected as 'unlikely to explain the qualitative patterns'. In a number of cases these statements were fairly brief, and I wondered how likely it is that a combination of these, in aggregate, COULD explain the patterns. Looking at Figure 5B, it seems like the major effect of phylogeny (or correlated life history) is also apparent for the discrepancy between observed and predicted diversity: Chordates seem to have the largest discrepancy. With that in mind, I do wonder whether some feature of genome structure in Chordates, including a combination of the effects discussed in the paper (e.g. the effects of variable recombination rates/genome size and functional densities, variation in mutation rates, etc.), could collectively account for the paradox, even though individually the author rules them out as being able to explain the 'qualitative pattern'. Could the genome structure of chordates lead to a major difference in linked selection that's unaccounted for here?

Mei et al (2018) (American Journal of Botany, Volume 105, Issue 1, p1-124) argued that species with larger genomes have greater 'functional space', implying a greater deleterious mutation rate in species with larger genomes. This could potentially be a factor driving those Chordates with intermediate Nc values furthest below the predicted line?

3. Reviewer #2 (Public Review):

This manuscript presents a thorough reanalysis of estimates of genetic heterozygosity pi, its distribution among animals, and its relationship with the census population size, here estimated from organism body mass and species range. A significant phylogenetic effect on pi is uncovered, and a formal model of linked selection is shown to be insufficient to explain the so-called Lewontin's paradox.

My first and maybe most important comment is that the introduction, discussion and overall writing of the manuscript are really excellent. This might be the most lucid, extensive, balanced overview of Lewontin's paradox and the associated literature I've ever read.

My second comment, somehow counterbalancing the first one, is that the major point made here, that linked selection alone cannot explain Lewontin's paradox, has been made before, e.g. by Coop (2016) and Ellegren & Galtier (2016) commenting on Corbett-Detig et al (2015). The material presented here substantiates this point further, but is perhaps not a major advance per se, so that the manuscript lies somewhere between a review and research article.

I have a few additional, more specific comments below. I think this is a great addition to the existing literature, which clarifies and synthesizes many aspects of a complex question.

1) Phylogenetic inertia

I am not sure I get the point of the phylogenetic inertia analysis. It seems to be intended as a response to Lynch 2011, who, responding to a criticism by Whitney & Garland, stated that the coalescence time is not inherited across the phylogeny. That quote from Lynch is mentioned several times, and as a motivation for performing this analysis. Yet the result reported here, i.e., that pi has some phylogenetic inertia, does not seem to contradict this specific statement, for at least two reasons. First, pi might have some inertia via inertia on the mutation rate, not on coalescence time. Secondly, pi might have some inertia because it is in part determined by traits that have some inertia, such as body mass for instance. The text insightfully discusses these aspects (l399-407), but honestly I do not think that this analysis invalidates Lynch's (somewhat trivial) point that coalescence time is not a trait that can be inherited.

I still agree that the analysis is worth doing and publishing, but I would suggest putting less emphasis on the Garland/Lynch controversy. Also it might be fair to mention that Leffler et al (2012) and Romiguier et al. (2014) did attempt to correct for phylogenetic inertia when correlating pi to various traits, although they did not analyse the phylogenetic effect as thoroughly as it is done here.

2) Range effect

I was surprised to read that species range alone has a significant effect on pi. The reason is that I suspected species range varied at a shorter time scale than coalescence time - e.g. think of what ranges were 20,000 years ago, when pi was probably, I thought, very similar to current pi; maybe worth discussing?

3) IUCN categories

I found the result that endangered species have a lower estimated Nc and a lower pi than non-endangered species a bit trivial, knowing that large body sized vertebrates are typically more threatened, and more of concern, than small body sized invertebrates. What would be more relevant to conservation biology is an analysis that controls for body size, e.g., are endangered large mammals less polymorphic than non-endangered large mammals? There is a fairly large amount of literature on this topic.

4) The Methods section (l580-581) states that map length data are available in 41 species, but figure 5A shows a relationship with 131 data points; some clarification is needed here.

5) abstract line 10: "vary two orders of magnitude", word missing

4. Reviewer #1 (Public Review):

The standard neutral model, which is our null model for levels of genetic variation, predicts that they should be proportional to census population sizes. In reality census population sizes across metazoan species span several orders of magnitude more than the ~3 orders spanned by levels of genetic diversity. This discrepancy is referred to as Lewontin's paradox, and to resolve it would mean to explain how basic population genetic processes lead to the modest span of genetic diversity levels that we observe. This is a central question in population genetics (which is, after all, concerned with understanding patterns of genetic variation) and is of substantial general interest.

1) It derives novel estimates of census population size across metazoans, which alongside previous estimates of neutral diversity levels, enables a revised quantification of the relationship between diversity levels (\pi) and census population sizes (Nc).

2) It quantifies the relationship between \pi and Nc controlling for phylogenetic relatedness.

3) It revisits the question of whether this relationship can be accounted for by the effects of selection at linked loci (e.g., sweeps and background selection). I address each of these analyses in turn.

Novel estimation of census population sizes in metazoans: The estimates are derived by: 1) estimating the density of individuals within their range, based on body size and a previously observed linear relationship between body size and density (Damuth 1981, 1987); 2) applying a geometric algorithm (finding the minimum alpha-shape computationally, sometimes adjusting alpha manually) to geographic occurrence data to estimate the area of the range; and 3) multiplying the two.
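The three-step estimate summarized above (density from body size, range area, then their product) can be sketched roughly as follows. This is only an illustration under loud assumptions: the log-log coefficients are placeholder values in the spirit of a Damuth-style allometry rather than the manuscript's fitted values, the range area is passed in directly instead of being computed from an alpha-shape, and the function names are hypothetical.

```python
import math

def density_per_km2(body_mass_g, intercept=4.23, slope=-0.75):
    """Individuals per km^2 from a log10-log10 linear relation with body mass.

    intercept and slope are placeholder values for illustration only.
    """
    return 10 ** (intercept + slope * math.log10(body_mass_g))

def census_size(body_mass_g, range_km2, **allometry):
    """Step 3: multiply the predicted density by the estimated range area."""
    return density_per_km2(body_mass_g, **allometry) * range_km2

# Hypothetical species: 10 kg body mass, range of one million km^2.
example_nc = census_size(body_mass_g=10_000, range_km2=1e6)
```

Because density falls steeply with body mass in such a relation, small-bodied species with wide ranges receive very large census estimates, which is relevant to the plausibility check discussed next.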

The results are sometimes surprising. For example, Drosophila melanogaster is estimated to have a population size > 10^17 (Fig. 1); if the volume of an individual is 1 mm^3, this implies a total volume > 1 km x 1 km x 100 m. Additionally, some species classified as endangered have census estimates > 10^8 (Fig. 3). The author compares his area estimates with estimates for species in the IUCN Red List (focused on endangered species) to find that they largely correlate (although this is not quantified). I think further investigation of the quality of the census size estimates is warranted. Are there other estimates of census size or biomass that can be used for validation, e.g., for species of economic and biomedical importance (e.g., herring and anopheles)?
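The back-of-the-envelope volume check can be reproduced in a few lines, taking the reviewer's figures (10^17 individuals, 1 mm^3 each) as given.

```python
import math

n_individuals = 1e17               # lower bound on the D. melanogaster census estimate
volume_per_individual_mm3 = 1.0    # the reviewer's assumed volume per fly
mm3_per_m3 = 1e9                   # (1000 mm)^3 in one cubic metre

total_volume_m3 = n_individuals * volume_per_individual_mm3 / mm3_per_m3

# The slab quoted in the review: 1 km x 1 km x 100 m, in cubic metres.
slab_volume_m3 = 1000 * 1000 * 100
```

Both quantities come to 10^8 m^3, so the slab quoted in the review matches the stated assumptions.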

If the proposed method proves to work well, I imagine that the estimates of census size may be of broad interest in other contexts. In the context of Lewontin's paradox, it may be interesting to quantify the difference in the relationship between \pi and Nc suggested by the new estimates vs the proxies used in previous work (e.g., Leffler et al. 2012).

Quantifying the relationship between \pi and Nc controlling for phylogenetic relatedness: I am unclear about the motivation for this analysis. As Lynch argued (and the author describes), if TMRCAs of neutral loci within a species are smaller than the split time from another species in the sample, its genetic diversity level was shaped after the split, and it could be considered an independent sample for the relationship between \pi and Nc. There may be underlying factors shaping this relationship that are not phylogenetically independent (e.g., similar life history traits) but it is unclear why that would justify down-weighting a sample. In that sense, I am not convinced by the author's argument that finding a 'phylogenetic signal' justifies the correction. Stated differently, it is not obvious what the 'true' relationship being estimated is and why relatedness biases it. One could imagine that the 'true' relationship is the one across extant species, in which case the correction is not needed (with the possible exception of species in which TMRCAs are on the same order as or greater than split times). I don't know what an alternative 'true' relationship would be.

Moreover, I am not sure how a more precise 'quantification' of the relationship between diversity and census size serves us. Regardless of corrections, it is obvious that the null provided by the standard neutral model is off by orders of magnitude. Perhaps once we have alternative explanations for this relationship then testing them may require corrections, but presumably the corrections will depend on the explanations.

One context in which phylogenetic considerations and quantification may be relevant is the comparison of the \pi - Nc relationship among clades. Notably, one could imagine that different population genetic processes are important in different clades (e.g., due to reproductive strategy) and a comparative analysis may highlight such differences. It is less clear whether the corrections that are applied here are the relevant ones. Separating clades makes sense in this regard, but it is unclear why to correct for non-independence within a clade. Furthermore, it seems that in order to point to different processes one would like to control for the distribution of census population sizes in comparisons between clades (to the extent possible). Otherwise, one can imagine the same process shaping the relationship in different clades, but having a non-linear (in log-log scale) functional dependence on census population size (as in the case of genetic draft studied next). In this regard, I am not sure I follow the argument attributed to Gillespie (1991) and specifically how the current analysis supports it.

In summary, I find the ideas of clade level analyses and of using phylogenetic comparative methods (PCMs) to look at census population size (and possibly diversity levels) promising. For example, as the author alludes to in the Discussion (bottom of P. 13), PCMs may be informative about the hypothesis that species with large census sizes have a greater rate of speciation. Yet I find the current analyses difficult to interpret.

Analysis of the effects of linked selection: The author investigates whether the effect of selection at linked sites (e.g., selective sweeps and background selection) can account for the observed relationship between diversity levels and census population size. To this end, he assumes that different species have the same sweeps and background selection parameters inferred in Drosophila melanogaster, but differ in census size and genetic map length.

As justification for using selection parameters inferred in D. melanogaster, the author argues that this is a "generous" assumption in that the effects of linked selection in this species are on the high end. One issue with this argument is that among the reasons for the strong effects in D. melanogaster is its short genetic map length. This is not a substantial caveat, given that the analysis is meant as an illustration and it can be resolved by using appropriate wording. Perhaps more troubling is that the author's estimate of the reduction in diversity level in D. melanogaster is much greater than the reduction estimated in the inference that he relies on (several orders of magnitude and less than one, respectively). This discrepancy is mentioned but should probably be addressed more substantially.

The results of the analysis are intriguing. The effects of linked selection 'shrink' the ~13 orders of magnitude of census population sizes to ~3 orders of magnitude of diversity levels. This massive effect is largely due to genetic draft (Gillespie 2001) and to a lesser extent to the decrease in map length with increasing census size: when the census population size becomes very large (Nc ~ 10^9) and coalescence rates due to genetic drift decrease accordingly (~1/(2Nc)), coalescence rates due to sweeps, which increase owing to the smaller map lengths (and would otherwise remain constant), become dominant. In hindsight this is quite intuitive and aligns with Gillespie's original argument, but this is in hindsight, and using this argument in conjunction with data, specifically with census population size and map length estimates, is novel.
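The saturation argument can be illustrated with a deliberately simplified sketch. It assumes that pairwise coalescence happens at the sum of a drift rate 1/(2Nc) and a sweep-induced rate that is held constant here (unlike in the manuscript, where it varies with map length), and that expected diversity is twice the mutation rate times the expected coalescence time. The mutation rate and sweep rate are arbitrary illustrative values, not estimates from the paper.

```python
import math

def expected_diversity(n_census, mu=1e-9, sweep_rate=5e-7):
    """Expected pairwise diversity with competing drift and sweep coalescence.

    mu and sweep_rate (per generation) are hypothetical values for illustration.
    """
    drift_rate = 1.0 / (2.0 * n_census)
    return 2.0 * mu / (drift_rate + sweep_rate)

# Census sizes spanning 13 orders of magnitude.
n_small, n_large = 1e3, 1e16
n_span = math.log10(n_large / n_small)

# Orders of magnitude spanned by the predicted diversity over the same range.
pi_span = math.log10(expected_diversity(n_large) / expected_diversity(n_small))
```

With these made-up parameters, 13 orders of magnitude in census size compress to about 3 in diversity, because once 1/(2Nc) drops below the sweep rate, further increases in Nc barely change the coalescence time.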

As the author points out, the resulting relationship between diversity levels and census population sizes does not fit the data well. Notably, predicted diversity levels are too high in the intermediate range of census population sizes. Nonetheless, the analysis suggests that linked selection may play a much greater role than previous studies suggested (i.e., the analyses of Corbett-Detig et al. (2015) and Coop (2016) suggest that it cannot account for more than 1 order of magnitude). Maybe the poor fit is due to the importance of other factors (e.g., bottlenecks) in species with intermediate census population sizes?

I also wonder whether the potential role of linked selection may be clearer if the different effects are shown separately, and perhaps with less reliance on the estimates from D. melanogaster. Namely, the effects of background selection can be shown for a few different values of Udel, e.g., between 0.3-3 (this range seems plausible based on many estimates). They can be shown both accounting and not accounting for the relationship between map length and census size. Similarly, the effect of sweeps can be shown for several values of corresponding parameters, and perhaps even for different models for how the number of beneficial substitutions varies with census size (see Gillespie's work to that effect). I believe that such illustrations will be fairly intuitive and less restrictive.
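As a rough sketch of the kind of illustration suggested here, the background selection effect can be tabulated for a few values of Udel and map length. The formula used is the classical approximation B = exp(-U / (2sh + M)) for a locus at the centre of a single chromosome with uniform recombination; it is an assumption on my part and not necessarily the model used in the manuscript, and the value of `sh` and the map lengths are hypothetical.

```python
import math

def bgs_reduction(u_del, map_length, sh=0.0):
    """Fraction of neutral diversity remaining under background selection.

    u_del: deleterious mutation rate per diploid genome per generation
    map_length: genetic map length in Morgans
    sh: heterozygous selection coefficient (hypothetical; 0 gives exp(-U/M))
    """
    return math.exp(-u_del / (2.0 * sh + map_length))

if __name__ == "__main__":
    for u_del in (0.3, 1.0, 3.0):            # range suggested in the review
        for map_length in (1.0, 3.0, 10.0):  # illustrative map lengths
            b = bgs_reduction(u_del, map_length, sh=0.01)
            print(f"Udel = {u_del:>3}  M = {map_length:>4}  B = {b:.3f}")
```

Feeding in a map length that depends on census size, versus a fixed one, would show the two cases (accounting and not accounting for that relationship) side by side.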

5. Evaluation Summary:

The manuscript revisits an enduring and central question in population genetics known as Lewontin's paradox: that in contrast to the prediction of the field's null model, which suggests that levels of neutral genetic diversity should be proportional to the census population size, in reality, census population sizes span several orders of magnitude more than the approximately three orders of magnitude spanned by levels of genetic diversity. The manuscript provides a nice review of previous work as well as thought-provoking novel analyses. There are also several issues that make it difficult to interpret the new results.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #4 agreed to share their names with the authors.)

#### URL

24. www.biorxiv.org www.biorxiv.org
1. Reviewer #2 (Public Review):

It is well established in diverse sensory modalities that fluctuating excitability of cortical regions is likely reflected in ongoing alpha activity in these respective areas. However, how this oscillatory activity relates to "intensities" of neural (~evoked) responses and perception following supra-threshold stimulation is not well established. Building on and extending their own previous work in the somatosensory domain (Stephani et al., 2020), this is the main goal of the authors.

To achieve their goals the authors implement a straightforward somatosensory discrimination task while recording EEG. The study builds on very high quality data as well as analysis approaches and, along with a decent sample size, allows conclusions to be drawn with respect to the aforementioned questions. Using CCA to analyse ongoing and stimulus (single-trial) evoked responses from a (for the non-invasive researcher world) well-circumscribed brain region is a clear strength when studying the inter-relationships between these brain activity features. The displayed results of the structural equation model (Figure 4) are a great summary of the main effects and an important contribution to the field. In particular, I really appreciate the inclusion of peripheral responses, which convincingly make the case that the non-trivial relationship between stimulus and perceptual intensity on the one hand and early evoked response (N20) on the other hand indeed emerges at a brain level.

However there are also some weaknesses that need to be mentioned:

• The main weakness of the manuscript becomes most apparent with respect to the stated impact that "The widespread belief that a larger brain response corresponds to a stronger percept of a stimulus may need to be revisited." I am not really sure that many cognitive neuroscientists would actually subscribe to such a simplistic relationship between evoked responses and perception; temporal differentiation (early vs late responses) and the biasing influence of prestimulus activity patterns are becoming increasingly recognized. So rather than actually changing a dominant paradigm, this work is an (excellent) contribution to a paradigm shift that is already taking place.

• Also it should be considered that with regards to the analysis approach using CCA, the claims are mainly restricted to BA3b: i.e. while I also think that this is a strength of the current study, one should refrain from over-interpreting the results in a very generalized manner. The authors do include some "thalamus" and "late" evoked response patterns as well, however that presentation of the results is somewhat changed now as compared to the N20 (e.g. using LMEs rather than comparison of extremes; not using SEMs). The readability of results and especially the comparison of effects would profit from a more coherent approach.

• I have some concerns whether the relationship between large alpha power and more negative N20s could be driven by more trivial factors rather than the model explanations the authors develop in the discussion. Concretely, the question is whether phase locking of large alpha power along with >30 Hz high pass filtering could produce a similar finding as shown e.g. in Figure 2c. This is an important issue, as prestimulus alpha influences the N20 amplitudes as well as the perceptual reports.

• It is important to emphasize that the model developed is a post-hoc one, i.e. the authors do not develop already in the discussion various alternative scenario results based on different model predictions. Therefore there is no strong evidence in support of the specific one advanced in the discussion.

2. Reviewer #1 (Public Review):

In this study, Stephani et al. address the question of how ongoing fluctuations in neuronal excitability, as well as stimulus strength, impact the perception of above-threshold tactile stimuli and the subsequent stimulus-evoked brain activity. Specifically, pre-stimulus alpha oscillation amplitude and the N20 component of the SEP are used as a readout of cortical excitability, while signal detection theory quantities - sensitivity and criterion - derived from participant response are used as the behavioral correlates. The authors find that 1) higher prestimulus alpha amplitude is associated with a higher criterion, i.e., participants tend to rate stimuli as "weaker" regardless of the actual intensity, while there was no effect on sensitivity; 2) larger N20 amplitude (more negative) is associated with stronger stimulus intensity; 3) conditioned on actual stimulus intensity, larger N20 amplitude is associated with a higher criterion, similar to prestim alpha; 4) the above effects are confirmed using a multi-level structural equation model while also accounting for peripheral control measures; and finally 5) that the thalamic response, as measured in very early components, has no association with perceptual response and previous findings on later SEP components (N140) are reproduced in this data. The authors offer a physiological interpretation that explains the seemingly contradictory result by accounting for the recruitment level of cortical neurons and their membrane depolarization in excitable stages.

Overall, I find this study to be very nicely done, well-written, and with informative figures. My expertise in signal detection theory and awareness of the SEP literature are limited, and the following comments will probably reflect that. Considering that, the introduction was very concise yet informative regarding the state of the field, and nicely motivates why suprathreshold stimulation is an interesting question to investigate, and was overall just a pleasure to read. The data and analyses seem convincing in supporting the authors' conclusions. The results are indeed puzzling (in an interesting way), and while the authors provide a nicely parsimonious explanation rooted in the underlying neurophysiology, I think this study has the potential to further motivate many lines of investigation, especially considering that the majority of works done in this field looks at the effect of ongoing neural activity on the detection of near-threshold sensory stimuli (as far as I know). I have some major concerns broadly regarding the interplay between alpha oscillation and the N20 (detailed below), the rest are mostly clarifying comments/questions that I believe may help the authors improve this paper, as well as other interesting points to consider in the discussion to relate to the broader literature.


N20 and alpha oscillation

My main technical concern lies in the choice of decomposition filter for SEP and alpha oscillations, and the conclusions the authors draw from that. Specifically, a CCA spatial filter is optimized here for the N20 component, which is then identically applied to isolate alpha sources, with the logic being that this procedure extracts the alpha oscillation from the same sources (e.g., L359). I have no issues (or expertise) with using the CCA filter for the SEP, but if my understanding of the authors' intent is correct, then I don't agree with the logic that using the same filter isolates alpha as well. The prestimulus alpha oscillation can have arbitrary source configurations that are different from the SEP sources, which may hypothetically have a different association with the behavioral responses when it's optimally isolated. In other words, just because one uses the same spatial filter, it does not imply that one is isolating alpha from the same source as the SEP, but rather simply projecting down to the same subspace - looking at a shadow on the same wall, if you will. To show that they are from the same sources, alpha should be isolated independently of the SEP (using CCA, ICA, or other methods), and compared against the SEP topology. If the topology is similar, then it would strengthen the authors' current claims, but ideally the same analyses (e.g., using the 1st and 5th quintile of alpha amplitude to partition the responses) are repeated using alpha derived from this procedure. Also, have the authors considered using individualized alpha filters given that alpha frequency varies across individuals? Why or why not?

In the same vein, both alpha and N20 amplitude relate to perceptual judgement, and to each other. I believe this is nicely accounted for in the multivariate analysis using the SEM, but the analyses that partition the behavioral responses using the 20% and 80% cutoffs are done separately, which means that different behavioral trials are used to compute the effect of N20 and alpha on sensitivity and criterion. While this is not necessarily an issue given that there IS a multivariate analysis, I would like to know how many of those trials overlap between the two analyses.

At multiple points, the authors comment that the covariation of N20 and alpha amplitude in the same direction is counterintuitive (e.g., L123-125), and it wasn't clear to me why that should be the case until much later on in the paper. My naive expectation (perhaps again being unfamiliar with the field) is that alpha amplitude SHOULD be positively correlated with SEP amplitude, due to the brain being in a general state of higher variability. It was explained later in the manuscript that lower alpha amplitude and higher SEP amplitude are associated with excitability, and hence should have the opposite directions. This could be explicitly stated earlier in the introduction, as well as the expected relationship between alpha amplitude and behavior.

Furthermore, I have a concern with the interpretation here that's rooted in the same issue as the assumption that they are from the same sources: the authors' physiological interpretation makes sense if alpha and N20 originated from the same sources, but that is not necessarily the case. In fact, the population driving the alpha oscillation could hypothetically have a modulatory effect on the (separate) population that eventually encodes the sensory representation of the stimulus, in which case the explanation the authors provide would not be wrong per se, just not applicable. A comment on this would be appreciated in the revision.

In addition, given how closely related the investigation of these two quantities are in this specific study, I think it would be relevant to discuss the perspective that SEPs are potentially oscillation phase resets. Even though the SEP is extracted using an entirely different filter range, it could nevertheless be possible that when averaged over many trials, small alpha residues (or other low freq components) do have a contribution in the SEP. If the authors are motivated enough, a simulation study could be done to check this, but is not necessary from my point of view if there is an adequate discussion on this point.

3. Evaluation Summary:

Stephani et al. address the question of how ongoing fluctuations in neuronal excitability, as well as stimulus strength, impact the perception of above-threshold tactile stimuli and the subsequent stimulus-evoked brain activity. The results are puzzling in an interesting way, and while the authors provide a nicely parsimonious explanation rooted in the underlying neurophysiology, editors and reviewers think this study has the potential to further motivate many lines of investigation. This manuscript will be of interest mainly to researchers using electrophysiological methods (EEG, MEG, ECoG etc.), as the authors have produced a very high-quality EEG data-set (including uncommon peripheral measurements).

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

#### URL

25. www.biorxiv.org www.biorxiv.org
1. Reviewer #3 (Public Review):

In the paper by Victorino et al., the authors describe the role for transcription factor HIF1a in NK cells during MCMV infection. They clearly demonstrate that HIF1a-deficiency results in impaired viral control, with a major effect visible in the impacted expansion of MCMV-specific NK cells. The paper brings novelty to the field as the role of HIF1a has not been addressed in NK cells in the course of viral infection.

The conclusions of the paper are mostly well supported by the data however there are still some aspects of the study that need clarification and extension.

i) It remains unclear what induces HIF1a expression during MCMV infection.

ii) The authors could speculate on the mechanisms of how HIF1a promotes repression of Bim during MCMV infection?

iii) The lack of expression of HIF1a glycolytic genes in HIF1a-deficient NK cells may not be surprising but it is very clear and convincing and supports the idea that HIF1a promotes survival of cells by promoting glycolysis. However, the study would benefit from a formal proof of this metabolic adaptation in the context of MCMV infection.

2. Reviewer #2 (Public Review):

In this manuscript, the authors analyzed the role of HIF1a in NK cells in a variety of settings, including viral infection. HIF1a deficient NK cells appear to be mostly functional in terms of effector functions and ability to proliferate, with only subtle differences from WT NK cells. This was also observed in HIF1a deficient Ly49H+ NK cells, yet in vivo Ly49H expansion is reduced in HIF1a KO mice. The response to IL-2 demonstrates that, despite a similar proliferation rate, NK cell numbers were reduced, indicating to the authors an NK cell survival issue. This was confirmed by measuring Bim and Bcl2, which were respectively decreased and increased. Increased cell death of HIF1a deficient NK cells during MCMV was confirmed. Mechanistically, the authors found that cell death was autophagy independent but due to an impaired glycolytic activity. The authors concluded that in the absence of HIF1a, NK cells had increased apoptosis due to abnormal glucose metabolism. Overall, the experiments are well executed and are logical and the conclusions are supported by the data presented.

3. Reviewer #1 (Public Review):

The manuscript by Victorino et al. describes the role of the metabolic adaptor hypoxia inducible factor-1α (HIF1α) in NK cells during viral infection. They first showed that NK cells constitutively express HIF1α and it is upregulated by murine cytomegalovirus (MCMV) infection. Using HIF1α KO mice, they provided evidence that HIF1α is dispensable for normal NK cell development, but important for NK cell dependent virus control and morbidity, NK cell number and their expansion. Although the lack of HIF1α affects the NK cell dependent virus control, it appears that HIF1α is not required for NK cell effector functions. In spite of the fact that proliferation of NK cells in HIF1α KO was not affected, their ultimate number was reduced due to the upregulation of pro-apoptotic protein Bim coupled with increased caspase activity and impaired glucose metabolism. As the authors pointed out, the data presented in this manuscript are in sharp contrast to previous findings on the role of HIF1α in NK cell responses to tumors, suggesting the impact of tumor microenvironment.

4. Evaluation Summary:

By using mice lacking the hypoxia inducible factor-1α (HIF1α) in NK cells, the study unravels a previously unknown function of this transcription factor in virus control by NK cells. Mechanistically, the authors provided evidence that HIF1α supports survival of NK cells through an efficient glucose metabolism required for optimal NK cell response to viral infection.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

#### URL

26. www.biorxiv.org www.biorxiv.org
1. Reviewer #3 (Public Review):

Sorrentino et al. utilise Magnetoencephalography (MEG) and diffusion MRI tractography to investigate the mapping between the structure and function of the human brain and any constraints imposed from this coupling. Their work builds upon a growing number of studies that use functional Magnetic Resonance Imaging (fMRI) to provide evidence of structure shaping neural functioning. In this case, the authors utilise the fine temporal resolution of MEG to explore the propagation of the neural signal and investigate how this can be linked to a structural connectome derived from deterministic diffusion MRI tractography. Following critical dynamics analysis pipelines, they identified neuronal avalanches in the MEG data and showed that their spread is more likely between pairs of grey matter regions with increased structural connectivity strengths, quantified by the streamline count among them. This result provides new evidence on how the structural architecture of the human brain can influence intrinsic neural dynamics and suggests a potential mechanism, based on scale invariant properties in space and time, for similar previous findings based on the slower temporal scales of fMRI.

The analyses presented are clear and concise. They highlight an efficient and clever way to combine MEG and diffusion data, maximising the benefits of both modalities, to explore structure-function associations. The authors have tested a number of different configurations, using multiple connectome mapping pipelines, atlases, as well as a replication sample from the Human Connectome Project and the results were robust both at the individual and the group level, which is reassuring and impressive.

Given the short report format of the manuscript, it is understandable that some additional information and results were described very briefly or omitted altogether. However, there are a few points that, I think, if discussed (even succinctly) could improve the strength of the presented evidence and increase the manuscript's impact to the field. For example:

Given that the foundations for all subsequent functional analyses are the time bin length and the branching parameter, it would be useful to have a couple of graphs showing their relationship, i.e. a graph showing the association between bin size and σ, for a wider range of bins (in addition to 1, 3, and 5 that are reported). Is bin size 3 the only bin size for which σ = 1, and if not, how does this affect the rest of the results (especially the transition matrix)? A second interesting graph dealing with avalanche dynamics would be to show the avalanche size distributions for a single subject and the group, for different bin lengths, highlighting whether they follow a power law (an indicator of critical dynamics), and briefly discussing their power law exponents, α.
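To make the requested bin-size check concrete, here is a minimal sketch of how a branching ratio could be recomputed for several bin lengths. It uses a simplified estimator (events in the second bin of each avalanche divided by events in the first bin, averaged over avalanches), which is an assumption on my part; the manuscript's exact definition of σ may differ. The function names and the toy input are hypothetical.

```python
import numpy as np

def rebin(active, bin_size):
    """Collapse a (time, regions) boolean array into bins of `bin_size` samples.

    A region counts as active in a bin if it is active in any sample of the bin.
    An incomplete trailing bin is dropped.
    """
    n_used = (active.shape[0] // bin_size) * bin_size
    blocks = active[:n_used].reshape(-1, bin_size, active.shape[1])
    return blocks.any(axis=1)

def branching_ratio(active, bin_size):
    """Simplified branching ratio: mean of (events in bin 2) / (events in bin 1)."""
    counts = rebin(active, bin_size).sum(axis=1)
    ratios = []
    for t in range(len(counts)):
        starts_avalanche = counts[t] > 0 and (t == 0 or counts[t - 1] == 0)
        if starts_avalanche:
            following = counts[t + 1] if t + 1 < len(counts) else 0
            ratios.append(following / counts[t])
    return float(np.mean(ratios)) if ratios else float("nan")

# Toy example: 8 time samples, 3 regions.
toy = np.zeros((8, 3), dtype=bool)
toy[0, 0] = True
toy[1, 0] = toy[1, 1] = True
toy[3, 1] = True

for k in (1, 2):
    print(k, branching_ratio(toy, k))
```

Running the same function over a wider range of bin sizes on the real binarised data would give the σ-versus-bin-size curve asked for above.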

The correlation between the structural connectivity and randomised transition matrices still seems relatively high. It'd be of interest for the authors to provide a brief interpretation of this, along with a justification for keeping the spatial structure unchanged during their randomisation routine.

As the different size of parcels in the atlases can have an effect for both structural and functional analyses, it would be of interest to know if the authors controlled for that and how.

Given the varying SNR that the AAL parcels will have due to their location, it could be of interest to present some information about the avalanches' spatial distribution (i.e. but not limited to a whole-brain map, where each parcel's intensity could correspond to the number of times it goes supra-threshold on average). This could highlight any issues where avalanches involve some parcels more (or less) than others due to challenges in recording and localising their activity.

In addition to the above challenges with MEG, deterministic tractography analyses also present limitations on how accurately they can describe the underlying structural connectome. i.e. issues with crossing fibres (of varying degree among parcels due to their location), spurious tracts, and invalid, non-biologically plausible connections. A brief mention of these challenges both for MEG and DWI and how they might affect and impose limitations on the manuscript's results would be beneficial.

Finally, values in the scatter plots in Figure 2 are probably mean centered? For visualisation purposes it might be better if they were not, as it seems a bit odd to have negative values or numbers higher than 1 for structural connectivity and transition probabilities. Also, there seem to be many ROI pairs with 0 structural connectivity but high transition probabilities, which might justify a brief mention in the manuscript and an interpretation.

2. Reviewer #2 (Public Review):

In this submission Sorrentino and co. are investigating the relationship between the structural and electrophysiological functional connectome. In particular they are asking whether the white matter structure is a large contributor to the patterns of function we see, and (importantly) whether or not this is a source-reconstruction artefact. The relationship between structure and the emergence of these functional networks is of interest to many; it has been previously shown in fMRI, and I believe a lot of modelling work to match empirical observations of the electrophysiology has been previously done.

The paper is clear in its motivations, and I believe fairly clearly reported. The simplicity of this is definitely one of the strengths of the report. Conceptually I believe this is a plausible hypothesis and of interest and (assuming the technical methods are correct) I'd say this is an elegant approach to supporting this.

3. Reviewer #1 (Public Review):

Sorrentino et al explore the possible link between 'neuronal avalanches' in resting MEG signal and structural connectivity in the human brain. They estimate neuronal avalanches by applying a threshold to identify large perturbations in the source reconstructed MEG data before binarising the time-series to define 'active' and 'passive' windows in each voxel. Sequences of 'active' voxels are identified starting with any region becoming active and ending when all voxels become passive. The probability of an avalanche transitioning between any two voxels in the MEG data is compared to network structure identified from diffusion imaging in the same individuals. The authors show that brain regions with a high functional transition probability are also likely to be structurally connected. Whilst the core finding is interesting, the results are undermined by a lack of controls for confounds.
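The procedure summarised above can be sketched roughly as follows. This is my own minimal reading of the steps (z-score, threshold, binarise, count transitions between consecutive samples), not the authors' code; the threshold value, the function names and the normalisation of the transition matrix are assumptions.

```python
import numpy as np

def binarise(signals, threshold=3.0):
    """Z-score each region of a (time, regions) array and mark |z| > threshold as active."""
    z = (signals - signals.mean(axis=0)) / signals.std(axis=0)
    return np.abs(z) > threshold

def transition_matrix(active):
    """Estimate P[i, j]: fraction of samples with region i active that are
    followed, one sample later, by region j being active.

    If the next sample is empty (the avalanche has ended) it contributes
    nothing to the numerator, so transitions are only counted within avalanches.
    """
    now = active[:-1].astype(float)
    nxt = active[1:].astype(float)
    counts = now.T @ nxt                      # co-activations at lag 1
    totals = now.sum(axis=0)[:, None]         # how often each region was active
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Hand-built example: two avalanches, 0 -> 1 -> 2 and 0 -> 1.
toy = np.zeros((7, 3), dtype=bool)
toy[0, 0] = toy[1, 1] = toy[2, 2] = True
toy[4, 0] = toy[5, 1] = True
print(transition_matrix(toy))
```

The resulting matrix is what would then be compared, entry by entry, with the structural connectivity matrix from tractography.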

Strengths

This paper utilises a straightforward and intuitive analysis approach to tackle a complex question - how does functional activity spread throughout the brain? The simple thresholding in the neuronal avalanches approach avoids a number of complex steps typically associated with electrophysiology connectivity estimation such as strong filtering and complex frequency transforms. Sorrentino et al are able to show that this simple time-domain measure is able to provide an interesting overview of functional network structure. Moreover, this method naturally works to explore network structure in transient, aperiodic signals which are often overlooked in favour of an oscillatory perspective.

The authors consider a range of analysis pipelines to show that the core results are robust to key analysis decisions. Two different parcellations and methods for computing transition probabilities are considered and the results are shown to hold when using diffusion MR data from the HCP project.

Weaknesses

The authors claim that these results are unlikely to be caused or affected by linear mixing or volume conduction - however this is not clear to me based on the presented information. Specifically, if a perturbation arises in one region and is mixed by volume conduction into a second region, part of its shape will be preserved but this will be at a lower overall amplitude. Therefore, as the whole perturbation shape will be scaled down in the second mixed region, it is likely that its rising edge will reach the z-score threshold at a later time than in the original signal. In this way linear mixing by volume conduction has the potential to create spurious time lags in this analysis. Previous literature on neuronal avalanches in MEG has included extensive control analyses and discussions on linear signal mixing for this reason (10.1523/JNEUROSCI.4286-12.2013). This point is not tackled in the analysis and not clearly discussed in the paper.
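The concern can be illustrated with a toy example. The sketch below assumes that both traces are already expressed in comparable z-score units (i.e. the second region's variance is dominated by its own activity, so the leaked component is smaller in z units); the Gaussian shape, amplitudes, leakage factor and threshold are all hypothetical.

```python
import numpy as np

def first_crossing(signal, threshold):
    """Index of the first sample exceeding the threshold, or None if never reached."""
    idx = np.flatnonzero(signal > threshold)
    return int(idx[0]) if idx.size else None

t = np.arange(200)
# A single smooth perturbation in the source region (peak amplitude 6).
source = 6.0 * np.exp(-0.5 * ((t - 100) / 15.0) ** 2)
# The same perturbation leaked instantaneously into a second region at 60% amplitude.
leaked = 0.6 * source
threshold = 3.0

lag = first_crossing(leaked, threshold) - first_crossing(source, threshold)
print("apparent lag in samples:", lag)
```

Although the mixing is instantaneous, the scaled-down copy crosses the threshold later, so a threshold-based analysis would register an apparent source-to-neighbour transition.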

The correlation in Figure 2 B and C is interesting but is not supported by control analyses to account for confounds. For example, ROI size could potentially lead to more apparent structural connectivity and stronger MEG signal, driving an apparent correlation between the modalities. The authors' conclusions would be better supported if such effects were ruled out.

The main results are not well developed from the available data. The group level correlations are visualised and the subject-specific correlations are briefly shown but not described in detail. It is unclear which regions and connections show the highest correlations. Similarly, there is wide between-subject variability in the structure<->function correlation, which ranges between 0.1 and 0.35, but the analysis does not explore whether this is reproducible, neuronal variability or driven by differences in SNR.

4. Evaluation Summary:

The present paper addresses the relationship between the electrophysiological and the anatomical connectomes, utilising a method to describe avalanches of activity. The editors feel that this work might be pushing the limits of MEG as a modality, since it implies more spatial precision than most would assume possible, which makes the manuscript particularly interesting to M/EEG researchers. While all reviewers agree that the paper has broad interest and the method is promising, some potential concerns have however been raised that compromise the validity of the results. Most importantly: the issue of volume conduction (proximity) driving the results as opposed to anatomical connectivity, which in the worst case could render the results trivial. Other confounds, such as the size of the parcels and their SNR, would also require major review.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

#### URL

27. www.biorxiv.org www.biorxiv.org
1. Reviewer #2 (Public Review):

Certain biological structures have evolved to attain certain forms that may enhance their function. The authors suggest that the shape of a cilium can enhance its sensory function in both quiescent fluid and shear flow, and compared the extent of this enhancement in a number of representative settings. This simple yet compelling possibility has not been explored in detail previously, and is deserving of further attention from both theoretical and experimental perspectives.

The present work is clearly a step in the right direction, proposing a quantitative framework and systematic approach to address this problem. The authors first extended the classical study by Berg and Purcell for spherical absorbers to prolate spheroids with slender aspect ratio, and compared this with a circular patch, showing the effectiveness of a cilium as a receptor. They then incorporated shear flow, showing that the cilium again outperforms a patch. Finally, they considered the case of an actively beating cilium or a motile bundle - a case which may be important for symmetry breaking in the vertebrate node.

However, a weakness of the current set-up is that it is highly idealised. To improve the overall impact and biological relevance of this work more careful analysis and simulations would be needed.

2. Reviewer #1 (Public Review):

The authors consider the effects of the cilium geometry and motility on its performance in detecting chemicals in the surrounding fluid. They begin by presenting a classic solution of the diffusion equation in an infinite fluid domain at rest, bounded internally by a single cilium. The cilium is modeled as a cylinder of finite length and perfectly absorbing boundary. They compare the capture rate of ambient chemicals at the cilium boundary to that of an absorbing circular patch on a reflecting wall of similar surface area. The latter is another classic solution of the diffusion equation. They find that the capture rate by the cilium exceeds the capture rate by the circular patch. Then, they solve the advection-diffusion equation around the cilium numerically, assuming perfectly absorbing boundary conditions along the cilium and reflecting boundary conditions on the wall. They apply this numerical framework to cases (i) where cilium is at rest in an external shear flow, (ii) where the cilium is actively beating, and (iii) where a bundle of hydrodynamically-interacting cilia are either at rest or actively beating. They observe an increase in capture rate when shear flows and motility are accounted for.
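The first comparison described above (cilium versus circular patch of similar area, fluid at rest) can be checked against classical closed-form results. The sketch below is an assumption-laden illustration rather than the authors' calculation: the patch uses the Berg and Purcell result for an absorbing disc on a reflecting wall, the cilium uses a slender-body estimate (half of a prolate spheroid of total length 2L), the matched area is the lateral area of a cylinder, and all parameter values are hypothetical.

```python
import math

def patch_rate(diffusivity, concentration, patch_radius):
    """Capture rate of an absorbing disc of radius s on a reflecting wall: 4 D c s."""
    return 4.0 * diffusivity * concentration * patch_radius

def slender_cilium_rate(diffusivity, concentration, length, radius):
    """Slender-body estimate for an absorbing cilium standing on a reflecting wall.

    Modelled as half of a prolate spheroid of total length 2L and radius b,
    giving 2 pi D c L / ln(2L / b). Valid only for L much larger than b.
    """
    return 2.0 * math.pi * diffusivity * concentration * length / math.log(2.0 * length / radius)

# Hypothetical values: lengths in micrometres, D in um^2/s, c in molecules/um^3.
D, c = 100.0, 1.0
L, b = 5.0, 0.125

# Patch with the same area as the cilium's lateral surface (cylinder approximation).
area = 2.0 * math.pi * b * L
s = math.sqrt(area / math.pi)

print("patch  :", patch_rate(D, c, s))
print("cilium :", slender_cilium_rate(D, c, L, b))
```

Under these assumptions the cilium captures more than the equal-area patch, in line with the finding reported for the quiescent case; the shear-flow and beating cases require the numerical advection-diffusion solution and are not covered by these formulas.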

3. Evaluation Summary:

The authors consider how the geometry and motility of cilia affect their performance in detecting chemicals in the surrounding fluid. Based on a theoretical model, the authors suggest that the distinctive elongated shape of a cilium may be coupled to its sensory function. The conjectures presented in this work are likely to be of interest to a wide readership, but whether this actually applies to real biological systems requires more careful validation.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

#### URL

28. www.biorxiv.org www.biorxiv.org
1. ## Preprint Review

This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

### Summary:

This report examines the mechanism by which the KaposinB (KapB) protein of the oncogenic Kaposi's sarcoma-associated herpesvirus (KSHV) causes disassembly of processing bodies (PBs) in HUVECs. The authors show that the oncogenic transcription factor YAP is an important component of the KapB signaling pathway, which involves the host cell GTPase RhoA and leads to PB disassembly.

#### URL

29. www.biorxiv.org www.biorxiv.org
1. Reviewer #2 (Public Review):

In this manuscript, McLeod and Gandon present a thorough mathematical modeling framework to describe the evolution of multi-drug resistance (MDR) in microbial populations. By expressing the model in terms of linkage disequilibrium, the equations take on a form that makes it easier to identify the key drivers of MDR evolution and propagation. This work helps to unify and generalize previous studies and constitutes an important advance in our understanding of microbial population dynamics.

2. Reviewer #1 (Public Review):

In this manuscript, McLeod and Gandon propose a framework for understanding multidrug resistance (MDR) evolution in a structured population in terms of linkage disequilibrium (LD) dynamics, and apply this framework to three concrete examples of MDR evolution. I was asked to evaluate this manuscript, as well as the authors' response to comments from previous reviewers. My expertise is in epidemiological modelling of antibiotic resistance; I am not hugely familiar with population genetics.

Overall, I think the authors address an important and interesting question, and I think the approach has the potential to generate valuable insights. I also think the authors addressed the previous reviewers' comments well. However, I have substantial concerns about the modelling framework and the interpretation of the results. In particular: i) there are some problems with the interpretation that LD arises from variation in susceptible density; ii) presenting these results as a re-interpretation and generalisation of Lehtinen et al. 2019 is incorrect; and iii) the modelling of additive transmission costs needs further thought/explanation.

1) Interpretation of results and re-interpretation of Lehtinen et al. 2019.

The authors present their results as a generalisation of the effect observed in Lehtinen et al. 2019. Both models show that variation in the strength of selection for resistance between populations can give rise to LD in a model of multiple resistances. In Lehtinen et al., this variation in selection is attributed to variation in clearance rate. The authors re-interpret the effect as arising from variation in susceptible density instead. This re-interpretation is incorrect: the change in how costs of resistance are modelled (additive here, multiplicative in Lehtinen et al.) changes the evolutionary dynamics, so the two models capture different evolutionary effects. (See points 2 and 3 for further discussion of additive vs multiplicative costs).

One way to see this is to consider a simple model of single resistance as presented in Lehtinen et al. eqn 1, in which resistance is selected for when: B_r/a_r > B_s/(a_s + tau), where "B" is the transmission rate, "a" the clearance rate and tau the treatment rate. Re-arranging for tau shows how the threshold of selection for resistance depends on the strain's properties (B and a) under different assumptions about cost. With an additive cost in transmission (i.e. B_r = B_s - c), this threshold depends on both transmission rate and clearance rate, predicting LD if populations vary in either transmissibility or duration of carriage. With an additive cost in clearance, this threshold is independent of the strain's properties, predicting no LD. These are precisely the results the authors describe in lines 268-277 and Figure 3.

However, if the costs are multiplicative, this threshold depends on clearance rate only, whether costs are modelled as part of clearance or transmission rate. This is why the model in Lehtinen et al. 2019 predicts LD when populations vary in duration of carriage, even when there is no transmission cost. The authors' re-interpretation of the effect in Lehtinen et al. as arising from variation in the density of susceptibles, contingent on an explicit transmission cost, is therefore not correct. More generally, representing one model as a generalisation of the other is misleading.
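The dependence of the selection threshold on the cost model can be checked numerically. The sketch below is illustrative only: it rearranges the condition B_r/a_r > B_s/(a_s + tau) to tau > B_s * a_r / B_r - a_s, and the function names and the particular parametrisations of the multiplicative costs (B_r = B_s(1 - c), a_r = a_s(1 + c)) are assumptions made here, not taken from either paper.

```python
import math


def tau_threshold(B_s, a_s, B_r, a_r):
    """Treatment rate above which B_r / a_r > B_s / (a_s + tau) holds.

    Rearranged for tau: tau > B_s * a_r / B_r - a_s (valid for B_r > 0, a_r > 0).
    A non-positive B_r means resistance is never selected, so return infinity.
    """
    if B_r <= 0:
        return math.inf
    return B_s * a_r / B_r - a_s


def threshold_for(cost_model, B_s, a_s, c):
    """Selection threshold under one of four (assumed) cost parametrisations."""
    if cost_model == "additive_transmission":
        B_r, a_r = B_s - c, a_s
    elif cost_model == "additive_clearance":
        B_r, a_r = B_s, a_s + c
    elif cost_model == "multiplicative_transmission":
        B_r, a_r = B_s * (1 - c), a_s
    elif cost_model == "multiplicative_clearance":
        B_r, a_r = B_s, a_s * (1 + c)
    else:
        raise ValueError(f"unknown cost model: {cost_model}")
    return tau_threshold(B_s, a_s, B_r, a_r)


if __name__ == "__main__":
    models = (
        "additive_transmission",
        "additive_clearance",
        "multiplicative_transmission",
        "multiplicative_clearance",
    )
    # Vary transmission (B_s) and clearance (a_s) to see which ones move the threshold.
    for model in models:
        for B_s, a_s in ((2.0, 1.0), (3.0, 1.0), (2.0, 2.0)):
            tau = threshold_for(model, B_s, a_s, c=0.5)
            print(f"{model:28s} B_s={B_s} a_s={a_s} -> tau > {tau:.3f}")
```

Under these assumptions the threshold varies with both B_s and a_s only for the additive transmission cost, is constant for the additive clearance cost, and varies with a_s alone for either multiplicative cost, which is the pattern described in the two preceding paragraphs.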

I am also not sure about the authors' interpretation that the effects in the model with additive costs arise from variation in susceptible density. Variation in the density of susceptibles can also be generated by variation in the overall population density, so if I understand correctly, this interpretation would predict that LD would arise if the population density was different between populations? And that the selective pressure on single resistance would also depend on overall population density (argument starting at line 261)? I am not able to reproduce this dependence on population density in a simple model. I would instead interpret the effect the authors observe as arising because the same additive transmission cost is much more significant if the baseline transmission rate is low (e.g. with c = 1, a strain with B_s = 1 would never evolve resistance because B_r would be 0, which would not be the case for a strain with baseline transmission rate B_s = 3).

The problem with the interpretation in terms of susceptible density is clear in the section on serotype dynamics. The main text refers to serotype-specific susceptibles (S^x) (line 303) and explains observed effects in terms of variation in S^x. In the supporting information however, the authors present a model of serotype dynamics which does not have serotype-specific susceptible classes and the pool of susceptibles is the same for all serotypes (eqn 43). While I absolutely agree this is a better model to study transient effects than introducing a serotype-specific susceptible class, I don't understand what the authors mean by serotype-specific susceptible density in the main text.

2) The use of an additive transmission cost

The use of an additive transmission cost requires further consideration/discussion. An additive transmission cost is difficult to interpret epidemiologically and can lead to implausible consequences. For example, if costs are high enough compared to baseline transmission rate, additive costs with no epistasis would lead to a negative transmission rate for the dually resistant strains, which does not make sense (say B_ab = 2 and B_Ab = B_aB = 0.5, then B_AB = -1).
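The numerical example above can be reproduced in a few lines. This is a minimal sketch, assuming that "no epistasis" means the dually resistant rate equals the sum of the single-resistance effects for additive costs and the product of the single-resistance ratios for multiplicative costs; the function names are hypothetical.

```python
def dual_rate_additive(B_ab, B_Ab, B_aB):
    """Dually resistant transmission rate when single-resistance costs add."""
    return B_ab + (B_Ab - B_ab) + (B_aB - B_ab)


def dual_rate_multiplicative(B_ab, B_Ab, B_aB):
    """Dually resistant transmission rate when single-resistance costs multiply."""
    return B_ab * (B_Ab / B_ab) * (B_aB / B_ab)


if __name__ == "__main__":
    B_ab, B_Ab, B_aB = 2.0, 0.5, 0.5  # values from the reviewer's example
    print("additive:      ", dual_rate_additive(B_ab, B_Ab, B_aB))
    print("multiplicative:", dual_rate_multiplicative(B_ab, B_Ab, B_aB))
```

With the reviewer's values the additive rule gives a negative rate, whereas the multiplicative rule stays positive for any positive inputs.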

3) Why is epistasis defined in terms of an additive rather than multiplicative expectation?

I also have quite a basic question about the overall framework (eqn. 2). In the modelling framework, epistasis is the difference between the actual per capita growth rate of the dually-resistant infections and the expected growth rate, defined as the baseline rate plus the sum of the differences between the growth rates of the singly-resistant infections and the baseline rate. It was not obvious to me whether the expectation needs to be additive, or whether this is a question of definition (could the expectation be defined, for example, as a multiplicative rather than additive effect?). In particular, I was wondering about this in the context of the authors' suggestion that multiplicative costs are problematic because they give rise to epistasis - this seemed a little tautological to me because epistasis has been specifically defined as deviation from an additive expectation. I think a discussion about why epistasis is defined in terms of additive effects, and the implications for the derivation of the dynamics of D, would be very interesting and also helpful in making the paper more accessible.
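The definitional point can be made concrete with a small sketch. It follows the verbal description above rather than the paper's eqn. 2, which is not reproduced here; the symbol `r` stands for any strictly positive rate (the log-scale measure is undefined for non-positive values, so it would not apply directly to growth rates that can be negative), and both function names are hypothetical.

```python
import math


def additive_epistasis(r_ab, r_Ab, r_aB, r_AB):
    """Deviation of r_AB from the additive expectation r_Ab + r_aB - r_ab."""
    return r_AB - (r_Ab + r_aB - r_ab)


def multiplicative_epistasis(r_ab, r_Ab, r_aB, r_AB):
    """Deviation from the multiplicative expectation, on a log scale.

    Requires all four rates to be strictly positive.
    """
    return math.log(r_AB) - (math.log(r_Ab) + math.log(r_aB) - math.log(r_ab))


if __name__ == "__main__":
    r_ab = 2.0
    # Case 1: costs multiply (factors 0.8 and 0.5 are arbitrary illustrative values).
    mult = (r_ab, r_ab * 0.8, r_ab * 0.5, r_ab * 0.8 * 0.5)
    # Case 2: costs add (reductions 0.4 and 1.0 are arbitrary illustrative values).
    add = (r_ab, r_ab - 0.4, r_ab - 1.0, r_ab - 0.4 - 1.0)
    for label, rates in (("multiplicative costs", mult), ("additive costs", add)):
        print(label,
              "additive epistasis:", round(additive_epistasis(*rates), 6),
              "log-scale epistasis:", round(multiplicative_epistasis(*rates), 6))
```

The same set of rates is epistatic under one definition and non-epistatic under the other, which is why the choice of reference expectation matters for the reviewer's question.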

3. Evaluation Summary:

This paper addresses the important question of multidrug resistance evolution, which is of both theoretical and applied interest. The authors' efforts to carefully distinguish population and metapopulation linkage disequilibrium and to develop a framework to rigorously analyze the relationship between the two have promise, although we have noted concerns about the modeling framework used and the interpretation of the results. If these concerns can be sufficiently addressed, then this paper has the potential to represent a clear advance in our understanding of microbial population dynamics.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

#### URL

30. www.biorxiv.org www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

In this manuscript Rao et al. describe an interesting relationship between KSR1 and the translation regulation of EPSTI1 (a regulator of EMT). They identified this relationship by polysome RNAseq of CRC cells in the context of KSR1 knockdown (KD), which they confirm by polysome QPCR. They then go on to show that KSR1 KD and add-back influence EPSTI1 expression at the protein but not mRNA level and impact cell viability, anchorage-independent growth, and possibly cell migration. They focus on the cell migration phenotype and show that it is associated with changes in EMT-related genes including E-cad and N-cad. Interestingly, add-back of EPSTI1 can reverse the phenotype elicited by KSR1 deletion. Overall, this story is interesting and translation regulation by KSR1 has not been described previously. However, Rao et al. do not provide a mechanism for how KSR1 regulates the translation of EPSTI1, and it is unclear whether this occurs through eIF4E, as the authors suggest.

We agree completely that our observation that KSR1-dependent ERK regulation of EPSTI1 promotes an EMT-like phenotype raises new questions regarding how the translation of EPSTI1 mRNA is regulated. An additional intriguing question that arises from our work is how this relatively nondescript protein enhances the E- to N-cadherin switch in the colon cancer cells. Multiple possibilities (e.g., altered RNA processing or ribosome heterogeneity) may mediate ERK-dependent regulation of EPSTI1 translation and induction of the cadherin switch. RNA-binding proteins affect discrete cell behaviors, including motility and invasion, by selectively regulating pre-mRNA splicing, mRNA stability, and localization. However, it is hard to imagine a general mechanism involving ERK-mediated regulation of 4E-BP1 and eIF4E, which would affect global mRNA translation, as responsible for a selective effect on EPSTI1 mRNA translation and discrete components of EMT-like behavior. Indeed, while KSR1 disruption and ERK inhibition potently suppressed EPSTI1 translation, robust inhibition of mTOR signaling had little effect on EPSTI1. Further development of the detailed cellular mechanisms and critical regulators mediating translation-dependent EMT-like behavior should now be possible.

Reviewer #2 (Public Review):

KSR1 functions as a critical rheostat to fine-tune MAPK signalling, and identifying modes by which its over-expression promotes tumor progression is clinically important and potentially druggable. Ras is highly mutated in CRC and unfortunately inhibitors of Ras have been challenging to develop. However, small molecules which stabilize an inactive form of the KSR are actively being developed in an attempt to repress RAS signaling. Thus, this study, which seeks to identify how KSR1 promotes oncogenic mRNA translation, is potentially highly clinically relevant, as it may identify novel druggable targets.

In this manuscript the authors performed polysome profiling in colorectal cancer (CRC) cells and proposed that KSR1 and ERK regulate the translation of EPSTI1 mRNA. They go on to characterize the phenotypes associated with knock-down or knock-out of KSR1 in CRC, and show that their defects in invasion, anchorage-independent growth and switch to a less EMT-like phenotype are all EPSTI1-dependent.

The authors succeeded in providing ample in vitro data that KSR1 and EPSTI1 are potential therapeutic targets in CRC. However, the data demonstrating that KSR1 and ERK regulate EPSTI1 mRNA translation are tenuous. Although the authors state that "EPSTI1 is necessary and sufficient for EMT in CRC cells", the data presented are consistent with a more restrained conclusion of a partial-EMT and not EMT per se. Finally, without an in vivo model it is difficult to glean novel insight into the mechanism by which KSR1 and/or EPSTI1 control the invasive and metastatic behaviour of cells.

We greatly appreciate your comments and are excited about the implications of KSR1-EPSTI1 signaling in promoting the EMT-like phenotype in colon cancer cell lines. We have corrected the use of the term ‘EMT’ to ‘EMT-like phenotype’ within the text of the manuscript. We recognize the limitations of using only in vitro data to demonstrate the role of KSR1 and EPSTI1 in promoting motility and invasion in colon cancer cells. In vivo studies will be invaluable to our future efforts to determine the extent to which EPSTI1 promotes metastatic behavior in colon tumors.

Reviewer #3 (Public Review):

It is established that Kinase suppressor of Ras 1 (KSR1) contributes to the oncogenic actions of Ras by promoting ERK activation. However, the downstream actions of this pathway are poorly understood. Here Rao et al. demonstrate that this KSR1-dependent pathway increases translation of Epithelial-Stromal Interaction-1 (EPSTI1) mRNA and expression of EPSTI1 protein. This is significant because EPSTI1 drives aspects of EMT, including expression of ZEB1, SLUG, and N-Cadherin. The analysis is thorough and includes both loss-of-function and gain-of-function studies. Overall, the conclusions of this study are convincing and advance our understanding of cancer development.

We appreciate the positive feedback, and we are excited about the implications of our findings on KSR1-dependent translational regulation of EPSTI1.

#### URL

31. www.biorxiv.org www.biorxiv.org
1. Reviewer #3 (Public Review):

Meier et al. used electroencephalography (EEG) to test the mechanism underlying a well-known phenomenon where stress induces subjects to behave in a more habitual way during decision-making, as opposed to using a more deliberative goal-directed strategy. The authors tested two groups of human subjects who were randomly assigned to a stress manipulation or a similar control manipulation. These participants then carried out a reinforcement learning task where they had to choose between two alternative responses to a stimulus. On some blocks the value of one response would be 'devalued' such that the alternative action would be more appropriate. Participants who went through the stress manipulation were more likely to persist with an action that previously yielded a high reward outcome even when this response had been devalued - indicative of a failure in goal-directed decision-making. Critically, the authors associated responses and outcomes with stimuli that were decodable from EEG signals, making it possible to evaluate whether participants were prospectively considering the correct response or outcome prior to committing a response or receiving feedback. Meier et al. find that, over time, the stressed participants came to prospectively represent the coming response more and the outcome less, while the control group showed reduced prospective representation of the response. The degree of this change toward greater representation of responses versus outcomes across participants was also correlated with a more habit-based decision strategy in devaluation trials.

Overall, this is a well-designed and sophisticated study that makes an important contribution to our understanding of the mechanism by which stress promotes more habit-like behavior, with broad implications for our understanding of how maladaptive behaviors might be formed in many clinical conditions. The conclusions are well supported by the data and confidence in the results is bolstered by several additional control measurements. However, I would have appreciated more effort to link this work to other related literature, as well as some more detail in some parts of the methods and additional control analyses to rule out alternative explanations for some of the main results of interest.

2. Reviewer #2 (Public Review):

A number of psychological states and traits have been demonstrated to render behavior under goal-directed or habitual control, stress being one of them. In this paper, using electroencephalography, the authors investigated the neural representations of stimulus, responses and outcomes in a task whose aim was to distinguish between the two types of behavioral control. By training a classifier to distinguish between neural signals related to the representations of instrumental responses and the outcomes produced by those responses, the authors found that during the last block of the experiment (after more extended training in the task), signals for outcome representations were weaker and response representations stronger in a stress-induced group compared to a control group. This is consistent with the idea that habits are performed when there is a stronger link between stimuli and responses that does not require a representation of the outcomes that follow from behavior. Although the methods of this paper are sound and the idea interesting and relevant for the current state of the art in habit research, it is not clear if the underlying theoretical contribution it should motivate is supported by the data produced by the experimental design employed by the authors.

3. Reviewer #1 (Public Review):

The authors used EEG-based multivariate pattern analysis and acute stress induction to assess the neural representations mediating a previously demonstrated influence of stress on the balance between goal-directed and habitual responding. They found that stress reduced neural outcome representations and enhanced response representations - results that are consistent with associative structures thought to mediate goal-directed and habitual response strategies, respectively. The study addresses an important and open question, and the combination of clinical, neural and behavioral assays is appealing. However, the interpretability, and thus impact, is threatened by an apparent lack of temporal synchrony between relevant measures, and by the potential effects of social feedback.

Specifically, it is hard to understand how neural and behavioral devaluation differences between groups can be stress related given that they emerge at a point when differences in stress measures (e.g., cortisol) are no longer present. It seems more likely that, at the time when devaluation insensitivity became more pronounced in the stress group, this group was being released from stress, perhaps experiencing corollary fatigue or buoyancy.

Another concern is that it is unclear whether the "Error" feedback screen was being employed during devaluation blocks. This is important, because most human psychology experiments use accuracy as the only incentive, and it appears to be a pretty effective motivator. Given that participants in the stress group had just been subjected to an aversive social stressor, they might have found the socially relevant error feedback more painful than the relatively minor response cost.

4. Evaluation Summary:

The authors used EEG-based multivariate pattern analysis and acute stress induction to assess the neural representations mediating a previously demonstrated influence of stress on the balance between goal-directed and habitual responding. While the results should be of interest to a wide range of neuroscientists, the temporal alignment of clinical, behavioral, and neural measures somewhat obscures the underlying causal mechanisms.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

#### URL

32. www.biorxiv.org www.biorxiv.org
1. Author Response:

Evaluation Summary:

Since DBS of the habenula is a new treatment, these are the first data of its kind and potentially of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap. This paper is of interest to neuroscientists studying emotions and clinicians treating psychiatric disorders. Specifically the paper shows that the habenula is involved in processing of negative emotions and that it is synchronized to the prefrontal cortex in the theta band. These are important insights into the electrophysiology of emotion processing in the human brain.

The authors are very grateful for the reviewers’ positive comments on our study. We also thank all the reviewers for their comments, which have helped to improve the manuscript.

Reviewer #1 (Public Review):

The study by Huang et al. reports on direct recordings (using DBS electrodes) from the human habenula in conjunction with MEG recordings in 9 patients. Participants were shown emotional pictures. The key finding was a transient increase in theta/alpha activity with negative compared to positive stimuli. Furthermore, there was a later increase in oscillatory coupling in the same band. These are important data, as there are few reports of direct recordings from the habenula together with MEG in humans performing cognitive tasks. The findings do provide novel insight into the network dynamics associated with the processing of emotional stimuli and, in particular, the role of the habenula.

Recommendations:

How can we be sure that the recordings from the habenula are not contaminated by volume conduction, i.e. signals from neighbouring regions? I do understand that bipolar signals were considered for the DBS electrode leads. However, high-frequency power (gamma band and up) is often associated with spiking/MUA and considered less prone to volume conduction. I propose to also investigate the high-frequency gamma band activity recorded from the bipolar DBS electrodes and relate it to the emotional faces. This will provide more certainty that the measured activity indeed stems from the habenula.

We thank the reviewer for the comment. As the reviewer pointed out, bipolar macroelectrode recordings can detect locally generated potentials, as demonstrated in the case of recordings from the subthalamic nucleus, especially when the macroelectrodes are inside the subthalamic nucleus (Marmor et al., 2017). However, considering the size of the habenula and the size of the DBS electrode contacts, we have to acknowledge that we cannot completely exclude the possibility that the recordings are contaminated by volume conduction of activities from neighbouring areas, as shown in Bertone-Cueto et al. 2019. We have now added extra information about the size of the habenula and acknowledged the potential contamination of activities from neighbouring areas through volume conduction in the ‘Limitation’:

"Another caveat we would like to acknowledge is that the human habenula is a small region. Existing data from structural MRI scans reported combined habenula (the sum of the left and right hemispheres) volumes of ~30–36 mm³ (Savitz et al., 2011a; Savitz et al., 2011b), which means each habenula measures 2–3 mm in each dimension, which may be even smaller than the standard functional MRI voxel size (Lawson et al., 2013). The size of the habenula is also small relative to the standard DBS electrodes (as shown in Fig. 2A). The electrodes used in this study (Medtronic 3389) have an electrode diameter of 1.27 mm, a contact length of 1.5 mm, and a contact spacing of 0.5 mm. We have tried different ways to confirm the location of the electrode and to select the contacts that are within or closest to the habenula: 1.) the MRI was co-registered with a CT image (General Electric, Waukesha, WI, USA) with the Leksell stereotactic frame to obtain the coordinate values of the tip of the electrode; 2.) Post-operative CT was co-registered to pre-operative T1 MRI using a two-stage linear registration using Lead-DBS software. We used bipolar signals constructed from neighbouring macroelectrode recordings, which have been shown to detect locally generated potentials from the subthalamic nucleus, especially when the macroelectrodes are inside the subthalamic nucleus (Marmor et al., 2017). Considering that not all contacts for bipolar LFP construction are in the habenula in this study, as shown in Fig. 2, we cannot exclude the possibility that the activities we measured are contaminated by activities from neighbouring areas through volume conduction. In particular, the human habenula is surrounded by thalamus and adjacent to the posterior end of the medial dorsal thalamus, so we may have captured activities from the medial dorsal thalamus.
However, we also showed that those bipolar LFPs from contacts in the habenula tend to have a peak in the theta/alpha band in the power spectral density (PSD), whereas recordings from contacts outside the habenula tend to have an extra peak in the beta frequency band in the PSD. This supports the habenula origin of the emotional valence related changes in the theta/alpha activities reported here."
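The size comparison in the quoted passage can be checked with back-of-the-envelope arithmetic. The sketch below rests on two simplifying assumptions that are not in the manuscript: each habenula is treated as a cube holding half of the combined volume, and a bipolar pair is taken to span two adjacent contacts plus the gap between them. The variable and function names are illustrative.

```python
def cube_side_mm(volume_mm3):
    """Side length of a cube with the given volume (a crude shape assumption)."""
    return volume_mm3 ** (1.0 / 3.0)


# Combined (left + right) habenula volume range quoted from Savitz et al.
combined_volume_mm3 = (30.0, 36.0)
per_habenula_mm3 = [v / 2.0 for v in combined_volume_mm3]
side_mm = [cube_side_mm(v) for v in per_habenula_mm3]

# Electrode geometry quoted for the Medtronic 3389 lead.
contact_length_mm = 1.5
contact_spacing_mm = 0.5
# Extent covered by two adjacent contacts used for one bipolar signal.
bipolar_span_mm = 2 * contact_length_mm + contact_spacing_mm

if __name__ == "__main__":
    print("per-habenula side length (mm):", [round(s, 2) for s in side_mm])
    print("bipolar contact-pair span (mm):", bipolar_span_mm)
```

Under these assumptions a single bipolar pair spans more than the estimated extent of one habenula, which is consistent with the caveat that not all contacts can lie inside the structure.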

We have also looked at gamma band oscillations and high-frequency activities in the recordings. However, we did not observe any peak in the high-frequency band in the average power spectral density, or any consistent difference in the high-frequency activities induced by the emotional stimuli (Fig. S1). We suspect that high-frequency activities related to MUA/spiking are very local and have very small amplitude, so they are not picked up by the bipolar LFPs, because both the area of each contact and the spacing between contacts are large relative to the size of the habenula.

Figure S1. (A) Power spectral density of habenula LFPs across all time periods when emotional stimuli were presented. The bold blue line and shadowed region indicate the mean ± SEM across all recorded hemispheres and the thin grey lines show measurements from individual hemispheres. (B) Time-frequency representations of the power response relative to pre-stimulus baseline for different conditions showing habenula gamma and high frequency activity are not modulated by emotional

References:

Savitz JB, Bonne O, Nugent AC, Vythilingam M, Bogers W, Charney DS, et al. Habenula volume in post-traumatic stress disorder measured with high-resolution MRI. Biology of Mood & Anxiety Disorders 2011a; 1(1): 7.

Savitz JB, Nugent AC, Bogers W, Roiser JP, Bain EE, Neumeister A, et al. Habenula volume in bipolar disorder and major depressive disorder: a high-resolution magnetic resonance imaging study. Biological Psychiatry 2011b; 69(4): 336-43.

Lawson RP, Drevets WC, Roiser JP. Defining the habenula in human neuroimaging studies. NeuroImage 2013; 64: 722-7.

Marmor O, Valsky D, Joshua M, Bick AS, Arkadir D, Tamir I, et al. Local vs. volume conductance activity of field potentials in the human subthalamic nucleus. Journal of Neurophysiology 2017; 117(6): 2140-51.

Bertone-Cueto NI, Makarova J, Mosqueira A, García-Violini D, Sánchez-Peña R, Herreras O, et al. Volume-Conducted Origin of the Field Potential at the Lateral Habenula. Frontiers in Systems Neuroscience 2019; 13:78.

Figure 3: the alpha/theta band activity is very transient and not band-limited. Why refer to this as oscillatory? Can you exclude that the TFRs of power reflect the spectral power of ERPs rather than modulations of oscillations? I propose to also calculate the ERPs and perform the TFR of power on those. This might result in a re-interpretation of the early effects in theta/alpha band.

We agree with the reviewer that the activity increase in the first time window, at short latency after stimulus onset, is very transient and not band-limited. This raises the question of whether it is oscillatory or a transient evoked activity. We have now looked at this initial transient activity in different ways: 1.) We quantified the ERP in LFPs locked to the stimulus onset for each emotional valence condition and for each habenula. We investigated whether there was a difference in the amplitude or latency of the ERP between the emotional valence conditions. As shown in the following figure, there is an ERP following stimulus onset with a positive peak at 402 ± 27 ms (neutral stimuli), 407 ± 35 ms (positive stimuli), 399 ± 30 ms (negative stimuli). The following figure (Fig. 3–figure supplement 1) will be submitted as a figure supplement related to Fig. 3. However, there was no significant difference in ERP latency or amplitude caused by different emotional valence stimuli. 2.) We have quantified the pure non-phase-locked (induced only) power spectra by calculating the time-frequency power spectrogram after subtracting the ERP (the time-domain trial average) from the time-domain neural signal on each trial (Kalcher and Pfurtscheller, 1995; Cohen and Donner, 2013). This shows very similar results to those we reported in the main manuscript, as shown in Fig. 3–figure supplement 2. These further analyses show that even though there were event-related potential changes time-locked to the stimulus onset, this ERP did not contribute to the initial broad-band activity increase at the early time window shown in plots A-C in Figure 3. The figures of the new analyses and the following text have now been added in the main text:

"In addition, we tested whether stimuli-related habenula LFP modulations primarily reflect a modulation of oscillations, which is not phase-locked to stimulus onset, or, alternatively, whether they are attributable to an evoked event-related potential (ERP). We quantified the ERP for each emotional valence condition for each habenula. There was no significant difference in ERP latency or amplitude caused by different emotional valence stimuli (Fig. 3–figure supplement 1). In addition, when only considering the non-phase-locked activity by removing the ERP from the time series before time-frequency decomposition, the emotional valence effect (presented in Fig. 3–figure supplement 2) is very similar to that shown in Fig. 3. These additional analyses demonstrated that the emotional valence effect in the LFP signal is more likely to be driven by non-phase-locked (induced only) activity."

Fig. 3–figure supplement 1. Event-related potential (ERP) in habenula LFP signals in different emotional valence (neutral, positive and negative) conditions. (A) Averaged ERP waveforms across patients for different conditions. (B) Peak latency and amplitude (Mean ± SEM) of the ERP components for different conditions.

Fig. 3–figure supplement 2. Non-phase-locked activity in different emotional valence (neutral, positive and negative) conditions (N = 18). (A) Time-frequency representation of the power changes relative to pre-stimulus baseline for three conditions. Significant clusters (p < 0.05, non-parametric permutation test) are encircled with a solid black line. (B) Time-frequency representation of the power response difference between negative and positive valence stimuli, showing significantly increased activity in the theta/alpha band (5-10 Hz) at short latency (100-500 ms) and another increase in theta activity (4-7 Hz) at long latencies (2700-3300 ms) with negative stimuli (p < 0.05, non-parametric permutation test). (C) Normalized power of the activities at theta/alpha (5-10 Hz) and theta (4-7 Hz) band over time. Significant difference between the negative and positive valence stimuli is marked by a shadowed bar (p < 0.05, corrected for multiple comparison).
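The ERP-subtraction step behind the non-phase-locked analysis can be illustrated on synthetic data. The following is a minimal sketch, not the authors' pipeline: it uses a single-frequency discrete Fourier coefficient in place of a full time-frequency decomposition, and the sampling rate, trial count, and the 6 Hz (phase-locked) and 10 Hz (non-phase-locked) components are arbitrary choices made for the example.

```python
import cmath
import math


def evoked_response(trials):
    """ERP estimate: the time-domain average across trials."""
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


def remove_evoked(trials):
    """Subtract the ERP from every trial, leaving non-phase-locked activity."""
    erp = evoked_response(trials)
    return [[x - e for x, e in zip(trial, erp)] for trial in trials]


def power_at(signal, freq_hz, fs_hz):
    """Squared magnitude of the length-normalised Fourier coefficient at freq_hz."""
    n = len(signal)
    coef = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
        for k, x in enumerate(signal)
    ) / n
    return abs(coef) ** 2


def mean_power(trials, freq_hz, fs_hz):
    """Single-trial power averaged over trials (total or induced, depending on input)."""
    return sum(power_at(t, freq_hz, fs_hz) for t in trials) / len(trials)


FS_HZ = 200       # assumed sampling rate
N_SAMPLES = 200   # one second per trial
N_TRIALS = 8


def make_trial(i):
    """6 Hz component with a fixed phase plus 10 Hz component with a trial-specific phase."""
    phase = 2 * math.pi * i / N_TRIALS  # evenly spread so the 10 Hz part averages out
    return [
        math.sin(2 * math.pi * 6 * k / FS_HZ)
        + math.sin(2 * math.pi * 10 * k / FS_HZ + phase)
        for k in range(N_SAMPLES)
    ]


trials = [make_trial(i) for i in range(N_TRIALS)]
induced = remove_evoked(trials)

if __name__ == "__main__":
    for f in (6, 10):
        print(f"{f} Hz  total: {mean_power(trials, f, FS_HZ):.4f}"
              f"  induced: {mean_power(induced, f, FS_HZ):.4f}")
```

In this toy case the phase-locked 6 Hz power disappears after subtraction while the 10 Hz power is unchanged. An effect that survives the subtraction, as reported for Fig. 3–figure supplement 2, is therefore attributed to induced rather than evoked activity.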

References:

Kalcher J, Pfurtscheller G. Discrimination between phase-locked and non-phase-locked event-related EEG activity. Electroencephalography and Clinical Neurophysiology 1995; 94(5): 381-4.

Cohen MX, Donner TH. Midfrontal conflict-related theta-band power reflects neural oscillations that predict behavior. Journal of Neurophysiology 2013; 110(12): 2752-63.

Figure 4D: can you exclude that the frontal activity is due to saccade artifacts? Only eye blink artifacts were reduced by the ICA approach. Trials with saccades should be identified in the MEG traces and rejected prior to further analysis.

We understand and appreciate the reviewer’s concern on the source of the activity modulations shown in Fig. 4D. We tried to minimise eye movements and saccades during the recording by presenting all figures at the centre of the screen, scaling all presented figures to a similar size, and presenting a white cross at the centre of the screen to prepare the participants for the onset of the stimuli. Despite this, participants may still have made eye movements and saccades during the recording. We used ICA to exclude the low-frequency, large-amplitude artefacts which can be related to either eye blinks or other large eye movements. However, this may not exclude artefacts related to miniature saccades. As shown in Fig. 4D, at the sensor level, the sensors with a significant difference between the negative and positive emotional valence conditions clustered around the frontal cortex, close to the eye area. However, we think this is not dominated by saccades for the following two reasons:

1.) The power spectrum of the saccadic spike artifact in MEG is characterized by a broadband peak in the gamma band from roughly 30 to 120 Hz (Yuval-Greenberg et al., 2008; Keren et al., 2010). In this study, the activity modulation we observed in the frontal sensors is limited to the theta/alpha frequency band, so it differs from the power spectrum of the saccadic spike artefact.

2.) The source of the saccadic spike artefacts in MEG measurements tends to be localized to the region of the extraocular muscles of both eyes (Carl et al., 2012). We used beamforming source localisation to identify the source of the activity modulation reported in Fig. 4D. This beamforming analysis identified the source to be in Brodmann areas 9 and 10 (shown in Fig. 5). This excludes the possibility that the activity modulation at the sensor level reported in Fig. 4D is due to saccades. In addition, Brodmann areas 9 and 10 have previously been associated with emotional stimulus processing (Bermpohl et al., 2006), and Brodmann area 9 in the left hemisphere has also been used as the target for repetitive transcranial magnetic stimulation (rTMS) as a treatment for drug-resistant depression (Cash et al., 2020). The source localisation results, together with previous literature on the function of the identified source area, suggest that the activity modulation we observed in the frontal cortex is very likely to be related to emotional stimulus processing.

References:

Yuval-Greenberg S, Tomer O, Keren AS, Nelken I, Deouell LY. Transient induced gamma-band response in EEG as a manifestation of miniature saccades. Neuron 2008; 58(3): 429-41.

Keren AS, Yuval-Greenberg S, Deouell LY. Saccadic spike potentials in gamma-band EEG: characterization, detection and suppression. NeuroImage 2010; 49(3): 2248-63.

Carl C, Acik A, Konig P, Engel AK, Hipp JF. The saccadic spike artifact in MEG. NeuroImage 2012; 59(2): 1657-67.

Bermpohl F, Pascual-Leone A, Amedi A, Merabet LB, Fregni F, Gaab N, et al. Attentional modulation of emotional stimulus processing: an fMRI study using emotional expectancy. Human Brain Mapping 2006; 27(8): 662-77.

Cash RFH, Weigand A, Zalesky A, Siddiqi SH, Downar J, Fitzgerald PB, et al. Using Brain Imaging to Improve Spatial Targeting of Transcranial Magnetic Stimulation for Depression. Biological Psychiatry 2020.

The coherence modulations in Fig 5 occur quite late in time compared to the power modulations in Fig 3 and 4. When discussing the results (in e.g. the abstract) it reads as if these findings are reflecting the same process. How can the two effects reflect the same process if the timing is so different?

As the reviewer pointed out correctly, the time window in which we observed the coherence modulations occurred quite late compared to the initial power modulations in the frontal cortex and the habenula (Fig. 4). There was another increase in theta band activity in the habenula even later, at around 3 seconds after stimulus onset, when the emotional figure had already disappeared. The emotional response is composed of a number of factors, two of which are the initial reactivity to an emotional stimulus and the subsequent recovery once the stimulus terminates or ceases to be relevant (Schuyler et al., 2014). We think the neural effects we observed in the three different time windows may reflect different underlying processes. We have discussed this in the ‘Discussion’:

"These activity changes at different time windows may reflect the different neuropsychological processes underlying emotion perception including identification and appraisal of emotional material, production of affective states, and autonomic response regulation and recovery (Phillips et al., 2003a). The later effects of increased theta activities in the habenula when the stimuli disappeared were also supported by other literature showing that there can be prolonged effects of negative stimuli in the neural structures involved in emotional processing (Haas et al., 2008; Puccetti et al., 2021). In particular, greater sustained patterns of brain activity in the medial prefrontal cortex when responding to blocks of negative facial expressions were associated with higher scores of neuroticism across participants (Haas et al., 2008). Slower amygdala recovery from negative images also predicts greater trait neuroticism, lower levels of likability of a set of social stimuli (neutral faces), and declined day-to-day psychological wellbeing (Schuyler et al., 2014; Puccetti et al., 2021)."

References:

Schuyler BS, Kral TR, Jacquart J, Burghy CA, Weng HY, Perlman DM, et al. Temporal dynamics of emotional responding: amygdala recovery predicts emotional traits. Social Cognitive and Affective Neuroscience 2014; 9(2): 176-81.

Phillips ML, Drevets WC, Rauch SL, Lane R. Neurobiology of emotion perception I: The neural basis of normal emotion perception. Biological Psychiatry 2003a; 54(5): 504-14.

Haas BW, Constable RT, Canli T. Stop the sadness: Neuroticism is associated with sustained medial prefrontal cortex response to emotional facial expressions. NeuroImage 2008; 42(1): 385-92.

Puccetti NA, Schaefer SM, van Reekum CM, Ong AD, Almeida DM, Ryff CD, et al. Linking Amygdala Persistence to Real-World Emotional Experience and Psychological Well-Being. Journal of Neuroscience 2021: JN-RM-1637-20.

Be explicit on the degrees of freedom in the statistical tests given that one subject was excluded from some of the tests.

We thank the reviewers for the comment. The number of samples used for each statistical analysis is stated in the title of the figures. We have now also added the degrees of freedom in the main text where parametric statistical tests such as t-tests or ANOVAs have been used. Where permutation tests (which do not have any degrees of freedom associated with them) are used, we have now added the number of samples for the permutation test.

Reviewer #2 (Public Review):

In this study, Huang and colleagues recorded local field potentials from the lateral habenula in patients with psychiatric disorders who recently underwent surgery for deep brain stimulation (DBS). The authors combined these invasive measurements with non-invasive whole-head MEG recordings to study functional connectivity between the habenula and cortical areas. Since the lateral habenula is believed to be involved in the processing of emotions, and negative emotions in particular, the authors investigated whether brain activity in this region is related to emotional valence. They presented pictures inducing negative and positive emotions to the patients and found that theta and alpha activity in the habenula and frontal cortex increases when patients experience negative emotions. Functional connectivity between the habenula and the cortex was likewise increased in this band. The authors conclude that theta/alpha oscillations in the habenula-cortex network are involved in the processing of negative emotions in humans.

Because DBS of the habenula is a new treatment tested in this cohort in the framework of a clinical trial, these are the first data of its kind. Accordingly, they are of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap.

In terms of community impact, I see the strengths of this paper in basic science rather than the clinical field. The authors demonstrate the involvement of theta oscillations in the habenula-prefrontal cortex network in emotion processing in the human brain. The potential of theta oscillations to serve as a marker in closed-loop DBS, as put forward by the authors, appears less relevant to me at this stage, given that the clinical effects and side-effects of habenula DBS are not known yet.

We thank the reviewers for the favourable comments about the implication of our study in basic science and about the value of our study in closing a knowledge gap. We agree that further studies would be required to make conclusions about the clinical effects and side-effects of habenula DBS.

The group-average MEG power spectrum (Fig. 4B) suggests that negative emotions lead to a sustained theta power increase and a similar effect, though possibly masked by a visual ERP, can be seen in the habenula (Fig. 3C). Yet the statistics identify brief elevations of habenula theta power at around 3 s (which is very late), a brief elevation of prefrontal power at time 0 or even before (Fig. 4C) and a brief elevation of habenula-MEG theta coherence around 1 s. It seems possible that this lack of consistency arises from a low signal-to-noise ratio. The data contain only 27 trials per condition on average and are contaminated by artifacts caused by the extension wires.

With regard to the nature of the activity modulation at short latency after stimulus onset, and whether this is an ERP or an oscillation: we have now investigated this. In summary, by analysing the ERP and removing the influence of the ERP from the total power spectra, we didn’t observe stimulus emotional valence related modulation in the ERP, and the modulation related to emotional valence in the pure induced (non-phase-locked) power spectra was similar to what we observed in the total power shown in Fig. 3. Therefore, we argue that the theta/alpha increase with negative emotional stimuli that we observed in both habenula and prefrontal cortex 0-500 ms after stimulus onset is not dominated by a visual or other ERP.

With regard to the signal-to-noise ratio from only 27 trials per condition on average per participant: we cleaned the data by removing the trials with obvious artefacts, characterised by values in the time domain exceeding 5 times the standard deviation and by increased activity across all frequency bands in the frequency domain. After removing the trials with artefacts, we have 27 trials per condition per subject on average. We agree that 27 trials per condition on average is not a high number, and increasing the number of trials would further increase the signal-to-noise ratio. However, our studies with EEG recordings and LFP recordings from externalised patients have shown that 30 trials were enough to identify a reduction in the amplitude of post-movement beta oscillations at the beginning of visuomotor adaptation in the motor cortex and STN (Tan et al., 2014a; Tan et al., 2014b). These results on motor-error-related modulation of post-movement beta have been replicated by studies from other groups. In Tan et al. 2014b, with simultaneous EEG and STN LFP measurements and a similar number of trials (around 30), we also quantified the time-course of STN-motor cortex coherence during voluntary movements. This pattern has also been replicated in a separate study from another group with around 50 trials per participant (Talakoub et al., 2016). In addition, a similar behavioural paradigm (passive figure viewing) has been used in two previous studies with LFP recordings from the STN in different patient groups (Brucke et al., 2007; Huebl et al., 2014). In both studies, a similar number of trials per condition (around 27) was used, and the authors identified meaningful modulation of STN activity by emotional stimuli. Therefore, we think the number of trials per condition was sufficient to identify emotional-valence-induced differences in the LFPs in this paradigm.
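As an illustration of the time-domain rejection criterion mentioned above, the following sketch drops trials whose peak deviation exceeds 5 standard deviations. The response does not say over which data the standard deviation was computed, so pooling all trials is an assumption here, as are the function name and the synthetic data; the frequency-domain criterion is not sketched.

```python
import numpy as np

def reject_artifact_trials(trials, n_sd=5.0):
    """Drop trials whose peak absolute deviation exceeds n_sd pooled SDs.

    trials: array of shape (n_trials, n_samples).
    Assumption: mean and SD are pooled over all trials and samples.
    Returns the retained trials and the boolean keep-mask.
    """
    trials = np.asarray(trials, dtype=float)
    centered = trials - trials.mean()
    threshold = n_sd * trials.std()
    keep = np.abs(centered).max(axis=1) <= threshold
    return trials[keep], keep

# Hypothetical data: 30 noise trials, one of them with a large spike.
rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=(30, 500))
data[3, 250] = 40.0
clean, keep = reject_artifact_trials(data)
```

A pooled standard deviation is itself inflated by large artefacts, so a robust scale estimate such as the median absolute deviation may be preferable in practice.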

We agree that the measurement of coherence can be more susceptible to noise and can suffer from the reduced signal-to-noise ratio in MEG recordings. Hirschmann et al. (2013) used 5 minutes of resting recording and 5 minutes of movement recording from 10 PD patients to quantify movement-related changes in STN-cortical coherence and how this was modulated by levodopa. Litvak et al. (2012) identified movement-related changes in the coherence between STN LFP and motor cortex using simultaneous STN LFP and MEG recordings from 17 PD patients, with 20 trials on average per participant per condition. With similar methods, van Wijk et al. (2017) used recordings from 9 patients with around 29 trials on average per hand per condition, and they found that low-beta cortico-pallidal coherence decreases during movement. So the number of trials per condition per participant used in this study is comparable to previous studies.

The DBS extension wires do reduce the signal-to-noise ratio in the MEG recording. Therefore, the spatiotemporal Signal Space Separation (tSSS) method (Taulu and Simola, 2006) implemented in the MaxFilter software (Elekta Oy, Helsinki, Finland) was applied in this study to suppress strong magnetic artifacts caused by the extension wires. This method has been shown to work well in removing magnetic artifacts and movement artifacts from MEG data in our previous studies (Cao et al., 2019; Cao et al., 2020). In addition, the beamforming method proposed by several studies (Litvak et al., 2010; Hirschmann et al., 2011; Litvak et al., 2011) was used in this study. Litvak et al. (2010) described in detail the artifacts caused by DBS extension wires and demonstrated that beamforming effectively suppresses these artifacts and thereby enables localization of cortical sources coherent with the deep brain nucleus. We have now added more details and these references about the data cleaning and the beamforming method in the main text. With the beamforming method, we did observe the standard movement-related modulation in the beta frequency band in the motor cortex with 9 trials of button-press movements, shown in the following figure for one patient as an example (Figure 5–figure supplement 1). This suggests that the beamforming method did work well to suppress the artefacts and helped to localise the source with a low number of trials. The figure on movement-related modulation in the motor cortex in the MEG signals has now been added as a supplementary figure to demonstrate the effect of the beamforming.

Figure 5–figure supplement 1. (A) Time-frequency maps of MEG activity for right hand button press at sensor level from one participant (Case 8). (B) DICS beamforming source reconstruction of the areas with movement-related oscillation changes in the range of 12-30 Hz. The peak power was located in the left M1 area, MNI coordinate [-37, -12, 43].

References:

Tan H, Jenkinson N, Brown P. Dynamic neural correlates of motor error monitoring and adaptation during trial-to-trial learning. Journal of Neuroscience 2014a; 34(16): 5678-88.

Tan H, Zavala B, Pogosyan A, Ashkan K, Zrinzo L, Foltynie T, et al. Human subthalamic nucleus in movement error detection and its evaluation during visuomotor adaptation. Journal of Neuroscience 2014b; 34(50): 16744-54.

Talakoub O, Neagu B, Udupa K, Tsang E, Chen R, Popovic MR, et al. Time-course of coherence in the human basal ganglia during voluntary movements. Scientific Reports 2016; 6: 34930.

Brucke C, Kupsch A, Schneider GH, Hariz MI, Nuttin B, Kopp U, et al. The subthalamic region is activated during valence-related emotional processing in patients with Parkinson's disease. European Journal of Neuroscience 2007; 26(3): 767-74.

Huebl J, Spitzer B, Brucke C, Schonecker T, Kupsch A, Alesch F, et al. Oscillatory subthalamic nucleus activity is modulated by dopamine during emotional processing in Parkinson's disease. Cortex 2014; 60: 69-81.

Hirschmann J, Ozkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Differential modulation of STN-cortical and cortico-muscular coherence by movement and levodopa in Parkinson's disease. NeuroImage 2013; 68: 203-13.

Litvak V, Eusebio A, Jha A, Oostenveld R, Barnes G, Foltynie T, et al. Movement-related changes in local and long-range synchronization in Parkinson's disease revealed by simultaneous magnetoencephalography and intracranial recordings. Journal of Neuroscience 2012; 32(31): 10541-53.

van Wijk BCM, Neumann WJ, Schneider GH, Sander TH, Litvak V, Kuhn AA. Low-beta cortico-pallidal coherence decreases during movement and correlates with overall reaction time. NeuroImage 2017; 159: 1-8.

Taulu S, Simola J. Spatiotemporal signal space separation method for rejecting nearby interference in MEG measurements. Physics in Medicine and Biology 2006; 51(7): 1759-68.

Cao C, Huang P, Wang T, Zhan S, Liu W, Pan Y, et al. Cortico-subthalamic Coherence in a Patient With Dystonia Induced by Chorea-Acanthocytosis: A Case Report. Frontiers in Human Neuroscience 2019; 13: 163.

Cao C, Li D, Zhan S, Zhang C, Sun B, Litvak V. L-dopa treatment increases oscillatory power in the motor cortex of Parkinson's disease patients. NeuroImage Clinical 2020; 26: 102255.

Litvak V, Eusebio A, Jha A, Oostenveld R, Barnes GR, Penny WD, et al. Optimized beamforming for simultaneous MEG and intracranial local field potential recordings in deep brain stimulation patients. NeuroImage 2010; 50(4): 1578-88.

Litvak V, Jha A, Eusebio A, Oostenveld R, Foltynie T, Limousin P, et al. Resting oscillatory cortico-subthalamic connectivity in patients with Parkinson's disease. Brain 2011; 134(Pt 2): 359-74.

Hirschmann J, Ozkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Distinct oscillatory STN-cortical loops revealed by simultaneous MEG and local field potential recordings in patients with Parkinson's disease. NeuroImage 2011; 55(3): 1159-68.

I doubt that the correlation between habenula power and habenula-MEG coherence (Fig. 6C) is informative of emotion processing. First, power and coherence in close-by time windows are likely to be correlated irrespective of the task/stimuli. Second, if meaningful, one would expect the strongest correlation for the negative condition, as this is the only condition with an increase of theta coherence and a subsequent increase of theta power in the habenula. This, however, does not appear to be the case.

The authors included the factors valence and arousal in their linear model and found that only valence correlated with electrophysiological effects. I suspect that arousal and valence scores are highly correlated. When fed with informative yet highly correlated variables, the significance of individual input variables becomes difficult to assess in many statistical models. Hence, I am not convinced that valence matters but arousal does not.

For the correlation shown in Fig. 6C, we used linear mixed-effect modelling (‘fitlme’ in Matlab) with the recorded subjects as random effects to investigate the correlation between the habenula power and the habenula-MEG coherence in an earlier window, while considering all trials together. Therefore, the values reported in the main text and in the figure (k = 0.2434 ± 0.1031, p = 0.0226, R² = 0.104) reflect the within-subject correlation that is consistent across all measured subjects. The correlation is likely to be mediated by the emotional valence condition, as negative emotional stimuli tend to be associated with both high habenula-MEG coherence and high theta power in the later time window.

The arousal scores are significantly different for the three valence conditions as shown in Fig. 1B. However, the arousal scores and the valence scores are not monotonically correlated, as shown in the following figure (Fig. S2). The emotionally neutral figures have the lowest arousal value, but have a valence value sitting between those of the negative figures and the positive figures. We have now added the following sentence in the main text:

"This nonlinear and non-monotonic relationship between arousal scores and the emotional valence scores allowed us to differentiate the effect of the valence from arousal."

Table 2 in the main text shows the results of the linear mixed-effect modelling with the neural signal as the dependent variable and the valence and arousal scores as independent variables. Because of the non-linear and non-monotonic relationship between the valence and arousal scores, we think the significance of individual input variables is valid in this statistical model. We have now added a new figure (shown below, Fig. 7) with scatter plots showing the relationship between the electrophysiological signal and the arousal and emotional valence scores separately using Spearman’s partial correlation analysis. In each scatter plot, each dot indicates the average measurement from one participant in one emotional valence condition. As shown in the following figure, the electrophysiological measurements correlated linearly with the valence scores, but not with the arousal scores. However, the statistics reported in this figure considered all the dots together, whereas the linear mixed-effect modelling takes into account the interdependency of the measurements from the same participant. So the results reported in the main text using linear mixed-effect modelling are statistically more valid, and the supplementary figure below serves to illustrate the relationship.

Figure S2. (A) Averaged valence and arousal ratings (mean ± SD) for figures of the three emotional conditions. (B) Scatter plots showing the relationship between arousal and valence scores for each emotional condition for each participant.

Figure 7. Scatter plots showing how the early theta/alpha band power increase in the frontal cortex (A), the theta/alpha band frontal cortex-habenula coherence (B) and the theta band power increase in the habenula (C) changed with emotional valence (left column) and arousal (right column). Each dot shows the average of one participant in each categorical valence condition; these are also the source data of the multilevel modelling results presented in Table 2. The R and p values in the figure are the results of partial correlation considering all data points together.
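A Spearman partial correlation of the kind reported in Figure 7 can be sketched as follows: rank-transform all variables, regress the ranks of the covariate out of the two variables of interest, and correlate the residuals. This is a generic illustration under that standard definition; the function names and data are hypothetical and it is not the authors' code.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def partial_spearman(x, y, covariate):
    """Spearman correlation between x and y, controlling for one covariate."""
    rx, ry, rz = (average_ranks(v) for v in (x, y, covariate))
    design = np.column_stack([np.ones_like(rz), rz])
    # Residuals of the ranks after removing the linear effect of the covariate.
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Hypothetical example: the signal is a monotonic function of valence.
valence = np.arange(1.0, 11.0)
arousal = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
signal = valence ** 3
r_valence = partial_spearman(valence, signal, arousal)
```

As the response notes, a correlation computed over all dots ignores that several dots come from the same participant, which is why the mixed-effect model remains the primary analysis.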

Page 8: "The time-varying coherence was calculated for each trial". This is confusing because coherence quantifies the stability of a phase difference over time, i.e. it is a temporal average, not defined for individual trials. It has also been used to describe the phase difference stability over trials rather than time, and I assume this is the method applied here. Typically, the greatest coherence values coincide with event-related power increases, which is why I am surprised to see maximum coherence at 1s rather than immediately post-stimulus.

We thank the reviewer for pointing out this incorrect description. As the reviewer pointed out correctly, the method we used describes the phase difference stability over trials rather than time. We have now clarified how coherence was calculated and added more details in the methods:

"The time-varying cross-trial coherence between each MEG sensor and the habenula LFP was first calculated for each emotional valence condition. For this, time-frequency auto- and cross-spectral densities in the theta/alpha frequency band (5-10 Hz) between the habenula LFP and each MEG channel at sensor level were calculated using the wavelet transform-based approach from -2000 to 4000 ms for each trial with 1 Hz steps using the Morlet wavelet and cycle number of 6. Cross-trial coherence spectra for each LFP-MEG channel combination were calculated for each emotional valence condition for each habenula using the function ‘ft_connectivityanalysis’ in Fieldtrip (version 20170628). Stimulus-related changes in coherence were assessed by expressing the time-resolved coherence spectra as a percentage change compared to the average value in the -2000 to -200 ms (pre-stimulus) time window for each frequency."
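The computation described in this passage can be sketched in NumPy as follows. This is not the FieldTrip implementation: the wavelet normalisation, the Gaussian width convention (standard deviation of C/(2πF) seconds), the sampling rate and the random test data are all assumptions made for illustration.

```python
import numpy as np

def morlet_tf(trials, fs, freq, n_cycles=6):
    """Complex Morlet coefficients at one frequency, shape (n_trials, n_samples)."""
    sigma_t = n_cycles / (2 * np.pi * freq)  # Gaussian SD in seconds (assumed convention)
    tw = np.arange(-3 * sigma_t, 3 * sigma_t, 1 / fs)
    wavelet = np.exp(2j * np.pi * freq * tw) * np.exp(-tw**2 / (2 * sigma_t**2))
    wavelet /= np.abs(wavelet).sum()
    return np.array([np.convolve(tr, wavelet, mode="same") for tr in trials])

def cross_trial_coherence(x_tf, y_tf):
    """Coherence over trials at every time point (values between 0 and 1)."""
    cross = (x_tf * np.conj(y_tf)).mean(axis=0)   # cross-spectral density
    pxx = (np.abs(x_tf) ** 2).mean(axis=0)        # auto-spectral densities
    pyy = (np.abs(y_tf) ** 2).mean(axis=0)
    return np.abs(cross) / np.sqrt(pxx * pyy)

def percent_change(coh, times, baseline=(-2.0, -0.2)):
    """Express coherence as % change from the pre-stimulus baseline mean."""
    mask = (times >= baseline[0]) & (times <= baseline[1])
    base = coh[mask].mean()
    return 100 * (coh - base) / base

# Hypothetical data: 20 trials, -2 s to 4 s at 200 Hz, for an LFP and one MEG sensor.
fs = 200
times = np.arange(6 * fs) / fs - 2.0
rng = np.random.default_rng(0)
lfp = rng.normal(size=(20, times.size))
meg = rng.normal(size=(20, times.size))
lfp_tf = morlet_tf(lfp, fs, freq=7)
meg_tf = morlet_tf(meg, fs, freq=7)
coh = cross_trial_coherence(lfp_tf, meg_tf)
coh_change = percent_change(coh, times)
```

Because the average is taken over trials at each time point, the measure quantifies how stable the phase difference (weighted by amplitude) is across trials, which is the sense of coherence clarified in the response.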

In the Morlet wavelet analysis we used here, the cycle number (C) determines the temporal resolution and frequency resolution for each frequency (F). The spectral bandwidth at a given frequency F is equal to 2F/C, while the wavelet duration is equal to C/(F·π). We used a cycle number of 6. For theta band activities around 5 Hz, we will have a spectral bandwidth of 2 × 5/6 ≈ 1.7 Hz and a wavelet duration of 6/(5π) ≈ 0.38 s = 380 ms.
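The arithmetic above can be checked with a few lines of Python. The function name is hypothetical; the two formulas are the ones stated in the paragraph.

```python
import math

def morlet_resolution(freq_hz, n_cycles):
    """Spectral bandwidth (Hz) and wavelet duration (s) as defined above."""
    bandwidth_hz = 2 * freq_hz / n_cycles
    duration_s = n_cycles / (freq_hz * math.pi)
    return bandwidth_hz, duration_s

bw, dur = morlet_resolution(5.0, 6)
print(f"{bw:.2f} Hz, {dur * 1000:.0f} ms")  # prints "1.67 Hz, 382 ms"
```

With a fixed cycle number, the same function shows the trade-off across the band: at 10 Hz the bandwidth doubles to about 3.3 Hz while the duration halves to about 190 ms.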

As the reviewer noticed, we observed increased activities across a wide frequency band in both habenula and the prefrontal cortex within 500 ms after stimuli onset. But the increase of cross-trial coherence starts at around 300 ms. The increase of coherence in a time window without increase of power in either of the two structures indicates a phase difference stability across trials in the oscillatory activities from the two regions, and this phase difference stability across trials was not secondary to power increase.

Reviewer #3 (Public Review):

This paper describes the oscillatory activity of the habenula using local field potentials, both within the region and, through the use of MEG, in connection to the prefrontal cortex. The characteristics of this activity were found to vary with the emotional valence but not with arousal. Shedding light on this is relevant, because the habenula is a promising target for deep brain stimulation.

In general, because I am not much on top of the literature on the habenula, I find it difficult to judge the novelty and the impact of this study. What I can say is that I do find the paper well-written and very clear; and the methods, although quite basic (which is not bad), are sound and rigorous.

We thank the reviewer for the positive comments about the potential implication of our study and on the methods we used.

On the less positive side, even though I am aware that in this type of study it is difficult to have high N, the very low N in this case makes me worry about the robustness and replicability of the results. I'm sure I have missed it and it's specified somewhere, but why is N different for the different figures? Is it because only 8 people had MEG? The number of trials also seems somewhat low. Therefore, I feel the authors perhaps need to make an effort to make up for the small number of subjects in order to add confidence to the results. I would strongly recommend bootstrapping the statistical analysis and extracting non-parametric confidence intervals instead of showing parametric standard errors wherever appropriate. When doing that, it must be taken into account that each pair of habenulae belongs to the same person; i.e. one bootstraps the subjects, not the habenulae.
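The subject-level bootstrap recommended here can be sketched as follows: subjects are resampled with replacement and both habenulae of a drawn subject always stay together. The function name, the percentile interval and the synthetic data are illustrative assumptions.

```python
import numpy as np

def subject_bootstrap_ci(values, subject_ids, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean, resampling whole subjects.

    values: one measurement per habenula; subject_ids: owner of each value.
    """
    values = np.asarray(values, dtype=float)
    subject_ids = np.asarray(subject_ids)
    groups = [values[subject_ids == s] for s in np.unique(subject_ids)]
    rng = np.random.default_rng(seed)
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        picked = rng.integers(0, len(groups), size=len(groups))
        # Both habenulae of a picked subject enter the resample together.
        boot_means[b] = np.concatenate([groups[i] for i in picked]).mean()
    low, high = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high

# Hypothetical data: 9 subjects with 2 habenulae each.
rng = np.random.default_rng(42)
subject_ids = np.repeat(np.arange(9), 2)
subject_effect = rng.normal(1.0, 0.5, size=9)
values = np.repeat(subject_effect, 2) + rng.normal(0.0, 0.2, size=18)
ci_low, ci_high = subject_bootstrap_ci(values, subject_ids)
```

Resampling the 18 habenulae independently would treat correlated measurements as independent and typically yield intervals that are too narrow.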

We do understand and appreciate the concern of the reviewer on the low sample numbers due to the strict recruitment criteria for this very early stage clinical trial: 9 patients for bilateral habenula LFPs, and 8 patients with good quality MEGs. Some information to justify the number of trials per condition for each participant has been provided in the reply to the Detailed Comments 1 from Reviewer 2. The sample number used in each analysis was included in the figures and in the main text.

We have used a non-parametric cluster-based permutation approach (Maris and Oostenveld, 2007) for all the main results shown in Fig. 3-5. Once the clusters (time window and frequency band) with significant differences between emotional valence conditions had been identified, a parametric statistical test was applied to the average values of the clusters to show the direction of the difference. These parametric statistics are secondary to the main non-parametric permutation test.

In addition, the DICS beamforming method was applied to localize cortical sources exhibiting stimuli-related power changes and cortical sources coherent with deep brain LFPs for each subject for positive and negative emotional valence conditions respectively. After source analysis, source statistics over subjects was performed. Non-parametric permutation testing with or without cluster-based correction for multiple comparisons was applied to statistically quantify the differences in cortical power source or coherence source between negative and positive emotional stimuli.

References:

Maris E, Oostenveld R. Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods 2007; 164(1): 177-90.

Related to this point, the results in Figure 6 seem quite noisy, because interactions (i.e. coherence) are harder to estimate and N is low. For example, I have to make an effort of optimism to believe that Fig 6A is not just noise, and the result in Fig 6C is also a bit weak and perhaps driven by the blue point at the bottom. My read is that the authors didn't do permutation testing here, and just a parametric linear-mixed effect testing. I believe the authors should embed this into permutation testing to make sure that the extremes are not driving the current p-value.

We have now quantified the coherence between frontal cortex-habenula and occipital cortex-habenula separately (please see more details in the reply to Reviewer 2, Recommendations for the authors 6). The new analysis showed that the increase in the theta/alpha band coherence around 1 s after the negative stimuli was only observed between prefrontal cortex-habenula and not between occipital cortex-habenula. This supports the argument that Fig. 6A is not just noise.

2. Reviewer #3 (Public Review):

This paper describes the oscillatory activity of the habenula using local field potentials, both within the region and, through the use of MEG, in connection to the prefrontal cortex. The characteristics of this activity were found to vary with the emotional valence but not with arousal. Shedding light on this is relevant, because the habenula is a promising target for deep brain stimulation.

In general, because I am not much on top of the literature on the habenula, I find it difficult to judge the novelty and the impact of this study. What I can say is that I do find the paper well-written and very clear; and the methods, although quite basic (which is not bad), are sound and rigorous.

On the less positive side, even though I am aware that in this type of study it is difficult to have high N, the very low N in this case makes me worry about the robustness and replicability of the results. I'm sure I have missed it and it's specified somewhere, but why is N different for the different figures? Is it because only 8 people had MEG? The number of trials also seems somewhat low. Therefore, I feel the authors perhaps need to make an effort to make up for the small number of subjects in order to add confidence to the results. I would strongly recommend bootstrapping the statistical analysis and extracting non-parametric confidence intervals instead of showing parametric standard errors wherever appropriate. When doing that, it must be taken into account that each pair of habenulae belongs to the same person; i.e. one bootstraps the subjects, not the habenulae.

Related to this point, the results in Figure 6 seem quite noisy, because interactions (i.e. coherence) are harder to estimate and N is low. For example, I have to make an effort of optimism to believe that Fig 6A is not just noise, and the result in Fig 6C is also a bit weak and perhaps driven by the blue point at the bottom. My read is that the authors didn't do permutation testing here, and just used parametric linear mixed-effects testing. I believe the authors should embed this into permutation testing to make sure that the extremes are not driving the current p-value.
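As a rough illustration of the permutation idea, the sketch below runs an exact sign-flip permutation test on per-subject condition differences. This is a simpler, swapped-in statistic (a paired mean difference), not the linear mixed-effects model used in the manuscript; the function name and the input values are hypothetical.

```python
from itertools import product


def sign_flip_p_value(subject_diffs):
    """Exact two-sided sign-flip permutation p-value for a mean paired difference."""
    n = len(subject_diffs)
    observed = abs(sum(subject_diffs) / n)
    count = 0
    total = 0
    # Under the null hypothesis the sign of each subject's difference is
    # arbitrary, so enumerate all 2**n sign assignments.
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, subject_diffs)) / n)
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            count += 1
        total += 1
    return count / total


# Hypothetical per-subject differences (e.g. negative minus positive condition).
p_value = sign_flip_p_value([1.0, 2.0, 3.0, 4.0])
```

With around nine subjects the full enumeration is only 512 sign patterns, so no random sampling of permutations is needed.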

3. Reviewer #2 (Public Review):

In this study, Huang and colleagues recorded local field potentials from the lateral habenula in patients with psychiatric disorders who recently underwent surgery for deep brain stimulation (DBS). The authors combined these invasive measurements with non-invasive whole-head MEG recordings to study functional connectivity between the habenula and cortical areas. Since the lateral habenula is believed to be involved in the processing of emotions, and negative emotions in particular, the authors investigated whether brain activity in this region is related to emotional valence. They presented pictures inducing negative and positive emotions to the patients and found that theta and alpha activity in the habenula and frontal cortex increases when patients experience negative emotions. Functional connectivity between the habenula and the cortex was likewise increased in this band. The authors conclude that theta/alpha oscillations in the habenula-cortex network are involved in the processing of negative emotions in humans.

Because DBS of the habenula is a new treatment tested in this cohort in the framework of a clinical trial, these are the first data of its kind. Accordingly, they are of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap.

In terms of community impact, I see the strengths of this paper in basic science rather than the clinical field. The authors demonstrate the involvement of theta oscillations in the habenula-prefrontal cortex network in emotion processing in the human brain. The potential of theta oscillations to serve as a marker in closed-loop DBS, as put forward by the authors, appears less relevant to me at this stage, given that the clinical effects and side-effects of habenula DBS are not known yet.

The group-average MEG power spectrum (Fig. 4B) suggests that negative emotions lead to a sustained theta power increase and a similar effect, though possibly masked by a visual ERP, can be seen in the habenula (Fig. 3C). Yet the statistics identify brief elevations of habenula theta power at around 3 s (which is very late), a brief elevation of prefrontal power at time 0 or even before (Fig. 4C) and a brief elevation of habenula-MEG theta coherence around 1 s. It seems possible that this lack of consistency arises from a low signal-to-noise ratio. The data contain only 27 trials per condition on average and are contaminated by artifacts caused by the extension wires.

I doubt that the correlation between habenula power and habenula-MEG coherence (Fig. 6C) is informative of emotion processing. First, power and coherence in close-by time windows are likely to be correlated irrespective of the task/stimuli. Second, if meaningful, one would expect the strongest correlation for the negative condition, as this is the only condition with an increase of theta coherence and a subsequent increase of theta power in the habenula. This, however, does not appear to be the case.

The authors included the factors valence and arousal in their linear model and found that only valence correlated with electrophysiological effects. I suspect that arousal and valence scores are highly correlated. When fed with informative yet highly correlated variables, the significance of individual input variables becomes difficult to assess in many statistical models. Hence, I am not convinced that valence matters but arousal does not.
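One quick way to gauge this concern is to compute the correlation between the two predictors and the resulting variance inflation factor (VIF), which for exactly two predictors equals 1 / (1 - r^2). The sketch below assumes per-picture valence and arousal scores are available as two lists; the function names and the scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x, y):
    """Variance inflation factor when a model has exactly two predictors."""
    r = pearson_r(x, y)
    if abs(r) >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r * r)


# Hypothetical scores: strongly correlated predictors inflate the variance
# of both coefficient estimates.
vif = vif_two_predictors([1, 2, 3], [1, 2, 4])
```

A VIF close to 1 would suggest that the valence and arousal effects can be separated; a large value would support the concern that their individual significance is hard to interpret.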

Page 8: "The time-varying coherence was calculated for each trial". This is confusing because coherence quantifies the stability of a phase difference over time, i.e. it is a temporal average, not defined for individual trials. It has also been used to describe the phase difference stability over trials rather than time, and I assume this is the method applied here. Typically, the greatest coherence values coincide with event-related power increases, which is why I am surprised to see maximum coherence at 1s rather than immediately post-stimulus.
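For reference, one common across-trial definition of time-varying coherence is written out below. Whether this is the exact estimator used in the manuscript is an assumption. Here $X_k(f,t)$ and $Y_k(f,t)$ denote the complex time-frequency coefficients of the habenula LFP and the MEG signal in trial $k$, and $K$ is the number of trials.

```latex
C_{xy}(f,t) =
  \frac{\left| \sum_{k=1}^{K} X_k(f,t)\, Y_k^{*}(f,t) \right|}
       {\sqrt{\sum_{k=1}^{K} \left| X_k(f,t) \right|^{2} \;
              \sum_{k=1}^{K} \left| Y_k(f,t) \right|^{2}}}
```

Because the sums run over trials, this quantity is defined only for a set of trials at each time-frequency point, not for a single trial, which is the source of the confusion noted above.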

4. Reviewer #1 (Public Review):

The study by Huang et al. reports on direct recordings (using DBS electrodes) from the human habenula in conjunction with MEG recordings in 9 patients. Participants were shown emotional pictures. The key finding was a transient increase in theta/alpha activity with negative compared to positive stimuli. Furthermore, there was a later increase in oscillatory coupling in the same band. These are important data, as there are few reports of direct recordings from the habenula together with MEG in humans performing cognitive tasks. The findings do provide novel insight into the network dynamics associated with the processing of emotional stimuli and, in particular, the role of the habenula.

Recommendations:

How can we be sure that the recordings from the habenula are not contaminated by volume conduction; i.e. signals from neighbouring regions? I do understand that bipolar signals were considered for the DBS electrode leads. However, high-frequency power (gamma band and up) is often associated with spiking/MUA and considered less prone to volume conduction. I propose to also investigate the high-frequency gamma band activity recorded from the bipolar DBS electrodes and relate it to the emotional faces. This will provide more certainty that the measured activity indeed stems from the habenula.

Figure 3: the alpha/theta band activity is very transient and not band-limited. Why refer to this as oscillatory? Can you exclude that the TFRs of power reflect the spectral power of ERPs rather than modulations of oscillations? I propose to also calculate the ERPs and perform the TFR of power on those. This might result in a re-interpretation of the early effects in theta/alpha band.

Figure 4D: can you exclude that the frontal activity is due to saccade artifacts? Only eye blink artifacts were reduced by the ICA approach. Trials with saccades should be identified in the MEG traces and rejected prior to further analysis.

The coherence modulations in Fig 5 occur quite late in time compared to the power modulations in Fig 3 and 4. When discussing the results (e.g. in the abstract) it reads as if these findings reflect the same process. How can the two effects reflect the same process if the timing is so different?

Be explicit on the degrees of freedom in the statistical tests given that one subject was excluded from some of the tests.

5. Evaluation Summary:

Since DBS of the habenula is a new treatment, these are the first data of its kind and potentially of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap. This paper is of interest to neuroscientists studying emotions and clinicians treating psychiatric disorders. Specifically the paper shows that the habenula is involved in processing of negative emotions and that it is synchronized to the prefrontal cortex in the theta band. These are important insights into the electrophysiology of emotion processing in the human brain.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

33. www.medrxiv.org www.medrxiv.org
1. Reviewer #3 (Public Review):

The authors tested HIV-1 DNA and RNA levels in two large cohorts of ART-treated HIV-1 patients to evaluate possible differences in HIV-1 reservoir cell markers between NNRTI- and PI-based ART regimens. This question is relevant since millions of people living with HIV are currently receiving HIV treatment with these agents. Their major finding is that NNRTI-based treatment is associated with reduced cell-associated HIV-1 RNA and DNA levels; this finding is not entirely novel and well in line with a number of previous observations. The strengths of the study are the large clinical cohorts for which detailed clinical and demographic data are available. The analysis of HIV-1 DNA and RNA is informative, but the assays used do not distinguish between replication-competent and defective proviral species; this is appropriately identified as a limitation of this work. The authors do not address possible immunological consequences of higher HIV DNA levels in PI-treated patients: is this associated with higher levels of inflammatory markers? In addition, it is possible that higher levels of cell-associated HIV-1 RNA may stimulate cell-intrinsic innate (type I IFN-mediated) immunity in PI-treated patients, an aspect that the authors do not address. In the absence of such additional immunological data, it is difficult to assess the true significance and importance of the described observations.

2. Reviewer #2 (Public Review):

This is a well-written study that will be of interest to many investigators working in the field of HIV persistence during ART. The strengths of the study include the analysis of samples from two relatively large cohorts of individuals (n= 100 and 124) and the use of multivariable models to adjust for numerous parameters. One weakness is the fact that the authors do not consider alternative models that may explain their results. The data is important but should not be overinterpreted, because it does not demonstrate that NNRTI have a better ability to suppress HIV replication. It shows that NNRTI usage is associated with lower levels of HIV persistence markers but does not provide a mechanistic explanation for that (and should not attempt to do so, at least not in the abstract). Overall, this is a well-conducted and important study, with new findings that have potential clinical implications.

3. Reviewer #1 (Public Review):

The authors examine measures of viral reservoir to understand how different antiviral treatment regimens impact residual virus in HIV infection. They find that NNRTI-based treatments are associated with lower viral reservoirs than PI-based regimens, suggesting they may have some advantage at reducing HIV levels long term.

4. Evaluation Summary:

This study addresses how antiviral treatment regimens impact persistence of an HIV reservoir in individuals who are treated for a long period. The authors examine measures of viral reservoir to understand how different antiviral treatment regimens impact residual virus in HIV infection. They find that NNRTI-based treatments are associated with lower viral reservoirs and better viral suppression than PI-based regimens, suggesting they may have some advantage at reducing HIV levels long term.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

34. www.biorxiv.org www.biorxiv.org
1. Reviewer #2 (Public Review):

Endometrial fibrosis is the most common reason for infertility, yet its underlying mechanism remains largely unknown. Although some progress has been made on the pathogenesis of endometrial fibrosis, the role of circRNAs in this process remains elusive. In this investigation, Song et al. propose a novel mechanism in which increased epithelial circPTPN12 reduces miR-21-5p, which contributes to upregulation of ΔNp63α and induces the epithelial-mesenchymal transition (EMT) of EECs (EEC-EMT). There are several interesting findings in this manuscript: 1) hundreds of miRNAs are differentially expressed between control and endometrial fibrosis; 2) miR-21-5p is mainly located in epithelial cells in normal endometrium; 3) some circRNAs are also significantly changed between control and IUA; 4) functional studies reveal that circPTPN12 is a critical ceRNA for miR-21-5p; 5) in vivo evidence from the established animal model also shows that circPTPN12-miR-21-5p participates in the EMT process. Although the authors provide comprehensive evidence to support their hypothesis, some minor concerns were raised during review of this manuscript.

2. Reviewer #1 (Public Review):

The study by Song and colleagues explores the role of circRNAs in fibrosis of the endometrium. Endometrial cells from patients with and without fibrosis were subjected to expression profiling analysis, and circPTPN12 and miR-21-5p were found to differ strongly in fibrotic endometrium, with circPTPN12 acting as an inhibitory factor for miR-21-5p. Through the use of various molecular approaches, the authors further show that miR-21-5p inhibition results in upregulation of ΔNp63α, a transcription factor that induces EMT. The role of circPTPN12 was also confirmed in vivo using a mouse model of mechanically induced endometrial fibrosis. The authors concluded that targeting the circPTPN12/miR-21-5p/∆Np63α pathway may be a therapeutic strategy for endometrial fibrosis.

The authors clearly and convincingly show the involvement of circPTPN12/miR-21-5p/∆Np63α in EMT and its potential involvement in endometrial fibrosis. Whether or not this can be a therapeutic target is too preliminary to say at this point. First, the in vivo experiments confirm the link between circPTPN12/miR-21-5p/∆Np63α at the RNA level only (p63), and it would be more convincing to see protein data as well. The involvement of p63 in the process remains a little elusive in this paper. In addition, if the authors believe this pathway can be a real future target to treat endometrial fibrosis, they could better contextualise such a statement and specifically describe what kinds of therapeutic intervention they have in mind, such as regression or prevention of fibrosis. These should be tested in vitro and in vivo. More evidence of the involvement of circPTPN12/miR-21-5p/∆Np63α and the correlation between the three players using clinical material is also necessary.

3. Evaluation Summary:

The study by Song and colleagues explores the role of circRNAs in fibrosis of the endometrium. The paper is of interest for scientists working in the field of endometrial fibrosis and most likely can have implications for other endometrial disorders characterised by fibrotic tissues. The study unravels the molecular mechanism underlying the disease and the thorough experimental part fully supports the authors' claims.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

35. www.medrxiv.org www.medrxiv.org
1. Author Response:

Evaluation Summary:

The paper describes an algorithm that combines epidemiological and sequence data to provide a rapid assessment of the probability of healthcare-associated infections among hospital onset SARS-CoV-2 infections, that also may be associated with outbreak events. There is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals. It is questionable to what extent the tool presented is generalisable to medical facilities outside of the specific data rich settings considered here, or if the tool is useful for prospective analyses. This study would be of interest to specialists working in hospital infection prevention, with more limited further interest.

We thank eLife for the commentary on our work. We agree that there is a need for robust prospective evaluation of routine viral sequencing of SARS-CoV-2 for Infection Prevention and Control and of this tool specifically. Our research group is conducting such work within a multi-centre prospective study that is currently ongoing: https://clinicaltrials.gov/ct2/show/NCT04405934, https://doi.org/10.1101/2021.04.13.21255342.

Reviewer #1 (Public Review):

-In the present paper the authors have attempted to develop a novel statistical method and sequence reporting tool that combines epidemiological and sequence data to provide a rapid assessment of the probability of HCAI among HOCI cases (defined as first positive test >48 hours following admission) and to identify infections that could plausibly constitute outbreak events.

-As healthcare-associated infections in hospitals present a significant health risk to both vulnerable patients and healthcare workers, significant improvements in the rapid assessment of the probability of HCAI among HOCI cases are of utmost importance in a pandemic setting.

-The strength of the paper is that they have successfully used a large amount of virus sequence data from two UK cities with selected hospitals and developed a statistical method to bring these together with classical epidemiological data, which has resulted in a sequence reporting tool (SRT) that was evaluated in relation to:

-The IPC classification system recommended by PHE,

-The PHE definition of healthcare-associated COVID-19 outbreaks (using a 2 SNP threshold).

-They show the added value of combining the two systems. Obviously, this can only work prospectively in a setting like the UK, where a system like the COVID-19 Genomics (COG) UK initiative is effectively in place. They conclude that, through their retrospective application to clinical datasets, they have demonstrated that the methodology is able to provide confirmatory evidence for most PHE-defined definite and probable HCAIs and provide further information regarding indeterminate HCAIs. Therefore, the SRT may allow IPC teams to optimise their use of resources on areas with likely nosocomial acquisition events.

-The acquisition of the extensive prospective datasets necessary to use the system requires a non-negligible investment that is possible in a setting in which routine sequencing and phylogenetic analyses can be carried out in real time. The added value of the methodology should eventually justify the investment.

We thank the reviewer for their summary and commentary on our work. We agree that full evaluation of the use of viral sequencing for clinical practice requires health economic analysis of the associated costs relative to potential gains, and this is planned within our ongoing research program on this topic.

Reviewer #2 (Public Review):

Since early 2020, the SARS-CoV-2 pandemic has presented numerous challenges to healthcare facilities around the world. Given the highly transmissible nature of SARS-CoV-2 virus, and the confined nature of most hospital settings, hospital acquired infections with SARS-CoV-2 are a frequent occurrence and pose major challenges for hospital infection prevention teams. The increasing use of genomic epidemiology, facilitated by cheaper/faster genetic sequencing tools and user-friendly algorithms for data analysis, creates new opportunities for using virus sequencing to track virus spread in healthcare facilities. While opportunities are increasing, there remain two important bottlenecks to meaningful and widespread use of genomic epidemiology in well-resourced healthcare settings - 1. the turnaround time from sample collection to delivery of sequenced and analysed result; 2. a lack of training among many infection prevention personnel in interpreting genomic epidemiology output.

The study by Stirrup et al tries to alleviate these issues through the development of an algorithm that synthesises inferences from virus genetic sequences and hospital epidemiological data to provide easy to interpret information about whether or not there is likely to be ongoing virus transmission within a medical facility. In general, these kinds of approaches are highly worthwhile and can have important translational value as they facilitate the use of powerful new technologies without necessarily requiring extensive professional training to interpret the results. Indeed, there is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals.

In this study, the authors describe their new algorithm and apply it in two retrospective cases to evaluate its potential value to provide valuable information to infection control teams. While it seems clear that the algorithm reliably detects nosocomial transmission in situations where there are obvious hospital outbreaks, it is much less clear that it performs meaningfully in situations where nosocomial transmission is more questionable. To this end, it is not clear if the algorithm provides useful or meaningful information that would help to reduce the burden of hospital acquired SARS-CoV-2 infections. Towards the end of the discussion section, the authors mention that analyses on the utility of the algorithm in prospective use cases were ongoing from late 2020 to early 2021. These analyses will provide essential information on the value of this tool.

While the development of these sorts of tools is important, it is unclear from this study if the tool has value in prospective use or if it would be useful in settings where virus genetic sequencing is less frequent and/or slower than the retrospective use cases considered here. Additionally, in many infection prevention scenarios the existence of an outbreak is clear but tracing the routes of transmission is the primary object of investigation. Because the algorithm does not include phylogenetic information, tracing potential transmission routes of infection is not possible.

We thank the Reviewer for their commentary on our work. Our ongoing prospective study on implementation of the reporting tool includes intervention phases both with a ‘rapid’ target turnaround of 48 hours from sampling and with a ‘slow’ target turnaround of 5-10 days, and this will generate data on the relative utility of viral sequencing within these timeframes. We acknowledge that the reporting tool developed does not evaluate evidence of direct transmission between case pairs, although it should also be noted that phylogenetic investigation alone cannot be used to confidently infer direct transmission linkage for SARS-CoV-2. We feel that the algorithm and report format can flag potential transmission routes to IPC teams, through the identification of close sequence matches within the hospital as a whole and highlighting of any matching previous ward locations (although the latter is not used in the probability calculations).

2. Reviewer #2 (Public Review):

Since early 2020, the SARS-CoV-2 pandemic has presented numerous challenges to healthcare facilities around the world. Given the highly transmissible nature of SARS-CoV-2 virus, and the confined nature of most hospital settings, hospital acquired infections with SARS-CoV-2 are a frequent occurrence and pose major challenges for hospital infection prevention teams. The increasing use of genomic epidemiology, facilitated by cheaper/faster genetic sequencing tools and user-friendly algorithms for data analysis, creates new opportunities for using virus sequencing to track virus spread in healthcare facilities. While opportunities are increasing, there remain two important bottlenecks to meaningful and widespread use of genomic epidemiology in well-resourced healthcare settings - 1. the turnaround time from sample collection to delivery of sequenced and analysed result; 2. a lack of training among many infection prevention personnel in interpreting genomic epidemiology output.

The study by Stirrup et al tries to alleviate these issues through the development of an algorithm that synthesises inferences from virus genetic sequences and hospital epidemiological data to provide easy to interpret information about whether or not there is likely to be ongoing virus transmission within a medical facility. In general, these kinds of approaches are highly worthwhile and can have important translational value as they facilitate the use of powerful new technologies without necessarily requiring extensive professional training to interpret the results. Indeed, there is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals.

In this study, the authors describe their new algorithm and apply it in two retrospective cases to evaluate its potential value to provide valuable information to infection control teams. While it seems clear that the algorithm reliably detects nosocomial transmission in situations where there are obvious hospital outbreaks, it is much less clear that it performs meaningfully in situations where nosocomial transmission is more questionable. To this end, it is not clear if the algorithm provides useful or meaningful information that would help to reduce the burden of hospital acquired SARS-CoV-2 infections. Towards the end of the discussion section, the authors mention that analyses on the utility of the algorithm in prospective use cases were ongoing from late 2020 to early 2021. These analyses will provide essential information on the value of this tool.

While the development of these sorts of tools is important, it is unclear from this study if the tool has value in prospective use or if it would be useful in settings where virus genetic sequencing is less frequent and/or slower than the retrospective use cases considered here. Additionally, in many infection prevention scenarios the existence of an outbreak is clear but tracing the routes of transmission is the primary object of investigation. Because the algorithm does not include phylogenetic information, tracing potential transmission routes of infection is not possible.

3. Reviewer #1 (Public Review):

-In the present paper the authors have attempted to develop a novel statistical method and sequence reporting tool that combines epidemiological and sequence data to provide a rapid assessment of the probability of HCAI among HOCI cases (defined as first positive test >48 hours following admission) and to identify infections that could plausibly constitute outbreak events.

-As healthcare-associated infections in hospitals present a significant health risk to both vulnerable patients and healthcare workers, significant improvements in the rapid assessment of the probability of HCAI among HOCI cases are of utmost importance in a pandemic setting.

-The strength of the paper is that they have successfully used a large amount of virus sequence data from two UK cities with selected hospitals and developed a statistical method to bring these together with classical epidemiological data, which has resulted in a sequence reporting tool (SRT) that was evaluated in relation to:

-The IPC classification system recommended by PHE,

-The PHE definition of healthcare-associated COVID-19 outbreaks (using a 2 SNP threshold).

-They show the added value of combining the two systems. Obviously, this can only work prospectively in a setting like the UK, where a system like the COVID-19 Genomics (COG) UK initiative is effectively in place. They conclude that, through their retrospective application to clinical datasets, they have demonstrated that the methodology is able to provide confirmatory evidence for most PHE-defined definite and probable HCAIs and provide further information regarding indeterminate HCAIs. Therefore, the SRT may allow IPC teams to optimise their use of resources on areas with likely nosocomial acquisition events.

-The acquisition of the extensive prospective datasets necessary to use the system requires a non-negligible investment that is possible in a setting in which routine sequencing and phylogenetic analyses can be carried out in real time. The added value of the methodology should eventually justify the investment.

4. Evaluation Summary:

The paper describes an algorithm that combines epidemiological and sequence data to provide a rapid assessment of the probability of healthcare-associated infections among hospital onset SARS-CoV-2 infections, that also may be associated with outbreak events. There is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals. It is questionable to what extent the tool presented is generalisable to medical facilities outside of the specific data rich settings considered here, or if the tool is useful for prospective analyses. This study would be of interest to specialists working in hospital infection prevention, with more limited further interest.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

36. www.biorxiv.org www.biorxiv.org
1. Author Response:

Reviewer #3 (Public Review):

About 30 million years ago the ancestors of Old World primates lost the ability to produce the glycan a-gal due to the fixation of several loss-of-function mutations in the GGTA1 gene. The evolutionary advantage of such loss remains elusive. The current study builds upon previous work by the authors showing (i) that the presence of a-gal expressing bacteria in ggta1 deficient mice led to production of antibodies capable of clearance of malaria-causing plasmodia carrying a-gal (Yilmaz et al., 2014), and (ii) that ggta1 deficiency is associated with increased resistance to sepsis via the enhancement of IgG effector function (Singh et al., 2021). Here they expand on these findings to show that ggta1 deletion in mice is associated with altered composition of the gut microbiome due to the action of IgA targeting of a-Gal expressing bacteria. In addition, they show that the absence of a-gal results in a microbiome that is less pathogenic (i.e., less likely to induce sepsis in their experimental model). Although some aspects of the work are not very novel (e.g., the fact that ggta1 is associated with a remodeled microbiome had already been shown in their previous publications) the work does provide additional insights into the pleiotropic role of ggta1 in immune function, susceptibility to sepsis, and eventual fitness advantage. The work is extremely well done and all conclusions are supported by solid data. Indeed, I felt that the authors were reading my mind every step of the way. Each time I questioned one of the conclusions the next paragraph would address that exact concern. There are, however, a few points that I think would deserve additional clarification.

1 - I was a little surprised that they found no difference in the microbiome of F2 mice between a-gal deficient and wild-type mice. Although I understand that this might be due to antibodies received from the mother, the fact that the divergence is only seen in F3 to F5 would also be compatible with drift and not necessarily a genotype-driven phenotype. Do the microbiome differences detected in F3-F5 overlap with those observed at F0? If the original differences were controlled by host genetics (the hypothesis being tested), we would expect to see some convergence (at least at the level of specific taxa).

We agree essentially with the comment: “… would also be compatible with drift and not necessarily a genotype-driven phenotype” and have addressed this issue by adding the following statement in the Discussion section:

“On the basis of this observation alone (Figure 1), one cannot exclude the observed divergence in the microbiota bacterial population frequencies of wild type vs. Ggta1-deleted mice (Figure 1) from being a stochastic event. However, the observation that these changes occur via an Ig-dependent mechanism that differs in wild type vs. Ggta1-deleted mice (Figure 3) does support that loss of αGal contributes critically to shape the microbiota composition of Ggta1-deficient mice.”

We have previously shown that homogenization of the microbiota occurs between the littermates in the F2 generation (Singh et al., 2021). Having confirmed this finding in this manuscript (Figure 1C, Figure 3-figure supplement 7A-B), we find that the effect of the genotype and Ig is seen only from the F3 generation onwards (Figure 1D-F, Figure 3). Presumably, the inability of F1 Ggta1+/- mothers to produce anti-αGal antibodies accounts for the absence of overt shaping of the F2 microbiota. In these experiments, anti-αGal antibodies can only be generated from αGal-deficient F2 Ggta1-/- mice, being vertically transferred and shaping the microbiota from F3 Ggta1-/- mice onwards. We propose that the differences in the microbiota composition of the two F3 genotypes onwards are driven by a cumulative effect of maternal anti-αGal antibodies over the offspring microbiota composition.

2 - I was really surprised that ggta1 deficient mice lacking a functional adaptive immune system (Figure S8) were equally resistant to systemic infection with the cecal inoculum isolated from ggta1 deficient mice. In the previous work they show that the increased resistance to sepsis comes from increased effector function of IgG. If that is the case, how come mice not having an adaptive immune system (hence no IgG) are equally protected? Is the pathogenicity of the microbiome of ggta1 deficient mice that reduced? It seems unlikely. More generally, I would like to have seen a better discussion about how these new findings connect to their past work. In the context of increased resistance to sepsis what seems to be more important - the remodeling of the microbiome by IgA or the increased effector function of IgG?

The data reported in our manuscript do indeed support the conclusion that shaping of the microbiota composition of Ggta1-deficient mice is associated with an overall reduction of the microbiome pathogenicity. This finding is in keeping with host-microbe commensal interactions not being hard-wired but instead oscillating from pathogenic to symbiotic (Ayres, 2016; Vonaesch et al., 2018). Our findings suggest that the loss of Ggta1 function can modify the nature of host-microbiota interactions, through a mechanism whereby the absence of host αGal and the emergence of antibodies targeting this glycan in microbes shapes and reduces the microbiome pathogenicity.

We have shown that loss of αGal can enhance resistance to bacterial sepsis via a mechanism that increases IgG effector function (Singh et al., 2021). This was demonstrated by systemically infecting Ggta1-deficient mice with a “non-shaped” microbiota inoculum, isolated from Ggta1-deficient mice lacking adaptive immunity (Rag2-/-Ggta1-/- mice). As discussed in the manuscript “the gut microbiota of Rag2-/-Ggta1-/- mice, lacking adaptive immunity, is highly enriched in pathobionts such as Proteobacteria, including Helicobacter (Singh et al., 2021)”. Under these experimental conditions, resistance to infection is IgG dependent, explaining why modulation of IgG effector function by αGal impacts on the outcome of sepsis.

In the current manuscript we describe another survival advantage against bacterial sepsis associated with Ggta1 deletion in mice. Namely, antibodies generated by Ggta1-deficient mice can shape and reduce the microbiota pathogenicity. This was demonstrated by infecting systemically Ggta1-deficient mice lacking adaptive immunity (Rag2-/-Ggta1-/- mice) with a “shaped-microbiota” inoculum isolated from Ggta1-deficient mice. While the mechanism underlying microbiota shaping is antibody-dependent, the effector mechanism conferring resistance against the shaped microbiota acts irrespectively of adaptive immunity, including IgG. This conclusion is supported by the observation that systemic infection by the shaped microbiota (isolated from Ggta1-deficient mice) failed to induce sepsis in Rag2-/-Ggta1-/- mice, which was not the case upon systemic infection with a non-shaped microbiota (isolated from Rag2-/-Ggta1-/- mice). We conclude that Ggta1 deletion in mice increases resistance to bacterial sepsis via two interrelated antibody-dependent mechanisms: i) Increased IgG effector function (Singh et al., 2021) and ii) Antibody shaping and reduction of microbiota pathogenicity (current manuscript). To what extent these two traits are related remains to be established.

It is possible that, similarly to what was demonstrated for IgG (Singh et al., 2021), the absence of αGal from glycan structures in other Ig isotypes, including IgA, might modify their effector function. We do not yet know if this is the case, as in our manuscript, what we find is an altered antibody response targeting immunogenic bacteria in the microbiota of Ggta1-deficient mice. This is associated with modulation of the microbiota bacterial composition, i.e. antibody shaping of the microbiota, and with a reduction of the microbiome pathogenicity. The latter explains why the Ggta1-deficient mice do not rely on circulating antibodies to prevent the development of sepsis upon systemic infection by bacteria emanating from their own “shaped” microbiota.

2. Reviewer #3 (Public Review):

About 30 million years ago the ancestors of Old World primates lost the ability to produce the glycan a-gal due to the fixation of several loss-of-function mutations in the GGTA1 gene. The evolutionary advantage of such loss remains elusive. The current study builds upon previous work by the authors showing (i) that the presence of a-gal expressing bacteria in ggta1 deficient mice led to production of antibodies capable of clearance of malaria-causing plasmodia carrying a-gal (Yilmaz et al., 2014), and (ii) that ggta1 deficiency is associated with increased resistance to sepsis via the enhancement of IgG effector function (Singh et al., 2021). Here they expand on these findings to show that ggta1 deletion in mice is associated with altered composition of the gut microbiome due to the action of IgA targeting of a-Gal expressing bacteria. In addition, they show that the absence of a-gal results in a microbiome that is less pathogenic (i.e., less likely to induce sepsis in their experimental model). Although some aspects of the work are not very novel (e.g., the fact that ggta1 is associated with a remodeled microbiome had already been shown in their previous publications) the work does provide additional insights into the pleiotropic role of ggta1 in immune function, susceptibility to sepsis, and eventual fitness advantage. The work is extremely well done and all conclusions are supported by solid data. Indeed, I felt that the authors were reading my mind every step of the way. Each time I questioned one of the conclusions the next paragraph would address that exact concern. There are, however, a few points that I think would deserve additional clarification.

1 - I was a little surprised that they found no difference in the microbiome of F2 mice between a-gal deficient and wild-type mice. Although I understand that this might be due to antibodies received from the mother, the fact that the divergence is only seen in F3 to F5 would also be compatible with drift and not necessarily a genotype-driven phenotype. Do the microbiome differences detected in F3-F5 overlap with those observed at F0? If the original differences were controlled by host genetics (the hypothesis being tested), we would expect to see some convergence, at least at the level of specific taxa.

2 - I was really surprised that ggta1 deficient mice lacking a functional adaptive immune system (Figure S8) were equally resistant to systemic infection with the cecal inoculum isolated from ggta1 deficient mice. In the previous work they show that the increased resistance to sepsis comes from increased effector function of IgG. If that is the case, how come mice not having an adaptive immune system (hence no IgG) are equally protected? Is the pathogenicity of the microbiome of ggta1 deficient mice that reduced? It seems unlikely. More generally, I would like to have seen a better discussion about how these new findings connect to their past work. In the context of increased resistance to sepsis what seems to be more important - the remodeling of the microbiome by IgA or the increased effector function of IgG?

3. Reviewer #2 (Public Review):

The authors aimed to examine the impact of GGTA1 deletion on host-microbial interactions using a mouse model of a primate-specific mutation. This is a very informative model system that provided interesting insights into the consequences of aGal elimination from host glycoproteins, with subsequent 'release' of immune tolerance breaks and generation of antibody responses against bacterial aGal epitopes.

The study is well executed and the conclusions are well supported by the provided evidence. The findings are interesting for a broad audience of biologists.

The identity of IgA-targeted bacteria in GGTA1 vs WT mice would be interesting to investigate in future studies.

4. Reviewer #1 (Public Review):

This work is a powerful example of thinking across silos. It combines much knowledge of innate and adaptive immunity with primate evolution of certain antigens lost only in certain primate lineages, and tests an important idea about host-mediated, antibody-dependent shaping of gut microbiota using laboratory mice with different engineered genetic alterations. Gut microbiota are all the rage these days, but it is often forgotten that these microbial communities represent a formidable danger that is really too close (one epithelial layer away) for comfort. The authors demonstrate in laboratory mice how antibodies against non-self sugar molecules present on bacteria can shape the microbiome. Claims and conclusions seem justified by the data presented.

5. Evaluation Summary:

30 million years ago the ancestors of Old World primates lost the ability to produce alpha-gal due to the fixation of several loss-of-function mutations in the GGTA1 gene. The evolutionary advantage of such loss remains elusive. Here, the authors provide additional insights into the pleiotropic role of ggta1 in shaping the gut microbiota, immune function, susceptibility to sepsis, and eventual fitness advantage.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

37. www.biorxiv.org www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

Facial muscles control the execution of essential tasks like eating, drinking, breathing and (in most mammals) tactile exploration. The activity of motor neurons targeting different muscles is coordinated by premotor regions distributed throughout the brainstem. The precise identity of these cells and regions in adults is presently unclear, largely due to technical challenges. In the current work, Takatoh and colleagues develop an elegant strategy to label premotor neurons that target select muscles and register these cells on a common digital atlas. Their work confirms and also extends previous studies in neonates and provides a useful resource for the field.

We thank Reviewer 1 for the positive evaluation.

Reviewer #2 (Public Review):

The authors describe a variant of retrograde monosynaptic rabies tracing from skeletal muscle. They make use of AAV2-retro-Cre to infect brainstem motoneurons projecting to muscles involved in regulation of orofacial movements (whisking, genioglossus, masseter motoneurons). The strategy that worked most efficiently and with specificity was to inject AAV2-retro-Cre intramuscularly at P17, followed 3 weeks thereafter by central injection of Cre-dependent AAVs expressing TVA and oG, and 2 weeks thereafter followed by central injection of EnvA(M21)-ΔG-RV-GFP. Five days after this final injection, experiments were terminated to analyse the distribution of premotor neurons. This allowed the authors to reconstruct and compare the distribution of premotor neurons to the whisking (lateral 7N), tongue protruding genioglossus (12N), and jaw-closing masseter (5N) motoneurons. To do so, they used the Allen Brain Atlas as a reference for 3D reconstruction, into which they integrated all data. Notably, the authors found that for all three injection types, the highest density of neurons was found in the IRt and PCRt, but the precise peak of highest density was consistently distinct for the three different injection types. The peak for whisker premotor neurons was most caudal-ventral, for masseter premotor neurons most rostro-dorsal, and for tongue-protruding genioglossal premotor neurons in between these. The authors also make use of the strong expression of fluorescent proteins through rabies virus to analyse collateralization to other motor nuclei. Interestingly, they found cross-talk to other motor nuclei in selective patterns, supporting a model whereby some premotor neurons to one brainstem motor pool also interact with other output circuits, perhaps to coordinate orofacial behaviors. Using a split-Cre retrograde approach from motor nuclei, dual-projecting premotor neurons were identified to be located in dorsal IRt and SupV.

This is a high-quality study making use of several methods not previously brought together in one study. Particularly interesting is the 3-way virus strategy in wild-type mice allowing visualization of premotor neurons in the adult. Second, alignment in a common reference brain is also very useful. And finally, the beginning of understanding dynamics of premotor circuit distribution between development and adult is also a value of this paper. Overall, the study is very interesting for the field.

We thank Reviewer 2 for the positive evaluation.

Reviewer #3 (Public Review):

Orofacial actions show exquisite coordination among many muscles, yet the pools of motor neurons exciting each of these muscles are specific to that muscle. The coordination of activity across muscles therefore relies on circuits of premotor neurons that excite the motor neurons. Work by the authors and others has produced major progress in delineating these complex premotor circuits. Recent work using transsynaptic viral tracing has overcome limitations associated with traditional retrograde tracing methods, such as a lack of adequate specificity. However, these transsynaptic viral methods have been unsuccessful in animals older than approximately postnatal day 8 (P8). This is a problem because circuits continue to develop far beyond P8 in mice. Here, the authors overcome this limitation by introducing a novel viral transsynaptic tracing method that can be applied in adult mice.

The authors apply their method to trace premotor circuits for whisking, licking, and jaw movements. They align their anatomical data to the Allen Mouse Brain Common Coordinate Framework and make it available with the manuscript, greatly facilitating its quantitative use by other laboratories. The authors find premotor circuits in adult mice that are almost entirely consistent with results from younger mice, with some important exceptions that they highlight and discuss. The authors quantify overlap of premotor circuits for whisking, licking and jaw movements and discuss the implications of interactions among these circuits.

The experiments and analysis are carefully performed, and the results put into proper context. Overall, this is a straightforward and valuable contribution to our knowledge of the premotor circuits that coordinate orofacial behaviors. It will be of wide interest to neuroscientists.

Suggestions:

-The methods applied in neonatal mice (Takatoh et al. 2013; Stanek et al. 2014), while obviously different, are similar enough that it may be worth including discussion of any possible ways that differences between the neonatal and adult results could be due to methods, rather than age. I defer to the authors about whether such discussion is worthwhile, but readers may benefit from knowing what was considered.

We have now added the technical considerations that may account for the differences in the tracing patterns (lines 505-517).

-Spatial correlation in Figure 5C. To interpret this properly it's important to know the degree of smoothing. I could not find this in the relevant methods section describing the kernel density estimation or elsewhere.

Same as above: The cells detected in each mouse were first registered into the standard three-dimensional brain model. The (x, y, z) coordinates of each cell were then extracted, and multivariate kernel smoothing density estimation was applied (bandwidth = 1). The resulting kernel density estimate was then vectorized, and the cosine similarity between each pair of mice was calculated to form the correlogram.
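The procedure described in this response can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: it assumes an isotropic Gaussian kernel, an arbitrary evaluation grid, and toy coordinates, none of which are stated in the response. The function names (`kde_on_grid`, `cosine_similarity`, `correlogram`) are hypothetical.

```python
import math
from itertools import product


def kde_on_grid(cells, grid_points, bandwidth=1.0):
    """Evaluate an isotropic 3D Gaussian kernel density estimate on a grid.

    Assumption: Gaussian kernel with one shared bandwidth for x, y and z.
    """
    if not cells:
        raise ValueError("need at least one cell")
    norm = len(cells) * (bandwidth * math.sqrt(2.0 * math.pi)) ** 3
    density = []
    for g in grid_points:
        total = 0.0
        for c in cells:
            sq_dist = sum((gi - ci) ** 2 for gi, ci in zip(g, c))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        density.append(total / norm)
    return density  # already a flat vector, one value per grid point


def cosine_similarity(u, v):
    """Cosine of the angle between two density vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def correlogram(mice, grid_points, bandwidth=1.0):
    """Pairwise cosine similarity between the vectorized densities of all mice."""
    vectors = [kde_on_grid(cells, grid_points, bandwidth) for cells in mice]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy data (hypothetical): registered (x, y, z) cell coordinates per mouse.
grid = list(product(range(5), repeat=3))
mouse_a = [(1.0, 1.0, 1.0), (1.5, 1.0, 1.2)]
mouse_b = [(1.1, 0.9, 1.0), (1.4, 1.1, 1.3)]
mouse_c = [(3.5, 3.5, 3.5), (3.0, 3.8, 3.6)]
matrix = correlogram([mouse_a, mouse_b, mouse_c], grid)
```

Because the bandwidth sets how far each cell's density spreads, a larger value pushes all pairwise similarities towards 1, which is why the reviewer asks for the degree of smoothing to be reported.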

2. Reviewer #3 (Public Review):

Orofacial actions show exquisite coordination among many muscles, yet the pools of motor neurons exciting each of these muscles are specific to that muscle. The coordination of activity across muscles therefore relies on circuits of premotor neurons that excite the motor neurons. Work by the authors and others has produced major progress in delineating these complex premotor circuits. Recent work using transsynaptic viral tracing has overcome limitations associated with traditional retrograde tracing methods, such as a lack of adequate specificity. However, these transsynaptic viral methods have been unsuccessful in animals older than approximately postnatal day 8 (P8). This is a problem because circuits continue to develop far beyond P8 in mice. Here, the authors overcome this limitation by introducing a novel viral transsynaptic tracing method that can be applied in adult mice.

The authors apply their method to trace premotor circuits for whisking, licking, and jaw movements. They align their anatomical data to the Allen Mouse Brain Common Coordinate Framework and make it available with the manuscript, greatly facilitating its quantitative use by other laboratories. The authors find premotor circuits in adult mice that are almost entirely consistent with results from younger mice, with some important exceptions that they highlight and discuss. The authors quantify overlap of premotor circuits for whisking, licking and jaw movements and discuss the implications of interactions among these circuits.

The experiments and analysis are carefully performed, and the results put into proper context. Overall, this is a straightforward and valuable contribution to our knowledge of the premotor circuits that coordinate orofacial behaviors. It will be of wide interest to neuroscientists.

Suggestions:

-The methods applied in neonatal mice (Takatoh et al. 2013; Stanek et al. 2014), while obviously different, are similar enough that it may be worth including discussion of any possible ways that differences between the neonatal and adult results could be due to methods, rather than age. I defer to the authors about whether such discussion is worthwhile, but readers may benefit from knowing what was considered.

-Spatial correlation in Figure 5C. To interpret this properly it's important to know the degree of smoothing. I could not find this in the relevant methods section describing the kernel density estimation or elsewhere.

3. Reviewer #2 (Public Review):

The authors describe a variant of retrograde monosynaptic rabies tracing from skeletal muscle. They make use of AAV2-retro-Cre to infect brainstem motoneurons projecting to muscles involved in regulation of orofacial movements (whisking, genioglossus, masseter motoneurons). The strategy that worked most efficiently and with specificity was to inject AAV2-retro-Cre intramuscularly at P17, followed 3 weeks thereafter by central injection of Cre-dependent AAVs expressing TVA and oG, and 2 weeks thereafter followed by central injection of EnvA(M21)-ΔG-RV-GFP. Five days after this final injection, experiments were terminated to analyse the distribution of premotor neurons. This allowed the authors to reconstruct and compare the distribution of premotor neurons to the whisking (lateral 7N), tongue protruding genioglossus (12N), and jaw-closing masseter (5N) motoneurons. To do so, they used the Allen Brain Atlas as a reference for 3D reconstruction, into which they integrated all data. Notably, the authors found that for all three injection types, the highest density of neurons was found in the IRt and PCRt, but the precise peak of highest density was consistently distinct for the three different injection types. The peak for whisker premotor neurons was most caudal-ventral, for masseter premotor neurons most rostro-dorsal, and for tongue-protruding genioglossal premotor neurons in between these. The authors also make use of the strong expression of fluorescent proteins through rabies virus to analyse collateralization to other motor nuclei. Interestingly, they found cross-talk to other motor nuclei in selective patterns, supporting a model whereby some premotor neurons to one brainstem motor pool also interact with other output circuits, perhaps to coordinate orofacial behaviors. Using a split-Cre retrograde approach from motor nuclei, dual-projecting premotor neurons were identified to be located in dorsal IRt and SupV.

This is a high-quality study making use of several methods not previously brought together in one study. Particularly interesting is the 3-way virus strategy in wild-type mice allowing visualization of premotor neurons in the adult. Second, alignment in a common reference brain is also very useful. And finally, the beginning of understanding dynamics of premotor circuit distribution between development and adult is also a value of this paper. Overall, the study is very interesting for the field.

4. Reviewer #1 (Public Review):

Facial muscles control the execution of essential tasks like eating, drinking, breathing and (in most mammals) tactile exploration. The activity of motor neurons targeting different muscles is coordinated by premotor regions distributed throughout the brainstem. The precise identity of these cells and regions in adults is presently unclear, largely due to technical challenges. In the current work, Takatoh and colleagues develop an elegant strategy to label premotor neurons that target select muscles and register these cells on a common digital atlas. Their work confirms and also extends previous studies in neonates and provides a useful resource for the field.

38. www.medrxiv.org www.medrxiv.org
1. Author Response:

Reviewer #3 (Public Review):

1) The authors seem to assume a somewhat random sample throughout Washington state. They state that given a low sampling proportion they do not expect to have captured infection pairs, which seems reasonable. However, they then go on to assume that their sample is composed primarily of samples from long, successful transmission chains. This is a reasonable assumption if there is no major difference in accessibility of samples from long transmission chains and shorter ones (for example, decreased access to healthcare). Could this impact the assumption of sampling primarily from long transmission chains? It seems from the data collected in this outbreak that this was not the case for mumps in Washington, but addressing this assumption clearly (and potential ways to interrogate it) could make their methodology more applicable to other pathogen studies.

2) There are many examples of phylogenetic analyses that have led to conclusions about pathogen sources and sinks that were later shown to be wrong because of oversampling or other sampling biases. The authors address unequal sampling between clades, but additional contextualization of the problem and how this approach is different may help strengthen the methodology presented in the paper.

We thank the reviewer for these important points. We have attempted to address these by including an additional paragraph about different types of sampling and their impacts on phylodynamic studies.

We agree that this is a helpful addition, and have added a new paragraph devoted to a discussion of sampling bias to the discussion on lines 458-484. This paragraph reads:

“Sampling bias presents a persistent problem for phylodynamic studies that can complicate inference of source-sink dynamics (De Maio et al., 2015; Dudas et al., 2018; Frost et al., 2015; Kühnert et al., 2011; Lemey et al., 2020; Stack et al., 2010). Sampling bias can arise from unequal case detection or from curating a dataset that poorly represents the underlying outbreak. Washington State uses a passive surveillance system for mumps detection and case acquisition, which is known to result in underreporting. Because the WA Department of Health did not perform active mumps surveillance, it is difficult to assess whether different epidemiologic groups have different likelihoods of being sampled. Marshallese individuals are less likely to seek healthcare (Towne et al., 2020), which may have resulted in particularly high rates of underreporting in this group. If the number of cases within the Marshallese community were in fact higher than reported, this would increase the magnitude of the patterns we describe, making our estimates conservative. Given a distribution of cases, composing a dataset for analysis also requires sampling decisions. Uniform sampling regimes in which sampling probability is equal across groups have been shown to perform well for source-sink inferences (Hall et al., 2016). By selecting sequences that matched the overall attributes of the outbreak, including a near 50:50 split between Marshallese and non-Marshallese cases, we adhere to this recommendation. We then specifically employed structured coalescent approaches which have been shown to be robust to sampling differences (Dudas et al., 2018; Müller et al., 2018; Vaughan et al., 2014), rather than using other common approaches that treat sampling intensity as informative of population size (Lemey et al., 2009).
Within this framework, we further explore the possibility that unequal sampling within Washington clades could skew internal node reconstruction by forcing the sampling within each Washington clade to be equal between Marshallese and non-Marshallese tips. In doing so, differences within each clade must necessarily be driven by differences in transmission dynamics, rather than sampling. By combining careful sample selection with overlapping approaches to evaluate sampling bias, we were able to mitigate concerns that our source-sink reconstructions are driven by sampling artifacts.”
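One way to picture the within-clade equalization described in the quoted paragraph is random downsampling of the larger group in each clade. The sketch below is an assumption about how such balancing could be done, not the authors' procedure, which the response does not detail. The record fields (`name`, `clade`, `group`), the function name, and the choice to drop clades containing only one group are all hypothetical.

```python
import random
from collections import defaultdict


def equalize_groups_within_clades(tips, seed=0):
    """Downsample tips so each clade has equal numbers from every group.

    Assumption: clades that contain a single group cannot be balanced
    and are left out of the result.
    """
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    by_clade = defaultdict(lambda: defaultdict(list))
    for tip in tips:
        by_clade[tip["clade"]][tip["group"]].append(tip)

    balanced = []
    for groups in by_clade.values():
        if len(groups) < 2:
            continue
        n = min(len(members) for members in groups.values())
        for members in groups.values():
            balanced.extend(rng.sample(members, n))
    return balanced


def make_tips(clade, group, count):
    """Build toy tip records (hypothetical data)."""
    return [
        {"name": f"{clade}_{group}_{i}", "clade": clade, "group": group}
        for i in range(count)
    ]


tips = (
    make_tips("A", "Marshallese", 5)
    + make_tips("A", "non-Marshallese", 2)
    + make_tips("B", "Marshallese", 1)
    + make_tips("B", "non-Marshallese", 3)
    + make_tips("C", "non-Marshallese", 2)
)
balanced = equalize_groups_within_clades(tips)

# Count the retained tips per (clade, group) pair.
counts = defaultdict(int)
for tip in balanced:
    counts[(tip["clade"], tip["group"])] += 1
```

With equal counts per group in every clade, any remaining asymmetry in reconstructed internal nodes cannot be attributed to one group simply contributing more tips.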

3) The authors present compelling evidence that the mumps outbreak in Washington state was sustained by the Marshallese community, and state that mumps did not transmit efficiently among the general Washington populace. That said, there were several other mumps outbreaks in the United States in the same 2016-2017 time period. Was there something different about Washington state that prevented mumps transmission outside of the Marshallese community? Were there no other close-knit communities (universities, prisons, other cultural communities, etc.) affected? It just seems surprising that the Marshallese community was the only community sustaining transmission at a time where many different types of communities were affected across the United States.

We thank the reviewers and editor for this comment, and agree that further contextualization would be helpful. We did not make it clear in the initial submission that in 2016/2017, the vast majority of mumps outbreaks in the US were associated with either universities or ethnic communities. We have re-organized a few paragraphs in the discussion section and added information about other 2016/2017 outbreaks. This new paragraph is on lines 499-519, and reads:

“Our finding that most introductions sparked short transmission chains suggests that mumps did not transmit efficiently among the general Washington populace. We suspect that more diffuse contact patterns may help explain this. Mumps has historically caused outbreaks in communities with strong, interconnected contact patterns (Barskey et al., 2012; Fields et al., 2019; Nelson et al., 2013), and in dense housing environments (Snijders et al., 2012), highlighted most recently by outbreaks in US detention centers (Lo et al., 2021). In 2016, most outbreaks in the US were associated with university settings (Albertson et al., 2016; Bonwitt et al., 2017; Donahue et al., 2017; Golwalkar et al., 2018; Shah et al., 2018; Wohl et al., 2020), including a separate, smaller outbreak in Washington State associated with Greek housing (Bonwitt et al., 2017). Outside of university settings, other outbreaks in 2016 were reported within close-knit ethnic communities (Fields et al., 2019; Marx et al., 2018). We speculate that while waning immunity may promote outbreaks by increasing susceptibility among young adults, outbreaks in younger age groups may be possible in sufficiently high-contact settings. Provision of an outbreak dose of mumps-containing vaccine to high-risk groups may therefore be especially effective for limiting mumps transmission in future outbreaks. Others have reported success in using outbreak dose mumps vaccinations to reduce mumps transmission on college campuses (Cardemil et al., 2017; Shah et al., 2018) and in the US army (Arday et al., 1989; Eick et al., 2008; Green, 2006; Kelley et al., 1991), and the CDC currently recommends providing outbreak vaccine doses to individuals with increased risk due to an outbreak (Marlow et al., 2020). Future work to quantify the interplay between contact rates and vaccine-induced immunity among different age and risk groups should be used to guide updated vaccine recommendations.”

We also amended lines 42-46 in the introduction to highlight that most other US outbreaks in 2016/2017 were university-associated:

“Like with other recent mumps outbreaks, most Washington cases in 2016/17 were vaccinated. Unusually though, while most US outbreaks in 2016/2017 were associated with university settings (Albertson et al., 2016; Bonwitt et al., 2017; Donahue et al., 2017; Golwalkar et al., 2018; Shah et al., 2018; Wohl et al., 2020), incidence in Washington was highest among children aged 10-18 years, younger than expected given waning immunity.”

39. www.biorxiv.org www.biorxiv.org
1. Author Response:

Reviewer #1 (Public Review):

The manuscript by Schrieber et al., explores whether inbreeding affects floral attractiveness to pollinators with additional factors of sex and origin in play, in male and female plants of Silene latifolia. The authors use a combination of spatial sampling, floral volatiles, flower color, and floral rewards coupled with the response of a specialized pollinator to these traits. Their results show that females are more affected by inbreeding and in general inbreeding negatively impacts the "composite nature" of floral traits. The manuscript is well written, the experiments are detailed and quite elaborate. For example., the methodology for flower color estimation is the most detailed effort in this area that I can remember. All the experiments in the manuscript show meticulous planning, with extensive data collection addressing minute details, including the statistics used. However, I do have some concerns that need to be addressed.

Core strengths: Detailed experimental design, elaborate data collection methods, well-defined methodology that is easy to follow. There is a logical flow for the experiments, and no details are missing in most of the experiments.

Weaknesses: A recent study has addressed some of the questions detailed in the manuscript, so the introduction needs to be tweaked to reflect this.

Thank you very much for bringing this excellent article to our attention! We adjusted the writing in the introduction and the discussion accordingly. Please consider that this article was first published on the 15th of January 2021, while our manuscript was submitted on the 9th of January. Hence, we were not able to account for this study in the first submission.

Introduction pp 4-5, ll 48-54: “Although in a few cases inbreeding has been shown to alter single components of flower attractiveness (Ivey and Carr, 2005; Ferrari et al., 2006; Haber et al., 2019), insight into syndrome-wide effects is restricted to a single study. Kariyat et al. (2021) demonstrated that inbred Solanum carolinense L. display reduced flower size, pollen and scent production and receive fewer visits from diurnal generalists. It is necessary to broaden such integrated methodological approaches to other plant-pollinator systems (e.g., nocturnal specialist pollinators) and further floral traits (i.e., flower colour).”

Discussion p 19, ll 535-542: “In summary, our research on S. latifolia suggests that in addition to inbreeding disrupting interactions with herbivores by changing plant leaf chemistry (Schrieber et al., 2018) it affects plant interactions with pollinators by altering flower chemistry. Our observations are in line with studies on other plant species (Ivey and Carr, 2005; Kariyat et al., 2012, 2021) and highlight that inbreeding has the potential to reset the equilibrium of species interactions by altering functional traits that have developed in a long history of co-evolution. These threats to antagonistic and symbiotic plant-insect interactions may mutually magnify in reducing plant individual fitness and altering the dynamics of natural plant populations under global change.”

Some details and controls are missing in the floral scent estimation. Flower age and a pesticide treatment of the plants, both of which could affect chemistry, need to be better addressed.

We clarified this issue in several places in the methods section. Previous studies (and our study) on S. latifolia have shown no clear differences in the quality of floral scent between sexes. However, one study found higher total emission of VOC in males, while others found no differences. Hence, females produce no specific VOC that are used as oviposition cues but may be differentiated from males by the total amount of emitted VOC and pronounced differences in spatial flower traits. We highlight this at p 6, ll 111-116: “Silene latifolia exhibits various sexual dimorphisms with male plants producing more and smaller flowers that excrete lower volumes of nectar with higher sugar concentrations as compared to females (Gehring et al., 2004; Delph et al., 2010). The quality of floral scent exhibits no clear sex-specific patterns, while male plants have been shown to emit higher or equal total amounts of VOC as compared to females in different studies (Dötterl & Jürgens 2005, Waelti et al. 2009)”.

Both male and female moths show pronounced behavioural responses to lilac aldehyde isomers and other VOC in the floral scent of S. latifolia (Dötterl et al., 2006). We therefore treated these VOC as typical floral scent compounds. We clarified this at p 7, ll 125-126: “A substantial fraction of floral VOC produced by S. latifolia triggers antennal and behavioural responses in male and female H. bicruris moths (Dötterl et al., 2006).” and p 9, ll 210-218: “For targeted statistical analyses, we focused on those VOC that evidently mediate communication with H. bicruris according to Dötterl et al. (2006). We analysed the Shannon diversity per plant (calculated with R-package: vegan v.2.5-5, Oksanen et al. 2019) for 20 floral VOC in our data set that were shown to elicit electrophysiological responses in the antennae of H. bicruris (Supplementary File 1). Moreover, we analysed the intensities of three lilac aldehyde isomers, which trigger oriented flight and landing behaviour in both male and female H. bicruris most efficiently when compared to other VOC in the floral scent of S. latifolia. Furthermore, H. bicruris is able to detect the slightest differences in the concentration of these three compounds at very low dosages (Dötterl et al. 2006).”
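As an illustration of the Shannon diversity measure mentioned in the quoted methods text, the following minimal Python sketch computes H = -Σ p_i ln(p_i) from per-compound intensities. It is not the authors' code (they report using the R package vegan); the function name and the intensity values are hypothetical and chosen only to show how an even scent profile yields a higher index than a profile dominated by one compound.

```python
import math

def shannon_diversity(intensities):
    """Shannon index H = -sum(p_i * ln(p_i)) over compounds with non-zero intensity."""
    total = sum(intensities)
    if total <= 0:
        return 0.0
    h = 0.0
    for x in intensities:
        if x > 0:
            p = x / total          # relative contribution of this compound
            h -= p * math.log(p)   # natural logarithm
    return h

# Hypothetical peak intensities for four VOC in one plant (illustrative values only).
even_plant = [10.0, 10.0, 10.0, 10.0]    # all compounds equally abundant
skewed_plant = [37.0, 1.0, 1.0, 1.0]     # one compound dominates

print(shannon_diversity(even_plant))
print(shannon_diversity(skewed_plant))
```

For a perfectly even profile of k compounds the index equals ln(k), which is its maximum for that number of compounds.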

We used biological pest control agents in a preventive manner because S. latifolia is often infested by thrips and aphids under greenhouse conditions. The writing in the previous manuscript version was not clear in this regard and we changed the text at p 8, ll 157-161: “Plants received water and fertilisation (UniversolGelb 12-30-12, Everris-Headquarters, NL) when necessary for the entire experimental period and were prophylactically treated with biological pest control agents under greenhouse conditions to prevent thrips (agent Amblyseius barkeri and Amblyseius cucumeris) and aphid (agent Chrysoperla carnea) infestation (Katz Biotech GmbH, GE).”

Indeed, flower size and scent emission can be correlated. Although the question whether differences in scent emission were based on a difference in flower size is an interesting one, it seemed less relevant to us because it is unlikely that our pollinators correct their perception of a scent for the size of a flower (see also p 19, ll 520-526). We were rather interested in whether scent emission differs between the plant treatments and thus pollinators may chemically perceive such differences. Moreover, we found it problematic to correct our models for flower size by including it as a covariate, which is the reason why we have not assessed this trait during scent collection. In this case, we would have corrected our scent responses for the effects of inbreeding, sex and population origin (i.e., the predictors we are interested in) because all of them determine the size of a flower (Figure 2 c,d). Hence, the inbreeding, sex and origin effects on flower scent would likely vanish. However, it is highly unlikely that the set of genes contributing to sex-, breeding treatment- and origin-based variation in flower size is exactly the same one that determines variation in scent emission per flower, which is basically the assumption underlying the model that includes flower size as a covariate. We critically mentioned the trade-off relationships and our reasoning to not correct for flower size at p 9, ll 208-210: “The intensities of VOC were not corrected for flower size because we wanted to capture all variation in scent emission that is relevant for the receiver i.e., the pollinator.”

While the study is laser-focused on floral traits, as the authors are aware inbreeding affects the total phenotype of the plants including fitness and defense traits. For example, there are quite a few studies that have shown how inbreeding affects the plant defense phenotype. This could be addressed in the introduction and discussion.

We agree that this aspect is important and therefore addressed it in further detail in the introduction at p 4 ll 34-38: “While it is well established that inbreeding can increase a plant’s susceptibility to herbivores by diminishing morphological and chemical defences (Campbell et al., 2013; Kariyat et al., 2012; Kalske et al., 2014), its effects on plant-pollinator interactions are less well understood. Inbreeding may reduce a plant’s attractiveness to pollinating insects by compromising the complex set of floral traits involved in interspecific communication.” Since other referees suggested toning down rather than expanding the discussion based on floral scent results, we stick to the general feedback relationship between herbivory and pollination, rather than relating it specifically to volatiles in the discussion at p 19, ll 535-544: “In summary, our research on S. latifolia suggests that in addition to inbreeding disrupting interactions with herbivores by changing plant leaf chemistry (Schrieber et al., 2018) it affects plant interactions with pollinators by altering flower chemistry. Our observations are in line with studies on other plant species (Ivey and Carr, 2005; Kariyat et al., 2012, 2021) and highlight that inbreeding has the potential to reset the equilibrium of species interactions by altering functional traits that have developed in a long history of co-evolution. These threats to antagonistic and symbiotic plant-insect interactions may mutually magnify in reducing plant individual fitness and altering the dynamics of natural plant populations under global change. As such, our study adds to a growing body of literature supporting the need to maintain or restore sufficient genetic diversity in plant populations during conservation programs.”

Reviewer #2 (Public Review):

A summary of what the authors were trying to achieve. This interesting and data-rich paper reports the results of several detailed experiments on the pollination biology of the dioecious plant Silene latifolia. The authors use multiple accessions from several European (native range) and North American (introduced range) populations of S. latifolia to generate an experimental common garden. After one generation of within-population crosses, in which each cross included either two (half-)siblings or two unrelated individuals, they compared the effects of one generation of inbreeding on multiple plant traits (height, floral size, floral scent, floral color), controlling for population origin. Thereby, they set out to test the hypothesis that inbreeding reduces plant attractiveness. Furthermore, they ask if the effect is more pronounced in female than male plants, which may be predicted from sexual selection and sex-chromosome-specific expression, and if the effect of inbreeding is larger in native European populations than in North American populations, which may have already undergone genetic purging during a bottleneck. Finally, the authors evaluate to what extent the inbreeding-related trait changes affect floral attractiveness (measured as visitation rates) in field-based bioassays.

An account of the major strengths and weaknesses of the methods and results. The major strength of this paper is the ambitious and meticulous experimental setup and implementation that allows comparisons of the effect of multiple predictors (i.e. inbreeding treatment, plant origin, plant sex) on the intraspecific variation of floral traits. Previous work has shown direct effects of plant inbreeding on floral traits, but no previous study has taken this wholesale approach in a system where the pollination ecology is well known. In particular, very few studies, if any, have tested the effects of inbreeding on floral scent or color traits. Moreover, I particularly appreciate that the authors go the extra mile and evaluate the biological importance of the inbreeding-induced trait variation in a field bioassay. I also very much appreciate that the authors have taken into account the biological context by using a relevant vision model in the color analyses and by focusing on EAD-active compounds in the floral scent analyses.

The results are very interesting and show that the effects of inbreeding on trait variation are both origin- and sex-dependent, but that the strongest effects were not always consistent with the hypothesis that North American plants would have undergone genetic purging during a bottleneck that would make these plants less susceptible to inbreeding effects. The authors made a large collection effort, securing seeds from eight populations from each continent, but then only used population origin and seed family origin as random factors in the models when testing the overall effect of inbreeding on floral traits. It would have been very interesting to see an analysis that partitions the variance both in the actual traits under study and in the response to inbreeding, to determine to what extent there is variation among populations within continents. Not least because it is increasingly clear that the ecological outcome of species interactions (mutualistic/antagonistic) in nursery pollination systems often varies among populations (cf. Thompson 2005, The geographic mosaic of coevolution), and some results suggest that this is the case also in Hadena-Silene interactions (e.g. Kephart et al. 2006, New Phytologist). Furthermore, some plants involved in nursery pollination systems show evidence of distinct canalization across populations of floral traits of importance for the interaction (e.g. Svensson et al. 2005), whereas others show unexpected and fine-grained variation in floral traits among populations (e.g. Suinyuy et al. 2015, Proceedings B, Thompson et al. 2017 Am. Nat., Friberg et al. 2019, PNAS). Hence, it is possible that the local population history and local variation in the interactions between the plants and their pollinators may be more important predictors for explaining variation in floral trait responses to inbreeding than the larger-scale continental analyses. Not least because North American S. latifolia probably has multiple origins, with subsequent opportunity for admixture in secondary contact.

Yes, it is necessary to put populations from the same continent into one category, since native and invasive plant populations differ significantly in their evolutionary history (p 5, ll 74-81, http://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2012.05751.x). Origin explained sufficient amounts of variation in several traits including flower number, corolla expansion, VOC diversity, lilac aldehyde A intensity, and pollinator visitation rates (see Figures 2-3; and Table 2) and some variation in the magnitude of inbreeding effects (Figure 2e, f; Figure 3). Even if we were not interested in differences among native and invasive populations, we would have to include origin as a fixed effect in our models because:

i) populations within a distribution range are no independent samples,

ii) origin explains sufficient variation in many responses,

iii) origin cannot be fitted as a random factor, since it has only two levels (the minimum number of levels for a random effect is 4).

We agree that it would be very interesting to specifically assess differences in the magnitude of breeding and sex effects among populations within origins. We now discuss this as an important future research direction at p 18, ll 500-507: “As such, the precise mechanisms underlying variation in inbreeding effects on different scent traits across population origins of S. latifolia can only be explored based on comprehensive genomic resources, which are currently not available. Future studies should also incorporate field-data on the abundance of specialist pollinators and extend the focus from variation in the magnitude of inbreeding effects among geographic origins to variation among populations within geographic origins and individuals within populations. This would allow a detailed quantification of geographic variation in inbreeding effects and elaborating on the causes and ecological consequences of such variation (Thompson, 2005; Schrieber and Lachmuth, 2017; Thompson et al., 2017)”.

To empirically address within-origin variation of inbreeding effects with our data, we would have to i) fit correlated random intercepts and slopes for the interaction breeding-sex on the population random factor (models consume min. 22 DF); or ii) include population as a fixed effect in our models (models consume min. 67 DF). We have tried both of these approaches when preparing the revision, but unfortunately it turned out that our study is not designed to address this question. The models for both variants only partially converge (see R-script ll. 1568-1580), and even if they do this does not imply that one can draw solid inference from them. Approach i often results in multiple singular convergence warning messages implying that no variance is explained by population-specific reaction norms to the fixed effects specified in the random effects structure. Approach ii results in odd rank-deficient models (I was seriously worried about type I errors). We simply have too few replicates (5) per population-breeding treatment-sex combination for both approaches. For solid inference we would need 10 (approach i) to 40 (approach ii) replicates per combination, i.e. 640-2600 individuals. However, our experimental design is sufficient to address the hypothesis we have raised in the introduction as well as general differences in response variables among populations. We now provide information on variance partitioning for all models that include population as a random effect in S9. As you will see, population explains lower amounts of variation in our responses than the fixed effects in 9 out of 12 models. The random effects maternal and paternal genotype (mother&father) explain more variation than the random effect population in 6 of 12 cases. Thus, these data do not make a strong case for an extensive discussion of population-based differences in floral traits and this was also not a question or hypothesis we wanted to address with our study.
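The sample-size figures in this response can be checked with a short calculation. The sketch below assumes a fully crossed, balanced design with eight populations per continent (as stated in the reviewer's summary), two breeding treatments and two sexes; the variable and function names are illustrative and not taken from the authors' R script.

```python
# Assumed design factors (balanced, fully crossed; see lead-in).
populations_per_origin = 8
origins = 2               # Europe and North America
breeding_treatments = 2   # inbred and outcrossed
sexes = 2                 # female and male

def individuals_needed(replicates_per_combination):
    """Total plants for a given number of replicates per population x breeding x sex cell."""
    cells = populations_per_origin * origins * breeding_treatments * sexes
    return cells * replicates_per_combination

for replicates in (5, 10, 40):
    print(replicates, individuals_needed(replicates))
```

Under these assumptions, 10 replicates per cell give 640 plants and 40 replicates give 2560 plants, which matches the range of roughly 640-2600 individuals quoted in the response.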

I see no major weaknesses in the study, but in my detailed response I have raised a few questions and suggestions about the floral scent analyses. In short, the authors have used a technique that is not the standard method used for making quantitative floral scent analyses, and I am curious about how it was made sure that the results obtained from the static headspace sampling using PDMS adsorbents could be used as a quantitative measure. I would suggest that the authors validate the use of this method more thoroughly in the manuscript, and I have detailed this comment in my response to the authors.

Also, and this may seem like a nit-picky comment, I am not convinced that the best way to describe the traits under study is "plant attractiveness", because in the experimental bioassays, most of the traits under study that are affected by the inbreeding treatment, did not result in a reduced pollinator visitation. Most (or all) of these traits may also be involved in other plant functions and important for other interactions, so I suggest potentially using a term like "floral traits" or "(putative) signalling traits".

We now avoid the term floral attractiveness throughout the manuscript and instead refer to “floral traits”.

An appraisal of whether the authors achieved their aims, and whether the results support their conclusions: By and large, the authors achieved the aims of this study, and drew conclusions based on these results. One interesting aspect of this work that I think could be discussed a bit deeper is the lack of congruence between the effects of inbreeding on floral traits and the variation in visitation pattern in the bioassay. In fact, the only large effect of inbreeding on a floral trait that may play a role as an explanatory factor is the reduction of emission of lilac aldehyde A in inbred female S. latifolia from North America, which corresponds to a reduced visitation rate in this group in the pollinator visitation bioassay. I have made some specific suggestions in my comments to the authors.

We agree that this aspect required deeper discussion and revised the section at p 19, ll 520-526 accordingly. We believe that the limited spatial vision of H. bicruris in combination with our experimental setup for pollinator observations increased the relative importance of floral scent for pollinator visitation rates (suggested by referee #3).

A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community: I think that one important aspect of this work that may broaden the impact of this study further is the link between these experiments and our expectations from the evolution of selfing. Selfing plant species most often conform to the selfing syndrome, presenting smaller, less scented flowers than outcrossing relatives. Traditionally, the selfing syndrome is explained by natural selection against individuals that invest energy into floral signalling, when attracting pollinators is no longer crucial for reproduction. Some studies (for example Andersson, 2012, Am. J. Bot), however, have shown that only one, or a few, generations of inbreeding may reduce floral size as much as quite strong selection for reduced signalling. Here, at least for some populations and sexes, similar results are obtained in this paper regarding several traits (including floral scent), and one way to put this paper in context is by discussing the results in the light of these previous papers.

We now address this issue at p 16, ll 417-420: “However, our findings highlight that even weak degrees of biparental inbreeding (i.e., one generation sib-mating) can result in a severe reduction of spatial flower trait and scent trait values that is detectable against the background of natural variation among multiple plant populations from a broad geographic region. This observation indirectly supports that the selfing syndrome (i.e., smaller, less scented flowers observed in selfing relative to outcrossing populations of hermaphroditic plant species) may not merely be a result of natural selection against resource investment into floral traits, but also a direct negative consequence of inbreeding (Andersson, 2012).”

Reviewer #3 (Public Review):

Schrieber et al. studied the effects of biparental inbreeding in the dioecious plant Silene latifolia, focusing specifically on traits important for floral attractiveness and pollinator attraction. These traits are especially important for dioecious species with separate sexes as they are obligate outcrossers. The authors find that inbreeding mostly decreases floral attractiveness, but that this effect tended to be stronger in the female flowers, which the authors suspect to result from the trade-off with larger investment in the sexual functions in the female plants. The authors then go on to couple the changes in visual and olfactory floral traits to pollinator attraction which allows them to conclude or at least speculate that differences in pollinator behavior are mostly driven by the changes in olfactory traits. The study is robust in its broad and well-balanced sampling of populations, rigorous and in large part meticulously documented experimental designs and linking of the effects on mechanisms to ecological function. The hypotheses are clearly stated and the study is able to address them mostly convincingly. However, some aspects of the decisions the authors made and possible caveats need to be addressed and elaborated on.

A major caveat, in my opinion, is that while the authors find stronger effects of inbreeding on pollinator visitation rates in the plants from the North American (Na) origin, these plants were tested in an environment that was foreign to them, which could have important consequences for the results of this study. This is specifically because the main pollinator Hadena bicruris moth is completely absent from the populations in Na, and yet, was the main pollinator observed in the pollinator attraction experiment. As this pollinator is also a seed predator, the Na populations are released from the selection pressure to avoid attracting the females of this species and thus risking the loss of seeds and fitness. In fact, some of the results suggest that the release from the specialist pollinator and seed predator in Na has led to an increase in the attractiveness of the female flowers based on the higher number of flowers visited in the outcrossed females compared to outcrossed males in the plants from the Na origin and the similar, though not statistically significant, pattern in the olfactory cue. While ideally this pollinator attraction experiment should be repeated within the local range of the Na plants, this is of course not feasible. Instead I suggest the problem should be addressed in the discussion explicitly and its consequences for the interpretation of the results should be considered.

Indeed, North American populations are tested in their “away”-habitat only and the observed plant performance and pollinator visitation rates can thus provide no direct implications for their “home”-habitat. We state this now more clearly at pp 11-12, ll 283-285. However, our design is appropriate for investigating inbreeding effects on plant-pollinator interactions in multiple plant populations in a common environment. Given the close taxonomic relationship of H. bicruris (main pollinator in Europe) and H. ectypa (main pollinator in North America), the behavioural responses of the former species to variation in the quality of its host plant was considered to overlap sufficiently with responses of the latter species as outlined at pp 11-12, ll 285-291.

The hypothesis that North American (NA) S. latifolia evolved higher attractiveness to female Hadena moths because H. ectypa is not able to oviposit on female plants in contrast to H. bicruris is indeed a highly interesting one. However, as you have outlined correctly, our study is not designed to elaborate on questions related to adaptive evolutionary differentiation among North American and European plants. Instead of addressing this hypothesis based on our data, we thus refer to previous studies in the discussion at p 17, ll 482-487: “As discussed in detail in previous studies, higher flower numbers in North American S. latifolia plants (Figure 1b) may result from changes in the selective regimes for numerous abiotic factors (Keller et al., 2009) or from the release of seed predation. As opposed to H. bicruris, H. ectypa pollinates North American S. latifolia without incurring costs for seed predation, which may result in the evolution of higher flower numbers, specifically in female plants (Elzinga and Bernasconi, 2009).”

The incorporation of the VOC data in the actual manuscript was quite limited and I found the reasoning for picking only the three lilac aldehydes (in addition to the Shannon diversity index) for the univariate statistical tests insufficient. How much more efficient was the effect of the lilac aldehydes compared to the other 17 compounds deemed important in the previous study? While the data on this one aldehyde matches the pollinator attraction results, having one compound out of 70 (or out of 20 if only considering the ones identified important for the main pollinator) seems, perhaps, fortuitous lest there is a good reason for focusing on these particular compounds.

We adapted the text to increase clarity but stuck to our previous choice for the analyses of VOC data.

i) We now explain our choice of analysing lilac aldehydes with more detail p9, ll 210-218: “For targeted statistical analyses, we focused on those VOC that evidently mediate communication with H. bicruris according to Dötterl et al. (2006). We analysed the Shannon diversity per plant (calculated with R-package: vegan v.2.5-5, Oksanen et al. 2019) for 20 floral VOC in our data set that were shown to elicit electrophysiological responses in the antennae of H. bicruris (Supplementary File 1). Moreover, we analysed the intensities of three lilac aldehyde isomers, which trigger oriented flight and landing behaviour in both male and female H. bicruris most efficiently when compared to other VOC in the floral scent of S. latifolia. Furthermore, H. bicruris is able to detect the slightest differences in the concentration of these three compounds at very low dosages (Dötterl et al. 2006).”

ii) If one analyses 20 compounds with zero-inflation models (actually two models in one) + 8 floral trait models + 2 pollinator visitation models (zi-models with two component models), one ends up with 52 models investigating complex fixed and random effect structures. To keep type-1 errors as low as possible (see also comment 2.12.b from Referee#2), we approached the more comprehensive VOC data sets with multivariate analyses or Shannon diversity.

iii) We tested the effect of sex × origin × breeding treatment on the Shannon diversity of 20 active VOC as well as in the random forest analyses with the 20 VOC and 70 VOC datasets and transparently reported the results from all of these analyses in the manuscript. Hence, the incorporation of VOC data was not limited. However, we agree that we referred too little to these results and have now changed the text accordingly. Results section p 13 ll 351-354: “Multivariate statistical analyses of 20 H. bicruris active VOC and all 70 VOC detected in S. latifolia revealed no clear separation of floral headspace VOC patterns for any of the treatments (Figure 2-figure supplement 2). In summary, the combined effects of breeding treatment, sex and range on floral scent were rather weak.”
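The model count given in point ii) can be reproduced with a few lines of arithmetic. The sketch below also illustrates why such a count matters for type I errors, using the standard family-wise error rate 1 - (1 - α)^n. The significance level of 0.05 and the independence of tests are assumptions made here for illustration only and are not stated in the response.

```python
# Component models per analysis type, following the counts given in point ii).
voc_compounds = 20
components_per_zi_model = 2   # a zero-inflation model combines two component models
floral_trait_models = 8
pollinator_zi_models = 2

total_models = (voc_compounds * components_per_zi_model
                + floral_trait_models
                + pollinator_zi_models * components_per_zi_model)

def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent tests (assumed independence)."""
    return 1.0 - (1.0 - alpha) ** n_tests

print(total_models)
print(familywise_error_rate(total_models))
```

With 40 + 8 + 4 = 52 component models, the chance of at least one false positive would exceed 90% under these simplifying assumptions, which is in line with the authors' preference for multivariate analyses and a single diversity index.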

Sampling time of VOCs is reported ambiguously. Was it from 21:00 to 17:00 the next day or in fact from 9 pm to 5 am (instead of 5 pm as reported)? Please be more specific in the text as this is quite important. If sampling tubes were left in place during the daytime, some of the compounds could have evaporated due to heating of the tubes in the summer. It would also be important to mention whether all of the headspace VOCs were sampled on the same day and whether there could be variation in, e.g., temperature.

Thank you very much for identifying this typo! It is from 9 pm to 5 am (p 9, l 186).

Considering the experimental setup for the pollinator attraction observations and the pooling of the data at the block level (which I think is the right choice) it seems possible the authors were more likely to get a result where pollinator behavior matches the long-distance cue, the VOCs. Short-distance cues such as a subtle difference in flower size would perhaps not be distinguished with the current setup. I would be interested to know if the authors agree, and if so, mention this in the discussion.

Thank you very much for this excellent suggestion! We agree and discuss this aspect in detail at p 19, ll 520-526. Indeed, one would need two different experimental setups to assess the contributions of long- and short-distance cues. Our setup (large distances among plots) is optimal for long-distance cues, while a setup for short-distance cues should have all plants in close spatial proximity. However, the latter approach then does not allow addressing long-distance cues or excluding competition/facilitation for pollinators among plants from different treatment groups.